Linearly Bounded Reformulations
of Conjunctive Databases
(extended abstract)

Rada Chirkova and Michael R. Genesereth

Stanford University, Stanford CA 94305, USA
{rada, genesereth}cs. stanford. edu

Abstract. Database reformulation is the process of rewriting the data
and rules of a deductive database in a functionally equivalent manner.
We focus on the problem of automatically reformulating a database in a
way that reduces query processing time while satisfying strong storage
space constraints.
In previous work we have investigated database reformulation for the case
of unary databases. In this paper we extend this work to arbitrary arity,
while concentrating on databases with conjunctive rules. The main result
of the paper is that the database reformulation problem is decidable for
conjunctive databases.

I Introduction

In the life cycle of a database system, there are recurring problems whose solu-
tion involves transformations of the database schema and/or queries defined on
the schema. Prominent examples are database design, data model translation,
schema (de)composition, view materialization, and multidatabase integration.
Interestingly, nearly all these problems can be regarded as aspects of the same
problem in a theoretical framework that we proceed to describe.
Consider an abstract database transformation problem. Suppose the input to
the problem comprises the schema and rules of a deductive database and a set of
elementary queries which, together with some algebra, forms a query language
on the database. Suppose the objective of database transformation is to build
an "optimal" structure of the database with respect to the requirements and
constraints that are also provided in the input.
Generally, the transformations of the given database schema and rules need
to be performed in such a way that the resulting database satisfies three conditions.
First, it should be possible to extract from the transformed database, by
means of the input query language, exactly the same information as from the
original database. Second, the result should satisfy the input requirements, such
as minimizing query processing costs. Finally, the result should satisfy the input
constraints; one pervasive constraint is a guarantee of a (low) upper bound on
the disk space for storing the transformed database. Notice that since the input

does not include a specific database instance, all three conditions must hold for
all instances of the input database and of the transformed database.
We call this problem database reformulation and consider logic-based approaches
to its solution; a formal definition of the problem is the first contribution
of this paper. Database reformulation is the process of rewriting the data
and rules of a deductive database in a functionally equivalent manner: it takes
as input the schema and rules of a database and a characterization of a query
language, and produces as output new schema and rules, as well as a rewriting of
(all elementary queries of) the query language in terms of the schema and rules.
Notice that database reformulation, unlike query optimization, rewrites multiple
rules, thus amortizing over many queries. By specifying various input requirements
and constraints, the database reformulation problem translates into any
of the database schema/query transformation problems mentioned above.
We focus on database reformulations whose input requirement is to minimize
the computational costs of processing the given elementary queries, under
strong storage space constraints that guarantee no more than linear increase in
database size. In this formulation, the database reformulation framework is most
suitable for dealing with the problems of view materialization and multidatabase
integration.
This paper treats the base case where all rules in the reformulation input are
conjunctive. We show that for such inputs, the database reformulation problem,
in the narrow sense described above, is decidable: we describe an algorithm which
outputs a solution if there exists at least one satisfactory reformulation of the
input. This result is the second contribution of this paper. In previous work
we have proposed a solution to the reformulation problem for unary databases;
notice that the solution described in this paper works for deductive databases
that contain relations of arbitrary arity. Our long-term objective is to extend
research in database reformulation to deductive databases whose queries and
views are formulated in progressively more general standard query languages
(datalog and its extensions), as well as to databases with integrity constraints.
In this extended abstract, all proofs have been omitted. The proof of the
main result and additional examples can be found in the appendix.

2 Preliminaries and Terminology

Our representation of the domain includes a set of relations; the set of attributes
for a relation is called a relation schema. A database schema, for a given database
D, is a collection of relation schemas for all stored relations in D.
A function-free Horn rule is an expression of the form

p() : - p(), ..., (1)

where p and p, ... , Pn are relation names, and , , ... , , are tuples of
variables and constants such that any variable appearing in  appears also in
? u... u 2.

A conjunctive query (view) is a single non-recursive Horn rule. A conjunctive
query (view) relation is the relation defined by a conjunctive query (view).
Given a query q, a query q is called a rewriting of q in terms of a set  of
view relations if q and q are equivalent and q contains only literals of , i.e.,
q is defined in terms of . If a query relation qv is defined in terms of a set
, and if Tv is a set of definitions of view relations in , an expansion qnv of
qv, in terms of Tv, is a query obtained by replacing, in the body of qv, every
occurrence of a view literal vi (vie ) by its body in Tv, with suitable variable
renamings.
In this paper we use the notion of a substitution, and in this respect generally
follow the terminology of [24]. The substitutions we consider are of the form
 v - t, ... , v - t ) where all vi's are distinct variables, and each ti is
either a variable from among v, v2, ... , v or a constant.
A containment mapping [9] from a query q to a query q2 is a mapping from
the variables of q into the variables of q, such that every literal in the body of
q is mapped to a literal in q, and the head of q is mapped into the head of
q. If both q and q are conjunctive and neither contains built-in predicates, the
existence of a containment mapping from q to q2 is a necessary and sufficient
condition for q to contain q; this result is called the containment mapping
theorem.

A conjunctive query q' is a minimal equivalent to a conjunctive query q if it
has as few subgoals as any query equivalent to q [34]. Both minimization and
equivalence of conjunctive queries are shown to be NP-complete in [4, 9].

3 Concept of Database Reformulation

This section gives a definition and a formal specification of the database reformulation
problem for the general case where rules are not necessarily conjunctive.
Database reformulation is the process of rewriting the data and rules of a
deductive database in a functionally equivalent manner; we use the word "reformulation
" both for the process and for its output. We focus on the problem
of automatically reformulating a database in a way that reduces the processing
time for a prespecified set of queries while satisfying strong storage space constraints.
The prespecified set of queries is the set of elementary queries in the
input query language; see the Introduction for details.
Let us describe the input and the output of the database reformulation pro-
cess. Consider a set 7 ) of relation names. Let $ consist of schemas for some rela-

tion names in 7); $ is the input database schema. Let T$ be a set of definitions,
in terms of $, for some relations whose names are in 7); T$ is the set of rules in
the input. Let Q be a set of names of all elementary query relations in the input,
such that Q C_ 7 ) and that T$ contains definitions of all relations in Q.
Now, let  consist of schemas for some relation names in 7);  is the output
database schema, i.e., the set of schemes of new stored relations which are materialized
in the process of database reformulation. Finally, let Tv be a set of
views defined in terms of ; Tv is the set of rules in the output.

Definition 1. For a given triple ($, 7s, Q), a triple (, 7v, Q) is a reformulation
of ($,7$, Q) if for each query relation in Q with a definition q$ in 7$, 7v
contains a rewriting of q$.

Let Ds be an arbitrary database with schema $; let Dv be a database that
consists of the tables for all and only those (materialized, starting from Ds)
view relations in )2 that are used to define Q. For a fixed database schema ,
and for a fixed set of definitions of view relations in )2 in terms of ,, consider all
possible databases Ds and all corresponding databases Dr, with sizes (in bytes)
ID$1 and IDyl respectively.

Definition 2. A reformulation (, 7v, Q)of an input ($, 7$, Q)is linearly
bounded with parameter t, where t is a positive constant, if for all pairs (Ds, Dr)
the storage space IDyl taken up by Dv is no more than linear in both ID$1and t:

IDvl t  IDl. (2)

One example of a linear bound is the no-growth bound; there, t = 1. The
no-growth bound may be too restrictive for some applications.

4 Views with Bounded Definition Length

Our objective is to automate the database reformulation process for as general
query languages as possible; in other words, we strive to design reformulation
algorithms. To that end, it is first necessary to understand, for each class of
query languages, whether the potentially infinite, for each input, search space
of reformulations can be transformed in such a way that it becomes finite but
still contains valuable reformulations. One way of making the search space of
reformulations more tractable is to restrict the number of view relations that

are used to generate rewritings of the input queries.
In this paper we focus on reformulating, in terms of conjunctive views, deductive
databases where all rules are conjunctive; in the remainder of the paper
we will denote conjunctive queries (views) simply by "queries" ("views"). In
the conjunctive case, one might be able to restrict the number of views under
consideration by setting an upper bound on the number of subgoals in views.
Suppose we could show that in any "good" rewriting of an arbitrary conjunctive
input, all participating view relations can be defined by conjunctions of up to a
fixed number of subgoals. If this hypothesis were true, the problem of finding all
"good" reformulations of the given input would be reduced to the clearly feasible
problem of enumerating and combining all "short" views, thereby yielding an
enumeration procedure. Notice that we do not use the number of subgoals as a
cost measure for executing the query.
Consider a database schema $, a set  of view relations, and a set Tv of
definitions of relations in  in terms of $. Consider a query q$ in terms of $ and
a query qv in terms of , with the corresponding expansion qnv in terms of Tv.
For each view literal vi in the body of qv let us denote by ri the body of vi in

qnv. Suppose there is a containment mapping from qnv to q$; such a mapping is
called noniterleaving if no two ris in qnv map into the same subset of the body
of qs, and if the mapping preserves head variables of all the views involved.

Theorem I (Restricted: New Definitions with Fewer Subgoals). If q$
is minimal, q$ and qv are equivalent, and there is a noninterleaving containment
mapping from qnv to q$, then there exists a set 7g of (alternative) definitions
of the view relations in , such that each definition in 7g has no more subgoals
than q$.

Informally, the theorem states that for each view used to define qv, there
exists a "short" definition of the view that has no more subgoals than the original
query qs; thus, in the setting of the theorem, one can apply the enumeration
procedure described above to generate all views that can be used in any rewriting
of the input query.
Unfortunately, the result stated in Theorem i holds in an extremely limited
setting and thus is not very useful. It would be desirable to make a similar claim
for a more general case. Suppose we could show that if the queries q8 and qv
are equivalent then there exists a set 7g of (alternative) definitions of the view
relations in , such that each definition in 7g has no more subgoals than qs.
Notice that this formulation is similar to that of Theorem 1, except that we no
longer require that q8 be minimal or that there exist a noninterleaving mapping
from qnv to
This conjecture, however desirable, is not true: removing the noninterleaving
mapping requirement invalidates the claim. Consider a counterexample.

Example 1. Consider a database schema $ that consists of one binary relation
schema s, and consider a query qs:

qs(X, Y) :-s(X, ), s(, X); (3)

for a graphic depiction of qs, see Figure 1.
Consider a set )2 of two view relations v and v2 with the following definitions:

v(X, Y) :-s(X, Y); (4)

v2(, x) :-s(, x), s(X, s(Z, ..., zk); (5)

let these definitions of v and of v= form a set Tv; for a graphic depiction of the
views see Figure 2.

Fig. 1. Graphic depiction of the query for Example 1.

Vl

x Y

Z Z_ 
#

Fig. 2. Graphic depiction of the views for Example 1.

It is easy to show that the query

qv(X, Y) :-vx(X, Y), v(Y, X) (6)

is equivalent to q$. At the same time, it is not possible to construct a nonin-
terleaving containment mapping from qnv (the expansion of qvin terms of ) to

q$.
The number of subgoals of v2 can be made arbitrarily large by varying the
parameter k; it is clear that neither v nor v2 is redundant in qv, and therefore
no upper bound -- one that could be related to the length of q8- can be
established on the number of subgoals in the definition of v2.

This example allows us to formulate

Theorem 2 (General: New Definitions with Fewer Subgoals). It is
not true for all conjunctive queries q$ and all their rewritings qv that each view
used in qv has an alternative (equivalent) definition which has no more subgoals
than q$.

The failure of the conjecture formulated above makes one wonder whether it
is possible at all to generalize the claim of Theorem 1. Fortunately, the answer
is yes: a much more general result is described in the next section.

5 Shorter Views Contained in Longer Views

The main result of the previous section is that it is not possible to rewrite views
in an arbitrary formulation of a query, in such a way that each view has no more
subgoals than the query itself. However, in this section we show that it is possible
to "reformulate" the conjunctive views in an arbitrary rewriting of an arbitrary
conjunctive query. This reformulation process outputs views that, although not
necessarily equivalent to the original views, constitute an equivalent rewriting
of the query and, at the same time, have each no more subgoals than the query
itself. Moreover, we show that if, in addition, the input query is linearly bounded
with some parameter t (see Definition 2 in Section 3) then the reformulated query
is also linearly bounded with the same parameter t.

Example 2. Recall $, qs, 12 = { v, v2 }, Tgv, and qv introduced in Example 1;
we choose k = 2 for v2 which will thus be defined as follows:

v(Y, X) :-s(Y, X), s(X, Z), s(Z, Z). (7)

We define a new view relation v[ (v[) by applying a substitution to the body
of vx (v). For v[, consider the trivial substitution applied to the only subgoal
s(X, Y) ofv:
v(x, :-s(X, (8)
For v[ , consider a substitution { X - X, Y - Y, Z - Y, Z - X }
applied to the body of v2:

(, X) :-s(, X), s(X, ). (9)

Notice that v is not equivalent to v.
Now the query qv':

qv,(X, Y) :-v[(X, Y), v(Y, X) (10)

is equivalent to both qs and qv, since the expansion qnv, Of qv' is isomorphic to
the query qs (after removing duplicate subgoals). It is easy to see that when
{ v[, v[ } is the schema of the reformulated database (i.e., when both v[ and
v[ are materialized), the query of interest is computed faster than in databases
with schema $, as well as in databases with schema 12.

Table 1. Stored relation s and view relations v2 and v 
2 in Example 2.

a b

b a

c d

d c

b c
V2

a b

b a

c d

d c

b c
a b

b a

c d

d c

Consider a database instance D with schema $. Suppose the data in the table
for s, in that database instance D, is as in Table 1. Table i also shows the result
of materializing view relations v and vl (both v and vl are the same as s).
Notice that the size of the table for each vi is no larger than the size of the table
for its corresponding vi; this is also true for any other database instance with
schema $, as will be explained shortly. Thus, if a reformulation that uses the
rewriting qv is linearly bounded with some parameter t, then the reformulation
that uses the rewriting qv' is also guaranteed to be linearly bounded with the
same parameter t.
Notice that vi is not needed to compute qv', and thus the reformulation
that contains only v[, in addition to improved computation time, satisfies the
no-growth bound compared to the original (input) database schema.

In formulating our main result and the reformulation algorithm, we will use
the notion of a unifiee of a query:

Definition 3. Given a query q, a unifiee of q is a query q such that the body
of q is the result of applying some substitution to some subset of the body of q.

Consider, as usual, a database schema $ and a set  of view relations defined
in terms of $. Consider a query q8 defined in terms of $ and a query qv defined
in terms of . In this setting, the following main result holds.

Theorem 3 (Main Result). If q$ and qv are equivalent then, for each view
literal vi in qv, there is a (not necessarily equivalent) view relation vi which is
a unifiee of q$, and a query qv, obtained by replacing, in qv, each vi with its
corresponding vi, is equivalent to qv ;
in addition, if both vi and v are materialized in any database with schema 3,
then the size of the table for v is no larger than the size of the table for its vi.

Informally, the theorem states that for any rewriting qv of a query q$ we
can always find an alternative rewriting qv' with two special properties. First,
qv' is composed entirely of views with no more subgoals each than in q$. The
second property is that in qv', each v is contained in its corresponding vi (this
becomes clear from the proof of the Theorem -- see the appendix); therefore,
when the view relations in both rewritings are materialized, the tables for the
view relations that define qv' always "fit in" the space required to store the
tables for the view relations that define qv.
Consider again Example 2. By the main result, the table for each v is no
larger than the table for its corresponding vi in any database instance with
schema $.

Notice that the theorem is true independently of whether the input query q$
is minimized.

6 A Database Reformulation Algorithm

In this section we describe a database reformulation algorithm for deductive
databases where all rules are conjunctive. The algorithm is based on the result
described in the previous section: for each space-satisfactory rewriting of a
given query there exists another space-satisfactory rewriting, defined in terms
of "short" views only, i.e., of those views that have no more subgoals than the
query itself. Thus, the idea of the algorithm is, for each input query, to examine
all "short" views (obviously there is only a finite number of them) and to con-
struct rewritings of the query by combining such views. The successful rewritings
then undergo a storage space check to decide whether they belong to a spacesatisfactory
reformulation. If such rewritings cannot be found by the algorithm,
then, by our results, no rewritings of the query exist.
The reformulation algorithm takes as input a database schema $, a set T$
of view definitions in terms of $, a set Q of elementary query relations with definitions
in T$, and the value of parameter t for the linear bound. For each query

relation in Q with definition q$ in 7$, the algorithm generates a set of rewrit-
ings of q8 by, first, producing all unifiees of q8 (see Definition 3 in Section 5) as
views and combining the views in conjunctive definitions in all possible ways,
and, second, by testing the resulting conjunctions for equivalence with qs. After
all such rewritings of all input queries have been built, the algorithm generates
candidate reformulations as all combinations where each candidate reformula-

tion contains one rewriting per input query. Only those candidate reformulation
that are linearly bounded for the given value of t, belong in the output of the
algorithm.

Theorem 4. For a given input ($, 7$, Q) and for a given positive number t, the
reformulation algorithm outputs only those reformulations (,7v, Q)that are
linearly bounded with parameter t, and their 7 v contain only definitions with no
more subgoals in each than the number of subgoals in the longest query in 7$.

The longest query in 8 is the query with the most subgoals. This theorem
is true by construction of the algorithm.

Theorem 5. If, for some input and for some positive number t, the output of
the reformulation algorithm is empty then no reformulation of the input in terms
of conjunctive views is linearly bounded with parameter t.

This result follows immediately from Theorem 3 in Section 5.
The reformulation algorithm presented in this section is proof that the database
reformulation problem is decidable for conjunctive databases. Notice that
the algorithm is, in all probability, too expensive to be used to actually generate
reformulations. To see why, it suffices to notice that one of the steps of the algorithm
involves testing pairs of conjunctive queries for equivalence, i.e., solving a
known NP-complete problem.

7 Related Work

Database schema evolution is an integral part of database design, data model
translation, schema (de)composition, and multidatabase integration; fundamen-
tal to these problems is the notion of equivalence between database schemata.
The first definition of schema equivalence was proposed in [11]; schema equivalence
was studied in [6, 8, 29]. Later, relative information capacity was introduced
in [19] as a fundamental theoretical concept which encompasses schema
equivalence and dominance. Other work on relative information capacity includes
[5, 25, 26]. Tutorial [18] surveys a number of frameworks -- relative information
capacity among others -- for dealing with the issue of semantic heterogeneity
arising in database integration.
In practical database systems, database design frequently uses normalization,
first introduced in [10] and described in detail in [33]. Papers [7, 20] survey
methods and issues in multidatabase integration.
Query transformation is another aspect of database transformation tasks;
query rewriting methods complement schema transformation methods in that

they are applied to databases that are already operational. Query rewriting is
important for query optimization, especially in deductive databases [27] where
queries can be complex and the amount of data accessed can be overwhelm-
ing. [28] is a survey on implementation techniques and implemented projects in
deductive databases.

There is an extensive body of work on theoretical aspects of query rewriting.
For an overview of query rewriting methods in datalog and its extensions see [2,
33, 34]. In addition, applications such as data warehousing and multidatabase
integration have promoted the study of views in databases. The paper [32] is
a survey of containment and rewriting/optimization of queries using views. [1]
discusses the complexity of answering queries using materialized views and conrains
references to major results in the areas of query containment and view
materialization. Papers [16,17, 21, 30] describe approaches to view materialization.
[3, 12, 13, 23] treat the problem of using available materialized views for
query evaluation. Nearly all results described in the literature concern rewriting
of single queries; notice that the database reformulation approach we propose
involves simultaneous rewriting of sets of queries.
Transformations of database schemas and queries can be considered together
as reformulations of logical theories. [31] provides a theoretical foundation for
theory reformulations, and [15, 22] contain work on general transformations of
logical theories.
Descriptions of basic methods used in this paper can be found, e.g., in [14].

8 Conclusions and Future Work

The first contribution of this paper is that it introduces the notion of database
reformulation which is the process of rewriting the data and rules of a deductive
database in a functionally equivalent manner. We focus on the problem of automatically
reformulating a database in a way that reduces query processing time
while satisfying strong storage space constraints.
The second contribution of this paper is a proof that it is decidable to reformulate,
using conjunctive views, deductive databases where all rules are conjunctive,
in the presence of strong storage space constraints. We have shown that
for any space-satisfactory rewriting of a conjunctive query there is a rewriting
which is also space-satisfactory and is composed entirely of views that have no
more subgoals than the original query. We have described a reformulation algorithm
which returns space-satisfactory query rewritings composed of views that
have no more subgoals than the longest query in the input. Notice that all the
results automatically hold for select-project-join (SPJ) queries in SQL.
There are several directions of future research in database reformulation. A

pressing research problem is the issue of update complexity in the reformulated
database. On a more long-term scale, it is desirable to develop criteria for comparing
different outputs produced by the reformulation algorithm, as well as
criteria for choosing the optimal output among the potentially multiple answers.
Another challenge is to examine possible interactions between views chosen in

support of different queries and to understand how such interactions might influence
the quality of a reformulation, in particular its storage space requirements.
We also plan to study the complexity of the reformulation problem. Finally, our
long-term objective is to extend research in database reformulation to deductive
databases whose queries and views are formulated in progressively more general
standard query languages, for example include disjunction or negation, as well
as to databases with integrity constraints.

Acknowledgements

The authors would like to thank Jeffrey D. Ullman for pointing out the problem
of query folding in the first place, which ultimately led to the database reformulation
problem. Also we would like to thank Vasilis Vassalos for the reading and
extensive discussions of the paper.

References

1. Serge Abiteboul and Oliver Duschka. Complexity of answering queries using materialized
views. In PODS-98, pages 254-263.
2. Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of Databases.
Addison-Wesley, Reading, Mass., 1995.
3. F.N. Afrati, M. Gergatsoulis, and T.G. Kavalieros. Answering queries using materialized
views with disjunctions. In ICDT-99, pages 435-452.
4. A.V. Aho, Y. Sagiv, and J.D. Ullman. Equivalences among relational expressions.
SIAM J. Cornput., 8(2):218-246, 1979.
5. J. Albert, Y. Ioannidis, and R. Ramakrishnan. Conjunctive query equivalence of
keyed relational schemas. In PODS-97, pages 44-50.
6. P. Atzeni, G. Ausiello, C. Batini, and M. Moscarini. Inclusion and equivalence
between relational database schemata. Theoretical Computer Science, 19:267-285,
1982.

7. C. Batini, M. Lenzerini, and S.B. Navathe. A comparative analysis of methodologies
for database schema integration. ACM Computing Surveys, 18(4):323-364,
1986.

8. C. Beeri, A.O. Mendelzon, Y. Sagiv, and J.D. Ullman. Equivalence of relational
database schemes. SIAM J. Cornput., 10(2):352-370, 1981.
9. Ashok K. Chandra and Philip M. Merlin. Optimal implementation of conjunctive
queries in relational data bases. In STOC-77, pages 77-90.
10. E.F. Codd. A relational model of data for large shared data banks. Comm. ACM,
13(6):377-387, June 1970.
11. E.F. Codd. Further normalization of the data base relational model. In R. Rustin,
editor, Database Systems, pages 33-64. Prentice Hall Inc., Englewood Cliffs, N J,
1972.

12. Oliver M. Duschka and Michael R. Genesereth. Answering recursive queries using
views. In PODS-97, pages 109-116.
13. Oliver M. Duschka and Michael R. Genesereth. Query planning with disjunctive
sources. In AAAI-98 Workshop on AI and Information Integration, 1997.

14. Herbert B. Enderton. A Mathematical Introduction to Logic. Academic Press, New
York, 1972.
15. Fausto Giunchiglia and Toby Walsh. A theory of abstraction. ArtiJcial Intelligence,
57(2-3):323-389, 1992.
16. Himanshu Gupta. Selection of views to materialize in a data warehouse. In ICDT97,
pages 98-112.
17. Himanshu Gupta and lnderpal Singh Mumick. Selection of views to materialize
under a maintenance cost constraint. In ICDT-99, pages 453470.
18. Richard Hull. Managing semantic heterogeneity in databases: a theoretical perspective.
In PODS-97, pages 51-61.
19. Richard Hull. Relative information capacity of simple relational database
schemata. SIAM J. Cornput., 15(3):856-886, August 1986.
20. Won Kim, editor. Modern Database Systems. ACM Press, New York, New York,
1995.

21. Yannis Kotidis and Nick Roussopoulos. Dynamat: a dynamic view management
system for data warehouses. In SIGMOD-99.
22. Alon Y. Levy and P. Pandurang Nayak. A semantic theory of abstractions. In
IJCAI-95, pages 196-203.
23. A.Y. Levy, A.O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using
views. In PODS-95, pages 95-104.
24. John Wylie Lloyd. Foundations of Logic Programming. Springer-Verlag, 1987.
25. R.J. Miller, Y.E. 1oannidis, and R. Ramakrishnan. The use of information capacity
in schema integration and translation. In VLDB-93, pages 120-133.
26. R.J. Miller, Y.E. 1oannidis, and R. Ramakrishnan. Schema equivalence in hetero-
geneous systems: bridging theory and practice. Information Systems, 19(1):3-31,
1994.
27. Jack Minker. Logic and databases: a 20 year retrospective. In D. Pedreschi and
C. Zaniolo, editors, Logic in Databases, pages 3-57. Springer, 1996. (Proceedings
of the LID'96 international workshop).
28. Raghu Ramakrishnan and Jeffrey D. Ullman. A survey of deductive database
systems. J. Logic Progr., 23(2):125-149, May 1995.
29. J. Rissanen. On equivalences of database schemes. In PODS-82, pages 23-26.
30. K.A. Ross, D. Srivastava, and S. Sudarshan. Materialized view maintenance and
integrity constraint checking: trading space for time. In SIGMOD-96, pages 447-
458.
31. Devika Subramanian. A theory of justiJed reformulations. PhD thesis, Stanford
University, 1989.
32. Jeffrey D. Ullman. Information integration using logical views. In ICDT-97, pages
1940.
33. Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, volume I.
Computer Science Press, New York, 1988.
34. Jeffrey D. Ullman. Principles of Database and Knowledge-Base Systems, volume II.
Computer Science Press, New York, 1989.

A Some Examples and Proofs

A.1 A Recursive Reformulation Example

Here is an example which illustrates the power of database reformulation for a
deductive database with recursive rules.

Example 3. Consider the "founding fathers" problem invented by Devika Sub-
ramanian [31]. There is just one stored relation in this case: the father relation,
where father(X, Y) means that X is the father of Y. In addition, there are two
views defined as shown below. The (male) ancestor relation holds of two people
X and Y if X is the father of Y or if there is a third person V who is a child
of X and an ancestor of Y. The samefamily relation holds of two people if they
have a common (male) ancestor.

ancestor(X, Y) :-father(X, Y); (11)

ancestor(X, Y) :-father(X, V), ancestor(V, Y); (12)

samefamily(Y, Z) :-ancestor(X, Y), ancestor(X, Z). (13)

With the definitions above, determining whether two people are in the same
family involves computing all of their ancestors and intersecting the two sets, a
fairly expensive operation.
To improve the query processing time, we could materialize the samefamily
relation. However, the table for that relation, in a database with schema
 father , would take up storage space that is, in the worst case, quadratic in
the number of people in the database. For large databases, such materialization
is impractical.
On the other hand, it is possible to get the same performance gain without
any space growth whatsoever. The definitions below show how. We say that a
person X is the founding father (founder) of the family of person Y if X is an
ancestor of Y and there is no person W such that W is the father of X (9 stands
for negation). The samefamily relation can then be defined in terms of this new
founder relation: two people are in the same family if and only if they have the
same founding father.

hasancestors(X) :- father(W, X); (14)

founder(X, Y) :- ancestor(X, Y), - hasancestors(X); (15)

samefamily(Y, Z) :-founder(X, Y), founder(X, Z). (16)

Using the definitions above, we can materialize the founder relation; furthermore,
if we are interested only in the samefamily query we can also dematerialize
the father relation. With such a reformulation we get the same performance improvement
for our target query as if we did materialize the samefamily relation
itself. However, the amount of space required for the reformulated database is
the same as that required for the input database; in other words, this reformulation
satisfies the no-growth bound. This reformulation is so good because we
have succeeded in defining an equivalence relation (samefamily) by choosing a
single representative (founder) from each equivalence class.

A.2 An Example of Conjunctive Reformulation

Below is a very simple example of database reformulation for a conjunctive
deductive database.

Example 6. Suppose in some database there are two stored relations: parent and
gender (either male or female). Suppose the only elementary query of interest is
grandparent- imagine that we only plan to ask queries about grandfathers or
grandmothers. The grandparent relation can be defined in terms of the parent
relation in an obvious way as follows:

grandparent (X, Y) : - parent (X, Z), parent (Z, Y). (17)

Once we materialize the grandparent relation, we can dematerialize the parent
relation since we are not interested in querying on that relation anyway. As a
result, the new database contains only relevant information (relations gender
and grandparent), the processing costs for the queries of interest are minimized,
and the reformulation is linearly bounded with t = 2.

A.3 Proof of Theorem 3 in Section 5

To prove the theorem, we need the notion of a self-map.

Definition 4. A containment mapping on a conjunctive query is called a selfmap
if it maps the query into itself.

Consider an arbitrary conjunctive query q; consider another query q such
that there exist two containment mappings M : q -- q and M2 : q --
q, where both M and M2 preserve head variables of the queries. Consider a
composition M of the two mappings; M: q -- q. Notice that M is a self-map
by construction.
Then it is easy to show that each such q has the following property:

Lemma 1. q is equivalent to its image q under M:

q -- q'. (18)

We proceed to prove the main result which is formulated as Theorem 3 in
Section 5:

Proof (Theorem 3). Let Tv be the set of definitions of view relations in )2 in
terms of $, and let qnv be the (unique and equivalent) expansion of qv in terms
of Tv. By transitivity of equivalence, qnv is equivalent to
From the containment mapping theorem, the equivalence of qnv and of
means that there is a containment mapping Mn8: qnv - q8 and there is a
containment mapping Msn: q8 - qnv, where both mappings preserve head
variables of the queries.

Consider a composition M of Mn$ with Msn:

M: qnv -- qnv. (19)

By construction, M is a self-map that satisfies the conditions of Lemma 1; there-
fore, by that lemma, the image M(qn)of qn under M is equivalent to qn:

-- M(qnv). (20)

Consider an arbitrary view literal vi in qv and the body ri of the definition
of vi in qnv. Consider the image ri of ri under M; ri is a subset of the body
of M(qnv). Notice that by construction of M, ri defines some (at least one
-- depending on the choice of head variables) unifiee of q8 (see Definition 3 in
Section 5).
Let us take ri as the body of a definition of some view relation v, such that
the head variables of vi are images, under M, of the head variables of vi (in ri).
After we have built a new relation v based on each vi used in the definition
qv, M(qnv ) can be viewed as an expansion of some conjunctive query qv', where
 = J (vi). It is easy to show that qv' exists and is equivalent to M(qnv);
therefore, from transitivity of equivalence, qv' is equivalent to both qv and to
Consider an arbitary view literal vi in the body of qv and its counterpart
vi in the body of qv,. Notice that vi is, in fact, constructed as a unifiee of vi,
not of q8, although v is also a unifiee of q8 by properties of the mapping M.
Also, by the containment mapping theorem, vi is contained in vi; if M does not
preserve head variables of some vi, we assume that vi has the same number of
head variables as vi, only some head variables of vi may be unified with each
other. Thus we have that in any database instance with schema $, if both
and v are materialized, then the table for vi takes up at most as much storage
space as the table for its corresponding vi.