﻿Logical Methods in Computer Science
Vol. 3 (2:1) 2007, pp. 1–1–20
www.lmcs-online.org
Submitted May 30, 2006
Published Apr. 18, 2007
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION
PROCEDURES ∗
SHUVENDU K. LAHIRI a , THOMAS BALL a , AND BYRON COOK b
a Microsoft Research, Redmond, WA 98052
e-mail address: {shuvendu,tball}@microsoft.com
b Microsoft Research, Cambridge, United Kingdom
e-mail address: bycook@microsoft.com
Abstract. We present a new approach for performing predicate abstraction based on
symbolic decision procedures. Intuitively, a symbolic decision procedure for a theory takes
a set of predicates in the theory and symbolically executes a decision procedure on all
the subsets over the set of predicates. The result of the symbolic decision procedure is a
shared expression (represented by a directed acyclic graph) that implicitly represents the
answer to a predicate abstraction query.
We present symbolic decision procedures for the logic of Equality and Uninterpreted
Functions (EUF) and Difference logic (DIFF) and show that these procedures run in
pseudo-polynomial (rather than exponential) time. We then provide a method to construct
symbolic decision procedures for simple mixed theories (including the two theories mentioned
above) using an extension of the Nelson-Oppen combination method. We present
preliminary evaluation of our Procedure on predicate abstraction benchmarks from device
driver verification in SLAM.
1. Introduction
Predicate abstraction is a technique for automatically creating finite abstract models
of finite and infinite state systems [GS97]. The method has been widely used in abstracting
finite-state models of programs in SLAM [BMMR01] and numerous other software verification
projects [HJMS02, CCG + 04]. It has also been used for synthesizing loop invariants
[FQ02] and verifying distributed protocols [DDP99, LBC03].
The fundamental operation in predicate abstraction can be summarized as follows:
Given a set of predicates P describing some set of properties of the system state, and a
formula e, compute the weakest Boolean formula FP(e) over the predicates P that implies
2000 ACM Subject Classification: F.3.1, F.4.1.
Key words and phrases: predicate abstraction, decision procedures, formal verification, symbolic
algorithms.
∗ An earlier version of the paper appeared in the proceedings of the Computer Aided Verification conference,
2005.
LOGICAL METHODS
ÐIN COMPUTER SCIENCE DOI:10.2168/LMCS-3 (2:1) 2007
c○ S. K. Lahiri, T. Ball, and B. Cook
○CC Creative Commons
2 S. K. LAHIRI, T. BALL, AND B. COOK
e 1 . Most implementations of predicate abstraction [GS97, BMMR01] construct FP(e) by
collecting the set of cubes (a conjunction of the predicates or their negations) over P that
imply e. The implication is checked using a first-order theorem prover. This method may
require making a very large (2 |P | in the worst case) number of calls to a theorem prover
and can be expensive.
We propose a new way to perform predicate abstraction based on symbolic decision
procedures. A symbolic decision procedure for a theory T (SDPT) takes sets of predicates
G and E and symbolically executes a decision procedure for T on G ′ ∪ {¬e | e ∈ E} 2 , for
all the subsets G ′ of G. The output of SDPT(G,E) is a shared expression (an expression
where common subexpressions can be shared) representing those subsets G ′ ⊆ G, for which
G ′ ∪ {¬e | e ∈ E} is unsatisfiable. We show that such a procedure can be used to compute
FP(e) for performing predicate abstraction.
We present symbolic decision procedures for the logic of Equality and Uninterpreted
Functions(EUF) and Difference logic (DIF) and show that these procedures run in polynomial
and pseudo-polynomial time respectively, and therefore produce compact shared
expressions. We provide a method to construct SDP for a combination of two simple
theories T1 ∪ T2 (including EUF + DIF), by using an extension of the Nelson-Oppen combination
[NO80] method. We use Binary Decision Diagrams (BDDs) [Bry86] to construct
FP(e) from the shared representations efficiently in practice.
We present a preliminary evaluation of our procedure on predicate abstraction benchmarks
from device driver verification in SLAM, and show that our method outperforms
existing methods for doing predicate abstraction.
The rest of the paper is organized as follows: Section 1.1 describes related work in
predicate abstraction techniques. Section 2 describes the background concepts including
predicate abstraction. Section 3 describes symbolic decision procedures, and instantiates it
for two different theories (EUF and DIF). Section 4 describes a framework for modularly
combining the SDPs for two theories that satisfy certain requirements, using an extension
of the Nelson-Oppen combination method. Section 5 describes the implementation and the
experimental evaluation of our technique. Finally, we present the conclusions and future
work in Section 6.
1.1. Related Work. Several techniques have been suggested to improve the performance of
predicate abstraction. The techniques can be broadly classified into three categories: In the
first category, we classify methods that treat the decision procedures as a “black box”, and
attempt to minimize the number of decision procedure calls during predicate abstraction.
The second category consists of methods that use a quantifier elimination procedure to
perform predicate abstraction. Finally, there are techniques that do not compute the most
precise abstract directly; instead, they rely on counterexamples or proofs in the overall
verification process to refine the abstraction. In the following paragraphs, we describe these
techniques in more details.
The techniques that aim to reduce the number of calls to the theorem prover or decision
procedure are mostly based on enumerating cubes over P in an increasing order of their
size. Das et al. [DDP99] enumerates cubes over a tree, after fixing the order of predicates
1The dual of this problem, which is to compute the strongest Boolean formula GP(e) that is implied by
e, can be expressed as ¬FP(¬e).
2Throughout this paper, we interpret a set of expressions to be a conjunction over the expressions in the
set.
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 3
that appear in any path to the leaves. If a cube is found unsatisfiable, then all its subcubes
(represented by the subtree) are pruned off. This method may require 2 |P |+1 calls to
the theorem prover in the worst case. Saidi and Shankar [SS99] relaxes the order on the
predicates, and enumerate all possible cubes (3 |P | of them) over the predicates. Flanagan
and Qadeer [FQ02] provide an algorithm that searches over the 2 |P | clauses (disjunction
of cubes over the predicates or their negations) of size |P |, but attempts to greedily grow
the clause (by dropping literals) when such a clause is implied by the formula e. Their
technique requires |P |.2 |P | theorem prover calls in the worst case. Other techniques sacrifice
precision to gain efficiency, by only considering cubes of some fixed length [BMMR01]. All
these techniques may require an exponential number of theorem prover calls in the worst
case, and demonstrate worst case behavior in practice. However, more importantly, since
these queries are not incremental, the state of the prover has to be reset across each call,
precluding any learning across calls.
Alternately, predicate abstraction can be formulated as a quantifier elimination problem.
Lahiri et al. [LBC03] and Clarke et al. [CKSY04] perform predicate abstraction by
reducing the problem of computing FP(e) to Boolean quantifier elimination. The former
method first transforms a first-order quantifier elimination problem into Boolean quantifier
elimination by encoding first-order formulas into Boolean formulas; the latter assumes
all variables are propositional. The method in [LBC03] first converts the quantifier-free
first-order formula to a Boolean formula such that the translation preserves the set of satisfying
assignments of the Boolean variables in the original formula. Both these techniques
use incremental Boolean Satisfiability (SAT) techniques [CKSY04, McM02] to perform the
Boolean quantifier elimination. These techniques have the benefit that the large number of
calls to the theorem prover is avoided, and learning can be used to prune away the search
space in the SAT solver. However, the translation from a first-order formula to a Boolean
formula can result in a loss of structure (since the arithmetic operations are encoded as
bitwise operations), and make the translation inefficient. Namjoshi and Kurshan [NK00]
also proposed using quantifier elimination for first-order logic directly to perform predicate
abstraction — however many theories (such as the theory of Equality with Uninterpreted
Functions) do not admit quantifier elimination.
Most of the above approaches use decision procedures or SAT solvers as “black boxes”,
at best in an incremental fashion, to perform predicate abstraction. We believe that having
a customized procedure for predicate abstraction can help improve the efficiency of predicate
abstraction on large problems.
Finally, there are a set of techniques to avoid computing the most precise abstraction
upfront, and refine it only based on failed proof attempts in the verification tool. Das
and Dill [DD01] and subsequently Ball et al. [BCDR04] use counterexamples to refine the
predicate abstraction incrementally. Jhala and McMillan [JM05] use interpolants to refine
the predicate abstraction. It is not clear if it is always preferable to compute the abstraction
incrementally. But, we have observed that the refinement loop can often becomes the main
bottleneck in these techniques (for example in SLAM), and limits the scalability of the
overall system [BCDR04].
2. Setup
Figure 1 defines the syntax of a quantifier-free fragment of first-order logic. An expression
in the logic can either be a term or a formula. A term can either be a variable or an
4 S. K. LAHIRI, T. BALL, AND B. COOK
term ::= variable | function-symbol(term,... ,term)
atomic-formula ::= term = term | predicate-symbol(term,... ,term)
formula ::= true |false | atomic-formula
| formula ∧ formula | formula ∨ formula | ¬formula
Figure 1: Syntax of a quantifier-free fragment of first-order logic.
application of a function symbol to a list of terms. A formula can be the constants true
or false or an atomic formula or Boolean combination of other formulas. Atomic formulas
can be formed by an equality between terms or by an application of a predicate symbol to
a list of terms.
The function and predicate symbols can either be uninterpreted or can be defined by a
particular theory. For instance, the theory of integer linear arithmetic defines the functionsymbol
“+” to be the addition function over integers and “<” to be the comparison predicate
over integers. If an expression involves function or predicate symbols from multiple theories,
then it is said to be an expression over mixed theories.
A formula F is said to be satisfiable if it is possible to assign values to the various
symbols in the formula from the domains associated with the theories to make the formula
true. A formula is valid if ¬F is not satisfiable (or unsatisfiable). We say a formula A
implies a formula B (A ⇒ B) if and only if (¬A) ∨ B is valid.
We define a shared expression to be a Directed Acyclic Graph (DAG) representation
of an expression where common subexpressions can be shared, by using names to refer to
common subexpressions. For example, the intermediate variable t refers to the expression
e1 in the shared expression “let t = e1 in (e2 ∧ t) ∨ (e3 ∧ ¬t)”.
2.1. Predicate Abstraction. A predicate is an atomic formula or its negation 3 . If G is a
set of predicates, then we define � G . = {¬g | g ∈ G}, to be the set containing the negations of
the predicates in G. We use the term “predicate” in a general sense to refer to any atomic
formula or its negation and should not be confused to only mean the set of predicates that
are used in predicate abstraction.
Definition 2.1. For a set of predicates P, a literal li over P is either a predicate pi or
¬pi, where pi ∈ P. A cube c over P is a conjunction of literals. A clause cl over P is a
disjunction of literals. Finally, a minterm over P is a cube with |P | literals, and exactly
one of pi or ¬pi is present in the cube.
Given a set of predicates P . = {p1,...,pn} and a formula e, the main operation in
predicate abstraction involves constructing the weakest Boolean formula FP(e) over P such
that FP(e) ⇒ e. The expression FP(e) can be expressed as the set of all the minterms over
P that imply e:
FP(e) = � {c | c is a minterm over P and c implies e} (2.1)
Proposition 2.2. For a set of predicates P and a formula e, the following statements are
true:
3 We always use the term “predicate symbol” (and not “predicate”) to refer to symbols like “<”.
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 5
X = Y
Y = X
X = Y Y = Z
X = Z
X = Y X �= Y
⊥
X1 = Y1 · · · Xn = Yn
f(X1, · · · ,Xn) = f(Y1, · · · ,Yn)
Figure 2: Inference rules for theory of equality and uninterpreted functions.
(1) FP(¬e) ⇒ ¬FP(e),
(2) FP(e1 ∧ e2) ⇔ FP(e1) ∧ FP(e2), and
(3) FP(e1) ∨ FP(e2) ⇒ FP(e1 ∨ e2)
Proof. These properties follow very easily from the definition of FP.
We know that FP(e) ⇒ e, by the definition of FP(e). By contrapositive rule, ¬e ⇒
¬FP(e). But FP(¬e) ⇒ ¬e. Therefore, FP(¬e) ⇒ ¬FP(e).
To prove the second equation, we prove that (i) FP(e1 ∧ e2) ⇒ (FP(e1) ∧ FP(e2)), and
(ii) (FP(e1) ∧ FP(e2)) ⇒ FP(e1 ∧ e2). Since e1 ∧ e2 ⇒ ei (for i ∈ {1,2}), FP(e1 ∧ e2) ⇒
FP(ei). Therefore FP(e1 ∧ e2) ⇒ (FP (e1) ∧ FP(e2)). On the other hand, FP(e1) ⇒ e1 and
FP(e2) ⇒ e2, FP(e1) ∧ FP(e2) ⇒ e1 ∧ e2. Since FP(e1 ∧ e2) is the weakest expression that
implies e1 ∧ e2, FP(e1) ∧ FP(e2) ⇒ FP(e1 ∧ e2).
To prove the third equation, note that FP(e1) ∨ FP(e2) ⇒ e1 ∨ e2 and FP(e1 ∨ e2) is
the weakest expression that implies e1 ∨ e2.
The operation FP(e) does not distribute over disjunctions. Consider the example where
P . = {x �= 5} and e . = x < 5 ∨ x > 5. In this case, FP(e) = x �= 5. However FP(x < 5) =
false and FP(x > 5) = false and thus (FP(x < 5) ∨ FP(x > 5)) is not the same as FP(e).
The above properties suggest that one can adopt a two-tier approach to compute FP(e)
for any formula e:
(1) Convert e into an equivalent Conjunctive Normal Form (CNF), which comprises of
a conjunction of clauses, i.e., e ≡ ( �
i cli).
.
(2) For each clause cli = (ei 1 ∨ ei 2 ... ∨ ei .
m), compute ri = FP(cli) and return FP(e) . �
=
i ri.
To obtain an equivalent CNF form, one cannot introduce auxiliary variables (to keep
the size of the resulting formula linear in the size of the input formula), as is typically
done during an equisatisfiable CNF translation. These auxiliary variables introduced have
to be existentially quantified out to obtain an equivalent formula. In our case, the CNF
representation of the formula can be exponentially large compared to the original formula.
However, we can use recent techniques to obtain the CNF form lazily, by a method proposed
by McMillan [McM02].
For the rest of hte paepr, we focus here on computing FP( �
ei∈E ei) when ei is a predicate.
Unless specified otherwise, we always use e to denote ( �
ei∈E ei), a disjunction of
predicates in the set E in the sequel.
6 S. K. LAHIRI, T. BALL, AND B. COOK
3. Symbolic Decision Procedures (SDP)
We now show how to perform predicate abstraction using symbolic decision procedures.
We start by describing a saturation-based decision procedure for a theory T and then use
it to describe the meaning of a symbolic decision procedure for the theory T. Finally, we
show how a symbolic decision procedure can yield a shared expression of FP(e) for predicate
abstraction.
A set of predicates G (over theory T) is unsatisfiable if the formula ( �
g∈G
g) is unsat-
isfiable. For a given theory T, the decision procedure for T takes a set of predicates G in
the theory and checks if G is unsatisfiable. A theory is defined by a set of inference rules.
An inference rule R is of the form:
A1 A2 ... An
A
which denotes that the predicate A can be derived from predicates A1,... ,An in one step.
Each theory has at least one inference rule for deriving contradiction (⊥). We also use
g : − g1,... ,gk to denote that the predicate g (or ⊥, where g = ⊥) can be derived from the
predicates g1,... ,gk using one of the inference rules in a single step. Figure 2 describes the
inference rules for the theory of Equality and Uninterpreted Functions.
3.1. Saturation based decision procedures. Consider a simple saturation-based procedure
DPT shown in Figure 3, that takes a set of predicates G as input and returns
satisfiable or unsatisfiable.
The algorithm maintains two sets: (i) W is the set of predicates derived from G up
to (and including) the current iteration of the loop in step (2); (ii) W ′ is the set of all
predicates derived before the current iteration. These sets are initialized in step (1). During
each iteration of step (2), if a new predicate g can be derived from a set of predicates
{g1,... ,gk} ⊆ W ′ , then g is added to W. The loop terminates after a bound
derivDepth T(G). In step (3), we check if any subset of facts in W can derive contradiction.
If such a subset exists, the algorithm returns unsatisfiable, otherwise it returns
satisfiable.
The parameter d . = derivDepth T(G) is a bound (that is determined solely by the set
G for the theory T) such that if the loop in step (2) is repeated for at least d steps, then
DPT(G) returns unsatisfiable if and only if G is unsatisfiable. If such a bound exists for
any set of predicates G in the theory, then DPT procedure implements a decision procedure
for T.
Definition 3.1. A theory T is called a bounded saturation theory, if the procedure DPT
described in Figure 3 implements a decision procedure for T.
In the rest of the paper, we only consider bounded saturation theories. Since there is
no ambiguity, we will drop the term “bounded” in the rest of the paper and refer to such
a theory as saturation theory. To show that a theory T is a saturation theory, it suffices
to consider a decision procedure algorithm for T (say AT) and show that DPT implements
AT. This can be shown by deriving a bound on derivDepth T(G) for any set G in the theory.
(R)
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 7
(1) Initialize W ← G. W ′ ← {}.
(2) For i = 1 to derivDepth T(G):
(a) Let W ′ ← W.
(b) For every fact g �∈ W ′ , if (g : − g1,... ,gk) and gm ∈ W ′ for all m ∈ [1,k]:
• W ← W ∪ {g}.
(3) If (⊥ : − g1,... ,gk) and gm ∈ W for all m ∈ [1,k]:
• return unsatisfiable
(4) else return satisfiable
Figure 3: DPT(G): A simple saturation-based procedure for theory T. We use m ∈ [i,j]
to denote i ≤ m ≤ j.
3.2. Symbolic Decision Procedure. For a (saturation) theory T, a symbolic decision
procedure for T (SDPT) takes sets of predicates G and E as inputs, and symbolically
simulates DPT on G ′ ∪ � E, for every subset G ′ ⊆ G. The output of SDPT(G,E) is a
symbolic expression representing those subsets G ′ ⊆ G, such that G ′ ∪ � E is unsatisfiable.
Thus with |G| = n, a single run of SDPT symbolically executes 2 n runs of DPT.
We introduce a set of Boolean variables BG . = {bg | g ∈ G}, one for each predicate
in G. An assignment σ : BG → {true,false} over BG uniquely represents a subset
G ′ . = {g | σ(bg) = true} of G.
Figure 4 presents the symbolic decision procedure for a theory T, which symbolically
executes the saturation based decision procedure DPT on all possible subsets of the input
component G. Just like the DPT algorithm, this procedure also has three main components:
initialization, saturation and contradiction detection. The algorithm also maintains sets W
and W ′ , as the DPT algorithm does.
Since SDP(G,E) has to execute DPT(G ′ ∪ � E) on all G ′ ⊆ G, the number of steps to
iterate the saturation loop equals the maximum derivDepth T(G ′ ∪ � E) for any G ′ ⊆ G. For
a set of predicates S, we define the bound maxDerivDepth T(S) as follows:
maxDerivDepth T(S) . = max {derivDepth T(S ′ ) | S ′ ⊆ S}
During the execution, the algorithm constructs a set of shared expressions with the
variables over BG as the leaves and temporary variables t[·] to name intermediate expressions.
We use t[(g,i)] to denote the expression for the predicate g after the iteration i of
the loop in step (2) of the algorithm. We use t[(g, ⊤)] to denote the top-most expression
for g in the shared expression. Below, we briefly describe each of the phases of SDPT:
: Initialization [Step (1)]. The set W is initialized to G ∪ � E and W ′ to {}. The leaves of
the shared expression symbolically encode each subset G ′ ∪ �E, for every G ′ ⊆ G. For each
g ∈ G, the leaf t[(g,0)] is set to bg. For any ei ∈ E, since ¬ei is present in all possible
subset G ′ ∪ � E, we replace the leaf for ¬ei with true.
: Saturation [Step (2)]. For each predicate g, S(g) is the set of derivations of g from
predicates in W ′ during any iteration. For any predicate g, we first add all the ways
to derive g until the previous steps by adding t[(g,i − 1)] to S(g). Every time g can be
derived from some set of facts g1,...,gk such that each gj is in W ′ , we add this derivation
to S(g) in Equation 3.1. At the end of the iteration i, t[(g,i)] and t[(g, ⊤)] are updated
with the set of derivations in S(g). The loop is executed maxDerivDepth T(G ∪ � E) times.
8 S. K. LAHIRI, T. BALL, AND B. COOK
(1) Initialization
(a) W ← G ∪ � E and W ′ ← {}.
(b) For each g ∈ G, t[(g,0)] ← bg.
(c) For each ei ∈ E, t[(¬ei,0)] ← true.
(2) For i = 1 to maxDerivDepthT(G ∪ � E) do: // Saturation
(a) W ′ ← W.
(b) Initialize S(g) = {}, for any predicate g.
(c) For every g ∈ W ′ , S(g) ← S(g) ∪ {t[(g,i − 1)]}.
(d) For every g, if (g : − g1,... ,gk) and gm ∈ W ′ for all m ∈ [1,k]:
(i) Update the set of derivations of g at this level:
⎛
S(g) ← S(g) ∪ { ⎝ �
⎞
t[(gm,i − 1)] ⎠} (3.1)
m∈[1,k]
(ii) W ← W ∪ {g}.
(e) For each g ∈ W: t[(g,i)] ← �
d∈S(g) d
(f) For each g ∈ W, t[(g, ⊤)] ← t[(g,i)]
(3) Check for contradiction:
(a) Initialize S(e) = {}.
(b) For every {g1,... ,gk} ⊆ W, if (⊥ : − g1,... ,gk) then
⎛
S(e) ← S(e) ∪ { ⎝ �
⎞
t[(gm, ⊤)] ⎠} (3.2)
m∈[1,k]
(c) Create the derivations for the goal e as t[e] ←
(4) Return the shared expression for t[e].
�� d∈S(e) d
�
Figure 4: Symbolic decision procedure SDPT(G,E) for theory T. The expression e stands
for �
ei∈E ei.
: Contradiction [Steps (3,4)]. We know that if G ′ ∪ � E is unsatisfiable, then G ′ implies e
(recall, e stands for �
ei∈E ei). Therefore, each derivation of ⊥ from predicates in W gives
a new derivation of e. The set S(e) collects these derivations and constructs the final
expression t[e], which is returned in step (4).
The output of the procedure is the shared expression t[e], where the leaves of the expression
are the variables in BG. The only operations in t[e] are conjunction and disjunction;
t[e] is thus a Boolean expression (or a Boolean circuit) over BG. The internal nodes in the
expression are shared and can be inputs to multiple nodes in the subsequent level. We now
define the evaluation of a (shared) Boolean expression inductively with respect to a subset
G ′ ⊆ G.
Definition 3.2. For any Boolean expression t[x] whose leaves are in set BG, and a set
G ′ ⊆ G, we define eval(t[x],G ′ ) as the recursive evaluation of t[x], after replacing each leaf
bg of t[x] with true if g ∈ G ′ and with false otherwise. The propositional connectives in
the expression (∧ and ∨) are interpreted using their standard meaning.
The following theorem explains the correctness of the symbolic decision procedure.
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 9
Theorem 3.3. If t[e] . = SDPT(G,E), then for any set of predicates G ′ ⊆ G, eval(t[e],G ′ ) =
true if and only if DPT(G ′ ∪ � E) returns unsatisfiable.
To prove Theorem 3.3, we first describe an intermediate lemma about SDPT. To
disambiguate between the data structures used in DPT and SDPT, we use WS and W ′ S
(corresponding to symbolic) to denote W and W ′ respectively for the SDP algorithm.
Moreover, it is also clear that W ′ (respectively W ′ S ) at the iteration i (i > 1) is the same
as W (respectively WS) after i − 1 iterations.
Lemma 3.4. For any set of predicates G ′ ⊆ G, at the end of i (i ≥ 0) iterations of the loop
in step (2) of SDPT(G,E) and DPT(G ′ ∪ � E) procedures:
(1) W ⊆ WS, and
(2) eval(t[(g,i)],G ′ ) = true if and only if g ∈ W for the DPT algorithm.
Proof. We use an induction on i to prove this lemma, starting from i = 0.
For the base case (after step (1) of both algorithms), W = G ′ ∪ � E ⊆ G ∪ � E ⊆ WS.
Moreover, for this step, eval(t[(g,0)],G ′ ) for a predicate g can be true in two ways.
(1) If g ∈ � E, then step (1) of SDPT assigns it to true. Therefore eval(t[(g,0)],G ′ ) is true
for any G ′ . But in step (1) of DPT(G ′ ∪ � E), W contains all the predicates in G ′ ∪ � E,
and therefore g ∈ W.
(2) If g ∈ G ′ , then eval(t[(g,0)],G ′ ) = eval(bg,G ′ ) which is true, by the definition of
eval(,). Again g ∈ W after step (1) of the DPT algorithm too.
Let us assume that the inductive hypothesis holds for all values of i less than m.
Consider the iteration number m. It is easy to see that if any fact g is added to W in this
step, then g is also added to WS; therefore part (1) of the lemma is easily established.
To prove part (2) of the lemma, we will consider two cases depending of whether a
predicate g was present in W before the m th iteration:
(1) Let us assume that after m − 1 iterations of DPT(G ′ ∪ � E) procedure, g ∈ W. Since g is
never removed from W during any step of DPT, g ∈ W after m iterations too. Now, by
the inductive hypothesis, eval(t[(g,m − 1)],G ′ ) = true. However, t[(g,m − 1)] =⇒
t[(g,m)] (because t[(g,m)] contains t[(g,m − 1)] as one of its disjuncts in step 2(c) of
the SDPT algorithm). Therefore, eval(t[(g,m)],G ′ ) = true.
(2) We have to consider two cases depending on whether g can be derived in DPT(G ′ ∪ � E)
in step m.
(a) If g can’t be derived in this step in DPT algorithm, then there is no set {g1,...,gk} ⊆
W ′ (of DPT) such that g : − g1,...,gk. Since W ′ is the same as W after m − 1
iterations, we can invoke the induction hypothesis to show that there exists a predicate
gj ∈ {g1,... ,gk}, eval(t[(gj ,m − 1)],G ′ ) = false. Again, by the induction
hypothesis, eval(t[(g,m − 1)],G ′ ) = false, since g �∈ W after m − 1 steps. Thus
eval(t[(g,m)],G ′ ) = false.
(b) If g can be derived from {g1,... ,gk} ⊆ W ′ (of DPT), then �
j t[(gj,m − 1)] implies
t[(g,m)]. But for each gj ∈ {g1,...,gk}, eval((gj,m − 1),G ′ ) = true and thus
eval((g,m),G ′ ) = true.
This completes the induction proof.
10 S. K. LAHIRI, T. BALL, AND B. COOK
We are now ready to complete the proof of Theorem 3.3.
Proof. Consider the situation where both SDPT(G,E) and DPT(G ′ ∪ � E) have executed the
loop in step (2) for i = maxDerivDepthT(G ∪ � E). We will consider two cases depending on
whether ⊥ can be derived in DPT(G ′ ∪ � E) in step (3).
• Suppose after i iterations, there is a set {g1,... ,gk} ⊆ W, such that ⊥ : − g1,...,gk. This
implies that G ′ ∪ � E is unsatisfiable. By Lemma 3.4, we know that eval(t[(gj , ⊤)],G ′ ) =
true for each gj ∈ {g1,... ,gk}, and therefore eval(t[e],G ′ ) = true.
• On the other hand, let eval(t[e],G ′ ) = true. This implies that there exists a set
{g1,... ,gk} ⊆ WS, such that ⊥ : − g1,... ,gk and eval(t[(gj , ⊤)],G ′ ) = true for each
gj ∈ {g1,... ,gk}. By Lemma 3.4, we know that {g1,... ,gk} ∈ W, for the DPT procedure
too. This means that DPT(G ′ ∪ � E) will return unsatisfiable.
This completes the proof.
.
Corollary 3.5. For a set of predicates P, if t[e] = SDPT(P ∪ � P,E), then for any P ′ ⊆
(P ∪ � P) representing a minterm over P (i.e. pi ∈ P ′ iff ¬pi �∈ P ′ ), eval(t[e],P ′ ) =
eval(FP(e),P ′ ).
Hence t[e] is a shared expression for FP (e), where e denotes �
ei∈E ei. An explicit
.
representation of FP(e) can be obtained by first computing t[e]
= SDPT(P ∪ � P,E) and
then enumerating the cubes over P that make t[e] true.
In the following sections, we will instantiate T to be the EUF and DIF theories and
show that SDPT exists for such theories. For each theory, we only need to determine the
value of maxDerivDepth T(G) for any set of predicates G.
Example 3.6. Figure 5 demonstrates the working of the SDP(G,E) for a simple example.
The predicates in G . = {a = b,b = c,a = d,d = c} and E . = {a = c} are limited to equality
and disequality predicates. For this theory T, maxDerivDepth T(G ∪ � E) equals the lg(m),
where m is the number of terms in G ∪ � E. We do not show this result for equality theory
in this paper, but prove it for the more general theory of difference logic in Section 3.4.
Therefore, we need to iterate Step (2) of the algorithm, for lg({a,b,c,d}) = 2 steps in
Figure 4.
First, a Boolean variable bg is introduced for each of the predicate g ∈ G. These
variables represent t[(g,0)] for each g ∈ G. For each ei ∈ � E, we use true to represent
t[(ei,0)]. Then the Step (2) of the algorithm is repeated for 2 steps. At each step, new
derivations are produced from the existing set of predicates at the level. The nodes at each
level denotes the set W for the particular iteration. Each derivation from two predicates in
W is represented as the conjunction of the two predicates (using the diamond connective),
and multiple derivations for a predicate (e.g. 3 ways to derive a = c for i = 2) are represented
with multiple incoming edges to a node.
Finally, the contradiction inference rule is used to derive contradictions (⊥) at the last
level. Since the only way to derive contradiction in this example is using a = c and a �= c,
this is the only derivation of ⊥. The expression t[e] represents the acyclic graph rooted
at ⊥, whose leaves are symbols in BG. The expression t[e] intuitively represents all the
derivations of a = c from G. More precisely, it represents all the subsets of G that are
inconsistent with a �= c.
There are a couple of observations that one can make from the previous example:
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 11
i = 0
i = 1
a=b
i = 2 a=b
b a=b
a=b
b=c
b=c
b b=c b a=d b d=c
b=c a=d d=c a ≠ c
a=c a=d d=c
a ≠ c
t[e] t[e
⊥
true
a=c b=d
a ≠ c
Figure 5: Example of SDP, where G . = {a = b,b = c,a = d,d = c} and E . = {a = c}.
The diamond connective represents conjunction, and multiple incoming edges
to a node represents a disjunction. The node corresponding to the predicate g
at level i represents t[(g,i)]. The figure omits several nodes and edges at each
level to make the diagram readable.
(1) The expression t[e] is a Boolean formula with BG as inputs and an alternation of AND
and OR operations. There are no negations (NOT) in the formula.
(2) Even for this simple example, there are several redundant derivations. For example,
consider the node a = b in level i = 2. At this level, a = b can either be derived from
a = b or from b = c and a = c, in the previous level. However, the derivation of a = c
in level i = 1 already uses a = b (at level i = 0) for one of its derivations. This means
that the set of derivations of a = b in level i = 2 contains redundant derivations. These
derivations do not affect the correctness of the procedure, but simply increases the size
of t[e]. However, as we will see in the next two sections, the size of the graph for t[e] is
still (pseudo) polynomially bounded for interesting theories.
Remark 3.7. It may be tempting to terminate the loop in step (2) of SDPT(G,E) once
the set of predicates in W does not change across two iterations. However, this would lead
to an incomplete procedure and the following example demonstrates this.
Example 3.8. Consider an example where G contains a set of predicates that denotes an
“almost” fully connected graph over vertices x1,...,xn. G contains an equality predicate
between every pair of variables except the edge between x1 and xn. Let E . = {x1 = xn}.
After one iteration of the SDPT algorithm on this example, W will contain an equality
between every pair of variables including x1 and xn since x1 = xn can be derived from
x1 = xi,xi = xn, for every 1 < i < n. Therefore, if the SDPT algorithm terminates once
the set of predicates in W stabilizes, the procedure will terminate after two steps.
Now, consider the subset G ′ = {x1 = x2,x2 = x3,... ,xi = xi+1,... ,xn−1 = xn} of
G. For this subset of G, DPT(G ′ ∪ � E) requires lg(n) > 1 (for n > 2) steps to derive the
12 S. K. LAHIRI, T. BALL, AND B. COOK
(1) Partition the set of terms in terms(G) into equivalence classes using the G= predicates.
At any point in the algorithm, let EC(t) denote the equivalence class for
any term t ∈ terms(G).
(a) Initially, each term belongs to its own distinct equivalence class.
(b) We define a procedure merge(t1,t2) that takes two terms as inputs. The
procedure first merges the equivalence classes of t1 and t2. If there are two
terms s1 . = f(u1,... ,un) and s2 . = f(v1,...,vn) such that EC(ui) = EC(vi),
for every 1 ≤ i ≤ n, then it recursively calls merge(s1,s2).
(c) For each t1 = t2 ∈ G=, call merge(t1,t2).
(2) If there exists a predicate t1 �= t2 in G�=, such that EC(t1) = EC(t2), then return
unsatisfiable; else satisfiable.
Figure 6: Simple description of the congruence closure algorithm.
fact x1 = xn. Therefore SDPT(G,E) does not simulate the action of DPT(G ′ ∪ � E). More
formally, we can show that eval(t[e],G ′ ) = false, but G ′ ∪ � E is unsatisfiable.
3.3. SDP for Equality and Uninterpreted Functions. The terms in this logic can
either be variables or application of an uninterpreted function symbol to a list of terms.
A predicate in this theory is t1 ∼ t2, where ti is a term and ∼ ∈ {=, �=}. For a set G
of EUF predicates, G= and G�= denote the set of equality and disequality predicates in G,
respectively. Figure 2 describes the inference rules for this theory.
Let terms(φ) denote the set of syntactically distinct terms in an expression (a term or
a formula) φ. For example, terms(f(h(x))) is {x,h(x),f(h(x))}. For a set of predicates G,
terms(G) denotes the union of the set of terms in any g ∈ G.
A decision procedure for EUF can be obtained by the congruence closure algorithm [NO80],
described in Figure 6.
For a set of predicates G, let m = |terms(G)|. We can show that if we iterate the loop
in step (2) of DPT(G) (shown in Figure 3) for at least 3m steps, then DPT can implement
the congruence closure algorithm. More precisely, for two terms t1 and t2 in terms(G), the
predicate t1 = t2 will be derived within 3m iterations of the loop in step 2 of DPT(G) if
and only if EC(t1) = EC(t2) after step (1) of the congruence closure algorithm (see proof
below).
Proposition 3.9. For a set of EUF predicates G, if m . = |terms(G)|, then the value of
maxDerivDepth T(G) for the theory is bound by 3m.
Proof. We first determine the derivDepth T(G) for any set of predicates in this theory.
Given a set of EUF predicates G, and two terms t1 and t2 in terms(G), we need to
determine the maximum number of iterations in step (2) of DPT(G) to derive t1 = t2 (if
G= implies t1 = t2).
Recall that the congruence closure algorithm(described in Figure 6) is a decision procedure
for the theory of EUF. At any point in the algorithm, the terms in G are partitioned
into a set of equivalence classes. The operation EC(t1) = EC(t2) is used to determine if t1
and t2 belong to the same equivalence class.
One way to maintain an equivalence class C . = {t1,...,tn} is to keep an equality ti = tj
between every pair of terms in C. At any point in the congruence closure algorithm, the
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 13
set of equivalence classes corresponds to a set of equalities C= over terms. Then EC(u) =
EC(v) can be implemented by checking if u = v ∈ C=. Although this is certainly not an
efficient representation of equivalence classes, this representation allows us to build SDPT
for this theory.
Let us implement the C ′ .
= = merge(C=,t1,t2) operation that takes in the current set of
equivalence classes C=, two terms t1 and t2 that are merged and returns the set of equalities
C ′ =
denoting the new set of equivalence classes. This can be implemented using the step
(2) of the DPT algorithm as follows:
(1) C ′ = ← C= ∪ {t1 = t2}.
(2) For every term u ∈ EC(t1), (i.e. u = t1 ∈ C=), add the predicate u = t2 to C ′ = by
the transitive rule u = t2 : − u = t1,t1 = t2. Similarly, for every v ∈ EC(t2), add the
predicate v = t1 to C ′ = by v = t1 : − v = t2,t2 = t1. All these steps can be performed
in one iteration of step 2.
(3) For every u ∈ EC(t1) and every v ∈ EC(t2), add the edge u = v to C ′ = by either of the
two transitive rules (u = v : − u = t2,t2 = v) or (u = v : − u = t1,t1 = v).
(4) Return C ′ =
If there are m distinct terms in G, then there can be at most m merge operations, as
each merge reduces the number of equivalence classes by one and there were m equivalence
classes at the start of the congruence closure algorithm. Each merge requires three iterations
of the step (2) of the DPT algorithm to generate the new equivalence classes. Hence, we
will need at most 3m iterations of step (2) of DPT to derive any fact t1 = t2 that is implied
by G=.
Observe that this decision procedure DPT for EUF does not need to derive a predicate
t1 = t2 from G, if both t1 and t2 do not belong to terms(G). Otherwise, if one generates
t1 = t2, then the infinite sequence of predicates f(t1) = f(t2),f(f(t1)) = f(f(t2)),... can
be generated without ever converging.
Again, since maxDerivDepth T(G) is the maximum derivDepth T(G ′ ) for any subset G ′ ⊆
G, and any G ′ can have at most m terms, maxDerivDepth T(G) is bounded by 3m. We also
believe that a more refined counting argument can reduce it to 2m, because two equivalent
classes can be merged simultaneously in the DPT algorithm.
3.3.1. Complexity of SDPT. The run time and size of expression generated by SDPT depend
both on maxDerivDepth T(G) for the theory and also on the maximum number of predicates
in W at any point during the algorithm. The maximum number of predicates in W can be
at most m(m −1)/2, considering equality between every pair of term. The disequalities are
never used except for generating contradictions. It is also easy to verify that the size of S(g)
(used in step (2) of SDPT) is polynomial in the size of input.Hence the run time of SDPT
for EUF and the size of the shared expression returned by the procedure is polynomial in
the size of the input.
3.4. SDP for Difference Logic. Difference logic is a simple yet useful fragment of linear
arithmetic, where predicates are of the form x ⊲⊳ y +c, where x, y are variables, ⊲⊳∈ {<, ≤}
and c is a real constant. Any equality x = y +c is represented as a conjunction of x ≤ y +c
and y ≤ x − c. The variables x and y are interpreted over real numbers. The function
14 S. K. LAHIRI, T. BALL, AND B. COOK
X ≤ Z + C Z ⊲⊳ Y + D
X ⊲⊳ Y + (C + D)
(a)
X < Z + C Z ⊲⊳ Y + D
X < Y + (C + D)
(b)
X < Y + C Y ⊲⊳ X + D C + D ≤ 0
⊥
(c)
X ≤ Y + C Y ≤ X + D C + D < 0
⊥
X ≤ Y Y ≤ X
X = Y
Figure 7: Inference rules for Difference logic.
symbol “+” and the predicate symbols {<, ≤} are the interpreted symbols of this theory.
Figure 7 presents the inference rules for this theory 4 .
Given a set G of difference logic predicates, we can construct a graph where the vertices
of the graph are the variables in G and there is a directed edge in the graph from x to y,
labeled with (⊲⊳,c) if x ⊲⊳ y + c ∈ G. We will use a predicate and an edge interchangeably
in this section.
Definition 3.10. A simple cycle x1 ⊲⊳ x2 + c1,x2 ⊲⊳ x3 + c2,...,xn ⊲⊳ x1 + cn (where each
xi is distinct) is “illegal” if the sum of the edges is d = Σ i∈[1,n]ci and either (i) all the edges
in the cycle are ≤ edges and d < 0, or (ii) at least one edge is an < edge and d ≤ 0.
It is well known [CLR90] that a set of difference predicates G is unsatisfiable if and
only the graph constructed from the predicates has a simple illegal cycle. Alternately, if
we add an edge (⊲⊳,c) between x and y for every simple path from x to y of weight c (⊲⊳
determined by the labels of the edges in the path), then we only need to check for simple
cycles of length two in the resultant graph. This corresponds to the rules (C) and (D) in
Figure 7.
For a set of predicates G, a predicate corresponding to a simple path in the graph of
G can be derived within lg(m) iterations of step (2) of DPT procedure, where m is the
number of variables in G (see proof below).
Proposition 3.11. For a set of DIF predicates G, if m is the number of variables in G,
then maxDerivDepth T(G) for the DIF theory is bound by lg(m).
Proof. It is not hard to see that if there is a simple path x ⊲⊳1 x1 + c1,x1 ⊲⊳2 x2 +
c2,...,xn−1 ⊲⊳n y + cn in the original graph of G, then after lg(m) iterations of the loop in
step (2), there is a predicate x ⊲⊳ ′ y + c in W; where c = Σ i∈[1,n−1]ci and ⊲⊳ ′ is < if at least
one of ⊲⊳i is < and ≤ otherwise. This is because if there is a simple path between x and y
through edges in G with length (number of edges from G) between 2 i−1 and 2 i , then the
algorithm DPT generates a predicate for the path during iteration i.
However, DPT can produce a predicate x ⊲⊳ y+c, even though none of the simple paths
between x and y add up to this predicate. These facts are generated by the non-simple paths
that go around cycles one or more times. Consider the set G . = {x < y + 1,y < x − 2,x <
4 Constraints like x ⊲⊳ c are handled by adding a special variable x0 to denote the constant 0, and rewriting
the constraint as x ⊲⊳ x0 + c [SSB02].
(d)
(e)
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 15
z − 1,...}. In this case we can produce the fact y < z − 3 from y < x − 2,x < z − 1 and
then x < z − 2 from y < z − 3,x < y + 1.
To prove the correctness of the DPT algorithm, we will show these additional facts can
be safely generated. Consider two cases:
• Suppose there is an illegal cycle in the graph. In that case, after lg(m) steps, we will have
two facts x ⊲⊳ y + c and y ⊲⊳ x + d in W such that they form an illegal cycle. Thus DPT
returns unsatisfiable.
• Suppose there are no illegal cycles in the original graph for G. For simplicity, let us
assume that there are only < edges in the graph. A similar argument can be made when
≤ edges are present.
In this case, every cycle in the graph has a strictly positive weight. A predicate x ⊲⊳ y+d
can be generated from non-simple paths only if there is a predicate x ⊲⊳ y + c ∈ G such
that c < d. The predicate x ⊲⊳ y + d can’t be a part of an illegal cycle, because otherwise
x ⊲⊳ y + c would have to be part of an illegal cycle too. Hence DPT returns satisfiable.
Note that we do not need any inference rule to weaken a predicate, X < Y + D : −
X < Y + C, with C < D. This is because we use the predicates generated only to detect
illegal cycles. If a predicate x < y + c does not form an illegal cycle, then neither does any
weaker predicate x < y + d, where d ≥ c.
3.4.1. Complexity of SDPT. Let cmax be the absolute value of the largest constant in the
set G. We can ignore any derived predicate in of the form x ⊲⊳ y + C from the set W where
the absolute value of C is greater than (m −1) ∗cmax. This is because the maximum weight
of any simple path between x and y can be at most (m − 1) ∗ cmax. Again, let const(g) be
the absolute value of the constant in a predicate g. The maximum weight on any simple
path has to be a combination of these weights. Thus, the absolute value of the constant is
bound by:
C ≤ min{(m − 1) ∗ cmax,Σg∈Gconst(g)}
The maximum number of derived predicates in W can be 2 ∗ m 2 ∗ (2 ∗ C + 1), where a
predicate can be either ≤ or <, with m 2 possible variable pairs and the absolute value of
the constant is bound by C. This is a pseudo polynomial bound as it depends on the value
of the constants in the input.
However, many program verification queries use a subset of difference logic where each
predicate is of the form x ⊲⊳ y or x ⊲⊳ c. For this case, the maximum number of predicates
generated can be 2 ∗ m ∗ (m − 1 + k), where k is the number of different constants in the
input.
4. Combining SDP for saturation theories
In this section, we provide a method to construct a symbolic decision procedure for the
combination of saturation theories T1 and T2, given SDP for T1 and T2. The combination
is based on an extension of the Nelson-Oppen (N-O) framework [NO79] that constructs a
decision procedure for the theory T1 ∪ T2 using the decision procedures of T1 and T2.
16 S. K. LAHIRI, T. BALL, AND B. COOK
We assume that the theories T1 and T2 have disjoint signatures (i.e., they do not share
any function symbol), and each theory Ti is convex and stably infinite 5 . Let us briefly
explain the N-O method for combining decision procedures before explaining the method
for combining SDP.
4.1. Nelson-Oppen method for Combining Decision Procedures. Given two theories
T1 and T2, and the decision procedures DPT1 and DPT2 , the N-O framework constructs
the decision procedure for T1 ∪ T2, denoted as DPT1 ∪T2 .
To decide an input set G, the first step in the procedure is to purify G into sets G1
and G2 such that Gi only contains symbols from theory Ti and G is satisfiable if and only
if G1 ∪ G2 is satisfiable. Consider a predicate g . = p(t1,... ,tn) in G, where p is a theory
T1 symbol. The predicate g is purified to g ′ by replacing each subterm tj whose top-level
symbol does not belong to T1 with a fresh variable wj. The expression tj is then purified to
t ′ j recursively. We add g′ to G1 and the binding predicate wj = t ′ j to the set G2. We denote
the latter as binding predicate because it binds the fresh variable wj to a term t ′ j .
Let Vsh be the set of shared variables that appear in G1 ∩ G2. A set of equalities ∆
over variables in Vsh is maintained; ∆ records the set of equalities implied by the facts from
either theory. Initially, ∆ = {}.
Each theory Ti then alternately decides if DPTi (Gi ∪ ∆) is unsatisfiable. If any theory
reports unsatisfiable, the algorithm returns unsatisfiable; otherwise, the theory Ti
generates the new set of equalities over Vsh that are implied by Gi ∪ ∆6 . These equalities
are added to ∆ and are communicated to the other theory. This process is continued until
the set ∆ does not change. In this case, the method returns satisfiable. Let us denote
this algorithm as DPT1 ∪T2 .
Theorem 4.1 ([NO79]). For convex, stably infinite and signature-disjoint theories T1 and
T2, DPT1 ∪T2 is a decision procedure for T1 ∪ T2.
There can be at most |Vsh| irredundant equalities over Vsh, therefore the N-O loop
terminates after |Vsh| iterations for any input.
4.2. Combining SDP using Nelson-Oppen method. We will briefly describe a method
to construct the SDPT1 ∪T2 by combining SDPT1 and SDPT2 . As before, the input to the
method is the pair (G,E) and the output is an expression t[e]. The facts in E are also
purified into sets E1 and E2 and the new binding predicates are added to either G1 or G2.
Our goal is to symbolically encode the runs of the N-O procedure for G ′ ∪ � E, for every
G ′ ⊆ G. For any equality predicate δ over Vsh, we maintain an expression ψδ that records
all the different ways to derive δ (initialized to false). We also maintain an expression ψe
to record all the derivations of e (initialized to false).
The N-O loop operates just like the case for constructing DPT1 ∪T2 . The SDPTi for
each theory Ti now takes (Gi ∪ ∆,Ei) as input, where ∆ is the set of equalities over Vsh
derived so far. In addition to computing the (shared) expression t[e] as before, SDPTi also
5We need these restrictions only to exploit the N-O combination result. The definition of convexity and
stably infiniteness can be found in [NO79].
6We assume that each theory has an inference rule for deriving equality between variables in the theory,
and DPT also returns a set of equality over variables.
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 17
returns the expression t[(δ, ⊤)], for each equality δ over Vsh that can be derived in step (2)
of the SDPT algorithm.
The leaves of the expressions t[e] and t[(δ, ⊤)] are Gi∪∆ (since leaves for � Ei are replaced
with true). We substitute the leaves for any δ ∈ ∆ with the expression ψδ, to incorporate
the derivations of δ until this point. We also update ψδ ← (ψδ ∨ t[(δ, ⊤)]) to add the new
derivations of δ. Similarly, we update ψe ← (ψe ∨ t[e]) with the new derivations.
The N-O loop iterates |Vsh| number of times to ensure that it has seen every derivation
of a shared equality over Vsh from any set G ′ 1 ∪ G′ 2 ∪ � E1 ∪ � E2, where G ′ i ⊆ Gi.
After the N-O iteration terminates, ψe contains all the derivations of e from G. However,
at this point, there are two kind of predicates in the leaves of ψe; the purified predicates
and the binding predicates. If g ′ was the purified form of a predicate g ∈ G, we replace the
leaf for g ′ with bg. The leaves of the binding predicates are replaced with true, as the fresh
variables in these predicates are really names for subterms in any predicate, and thus their
presence does not affect the satisfiability of a formula. Let t[e] denote the final expression
for ψe that is returned by SDPT1 ∪T2 . Observe that the leaves of t[e] are variables in BG.
Theorem 4.2. For two convex, stably-infinite and signature-disjoint theories T1 and T2, if
t[e] . = SDPT1 ∪T2 (G,E), then for any set of predicates G′ ⊆ G, eval(t[e],G ′ ) = true if and
only if DPT1 ∪T2 (G′ ∪ � E) returns unsatisfiable.
Since the theory of EUF and DIF satisfy all the restrictions of the theories of this section,
we can construct an SDP for the combined theory that still runs in pseudo-polynomial time.
5. Implementation and Results
We have implemented a prototype of the symbolic decision procedure for the combination
of EUF and DIF theories.To construct FP(e), we first build a BDD (using the
CUDD [CUD] BDD package) for the expression t[e] (returned by SDPT(P ∪ � P,E)) and
then enumerate the cubes from the BDD.
Creating the BDD for the shared expression t[e] and enumerating the cubes from the
BDD can have exponential complexity in the worst case. This is because the expression for
FP(e) can involve an exponential number of cubes (e.g. the example in Fig 8). However,
most problems in practice have a few cubes in FP(e). Secondly, as the number of leaves of
t[e] (alternately, number of BDD variables) is bound by |P |, the size of the overall BDD is
usually small, and is computed efficiently in practice. Finally, by generating only the prime
implicants 7 of FP(e) from the BDD, we obtain a compact representation of FP(e).
We report preliminary results evaluating our symbolic decision procedure based predicate
abstraction method on a set of software verification benchmarks. The benchmarks are
generated from the predicate abstraction step for constructing Boolean Programs from C
programs of Microsoft Windows device drivers in SLAM [BMMR01].
We compare our method with two other methods for performing predicate abstraction:
: DP-based: This method uses the decision procedure zapato [BCLZ04] to enumerate
the set of cubes that imply e. Various optimizations (e.g. considering cubes in
7 .
For any Boolean formula φ over variables in V , prime implicants of φ is a set of cubes C = {c1, . . . , cm}
over V such that φ ⇔ W
c∈C c and two or more cubes from C can’t be combined to form a larger cube.
18 S. K. LAHIRI, T. BALL, AND B. COOK
a1
b1
c1
d1
a2
b2
c2
d2
an
bn
cn
dn
n |P | SDPT UCLID
time (s) time (s)
3 14 0.20 19.37
4 19 0.43 656
5 24 0.65 -
10 49 5.81 -
12 59 12.28 -
Figure 8: Result on diamond examples with increasing number of diamonds. The expression
e is (a1 = dn). A “-” denotes a timeout of 1000 seconds.
increasing order of size) are used to prevent enumerating exponential number of
cubes in practice.
: UCLID-based: This method performs quantifier-elimination using incremental SATbased
methods [LBC03]. The procedure works by first converting the problem into
an existential quantifier elimination problem in first-order logic and then reducing
it to Boolean quantifier elimination by using an encoding to Boolean logic. Finally,
it uses SAT-based methods for performing Boolean quantification.
To compare with the DP-based method, we generated 665 predicate abstraction queries
from the verification of device-driver programs. Most of these queries had between 5 and
14 predicates in them and are fairly representative of queries in SLAM. The run time of
DP-based method was 27904 seconds on a 3 GHz. machine with 1GB memory. The run
time of SDP-based method was 273 seconds. This gives a little more than 100X speedup
on these examples, demonstrating that our approach can scale much better than decision
procedure based methods. We have not been able to run UCLID-based method on these
particular SLAM benchmarks; the UCLID-based tool is no longer actively maintained, and
we had trouble translating these SLAM benchmarks to input of UCLID. From our earlier
experience of using UCLID on similar benchmarks (Fig. 3 in [LBC03]), we believe that
most of these benchmarks can be solved within a few seconds, and the total runtime would
not differ by more than 2–3X (in favor of the current technique).
To compare with UCLID-based approach, we generated different instances of a problem
(see Figure 8 for the example) where P is a set of equality predicates representing
n diamonds connected in a chain and e is an equality a1 = dn. We generated different
problem instances by varying the size of n. For an instance with n diamonds, there are
5n − 1 predicates in P and 2 n cubes in FP(e) to denote all the paths from a1 to dn. Figure
8 shows the result comparing both the methods. We should note that UCLID method
was run on a slightly slower 2GHz machine. The results illustrate that our method scales
much better than the SAT-based enumeration used in UCLID for this example. Intuitively,
UCLID-based approach grows exponentially with the number of predicates (2 |P | ), whereas
our approach only grows exponentially with the number of diamonds (2 n ) in the result.
6. Conclusions and future work
In this paper, we have presented the concept of symbolic decision procedures and showed
its use for predicate abstraction. We have provided an algorithm for synthesizing a SDP for
any bounded saturation theory. We show that such SDP exists for interesting theories such
PREDICATE ABSTRACTION VIA SYMBOLIC DECISION PROCEDURES ∗ 19
as EUF and difference logic. These SDP construct a shared expression and run with polynomial
and pseudo-polynomial complexity respectively. Finally, we have provided a method
for constructing the SDP for simple mixed theories using an extension of the Nelson-Oppen
combination framework. Preliminary results comparing it some of the existing approaches
are encouraging.
There are several avenues of future work, some of which are outlined below:
• First, it is interesting to find out how to construct a SDP for other theories, including
the theory of linear arithmetic (over rationals). For linear arithmetic, one can perform
a “symbolic” Fourier-Motzkin [DE73] elimination procedure to construct an SDP — the
inference rule would eliminate a variable from all the predicates in a given level. However,
it is not clear how to generate implied equalities from such a procedure to combine the
SDP with SDP for other theories.
• Second, as the example in Figure 5 illustrated, there are a lot of redundant derivations
present in the resultant expression. The algorithm will benefit from optimizations that
can minimize such redundant derivations.
• Extend the combination of SDPs to non-convex theories.
References
[BCDR04] T. Ball, B. Cook, S. Das, and S. K. Rajamani. Refining Approximations in Software Predicate
Abstraction. In Kurt Jensen and Andreas Podelski, editors, Proc. Tools and Algorithms for
the Construction and Analysis of Systems (TACAS ’04), LNCS 2988, pages 388–403. Springer-
Verlag, 2004.
[BCLZ04] T. Ball, B. Cook, S. K. Lahiri, and L. Zhang. Zapato: Automatic Theorem Proving for Software
Predicate Abstraction Refinement. In R. Alur and D. Peled, editors, Computer Aided Verification
(CAV ’04), LNCS 3114. Springer-Verlag, 2004.
[BMMR01] T. Ball, R. Majumdar, T. Millstein, and S. K. Rajamani. Automatic predicate abstraction of C
programs. In Programming Language Design and Implementation (PLDI ’01), pages 203–213,
Snowbird, Utah, June, 2001. SIGPLAN Notices, 36(5), May 2001.
[Bry86] R. E. Bryant. Graph-based algorithms for Boolean function manipulation. IEEE Transactions
on Computers, C-35(8):677–691, August 1986.
[CCG + 04] S. Chaki, E. M. Clarke, A. Groce, S. Jha, and H. Veith. Modular Verification of Software
Components in C. IEEE Transactions on Software Engineering, 30(6):388–402, June 2004.
[CKSY04] E. Clarke, D. Kroening, N. Sharygina, and K. Yorav. Predicate abstraction of ANSI–C programs
using SAT. Formal Methods in System Design (FMSD), 25:105–127, September–November 2004.
[CLR90] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to Algorithms. MIT Press, 1990.
[CUD] CUDD:CU Decision Diagram Package.
Available at http://vlsi.colorado.edu/~fabio/CUDD/cuddIntro.html.
[DD01] S. Das and D. Dill. Successive approximation of abstract transition relations. In IEEE Symposium
of Logic in Computer Science(LICS ’01), pages 51–60. IEEE Computer Society, June
2001.
[DDP99] S. Das, D. Dill, and S. Park. Experience with predicate abstraction. In N. Halbwachs and
D. Peled, editors, Computer-Aided Verification (CAV ’99), LNCS 1633, pages 160–171. Springer-
Verlag, July 1999.
[DE73] G.B. Dantzig and B. C. Eaves. Fourier-motzkin elimination and its dual. Journal of Combinatorial
Theory, A(14):288–297, 1973.
[FQ02] C. Flanagan and S. Qadeer. Predicate abstraction for software verification. In Launchbury and
Mitchell [LM02], pages 191–202.
[GS97] S. Graf and H. Saïdi. Construction of abstract state graphs with PVS. In O. Grumberg, editor,
Computer-Aided Verification (CAV ’97), LNCS 1254. Springer-Verlag, June 1997.
[HJMS02] T. A. Henzinger, R. Jhala, R. Majumdar, and G. Sutre. Lazy Abstraction. In Launchbury and
Mitchell [LM02], pages 58–70.
20 S. K. LAHIRI, T. BALL, AND B. COOK
[JM05] R. Jhala and K. L. McMillan. Interpolant-based transition relation approximation. In Kousha
Etessami and Sriram K. Rajamani, editors, Computer Aided Verification (CAV ’05), volume
3576 of Lecture Notes in Computer Science, pages 39–51. Springer, 2005.
[LBC03] S. K. Lahiri, R. E. Bryant, and B. Cook. A symbolic approach to predicate abstraction. In W. A.
Hunt, Jr. and F. Somenzi, editors, Computer-Aided Verification (CAV 2003), LNCS 2725, pages
141–153. Springer-Verlag, 2003.
[LM02] John Launchbury and John C. Mitchell, editors. Proceedings of the 29th ACM SIGPLAN-
SIGACT Symposium on Principles of programming languages (POPL ’02). ACM Press, 2002.
[McM02] K. McMillan. Applying SAT Methods in Unbounded Symbolic Model Checking. In E. Brinksma
and K. G. Larsen, editors, Proc. Computer-Aided Verification (CAV’02), LNCS 2404, pages
250–264, July 2002.
[NK00] K. S. Namjoshi and R. P. Kurshan. Syntactic program transformations for automatic abstraction.
In A. Emerson and P. Sistla, editors, Computer Aided Verification, LNCS 1855, pages 435–449,
2000.
[NO79] G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Transactions
on Programming Languages and Systems (TOPLAS), 2(1):245–257, 1979.
[NO80] G. Nelson and D. C. Oppen. Fast decision procedures based on the congruence closure. Journal
of the ACM, 27(2):356–364, 1980.
[SS99] H. Saïdi and N. Shankar. Abstract and model check while you prove. In N. Halbwachs and
D. Peled, editors, Computer-Aided Verification, volume 1633 of LNCS, pages 443–454. Springer-
Verlag, July 1999.
[SSB02] O. Strichman, S. A. Seshia, and R. E. Bryant. Deciding Separation Formulas with SAT. In
E. Brinksma and K. G. Larsen, editors, Proc. Computer-Aided Verification (CAV’02), LNCS
2404, pages 209–222, July 2002.
This work is licensed under the Creative Commons Attribution-NoDerivs License. To view
a copy of this license, visit http://creativecommons.org/licenses/by-nd/2.0/ or send a
letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
