DRMacIver's Notebook
Set operations on SAT problems
Set operations on SAT problems
I’ve figured out a neat trick recently that I’ve not seen before, although I don’t know if it’s actually novel. It’s a fairly general method for manipulating SAT problems in CNF form to perform arbitrary set options on their solution set, without adding additional variables to the problem.It’s somewhere between well-known and obvious that you can do it with adding additional variables, and I make use of that for doing this.
(Disclaimer: The idea definitely works, and I’ve run the code I’m posting and it seems to work, but it’s undertested and may be buggy)
Here’s some code for calculating the set difference of two SAT problems expressed in CNF:
from pysat.solvers import Glucose4
from .cnfbuilder import CNFBuilder
def setminus(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
"""Returns a set of SAT clauses such that the satisfying
solutions are precisely the satisfying solutions of A that
do not satisfy B.
A, B are SAT formulae represented in CNF in the usual way
(positive literals correspond to a variable, negative literals
to the negation fo that variable). The returned formula is in
the same format.
"""
# Set of clauses that we incrementally expand until their
# satisfying solutions are solutions(A) - solutions(B).
= list(A)
result_clauses
= list(B)
B
# We are only interested in clauses in B that are not also in A,
# as every clause that is in A is always satisfied by an element
# of our result.
= {tuple(sorted(c)) for c in B} - {tuple(sorted(c)) for c in A}
target_clauses
# Every element of A satisfies B, so the set is empty.
if not target_clauses:
return [[1], [-1]]
= CNFBuilder(A)
builder
= builder.all([builder.any(c) for c in target_clauses])
b_is_satisfied
# This is a set of clauses for the intersection of our current
# best answer with B. If these are satisfiable we're not done yet.
= list(A) + list(B)
intersection_clauses
# This is a set of clauses with new variables introduced
# such that an assignment is valid iff it can be extended
# to a valid assignment of these clauses.
= builder.clauses + [[-b_is_satisfied]]
general_clauses
while True:
= Glucose4(intersection_clauses)
solver if not solver.solve():
return result_clauses
# This is a solution to our result_clauses that is also in
# B, and so indicates that we're not done.
= solver.get_model()
counter_example
# We now feed it in to the general clauses. This cannot be
# extended to a satisfying solution, because a solution is
# satisfying for general_clauses if and only if it corresponds
# to the unique extension of some satisfaction of A setminus B.
= Glucose4(general_clauses)
clause_generator = clause_generator.solve(assumptions=counter_example)
solved assert not solved
# This gives us a minimal clause that any solution to A setminus B must
# satisfy k
= clause_generator.get_core()
core assert core is not None
= [-v for v in core]
new_clause
general_clauses.append(new_clause)
intersection_clauses.append(new_clause) result_clauses.append(new_clause)
Where CNFBuilder is the following fairly straightforward class for building up a set of clauses:
class CNFBuilder:
def __init__(self, clauses):
self.clauses = list(clauses)
self.__nextvar = (
max(abs(literal) for clause in clauses for literal in clause) + 1
)
def any(self, variables) -> int:
= self.__newvar()
result # Any variable being true implies the result variable is true.
for v in variables:
self.__implies(v, result)
# If none of the variables are true, then any(variables) is false.
self.clauses.append([-result] + [v for v in variables])
return result
def all(self, variables) -> int:
= self.__newvar()
result for v in variables:
self.__implies(result, v)
# If all of the variables are true, then all(variables) is true.
self.clauses.append([result] + [-v for v in variables])
return result
def __implies(self, l, r):
self.clauses.append([-l, r])
def __newvar(self):
= self.__nextvar
result self.__nextvar += 1
return result
Let’s unpack this:
We’ve got sets of clauses A, B, and we want to calculate a set of clauses that represents the set of solutions solutions(A) - solutions(B). We do this by starting from A, and iteratively refining it to exclude solutions we don’t want.
This works by building up a set of clauses C as follows:
- Initially, C = A. We maintain throughout that
solutions(A) - solutions(B) subset solutions(C) subset solutions(A)
. - We check if there are any solutions to C that are also solutions to
B. If there are not, we’re done and
solutions(C) = solutions(A) - solutions(B)
. - If not, we find a subset of the resulting assignment such that no
extension of that partial assignment can be in
solutions(A) - solutions(B)
. - We add a negation of that subset to C as a new clause, and repeat.
Part (2) is easy: If you concatenate two sets of clauses, you’ll get
a set of clauses whose solutions are precisely the intersection of the
two solutions.
i.e. solutions(C + B) = solutions(C) & solutions(B)
. So
any solution to solutions(C) + solutions(B)
gives us the
desired example.
Part (3) is solved through two observations: Firstly, we can easily
construct a larger SAT problem with more variables such that
x in solutions(A) - solutions(B)
iff x
extends
to a solution of the larger problem. We do this by introducing variables
that correspond to B being satisfied (first we introduce a variable for
each clause in B that is not in A, and then we introduce a variable
corresponding to their intersection). We can then feed in our solution
to an incremental SAT solver, get a minimal subset of it that cannot be
extended to a solution (and thus is not present in any member of
solutions(A) - solutions(B)
, and add this as a clause
ruling out our counterexample and anything like it, without ruling out
any solutions we want to include.
This is a fairly general solution that can be used to perform arbitrary boolean operations on SAT problems, because the components you need are:
- Some set of clauses representing a superset of your desired target set.
- An oracle that will provide you with an example satisfying your set of clauses that is not in the desired target set (or say that none exists).
- An oracle that will take that example and turn it into clause that every element of your desired target set satisfies.
You then just iteratively add clauses to your initial set of clauses until you are done.
You can always use the empty set of clauses for (1),This isn’t great though, because the runtime of this is that it takes two calls to a SAT solver per clause added, so where possible you want to start with a set of clauses that closely approximates your target. and in the absence of a more convenient choice of oracle for (2) you can always construct it in the same way that the oracle for (3) is constructed, by adding new variables.
Another neat thing you can do with essentially the same ideaAnd, in fact, where I first figured this out. is you can take a SAT problem and remove a variable from itYou can use this to refine to arbitrary subsets of the variables by removing one variable at a time, but I’ve not found a better way to do it than that - the natural ways of representing it all introduce exponentially many new variables in the number of variables you want to remove., creating the set of clauses that have a satisfying solution if and only if there is some assignment of the removed variable that satisfies them. Here’s some code that does that:
def remove_one_variable(clauses, variable):
"""Return a set of SAT clauses not containing `variable` that accepts only
assignments that can be turned into satisfying assignments of `clauses` through
a suitable choice of assignment for `variable`."""
assert variable > 0
= Glucose4(clauses)
solver
# Current set of clauses representing our result set. Initially starts
# out as all clauses that do not reference the variable to remove.
list[Clause] = []
refined_clauses:
# All clauses in the original where the variable appears positively.
= []
positive_clauses
# All clauses in the original where the variable appears negatively.
= []
negative_clauses
for clause in clauses:
if variable in clause:
= sorted(set(clause) - {variable})
clause
positive_clauses.append(clause)elif -variable in clause:
= sorted(set(clause) - {-variable})
clause
negative_clauses.append(clause)else:
refined_clauses.append(clause)
assert positive_clauses or negative_clauses
= CNFBuilder(refined_clauses)
builder
= builder.all(
positive_satisfied any(clause) for clause in positive_clauses]
[builder.
)
= builder.all(
negative_satisfied any(clause) for clause in negative_clauses]
[builder.
)
# These clauses are satisfied if and only if all of our
# current result clauses are satisfied but there is no possible
# assignment of the removed variable that makes the original set
# of clauses satisfied.
= builder.clauses
oracle_clauses -positive_satisfied])
oracle_clauses.append([-negative_satisfied])
oracle_clauses.append([= Glucose4(oracle_clauses)
oracle
while True:
# Every solution of our refined_clauses has a satisfying extension,
# so we're done.
if not oracle.solve():
return refined_clauses
# We've got some solution to the refined clauses that cannot be extended.
# Turn it into a set of assumptions that cannot be satisfied in the original
# SAT problem.
= oracle.get_model()
counterexample assert counterexample is not None
= [
assumptions for literal in counterexample if abs(literal) != variable
literal
]
= solver.solve(assumptions=assumptions)
solved_assumptions assert not solved_assumptions
= solver.get_core()
core assert core is not None
# We've learned a new clause that must always hold for the remaining variables.
= tuple([-literal for literal in core])
new_clause
refined_clauses.append(new_clause) oracle.add_clause(new_clause)
The idea here is that we use our oracle to find some assignment of our current set of clauses over all the remaining variables that cannot be satisfied, then use the original set of clauses to turn that into a new learned clause. It’s not exactly the same as the set operations method, but it involves essentially the same ideas and technologies.
The downside of all of these methods is that they take two SAT calls per added clause, and there’s no guarantee that they don’t add exponentially many clauses. This isn’t just an “I can’t prove they don’t” no guarantee - sometimes they actually do. For example, in the context I was trying this, I found that most of the variables in my SAT problem were easily removable by adding 10s of clauses, but some seemed to run forever. This isn’t actually a problem for my use case for the most part (because I can just skip the ones that take too long), but it is a fairly fundamental limitation of the approach.
I’m reasonably sure this is just intrinsic to the problem - you can represent any boolean expression as CNF, but there’s no guarantee that the CNF isn’t exponentially larger than the represented expression, so where it is you’ll end up with exactly this sort of blow up.
Anyway, like I said, I don’t know if this is a new idea. I’ve not seen it before, and I think I would have if it was common knowledge, but it’s totally possible I’ve just not read the right bits of the literature to run into it.