DRMacIver's Notebook

Set operations on SAT problems

Set operations on SAT problems

I’ve figured out a neat trick recently that I’ve not seen before, although I don’t know if it’s actually novel. It’s a fairly general method for manipulating SAT problems in CNF form to perform arbitrary set options on their solution set, without adding additional variables to the problem.It’s somewhere between well-known and obvious that you can do it with adding additional variables, and I make use of that for doing this.

(Disclaimer: The idea definitely works, and I’ve run the code I’m posting and it seems to work, but it’s undertested and may be buggy)

Here’s some code for calculating the set difference of two SAT problems expressed in CNF:

from pysat.solvers import Glucose4
from .cnfbuilder import CNFBuilder


def setminus(A: list[list[int]], B: list[list[int]]) -> list[list[int]]:
    """Returns a set of SAT clauses such that the satisfying
    solutions are precisely the satisfying solutions of A that
    do not satisfy B.

    A, B are SAT formulae represented in CNF in the usual way
    (positive literals correspond to a variable, negative literals
    to the negation fo that variable). The returned formula is in
    the same format.
    """

    # Set of clauses that we incrementally expand until their
    # satisfying solutions are solutions(A) - solutions(B).
    result_clauses = list(A)

    B = list(B)

    # We are only interested in clauses in B that are not also in A,
    # as every clause that is in A is always satisfied by an element
    # of our result.
    target_clauses = {tuple(sorted(c)) for c in B} - {tuple(sorted(c)) for c in A}

    # Every element of A satisfies B, so the set is empty.
    if not target_clauses:
        return [[1], [-1]]

    builder = CNFBuilder(A)

    b_is_satisfied = builder.all([builder.any(c) for c in target_clauses])

    # This is a set of clauses for the intersection of our current
    # best answer with B. If these are satisfiable we're not done yet.
    intersection_clauses = list(A) + list(B)

    # This is a set of clauses with new variables introduced
    # such that an assignment is valid iff it can be extended
    # to a valid assignment of these clauses.
    general_clauses = builder.clauses + [[-b_is_satisfied]]

    while True:
        solver = Glucose4(intersection_clauses)
        if not solver.solve():
            return result_clauses

        # This is a solution to our result_clauses that is also in
        # B, and so indicates that we're not done.
        counter_example = solver.get_model()

        # We now feed it in to the general clauses. This cannot be
        # extended to a satisfying solution, because a solution is
        # satisfying for general_clauses if and only if it corresponds
        # to the unique extension of some satisfaction of A setminus B.
        clause_generator = Glucose4(general_clauses)
        solved = clause_generator.solve(assumptions=counter_example)
        assert not solved

        # This gives us a minimal clause that any solution to A setminus B must
        # satisfy k
        core = clause_generator.get_core()
        assert core is not None
        new_clause = [-v for v in core]
        general_clauses.append(new_clause)
        intersection_clauses.append(new_clause)
        result_clauses.append(new_clause)

Where CNFBuilder is the following fairly straightforward class for building up a set of clauses:

class CNFBuilder:
    def __init__(self, clauses):
        self.clauses = list(clauses)
        self.__nextvar = (
            max(abs(literal) for clause in clauses for literal in clause) + 1
        )

    def any(self, variables) -> int:
        result = self.__newvar()
        # Any variable being true implies the result variable is true.
        for v in variables:
            self.__implies(v, result)

        # If none of the variables are true, then any(variables) is false.
        self.clauses.append([-result] + [v for v in variables])
        return result

    def all(self, variables) -> int:
        result = self.__newvar()
        for v in variables:
            self.__implies(result, v)

        # If all of the variables are true, then all(variables) is true.
        self.clauses.append([result] + [-v for v in variables])
        return result

    def __implies(self, l, r):
        self.clauses.append([-l, r])

    def __newvar(self):
        result = self.__nextvar
        self.__nextvar += 1
        return result

Let’s unpack this:

We’ve got sets of clauses A, B, and we want to calculate a set of clauses that represents the set of solutions solutions(A) - solutions(B). We do this by starting from A, and iteratively refining it to exclude solutions we don’t want.

This works by building up a set of clauses C as follows:

  1. Initially, C = A. We maintain throughout that solutions(A) - solutions(B) subset solutions(C) subset solutions(A).
  2. We check if there are any solutions to C that are also solutions to B. If there are not, we’re done and solutions(C) = solutions(A) - solutions(B).
  3. If not, we find a subset of the resulting assignment such that no extension of that partial assignment can be in solutions(A) - solutions(B).
  4. We add a negation of that subset to C as a new clause, and repeat.

Part (2) is easy: If you concatenate two sets of clauses, you’ll get a set of clauses whose solutions are precisely the intersection of the two solutions. i.e. solutions(C + B) = solutions(C) & solutions(B). So any solution to solutions(C) + solutions(B) gives us the desired example.

Part (3) is solved through two observations: Firstly, we can easily construct a larger SAT problem with more variables such that x in solutions(A) - solutions(B) iff x extends to a solution of the larger problem. We do this by introducing variables that correspond to B being satisfied (first we introduce a variable for each clause in B that is not in A, and then we introduce a variable corresponding to their intersection). We can then feed in our solution to an incremental SAT solver, get a minimal subset of it that cannot be extended to a solution (and thus is not present in any member of solutions(A) - solutions(B), and add this as a clause ruling out our counterexample and anything like it, without ruling out any solutions we want to include.

This is a fairly general solution that can be used to perform arbitrary boolean operations on SAT problems, because the components you need are:

  1. Some set of clauses representing a superset of your desired target set.
  2. An oracle that will provide you with an example satisfying your set of clauses that is not in the desired target set (or say that none exists).
  3. An oracle that will take that example and turn it into clause that every element of your desired target set satisfies.

You then just iteratively add clauses to your initial set of clauses until you are done.

You can always use the empty set of clauses for (1),This isn’t great though, because the runtime of this is that it takes two calls to a SAT solver per clause added, so where possible you want to start with a set of clauses that closely approximates your target. and in the absence of a more convenient choice of oracle for (2) you can always construct it in the same way that the oracle for (3) is constructed, by adding new variables.

Another neat thing you can do with essentially the same ideaAnd, in fact, where I first figured this out. is you can take a SAT problem and remove a variable from itYou can use this to refine to arbitrary subsets of the variables by removing one variable at a time, but I’ve not found a better way to do it than that - the natural ways of representing it all introduce exponentially many new variables in the number of variables you want to remove., creating the set of clauses that have a satisfying solution if and only if there is some assignment of the removed variable that satisfies them. Here’s some code that does that:

def remove_one_variable(clauses, variable):
    """Return a set of SAT clauses not containing `variable` that accepts only
    assignments that can be turned into satisfying assignments of `clauses` through
    a suitable choice of assignment for `variable`."""
    assert variable > 0

    solver = Glucose4(clauses)

    # Current set of clauses representing our result set. Initially starts
    # out as all clauses that do not reference the variable to remove.
    refined_clauses: list[Clause] = []

    # All clauses in the original where the variable appears positively.
    positive_clauses = []

    # All clauses in the original where the variable appears negatively.
    negative_clauses = []

    for clause in clauses:
        if variable in clause:
            clause = sorted(set(clause) - {variable})
            positive_clauses.append(clause)
        elif -variable in clause:
            clause = sorted(set(clause) - {-variable})
            negative_clauses.append(clause)
        else:
            refined_clauses.append(clause)

    assert positive_clauses or negative_clauses

    builder = CNFBuilder(refined_clauses)

    positive_satisfied = builder.all(
        [builder.any(clause) for clause in positive_clauses]
    )

    negative_satisfied = builder.all(
        [builder.any(clause) for clause in negative_clauses]
    )

    # These clauses are satisfied if and only if all of our
    # current result clauses are satisfied but there is no possible
    # assignment of the removed variable that makes the original set
    # of clauses satisfied.
    oracle_clauses = builder.clauses
    oracle_clauses.append([-positive_satisfied])
    oracle_clauses.append([-negative_satisfied])
    oracle = Glucose4(oracle_clauses)

    while True:
        # Every solution of our refined_clauses has a satisfying extension,
        # so we're done.
        if not oracle.solve():
            return refined_clauses

        # We've got some solution to the refined clauses that cannot be extended.
        # Turn it into a set of assumptions that cannot be satisfied in the original
        # SAT problem.
        counterexample = oracle.get_model()
        assert counterexample is not None
        assumptions = [
            literal for literal in counterexample if abs(literal) != variable
        ]

        solved_assumptions = solver.solve(assumptions=assumptions)
        assert not solved_assumptions
        core = solver.get_core()
        assert core is not None

        # We've learned a new clause that must always hold for the remaining variables.
        new_clause = tuple([-literal for literal in core])
        refined_clauses.append(new_clause)
        oracle.add_clause(new_clause)

The idea here is that we use our oracle to find some assignment of our current set of clauses over all the remaining variables that cannot be satisfied, then use the original set of clauses to turn that into a new learned clause. It’s not exactly the same as the set operations method, but it involves essentially the same ideas and technologies.

The downside of all of these methods is that they take two SAT calls per added clause, and there’s no guarantee that they don’t add exponentially many clauses. This isn’t just an “I can’t prove they don’t” no guarantee - sometimes they actually do. For example, in the context I was trying this, I found that most of the variables in my SAT problem were easily removable by adding 10s of clauses, but some seemed to run forever. This isn’t actually a problem for my use case for the most part (because I can just skip the ones that take too long), but it is a fairly fundamental limitation of the approach.

I’m reasonably sure this is just intrinsic to the problem - you can represent any boolean expression as CNF, but there’s no guarantee that the CNF isn’t exponentially larger than the represented expression, so where it is you’ll end up with exactly this sort of blow up.

Anyway, like I said, I don’t know if this is a new idea. I’ve not seen it before, and I think I would have if it was common knowledge, but it’s totally possible I’ve just not read the right bits of the literature to run into it.