DRMacIver's Notebook

“Fast” enumeration of partial assignments in SAT problems

Published: 2024-10-15

In my last post on SAT problems I noted that you can remove a single variable from a SAT problem using the technique I outlined. That is, you can build a SAT problem over the remaining variables that is solvable iff there is some assignment of the removed variable that satisfies the original problem.

This will only sometimes run in reasonable time, because it runs in one SAT solver call per clause you add, but most of the timeThat is, for most but not all variables in a large SAT problem. in my relatively limited experiments that seems to be actually quite small.

One open question I have is whether you can do better than removing one variable at a time. In particular, if you have some very small subset of the variables of a SAT problem, and you want to find the set of clauses that those variables must satisfy to be extensible to an assignment of the whole, removing one variable at a time ends up hugely expensive.

There is a fairly obvious brute force algorithm that you can use of essentially enumerating all partial assignments up until the point where you get a contradiction and adding the negations of the contradiction as a clause. This will be worst-case exponential in the number of variables in your subset.

I suspect, but haven’t yet got an example that proves, that this is actually essential and that it’s relatively easy to construct SAT problems where exactly one assignment of the subset is not extensible to the assignment of the whole, and this is in some way non-obvious. If so, you might intrinsically need to check every possible assignment, which is intrinsically exponential.

Anyway this argument might be wrong, but I’ve still not figured out a guaranteed better way to do this than brute force enumeration. But I have figured out how to do the brute force enumeration faster.

The core is the following function:

from pysat.solvers import Glucose4
import numpy as np


def refine_assignment_table(clauses, variables, table):
    """Take a k x n table of assignments of n variables,
    and refine it so that it contains only satisfying assignments
    of a set of SAT clauses, where a row in the table is interpreted
    such that row[i] is the assigned value of variables[i]."""

    n = len(variables)
    assert table.shape[1] == n

    variables = np.array(variables)
    variable_indexes = {v: i for i, v in enumerate(variables)}

    sub_clauses = []

    solver = Glucose4(clauses)

    i = 0
    while i < len(table):
        row = table[i]
        assert len(row) == len(variables)
        assignment = [int(v if x else -v) for x, v in zip(row, variables)]
        if solver.solve(assignment):
            # This row is a valid assignment. Move on to the next one.
            i += 1
            continue

        # This row is an invalid assignment, so we can learn a new clause
        # from it by extracting an unsatisfying core.
        clause = [-v for v in solver.get_core()]
        sub_clauses.append(clause)

        # We now remove all rows that don't satisfy this clause. Note that
        # all rows before this one must satisfy it as they are valid
        # assignments.
        clause_mask = np.array(clause)
        positive_mask = np.zeros(n, dtype=bool)
        negative_mask = np.zeros(n, dtype=bool)

        for v in clause:
            if v > 0:
                positive_mask[variable_indexes[v]] = True
            else:
                negative_mask[variable_indexes[-v]] = True

        positive_satisfaction = table[:, positive_mask] == True
        negative_satisfaction = table[:, negative_mask] == False
        satisfying = np.any(
            np.concatenate(
                [positive_satisfaction, negative_satisfaction], axis=1
            ),
            axis=1,
        )
        assert len(satisfying) == len(table)
        table = table[satisfying]
    return sub_clauses, table

This takes some table of assignments of the set of variables and refines it until it contains only valid assignments, learning a set of clauses along the way required to achieve that subset. This allows us to fairly rapidly filter out any unsatisfying assignments, so that when the set of valid assignments is small we should typically run in a much smaller number of SAT calls. Even in an absolute worst case scenario (where every core is the length of the whole set of variables) this does no worse than a more direct brute force enumeration.

The big advantage of this over previous attempts I’ve made at doing it via direct enumeration is that it’s not in any way sensitive to variable ordering. If you’re attempting the obvious recursive bit and you find a new clause, you’ve now got to figure out how to propagate that clause up and back down the recursion. Having everything in a single table of assignments means this is handled automatically for you.

We can then just run this on a table consisting of the set of all possible assignments of the variables, but we can do slightly better than that by breaking the variables up as follows:

def enumerate_subsets(n):
    """Returns a 2^n x n table of all possible subsets of {0, ..., n - 1}"""
    numbers = np.arange(2**n, dtype=np.uint32)[:, np.newaxis]
    powers_of_two = 2 ** np.arange(n, dtype=np.uint32)
    return (numbers & powers_of_two) != 0


def concatenate_matrix_rows(matrix1, matrix2):
    """Return a matrix containing every concatenation of a row
    from each of its two source matrices."""
    m, n = matrix1.shape
    j, k = matrix2.shape

    reshaped1 = matrix1.reshape(m, 1, n).repeat(j, axis=1)

    reshaped2 = matrix2.reshape(1, j, k).repeat(m, axis=0)

    result = np.concatenate([reshaped1, reshaped2], axis=2).reshape(
        m * j, n + k
    )

    return result


def assignment_table_and_clauses(clauses, variables):
    """Returns a set of sub-clauses over `variables` such that
    an assignment of variables can be extended to an assignment
    for the whole set of clauses iff it satisfies those subclauses,
    together with an assignment table representing all valid assignments
    of the variables."""

    if len(variables) <= 8:
        sub_clauses = []
        table = enumerate_subsets(len(variables))
    else:
        # Solve the problem for two halves of the variables and then
        # merge the tables. NB: If we had a reliable heuristic for
        # grouping variables together that have a lot of influence
        # on each other we should probably use it here. e.g. it probably
        # makes sense to group variables in the same clause together.
        split = len(variables) // 2
        sub_clauses_left, table_left = assignment_table_and_clauses(
            clauses, variables[:split]
        )
        sub_clauses_right, table_right = assignment_table_and_clauses(
            clauses, variables[split:]
        )
        sub_clauses = cub_clauses_left + sub_clauses_right
        table = concatenate_matrix_rows(table_left, table_right)

    new_clauses, table = refine_assignment_table(clauses, variables, table)
    return sub_clauses + new_clauses, table

When there are subsets of the variables that also have fairly restricted assignments, working out their possible assignments first means that you can do a little better than exponential. If those subsets end up being mostly unconstrained you at least don’t do worse.

Now, obviously this is all still exponential in the worst case, and the worst case is actually pretty common, but it’s fast enough that it significantly improves what I thought was the number of variables that it was reasonable to expect this sort of thing to work on - in my current experiments it seems to work pretty well with sets of up to about 16-20 variables (16 is fast, 20 is tractable) in a SAT problem of high hundreds of variables. This isn’t huge, but it’s still a useful boundary to have pushed up a bit.