ILoggable

A place to keep my thoughts on programming

September 2, 2009 .net, geek , , , ,

Boolean Algebra for Tag queries

Currently working on a tag query engine and couldn’t find anything all that useful in the published approaches. I want to do arbitrary boolean algebras against a set of tags in a database, which seems to be out of scope of SQL approaches. All the various tagging schemas out there reduce to either just AND or just OR queries, but not complex logic. However, I want to be able to do something like:

(foo+bar)|(foo+baz)|(bar+^baz)

If there is a way to do this with SQL, i’d love to know. But the way i look at it, i really have to fetch all tags for each item and then do apply that formula to the list of tags on the item.

But let’s break down the matching problem itself into something i can execute. Let’s assume I’ve got a simple parser that can turn the above into an AST. Really, i can decompose any variation into three operations, AND(a,b), OR(a,b) and NOT(a). And I can represent those with some simple Func<> definitions:

Func<bool, bool, bool> AND = (a, b) => a && b;
Func<bool, bool, bool> OR = (a, b) => a || b;
Func<bool, bool> NOT = (a) => !a;

Assuming that i have boolean tokens for foo, bar and baz, the expressions becomes:

OR(AND(foo, bar), OR(AND(foo, baz), AND(bar,NOT(baz))))

Now, the above expression can be expressed as a function that takes three booleans describing the presence of the mentioned tags, ignoring any other tags that the item has, returning a boolean indicating a successful match. In C# that expression would look like this:

Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));

Next we need to generate this boolean map from the list of tags on the item. Assuming that the tag list is a list of strings, we can define an extension methods on IEnumerable<string> to generate the boolean map like this:

public static bool[] GetBinaryMap(this IEnumerable<string> tags, string[] mask) {
    var map = new bool[mask.Length];
    foreach(var x in tags) {
        for(var i = 0; i < mask.Length; i++) {
            if(x == mask[i]) {
                map[i] = true;
            }
        }
    }
    return map;
}

And with this we can define a linq query that will return us all matching items:

var mask = new[] { "foo", "bar", "baz"};
Func<bool[], bool> f = x => OR(AND(x[0], x[1]), OR(AND(x[0], x[2]), AND(x[1],NOT(x[2]))));
var match = from item in items
            where f(item.Tags.GetBinaryMap(mask))
            select item;

Clearly this is isn’t the fastest executing query, since we first had to create our items, each item in which has a collection of tags. But there is a lot of optimizations left on the table here, such as using our tag mask to pre-qualify items, breaking down the AST into sub-matches that could be used against a cache to find items, etc.

But at least we have a fairly simple way to take complex boolean algebra on tags and convert them into something that we can evaluate generically

Leave a comment