Scallop, a Language for Neurosymbolic Programming

Scallop is a language based on DataLog that supports differentiable logical and relational reasoning. Scallop program can be easily integrated in Python and even with a PyTorch learning module. You can also use it as another DataLog solver. This book aims to give both high-level overview of the language usage and also low-level documentation on how each language feature is used.

The following example shows how knowledge base facts, rules, and probabilistic facts recognized from images can operate together.

// Knowledge base facts
rel is_a("giraffe", "mammal")
rel is_a("tiger", "mammal")
rel is_a("mammal", "animal")

// Knowledge base rules
rel name(a, b) :- name(a, c), is_a(c, b)

// Recognized from an image, maybe probabilistic
rel name = {
  0.3::(1, "giraffe"),
  0.7::(1, "tiger"),
  0.9::(2, "giraffe"),
  0.1::(2, "tiger"),
}

// Count the animals
rel num_animals(n) :- n = count(o: name(o, "animal"))

Table of Content

Please refer to the side-bar for a detailed table of content. At a high-level, we organize this book into the following 5 sections:

Installation and Crash Course

Installation gives instructions on how to install the Scallop on your machine. Crash Course gives a quick introduction to what the language is and how it is used. Both sections are designed so that you can start quickly with Scallop.

Scallop and Logic Programming

Scallop and Logic Programming aims to give you a detailed introduction on the language. It introduces language features such as relational programming, negation and aggregation, queries, foreign constructs, and etc. Reading through all of these you should be well-versed in Scallop's core functionality and you will be able to use Scallop as a Datalog engine.

type fib(bound x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x > 1
query fib(10, y)

Scallop and Probabilistic Programming

Scallop and Probabilistic Programming introduces the probabilistic side of Scallop. You will learn to tag facts with probabilities, its underlying algorithms and frameworks, and additional programming constructs for probabilistic semantics. By the end of this section, you will be familiar with using Scallop as a probabilistic programming language.

rel attr = { 0.99::(OBJECT_A, "blue"), 0.01::(OBJECT_B, "red"), ... }
rel relate = { 0.01::(OBJECT_A, "holds", OBJECT_B), ... }

Scallopy and Neurosymbolic Programming

Scallopy and Neurosymbolic Programming goes into the heart of Scallop to introduce applying Scallop to write Neurosymbolic applications. Neurosymbolic methods are for methods that have both neural and logical components. For this, we are going to use the Python binding of Scallop, scallopy, to integrate with machine learning libraries such as PyTorch. This section will be describing the API of scallopy.

sum_2 = scallopy.Module(
  program="""type digit_1(i32), digit_2(i32)
             rel sum_2(a + b) = digit_1(a) and digit_2(b)""",
  input_mappings={"digit_1": range(10), "digit_2": range(10)},
  output_mapping=("sum_2", range(19)))

For Developers

For Developers discusses how developers and researchers who are interested in extending Scallop can step into the source code of Scallop and program extensions.

Installation

There are many ways in which you can use Scallop, forming a complete toolchain. We specify how to installing the toolchain from source. The following instructions assume you have access to the Scallop source code and have basic pre-requisites installed.

Requirements

Rust - nightly 2023-03-07 (please visit here to learn more about Rust nightly and how to install them)
Python 3.7+ (for connecting Scallop with Python and PyTorch)

Scallop Interpreter

The interpreter of Scallop is named scli. To install it, please do

$ make install-scli

From here, you can use scli to test and run simple programs

$ scli examples/datalog/edge_path.scl

Scallop Interactive Shell

Scallop Python Interface

Scallop and Logic Programming

In this part of the book we introduce Scallop as a relational and logical programming language. Scallop is a Datalog based language extended with various language features such as negation, aggregation, disjunctive head, algebraic data type, foreign functions, and foreign predicates. We will explain all of these concepts in detail, aiming to provide a comprehensive introduction to the core language constructs.

Relations and Facts

Scallop is a relational and logical programming language. As described in the Wikipedia:

Logic programming is a programming paradigm which is largely based on formal logic. Any program written in a logic programming language is a set of sentences in logical form, expressing facts and rules about some problem domain.

In Scallop, relations are the most fundamental building blocks of program. In the following example, we have declared the type of a relation called edge, using the type keyword:

type edge(a: i32, b: i32)

We say that the name edge is a predicate or a relation. Inside of the parenthesis, we have two arguments, a: i32 and b: i32. Therefore, we have edge being an arity-2 relation, due to it having 2 arguments. For the argument a: i32, we give a name of the field (a) and a type of that argument i32. Here, both of the arguments are of the i32 type, which means signed-integer, 32-bit. For more information on value types, refer to the Value Types section.

The above line only declares the type of the relation but not the content of the relation. The actual information stored in the relations are called facts. Here we define a single fact under the relation edge:

rel edge(0, 1)

Assuming 0 and 1 each denote an ID of a node, this fact declares that there is an edge going from node 0 to node 1. There are two arguments in this fact, matching the arity of this relation. Regardless of the predicate edge, one also simply consider the (0, 1) as a tuple, more specifically, a 2-tuple.

To declare multiple facts, one can simply write multiple single fact declaration using the rel keyword, like

rel edge(0, 1)
rel edge(1, 2)

One can also use the set syntax to declare multiple facts of a relation. The following line reads: "the relation edge contains a set of tuples, including (0, 1) and (1, 2)":

rel edge = {(0, 1), (1, 2)}

Note that it is possible to declare multiple fact sets for the same relation.

rel edge = {(0, 1), (1, 2)}
rel edge = {(2, 3)}

With the above two lines the edge relation now contains 3 facts, (0, 1), (1, 2), and (2, 3).

Examples of Relations

Boolean and 0-arity Relation

Many things can be represented as relations. We start with the most basic programming construct, boolean. While Scallop allows value to have the boolean type, relations themselves can encode boolean values. The following example contains an arity-0 relation named is_target:

type is_target()

There is only one possible tuple that could form a fact in this relation, that is the empty tuple (). Assuming that we are treating the relation is_target as a set, then if the set contains no element (i.e., empty), then it encodes boolean "false". Otherwise, if the set contains exactly one (note: it can contain at most one) tuple, then it encodes boolean "true". Declaring only the type of is_target as above, would assume that the relation is empty. To declare the fact, we can do:

rel is_target()
// or
rel is_target = {()}

Unary Relations

Unary relations are relations of arity 1. We can define unary relations for "variables" as we see in other programming languages. The following example declares a relation named greeting containing one single string of "hello world!".

rel greeting("hello world!")
// or
rel greeting = {("hello world!",)}

Note that for the second way of expressing the fact, we may omit the parenthesis and make it cleaner:

rel greeting = {"hello world!"}

In light of this, we may write the following rule:

rel possible_digit = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Integer Arithmetics as Relations

Integer arithmetics can be represented as relations as well. Consider a simple summation in algebra, a + b = c encodes the sum relationship among two operands (a and b) and their summation (c). Encoded in Scallop, they form arity-3 relations:

type add(op1: i32, op2: i32, result: i32)

Note that, in Scallop, relations are not polymorphic. That is, every relation, no matter declared or inferred, only has one type annotation.

We are working on an update in the future to relax this restriction.

To declare facts of this add relation, such as 3 + 4 = 7, we write

rel add(3, 4, 7) // 3 + 4 = 7

However, you might notice that the add relation is theoretically infinite. That is, there are infinite amount of facts that can satisfy the add relation. There is no way that we can possibly enumerate or declare all the facts. In such case, we resort to declaring rules using foreign functions or predicates, which we will discuss later. For now, let's use add as an example relation that encodes integer arithmetics.

Terminologies

We have the following terminologies for describing relations.

Boolean Relation: arity-0 relation
Unary Relation: arity-1 relation
Binary Relation: arity-2 relation
Ternary Relation: arity-3 relation

Type Inference

Scallop supports Type Inference. One does not need to fully annotate every relation on their types. Types are inferred during the compilation process.

For example, given the following code,

rel edge = {(0, 1), (1, 2)}

we can infer that the relation edge is a binary-relation where both arguments are integers. Note that when integers are specified, they are set default to the type of i32.

Type inference will fail if conflicts are detected. In the following snippet, we have the second argument being 1 as integer and also "1" as string.

rel edge = {(0, 1), (0, "1")}

Having this code will raise the following compile error, suggesting that the types cannot be unified. Note that the following response is generated in sclrepl command line interface.

[Error] cannot unify types `numeric` and `string`, where the first is declared here
  REPL:0 | rel edge = {(0, 1), (0, "1")}
         |                 ^
and the second is declared here
  REPL:0 | rel edge = {(0, 1), (0, "1")}
         |                         ^^^

For more information on values and types, please refer to the next section

Rules

Rules are the fundamental to computation in Scallop. Each rule defines the value and data flowing from some relation to another relation. In the following program, we have defined a few facts for the edge relation. On the second line, we have defined that, for each edge (a, b), there is also a path (a, b). We note that here, a and b are variables instead of constants as we have with defining facts. During computation, the two facts in edge will populate the path relation. This way, we have defined a rule for the path, which is executed during computation.

rel edge = {(0, 1), (1, 2)}
rel path(a, b) = edge(a, b) // (0, 1), (1, 2)

In this section, we talk about how we write rules in Scallop and how intricate computation can be done through it.

Syntax

In general, the basic rules in Scallop are of the form

RULE    ::= rel ATOM = FORMULA
FORMULA ::= ATOM
          | not ATOM
          | CONSTRAINT
          | AGGREGATION
          | FORMULA and FORMULA
          | FORMULA or FORMULA
          | ( FORMULA )

For each rule, we name the atom on the left to be the head of the rule, and the formula on the right to be the body. We read it from right to left: when the body formula holds, the head also holds. The formula might contain atoms, negated atoms, aggregations, conjunction, disjunction, and a few more constructs. For this section, we focus on simple (positive) atom, constraints, and their conjunctions and disjunctions. We will leave the discussion of negation and aggregation to the next sections.

Atom

Simple atoms are of the form RELATION(ARG_1, ARG_2, ...). Similar to facts, we have the relation name followed by a tuple of numerous arguments. Now, the arguments can be of richer forms, involving variables, constants, expressions, function calls, and many more.

Considering the most basic example from above:

rel path(a, b) = edge(a, b)

We have two variables a and b grounded by the edge relation. This means we are treating the variables a and b as source of information, which can be propagated to the head. In this example, the head also contains two variables, both being grounded by the body. Therefore the whole rule is well formed.

In case the head variables are not grounded by the body, such as the following,

rel path(a, c) = edge(a, b)

we would get an error that looks like the following:

[Error] Argument of the head of a rule is ungrounded
  REPL:1 | rel path(a, c) = edge(a, b)
         |             ^

The error message points us to the variable c that has not being grounded in the body.

For basic atoms, such as the ones that the user has defined, can be used to directly ground variables which are directly arguments of the atoms. They can be used to ground other variables or expressions. In the following example, although the rule itself might not make any sense, the variable a is used to ground the expression a + 1. Therefore, the rule is completely valid.

rel output_relation(a, a + 1) = input_relation(a)

In certain cases, expressions can be used to bound variables as well!

rel output_relation(a, b) = input_relation(a, b + 1)

In the above example, the expression b + 1 can be used to derive b, and thus making the variable b grounded. However, this might not be true for other expressions:

rel output_relation(b, c) = input_relation(b + c) // FAILURE

The input_relation can ground the expression b + c directly, however, the two arguments b and c cannot be derived from their sum, as there are (theoretically) infinite amount of combinations. In this case, we will get a compilation failure.

There can be constraints present in atoms as well. For example, consider the following rule:

rel self_edge(a) = edge(a, a)

The atom edge(a, a) in the body grounds only one variable a. But the pattern is used to match any edge that goes from a and to a itself. Therefore, instead of grounding two values representing the "from" and "to" of an edge, we are additionally posing constraint on the type of edge that we are matching. Conceptually, we can view the above rule as the following equivalent rule:

rel self_edge(a) = edge(a, b) and a == b

where there is an additional constraint posed on the equality of a and b. We are going to touch on and and constraints in the upcoming sections.

Disjunction (Or)

The body formula can contain logical connectives such as and, or, not, and implies, used to connect basic formulas such as Atom. In the following example, we are defining that if a is b's father or mother, then a is b's parent:

rel parent(a, b) = father(a, b)
rel parent(a, b) = mother(a, b)

In this program, we have divided the derivation of parent into two separate rules, one processing the father relationship and the other processing the mother relationship. This natually form a disjunction (or), as the derivation of parent can come from 2 disjunctive sources. Note that in Scallop (or Datalog in general), the ordering of the two rules does not matter.

Therefore, given that

rel father = {("Bob", "Alice")}
rel mother = {("Christine", "Alice")}

we can derive that the parent relation holding two tuples, ("Bob", "Alice") and ("Christine", "Alice").

The above program can be rewritten into a more compact form that looks like the following:

rel parent(a, b) = father(a, b) or mother(a, b)
// or
rel parent(a, b) = father(a, b) \/ mother(a, b)

We have used an explicit or (\/) keyword to connect the two atoms, father(a, b) and mother(a, b). The \/ symbol, which is commonly seen in the formal logics as the symbol vee ($\vee$), is also supported. Notice that written in this way, each branch of the disjunction need to fully bound the variables/expressions in the head.

Conjunction (And)

To demonstrate the use of and, let's look at the following example computing the relation of grandmother based on father and mother:

rel grandmother(a, c) = mother(a, b) and father(b, c)
// or
rel grandmother(a, c) = mother(a, b) /\ father(b, c)

Notice that the symbol /\ is a replacement for the and operator, which resembles the wedge ($\wedge$) symbol seen in formal logics.

As can be seen from the rule, the body grounds three variables a, b, and c. The variables a and b comes from mother and the variables b and c comes from father. Notice that there is one variable, b, in common. In this case, we are joining the relation of mother and father on the variable b.

Constraints

Rule body can have boolean constraints. For example, the conjunctive rule above can be re-written as

rel grandmother(a, c) = mother(a, b) and father(bp, c) and b == bp

Here, we are posing an equality (==) constraint on b and bp. Normally, constraints are such kind of binary expressions involving predicates such as

equality and inequality (== and !=)
numerical comparisons (<, >, <=, and >=)

Other constructs

There are other constructs available for defining rules, which we continue to discuss in detail in other sections:

Traditional Datalog Syntax

If you are familiar with traditional Datalog, you can have it by swapping the = with :-, and the and to , For example, the rule for defining grandmother can be rewritten as

rel grandmother(a, c) :- mother(a, b), father(b, c)

Values and Types

Scallop has a built-in set of basic value types, following Rust's naming convention. From there, we have types such as Symbol, DateTime, Entity, and Tensor, which are special types to Scallop.

Type	Description
`i8`	Signed-integer, 8-bit
`i16`	Signed-integer, 16-bit
`i32`	Signed-integer, 32-bit
`i64`	Signed-integer, 64-bit
`i128`	Signed-integer, 128-bit
`isize`	Signed size; its size is dependent on the system
`u8`	Unsigned-integer, 8-bit
`u16`	Unsigned-integer, 16-bit
`u32`	Unsigned-integer, 32-bit
`u64`	Unsigned-integer, 64-bit
`u128`	Unsigned-integer, 128-bit
`usize`	Unsigned size; its size is dependent on the system
`f32`	Floating-point number, 32-bit
`f64`	Floating-point number, 64-bit
`bool`	Boolean
`char`	Character
`String`	Variable-length string
`Symbol`	Symbol
`DateTime`	Date and time
`Duration`	Duration
`Entity`	Entity
`Tensor`	Tensor

Integers

Integers are the most basic data-type in Scallop. If not specified, the default integer type that the system will pick is the i32 (signed integer 32-bit) type:

rel edge = {(0, 1), (1, 2)} // (i32, i32)

If an unsigned integer type is specified but a negative number is used in the declared facts, a type inference error will be raised. We demonstrate this in the sclrepl environment:

scl> type my_edge(usize, usize)
scl> rel my_edge = {(-1, -5), (0, 3)}
[Error] cannot unify types `usize` and `signed integer`, where the first is declared here
  REPL:0 | type my_edge(usize, usize)
         |              ^^^^^
and the second is declared here
  REPL:1 | rel my_edge = {(-1, -5), (0, 3)}
         |                 ^^

Primitive operations that can be used along with integers are

Comparators:
- == (equality)
- != (inequality)
- > (greater-than)
- >= (greater-than-or-equal-to)
- < (less-than)
- <= (less-than-or-equal-to)
Arithmetic operators:
- + (plus)
- - (minus/negate)
- * (mult)
- / (div)
- % (mod)

All of the above operations need to operate on two integers of the same type. For instance, you cannot compare an i32 value with a usize value.

Floating Point Numbers

Floating point numbers are supported in Scallop as well. The following example shows the definition of student and their class grades:

type student_grade(name: String, class: String, grade: f32)

rel student_grade = {
  ("alice", "cse 100", 95.2),
  ("bob", "cse 100", 90.8),
}

It is possible derive special floating points such as inf and -inf, though we cannot declare such values directly. For the floating point that is nan (not-a-number), we will omit the whole fact from the database to maintain sanity. Specifically, the derivation of nan is treated as a failure of foreign functions, which we explain in detail here.

All the basic operations that can work on integers would be able to work for floating point numbers as well.

Boolean

Scallop allows the use of boolean values (true and false).

type variable_assign(String, bool)
rel variable_assign = {("a", true), ("b", false)}

We support the following boolean operations:

Comparisons
- == (equality)
- != (inequality)
Logical operations
- ! (unary negate)
- && (binary and)
- || (binary or)
- ^ (binary xor)

For example, we can have the following code

rel result(a ^ b) = variable_assign("a", a) and variable_assign("b", b) // true

Character

Scallop allows definition of characters such as 'a', '*'. They are single-quoted, and can contain escaped characters such as '\n' (new-line) and '\t' (tab).

type my_chars = {(0, 'h'), (1, 'e'), (2, 'l'), (3, 'l'), (4, 'o')}

Comparisons operations == and != are available for characters.

String

Scallop support variable length strings of the type String. Strings are declared using the double quote ("), and can contain escaped characters such as \n and \t.

rel greeting = {"Hello World"}

Strings can certainly be compared using == and !=. The main ways for interacting with strings are through foreign functions such as $string_length, $substring, $string_concat, and etc. Please refer to the foreign functions section for more information.

Symbols

Symbols are internally registered strings. They are most commonly created through loading from external files. But they can still be specified using the s-quoted-string notation:

rel symbols = {s"NAME", s"AGE", s"GENDER"}

DateTime and Duration

DateTime and Duration are natively supported data structures by Scallop. We commonly specify DateTime and Duration using their string form. In the following example, we specify the DateTime values using the t-quoted-string notation (t represents time):

rel event_dates = {("enroll", t"2020-01-01"), ("finish", t"2020-03-01")}

The dates will be all transformed into UTC time-zone. When the date part is specified and the time is not specified, we will fill the time 00:00:00 UTC. When the time is specified but the date is not, we will use the current date when the program is invoked. Any reasonable date-time format are acceptable, common ones include

t"2019-11-29 08:08:05-08"
t"4/8/2014 22:05"
t"September 17, 2012 10:09am"
t"2014/04/2 03:00:51"
t"2014年04月08日"

Durations can be specified using the d-quoted-string notation (d represents duration):

rel event_durations = {("e1", d"12 days"), ("e2", d"15 days 20 seconds")}

The string can contain numbers followed by their units. When specifying durations, the following units are accepted:

nanoseconds (n)
microseconds (usecs)
milliseconds (msecs)
seconds (secs)
minutes (m)
hours (h)
days (d)
weeks (w)
months (M)
years (y)

We can operate between Duration and DateTime using simple operations such as + and -:

DateTime + Duration ==> DateTime
Duration + Duration ==> Duration
DateTime - DateTime ==> Duration
DateTime - Duration ==> DateTime
Duration - Duration ==> Duration

Entity

Entity values are 64-bit unsigned integers created through hashing. They are used to represent pointers of created entities. They cannot be directly created. Rather, they are managed by Scallop through the creation of entities. For example,

type List = Nil() | Cons(i32, List)
const MY_LIST = Cons(1, Cons(2, Nil()))
rel input_list(MY_LIST)
query input_list

The result is then

input_list: {(entity(0x4cd0d9e6652cdfc7))}

Please refer to this section for more informaiton on algebraic data types and entities.

Type Conversions

In Scallop, types can be converted using the as operator. For example, we can have

rel numbers = {1, 2, 3, 4, 5}
rel num_str(n as String) = numbers(n)

to derive the numbers to be {"1", "2", "3", "4", "5"}. In general, we can have all numbers castable to each other. We also have every type being castable to String. For converting String to other types, it undergoes a parsing process. When the parsing does not go through, no result will be returned.

Writing Queries

Consider the following example of classes, students, and enrollments, and that we want to compute the number of students who have enrolled in at least one CS class.

// There are three classes
rel classes = {0, 1, 2}

// Each student is enrolled in a course (Math or CS)
rel enroll = {
  ("tom", "CS"), ("jenny", "Math"), // Class 0
  ("alice", "CS"), ("bob", "CS"), // Class 1
  ("jerry", "Math"), ("john", "Math"), // Class 2
}

// Count how many student enrolls in CS course
rel num_enroll_cs(n) = n := count(s: enroll(s, "CS"))

Normally, executing a program would result in scli outputting every single relation.

classes: {(0), (1), (2)}
num_enroll_cs: {(3)}
enroll: {("alice", "CS"), ("bob", "CS"), ("jenny", "Math"), ...}

However, we might only be interested in the relation named num_enroll_cs. In this case, we write a query using the query keyword:

query num_enroll_cs

In this case, only the relation num_enroll_cs will be output:

num_enroll_cs: {(3)}

Atomic Query

One can also write atomic query if we just want to get a part of the relation. For instance, consider the fibonacci example:

type fib(x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10
query fib(8, y) // fib(8, y): {(8, 34)}

In this case, we are just looking at the 8-th fibonacci number, which is 34.

Recursive Rules

One very powerful programming construct with Scallop is to declaratively define recursion. Inside of a rule, if a relational predicate appearing in the head appears in the body, the predicate is recursive. For example, the definition of fibonacci number is recursive:

\[ \text{fib}(x) = \left\{ \begin{array}{ll} \text{fib}(x - 1) + \text{fib}(x - 2), & \text{if}~ x > 1 \\ 1, & \text{otherwise} \end{array} \right. \]

Written in Scallop, we encode the function fib as a binary relation between the integer input and output:

type fib(x: i32, y: i32)

We can define the base cases for $\text{fib}(0)$ and $\text{fib}(1)$:

rel fib = {(0, 1), (1, 1)}

Now it comes to the definition of recursive cases, which peeks into $\text{fib}(x - 1)$ and $\text{fib}(x - 2)$ and sums them.

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) // infinite-loop

However, when actually executing this, it would not terminate as we are attempting to compute all fibonacci numbers, and there are infinite amount of them. In order to stop it, we can temporarily add a constraint to limit the value of x, so that we only compute the fibonacci number up to 10:

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10

At the end, we would get a the fib relation to contain the following facts:

fib: {(0, 1), (1, 1), (2, 2), (3, 3), (4, 5), (5, 8), (6, 13), (7, 21), (8, 34), (9, 55), (10, 89)}

As suggested by the result, the 10-th fibonacci number is 89.

Case Study: Graphs and Transitive Closure

Following is one of the most widely known Datalog program: computing the paths inside of a graph. By definition, an edge or a sequence of edges constitute a path. This is reflected by the following two rules:

type edge(i32, i32)

rel path(a, b) = edge(a, b)
rel path(a, c) = path(a, b) and edge(b, c)

The first line states that an edge can form a path. The second line states that a path, connected to a new edge, forms a new path. As can be seen from the second line, the relation path appears in both the body and the head, making it a recursive relation.

In this example, suppose we have

rel edge = {(0, 1), (1, 2)}

we would get the set of paths to be

path: {(0, 1), (0, 2), (1, 2)}

Notice that the path (0, 2) is a compound path obtained from joining the two edges (0, 1) and (1, 2).

Relation Dependency

Given a rule with head and body, we say that the predicate appearing in the head depends on the predicates of the atoms appearing in the body. This forms a dependency graph. The above edge-path example would have the following dependency graph:

edge <--- path <---+
            |      |
            +------+

The relation edge depends on nothing, while path depends on edge and also path itself. This forms a loop in the dependency graph. In general, if a program has a dependency graph with a loop, then the program requires recursion. Any relation that is involved in a loop would be a recursive relation.

Notice that we are mostly talking about positive dependency here, as the atoms in the body of the rule are positive atoms (i.e., without annotation of negation or aggregation). In more complex scenarios, there will be negation or aggregation in a rule, which we explain in detail in future sections.

Fixed-point Iteration

The recursion in Scallop happens in fixed-point iteration. In plain terms, the recursion will continue until there is no new fact being derived in an iteration. In hind-sight, the whole Scallop program is executed in a loop. Within one iteration, all of the rules inside of the program are executed. Let us digest the actual execution happens when executing the above edge-path program:

rel edge = {(0, 1), (1, 2), (2, 3)}
rel path(a, b) = edge(a, b)                 // rule 1
rel path(a, c) = path(a, b) and edge(b, c)  // rule 2

Before the first iteration, the edge has already been filled with 3 facts, namely (0, 1), (1, 2), and (2, 3). But the path is empty. Let's now go through all the iterations:

Iter 0: path = {}
Iter 1: path = {(0, 1), (1, 2), (2, 3)}
       Δpath = {(0, 1), (1, 2), (2, 3)} // through applying rule 1
Iter 2: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)}
       Δpath = {(0, 2), (1, 3)}         // through applying rule 2
Iter 3: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)}
       Δpath = {(0, 3)}                 // through applying rule 2
Iter 4: path = {(0, 1), (1, 2), (2, 3), (0, 2), (1, 3), (0, 3)}
       Δpath = {}

In the above note, we also include Δpath, which contains the new paths derived during the current iteration. As can be seen, during iteration 1, paths of length 1 are derived; during iteration 2, paths of length 2 are derived. During iteration 4, there is no more path to be derived, and therefore the Δpath is empty. This tells us that no new facts are derived and the whole fixed-point iteration is stopped, giving us the final result.

Infinite Relations

As we have described in the fixed-point iteration, the recursion will continue until no more fact is derived. However, we are capable of writing rules that are infinite. As shown in the first example:

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2)

gives you an infinite relation as there can always be a new x to be derived. In this case, the fixed-point iteration never stops.

The root cause of this is Scallop's support for value creationg, i.e., the creation of new values. Typically, database systems work in closed-world assumption, that is, all the items being reasoned about are already there. No computation is done on arbitrarily created values. But in the above example, we have derived x from the grounded expression x - 1, hence creating a new value.

Typically, the way to resolve this is to create bounds on the created values. For example, the rule

rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x <= 10

restricts that x cannot be greater than 10. This makes the fixed-point iteration to stop after around 10 iterations.

Other way of getting around with this involve the use of Magic-Set Transformations, which we describe its equivalent in Scallop in a later section.

Negations

Scallop supports negation to be attached to atoms to form negations. In the following example, we are trying to obtain the set of people with no children:

rel person = {"bob", "alice", "christine"} // There are three persons of interest
rel father = {("bob", "alice")}            // Bob is Alice's father
rel mother = {("alice", "christine")}      // Alice is Christine's mother

rel has_no_child(n) = person(n) and not father(n, _) and not mother(n, _)

The last rule basically says that if there is a person n who is neither anyone's father nor anyone's mother then the person n has no child. This is indeed what we are going to obtain:

has_no_child: {("christine",)}

It is clear that negations are very helpful in writing such kind of the rules. However, there are many restrictions on negations. We explain in detail such restrictions.

Negation and Variable Grounding

If we look closely to the rule of has_no_child above, we will find that there is an atom person(n) being used in the body. Why can't we remove it and just say "if one is neither father nor mother then the one has no child"?

rel has_no_child(n) = not father(n, _) and not mother(n, _) // Error: variable `n` is not grounded

The problem is with variable grounding. For the variable n to be appeared in the head, there is no positive atom that grounds it. All we are saying are what n is not, but not what n is. With only "what it is not", it could be literally anything else in the world.

Therefore, we need to ground it with a positive atom such as person(n). With this rule, we have basically

Stratified Negation

Expanding upon our definition of dependency graph, if a predicate occurs in a negative atom in a body, we say that the predicate of the rule head negatively depends on this predicate. For example, the above has_no_child example has the following dependency graph. Notice that we have marked the positive (pos) and negative (neg) on each edge:

person <--pos-- has_no_child --neg--> father
                      |
                      +-----neg-----> mother

Scallop supports stratified negation, which states that there is never a loop in the dependency graph which involves a negative dependency edge. In other words, if there exists such a loop, the program will be rejected by the Scallop compiler. Consider the following example:

rel is_true() = not is_true() // Rejected

The relation is_true negatively depends on the relation is_true itself, making it a loop containing a negative dependency edge. The error message would show that this program "cannot be stratified". If we draw the dependency graph of this program, it look like the following:

is_true <---+
   |        |
   +--neg---+

Since there is a loop (is_true -> is_true) and the loop contains a negative edge, this program cannot be stratified.

The reason that stratified negation is named such way is that, if there is no negative dependency edge in a loop, the whole dependency can be decomposed in to strongly connected components, where inside of each strongly connected component (SCC), there is no negative dependency. In other words, the negation has been stratified, so that the negative edge can only happen between SCCs. We call each SCC a stratum, and the collection of them a strata. Any non-recursive program has a dependency graph forming a Directed Acyclic Graph (DAG), and is therefore always stratifiable.

The following program, although containing both negation and recursion, can be stratified:

rel path(a, b) = edge(a, b) and not sanitized(b)
rel path(a, c) = path(a, b) and edge(b, c) and not sanitized(b)

For it, the following dependency graph can be drawn:

sanitized <--neg-- path <----+
                   |  |      |
     edge <--pos---+  +--pos-+

In this program, we have three SCCs (or strata):

Stratum 1: {edge}
Stratum 2: {sanitized}
Stratum 3: {path}

Negative dependency only occurs between stratum 2 and 3. Therefore, the program can be accepted.

Aggregations

Aggregations in Scallop can be viewed as operations that aggregates over multiple facts. Such operations include counting, summation and product, finding min and max, and logical quantifiers such as exists and forall. Aggregations appear in the body of a rule, and can be nested for abbrevity.

As a concrete example, we look at a program which counts over a set of people:

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p)) // n = 3

In general, we use the following syntax for aggregation formulas.

R1, R2, ... := AGGREGATOR(V1, V2, ...: FORMULA (where U1, U2, ...: FORMULA)?)

We name R1, ... to be the aggregation result variable, V1, ... to be the binding variable, and the formula inside of the aggregation the body. When the where keyword is used, we have the aggregation associated with explicit group-by clause. Here, we call the set of variables U1, ... as group-by variables. The formula under the where clause is named the group-by body. The binding variables need to be fully grounded by the body formula, and the group-by variables (if presented) need to also be fully grounded by the group-by body. For different types of aggregation, the AGGREGATOR might also change and annotated with different information. The number of result variables, the number of binding variables, and their types differ for each aggregation.

Here is a high-level overview of each supported aggregator and their configurations. In the table, ... is used to denote an arbitrary amount of variables.

Aggregator	Binding Variables	Result Variables
`count`	`Any...`	`usize`
`sum`	`Number`	the same as the binding variable
`prod`	`Number`	the same as the binding variable
`min`	`Any`	the same as the binding variables
`max`	`Any`	the same as the binding variables
`exists`	`Any...`	`bool`
`forall`	`Any...`	`bool`

Below, we elaborate on each aggregators and describe their usages.

Count

To count the number of facts, we can use the count aggregator. Just repeating the examples shown in the beginning:

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p)) // n = 3

We are counting the number of persons appear in the person relation. To be more concrete, let's read out the aggregation formula:

We count the number of p such that p is a person, and assign the result to the variable n.

For count, there could be arbitrary (> 0) number of binding variables which can be typed arbitrarily. It will only have a single result variable which is typed usize. For example, you may count the number of edges:

rel num_edges(n) = n := count(a, b: edge(a, b))

Here, we have two binding variables a and b, meaning that we are counting the number of distinct pairs of a and b.

Note that we can use the syntax sugar for aggregation to omit the repeated n:

rel num_edges = count(a, b: edge(a, b))

Implicit Group-By

With group-by, we may count the number of facts under a pre-defined group. Consider the example where there is a scene with differet colored objects,

rel obj_color = {(0, "red"), (1, "red"), (2, "blue"), (3, "red")}
rel num_obj_per_color(col, num) = num := count(obj: obj_color(obj, col))

As suggested by the facts inside of obj_color, there are 4 objects indexed using 0, 1, 2, 3, each associated with a different color. The object #0, #1, and #3 are red and the object #2 is blue. Therefore, we will get 3 red objects and 1 blue object, as computed in the result of num_obj_per_color:

num_obj_per_color: {("blue", 1), ("red", 3)}

Let's analyze the rule in detail. We find that we are counting over obj such that the object obj has a certain color col. But col is also a variable occurring in the head of the rule. This is an implicit group-by, in that the variable col is being used as an implicit group-by variable. That is, we are conditioning the counting procedure under each group that is defined by the col variable. Since there are two colors appearing in the obj_color relation, we are performing count for each of the two groups.

In general, if a variable is positively grounded in the body and appear in the head of a parent rule, we call the variable an implicit group-by variable.

Explicit Group-By

In the above example, there is no green colored object. However, how do we know that the number of green object is 0? The result does not seem to address this problem.

The missing piece is a domain of the possible groups. Without explicitly setting the domain, Scallop could only search inside of the database on possible groups. However, we can explicitly tell Scallop about what are the groups. Consider the following rewrite of the above program:

rel colors = {"red", "green", "blue"}
rel obj_color = {(0, "red"), (1, "red"), (2, "blue"), (3, "red")}
rel num_obj_per_color(col, num) = num := count(obj: obj_color(obj, col) where col: colors(col))

With the where clause, we have explicitly declared that col is a group-by variable which is grounded by the colors relation. If we look into the colors relation, we find that there are three possible colors that we care about, red, green, and blue. In this case, we will consider "green" as the third group and try to count the number of green objects -- which there are 0:

num_obj_per_color: {("blue", 1), ("green", 0), ("red", 3)}

Sum and Product

We can use the aggregator of sum and product to aggregate multiple numerical values. Consider the following example of sales:

rel sales = {("alice", 1000.0), ("bob", 1200.0), ("christine", 1000.0)}

We can compute the sum of all the sales:

rel total_sales(s) = s := sum[p](sp: sales(p, sp)) // 3200.0
// or
rel total_sales = sum[p](sp: sales(p, sp)) // 3200.0

Notice that the result type of s is the same as the type of the binding variable sp, which is f32 as indicated by the decimals in the definition of sales. Here, the argument variable p is necessary since it is the key to index each sale number. The above rule body is equivalent to the following math formula:

\[ s = \sum_p \text{sale}_p \]

If we do not use the argument variable, we get the following:

rel total_sales_wrong(s) = s := sum(sp: sales(p, sp)) // 2200.0, since the two 1000.0 will be deduplicated without its key

The product aggregator prod can be used in a similar manner as sum.

Min, Max, Argmin, and Argmax

Scallop can compute the minimum or maximum among a set of values. In the following example, we find the maximum grade of an exam:

rel exam_grades = {("a", 95.2), ("b", 87.3), ("c", 99.9)}
rel min_score(m) = m := max(s: exam_grades(_, s)) // 99.9
// or, succinctly
rel min_score = max(s: exam_grades(_, s)) // 99.9

The number (and types) of binding variables can be arbitrary, but the result variables must match the binding variables. In the above case, since s is of type f32, m will be of type f32 as well.

It is also possible to get argmax/argmin. Suppose we want to get the person (along with their grade) who scored the best, we write:

rel best_student(n, s) = (n, s) := max[n](s: exam_grades(n, s))
// or, succinctly
rel best_student = max[n](s: exam_grades(n, s))

Here, we are still finding the maximum score s, but along with max we have specified the "arg" ([n]) which associates with the maximum score. We call n an arg variable for min/max aggregator. The arg variable is grounded by the aggregation body, and can be directly used in the head of the rule.

If we do not care about the grade and just want to know who has the best grade, we can use wildcard _ to ignore the result variable, like

rel best_student(n) = (n, _) := max[n](s: exam_grades(n, s))

Alternatively, we can also use argmax:

rel best_student(n) = n := argmax[n](s: exam_grades(n, s))
// or, succinctly
rel best_student = argmax[n](s: exam_grades(n, s))

Exists and Forall

Logical quantifier such as exists and forall can also be encoded as aggregations. They will return value of boolean as the aggregation result.

Existential Quantifier

Let us start with discussing the easier of the two, exists. Technically, all variables in the body of Scallop rule are existentially quantified. We can use exists aggregation to make it explicit. For example, we can check if there exists an object that is blue:

rel obj_color = {(0, "red"), (1, "green")}
rel has_blue(b) = b := exists(o: obj_color(o, "blue"))

Specifically, we are checking "if there exists an object o such that its color is blue". The result is being assigned to a variable b. Since there is no blue object, we will get a result of has_blue(false).

In case when we just want the result boolean to be true or false, we can omit the result variables. For example, we can rewrite the recursive case of edge-path transitive closure as

rel path(a, c) = exists(b: path(a, b) and edge(b, c))

We note that this is just a syntax sugar equivalent to the following:

rel path(a, c) = r := exists(b: path(a, b) and edge(b, c)) and r == true

When we want to know the inexistence of something, we can do

rel no_red() = not exists(o: obj_color(o, "red"))

Note that there can be arbitrary amount of binding variables.

Universal Quantifier

We can also have universal quantifier forall. For this, there is a special requirement for universal quantification: the body formula has to be an implies (=>) formula. This restriction is enforced so that all the binding variables have bounds being specified on the left-hand-side of the implies formula. In the following example, we check if all the objects are spherical:

type Shape = CUBE | SPHERE | CONE | CYLINDER
rel object = {0, 1, 2}
rel obj_shape = {(0, CUBE), (1, SPHERE), (2, SPHERE)}
rel target(b) = b := forall(o: object(o) implies obj_shape(o, SPHERE))

Notice that we have a relation which defines the domain of object, suggesting that there are just 3 objects for us to work with. In the aggregation, we are checking "for all o such that o is an object, is the object a sphere?" The result is stored in the variable b and propagated to the target relation.

The reason we need to have an implies formula is that we need to use the left-hand-side of implies to give bounds to the universally quantified variables. Scallop cannot reason about open domain variables.

Note that similar to exists, we can also remove the result variable. The following program derives a boolean (arity-0) relation target denoting whether all the red objects are cubes:

type Shape = CUBE | SPHERE | CONE | CYLINDER
type Color = RED | GREEN | BLUE
rel obj_shape = {(0, CUBE), (1, SPHERE), (2, SPHERE)}
rel obj_color = {(0, RED),  (1, GREEN),  (2, GREEN)}
rel target() = forall(o: obj_color(o, RED) implies obj_shape(o, CUBE)) // {()}

Here, we directly use obj_color to serve as the left-hand-side of the implies. There will be one empty tuple being derived, suggesting that the statement is true.

String Join

If you have multiple facts containing strings and you want to join them together, you can use the string_join aggregator:

rel R = {"hello", "world"}
rel P1(n) = n := string_join(s: R(s)) // P1("helloworld")
rel P2(n) = n := string_join<" ">(s: R(s)) // P2("hello world")

In the above example, we can either directly join, producing the string "helloworld", or join with separator " ", producing the string "hello world". Note that the order of the strings in the joined string is determined by the strings. Here, "hello" starts with "h", which is smaller than the "w" in "world", therefore occurring before "world". If you want to specify an explicit order, use the argument variable:

rel R = {(2, "hello"), (1, "world")}
rel P(n) = n := string_join<" ">[i](s: R(i, s)) // P("world hello")

Since we have specified the variable i to be the argument of string_join, it serves to order the tuples. Here, we have (1, "world") and (2, "hello"), so the joined string will be "world hello" instead of "hello world".

Declaring Constants

We can declare constants and give it names. The general syntax is the following:

const NAME (: TYPE)? = CONSTANT

For example, we can define the value of PI:

const PI = 3.1415926

Notice that here we have not specified the type of PI. By default, a float value would resort to the place where the constant is used. If we want to specify a non-default type, we can do

const PI: f64 = 3.1415926

We can also declare multiple constants at a time:

const LEFT = 0, UP = 1, RIGHT = 2, DOWN = 3

Enum Types

We sometimes want to define enum types which contain constant variables. Common examples include RED, GREEN, and BLUE under the Color type, and LEFT, RIGHT, UP under the Action type. These can be achieved by defining enum types:

type Color = RED | GREEN | BLUE
type Action = LEFT | UP | RIGHT | DOWN

Internally, the values such as RED and UP are unsigned integer constants. If not specified, the values start from 0 and goes up 1 at a time.

For example, given the type definition above, RED = 0, GREEN = 1, and BLUE = 2. For Actions, LEFT = 0, UP = 1, and etc. Notice that even when Color and Action are different types, their values can overlap.

One can specify the values of these enum variants by attaching actual numbers to them. In the following example, we have explicitly assigned three values to the colors.

type Color = RED = 3 | GREEN = 5 | BLUE = 7

We can also just set a few of those:

type Color = RED | GREEN = 10 | BLUE

In this case, RED = 0, GREEN = 10, and BLUE = 11. Notice how blue's value is incremented from GREEN.

Displaying Constants

Constants are just values and many of them are integer values. They are not explicitly associated with any symbols. If you want to display them correctly, we advise you create auxilliary relations storing the mapping from each constant to its string form. For example, we can have

rel color_to_string = {(RED, "red"), (GREEN, "green"), (BLUE, "blue")}

In this case, following the result with color_to_string relation will display their desired meanings properly.

Algebraic Data Type and Entities

Algebraic data types are powerful programming constructs that allows user to define custom data structures and variants. Consider a traditional functional definition of a List:

type IntList = Nil()
             | Cons(i32, List)

We are saying that a IntList can be one of two variants, Nil and Cons:

Nil denotes the end of a list;
Cons contains the current i32 integer and a continuation of the list.

In this representation, we can represent a list like [1, 2, 3] with Cons(1, Cons(2, Cons(3, Nil()))). This is indeed what we can write in Scallop. We can declare such a list as a constant:

const MY_LIST = Cons(1, Cons(2, Cons(3, Nil())))

In general, we call the type definition of such data structure Algebraic Data Type definitions, or ADT definitions. The name Entity is used to refer to objects of such data types. In the example above, the constant MY_LIST is an entity of the ADT named IntList.

In this section, we describe in detail the definition and use of ADT and Entities. We also touch on the internals.

Defining Algebraic Data Types (ADT)

We use the following syntax to define ADTs:

type TYPE_NAME = VARIANT_NAME(ARG_TYPE_1, ARG_TYPE_2, ...) | ...

An ADT named TYPE_NAME is defined to have multiple (at least 2) named variants with VARIANT_NAME. Each variant holds a tuple of values typed by ARG_TYPE_1, ARG_TYPE_2, etc. We call variants that have no argument terminal variants. Parenthesis are still needed for those variants.

Please note that there cannot be duplicated variant names, either within the same ADT or different ADTs. For example, the following code would result in compilation failure:

type IntList  = Cons(i32, IntList)   | Nil()
type BoolList = Cons(bool, BoolList) | Nil() // Failure: Cons and Nil are already defined

Currently, ADTs do not support generics. In the above case, the IntList and BoolList needs to be defined separately with differently named variants.

Using ADT to represent arithmetic expressions

Common data that can be expressed through ADT could be structured expressions. The following definition describes the abstract syntax tree (AST) of simple arithmetic expressions:

type Expr = Int(i32)        // An expression could be a simple integer,
          | Add(Expr, Expr) // a summation of two expressions
          | Sub(Expr, Expr) // a substraction of two expressions

The following code encodes a simple expression

// The expression (1 + 3) - 5
const MY_EXPR = Sub(Add(Int(1), Int(3)), Int(5))

Using ADT to represent data structures

Data structures such as binary trees can also be represented:

type Tree = Node(i32, Tree, Tree) | Nil()

Here, Node(i32, Tree, Tree) represents a node in a tree holding three things: an integer (i32), a left sub-tree Tree, and a right sub-tree Tree. The other variant Nil represents an empty sub-tree. In this encoding, Node(5, Nil(), Nil()) would be representing a leaf-node holding a number 5.

The following code encodes a balanced binary search tree:

//         3
//      /     \
//    1         5
//  /   \     /   \
// 0     2   4     6
const MY_TREE =
  Node(3,
    Node(1,
      Node(0, Nil(), Nil()),
      Node(2, Nil(), Nil()),
    ),
    Node(5,
      Node(4, Nil(), Nil()),
      Node(6, Nil(), Nil()),
    )
  )

Working with Entities

Entities are most commonly created as constants using the const keyword. Let us revisit the List example and see how we can use the defined constant in our analysis.

type List = Cons(i32, List) | Nil()

const MY_LIST = Cons(1, Cons(2, Cons(3, Nil()))) // [1, 2, 3]

Using Entities in Relations

We can include the constant entities as part of a fact:

rel target(MY_LIST)
query target

As a result of the above program, we are going to get the value of the entity MY_LIST:

target: {(entity(0xff08d5d60a201f17))}

The value is going to be a 64-bit integer encoded in hex. It is a unique identifier for the created entity.

Note that, identical entities are going to have the same identifier. In the following example, MY_LIST_1 and MY_LIST_2 are identical, and therefore their hex identifier are the same.

const MY_LIST_1 = Cons(1, Nil()),
      MY_LIST_2 = Cons(1, Nil()),
      MY_LIST_3 = Cons(2, Nil())

rel lists = {
  (1, MY_LIST_1),
  (2, MY_LIST_2),
  (3, MY_LIST_3),
}

query lists
// lists: {
//   (1, entity(0x678defa0a65c83ab)), // Notice that the entity 1 and 2 are the same
//   (2, entity(0x678defa0a65c83ab)),
//   (3, entity(0x3734567c3d9f8d3f)), // This one is different than above
// }

Decomposing Entities in Rules

To peek into the content of an Entity, we can destruct it using the case-is operator. We look at an example of computing the length of a list:

type length(list: List, len: i32)
rel length(list, 0)     = case list is Nil()
rel length(list, l + 1) = case list is Cons(_, tl) and length(tl, l)

We define a recursive relation length to compute the length of a list. There are two cases. When the list is Nil(), this means the list has already ended. Therefore the list has a length of 0 For the second case, the list is Cons(_, tl). Here, the length of list is the length of tl plus 1.

We can then compute the length of a list by querying the length relationship on a constant list.

query length(MY_LIST, l) // l = 3

Case Study: Decomposing Entities for Pretty-Printing

We can look at more examples of using the case-is operators. The following set of rules pretty-prints expressions:

type Expr = Int(i32) | Add(Expr, Expr) | Sub(Expr, Expr)

type to_string(expr: Expr, str: String)
rel to_string(e, $format("{}", i))           = case e is Int(i)
rel to_string(e, $format("({} + {})", a, b)) = case e is Add(e1, e2) and to_string(e1, a) and to_string(e2, b)
rel to_string(e, $format("({} - {})", a, b)) = case e is Sub(e1, e2) and to_string(e1, a) and to_string(e2, b)

Shown in the example, we have written three to_string rules for pretty-printing the Expr data structure. Each rule correspond to handling exactly one of the variants. For the inductive cases Add and Sub, we have the to_string rule defined recursively so that the sub-expressions are also converted to strings. For pretty-printing, we have used the $format foreign function.

At the end, running the following snippet

const MY_EXPR = Sub(Add(Int(3), Int(5)), Int(1))
query to_string(MY_EXPR, s)

would give the following result, suggesting that the pretty-printed expression is ((3 + 5) - 1)

to_string(MY_EXPR, s): {(entity(0xa97605c2703c6249), "((3 + 5) - 1)")}

Case Study: Checking Regular Expressions

With ADT, we can specify the language of regular expressions (regex) with ease. Let's consider a very simple regex with union (|) and star (*), while phrases can be grouped together. For example, the regex "a*b" expresses that character a can be repeated arbitrary amount of time (including 0-times), followed by a single b. This regex can be used to match strings like "aaaab" and "b", but not "ba".

Let's try to define this regex language in Scallop!

type Regex = Char(char)           // a single character
           | Star(Regex)          // the star of a regex
           | Union(Regex, Regex)  // a union of two regexes
           | Concat(Regex, Regex) // concatenation of two regexes

As can be seen, we have defined 4 variants of this regex language. With this, our regex "a*b" can be expressed as follows:

// a*b
const A_STAR_B = Concat(Star(Char('a')), Char('b'))

Now, let's define the actual semantics of this regex language and write a relation matches to check if the regex matches with a given sub-string. We first setup the types of such relations.

input_regex is a unary-relation for holding the regex to be checked against;
input_string is a unary-relation for holding the string to be checked against;
matches_substr is for checking if a sub-regex r can be matched with the input string between begin and end indices, where end is exclusive;
matches is a boolean relation telling whether the A_STAR_B regex matches with the input string or not.

type input_regex(r: Regex)
type input_string(s: String)
type matches_substr(r: Regex, begin: usize, end: usize)
type matches()

The main bulk of the code will then be dedicated to define the matches_substr relation. At a high level, we decompose on each type of regex, and match on sub-strings. The first rule that we are going to write would be for the Char variant.

rel matches_substr(r, i, i + 1) = case r is Char(c) and input_string(s) and string_chars(s, i, c)

The rule suggests that if the regex r is a single character c, then we go into the input string s and find all the index i such that its corresponding character is c. The matched sub-string would start at index i and end at index i + 1. Note that the string_chars relation is a foreign predicate that decomposes the string into characters.

Similarly, we can write the rules for other variants:

// For star; it matches empty sub-strings [i, i) and recursively on sub-regex
rel matches_substr(r, i, i) = case r is Star(_) and input_string(s) and string_chars(s, i, _)
rel matches_substr(r, b, e) = case r is Star(r1) and matches_substr(r, b, c) and matches_substr(r1, c, e)

// For union; any string that matches left or right sub-regex would match the union
rel matches_substr(r, b, e) = case r is Union(r1, r2) and matches_substr(r1, b, e)
rel matches_substr(r, b, e) = case r is Union(r1, r2) and matches_substr(r2, b, e)

// For concat; we need strings to match in a consecutive matter
rel matches_substr(r, b, e) = case r is Concat(r1, r2) and matches_substr(r1, b, c) and matches_substr(r2, c, e)

Lastly, we add the rule to derive the final matches relation. Basically, it checks if the regex matches the start-to-end of the input string

rel matches() = input_regex(r) and input_string(s) and matches_substr(r, 0, $string_length(s))

Let us test the result!

rel input_regex(A_STAR_B)
rel input_string("aaaab")
query matches // {()}

Dynamically Creating Entities

There are cases where we want to create new entities during the deductive process. This is done through the new keyword followed by the entity to create. Suppose we have the definition of List and some pretty-printing code for it:

type List = Cons(i32, List) | Nil()

rel to_string_2(l, "]")                      = case l is Nil()
rel to_string_2(l, $format("{}]", i))        = case l is Cons(i, Nil())
rel to_string_2(l, $format("{}, {}", i, ts)) = case l is Cons(i, tl) and case tl is Cons(_, _) and to_string_2(tl, ts)
rel to_string(l, $format("[{}", tl))         = to_string_2(l, tl)

The following example shows that, given an input list l, we generate a result list Cons(1, l).

type input_list(List)
rel result_list(new Cons(1, l)) = input_list(l)

Given an actual list defined as a constant, we will be able to specify that the constant is the input list:

const MY_INPUT_LIST = Cons(2, Cons(3, Nil()))
rel input_list(MY_INPUT_LIST)

Now, let's visualize the results!

rel input_list_str(s) = to_string(MY_INPUT_LIST, s)
rel result_list_str(s) = result_list(l) and to_string(l, s)

query input_list_str  // [2, 3]
query result_list_str // [1, 2, 3]

As can be seen, through the new operator, we have essentially created a new list containing the element 1. We note that the rule for result_list is not recursive. In general, extra care needs to be taken to ensure that the program does not go into infinite loop.`

Case Study: Creating Entities for Equality Saturation

In this case study we look at the problem of equality saturation. Given an symbolic expression, there might be ways to simplify it, which are defined through rewrite rules. Notice that after simplification, the program should be equivalent to the input. The problem is challenging as there might be multiple ways to apply the rewrite rules. How do we then systematically derive the simplest equivalent program?

A simple example here is the symbolic arithmetic expression language, with constant, variables, and summation rule:

type Expr = Const(i32) | Var(String) | Add(Expr, Expr)

One example expression that we can express in this language would be

const MY_EXPR = Add(Add(Const(-3), Var("a")), Const(3)) // (-3 + a) + 3

For visualization, we write a to_string function

rel to_string(p, i as String) = case p is Const(i)
rel to_string(p, v)           = case p is Var(v)
rel to_string(p, $format("({} + {})", s1, s2)) =
  case p is Add(p1, p2) and to_string(p1, s1) and to_string(p2, s2)

If we query on to_string for MY_EXPR, we would get

query to_string(MY_EXPR, s) // s = "((-3 + a) + 3)"

Now let us deal with the actual simplification. The expression (-3 + a) + 3 could be simplified to just a, as the -3 and 3 cancels out. The way to do the simplification is to write two things:

rewrite rules in the form of equivalence relation;
the weight function giving each expression a weight to tell which expression is simpler.

For this, the following set of relations needs to be defined.

type input_expr(expr: Expr)
type equivalent(expr_1: Expr, expr_2: Expr)
type weight(expr: Expr, w: i32)
type simplest(expr: Expr)

Note that we need set a prior knowledge on equivalent: the expr_1 is always more complex than the expr_2. This is to prevent the simplification to go to arbitrary direction and result in infinite-loop. In such case, equivalent would not be commutative. Let us start with equivalent and define its basic property of identity and transitivity:

// Identity
rel equivalent(e, e) = case e is Const(_) or case e is Var(_) or case e is Add(_, _)

// Transitivity
rel equivalent(e1, e3) = equivalent(e1, e2) and equivalent(e2, e3)

Now, we can write the rewrite rules. The first one we are going to write states that, if e1 and e1p are equivalent and e2 and e2p are equivalent, their additions (Add(e1, e2) and Add(e1p, e2p)) are equivalent too.

// e1 == e1p, e2 == e2p ==> (e1 + e2) == (e1p + e2p)
rel equivalent(e, new Add(e1p, e2p)) = case e is Add(e1, e2) and equivalent(e1, e1p) and equivalent(e2, e2p)

The next rule states that Addition is commutative, such that Add(a, b) is equivalent to Add(b, a):

// (a + b) == (b + a)
rel equivalent(e, new Add(b, a)) = case e is Add(a, b)

We also have a rule for associativity:

// (a + (b + c)) == ((a + b) + c)
rel equivalent(e, new Add(new Add(a, b), c)) = case e is Add(a, Add(b, c))

A rule for simplifying adding summation identity 0:

// a + 0 = a
rel equivalent(e, a) = case e is Add(a, Const(0))

A rule for reducing two constants addition:

rel equivalent(e, Const(a + b)) = case e is Add(Const(a), Const(b))

Now we have 5 rewrite-rules in place, let us define how to compute the weight of each expression. The leaf nodes (Var and Const) have weight of 1, and the addition have the weight from left and right sub-expr added together plus 1.

rel weight(e, 1) = case e is Var(_) or case e is Const(_)
rel weight(e, l + r + 1) = case e is Add(a, b) and weight(a, l) and weight(b, r)

Lastly, we use the aggregation to find the equivalent programs with the minimum weight, which is our definition of the "simplest" program. Note that we have used an argmax aggregation denoted by min[p] here:

rel best_program(p) = p := argmin[p](w: input_expr(e) and equivalent(e, p) and weight(p, w))

If we query for the best program and turn it into string, we will get our expected output, a single variable "a"!

rel best_program_str(s) = best_program(p) and to_string(p, s)
query best_program_str // {("a")}

Parsing Entities from String

Scallop provides foreign functions and predicates for dynamically parsing entities from string input. Consider the following example:

type Expr = Const(f32) | Add(Expr, Expr)

rel expr_str = {"Add(Const(1), Const(2.5))"}

Let us say that we want to parse an expression from the expr_str, we can do the following:

rel expr($parse_entity(s)) = expr_str(s)

Here, we are using the foreign function of $parse_entity. We would get the following result:

query expr
// expr: {(entity(0xadea13a2621dd155))}

On-Demand Relations

There are often times relations/predicates where you know that would not need to be fully computed. This would include the infinite relations. This means that, we want to define such relations without worrying about its infinite-ness while also being able to supply it with information needed for the computation. Such relations are called On-Demand Relations.

We show here one on-demand relation which is the fibonacci number relation:

type fib(bound x: i32, y: i32)
rel fib = {(0, 1), (1, 1)}
rel fib(x, y1 + y2) = fib(x - 1, y1) and fib(x - 2, y2) and x > 1
query fib(10, y)

Normally, if we define the fibonacci relation, it would only contain the second and the third line, which respectively defines the base cases and the recursive cases. However, as we all know, there are infinitely many fibonacci numbers and it would not be wise to compute the relation fully. Usually, we want to infer some fact inside of the infinite relation, based on some inputs. In this case, as noted on the last line, we want to know the 10th fibonacci number.

It is hinted that when we want to compute a fibonacci number, we usually supply the x value, in this case, 10, in order to get the value y. This is exactly what we tell the compiler in the first line. Inside of the type declaration, we provide an additional adornment to each of the variables.

x is adorned by bound, denoting that it is treated as an input (or bounded) variable to the relation
y is not adorned by anything, suggesting that it is a free variable which will be computed by the rules of the relation

Getting x based on y is out-of-scope in this tutorial.

By providing the adornments (with at least one bound), we are telling Scallop that the relation should be computed on-demand. From there, Scallop will search for every place where the relation is demanded, and restrict the computation of the relation only on the demand.

In our case, there is just one single place where the fib relation is demanded (where x is 10). Therefore, Scallop will compute only the necessary facts in order to derive the final solution.

Adornments

There are only two kinds of adornments:

bound
free

Annotating whether the variable is treated as bounded variable or free variable.

If an adornment is not provided on a variable, then it is by default a free variable. In this sense, all normal relations without any adornment would be treated as non-on-demand relations.

When at least one bound adornment is annotated on a relation type declaration, we know that the relation needs to be computed on-demand.

More Examples

On-Demand Path

Let's go back to our example of edge-and-path. Consider that there is a huge graph, but we only want to know a path ending at a specific node:

rel path(a, b) = edge(a, b) or (edge(a, c) and path(c, b))
query path(a, 1024)

In this case, enumerating all paths would be strictly more expensive than just exploring from the end point. Therefore, we add an adornment to the path relation like the following:

type path(free i32, bound i32)

We say the second argument is bound and the first argument is free, matching what we expect from the query.

On-Demand To-String

Let's consider an simple arithmetic expression language and a to_string predicate for the language:

type Expr = Const(i32) | Var(String) | Add(Expr, Expr) | Sub(Expr, Expr)

rel to_string(e, $format("{}", i))             = case e is Const(i)
rel to_string(e, $format("{}", v))             = case e is Var(e)
rel to_string(e, $format("({} + {})", s1, s2)) = case e is Add(e1, e2) and to_string(e1, s1) and to_string(e2, s2)
rel to_string(e, $format("({} - {})", s1, s2)) = case e is Sub(e1, e2) and to_string(e1, s1) and to_string(e2, s2)

Now that let's say there are many expressions declared as constants:

const EXPR_1 = Add(Const(1), Add(Const(5), Const(3)))
const EXPR_2 = Add(Const(1), Var("x"))
const EXPR_3 = Const(13)

Scallop would have automatically generated string for all of the expressions.

However, let's say we are only interested in one of the expressions:

query to_string(EXPR_3, s)

Then most of the computations for to_string would be redundant.

In this case, we would also declare to_string as an on-demand predicate, like this:

type to_string(bound Expr, String)

Then only the queried expression will be to_string-ed.

Internals

Internally, when there are relations being annotated with adornments, the whole Scallop program is undergone a program transformation. This transformation is traditionally called Magic-Set Transformation or Demand Transformation. There are multiple papers on the topic, which we reference below:

Loading from CSV

Scallop can be used along with existing datasets loaded from CSVs. This is usually achieved with annotating on specific relations. For example, assuming we have a file edge.csv,

0,1
1,2

we can load the content of it into a relation edge in Scallop using the following syntax

@file("edge.csv")
type edge(from: usize, to: usize)

rel path(a, c) = edge(a, c) or path(a, b) and edge(b, c)

query path

In particular, we annotate the @file(...) attribute onto the relation type declaration type edge(...). The file name is written inside the @file attribute. We require the relation to be declared with types in order for it to be loaded with CSV file content. Depending on the type declaration, the file content will be parsed into values of certain types.

From here, the edge relation will be loaded with the content (0, 1) and (1, 2). After executing the Scallop program above, we would obtain the result path being (0, 1), (0, 2), and (1, 2).

Certainly, there are many ways to load CSV. In this section, we introduce the various ways to configure the CSV loading.

Headers

There are CSV files with headers. Suppose we have the following CSV file

from,to
0,1
1,2

To load this file, we would need to add an additional argument header=true to the @file attribute:

@file("edge.csv", header=true)
type edge(from: usize, to: usize)

Note that by default we assume that CSV files don't have headers.

Deliminators

By default, we assume the values inside of the CSV file are deliminated by commas ','. In case where CSV files have values deliminated by other characters, such as tabs '\t', we would need to specify that in the @file attribute:

@file("edge.csv", deliminator="\t")
type edge(from: usize, to: usize)

Note that deliminators cannot be of multiple characters.

Parsing Field-Value Pairs

There are many CSV tables which have a lot of columns. One way is to specify all the fields and their types, like the following.

type table(field1: type1, field2: type2, ..., fieldn: typen)

However, this might be very hard to encode. Therefore, we provide another way of parsing CSV files into relations, by using primary keys and field-value pairs. Let's assume we have the following CSV file:

student_id,name,year,gender
0001,alice,2020,female
0002,bob,2021,male

We see that student_id can serve as the primary key of this table. With this, it can be loaded into the following relation

@file("student.csv", keys="student_id")
type table(student_id: usize, field: String, value: String)

By specifying keys="student", we tell Scallop that student_id should be viewed as unique primary keys. The rest of the two elements are field and value, both need to be typed Strings. As a result, it produces the following 6 facts in the table relation:

(1, "name", "alice"), (1, "year", "2020"), (1, "gender", "female"),
(2, "name", "bob"),   (2, "year", "2021"), (2, "gender", "male")

Note that there could be more than one keys. Consider the following table

student_id,course_id,enroll_time,grade
0001,cse100,fa2020,a
0001,cse101,sp2021,a
0002,cse120,sp2021,b

We see that the combination of student_id and course_id form the unique primary keys. In this case, they can be loaded using the following syntax:

@file("enrollment.csv", keys=["student_id", "course_id"])
type enrollment(student_id: usize, course_id: String, field: String, value: String)

By setting keys to be a list ["student_id", "course_id"], the student_id field is the first primary key and course_id is the second. There are still two additional arguments for the enrollment relation. In general, the arity of the relation will be the number of primary keys plus 2.

Specifying Fields to Load

In case not all fields are desired when loading, one can use the fields argument to specify what to load. Consider the same enrollment table encoded in CSV:

student_id,course_id,enroll_time,grade
0001,cse100,fa2020,a
0001,cse101,sp2021,a
0002,cse120,sp2021,b

If we only want to get everything but omit the enroll_time column, we can do

@file("enrollment.csv", fields=["student_id", "course_id", "grade"])
type enrollment(student_id: usize, course_id: String, grade: String)

This can also work in conjunction with the keys argument. In this case, we do not need to specify the primary keys.

@file("enrollment.csv", keys=["student_id", "course_id"], fields=["grade"])
type enrollment(student_id: usize, course_id: String, field: String, value: String)
// The following facts will be obtained
//   enrollment(1, "cse100", "grade", "a")
//   enrollment(1, "cse101", "grade", "a")
//   enrollment(2, "cse120", "grade", "b")

Foreign Functions

Foreign functions allows for complex value manipulation in Scallop. Conceptually, they are pure and partial functions that operate on value(s) and return one single value only. Functions with states, such as random, are not allowed as foreign functions.

Function Types

In Scallop, foreign functions are generically typed with optional and variable arguments. All the functions have a dollar sign ($) associated with the function name. We use the following syntax to denote a function signature

$FUNC_NAME<ARG(: FAMILY)?, ...>(
  POS_ARG: POS_ARG_TYPE, ...,
  OPT_ARG: OPT_ARG_TYPE?, ...,
  VAR_ARG_TYPE...
) -> RETURN_TYPE

The generic arguments are specified in the <...> after the function name, and can be annotated by optional type family. For the arguments of the function, optional arguments have to appear after all positional arguments, and the variable arg type must appear after all positional and optional arguments. Functions must have a return type.

For example, the function $string_char_at(s: String, i: usize) -> char takes in a string s and an index i, and returns the character at that location. The two arguments s and i are both positional arguments.

In the function $substring(s: String, b: usize, e: usize?), we have 2 positional arguments (s and b) and 1 optional argument (e). This means that this substring function can be invoked with 2 or 3 arguments. Invoking $substring("hello", 3) would give us "lo", and invoking $substring("hello", 1, 3) would give us "el".

For functions like $abs<T: Number>(T) -> T, we have absolute value function taking in value of any type that is a number (including integers and floating points). The function also returns a type the same as the input.

For a function like $format(f: String, Any...), it looks at the format string and fill all the {} symbol in the string with the latter arguments. Notice how there can be arbitrary number of arguments (variable arg) of Any type. For example, we can have $format("{} + {}", 3, "a") ==> "3 + a" and $format("{}", true) ==> "true".

Function Failures

Foreign functions may fail with errors such as divide-by-zero, index-out-of-bounds. When error happens, values will not be propagated along the computation, and will be dropped silently.

For example, the following program makes use of the foreign function $string_char_at. It walks through the indices 1, 3, and 5, and get the character on those indices of the string "hello".

rel indices = {1, 3, 5}
rel output(i, $string_char_at("hello", i)) = indices(i)

However, there are only 5 characters in the string, meaning that getting the 5-th character would result in an index-out-of-bounds error. Scallop will drop this invokation silently, resulting in only two facts being derived:

output: {(1, 'e'), (3, 'l')}

Similar things happen when nan is derived from floating point operations, or that the foreign function fails.

Library of Foreign Functions

We hereby list all the available foreign functions, their signatures, descriptions, and an example of how to invoke them. The functions here are ordered alphabetically. For some of the functions that are slightly more complicated (e.g. $format), please refer to the section below for more information.

Function	Description	Example
`$abs<T: Number>(x: T) -> T`	Absolute value function $\lvert x \rvert$	`$abs(-1)` => `1`
`$acos<T: Float>(x: T) -> T`	Arc cosine function $\text{acos}(x)$	`$acos(0.0)` => `1.5708`
`$atan<T: Float>(x: T) -> T`	Arc tangent function $\text{atan}(x)$	`$atan(0.0)` => `0.0`
`$atan2<T: Float>(y: T, x: T) -> T`	2-argument arc tangent function $ \text{atan}(y, x) $	`$atan2(0.0, 1.0)` => `0.0`
`$ceil<T: Number>(x: T) -> T`	Round up to closest integer $ \lceil x \rceil $	`$ceil(0.5)` => `1.0`
`$cos<T: Float>(x: T) -> T`	Cosine function $\text{cos}(x)$	`$cos(0.0)` => `1.0`
`$datetime_day(d: DateTime) -> u32`	Get the day component of a `DateTime`, starting from 1	`$datetime_day(t"2023-01-01")` => `1`
`$datetime_month(d: DateTime) -> u32`	Get the month component of a `DateTime`, starting from 1	`$datetime_month(t"2023-01-01")` => `1`
`$datetime_month0(d: DateTime) -> u32`	Get the month component of a `DateTime`, starting from 0	`$datetime_month0(t"2023-01-01")` => `0`
`$datetime_year(d: DateTime) -> i32`	Get the year component of a `DateTime`	`$datetime_month0(t"2023-01-01")` => `2023`
`$dot(a: Tensor, b: Tensor) -> Tensor`	Dot product of two tensors $a \cdot b$; only available when compiled with `torch-tensor`
`$exp<T: Float>(x: T) -> T`	Exponential function $e^x$	`$exp(0.0)` => `1.0`
`$exp2<T: Float>(x: T) -> T`	Exponential function $2^x$ (base 2)	`$exp2(2.0)` => `4.0`
`$floor<T: Number>(x: T) -> T`	Round down to closest integer $\lfloor x \rfloor$	`$exp2(2)` => `4`
`$format(String, Any...) -> String`	Formatting string	`$format("{} + {}", 3, "a")` => `"3 + a"`
`$hash(Any...) -> u64`	Hash the given values	`$hash("a", 3, 5.5)` => `5862532063111067262`
`$log<T: Float>(x: T) -> T`	Natural logarithm function $\text{log}_e(x)$	`$log(1.0)` => `0.0`
`$log2<T: Float>(x: T) -> T`	Logarithm function $\text{log}_2(x)$ (base 2)	`$log2(4.0)` => `2.0`
`$max<T: Number>(T...) -> T`	Maximum $\text{max}(x_1, x_2, \dots)$	`$max(4.0, 1.0, 9.5)` => `9.5`
`$min<T: Number>(T...) -> T`	Minimum $\text{min}(x_1, x_2, \dots)$	`$max(4.0, 1.0, 9.5)` => `1.0`
`$pow<T: Integer>(x: T, y: u32) -> T`	Integer power function $x^y$	`$pow(2.2, 2)` => `4.84`
`$powf<T: Float>(x: T, y: T) -> T`	Float power function $x^y$	`$powf(4.0, 0.5)` => `2.0`
`$sign<T: Number>(x: T) -> T`	Sign function that returns ${-1, 0, 1}$ in respective types	`$sign(-3.0)` => `-1.0`
`$sin<T: Float>(x: T) -> T`	Sine function $\text{sin}(x)$	`$sin(0.0)` => `0.0`
`$string_char_at(s: String, i: usize) -> char`	Get the `i`-th character of string `s`	`$string_char_at("hello", 2)` => `'l'`
`$string_concat(String...) -> String`	Concatenate multiple strings	`$string_concat("hello", " ", "world")` => `"hello world"`
`$string_index_of(s: String, pat: String) -> usize`	Find the index of the first occurrence of the pattern `pat` in string `s`	`$string_index_of("hello world", "world")` => `6`
`$string_length(s: String) -> usize`	Get the length of the string	`$string_length("hello")` => `5`
`$string_lower(s: String) -> String`	To lower-case	`$string_lower("LisA")` => `"lisa"`
`$string_trim(s: String) -> String`	Trim a string	`$string_trim(" hello ")` => `"hello"`
`$string_upper(s: String) -> String`	To upper-case	`$string_upper("LisA")` => `"LISA"`
`$substring(s: String, b: usize, e: usize?) -> String`	Find the substring given begin index and optional the end index	`$substring("hello world", 6)` => `"world"`
`$tan<T: Number>(x: T) -> T`	Tangent function $\text{tan}(x)$	`$tan(0.0)` => `0.0`

Foreign Predicates

Foreign predicates aim to provide programmers with extra capabilities with relational predicates. Traditional Datalog program defines relational predicate using only horn-rules. Given the assumption that the input database is finite, these derived relational predicates will also be finite. However, there are many relational predicates that are infinite and could not be easily expressed by horn-rules. One such example is the range relation. Suppose it is defined as range(begin, end, i) where i could be between begin (inclusive) and end (exclusive). There could be infinitely many triplets, and we cannot simply enumerate all of them. But if the first two arguments begin and end are given, we can reasonably enumerate the i.

In Scallop, range is available to be used as a foreign predicate. Notice that range can be applied on any integer data, making it a generic predicate. For example, to use range on i32 data, we will need to invoke range<i32>:

rel result(x) = range<i32>(0, 5, x)

Here we enumerate the value of x from 0 (inclusive) to 5 (exclusive), meaning that we will obtain that result = {0, 1, 2, 3, 4}. For the rest of this section, we describe in detail how foreign predicates are constructed in Scallop and why are they useful.

Foreign Predicate Types

Foreign predicates can be generic and are statically typed. In addition to just providing the argument types, we also need to provide a boundness pattern.

A boundness pattern is a string of length equal to the relation arity and consisting of b and f. The character b means bounded, reflecting the variable on that position is taken as input to the predicate. The character f means free, suggesting that the variable on that position will be generated as output by the predicate.

For example, the full definition of range is

range<T: Integer>(begin: T, end: T, i: T)[bbf]

Notice that at the end of the definition we have [bbf]. Here, bbf is a boundness pattern for range, suggesting that begin and end will be provided as input, and i will be generated as output.

In the future, we plan to allow the definition of multiple boundness patterns

Standard Library of Foreign Predicates (Part A)

In this part, we give an overview to the foreign predicates that are discrete only.

Foreign Predicate	Description
`datetime_ymd(d: DateTime, y: i32, m: u32, d: u32)[bfff]`	Get the `y`ear, `m`onth, and `day` from a `DateTime` value
`range<T: Integer>(begin: T, end: T, i: T)[bbf]`	Generate all the integers `i` starting from `begin` and end with `end - 1`
`string_chars(s: String, i: usize, c: char)[bff]`	Generate all the index-character tuples inside of string `s`
`string_find(s: String, pat: String, begin: usize, end: usize)[bbff]`	Generate all the begin-end ranges of the pattern `pat`'s occurrence in the string `s`
`string_split(s: String, pat: String, out: String)[bbf]`	Split the string `s` using the pattern `pat` and generate the `out` strings

Reference Guide

We list all the language features supported by Scallop.

Import Files

import "path/to/other/file.scl"

Type Definition

Type Alias Definition

type ObjectId = usize

Sub-Type Definition

type Name <: String

Enum Type Definition

type Action = LEFT | RIGHT | UP | DOWN

Algebraic Data Type Definition

type Expr = Const(i32) | Add(Expr, Expr) | Sub(Expr, Expr)

Relation Type Definition

type edge(x: i32, y: i32)

Constant Definition

const PI: f32 = 3.1415

Relation Definition

Fact Definition

rel edge(1, 2)

Set-of-Tuples Definition

rel edge = {(1, 2), (2, 3), (3, 4)}

Rule Definition

rel path(a, b) = edge(a, b) or path(a, c) and edge(c, b)

Disjunctive Head

rel { assign(v, false); assign(v, true) } = variable(v)

Atom

fib(x - 1, y)

Negation

rel has_no_child(p) = person(p) and not father(p, _) and not mother(p, _)

Constraint

rel number(0)
rel number(i + 1) = number(i) and i < 10

Aggregation

rel person = {"alice", "bob", "christine"}
rel num_people(n) = n := count(p: person(p))

Foreign Predicate

rel grid(x, y) = range<i32>(0, 5, x) and range<i32>(0, 5, y)

Query Definition

query path

Scallop and Probabilistic Programming

One fundamental concept in machine learning is probability. Scallop, being a neurosymbolic programming language, supports probability and probabilistic programming natively. For example, one can write the following program:

type Action = UP | DOWN | LEFT | RIGHT
rel predicted_action = {0.05::UP, 0.09::DOWN, 0.82::LEFT, 0.04::RIGHT}

where the predicted_action relation encodes a distribution of actions and their probabilities. In particular, the UP action is predicted to have a $0.05$ probability. Here, the :: symbol is used to suggest that probabilities (such as 0.05) are used to tag the facts (such as UP).

Since we can define probability on user declared facts, the derivation of new facts will be associated with probabilities too. This means that Scallop is doing probabilistic reasoning. The whole probabilistic reasoning semantics of Scallop is defined with the theory of provenance semiring. In this chapter, we give detailed explanation to the probabilities appeared in Scallop.

Tags and Provenance

Scallop's probabilistic semantics is realized by the Provenance Semiring framework. Inside of this framework, each fact can be tagged by an extra piece of information, which we call tag. Such information is propagated throughout the execution of Scallop program according to the provenance, which is the mathematical object defining how tags propagate.

Motivating Probabilistic Example

The following example shows a fact earthquake() being tagged by a probability 0.03 (earthquake could happen with a 0.03 probability):

rel 0.03::earthquake()

Concretely, we have an (external) tag space of $[0, 1]$, which contains real numbers between 0 and 1, which is the space of probabilities. Similarly, we define another tagged fact burglary():

rel 0.20::burglary()

We can declare a rule saying that, "when earthquake or burglary happens, an alarm will go off".

rel alarm() = earthquake() or burglary()
query alarm

Remember that the facts earthquake() and burglary() are probabilistic. Intuitively, the derived fact alarm() will also be associated with a derived probability. Based on probability theory, we have

\[ \begin{align} \Pr(\text{alarm}) &= \Pr(\text{earthquake} \vee \text{burglary}) \\ &= 1 - \Pr(\neg \text{earthquake} \wedge \neg \text{burglary}) \\ &= 1 - \Pr(\neg \text{earthquake}) \cdot \Pr(\neg \text{burglary}) \\ &= 1 - (1 - \Pr(\text{earthquake})) \cdot (1 - \Pr(\text{burglary})) \\ &= 1 - (1 - 0.03) (1 - 0.20) \\ &= 1 - 0.97 \times 0.8 \\ &= 0.224 \end{align} \]

This is indeed what we get if we use the topkproofs provenance (which we discuss later in the chapter) with the scli Scallop interpreter:

> scli alarm.scl
alarm: {0.224::()}

Proofs Provenance

Fact with Probability

Logic and Probability

Aggregation with Probability

With the introduction of probabilities, many existing aggregators are augmented with new semantics, which we typically call multi-world semantics. What's more, there are new aggregators, such as softmax, rank, and weighted_avg, that make use of the probabilities. We introduce these aggregators one-by-one in this section.

Multi-world Semantics with Aggregators

Let us take the count aggregator as an example. Consider we have 2 objects, each could be big or small with their respective probabilities:

type OBJ = OBJ_A | OBJ_B
rel size = {0.8::(OBJ_A, "big"); 0.2::(OBJ_A, "small")} // obj A is very likely big
rel size = {0.1::(OBJ_B, "big"); 0.9::(OBJ_B, "small")} // obj B is very likely small

Now let's say we want to count how many big objects are there, by using the following

Note that even when using probabilites, one can opt to not use the multi-world semantics by adding ! sign to the end of the aggregator.

New Aggregators using Probabilities

Softmax and Normalize

Rank

Weighted Average and Weighted Sum

Sampling with Probability

In Scallop, samplers share the same syntax as aggregators. They usually work with probabilistic provenances, but can also work without them. Here are some example samplers:

top: get the $k$ facts with top probabilities
categorical: treat the relation as a categorical distribution and sample from it
uniform: treat the relation as a uniform distribution and sample from it

Let's take top as an example. We can obtain the top ranked symbol by using the following rule:

rel symbols = {0.9::"+", 0.05::"-", 0.02::"3"}
rel top_symbol(s) = s := top<1>(s: symbols(s)) // 0.9::top_symbol("+")

The Scallop Python Binding `scallopy`

scallopy is the Python binding for Scallop, offering an interface for computationg with Scallop in Python. In addition, it can be integrated with PyTorch, allowing users to write Neuro-Symbolic applications that can be connected to PyTorch. In this section, we elaborate on how to install, configure, and use the scallopy library.

For an example, please look at Getting Started. To start reading the documentation, proceed to Scallopy Context

Installation

TODO: Installation with venv

TODO: Installation with Conda

Getting Started with Scallopy

Motivating Example

Let's start with a very simple example illustrating the usage of scallopy.

import scallopy

ctx = scallopy.Context()

ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(1, 2), (2, 3)])

ctx.add_rule("path(a, c) = edge(a, c) or path(a, b) and edge(b, c)")

ctx.run()

print(list(ctx.relation("path"))) # [(1, 2), (1, 3), (2, 3)]

In this very simple edge-path example, we are interacting with Scallop through a Python class called Context. Basically, a Context manages a Scallop program, along with the relations, facts, and execution results corresponding to that Scallop program. We create a Context by ctx = scallopy.Context. Relations, facts, and rules are added through the functions add_relation(...), add_facts(...), and add_rule(...). With everything set, we can execute the program inside the context by calling run() Lastly, we pull the result from ctx by using relation(...). Please refer to a more detailed explanation of this example in Scallop Context.

Machine Learning with Scallopy and PyTorch

When doing machine learning, we usually want to have batched inputs and outputs. Instead of building the Scallop context incrementally and explicitly run the program, we can create a Module at once and be able to run the program for a batch of inputs. This offers a few advantages, such as optimization during compilation, batched execution for integration with machine learning pipelines, simplified interaction between data structures, and so on. For example, we can create a module and run it like the following:

import scallopy
import torch

# Creating a module for execution
my_sum2 = scallopy.Module(
  program="""
    type digit_1(a: i32), digit_2(b: i32)
    rel sum_2(a + b) = digit_1(a) and digit_2(b)
  """,
  input_mappings={"digit_1": range(10), "digit_2": range(10)},
  output_mappings={"sum_2": range(19)},
  provenance="difftopkproofs")

# Invoking the module with torch tensors. `result` is a tensor of 16 x 19
result = my_sum2(
  digit_1=torch.softmax(torch.randn(16, 10), dim=0),
  digit_2=torch.softmax(torch.randn(16, 10), dim=0))

As can be seen in this example, we have defined a Module which can be treated also as a PyTorch module. Similar to other PyTorch modules, it can take in torch tensors and return torch tensors. The logical symbols (such as the i32 numbers used in digit_1 and digit_2) are configured in input_mappings and output_mappings, and can be automatically converted from tensors. We also see that it is capable of handling a batch of inputs (here, the batch size is 16). Internally, Scallop also knows to execute in parallel, making it performing much faster than normal. Please refer to Scallop Module for more information.

Scallop Context

The most fundamental point of interaction of scallopy is ScallopContext. The following is a very simple example setting up a ScallopContext to compute the edge-path program:

import scallopy

# Creating a new context
ctx = scallopy.ScallopContext()

# Add relation of `edge`
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(0, 1), (1, 2)])

# Add rule of `path`
ctx.add_rule("path(a, c) = edge(a, c) or path(a, b) and edge(b, c)")

# Run!
ctx.run()

# Check the result!
print(list(ctx.relation("path"))) # [(0, 1), (0, 2), (1, 2)]

Roughly, the program above can be divided into three phases:

Setup the context: this involves defining relations, adding facts to relations, and adding rules that do the computation
Running the program inside of context
Fetch the results

While the 2nd and 3rd steps are the place where the computation really happens, it's more important for the programmers to correctly setup the full context for computation. We now elaborate on what are the high-level things to do when setting up the context

Configurations

When creating a new ScallopContext, one should configure it with intended provenance. If no argument is supplied, as shown in the above example, the context will be initialized with the default provenance, unit, which resembles untagged semantics (a.k.a. discrete Datalog). To explicitly specify this, you can do

ctx = scallopy.ScallopContext(provenance="unit")

Of course, Scallop can be used to perform reasoning on probabilistic and differentiable inputs. For instance, you can write the following

ctx = scallopy.ScallopContext(provenance="minmaxprob") # Probabilistic
# or
ctx = scallopy.ScallopContext(provenance="diffminmaxprob") # Differentiable

For more information on possible provenance information, please refer to the provenance section. It it worth noting that some provenance, such as topkproofs, accept additional parameters such as k. In this case, you can supply this as additional arguments when creating the context:

ctx = scallopy.ScallopContext(provenance="topkproofs", k=5) # top-k-proofs provenance with k = 5

Adding Program

Given that a context has been configured and initialized, we can set it up the quickest by loading a program into the context. One can either load an external .scl file, or directly inserting a program written as Python string. To directly add a full program string to the context, one can do

ctx.add_program("""
  rel edge = {(0, 1), (1, 2)}
  rel path(a, c) = edge(a, c) or path(a, b) and edge(b, c)
""")

On the other hand, assuming that there is a file edge_path.scl that contains the same content as the above string, one can do

ctx.import_file("edge_path.scl")

Adding Relations

Instead of adding program as a whole, one can also add relations one-at-a-time. When adding new relations, one would need to supply the name as well as the type of the relation. For example, the edge relation can be defined as follows

ctx.add_relation("edge", (int, int))

Here, we are saying that edge is an arity-2 relation storing pairs of integers. Note that we are specifying the type using Python's int type. This is equivalent to the i32 type inside Scallop. Therefore, the above instruction tranlates to the following Scallop code:

rel edge(i32, i32)

Many existing Python types can directly translate to Scallop type. In particular, we have the mapping listed as follows:

Python Type	Scallop Type
`int`	`i32`
`bool`	`bool`
`float`	`f32`
`str`	`String`

In case one want to use types other than the listed ones (e.g., usize), they can be accessed directly using the string "usize", or they can be accessed through predefined types such as scallopy.usize. The example below defines a relation of type (usize, f64, i32):

ctx.add_relation("my_relation", (scallopy.usize, "f64", int))

Specifically for arity-1 relations, users don't need to use a tuple to specify the type. For instance,

ctx.add_relation("digit", int)

Adding Facts

The most basic version of adding facts into an existing relation inside of an existing context. We are assuming that the context has a provenance of "unit".

ctx.add_facts("edge", [(1, 2), (2, 3)])

If the relation is declared to be having arity-1 and that the type is a singleton type instead of a 1-tuple, then the facts inside of the list do not need to be a tuple.

ctx.add_relation("digit", int)
ctx.add_facts("digit", [1, 2, 3])

Probabilistic Facts (Tagged Facts)

When the Scallop context is configured to use a provenance other than. If one wants to add facts along with probabilities, they can wrap their non-probabilistic facts into tuples whose first element is a simple probability. For example, if originally we have a fact 1, wrapping it with a corresponding probability gives us (0.1, 1), where 0.1 is the probability.

ctx.add_facts("digit", [1, 2, 3])                      # without probability
ctx.add_facts("digit", [(0.1, 1), (0.2, 2), (0.7, 3)]) # with probability

Of course, if the original facts are tuples, the ones with probability will be required to wrap further:

ctx.add_facts("color", [("A", "blue"), ("A", "green"), ...])               # without probability
ctx.add_facts("color", [(0.1, ("A", "blue")), (0.2, ("A", "green")), ...]) # with probability

We can extend this syntax into tagged facts in general. Suppose we are using the boolean semiring (boolean), we are going to tag each fact using values such as True or False.

ctx = scallopy.Context(provenance="boolean")
ctx.add_relation("edge", (int, int))
ctx.add_facts("edge", [(True, (1, 2)), (False, (2, 3))])

Non-tagged Facts in Tagged Context

Adding Rules

Tagged Rules

Running

Additional Features

There are more features provided by the ScallopContext interface. We hereby list them for reference.

Cloning

One can copy a context to create a new context. The resulting context will contain all the program, configurations, and provenance information.

new_ctx = ctx.clone()

The cloning feature relates to pseudo-incremental computation and branching computation. We elaborate on this in the Branching Computation section.

Compiling

Iteration Count Limit

One can configure the

Early Discarding

Obtaining Context Information

Foreign Functions and Predicates

Saving and Loading

Please refer to the Saving and Loading section for more information.

Branching Executions

One cool feature that scallopy supports is branching execution. People can create a context, clone it to form new contexts, and modify the new context without touching the old ones. This is particularly useful when incremental computation is desired.

# Create the first version of the context
ctx = scallopy.ScallopContext()
ctx.add_relation(...)
ctx.add_facts(...)

# Branch it into another context
ctx1 = ctx.clone()
ctx1.add_relation(...)
ctx1.add_facts(...)
ctx1.run() # Running the first context

# Branch it into one more context; `ctx1` and `ctx2` are completely disjoint
ctx2 = ctx.clone()
ctx2.add_relation(...)
ctx2.add_facts(...)
ctx2.run() # Running the second context

Configuring Provenance

Foreign Functions

While there are existing foreign functions such as $hash and $abs, people sometimes want more functions to be included for specialized computation. scallopy provides such interface and allows user to define foreign functions in Python. Here is an example defining a custom $sum function in Python which is later used in Scallop:

# Create a new foreign function by annotating an existing function with `@scallopy.foreign_function`
# Note that this function has variable arguments!
@scallopy.foreign_function
def my_sum(*args: int) -> int:
  s = 0
  for x in args:
    s += x
  return s

# Create a context
ctx = scallopy.ScallopContext()

# Register the declared foreign function (`my_sum`)
# Note that the function needs to be registered before it is used
ctx.register_foreign_function(my_sum)

# Add some relations
ctx.add_relation("I", (int, int))
ctx.add_facts("I", [(1, 2), (2, 3), (3, 4)])

# Add a rule which uses the registered function!
ctx.add_rule("R($my_sum(a, b)) = I(a, b)")

# Run the context
ctx.run()

# See the result, should be [(3,), (5,), (7,)]
print(list(ctx.relation("R")))

Now we elaborate on how we define new foreign functions in Python.

Function Signature

The annotator @scallopy.foreign_function performs analysis of the annotated Python function and makes sure that it is accepted as a Scallop foreign function. We require that types are annotated on all arguments and the return value. For simplicity, Python types such as int, bool, and str are mapped to Scallop types (and type families) as following:

Python type	Scallop type	Scallop base types
`int`	`Integer` family	`i8`, `i16`, ..., `u8`, `u16`, ..., `usize`
`float`	`Float` family	`f32`, `f64`
`bool`	`bool`	`bool`
`str`	`String`	`String`

If one desires to use a more fine-grained type

Argument Types

Optional Arguments

Variable Arguments

Error Handling

Debugging Proofs

We offer capability in Scallop to debug the internal proofs generated by provenance. This can be done through using special provenances specifically designed for debugging.

Debugging Top-K Proofs

We can use the provenance difftopkproofsdebug. Take the sum_2 application as an example, we have

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("digit_a", int)
ctx.add_relation("digit_b", int)
ctx.add_rule("sum_2(a + b) = digit_a(a) and digit_b(b)")

# !!! SPECIAL TREATMENT WHEN INSERTING FACTS !!!
ctx.add_facts("digit_a", [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))])
ctx.add_facts("digit_b", [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))])

ctx.run()
result = ctx.relation("sum_2")

A majority of the code will look identical to the original example, but special treatment is required when adding new facts to the context. Originally, tags are just torch.tensor(SOME_PROBABILITY). But now, we need to supply an extra positive integer which we call Fact ID. Each of the fact added should be the following:

ctx.add_facts("SOME_RELATION", [
  ( ( torch.tensor(0.1),       1       ) , (    0,   ) ) # Fact 1
  #   -- PROBABILITY --  -- FACT ID --
  # --------------- TAG --------------     -- TUPLE --
  ( ( torch.tensor(0.9),       2       ) , (    1,   ) ) # Fact 2
  ( ( torch.tensor(0.1),       3       ) , (    0,   ) ) # Fact 2
  # ...
])

Basically, fact IDs are labels that the programmers can add to instrument the computation process. All IDs should start from 1, and should be distinct and contiguous for all added facts. The programmers will need to be responsible for managing the IDs so that there is no collision. Let's say 10 facts are added with probability, then there should be 10 fact IDs ranging from 1 to 10, inclusive. Of course, for non-probabilistic facts, the whole tag can be specified as None.

Debug in Forward Mode

When used in forward mode, one should do the following. The forward module should be setup just like before, with the only change being the provenance configuration:

sum_2 = scallopy.Module(
  provenance="difftopkproofsdebug",
  k=3,
  program="""
    type digit_a(a: i32), digit_b(b: i32)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  output_relation="sum_2",
  output_mapping=[2, 3, 4])

Let's assume that only 1 and 2 are added for each digit in the sum_2 task. That is, we have digit A and digit B. Digit A can be 1 or 2, and the digit B can be 1 or 2 as well. Looking at the following code, we have the fact IDs being

Digit A is 1: Fact ID 1
Digit A is 2: Fact ID 2
Digit B is 1: Fact ID 3
Digit B is 2: Fact ID 4

Notice that the fact IDs all start from 1 and are contiguous (i.e., no gap in the used fact IDs). Also notice that, when in forward mode, the inputs need to be batched. In this example, let's only focus on one single data-point, say Datapoint 1.

digit_a = [
  [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))], # Datapoint 1
  # ...
]
digit_b = [
  [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))], # Datapoint 1
  # ...
]

After preparing the input facts, we can invoke the debug module as the following

(result_tensor, proofs) = sum_2(digit_a=digit_a, digit_b=digit_b)

Here, the output will be a tuple (result_tensor, proofs), unlike the non-debug version where only one single PyTorch tensor is returned. Specifically, the two components will be

result_tensor = torch.tensor([
  [0.09, 0.8119, 0.09], # Datapoint 1
  # ...
])

proofs = [
  [ # Datapoint 1
    [ # Proofs of (2,)
      [ (True, 1), (True, 3) ], # The first proof is 1 + 1 (using positive fact 1 and 3)
    ],
    [ # Proofs of (3,)
      [ (True, 1), (True, 4) ], # The first proof is 1 + 2 (using positive fact 1 and 4)
      [ (True, 2), (True, 3) ], # The second proof is 2 + 1 (using positive fact 2 and 3)
    ],
    [ # Proofs of (4,)
      [ (True, 2), (True, 4) ], # The first proof is 2 + 2 (using positive fact 2 and 4)
    ]
  ],
  # ...
]

Notice that result_tensor is just the original output probability tensor. The proofs will be List[List[List[List[Tuple[bool, int]]]]]. In particular, it is Batch -> Datapoint Results -> Proofs -> Proof -> Literal. Where each Literal is Tuple[bool, int] where the boolean indicates the positivity of the literal and the integer indicates the used fact ID.

Scallop CLI

Scallop Interpreter

Scallop REPL

Scallop Compiler

Full Scallop Grammar

SCALLOP_PROGRAM ::= ITEM*

ITEM ::= TYPE_DECL
       | RELATION_DECL
       | CONST_DECL
       | QUERY_DECL

TYPE ::= u8 | u16 | u32 | u64 | u128 | usize
       | i8 | i16 | i32 | i64 | i128 | isize
       | f32 | f64 | char | bool
       | String
       | CUSTOM_TYPE_NAME

TYPE_DECL ::= type CUSTOM_TYPE_NAME = TYPE
            | type CUSTOM_TYPE_NAME <: TYPE
            | type ENUM_TYPE_NAME = VARIANT1 [= VAL1] | VARIANT2 [= VAL2] | ...
            | type ADT_TYPE_NAME = CONSTRUCTOR1(TYPE*) | CONSTRUCTOR2(TYPE*) | ...
            | type RELATION_NAME(TYPE*)
            | type RELATION_NAME(VAR1: TYPE1, VAR2: TYPE2, ...)
            | type $FUNCTION_NAME(VAR1: TYPE1, VAR2: TYPE2, ...) -> TYPE_RET

CONST_DECL ::= const CONSTANT_NAME : TYPE = CONSTANT
             | const CONSTANT_NAME1 [: TYPE1] = CONSTANT1, CONSTANT_NAME2 [: TYPE2] = CONSTANT2, ...

RELATION_DECL ::= FACT_DECL
                | FACTS_SET_DECL
                | RULE_DECL

CONSTANT ::= true | false | NUMBER_LITERAL | STRING_LITERAL

CONST_TUPLE ::= CONSTANT | (CONSTANT1, CONSTANT2, ...)

FOREIGN_FN ::= hash | string_length | string_concat | substring | abs

BIN_OP ::= + | - | * | / | % | == | != | <= | < | >= | > | && | || | ^

UNARY_OP ::= ! | -

CONST_EXPR ::= CONSTANT
             | CONST_EXPR BIN_OP CONST_EXPR | UNARY_OP CONST_EXPR
             | $ FOREIGN_FN(CONST_EXPR*)
             | if CONST_EXPR then CONST_EXPR else CONST_EXPR
             | ( CONST_EXPR )

TAG ::= true | false | NUMBER_LITERAL  // true/false is for boolean tags; NUMBER_LITERAL is used for probabilities

FACT_DECL ::= rel RELATION_NAME(CONST_EXPR*)         // Untagged fact
            | rel TAG :: RELATION_NAME(CONST_EXPR*)  // Tagged fact

FACTS_SET_DECL ::= rel RELATION_NAME = {CONST_TUPLE1, CONST_TUPLE2, ...}                  // Untagged tuples
                 | rel RELATION_NAME = {TAG1 :: CONST_TUPLE1, TAG2 :: CONST_TUPLE2, ...}  // Tagged tuples
                 | rel RELATION_NAME = {TAG1 :: CONST_TUPLE1; TAG2 :: CONST_TUPLE2; ...}  // Tagged tuples forming annotated disjunction

EXPR ::= VARIABLE
       | CONSTANT
       | EXPR BIN_OP EXPR | UNARY_OP EXPR
       | new CONSTRUCTOR(EXPR*)
       | $ FOREIGN_FN(EXPR*)
       | if EXPR then EXPR else EXPR
       | ( EXPR )

ATOM ::= RELATION_NAME(EXPR*)

RULE_DECL ::= rel ATOM :- FORMULA | rel ATOM = FORMULA                // Normal rule
            | rel TAG :: ATOM :- FORMULA | rel TAG :: ATOM = FORMULA  // Tagged rule

FORMULA ::= ATOM
          | not ATOM | ~ ATOM                                                   // negation
          | FORMULA1, FORMULA2, ... | FORMULA and FORMULA | FORMULA /\ FORMULA  // conjunction
          | FORMULA or FORMULA | FORMULA \/ FORMULA                             // disjunction
          | FORMULA implies FORMULA | FORMULA => FORMULA                        // implies
          | case VARIABLE is ENTITY
          | CONSTRAINT
          | AGGREGATION
          | ( FORMULA )

ENTITY ::= CONSTRUCTOR(ENTITY*)

CONSTRAINT ::= EXPR // When expression returns a boolean value

AGGREGATOR ::= count | sum | prod | min | max | exists | forall | unique

AGGREGATION ::= VAR* = AGGREGATOR(VAR* : FORMULA)                             // Normal aggregation
              | VAR* = AGGREGATOR(VAR* : FORMULA where VAR* : FORMULA)        // Aggregation with group-by condition
              | VAR* = AGGREGATOR[VAR*](VAR* : FORMULA)                       // Aggregation with arg (only applied to AGGREGATOR = min or max)
              | VAR* = AGGREGATOR[VAR*](VAR* : FORMULA where VAR* : FORMULA)  // Aggregation with arg and group-by condition (only applied to AGGREGATOR = min or max)
              | forall(VAR* : FORMULA)
              | exists(VAR* : FORMULA)

QUERY_DECL ::= query RELATION_NAME
             | query ATOM

Scallop Book