Debugging Proofs
We offer capability in Scallop to debug the internal proofs generated by provenance. This can be done through using special provenances specifically designed for debugging.
Debugging Top-K Proofs
We can use the provenance difftopkproofsdebug
.
Take the sum_2
application as an example, we have
ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("digit_a", int)
ctx.add_relation("digit_b", int)
ctx.add_rule("sum_2(a + b) = digit_a(a) and digit_b(b)")
# !!! SPECIAL TREATMENT WHEN INSERTING FACTS !!!
ctx.add_facts("digit_a", [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))])
ctx.add_facts("digit_b", [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))])
ctx.run()
result = ctx.relation("sum_2")
A majority of the code will look identical to the original example,
but special treatment is required when adding new facts to the context.
Originally, tags are just torch.tensor(SOME_PROBABILITY)
.
But now, we need to supply an extra positive integer which we call Fact ID.
Each of the fact added should be the following:
ctx.add_facts("SOME_RELATION", [
( ( torch.tensor(0.1), 1 ) , ( 0, ) ) # Fact 1
# -- PROBABILITY -- -- FACT ID --
# --------------- TAG -------------- -- TUPLE --
( ( torch.tensor(0.9), 2 ) , ( 1, ) ) # Fact 2
( ( torch.tensor(0.1), 3 ) , ( 0, ) ) # Fact 2
# ...
])
Basically, fact IDs are labels that the programmers can add to instrument the computation process.
All IDs should start from 1, and should be distinct and contiguous for all added facts.
The programmers will need to be responsible for managing the IDs so that there is no collision.
Let's say 10 facts are added with probability, then there should be 10 fact IDs ranging from
1 to 10, inclusive.
Of course, for non-probabilistic facts, the whole tag can be specified as None
.
Debug in Forward Mode
When used in forward
mode, one should do the following.
The forward module should be setup just like before, with the only change being the provenance configuration:
sum_2 = scallopy.Module(
provenance="difftopkproofsdebug",
k=3,
program="""
type digit_a(a: i32), digit_b(b: i32)
rel sum_2(a + b) = digit_a(a) and digit_b(b)
""",
output_relation="sum_2",
output_mapping=[2, 3, 4])
Let's assume that only 1 and 2 are added for each digit in the sum_2
task.
That is, we have digit A and digit B.
Digit A can be 1 or 2, and the digit B can be 1 or 2 as well.
Looking at the following code, we have the fact IDs being
- Digit
A
is 1: Fact ID 1 - Digit
A
is 2: Fact ID 2 - Digit
B
is 1: Fact ID 3 - Digit
B
is 2: Fact ID 4
Notice that the fact IDs all start from 1 and are contiguous (i.e., no gap in the used fact IDs).
Also notice that, when in forward mode, the inputs need to be batched.
In this example, let's only focus on one single data-point, say Datapoint 1
.
digit_a = [
[((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))], # Datapoint 1
# ...
]
digit_b = [
[((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))], # Datapoint 1
# ...
]
After preparing the input facts, we can invoke the debug module as the following
(result_tensor, proofs) = sum_2(digit_a=digit_a, digit_b=digit_b)
Here, the output will be a tuple (result_tensor, proofs)
, unlike the non-debug version where only one single PyTorch tensor is returned.
Specifically, the two components will be
result_tensor = torch.tensor([
[0.09, 0.8119, 0.09], # Datapoint 1
# ...
])
proofs = [
[ # Datapoint 1
[ # Proofs of (2,)
[ (True, 1), (True, 3) ], # The first proof is 1 + 1 (using positive fact 1 and 3)
],
[ # Proofs of (3,)
[ (True, 1), (True, 4) ], # The first proof is 1 + 2 (using positive fact 1 and 4)
[ (True, 2), (True, 3) ], # The second proof is 2 + 1 (using positive fact 2 and 3)
],
[ # Proofs of (4,)
[ (True, 2), (True, 4) ], # The first proof is 2 + 2 (using positive fact 2 and 4)
]
],
# ...
]
Notice that result_tensor
is just the original output probability tensor.
The proofs
will be List[List[List[List[Tuple[bool, int]]]]]
.
In particular, it is Batch -> Datapoint Results -> Proofs -> Proof -> Literal
.
Where each Literal
is Tuple[bool, int]
where the boolean indicates the positivity of the literal and the integer indicates the used fact ID.