Debugging Proofs

We offer capability in Scallop to debug the internal proofs generated by provenance. This can be done through using special provenances specifically designed for debugging.

Debugging Top-K Proofs

We can use the provenance difftopkproofsdebug. Take the sum_2 application as an example, we have

ctx = scallopy.ScallopContext(provenance="difftopkproofsdebug", k=3)
ctx.add_relation("digit_a", int)
ctx.add_relation("digit_b", int)
ctx.add_rule("sum_2(a + b) = digit_a(a) and digit_b(b)")

# !!! SPECIAL TREATMENT WHEN INSERTING FACTS !!!
ctx.add_facts("digit_a", [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))])
ctx.add_facts("digit_b", [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))])

ctx.run()
result = ctx.relation("sum_2")

A majority of the code will look identical to the original example, but special treatment is required when adding new facts to the context. Originally, tags are just torch.tensor(SOME_PROBABILITY). But now, we need to supply an extra positive integer which we call Fact ID. Each of the fact added should be the following:

ctx.add_facts("SOME_RELATION", [
  ( ( torch.tensor(0.1),       1       ) , (    0,   ) ) # Fact 1
  #   -- PROBABILITY --  -- FACT ID --
  # --------------- TAG --------------     -- TUPLE --
  ( ( torch.tensor(0.9),       2       ) , (    1,   ) ) # Fact 2
  ( ( torch.tensor(0.1),       3       ) , (    0,   ) ) # Fact 2
  # ...
])

Basically, fact IDs are labels that the programmers can add to instrument the computation process. All IDs should start from 1, and should be distinct and contiguous for all added facts. The programmers will need to be responsible for managing the IDs so that there is no collision. Let's say 10 facts are added with probability, then there should be 10 fact IDs ranging from 1 to 10, inclusive. Of course, for non-probabilistic facts, the whole tag can be specified as None.

Debug in Forward Mode

When used in forward mode, one should do the following. The forward module should be setup just like before, with the only change being the provenance configuration:

sum_2 = scallopy.Module(
  provenance="difftopkproofsdebug",
  k=3,
  program="""
    type digit_a(a: i32), digit_b(b: i32)
    rel sum_2(a + b) = digit_a(a) and digit_b(b)
  """,
  output_relation="sum_2",
  output_mapping=[2, 3, 4])

Let's assume that only 1 and 2 are added for each digit in the sum_2 task. That is, we have digit A and digit B. Digit A can be 1 or 2, and the digit B can be 1 or 2 as well. Looking at the following code, we have the fact IDs being

  • Digit A is 1: Fact ID 1
  • Digit A is 2: Fact ID 2
  • Digit B is 1: Fact ID 3
  • Digit B is 2: Fact ID 4

Notice that the fact IDs all start from 1 and are contiguous (i.e., no gap in the used fact IDs). Also notice that, when in forward mode, the inputs need to be batched. In this example, let's only focus on one single data-point, say Datapoint 1.

digit_a = [
  [((torch.tensor(0.1), 1), (1,)), ((torch.tensor(0.9), 2), (2,))], # Datapoint 1
  # ...
]
digit_b = [
  [((torch.tensor(0.9), 3), (1,)), ((torch.tensor(0.1), 4), (2,))], # Datapoint 1
  # ...
]

After preparing the input facts, we can invoke the debug module as the following

(result_tensor, proofs) = sum_2(digit_a=digit_a, digit_b=digit_b)

Here, the output will be a tuple (result_tensor, proofs), unlike the non-debug version where only one single PyTorch tensor is returned. Specifically, the two components will be

result_tensor = torch.tensor([
  [0.09, 0.8119, 0.09], # Datapoint 1
  # ...
])

proofs = [
  [ # Datapoint 1
    [ # Proofs of (2,)
      [ (True, 1), (True, 3) ], # The first proof is 1 + 1 (using positive fact 1 and 3)
    ],
    [ # Proofs of (3,)
      [ (True, 1), (True, 4) ], # The first proof is 1 + 2 (using positive fact 1 and 4)
      [ (True, 2), (True, 3) ], # The second proof is 2 + 1 (using positive fact 2 and 3)
    ],
    [ # Proofs of (4,)
      [ (True, 2), (True, 4) ], # The first proof is 2 + 2 (using positive fact 2 and 4)
    ]
  ],
  # ...
]

Notice that result_tensor is just the original output probability tensor. The proofs will be List[List[List[List[Tuple[bool, int]]]]]. In particular, it is Batch -> Datapoint Results -> Proofs -> Proof -> Literal. Where each Literal is Tuple[bool, int] where the boolean indicates the positivity of the literal and the integer indicates the used fact ID.