..
      Licensed under the Apache License, Version 2.0 (the "License"); you may
      not use this file except in compliance with the License. You may obtain
      a copy of the License at

          http://www.apache.org/licenses/LICENSE-2.0

      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
      WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
      License for the specific language governing permissions and limitations
      under the License.

      Convention for heading levels in OVN documentation:

      =======  Heading 0 (reserved for the title in a document)
      -------  Heading 1
      ~~~~~~~  Heading 2
      +++++++  Heading 3
      '''''''  Heading 4

      Avoid deeper levels because they do not render well.

=========================================
Debugging the DDlog version of ovn-northd
=========================================

This document gives some tips for debugging correctness issues in the
DDlog implementation of ``ovn-northd``.  To keep things conrete, we
assume here that a failure occurred in one of the test cases in
``ovn-e2e.at``, but the same methodology applies in any other
environment.  If none of these methods helps, ask for assistance or
submit a bug report.

Before trying these methods, you may want to check the northd log
file, ``tests/testsuite.dir/<test_number>/northd/ovn-northd.log`` for
error messages that might explain the failure.

Compare OVSDB tables generated by DDlog vs C
--------------------------------------------

The first thing I typically want to check when ``ovn-northd-ddlog``
does not behave as expected is how the OVSDB tables computed by DDlog
differ from what the C implementation produces.  Fortunately, all the
infrastructure needed to do this already exists in OVN.

First, let's modify the test script, e.g., ``ovn.at`` to dump the
contents of OVSDB right before the failure.  The most common issue is
a difference between the logical flows generated by the two
implementations.  To make it easy to compare the generated flows, make
sure that the test contains something like this in the right place::

    ovn-sbctl dump-flows > sbflows
    AT_CAPTURE_FILE([sbflows])

The first line above dumps the OVN logical flow table to a file named
``sbflows``.  The second line ensures that, if the test fails,
``sbflows`` get logged to ``testsuite.log``.  That is not particularly
useful for us right now, but it means that if someone later submits a
bug report, that's one more piece of data that we don't have to ask
for them to submit along with it.

Next, we want to run the test twice, with the C and DDlog versions of
northd, e.g., ``make check -j6 TESTSUITEFLAGS="-d 111 112"`` if 111
and 112 are the C and DDlog versions of the same test.  The ``-d`` in
this command line makes the test driver keep test directories around
even for tests that succeed, since by default it deletes them.

Now you can look at ``sbflows`` in each test log directory.  The
``ovn-northd-ddlog`` developers have gone to some trouble to make the
DDlog flows as similar as possible to the C ones, right down to white
space and other formatting.  Thus, the DDlog output is often identical
to C aside from logical datapath UUIDs.

Usually, this means that one can get informative results by running
``diff``, e.g.::

    diff -u tests/testsuite.dir/111/sbflows tests/testsuite.dir/111/sbflows

Running the input through the ``uuidfilt`` utility from OVS will
generally get rid of the logical datapath UUID differences as well::

    diff -u <(uuidfilt tests/testsuite.dir/111/sbflows) <(uuidfilt tests/testsuite.dir/111/sbflows)

If there are nontrivial differences, this often identifies your bug.

Often, once you have identified the difference between the two OVSDB
dumps, this will immediately lead you to the root cause of the bug,
but if you are not this lucky then the next method may help.

Record and replay DDlog execution
---------------------------------

DDlog offers a way to record all input table updates throughout the
execution of northd and replay them against DDlog running as a
standalone executable without all other OVN components.  This has two
advantages.  First, this allows one to easily tweak the inputs, e.g.
to simplify the test scenario.  Second, the recorded execution can be
easily replayed anywhere without having to reproduce your OVN setup.

Use the ``--ddlog-record`` option to record updates,
e.g. ``--ddlog-record=replay.dat`` to record to ``replay.dat``.
(OVN's built-in tests automatically do this.)  The file contains the
log of transactions in the DDlog command format (see
https://github.com/vmware/differential-datalog/blob/master/doc/command_reference/command_reference.md).

To replay the log, you will need the standalone DDlog executable.  By
default, the build system does not compile this program, because it
increases the already long Rust compilation time.  To build it, add
``NORTHD_CLI=1`` to the ``make`` command line, e.g. ``make
NORTHD_CLI=1``.

You can modify the log before replaying it, e.g., adding ``dump
<table>`` commands to dump the contents of relations at various points
during execution.  The <table> name must be fully qualified based on
the file in which it is declared, e.g. ``OVN_Southbound::<table>`` for
southbound tables or ``lrouter::<table>.`` for ``lrouter.dl``.  You
can also use ``dump`` without an argument to dump the contents of all
tables.

The following command replays the log generated by OVN test number
112 and dumps the output of DDlog to ``replay.dump``::

    northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/112/northd/replay.dat > replay.dump

Or, to dump just the table contents following the run, without having
to edit ``replay.dat``::

    (cat tests/testsuite.dir/112/northd/replay.dat; echo 'dump;') | northd/ovn_northd_ddlog/target/release/ovn_northd_cli --no-delta --no-init-snapshot > replay.dump

Depending on whether and how you installed OVS and OVN, you might need
to point ``LD_LIBRARY_PATH`` to library build directories to get the
CLI to run, e.g.::

    export LD_LIBRARY_PATH=$HOME/ovn/_build/lib/.libs:$HOME/ovs/_build/lib/.libs

.. note::

   The replay output may be less informative than you expect because
   DDlog does not, by default, keep around enough information to
   include input relation and intermediate relations in the output.
   These relations are often critical to understanding what is going
   on.  To include them, add the options
   ``--output-internal-relations --output-input-relations=In_`` to
   ``DDLOG_EXTRA_FLAGS`` for building ``ovn-northd-ddlog``.  For
   example, ``configure`` as::

        ./configure DDLOG_EXTRA_FLAGS='--output-internal-relations --output-input-relations=In_'

Debugging by Logging
--------------------

One limitation of the previous method is that it allows one to inspect
inputs and outputs of a rule, but not the (sometimes fairly
complicated) computation that goes on inside the rule.  You can of
course break up the rule into several rules and dump the intermediate
outputs.

There are at least two alternatives for generating log messages.
First, you can write rules to add strings to the Warning relation
declared in ``ovn_north.dl``.  Code in ``ovn-northd-ddlog.c`` will log
any given string in this relation just once, when it is first added to
the relation.  (If it is removed from the relation and then added back
later, it will be logged again.)

Second, you can call using the ``warn()`` function declared in
``ovn.dl`` from a DDlog rule.  It's not straightforward to know
exactly when this function will be called, like it would be in an
imperative language like C, since DDlog is a declarative language
where the user doesn't directly control when rules are triggered.  You
might, for example, see the rule being triggered multiple times with
the same input.  Nevertheless, this debugging technique is useful in
practice.

You will find many examples of the use of Warning and ``warn`` in
``ovn_northd.dl``, where it is frequently used to report non-critical
errors.

Debugging panics
----------------

**TODO**: update these instructions as DDlog's internal handling of panic's
is improved.

DDlog is a safe language, so DDlog programs normally do not crash,
except for the following three cases:

- A panic in a Rust function imported to DDlog as ``extern function``.

- A panic in a C function imported to DDlog as ``extern function``.

- A bug in the DDlog runtime or libraries.

Below we walk through the steps involved in debugging such failures.
In this scenario, there is an array-index-out-of-bounds error in the
``ovn_scan_static_dynamic_ip6()`` function, which is written in Rust
and imported to DDlog as an ``extern function``.  When invoked from a
DDlog rule, this function causes a panic in one of DDlog worker
threads.

**Step 1: Check for error messages in the northd log.** A panic can
generally lead to unpredictable outcomes, so one cannot count on a
clean error message showing up in the log (Other outcomes include
crashing the entire process and even deadlocks.  We are working to
eliminate the latter possibility).  In this case we are lucky to
observe a bunch of error messages like the following in the ``northd``
log:

    ``2019-09-23T16:23:24.549Z|00011|ovn_northd|ERR|ddlog_transaction_commit():
    error: failed to receive flush ack message from timely dataflow
    thread``

These messages are telling us that something is broken inside the
DDlog runtime.

**Step 2: Record and replay the failing scenario.** We use DDlog's
record/replay capabilities (see above) to capture the faulty scenario.
We replay the recorded trace::

    northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat

This generates a bunch of output ending with::

    thread 'worker thread 2' panicked at 'index out of bounds: the len is 1 but the index is 1', /rustc/eae3437dfe991621e8afdc82734f4a172d7ddf9b/src/libcore/slice/mod.rs:2681:10
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace.

We re-run the CLI again with backtrace enabled (as suggested by the
error message)::

    RUST_BACKTRACE=1 northd/ovn_northd_ddlog/target/release/ovn_northd_cli < tests/testsuite.dir/117/northd/replay.dat

This finally yields the following stack trace, which suggests array
bound violation in ``ovn_scan_static_dynamic_ip6``::

    0: backtrace::backtrace::libunwind::trace
              at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.29  10: core::panicking::panic_bounds_check
              at src/libcore/panicking.rs:61
    [SKIPPED]
    11: ovn_northd_ddlog::__ovn::ovn_scan_static_dynamic_ip6
    12: ovn_northd_ddlog::prog::__f
    [SKIPPED]

Finally, looking at the source code of
``ovn_scan_static_dynamic_ip6``, we identify the following line,
containing an unsafe array indexing operator, as the culprit::

    ovn_ipv6_parse(&f[1].to_string())

Clean build
~~~~~~~~~~~

Occasionally it's desirable to a full and complete build of the
DDlog-generated code.  To trigger that, delete the generated
``ovn_northd_ddlog`` directory and the ``ddlog.stamp`` witness file,
like this::

   rm -rf northd/ovn_northd_ddlog northd/ddlog.stamp

or::

   make clean-ddlog

Submitting a bug report
-----------------------

If you are having trouble with DDlog and the above methods do not
help, please submit a bug report to ``bugs@openvswitch.org``, CC
``ryzhyk@gmail.com``.

In addition to problem description, please provide as many of the
following as possible:

- Are you running with the right DDlog for the version of OVN?  OVN
  and DDlog are both evolving and OVN needs to build against a
  specific version of DDlog.

- ``replay.dat`` file generated as described above

- Logs: ``ovn-northd.log`` and ``testsuite.log``, if you are running
  the OVN test suite
