Adding a new OVN feature to the DDlog version of ovn-northd
This document describes the usual steps an OVN developer should go
through when adding a new feature to ovn-northd-ddlog. In order to
make things less abstract we will use the IP Multicast
ovn-northd-ddlog implementation as an example. Even though the
document is structured as a tutorial there might still exist
feature-specific aspects that are not covered here.
Overview
DDlog is a dataflow system: it receives data from a data source (a set of “input relations”), processes it through “intermediate relations” according to the rules specified in the DDlog program, and sends the processed “output relations” to a data sink. In OVN, the input relations primarily come from the OVN Northbound database and the output relations primarily go to the OVN Southbound database. The process looks like this:
from NBDB +----------+ +-----------------+ +-----------+ to SBDB
---------->|Input rels|-->|Intermediate rels|-->|Output rels|---------->
+----------+ +-----------------+ +-----------+
Adding a new feature to ovn-northd-ddlog usually involves the
following steps:
Update northbound and/or southbound OVSDB schemas.
Configure DDlog/OVSDB bindings.
Define intermediate DDlog relations and rules to compute them.
Write rules to update output relations.
Generate Logical_Flows and/or other forwarding records (e.g., Multicast_Group) that will control the dataplane operations.
Update NB and/or SB OVSDB schemas
This step is no different from the normal development flow in C.
Most of the time a developer chooses between two ways of configuring a new feature:
Adding a set of columns to tables in the NB and/or SB database (or adding key-value pairs to existing columns).
Adding new tables to the NB and/or SB database.
Looking at IP Multicast, there are two OVN Northbound tables where
configuration information is stored:
Logical_Switch, column other_config, keys mcast_*.
Logical_Router, column options, keys mcast_*.
These tables become inputs to the DDlog pipeline.
In addition we add a new table IP_Multicast to the SB database.
DDlog will update this table, that is, IP_Multicast receives
output from the above pipeline.
Configuring DDlog/OVSDB bindings
Configuring northd/automake.mk
The OVN build process uses DDlog’s ovsdb2ddlog utility to parse
ovn-nb.ovsschema and ovn-sb.ovsschema and then automatically
populate OVN_Northbound.dl and OVN_Southbound.dl. For each
OVN Northbound and Southbound table, it generates one or more
corresponding DDlog relations.
We need to supply ovsdb2ddlog with some information that it can’t
infer from the OVSDB schemas. This information must be specified as
ovsdb2ddlog arguments, which are read from
northd/ovn-nb.dlopts and northd/ovn-sb.dlopts.
The main choice for each new table is whether it is used for output.
Output tables can also be used for input, but the converse is not
true. If the table is used for output at all, we add -o <table>
to the option file. Our new table IP_Multicast is an output
table, so we add -o IP_Multicast to ovn-sb.dlopts.
For input-only tables, ovsdb2ddlog generates a DDlog input
relation with the same name. For output tables, it generates this
input relation plus an output relation named Out_<table>. Thus,
OVN_Southbound.dl has two relations for IP_Multicast:
input relation IP_Multicast (
    _uuid: uuid,
    datapath: string,
    enabled: Set<bool>,
    querier: Set<bool>
)
output relation Out_IP_Multicast (
    _uuid: uuid,
    datapath: string,
    enabled: Set<bool>,
    querier: Set<bool>
)
For an output table, consider whether only some of the columns are
used for output, that is, some of the columns are effectively
input-only. This is common in OVN for OVSDB columns that are managed
externally (e.g. by a CMS). For each input-only column, we add --ro
<table>.<column>. Alternatively, if most of the columns are
input-only but a few are output columns, add --rw <table>.<column>
for each of the output columns. In our case, all of the columns are
used for output, so we do not need to add anything.
Finally, in some cases ovn-northd-ddlog shouldn’t change values in
particular columns. One such case is the seq_no column in the
IP_Multicast table. To prevent such changes we need to instruct
ovsdb2ddlog to treat the column as read-only by using the --ro switch.
ovsdb2ddlog generates a number of additional DDlog relations, for
use by auto-generated OVSDB adapter logic. These are irrelevant to
most DDlog developers, although sometimes they can be handy for
debugging. See the appendix for details.
Define intermediate DDlog relations and rules to compute them
Obviously there will be a one-to-one relationship between logical
switches/routers and IP multicast configuration. One way to represent
this relationship is to create multicast configuration DDlog relations
to be referenced by &Switch and &Router DDlog records:
/* IP Multicast per switch configuration. */
relation &McastSwitchCfg(
    datapath : uuid,
    enabled : bool,
    querier : bool
)
&McastSwitchCfg(
        .datapath = ls_uuid,
        .enabled = map_get_bool_def(other_config, "mcast_snoop", false),
        .querier = map_get_bool_def(other_config, "mcast_querier", true)) :-
    nb.Logical_Switch(._uuid = ls_uuid,
                      .other_config = other_config).
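To make the intent of this rule concrete, the following is a minimal Python sketch of the same derivation, not OVN code. The helper name `get_bool_def`, the dictionary layout of a logical switch row, and the exact way boolean strings are parsed are all assumptions made for illustration; they only approximate what `map_get_bool_def` is expected to do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McastSwitchCfg:
    datapath: str   # _uuid of the Logical_Switch row
    enabled: bool
    querier: bool


def get_bool_def(m, key, default):
    """Hypothetical stand-in for map_get_bool_def: parse "true"/"false",
    fall back to the default when the key is missing or unparsable."""
    value = m.get(key, "").lower()
    if value == "true":
        return True
    if value == "false":
        return False
    return default


def mcast_switch_cfgs(logical_switches):
    """Produce one McastSwitchCfg per Logical_Switch row, like the rule."""
    return {
        McastSwitchCfg(
            datapath=ls["_uuid"],
            enabled=get_bool_def(ls["other_config"], "mcast_snoop", False),
            querier=get_bool_def(ls["other_config"], "mcast_querier", True),
        )
        for ls in logical_switches
    }


# Hypothetical input rows standing in for nb.Logical_Switch.
switches = [
    {"_uuid": "ls1", "other_config": {"mcast_snoop": "true"}},
    {"_uuid": "ls2", "other_config": {}},
]
cfgs = mcast_switch_cfgs(switches)
```

The difference from this sketch is that DDlog evaluates the rule incrementally: when a switch's other_config changes, only the affected McastSwitchCfg record is retracted and re-derived.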
Then reference these relations in &Switch and &Router. For
example, in lswitch.dl, the &Switch relation definition now
contains:
relation &Switch(
    ls: nb.Logical_Switch,
    [...]
    mcast_cfg: Ref<McastSwitchCfg>
)
And is populated by the following rule which references the correct
McastSwitchCfg based on the logical switch uuid:
&Switch(.ls = ls,
        [...]
        .mcast_cfg = mcast_cfg) :-
    nb.Logical_Switch[ls],
    [...]
    mcast_cfg in &McastSwitchCfg(.datapath = ls._uuid).
Build state based on information dynamically updated by ovn-controller
Some OVN features rely on information learned by ovn-controller to
generate Logical_Flow or other records that control the dataplane.
In case of IP Multicast, ovn-controller uses IGMP to learn
multicast groups that are joined by hosts.
Each ovn-controller maintains its own set of records to avoid
ownership and concurrency conflicts with other controllers. If two hosts that
are connected to the same logical switch but reside on different
hypervisors (different ovn-controller processes) join the same
multicast group G, each of the controllers will create an
IGMP_Group record in the OVN Southbound database which will
contain a set of ports to which the interested hosts are connected.
At this point ovn-northd-ddlog needs to aggregate the per-chassis
IGMP records to generate a single Logical_Flow for group G.
Moreover, the ports on which the hosts are connected are represented
as references to Port_Binding records in the database. These also
need to be translated to &SwitchPort DDlog relations. The
corresponding DDlog operations that need to be performed are:
Flatten the <IGMP group, ports> mapping in order to be able to do the translation from Port_Binding to &SwitchPort. For each IGMP_Group record in the OVN Southbound database generate an individual record of type IgmpSwitchGroupPort for each Port_Binding in the set of ports that joined the group. Also, translate the Port_Binding uuid to the corresponding Logical_Switch_Port uuid:
relation IgmpSwitchGroupPort(
    address: string,
    switch : Ref<Switch>,
    port : uuid
)
IgmpSwitchGroupPort(address, switch, lsp_uuid) :-
    sb::IGMP_Group(.address = address, .datapath = igmp_dp_set,
                   .ports = pb_ports),
    var pb_port_uuid = FlatMap(pb_ports),
    sb::Port_Binding(._uuid = pb_port_uuid, .logical_port = lsp_name),
    &SwitchPort(
        .lsp = nb.Logical_Switch_Port{._uuid = lsp_uuid, .name = lsp_name},
        .sw = switch).
Aggregate the flattened IgmpSwitchGroupPort (implicitly from all ovn-controller instances) grouping by address and logical switch:
relation IgmpSwitchMulticastGroup(
    address: string,
    switch : Ref<Switch>,
    ports : Set<uuid>
)
IgmpSwitchMulticastGroup(address, switch, ports) :-
    IgmpSwitchGroupPort(address, switch, port),
    var ports = port.group_by((address, switch)).to_set().
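The two operations above, flattening and then aggregating, can be modeled in a few lines of Python. This is only a conceptual sketch: the sample rows, the dictionary shapes, and the function names `flatten` and `aggregate` are assumptions made for illustration, and switches are represented by plain strings rather than references.

```python
from collections import defaultdict

# Hypothetical sample data; real rows come from the Southbound database.
# Each IGMP_Group row is (address, set of Port_Binding uuids).
igmp_groups = [
    ("239.0.0.1", {"pb1", "pb2"}),   # learned by one ovn-controller
    ("239.0.0.1", {"pb3"}),          # same group, learned on another chassis
    ("239.0.0.2", {"pb2"}),
]
# Port_Binding uuid -> logical port name.
port_bindings = {"pb1": "lsp-a", "pb2": "lsp-b", "pb3": "lsp-c"}
# Logical port name -> (Logical_Switch_Port uuid, switch).
switch_ports = {
    "lsp-a": ("lsp-uuid-a", "sw0"),
    "lsp-b": ("lsp-uuid-b", "sw0"),
    "lsp-c": ("lsp-uuid-c", "sw0"),
}


def flatten(groups):
    """Step 1: one (address, switch, lsp_uuid) record per joined port."""
    records = set()
    for address, pb_ports in groups:
        for pb_uuid in pb_ports:                  # plays the role of FlatMap
            lsp_name = port_bindings.get(pb_uuid)
            if lsp_name not in switch_ports:      # a join drops unmatched rows
                continue
            lsp_uuid, switch = switch_ports[lsp_name]
            records.add((address, switch, lsp_uuid))
    return records


def aggregate(records):
    """Step 2: group by (address, switch) and collect the ports in a set."""
    grouped = defaultdict(set)
    for address, switch, lsp_uuid in records:
        grouped[(address, switch)].add(lsp_uuid)
    return dict(grouped)


multicast_groups = aggregate(flatten(igmp_groups))
```

In this sample, the two per-chassis records for 239.0.0.1 collapse into a single entry whose port set is the union of the ports reported by both controllers.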
At this point all the information relevant to the feature’s configuration
is stored in DDlog relations in ovn-northd-ddlog memory.
Pitfalls of projections
A projection is a join that uses only some of the data in a record. When the fields that are used have duplicates, the result can be many “copies” of a record, which DDlog represents internally with an integer “weight” that counts the number of copies. We don’t have a projection with duplicates in this example, but lswitch.dl has many of them, such as this one:
relation LogicalSwitchHasACLs(ls: uuid, has_acls: bool)
LogicalSwitchHasACLs(ls, true) :-
    LogicalSwitchACL(ls, _).
LogicalSwitchHasACLs(ls, false) :-
    nb::Logical_Switch(._uuid = ls),
    not LogicalSwitchACL(ls, _).
When multiple projections get joined together, the weights can overflow, which causes DDlog to malfunction. The solution is to make the relation an output relation, which causes DDlog to filter it through a “distinct” operator that reduces the weights to 1. Thus, LogicalSwitchHasACLs is actually implemented this way:
output relation LogicalSwitchHasACLs(ls: uuid, has_acls: bool)
For more information, see Avoiding weight overflow in the DDlog tutorial.
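The growth of weights can be illustrated with a small Python model in which a Counter stands in for DDlog's per-record weight. This is a conceptual sketch under stated assumptions: the sample rows and the helper names `join_on_switch` and `distinct` are hypothetical, and the model only mirrors the general dataflow behavior that a join multiplies the weights of matching records.

```python
from collections import Counter

# Hypothetical LogicalSwitchACL(ls, acl) rows: ls1 has three ACLs.
logical_switch_acl = [("ls1", "acl1"), ("ls1", "acl2"), ("ls1", "acl3"),
                      ("ls2", "acl4")]

# Projecting away the acl column leaves one copy of (ls, True) per ACL.
# The Counter value plays the role of the record's weight.
has_acls = Counter((ls, True) for ls, _ in logical_switch_acl)


def join_on_switch(left, right):
    """Join two weighted relations on the switch; weights multiply."""
    joined = Counter()
    for (ls_l, val_l), w_l in left.items():
        for (ls_r, val_r), w_r in right.items():
            if ls_l == ls_r:
                joined[(ls_l, val_l, val_r)] += w_l * w_r
    return joined


def distinct(rel):
    """Clamp every positive weight to 1, like a "distinct" operator."""
    return Counter({rec: 1 for rec, w in rel.items() if w > 0})


raw = join_on_switch(has_acls, has_acls)
clean = join_on_switch(distinct(has_acls), distinct(has_acls))
```

With three ACLs on ls1, the projected record has weight 3 and a single self-join already raises it to 9. Chaining several such joins makes the weight grow multiplicatively, which is why applying a distinct step before the joins keeps it at 1.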
Write rules to update output relations
The developer updates output tables by writing rules that generate
Out_* relations. For IP Multicast this means:
/* IP_Multicast table (only applicable for Switches). */
sb::Out_IP_Multicast(._uuid = hash128(cfg.datapath),
                     .datapath = cfg.datapath,
                     .enabled = set_singleton(cfg.enabled),
                     .querier = set_singleton(cfg.querier)) :-
    &McastSwitchCfg[cfg].
Note
OVN_Southbound.dl also contains an IP_Multicast
relation with the input qualifier. This relation stores the
current snapshot of the OVSDB table and cannot be written to.
Generate Logical_Flow and/or other forwarding records
At this point we have defined all DDlog relations required to generate
Logical_Flows. All we have to do is write the rules to do so.
For each IgmpSwitchMulticastGroup we generate a Flow that has
as action "outport = <Multicast_Group>; output;":
/* Ingress table 17: Add IP multicast flows learnt from IGMP (priority 90). */
for (IgmpSwitchMulticastGroup(.address = address, .switch = &sw)) {
    Flow(.logical_datapath = sw.dpname,
         .stage = switch_stage(IN, L2_LKUP),
         .priority = 90,
         .__match = "eth.mcast && ip4 && ip4.dst == ${address}",
         .actions = "outport = \"${address}\"; output;",
         .external_ids = map_empty())
}
In some cases generating a logical flow is not enough. For IGMP we
also need to maintain OVN southbound Multicast_Group records,
one per IGMP group storing the corresponding Port_Binding uuids of
ports where multicast traffic should be sent. This is also relatively
straightforward:
/* Create a multicast group for each IGMP group learned by a Switch.
* 'tunnel_key' == 0 triggers an ID allocation later.
*/
sb::Out_Multicast_Group (.datapath = switch.dpname,
                         .name = address,
                         .tunnel_key = 0,
                         .ports = set_map_uuid2name(port_ids)) :-
    IgmpSwitchMulticastGroup(address, &switch, port_ids).
We must also define DDlog relations that will allocate tunnel_key
values. There are two cases: tunnel keys for records that already
existed in the database are preserved to implement stable id
allocation; new multicast groups need new keys. This kind of
allocation can be tricky, especially to new users of DDlog. OVN
contains multiple instances of allocation, so it’s probably worth
reading through the existing cases and following their pattern, and,
if it’s still tricky, asking for assistance.
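The two cases can be sketched outside DDlog to show what a stable allocation has to guarantee. The Python below is a simplified illustration, not the allocation logic used by OVN: the function name `allocate_tunnel_keys`, the lowest-free-key policy, and the key range in the example are all assumptions.

```python
def allocate_tunnel_keys(wanted, existing, min_key, max_key):
    """Return {name: key}: keep keys already in the database, then give
    each remaining name the lowest free key in [min_key, max_key]."""
    # Case 1: preserve keys of records that already exist (stable IDs).
    allocated = {name: existing[name] for name in wanted if name in existing}
    in_use = set(allocated.values())

    # Case 2: new records get fresh keys; sorting keeps the result
    # deterministic regardless of set iteration order.
    next_key = min_key
    for name in sorted(wanted):
        if name in allocated:
            continue
        while next_key in in_use:
            next_key += 1
        if next_key > max_key:
            raise RuntimeError("tunnel key space exhausted")
        allocated[name] = next_key
        in_use.add(next_key)
    return allocated


# Hypothetical data: one group already has a key in the database.
existing = {"239.0.0.1": 101}
wanted = {"239.0.0.1", "239.0.0.2", "239.0.0.3"}
keys = allocate_tunnel_keys(wanted, existing, min_key=100, max_key=199)
```

The property to preserve in any real implementation is that a group whose record already exists keeps its key across recomputations, so that unrelated changes do not renumber existing groups.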
Appendix A. Additional relations generated by ovsdb2ddlog
ovsdb2ddlog generates some extra relations to manage communication with the OVSDB server. It generates records in the following relations when rows in OVSDB output tables need to be added or deleted or updated.
In the steady state, when everything is working well, a given record
stays in any one of these relations only briefly: just long enough for
ovn-northd-ddlog to send a transaction to the OVSDB server. When
the OVSDB server applies the update and sends an acknowledgement, this
ordinarily means that these relations become empty, because there are
no longer any further changes to send.
Thus, records that persist in one of these relations are a sign of a
problem. One example of such a problem is the database server
rejecting the transactions sent by ovn-northd-ddlog, which might
happen if, for example, a bug in a .dl file would cause some OVSDB
constraint or relational integrity rule to be violated. (Such a
problem can often be diagnosed by looking in the OVSDB server’s log.)
DeltaPlus_IP_Multicast, used by the DDlog program to track new records that are not yet added to the database:
output relation DeltaPlus_IP_Multicast (
    datapath: uuid_or_string_t,
    enabled: Set<bool>,
    querier: Set<bool>
)
DeltaMinus_IP_Multicast, used by the DDlog program to track records that are no longer needed in the database and need to be removed:
output relation DeltaMinus_IP_Multicast (
    _uuid: uuid
)
Update_IP_Multicast, used by the DDlog program to track records whose fields need to be updated in the database:
output relation Update_IP_Multicast (
    _uuid: uuid,
    enabled: Set<bool>,
    querier: Set<bool>
)
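Conceptually, these three relations describe the difference between the rows the program wants and the rows currently in the database. The Python sketch below models that difference; it is an assumption-laden illustration rather than the rules that ovsdb2ddlog generates, and the function name `compute_deltas` and the sample rows are hypothetical.

```python
def compute_deltas(desired, current):
    """desired: rows the program wants, keyed by _uuid (like Out_IP_Multicast).
    current: rows in the database snapshot, keyed by _uuid (like IP_Multicast).
    Returns (delta_plus, delta_minus, update)."""
    # Rows that are wanted but not yet in the database.
    delta_plus = {u: row for u, row in desired.items() if u not in current}
    # Rows that are in the database but no longer wanted.
    delta_minus = {u for u in current if u not in desired}
    # Rows present on both sides whose fields differ.
    update = {u: row for u, row in desired.items()
              if u in current and current[u] != row}
    return delta_plus, delta_minus, update


# Hypothetical state before the transaction is sent.
desired = {
    "u1": {"enabled": True, "querier": True},
    "u2": {"enabled": False, "querier": True},
}
current = {
    "u2": {"enabled": True, "querier": True},
    "u3": {"enabled": True, "querier": False},
}
plus, minus, update = compute_deltas(desired, current)

# Once the database has applied the changes, the snapshot matches the
# desired rows and all three deltas become empty (the steady state).
steady = compute_deltas(desired, dict(desired))
```

This matches the steady-state description above: non-empty deltas that never drain suggest that the database is not accepting the transactions.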