SwitchV: Automated SDN Switch Validation with P4 Models

Kinan Dak Albab†
Brown University
Ali Kheradmand
Google
Muhammad Timarzi†

Jonathan DiLorenzo
Google
Steffen Smolka
Google
Jiaqi Gao
Harvard University
Minlan Yu
Harvard University, Google

ABSTRACT
Increasing demand on computer networks continuously pushes manufacturers to incorporate novel features and capabilities into their switches at an ever-accelerating pace. However, the traditional approach to switch development relies on informal specifications and handcrafted tests to ensure reliability, which are tedious and slow to maintain and update, effectively putting feature velocity at odds with reliability.

This work describes our experiences following a new approach during the development of switch software stacks that extend fixed-function ASICs with SDN capabilities. Specifically, we focus on SwitchV, our system for automated end-to-end switch validation using fuzzing and symbolic analysis, that evolves effortlessly with the switch specification. Our approach is centered around using the P4 language to model the data plane behavior of the switch as well as its control plane API. Such P4 models are then used as a formal specification by SwitchV, as well as a switch-agnostic contract by SDN controllers, and a living documentation by engineers.

SwitchV found a total of 154 bugs spanning all switch layers. The majority of bugs were highly relevant and fixed within 14 days.

CCS CONCEPTS
• Networks → Network reliability; Programming interfaces; • Software and its engineering → System description languages; Software verification and validation;

KEYWORDS
P4, P4 modeling, SDN switch validation, PINS, SAI, fuzzing, symbolic execution, automated test generation

ACM Reference Format:

1 INTRODUCTION
Motivation Demands on computer networks are ever-increasing. Networks are constantly challenged to become more reliable, more flexible, and more efficient. This drives manufacturers and operators to design, implement, and deploy new features and capabilities at an accelerating pace. In this paper, we focus on the following conundrum: How can we increase the reliability of our network infrastructure while simultaneously increasing feature velocity?

This question has led hyper-scalars to adopt novel approaches. Google uses Software Defined Networking (SDN) [19, 28], a network architecture that separates the control and data planes, to increase feature velocity and improve debugging. Microsoft [4] and Amazon [2] use network verification (§8) to detect network configuration bugs. Meanwhile, the way we build switches, and especially their software comprising network operating systems and control APIs, has largely remained the same: we write informal, often incomplete and ambiguous specifications (if any) in English, and check specification compliance using hand-crafted test suites. Verification techniques have been applied to switch hardware (§8), but not switch software or end-to-end correctness.

We are reaching an inflection point at which this approach no longer scales, due to these trends: First, with SDN, the controller, the switch software stack, and the switch hardware and drivers are typically developed by different teams across several companies. Timely and correct integration of these components thus hinges on precise encoding and understanding of the specification and API of each component, which must be agreed upon by various teams across organizational boundaries. Second, data from Microsoft Azure [55] and Facebook [40] suggest that the success of network verification at catching configurations bugs may have reached a point of diminishing returns, with many network failures now occurring due to switch hardware and software bugs. Finally, the accelerated evolution of switches is increasing the complexity of their software, which now often include an abundance of features and capabilities unlikely to be fully used by any single operator, which makes their validation and maintenance more challenging [13].

At its core, the traditional approach to switch design—relying on informal English specifications and hand-crafted tests—inevitably puts feature velocity and reliability at odds, forcing switch designers to carry out a balancing act between these two properties. Increasing reliability entails writing more tests, which saddles the development of new features with having to update these tests as
that human testers can inspect to identify the root cause, which may be a bug in the switch or the P4 model.

Automated validation using SwitchV helps reconcile the tension between reliability and feature velocity, as it alleviates the burden of updating and extending hand-crafted tests. Instead, SwitchV automatically generates new tests as the specification evolves. We have structured our approach to provide safeguards and incentives to keep the P4 model in sync with the implementation: We run SwitchV periodically (e.g., daily), catching any divergence between the P4 model and the switch behavior almost immediately. Additionally, in contrast to informal English specifications, updating our P4 models as the implementation evolves provides immediate value by yielding test coverage for new or changed features "for free"; it is also a technical necessity for exposing features in PINS to the controller, as the control plane API is defined by the P4 program. Effectively, this makes our P4 models a living documentation that engineers can consult for a precise, yet abstract and implementation-agnostic view of the current end-to-end behavior of the switch, mitigating the problem of out-of-date specifications.

We used SwitchV to validate two switch stacks under development called PINS and Cerberus (§6). SwitchV found 122 and 32 bugs in the two stacks, including bugs in the hardware, various software layers, the P4 toolchain, and the P4 models themselves.

Ethics Statement This work does not raise any ethical issues.

2 OVERVIEW

The centerpiece of our approach is using P4 to specify the API and behavior of the switch in its intended role (§3). The choice of P4 as a modeling language is integral to SwitchV, as it enables the automated validation of the control plane API of the switch and its data plane forwarding behavior.

The P4 language semantics ensure that P4 programs are unambiguous, making them suitable for use as formal specifications. P4 has relatively few and simple constructs (e.g., compared to general-purpose languages such as C++), which makes it easier to automatically analyze and reason about P4 programs, while also being more mature and familiar to network engineers than a custom-made modeling language. We show a simplified portion of a P4 program that expresses parts of a typical IPv4 routing flow in Figure 2.

Fixed-Function Switches Our focus in this paper is on the use of SwitchV to validate fixed-function switches [5] running the PINS software stack (Figure 4). A fixed-function switch consists of a rigid ASIC with limited flexibility. Concretely, the forwarding pipeline in such a switch is mostly fixed and encoded in the hardware: Operators cannot arbitrarily change the routing logic, control flow, and supported protocol headers. However, such a switch can still be programmed by a controller by installing table entries that the fixed logic matches against. For example, the controller can install entries that forward packets with a certain destination IP on a certain port, or drop packets from a specific source address. The controller performs this programming by issuing requests via the switch’s control plane API. There are also limited ways in which the so-called ACL tables, which are invoked at pre-determined places in the rigid packet-processing pipeline, can be configured prior to programming, allowing to trade off expressivity (# of table entries supported),...
PINS* (SONIC + P4*)

SDN Controller (Orion)
P4Runtime

11

12

14

15

16

17

18

19

20

21

22

23

24

25

26

27

Figure 2: Simplified portion of a fixed-function routing pipeline modeled as a P4 program.

// id table match keys => action action args

v1 vrf_tbl 1 => no_action void
v2 vrf_tbl 0 => no_action void
v3 vrf_tbl 1 => set_nexthop_id 1
i1 ipv4_tbl 10.0.*.* => set_nexthop_id 1
i2 ipv4_tbl 10.0.*.* => set_nexthop_id 1
i3 ipv4_tbl 10.0.*.* => set_nexthop_id 1
i4 ipv4_tbl 10.0.*.* => set_nexthop_id 1
i5 ipv4_tbl 10.0.*.* => set_nexthop_id 1

Figure 3: Table entries for Figure 2 in a human-readable form. Entries v2, v3, i2, i3, and i4 are invalid.

We model fixed-function switches as P4 programs. This is an unorthodox use of P4, which is designed and traditionally used to install custom forwarding pipelines onto P4-enabled switches. In contrast to a fixed-function switch, a P4-enabled switch re-arranges its pipeline whenever a P4 program is installed on it, effectively acting as an interpreter of that program. After installation, the controller issues control requests to the P4 switch to manage its table entries, whose signatures must match those defined in the installed P4 program, as shown in Figure 3. The interface exposed to the controller is governed by the P4Runtime Protocol [15], a standardized, RPC-based protocol specifying the exact binary format of these requests and how the switch is allowed to handle them.

PINS Fixed-function switches from different vendors or of different makes may have different capabilities and internals and may expose different APIs and protocols to the controller. This makes deploying and managing a heterogeneous network challenging. Recent work proposes various switch software layers [13, 39, 50] that provide common abstractions and APIs. The P4 Integrated Network Stack (PINS) [20] is a new software switch stack that extends fixed-function switches with limited programming capabilities and a unified control API. PINS is based on SONIC [56], an open-source network operating system built atop the vendor abstraction layer.

SAI [49], allowing it to run across hardware from different vendors. PINS extends SONIC with a P4Runtime interface—configured and governed by an accompanying P4 program—to the controller. Together, the P4Runtime Protocol and the P4 program constitute a contract between the PINS switch and the controller, precisely specifying program-independent concerns, e.g., the signature and abstract semantics of RPC calls, and program-dependent concerns, e.g., the tables exposed for programming and the packet-forwarding semantics of their entries, and the constraints on table entries that encode hardware limitations, respectively. The same P4 program is used to configure the ACLs on the PINS switch. PINS implements a P4Runtime server that receives requests from the controller, checks that they comply with the aforementioned constraints, and applies them to the underlying ASIC via a vendor-agnostic abstraction layer called the Switch Abstraction Interface (SAI) [49].

In addition to PINS, we used SwitchV to validate Cerberus, another software stack for fixed-function switches whose details we discuss in §6. In principle, SwitchV is also directly applicable to P4-enabled switches. The control plane API validation component of SwitchV relies on the switch exposing a P4Runtime API to communicate with the switch under test and to judge—based on a given P4 program and the P4Runtime standard—whether the observed behavior is admissible. Our data plane validation component is largely independent of the P4Runtime Protocol, and can in principle be extended to validate switches that do not support it.

Scope We used SwitchV to validate the control plane API and the packet forwarding behavior of PINS-based and Cerberus-based switches end-to-end. This includes validating both the new layers added by these stacks (e.g., the P4Runtime interface) as well as the existing layers they are built on top of (e.g., the hardware ASIC, the underlying operating system), to the extent that these layers affect the control plane API and packet forwarding.
SwitchV does not validate QoS (which, to a first approximation, only affects forwarding during congestion) or “management” and “operational” aspects of the switch, e.g., port speed configuration or TLS certificate configuration, respectively.

Recent work categorizes bugs found in switches into several classes [7]. SwitchV has detected bugs from a majority of these classes in practice, including functional bugs, bugs in the architecture and associated development tools, as well as bugs caused by underspecified behavior. SwitchV is not designed to detect bugs from the two remaining classes, security and performance, which are usually an artifact of the switch’s configurations.

**Design** SwitchV consists of p4-fuzzer (§4) and p4-symbolic (§5) responsible for generating tests for the control plane API and data plane behavior, respectively. p4-fuzzer generates a sequence of control plane requests for installing, modifying, or deleting various table entries, including valid requests as well as “useful” invalid ones. p4-symbolic generates test packets that satisfy the coverage assertions provided by test engineers, e.g., hitting every table entry.

Testers provide both components with an input P4 program that acts as a specification. Additionally, they provide p4-symbolic with a set of table entries that represent the switch’s present forwarding state. These are usually a replay of production table entries. Testers also provide a coverage metric (e.g., branch or trace coverage) relative to the P4 program encoded using Boolean assertions.

For each type of generated test, SwitchV provides a mechanism for judging whether the switch’s response was admissible or not. p4-fuzzer provides an Oracle that determines whether the response of a switch to a control plane request complies with the P4Runtime standard instantiated for the given P4 program, which determines the format of table entries and may contain additional constraints in the form of @refers_to and @entry_restriction (§3) annotations. We run test packets generated by p4-symbolic against the BMv2 P4 simulator configured with the input P4 program and table entries, and check that the behavior of the switch matches some observed behavior of BMv2. For both types of tests, we do not predict a single correct outcome, but rather check that the observed behavior is valid. For any test, there may be multiple valid behaviors due to under-specification in the P4Runtime Protocol, or non-determinism in the P4 program.

When SwitchV encounters switch behavior that it deems to be invalid, it produces a log of the incident. A human must inspect this log to investigate the root cause of the issue, and how it can be addressed. This may be a result of a bug in the switch, our P4Runtime Oracle, or in the P4 simulator. Additionally, when the switch is fixed-function, it may be that the switch’s behavior is correct, and the P4 program incorrectly encoded the desired functionality.

### 3 Modeling a Fixed-Function Switch in P4

We discuss our experience designing P4 programs that we use as models for fixed-function switches running PINS (Figure 4). At a high level, our P4 programs are an encoding of SAI with a similar structure to the SAI object model [50]. For the most part, we encode each SAI object as a P4 match-action table. Our P4 models are role-specific: they share a similar high-level structure and re-use many of the same components. However, they differ in components that depend on the deployment role of the switch.

We encode various resource limits and semantic constraints into these P4 programs, such that the P4 program includes all the necessary information to determine whether any control plane request (e.g., installation of table entries) would be accepted by the switch. Figure 2 shows simple examples of this where the P4 program specifies a minimum number of entries (i.e., size) for each table that the hardware is guaranteed to meet. This guarantees that the switch will accept any request that is valid from the perspective of the P4 program (and its embedded resource limits) per the P4Runtime Protocol semantics.

Furthermore, we design these P4 programs to exhibit our desired packet forwarding behavior. Given the same table entries and configuration, the switch must forward a packet the same way that the P4 program would, e.g., if it was run via a simulator. In realistic pipelines, such as SAI, the forwarding behavior may include non-determinism (e.g., for load balancing purposes), and thus this guarantee is defined over the set of possible behaviors per packet.

We found P4 to be suitable for modeling due to several important properties. P4 programs lend themselves well to automated validation. They specify both the control plane API and the data planes behavior of switch, and are unambiguous and machine-readable. This becomes apparent when contrasted with traditional approaches that write specifications informally in English. Furthermore, these programs are implementation-agnostic, they depend on the deployment role rather than the exact switch capabilities, and thus can be reused for different switches when deployed in the same role. Additionally, these P4 programs are living documentation that encode the specification, the contract, and the exposed functionality all at once, and thus ensure that the implementation and validation remain in sync. Finally, they enable rapid innovation as new switch hardware or software features can be quickly exposed by updating the P4 program, without having to wait for the lengthy process of exposing them via newer releases of NOS, SAI, or various standards.

**P4 Language Features** P4 is designed for programming P4-enabled switches, rather than modeling fixed-function ones. However, we found the core language features, specifically its tables and match action pipelines, expressive enough to allow us to model our target pipelines while also being amenable to symbolic execution. Conversely, several language features, including header stacks, unions, and registers were not needed to model our target pipelines, even though they may be important for P4 programming generally.

Several P4 targets, including the BMv2 P4 simulator [45], do not allow revisiting tables in multiple locations in a pipeline. This restriction stems from practical limitations in the targeted programmable switches (e.g., Intel Tofino switches [27]). However, it poses a challenge when modeling certain components, such as SAI’s router interfaces (RIFs), which interface with the underlying switch ports at both ingress and egress. Such components cannot be modeled as a single P4 table, since such a table could be matched on more than once (e.g., at both ingress and egress). Instead, they need to be modeled using workarounds, such as replicating them in several tables, which are then used in different locations. Such workarounds are merely modeling artifacts and must be accompanied by explicit constraints in the model to ensure their consistency.
e.g., the replicas in our example must have the same table entries since they correspond to the same actual component.

**P4-Constraints** A critical feature missing from P4 is the ability to encode semantic constraints on table entries to match the semantics of the underlying pipeline being modeled. Since it is primarily designed for P4-enabled switches, P4Runtime is a relatively permissive API that disallows syntactically invalid control plane requests, but is oblivious to semantic validity which differs between scenarios. This flexibility causes challenges in PINS, which uses this permissive protocol for programming the restrictive underlying fixed-function hardware. We mitigated this by providing mechanisms for specifying API constraints in the P4 program and enforcing these constraints at run time in PINS’s P4Runtime layer.

Consider a simple P4 implementation that looks up the IPv4 or IPv6 destination addresses in a ban-list. This can be modeled in P4 as a table that matches on IPv4 and IPv6 destination addresses as well as the packet type. From the perspective of P4Runtime, this is a table with three match keys, each with no particular semantic significance. This means that P4Runtime may accept nonsensical entries, such as entries that match the IPv6 destination address of IPv4 packets and vice versa. Additionally, P4Runtime may accept entries that cannot be mapped to hardware, entries not in canonical form, and entries that would interfere with the internals of the switch being modeled. For example, in PINS, the default VRF 0 is reserved by the hardware and cannot be programmed by the controller with table entries (Figure 2 line 4).

To capture such semantic restrictions, we built P4-constraints [14], a P4 extension that enables us to specify custom constraints on table entries using annotations in the P4 program. These constraints are part of the contract with the controller, and we use the constraints while validating the control plane API of switches to determine the semantic validity of the generated test requests. In our experience, we needed to model two kinds of constraints: (1) (isolated) requirements imposed by the underlying switch, such as excluding special built-in values. (2) Integrity constraints relating entries in different tables that correspond to inter-related switch components, or to the same component that is captured by multiple tables for modularity or due to other modeling artifacts.

We express the first kind of constraints via `@entry.restriction`, which can be attached to tables to restrict their entries using Boolean constraints that may refer to the keys of the table along with Boolean and relational operators. The `@refers.to` annotation allows us to encode the second kind of constraints and provides referential integrity, which essentially disallows dangling references between two tables. For example, Figure 2 encodes a common pattern where VRF IDs, which are modeled by an earlier table (`vrf.th1`), are matched against a later table (`ipv4.th1`). By using `@refers.to` in line 13, we disallow `ipv4.th1` entries that use non-existing VRF IDs, such as entry 12 from Figure 3. This captures a restriction of SAI, which requires that VRFs must be allocated (modeled in P4 by programming the VRF table) before they can be used. We open-sourced the P4-constraints extension, which is now a part of the P4 toolchain.

**Role Specific Instantiations** Developing a general model of PINS switches is undesirable. Such a hypothetical model must capture all the capabilities of the switch, even the ones that are not used in its deployed role. This would make the model overly permissive and unnecessarily complex. For example, on many ASICs, ACL can be configured to match against various combinations of packet headers and metadata. However, due to hardware (TCAM) limitations, only a few of the available headers fields can be matched on at a time. A natural way of modeling this in P4 is via a table that matches on all header fields. Table entries can then choose to match against any desired subset, and leave the remaining keys unset. However, such a model accepts table entries that match on keys beyond the capacity of the switch hardware, and is thus too permissive to use as a specification for SwitchV, or as a contract between a PINS switch and the controller.

Instead, we construct a different P4 model for each role the switch might be deployed in (e.g., ToR, WAN). For each of these roles, the combination of keys used for ACL is fixed and fits within hardware limits. Thus, the role-specific ACL can be directly expressed as a P4 table that matches only on that specific combination. We view these role-specific models as “instantiations” of the same blueprint. They have the same high-level structure as SAI. They re-use a lot of the same components and pipelines and only differ in the role-specific functionality, which currently only includes ACL. We simplify the effort required to design and maintain these instantiations by grouping all common components into a common P4 library, and instantiating from it using macros and preprocessors.

When a PINS switch is deployed in a role, we push the corresponding instantiation onto it to configure its ACL and establish its control plane API. Using the same instantiation for validation and configuration is an added benefit, but is incidental to SwitchV, which discovers deviations between a P4 model and a switch, regardless of how the model is organized, or whether the deviations stem from programmable or fixed-function components in the switch.

**Bounded Internal Resources** Fixed-function switches have a variety of internal mechanisms and resources that are handled by low-level components of the switch. Sometimes it makes sense to model them even if they do not affect packet-forwarding directly, since our P4 programs also aim to capture resource limits and availability. For example, in SAI, VRFs are limited resources that have to be allocated before they can be used. Therefore, we modeled VRFs in our P4 programs (Figure 2) using a table whose P4 semantics is a no-op, but whose semantics in PINS is to allocate and deallocate VRFs when entries are inserted or removed.

**Hashing** Switches often use hashing for load balancing purposes (e.g., to implement WCMP [59]). However, the exact hashing algorithm used is an internal detail that may differ across different switches and may be kept private by vendors. This makes precise modeling of the hashing algorithm challenging. Furthermore, hashing algorithms are often complex, and modeling them would result in complex models that are harder to analyze automatically. We chose to model hashing as an unspecified black box in our P4 programs, which we view as a “free” operation from a validation perspective (§5).

Recently, P4wn [29] proposed a more sophisticated treatment of hashing and other stateful constructs in forwarding pipelines for adversarial testing. Our modeling of hashing is far more primitive, and we did not need to use stateful constructs, such as registers, in our P4 models. It may be interesting to adapt some of P4wn’s gray
We reconcile these differences by introducing an additional logical oracle to check that the switch handled the updates correctly. It then uses an $\text{p4-fuzzer}$ between the expected and observed control plane API of the switch.

Naively generating invalid requests by randomly choosing values (e.g., random table or action IDs) produces “uninteresting” requests. Instead of sampling the space of requests uniformly, we use a mutation-based approach. After generating a valid control plane request to add an 11th entry, the $\text{p4-fuzzer}$ may generate control plane requests that violate no obvious rules in the $\text{P4Runtime}$ specification (provided sufficient resources), or reject the request (regardless of available resources).

Example 1. Consider a $\text{P4}$ program that specifies a match-action table $T$ of size 10. Assume $T$ already has 10 entries and received a control plane request to add an 11th entry. The $\text{P4Runtime}$ specification allows the switch to accept the request and install the entry (provided sufficient resources), or reject the request (regardless of available resources).

Example 2. $\text{P4Runtime}$ supports batch table updates. A single Write RPC may contain $n$ updates, and the switch is free to execute these updates in any order. In theory, this gives rise to $n!$ different executions, though many may lead to the same outcome in practice.

To deal with this, our oracle does not predict a single expected outcome. Instead, the oracle observes the switch’s response to ensure it belongs to the set of valid ones. Note that this requires the oracle to keep track of the switch’s current state.

Under-specified Behaviors The $\text{P4Runtime}$ specification generally allows for more than one possible behavior given a request. Here we discuss two such examples.

Example 1. Consider a $\text{P4}$ program that specifies a match-action table $T$ of size 10. Assume $T$ already has 10 entries and received a control plane request to add an 11th entry. The $\text{P4Runtime}$ specification allows the switch to accept the request and install the entry (provided sufficient resources), or reject the request (regardless of available resources).

Example 2. $\text{P4Runtime}$ supports batch table updates. A single Write RPC may contain $n$ updates, and the switch is free to execute these updates in any order. In theory, this gives rise to $n!$ different executions, though many may lead to the same outcome in practice.

To deal with this, our oracle does not predict a single expected outcome. Instead, the oracle observes the switch’s response to ensure it belongs to the set of valid ones. Note that this requires the oracle to keep track of the switch’s current state.

Valid and Invalid Requests We define a request to be syntactically valid if it conforms to the format specified by the $\text{P4}$ program and the $\text{P4Runtime}$ specification [15]. A request is constraint compliant if it does not violate any user-defined constraints annotated in the $\text{P4}$ program using the $\text{P4-constraints}$ extension [14]. A request is valid if the $\text{P4Runtime}$ specification dictates that it may be accepted by the switch in some state. A request is valid if and only if it is syntactically valid and constraint compliant. Valid requests may still be rejected in some states, for example, a valid request may be rejected due to insufficient resources, or because it tries to delete a non-existent table entry. Conversely, a request is invalid if it must be rejected, i.e., by being syntactically invalid or not constraint compliant.

The syntactic validity of a request can be assessed by analyzing the input $\text{P4}$ program, which determines the appropriate request format (e.g., the allowed headers or actions). The $\text{P4}$ program also includes the constraints as annotations.

4.1 Generating Valid Requests

$\text{p4-fuzzer}$ analyzes information regarding the existing tables in the input $\text{P4}$ program. This includes the table types and the headers and actions that they match. $\text{p4-fuzzer}$ uses this information to generate control plane requests that violate no obvious rules in the $\text{P4Runtime}$ specification. For example, $\text{p4-fuzzer}$ abides by the defined bit-size of each field in the generated table entry, and selects actions from the set of permitted actions in the corresponding table definition. We currently do not enforce constraint compliance, and thus frequently generate invalid requests for tables with constraints (e.g., size $2$ in Figure 3). We discuss ongoing work to support this in §7.

4.2 Generating Invalid Requests

Naively generating invalid requests by randomly choosing values (e.g., random table or action IDs) produces “uninteresting” requests. Such naive random requests are syntactically invalid with a high probability and end up exercising only the first few checks in the switch. This would leave most of the deeper and more complex control space untested.

Instead of sampling the space of requests uniformly, we use a mutation-based approach. After generating a valid control plane
request (as described above), p4-fuzzer applies a mutation randomly chosen from a specified list to produce a new request. Our mutations are based on historical analysis of control and data plane bugs and the expertise of engineers who manually debug them. Many also derive from the P4Runtime specification. These mutations produce entries that are usually "interestingly" invalid. Each invalid request is generated by applying a single mutation to a valid request, and the same valid request can be used many times to produce different invalid requests via different mutations. We give a few example mutations below.

**Single Action Tables** As shown in Figure 3, each table entry must consist of a valid table ID, an action ID permitted by the table, an appropriate number of arguments of appropriate sizes for the action, and at most one match field entry per each key in the table. We can generate “interesting” invalid entries by intentionally modifying a valid entry to violate any one of the above properties. For example, the Invalid ID mutation takes a valid entry and replaces its table, match field, or action ID with an ID that does not exist in the P4 program. Invalid Table Action replaces a valid action ID with an out-of-scope action. Other similar mutations include Invalid Match Type, Duplicate Match Field and Missing Mandatory Match Field

**One-shot Action Selector Programming** [15] Tables of this type allow an entry to map to a set of actions, each accompanied by a strictly positive probability weight. The Invalid Action Selector Weight mutation assigns a non-positive weight to an action in an action selector set. The Invalid Table Implementation mutation attempts to send a table entry with an action set to a single-action table and vice versa.

**Other Mutations** The Invalid Reference mutation picks a non-existing value for a field that refers to another field as specified by the refers_to annotation. Other mutations generate entries that refer to invalid resources (e.g. port or QoS queue), duplicate existing entries, or delete non-existing ones.

### 4.3 Oracle

We built an oracle that encodes the P4Runtime specification to determine if the switch behaves correctly. Due to under-specified behavior, there may be multiple correct output switch states and behaviors for a given request and input state. Attempting to keep track of all valid states throughout a sequence of requests can quickly lead to a state-explosion. Instead, our oracle issues a read to the switch to observe its actual state after a batch of requests, and then determines whether that state is valid or not (given the observed state of the switch prior to the batch). If the new state is indeed valid, our oracle can forget the prior state, and repeat the same process for the next batch of requests. This lets us process the next batch from a single state, reducing the non-determinism to the current batch of requests and a single starting state.

The oracle significantly simplifies the valid and invalid request generation process, since it allows the process to be sound. While the P4Runtime specification is complex, the oracle’s implementation is significantly smaller than the P4Runtime layer on the switch. Furthermore, SwitchV can detect bugs in the oracle implementation since such bugs likely lead to a divergence between the expected and observed switch behavior.

### 4.4 Running Test Requests

We generate many (e.g. few thousands) valid and invalid requests using the above procedure. We then group these requests into several batches. We send the batches to the switch sequentially. The switch may process the requests inside a single batch in parallel and in any order. Therefore, the batches must only include independent requests whose validity does not depend on the order of execution.

We ensure this by automatically analyzing the refers_to annotations in our P4 model, which encode any dependencies between table entries. We use this information to sequence any dependent requests to different batches. Note that for valid requests, we only generate ones that depend on previously installed entries or have no dependencies. Our Invalid Reference mutation generates invalid requests that refer to and thus depend on non-existent entries, to test whether the switch correctly rejects such requests.

### 5 DATA PLANE VALIDATION: P4-SYMBOLIC

**Overview** p4-symbolic validates the packet-forwarding behavior of a switch by generating a set of test packets. SwitchV runs the test packets against the switch and the BMv2 P4 simulator, and monitors their behavior. Mismatches indicate a potential issue in either the switch, model, or simulator. Testers provide p4-symbolic with three inputs: (1) the P4 program that models the switch’s expected behavior, (2) the table entries used to configure both the switch and simulator, and (3) coverage assertions describing the minimum coverage guarantees the generated test packets must meet.

**Symbolic Execution** p4-symbolic maintains a symbolic state $S$ that maps the header and metadata fields from the P4 program to their corresponding symbolic values at the current state of execution. Additionally, p4-symbolic builds a symbolic trace that maps each control flow construct in the program to a symbolic expression which evaluates to true if the construct is executed.

Initially, the symbolic trace is empty as the program is not executed yet, and the symbolic state consists of a mapping of each packet header and metadata field to a unique unconstrained symbolic variable. We denote the set of such variables by $X$. A concrete input test packet is an assignment of concrete values to the variables in $X$. As the program is symbolically analyzed, the symbolic state and trace are mutated with the effects of the analyzed expressions.

At the end of the symbolic execution, the symbolic state $S$ maps each field of the output packet header to a symbolic expression over variables from $X$. We denote these output symbolic expressions by $Y$. The symbolic trace now similarly maps every control flow
construct, such as a specific branch or table entry, to a Boolean symbolic expression over $X$. We denote this complete trace by $T$.

**Coverage Constraints** After symbolic execution is completed, $p4$-symbolic iteratively poses several coverage constraints over $X, Y$, and $T$. Each constraint asserts a certain desired property about the input packet, output packet, or the execution trace. This ensures that generated packets collectively meet the desired coverage requirements. Each constraint is passed to our backend SMT solver, Z3 [17], which produces a satisfying test packet if the constraint is satisfiable.

If $p4$-symbolic is configured to maximize branch coverage, this step produces a sequence of $|T|$ constraints, each asserting that a unique control flow construct (e.g., a branch or table entry) is executed or matched. Alternatively, maximizing trace coverage requires an assertion per each possible combination of program branches and entries. As the number of control constructs in the program increases, the number of such combinations (and thus assertions) increases combinatorically. This makes trace coverage impractical for complex P4 programs. To alleviate this, $p4$-symbolic exposes $X, Y$, and $T$ to test engineers so they can ensure coverage of a selected subset of important traces, allowing for a practical middle ground between branch and trace coverage.

**Decidability** Control blocks in P4 encode single-pass forwarding pipelines. Tables cannot be reused, and there are no mechanisms for iteration and recursion. Furthermore, $p4$-symbolic generates SMT formulas that are quantifier-free. The SMT constraints generated by $p4$-symbolic for typical programs only use the theories of bitvectors and equality, which are decidable. Replacing bitvectors with unbounded integers, it may be possible to build *constrained* examples of P4 programs and coverage assertions that require undecidable theories (e.g., Peano arithmetic). Even then, Z3 is mature and capable of solving many instances of such theories in a reasonable time using its built-in heuristics.

**Limitations** We do not support certain P4 constructs that we did not use in our P4 programs (header stacks, unions, and named calculations). To reduce the implementation effort, we deprioritized the support for generic P4 parsers. Instead, we rely on semi-hardcoded support for parser patterns of interest. Supporting a generic parser is mainly an engineering task that we leave as future work.

**Hashing** P4 programs may compute hashes over packet header fields and other metadata for purposes such as load-balancing. The exact hashing algorithm used by the switch is often unknown and may differ across switches. $p4$-symbolic interprets the hash as a free operation: The output of the hash is allowed to be any value, even in cases where these values are outside the range of the concrete hash. We rely on constraints set further down the symbolic execution to restrict the values the hash can take, for example when the value of the hash is used in a table match. To judge the correctness of the switch, SWITCHV configures the P4 simulator to use round-robin hashing, and runs the test packet through it several times (i.e., until the same behavior occurs twice) to build the set of all possible behaviors, and then checks that it includes the observed switch behavior.

**Trace Isolation** In regular execution, only one branch of a conditional expression is executed, and the remaining ones are ignored. However, this is not the case during symbolic execution where all branches are analyzed. This poses a challenge: We must ensure that the symbolic side effects from all such sibling branches are isolated.

A common approach (e.g. in KLEE [8]) is to symbolically execute each trace in the program in isolation, with a completely separate state. This guarantees isolation of side effects. However, the number of traces in P4 programs is prohibitive: while a P4 program typically includes a handful of *explicit* conditionals (e.g., if statements), the table entries (the number of which in our experience can be in the order of several hundreds) constitute an *implicit* form of branches that can quickly blow-up the number of traces. For example, a simple flow with three consecutive tables, with 100 entries each, would result in $100^3 = 1,000,000$ traces.

Existing work that symbolically executes P4 programs (e.g. [43]) avoids this by reducing the number of table entries, which negatively impacts the coverage. Instead, $p4$-symbolic performs only a single pass over the program, executing branches against the same symbolic state. We ensure isolation by encoding the context of the branch as a logical guard that is applied to all consequent side effects. Guarded assignments, and more generally guarded commands, are a well-known technique dating back to the seminal work of Dijkstra [18]. We adapt this technique to P4, where guards are deduced from branches and control flow statements, but also from table entries and action matches.

**Example** Consider the P4 program from Figure 2. Assume the table entries $v1, i1$, and $i5$ from Figure 3 are passed as inputs to $p4$-symbolic. We focus on $p4$-symbolic as it reaches the application of the ipv4_tbl table (line 23). At this point, the symbolic state $S$ maps headers and metadata fields to their current symbolic values, e.g. $S := \{ipv4.isValid → x_{ip}, ipv4.dst_addr → x_{dst_addr}, orf_id → x_{orf_id}\}$. The current context $C$ captures all the constraints for the program to reach this point of execution, e.g. because the execution is inside the body of a conditional (line 22), its corresponding condition ($x_{ip} = true$) is reflected in $C$.

$p4$-symbolic iterates over the entries in ipv4_tbl in descending order of priority (longest prefix match in this case). In each step, it maps the corresponding entry $e$ to an expression $T[e]$ capturing the condition in which the entry gets matched. The expression is the conjunction of the current context, the constraints on the current values of the header and metadata fields for the entry to match, and the negation of match conditions of higher priority entries. For instance, $T[i5] := C \land (x_{orf_id} = 1) \land match(x_{dst_addr}, 10.0, +* )$ and $T[i1] := C \land (x_{orf_id} = 1) \land match(x_{dst_addr}, 10.0, +* ) \land \neg((x_{orf_id} = 1) \land match(x_{dst_addr}, 10.0, +* ))$. The last conjunct in $T[i1]$ (the negation) ensures the higher priority (longer prefix) entry $i5$ does not match.

After handling matching on the table entries, $p4$-symbolic handles the actions they invoke. Assume that the set_nexthop_id action sets the metadata field nexthop_id to the argument passed to that action. Thus, $p4$-symbolic must set nexthop_id to 10 or 3 if either $i5$ or $i1$ is matched, respectively. For trace isolation, $p4$-symbolic guards this assignment by the condition that expresses when each entry is matched, which gives $S[\text{nexthop_id}] := if T[i5] then 10 else (if T[i1] then 3 else 0)$, where $v$ is the old value prior to the match. At the end of symbolic execution, the current symbolic state $S$ becomes $Y$, e.g. $Y$ maps nexthop_id to
We discuss a few of the more interesting examples to give a flavor of the kinds of bugs SwitchV can find.

As shown in Table 1, SwitchV found bugs across the whole stack. While a plurality of these bugs were found in the new parts of the stacks under development (the P4Runtime and Orchestration Agent), some were found in pre-existing code and components, primarily because they were used in new ways. Additionally, SwitchV found bugs in the P4 programs. These bugs manifested as differences between the observed behavior of the switch (specifically the ASIC behavior) and the P4 program. Upon inspection, we determined that the switch was actually correct, and the P4 programs did not encode our desired specifications correctly. Finally, SwitchV detected bugs in the P4 toolchain, including in the BMv2 P4 simulator and the P4-PDPI [24] framework.

We discuss a few of the more interesting examples to give a flavor of the kinds of bugs SwitchV can find.

In PINS, an application in the P4Runtime server was accidentally sending certain out packets back to the controller via the packet-in [15] mechanism. This was identified via p4-symbolic. The P4Runtime server would get into an inconsistent state after receiving certain sequences of L3 table entry deletions. This was identified by p4-fuzzer. The SyncD binary did not support default
route deletion while other routes were present in the same VRF. This was identified using production table entries to setup p4 symbolic tests. The Switch Linux was running a traditional LLDP networking application, which was interfering with our SDN controller’s LLDP application. This was identified by p4 symbolic, which detects when unexpected packets get punted to the controller.

In Cerberus, p4 symbolic identified that the switch software reversed the destination IP address used for packet encapsulation, because of an issue with endianness. The hardware dropped packets on a port with a certain port speed due to electric interference. This bug was detected by p4 symbolic in a pre-production unit, and was independently detected and fixed by the switch manufacturer.

We changed the underlying chip used in PINS past the mid-point during development. After the change, we noticed a resurgence of bugs in components that had been successfully validated before, including software components and even our P4 program. These bugs were the result of variations in the exact contract provided by the new chip compared to the old one. For example, the new chip has a built-in fixed-function trap that did not exist in the old switch, which immediately punts packets to the data plane.

Some of the bugs do not fit exactly into a single component. Rather, they are symptoms of a larger design issue throughout many components. For example, the new chip imposes restrictions on entries in certain tables that are incompatible with desired ACL behaviors, and with the design decisions that higher software layers made to support these behaviors. We found that 33% of the bugs identified in PINS were integration bugs resulting from a misunderstanding of the contract between two components.

The majority of bugs encountered in PINS were fixed within 14 days, with 33% of bugs fixed within 5 days. This indicates that most of the bugs found by SwitchV were important enough for developers to fix quickly. This is in contrast to other automated techniques we used in the past, where tools would find bugs that developers would not deem important enough to act on. Anecdotally, we filed issues for 3 bugs independent from SwitchV related to bad error messages, and their mean resolution time was much worse, at 66 days. We reported 9 bugs that remain unresolved as of this writing.

### Table 2: Which bugs could be found using the trivial test suite

<table>
<thead>
<tr>
<th>Test</th>
<th>PINS</th>
<th>Cerberus</th>
</tr>
</thead>
<tbody>
<tr>
<td>Set P4Info</td>
<td>22 (18%)</td>
<td>0 (0%)</td>
</tr>
<tr>
<td>Table entry programming</td>
<td>15 (12%)</td>
<td>0 (0%)</td>
</tr>
<tr>
<td>Read all tables</td>
<td>10 (8%)</td>
<td>2 (6%)</td>
</tr>
<tr>
<td>Packet-in</td>
<td>12 (10%)</td>
<td>4 (13%)</td>
</tr>
<tr>
<td>Packet-out</td>
<td>4 (3%)</td>
<td>1 (3%)</td>
</tr>
<tr>
<td>Packet forwarding</td>
<td>0 (0%)</td>
<td>0 (0%)</td>
</tr>
<tr>
<td>Not found by any test above</td>
<td>60 (49%)</td>
<td>25 (78%)</td>
</tr>
</tbody>
</table>

SwitchV reports inconsistencies between the switch and the P4 model but leaves it up to the testers to identify the source of the inconsistency. The number of days until bug resolution, shown in Figure 7, includes the time to identify that source, as well as to apply any relevant fixes. In our experience, most of the reported time was spent either on fixing the bug or on the issue waiting in the backlog pending developers’ availability.

### 6.2 Bug Complexity

While a vast majority of bugs found by SwitchV were deemed important by developers, many of them are simple bugs that might have been detected by simpler alternative forms of testing. To evaluate how many bugs detected by SwitchV are harder to detect via traditional means, we devised the following trivial suite of traditional integration tests:

1. **Set P4Info**: Push the P4Info configuration to the switch.
2. **Table entry programming**: Install a rule in every table, including an ACL entry that punts packets to the controller and an IPv4 route.
3. **Read all tables**: Read back all tables and compare with the set of entries installed earlier.
4. **Packet-in**: Send a packet that matches the previously installed ACL punt rule and check that the correct packet gets received on the packet-io channel.
5. **Packet-out**: Send a packet via packet-out for each port, and ensure the switch sends it out through those ports in the data-plane.
6. **Packet forwarding**: Send an IPv4 packet to the switch and check that it gets forwarded correctly according to the IPv4 route installed earlier.

The tests are meant to be executed in sequence. Table 2 shows the percentage of bugs that would be found by each sub-sequence of tests, excluding bugs that would already be found by earlier tests in that sequence. We observe that about 49% of the bugs from PINS would have been found by the trivial test suite. Some of the bugs not found by our trivial suite may still be found using other reasonable test suites. Additionally, 67% of the identified bugs are restricted to a single component. Many such bugs can be detected using better unit and component tests. We have not used any other kind of integration testing in PINS, and instead relied on SwitchV to find many of the trivial bugs, since it was easier to deploy and use than manually designing test cases.

In Cerberus, we estimate that 78% of the bugs cannot be found by our trivial test suite. This is expected since most of the simpler bugs in Cerberus would have been detected by the vendor during
their testing. Some simple bugs slipped through, partly because we sometimes performed testing with SwitchV in parallel to the vendor. Additionally, this project exhibited more complex bugs because its forwarding pipeline is more complex and feature-rich.

### 6.3 Performance of SwitchV

Table 3 shows the performance of p4-symbolic and p4-fuzzer on two different production P4 programs. We ran each experiment on a single virtual CPU core in a containerized environment. We use p4-symbolic to hit every reachable input table entry at least once (i.e., branch coverage). The generation column measures the time taken for this step (with and without a cache), while the testing column shows the time needed to run the generated packets through the switch and BMv2, and compare their behavior. We configure p4-fuzzer to generate 1000 write requests with approximately 50 table entry updates each. We have found this to be sufficient to catch a wide variety of bugs when run daily. We have not focused on optimizing performance, beyond caching for p4-symbolic, as the current performance is acceptable for our purposes.

#### Caching

The slowest stage in SwitchV is generating the test packets by repeatedly invoking Z3 to solve the SMT constraints produced by p4-symbolic. If the input P4 program and table entries are unchanged from previous runs, or their changes do not affect the SMT constraints, we simply lookup the test packets from a cache. Thus, we only need to run the expensive generation stage when the specifications have changes that affect the SMT formula, which is less frequent than updating the switch stack under validation.

### 7 DISCUSSION

#### P4 as a Specification Language

Our experience demonstrates that P4 can successfully model the behavior of switches, including fixed-function ones. Many of the challenges we encountered while modeling SAI in P4 stem from the flexibility (and thus permissiveness) of P4 and the P4Runtime Protocol compared to the restrictive and fixed semantics of the underlying switches and SAI. These challenges offer important insights that can help design the next generation of modeling and specification tools for networking. Indeed, we developed several P4 extensions [14, 24] to overcome some of these challenges, and up-streamed them as standalone open-source modules integrated into the P4 toolchain.

P4 programs provide a balance between the low-level detail required to capture the correctness of individual switches and the simplicity and formalism required for effective automated analysis. Using P4 programs as specifications allows SwitchV to validate the control plane API of a switch and its data plane behavior. This allows SwitchV to detect bugs that would not be observed using only data plane validation. Furthermore, bugs caused by errors in the control plane API (e.g., during installation of table entries) are detected earlier, rather than during packet forwarding where the root cause is more removed from where the bug occurred (e.g., packets being forwarded via the wrong port because a table entry was installed incorrectly). Currently, table entries generated using fuzzing are only used to validate the control plane API. A potential future extension is to also pass these entries to p4-symbolic and use them with the generated test packets, which can exercise additional control paths during data plane validation.

In other domains, testing tools often rely on “throw-away” specifications whose sole purpose is validation. These specifications can be complex and often use domain-specific modeling languages. Thus, updating and maintaining these specifications as the system describe evolves can require significant effort, and may lead to them going out of sync, even when validation is automated. This is a common problem that extends beyond switch validation to other domains ranging from maintenance of software test cases [41] to formal proofs [51]. By contrast, our P4 specifications are multi-purpose and “organic” to the networking ecosystem, and thus are more likely to be always up-to-date. They are the primary interface that expose new features in PINS, they define the contract with the controller, and they provide a living documentation that engineers can consult, and have additional incentives to continuously maintain. The effectiveness of our P4 models as documentation depends on the familiarity of the readers with P4, and we found that many developers prefer them to having to read thousands of lines of low-level C implementation.

#### Fuzzing

p4-fuzzer relies on mutations to generate valid and invalid control plane requests. This is a common technique that has been used extensively for fuzzing [6, 10, 46]. Unlike state-of-the-art general purpose smart fuzzing tools [52, 57], our fuzzer is intentionally specialized to our target domain with manually curated mutations based on the expertise of network engineers. This allows us to find interesting invalid requests, whose space is much smaller than the entire space of invalid requests, while also minimizing the engineering effort to build and maintain p4-fuzzer.

While our experience demonstrates that our fuzzer is capable of detecting interesting bugs throughout the switch stack, our approach does not provide provable formal coverage guarantees. This is a direction for future research, that may benefit from using smart fuzzing mechanisms, including coverage-based fuzzers, provided that they can be succinctly tuned to the idiosyncrasies of the P4Runtime protocol (e.g., using domain-specific fuzzing tools [47]).

p4-fuzzer may benefit from adding more mutations that increase the complexity of the generated invalid requests. One possibility is to use techniques that deduce mutations algorithmically [35] or via learning [23, 54]. Furthermore, we are currently implementing an explicit mutation for reasoning about P4-constraints based on binary decision diagrams (BDD). The mechanism transforms every constraint in the P4 program into a BDD over the bits of the header and metadata fields referred to in that constraint. We can efficiently sample solutions to this BDD to ensure that our valid
tests are constraint-compliant, and randomly mutate one of the
nodes of the BDD to generate (otherwise valid) table entries that
violate the corresponding constraint.

We considered alternative designs for control plane API valida-
tion that do not rely on fuzzing. While we can easily generate syntac-
tically valid table entries with an SMT solver, it is challenging to gen-
erate “interesting” invalid ones for black box implementations. In-
stead, we would need to use program analysis techniques to analyze
the implementation (specifically, PINS’s P4Runtime layer), and find
invalid entries that exercise different “deep” control paths within
that implementation. This may offer some advantages as it can
provide more precise coverage guarantees, similar to p4-symboli-
c. However, it requires dealing with implementations far more com-
plex than the P4 programs that p4-symboli c analyzes, both in
terms of size and friendliness to automated analysis (the switch’s
implementation is in C rather than P4). Furthermore, the imple-
mentation may frequently change, which may require updating the
automated analysis, e.g. in case unsupported features were used.

While existing tools have had some success with program analysis
of complex system implementations (e.g. KLEE [8]), we opted to rely
on mutation-based fuzzing as it provides a simpler mechanism for
generating useful invalid entries with black box implementations.

Development Processes Using SwitchV
We believe SwitchV is best suited to be run periodically and frequently (e.g. nightly)
during the development of switch stacks. This allows SwitchV
to detects issues, including complex integration bugs, earlier in
the development cycle. Additionally, developers get quick feedback
after their changes and can correct mistakes quickly. Furthermore,
SwitchV can be used earlier and more frequently than more expen-
sive forms of testing, such as DVT or fabric testing, which are
difficult to run on incomplete stacks. We do not recommend replac-
ing fabric testing with SwitchV, rather we recommend running
SwitchV frequently prior to fabric testing to shorten the develop-
ment cycle, and running fabric and in-production testing normally
after specific milestones are achieved.

We track the progress of larger projects at Google by defining
milestones in terms of objectives and key results (OKRs). SwitchV
provides a methodical way of tracking the state of components
of the switch stack, and we found that it provides a natural set
of metrics to measure the progress towards completing an OKR
for some feature F. For example, the percentage of fuzzed table
entries related to F that are correctly handled by the switch, or the
percentage of table entries related to F that produce correct output
packets when hit by test packets using p4-symboli c.

8 RELATED WORK
Automatic test generation is an established technique shown to
be an effective alternative to manual testing in various domains,
including system programs [8], circuits [11], enterprise applications
[53], and GUI applications [42]. P4pktgen [43] and ATPG [58] adapt
this technique for validating the data plane of a switch, but not its
c ontrol plane API.

P4pktgen
P4pktgen uses symbolic execution to analyze P4 pro-
grams and generate test packets. P4pktgen does not take table
entries as inputs. Instead, it generates at most a single entry per
table along side the test packet. Thus, the generated packets have
lower coverage of the control paths in the switch stack. For example,
P4pktgen cannot detect bugs in the switch’s implementation of
priority or longest prefix matching, since such bugs cannot be
observed without at least two correlated entries in the same table.

ATPG
ATPG operates on the switch’s configuration files and FIBs,
and uses them to build a model of the entire network. This model
essentially views a switch as a single match table (i.e. sequence
of rules) that directly match and produce packets. ATPG analyzes
the model and generates test packets that exercise various links
or rules in the network, and test its performance under different
loads. ATPG’s abstract view of a switch cannot represent lower-
level switch functionalities, such as behavior that depends on the
switch state (e.g. NAT), non-determinism (e.g. WCMP), and punt-
ing. However, this level of abstractions allows ATPG to effectively
reason about network-wide properties (e.g. reachability, liveness)
and performance (e.g. congestion).

Header Space Analysis
HSA [30] is a well-established technique for analyzing
packet forwarding behavior, and is used by many

tools (including ATPG) as a foundation for their data plane analysis.
Similarly, HSA could be used for data plane validation in SwitchV
rather than symbolic execution with an SMT solver. We chose our
particular p4-symboli c design because of the availability, flexibil-
ity, and ease of use of off-the-shelf SMT solvers, and our familiarity
with them. This worked well for our use cases at Google as shown
by our empirical results. Our main insights and contributions sur-
round the use of P4 for modeling, which can also be compatible
with HSA and other data plane testing approaches.

Validation vs Verification
While the last decade has seen exciting
progress on network and P4 data plane verification [1, 9, 21, 22,
26, 31, 33, 34, 36, 38, 44, 48], these tools solve a different and
orthogonal problem to SwitchV. The crucial difference is that
such tools never analyze the actual switch or network, but only its
configuration. Thus, they can find bugs in the configuration, but
not in the switch. In contrast, SwitchV finds bugs in the switch, but
not in its configuration. Formal techniques have been used to verify
hardware designs [3, 12], including switch hardware [37]. However,
such techniques cannot be used to verify black box switches, require
significant device-specific modeling, as well as verification overhead
and expertise, and cannot reason about non-hardware layers in
the switch stack. Verification and validation can be combined to reap
the benefits of both. Specifically, P4 verification tools (e.g. p4v [36]
or ASSERT-P4 [21]) can be used to verify that our P4 models indeed
encode our desired correctness properties, to increase our faith in
the fidelity of these models and the validation process as a whole.

ACKNOWLEDGMENTS
We thank the SwitchInfra team at Google for their patience and
support in pursuing this novel approach to switch validation, and
for root causing numerous bugs identified by SwitchV, at times
including false positives. We thank Waqar Mohsin for supporting
us in publishing this work, and Jeff Mogul and Colin Scott for
suggesting many improvements to the paper. We are also grateful
to Malte Schwarzkopf, our shepherd Ang Chen, and five anonymous
SIGCOMM reviewers for their help in improving the paper. This
work was supported in part by NSF grant CNS-2107078.
REFERENCES


### Listing of Selected Bugs Found in PINS

<table>
<thead>
<tr>
<th>Bug Description</th>
<th>Discovery</th>
<th>Component</th>
<th>Days to Resolution</th>
<th>Integration Issue?</th>
<th>Trivial test that would find bug</th>
</tr>
</thead>
<tbody>
<tr>
<td>Deleting non-existing entry causes entire batch to fail</td>
<td>p4-fuzzer</td>
<td>P4Runtime server</td>
<td>14</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>Does not handle MODIFY requests correctly, leaving old action parameters unchanged in table entries</td>
<td>p4-fuzzer</td>
<td>P4Runtime server</td>
<td>4</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>P4Info push failures are not propagated up to the controller.</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>0</td>
<td>yes</td>
<td>Table entry programming</td>
</tr>
<tr>
<td>Does not support reading ternary fields</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>0</td>
<td>no</td>
<td>Read all tables</td>
</tr>
<tr>
<td>Does not capitalize ACL table names</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>16</td>
<td>yes</td>
<td>Table entry programming</td>
</tr>
<tr>
<td>Incorrect error message for duplicate entries</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>2</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>PacketOut packets incorrectly get punted back to controller</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>26</td>
<td>no</td>
<td>Packet-out</td>
</tr>
<tr>
<td>Uses an orchestration agent API that does not support the space character in keys. This leads to the rejection of all ACL table entries.</td>
<td>p4-symbolic</td>
<td>P4Runtime server</td>
<td>34</td>
<td>no</td>
<td>Table entry programming</td>
</tr>
<tr>
<td>Incorrect handling of zero bytes in IDs</td>
<td>p4-fuzzer</td>
<td>P4 Toolchain</td>
<td>22</td>
<td>no</td>
<td>Set P4Info</td>
</tr>
<tr>
<td>Switch rejects WCMP groups with buckets with the same action, violating the P4RT specifications</td>
<td>p4-fuzzer</td>
<td>Orch. Agent</td>
<td>157</td>
<td>yes</td>
<td>Table entry programming</td>
</tr>
<tr>
<td>A bug in WCMP group updating logic causes unchanged group members to get removed</td>
<td>p4-symbolic</td>
<td>Orch. Agent</td>
<td>3</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>VRF deletion fails due to incorrect ALPM flag usage &amp; VRF response path is broken</td>
<td>p4-fuzzer</td>
<td>Orch. Agent and SyncD Binary</td>
<td>15</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>Does not clean up invalid entries in ACL tables, causing RESOURCE_EXHAUSTED error after 30 entries</td>
<td>p4-fuzzer</td>
<td>SyncD Binary</td>
<td>120</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>L3 forwarding not enabled for submit-to-ingress packets, causing them to be dropped with the new chip</td>
<td>p4-symbolic</td>
<td>SyncD Binary</td>
<td>19</td>
<td>yes</td>
<td></td>
</tr>
<tr>
<td>Switch occasionally re-marks DSCP to 0 in forwarded packets</td>
<td>p4-symbolic</td>
<td>SyncD Binary</td>
<td>53</td>
<td>yes</td>
<td></td>
</tr>
<tr>
<td>A port sync daemon restarts unexpectedly, breaking all packet IO</td>
<td>p4-symbolic</td>
<td>Switch Linux</td>
<td>3</td>
<td>yes</td>
<td>Packet-In</td>
</tr>
<tr>
<td>Daemons crash when network interface goes down</td>
<td>p4-symbolic</td>
<td>Switch Linux</td>
<td>164</td>
<td>yes</td>
<td></td>
</tr>
<tr>
<td>Runs a daemon that creates conflicting VRF configurations with other new services</td>
<td>p4-symbolic</td>
<td>Switch Linux</td>
<td>5</td>
<td>yes</td>
<td>Set P4Info</td>
</tr>
<tr>
<td>Switch sends IPv6 router solicitation packets unexpectedly</td>
<td>p4-symbolic</td>
<td>Switch Linux</td>
<td>unresolved</td>
<td>yes</td>
<td></td>
</tr>
<tr>
<td>Runs LLDP causing packets to be punted to controller</td>
<td>p4-symbolic</td>
<td>Switch Linux</td>
<td>9</td>
<td>yes</td>
<td>Packet-In</td>
</tr>
<tr>
<td>Resource guarantees for router_interface_table are unrealistically high for the new chip</td>
<td>p4-symbolic</td>
<td>P4 Program</td>
<td>47</td>
<td>yes</td>
<td></td>
</tr>
<tr>
<td>Header fields get rewritten before ACL is applied</td>
<td>p4-symbolic</td>
<td>P4 Program</td>
<td>14</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>P4 program does not reflect that switch drops IPv4 packets with destination IP 255.255.255.255</td>
<td>p4-symbolic</td>
<td>P4 Program</td>
<td>36</td>
<td>no</td>
<td></td>
</tr>
<tr>
<td>Program matches on the wrong ICMP field</td>
<td>p4-symbolic</td>
<td>P4 Program</td>
<td>13</td>
<td>no</td>
<td>Packet-In</td>
</tr>
</tbody>
</table>