2.2.1 Systems

To explore the feasibility of programmable dataplanes, we have developed three prototypes: a hardware-based packet processing pipeline, a software-based programmable PHY, and a hardware-based programmable PHY. These three prototypes represent different points in the design space for programmable dataplanes. The software-programmable PHY takes a software-oriented approach that favors software flexibility over hardware performance. Hence, the focus of that work is to optimize software performance while preserving flexibility. The hardware-programmable PHY and the packet processing pipeline take the opposite approach: in both cases, the method favors hardware performance over software flexibility. Hence, the focus of that work is to improve software programmability while preserving the performance gain from hardware. Below, we describe each prototype in more detail.

Hardware Packet Processing Pipeline The first prototype is a programmable packet processing pipeline and a dataplane compiler. The target throughput is 4x10Gbps line rate. The prototype implementation consists of a C++-based compiler along with a runtime system written in a hardware description language. The frontend reuses P4.org's C++ compiler frontend to parse P4 source code and generate an intermediate representation [154]. The custom backend for FPGAs consists of 5000 lines of C++ code. The runtime is developed in Bluespec [147], a high-level hardware description language. Bluespec provides many higher-level hardware abstractions (e.g., FIFOs with back-pressure) and includes a rich library of components, which makes development easier. The runtime is approximately 10,000 lines of Bluespec. The Connectal Project [96] is used to implement the control plane channel, along with mechanisms to replay pcap traces, access control registers, and program dataplane tables.
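As a rough sense of what a 4x10Gbps line-rate target implies, the sketch below computes the worst-case packet rate and the per-packet time budget. It assumes standard Ethernet framing overhead (8 bytes of preamble and start-of-frame delimiter plus a 12-byte minimum inter-packet gap) and minimum-size 64-byte frames. These figures and the function names are illustrative assumptions, not details of the prototype.

```python
# Assumed per-frame on-wire overhead for Ethernet (not taken from the prototype):
# 7-byte preamble + 1-byte start-of-frame delimiter, then a 12-byte minimum gap.
PREAMBLE_SFD_BYTES = 8
MIN_IPG_BYTES = 12


def packets_per_second(link_bps, frame_bytes, ports=1):
    """Maximum frames per second across `ports` links running at `link_bps`."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + MIN_IPG_BYTES) * 8
    return ports * link_bps / wire_bits


def ns_per_packet(link_bps, frame_bytes, ports=1):
    """Average time budget per frame, in nanoseconds, at full line rate."""
    return 1e9 / packets_per_second(link_bps, frame_bytes, ports)


if __name__ == "__main__":
    for size in (64, 512, 1518):
        pps = packets_per_second(10e9, size, ports=4)
        budget = ns_per_packet(10e9, size, ports=4)
        print(f"{size:5d}-byte frames: {pps / 1e6:7.2f} Mpps, {budget:7.2f} ns per packet")
```

Under these assumptions, 64-byte frames on four ports arrive at about 59.5 million packets per second, which leaves roughly 16.8 ns per packet for the pipeline as a whole.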

Software Programmable PHY Our second prototype, a programmable PHY, targets the 10Gbps line rate. The prototype includes an FPGA card, which performs DMA (Direct Memory Access) transfers in both directions between the network transceivers and host memory. The prototype also includes a software component that implements the entire physical layer of the network protocol stack. The software is developed and evaluated on Dell Precision T7500 workstations and Dell T710 servers. Both machines have dual-socket, 2.93 GHz six-core Xeon X5670 (Westmere [15]) processors with 12 MB of shared L3 cache and 12 GB of RAM, 6 GB on each CPU socket. This prototype is used to evaluate both SoNIC and MinProbe, discussed in Chapter 4 and 4.4.
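To see why software performance is the central concern for this prototype, the following back-of-the-envelope sketch estimates the CPU cycle budget for processing the 10GbE physical layer in software. It assumes the standard 10GBASE-R signaling rate of 10.3125 Gbps with 64b/66b encoding, and a single 2.93 GHz core doing all the work. The single-core assumption and the function names are ours for illustration and do not describe how the prototype divides work across cores.

```python
# Assumptions for illustration: standard 10GBASE-R figures and one core at 2.93 GHz.
LINE_RATE_BPS = 10.3125e9   # raw bit rate on the wire, including 64b/66b overhead
BLOCK_BITS = 66             # one 64b/66b block: 2 sync bits + 64 payload bits
CPU_HZ = 2.93e9             # clock rate of the Xeon X5670 used in the evaluation


def blocks_per_second(line_rate_bps=LINE_RATE_BPS, block_bits=BLOCK_BITS):
    """Number of 66-bit blocks arriving per second on one 10GbE port."""
    return line_rate_bps / block_bits


def cycles_per_block(cpu_hz=CPU_HZ, line_rate_bps=LINE_RATE_BPS, block_bits=BLOCK_BITS):
    """CPU cycles available per 66-bit block if one core handles one direction."""
    return cpu_hz / blocks_per_second(line_rate_bps, block_bits)


if __name__ == "__main__":
    print(f"{blocks_per_second() / 1e6:.2f} million blocks per second")
    print(f"{cycles_per_block():.2f} cycles per block on one core")
```

With these numbers a single core has fewer than 20 cycles per 66-bit block, which gives a feel for how tight the software budget is at 10 Gbps.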

Hardware Programmable PHY The third prototype is built as a performance-enhanced version of the second prototype. In particular, the target throughput is 4x10Gbps line rate, which poses a scalability challenge for the software-based approach. As a consequence, the third prototype implements a programmable PHY in hardware. The prototype is built using an Altera DE5 board with a Stratix V FPGA. The implementation includes the entire 10GbE physical layer, written in the Bluespec programming language [147], and extends the physical layer to modify the values carried in idle characters, which is used to implement a zero-cost time synchronization protocol.

2.2.2 Evaluation

We used various types of hardware, networks, and network topologies throughout this dissertation to evaluate our systems. We illustrate them in the following subsections.

Figure 2.6: FPGA development boards used for our research. (a) HiTech Global FPGA board. (b) NetFPGA-SUME board.

Figure 2.7: Our path on the National Lambda Rail. Labeled sites: Boston, Chicago, Cleveland, Cornell (NYC), NYC, and Cornell (Ithaca); the endpoints are labeled Sender and Receiver.

SoNIC To evaluate SoNIC, we connected the SoNIC board and the ALT10G board directly via fiber optics (Figure 2.8a). ALT10G allows us to generate random packets of any length, with the minimum inter-packet gap, and send them to SoNIC. ALT10G also provides us with detailed statistics such as the number of valid/invalid Ethernet frames, and frames with CRC errors. We compared these numbers from ALT10G with statistics from SoNIC to verify the correctness of SoNIC.
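The kind of cross-check described above can be sketched in a few lines: count frames whose frame check sequence (FCS) verifies and frames whose FCS does not, then compare the counters from two observers. The sketch uses the standard Ethernet CRC-32, which matches `zlib.crc32`, with the FCS appended least significant byte first. The helper names and the counter layout are hypothetical; the source does not describe how SoNIC or ALT10G compute their statistics, and a real validity check would also cover frame length and alignment.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Return the frame with its 4-byte Ethernet FCS (CRC-32) appended."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_is_valid(frame: bytes) -> bool:
    """Recompute the CRC-32 over the frame body and compare it to the trailer."""
    if len(frame) < 4:
        return False
    body, received = frame[:-4], frame[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == received


def count_frames(frames):
    """Hypothetical counter layout: frames with a good FCS versus CRC errors."""
    valid = sum(1 for f in frames if fcs_is_valid(f))
    return {"valid": valid, "crc_errors": len(frames) - valid}


if __name__ == "__main__":
    good = append_fcs(bytes(60))                    # 60 zero bytes plus FCS
    corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in the body
    sender_stats = {"valid": 1, "crc_errors": 1}    # stand-in for generator statistics
    receiver_stats = count_frames([good, corrupted])
    print("counters match:", sender_stats == receiver_stats)
```

Any disagreement between the two sets of counters would point to frames that one side decoded differently from the other.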

Further, we created a simple topology to evaluate SoNIC. We used port 0 of the SoNIC server to send packets to the Client server via an arbitrary network, and split the signal with a fiber optic splitter so that the same stream is directed both to the Client and to port 1 of the SoNIC server, which captures the packets (Figure 2.8b). We used various network topologies composed of Cisco 4948 and IBM BNT G8264 switches for the network between the SoNIC server and the Client.

Figure 2.8: Simple evaluation setup. (a) Evaluation setup for SoNIC: SoNIC server and ALT10G. (b) Simple topology for evaluating SoNIC: SoNIC server, splitter, and Client.

MinProbe National Lambda Rail (NLR) was a wide-area network designed for research, and it carried significant cross traffic [148]. We set up a path from Cornell University over NLR that spanned nine routing hops and 2500 miles one-way (Figure 2.7). All the routers in NLR are Cisco 6500 routers. The average round trip time of the path was 67.6 ms, and there was always cross traffic. In particular, many links on our path carried 1 to 4 Gbps of cross traffic during the experiment. Cross traffic was not under our control; however, we received regular measurements of traffic on the external interfaces of all routers.
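For context on the measured round trip time, the sketch below estimates the propagation-only lower bound for a 2500-mile one-way path. It assumes that light travels through fiber at roughly 2e8 m/s (about two thirds of its speed in vacuum) and that the fiber follows the stated distance. Both are illustrative assumptions, and the function name is hypothetical.

```python
METERS_PER_MILE = 1609.344
FIBER_SPEED_M_PER_S = 2.0e8   # assumed propagation speed in optical fiber


def propagation_rtt_ms(one_way_miles, speed_m_per_s=FIBER_SPEED_M_PER_S):
    """Round trip propagation delay in milliseconds, ignoring all other delays."""
    one_way_seconds = one_way_miles * METERS_PER_MILE / speed_m_per_s
    return 2 * one_way_seconds * 1e3


if __name__ == "__main__":
    measured_rtt_ms = 67.6                      # average RTT reported for the path
    floor_ms = propagation_rtt_ms(2500)
    print(f"propagation floor: {floor_ms:.1f} ms")
    print(f"unexplained by propagation: {measured_rtt_ms - floor_ms:.1f} ms")
```

Under these assumptions the propagation floor is about 40.2 ms, so roughly 27.4 ms of the measured 67.6 ms would come from factors this model ignores, such as router processing, queuing, and fiber routes longer than the stated distance.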

P4FPGA We evaluate the performance of P4FPGA-generated designs against a set of representative P4 programs. Each program in our benchmark suite is compiled with the P4FPGA compiler into Bluespec source code, which is then processed by a commercial compiler from Bluespec Inc. to generate Verilog source code. Next, the Verilog source code is processed by the standard Vivado 2015.4 tool from Xilinx, which performs synthesis, placement, routing, and bitstream generation. All of the above steps are automated with a Makefile. The compilation framework supports both the Altera tool suite, Quartus, and the Xilinx tool suite, Vivado. For this evaluation, we only used Vivado. We deployed the compiled bitstream on a NetFPGA SUME platform with a Xilinx Virtex-7 XC7V690T FPGA, which has 32 high-speed serial transceivers that provide PCIe (Gen3 x8) communication and 4 SFP+ ports (10Gbps Ethernet).

P4Paxos We ran our end-to-end experiments on a testbed with four Supermicro 6018U-TRTP+ servers and a Pica8 P-3922 10G Ethernet switch, connected in the topology shown in Figure 6. The servers have dual-socket Intel Xeon E5-2603 CPUs, with a total of 12 cores running at 1.6GHz, 16GB of 1600MHz DDR4 memory, and two Intel 82599 10 Gbps NICs. We installed one NetFPGA SUME [62] board in each server through a PCIe x8 slot, though the NetFPGAs behave as stand-alone systems in our testbed. In each server, two of the four SFP+ interfaces of the NetFPGA SUME board and one of the four SFP+ interfaces provided by the 10G NICs are connected to the Pica8 switch, for a total of 12 SFP+ copper cables. The servers were running Ubuntu 14.04 with Linux kernel version 3.19.0.