
While working on hotplugging and resource allocation, the idea of accessing a PCIe device on the other side of an NTB link occurred to us. Dolphin thought this was possible and had done some preliminary testing, but had not actually tried to interact with a device beyond probing its BAR areas. This was done using their API function AttachPhysicalMemory, which allowed us to map arbitrary physical memory addresses. We tried mapping the BAR areas of a PCIe device with this function, and it allowed us to read and write a device’s BAR areas in one host from another host. We believed that this could be used to essentially remote control a PCIe device from another host. Memory operations forwarded from the NTB to the device are sent directly to the device, not through the other host’s CPU or chipset. This not only ensures high performance, but also avoids incurring overhead on the other host’s CPU or chipset. The only effect on the other host is the traffic between the NTB and the device, and the traffic on the NTB itself, which that host might also be using.
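The forwarding described above can be pictured as a simple address translation. The following is a minimal sketch of that idea, not Dolphin's API or implementation: the `ntb_window` structure, its fields, and `ntb_translate` are hypothetical names introduced here for illustration, and a real NTB performs this translation in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one NTB window: accesses that hit
 * [local_base, local_base + size) on the borrowing host are forwarded
 * to [remote_base, remote_base + size) on the owning host. */
struct ntb_window {
    uint64_t local_base;   /* start of the window in the local address space */
    uint64_t remote_base;  /* address it maps to remotely, e.g. a device BAR */
    uint64_t size;         /* window length in bytes */
};

/* Translate a local physical address into the corresponding remote address.
 * Returns false when the address falls outside the window. */
static bool ntb_translate(const struct ntb_window *w, uint64_t local_addr,
                          uint64_t *remote_addr)
{
    assert(w != NULL && remote_addr != NULL);
    if (local_addr < w->local_base || local_addr - w->local_base >= w->size)
        return false;
    *remote_addr = w->remote_base + (local_addr - w->local_base);
    return true;
}
```

In this model, a register at offset 0x10 in the remote device's BAR is reached by accessing offset 0x10 in the local window; the CPU of the owning host takes no part in the translation.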

Using an NTB was a promising idea because it side-stepped the problems we had met so far. First of all, resource allocation should not be a problem, as the remote device would not be part of the user’s PCIe fabric. Instead, the resources would be consumed from the NTB itself. Secondly, it should allow for greater flexibility than both MR-IOV and PCIe switch partitioning, since it is entirely controlled by software. This is because it is built on top of PCIe instead of into PCIe itself, and can co-exist with existing hardware and software.

The NTB idea showed a lot of promise, and Ladon showed that it was possible (see section 2.10.4). We had, however, not discovered this work at the time. We feared that we would not be able to complete such an implementation in time, or that it would be impossible due to some technical detail we had not thought of. When we later discovered Ladon and Merlin, we were convinced that this would be possible. Still, there was much to do, as implementation-specific details are scarce in Ladon and Merlin.

For this idea to offer a good alternative to normal PCIe switches and MR-IOV, existing drivers would need to be supported. We also wanted the requirements and the modifications to the kernel to be minimal. We decided to attempt a proof of concept with the goal of using an unmodified device driver to operate a PCIe device behind an NTB on another host. The proof of concept was developed for Linux and uses Dolphin’s NTBs and software.

4.3.1 Device lending

We invented the term "device lending" to describe our NTB remote access solution. Compared to MR-IOV, device lending offers many of the same capabilities and advantages. Unlike MR-IOV with its Virtual Hierarchies (VH), however, device lending does not change the hierarchy of the PCIe fabric. Instead, everything stays standard PCIe, but with the added ability for a device to be accessed remotely. A remotely accessed device will stay in the owner host’s PCIe fabric, and will not be part of the other host’s PCIe fabric.

[Figure 4.1: two-panel diagram. Left panel labels: Dev, Host A, Host B, MRA switch, VF 1, VF 2. Right panel labels: Host A, Host B, Dev, NTB, NTB, PCIe switch, PCIe switch.]

Figure 4.1: Left: MR-IOV. Right: A host borrowing a device from another host using an NTB.

Additionally, NTB based device lending is more flexible than MR-IOV in which devices can be borrowed or shared. A comparison can be seen in figure 4.1. MR-IOV requires all devices assigned to the VH to be downstream of the MRA switch the host is connected to. This makes MR-IOV incapable of sharing a device locally connected to a host, since a local device will probably be upstream of the MRA switch. Sharing local devices is not possible with either MR-IOV or a partitionable switch, yet it has some distinct advantages. A locally connected device can have 16 PCIe lanes directly to the CPU. The bandwidth and latency between this device and the CPU will be the best the machine can provide. A device that is externally connected will often have limited bandwidth to the CPU. For instance, the cable used by Dolphin has only 8 lanes, half of what an internal slot can provide. While 16 lane external cables exist, the bandwidth of the external link is still shared by all devices connected behind it. In addition, externally connected devices will have higher latency, even if it is still very low. We believe this is a compelling argument for connecting as many devices with high bandwidth requirements as possible locally. Our NTB based device lending scheme allows such devices to be used remotely, while still giving them optimal bandwidth when used locally.
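To put rough numbers on the lane argument, the sketch below computes the per-direction line rate of a PCIe link after encoding overhead. The text does not state which PCIe generation is used, so the generation is a parameter and any concrete figure is an assumption. The function names are hypothetical, packet and protocol overhead are ignored, and splitting a shared link evenly between active devices is a simplification.

```c
#include <assert.h>

/* Per-lane, per-direction rate in MB/s after line encoding.
 * Gen1: 2.5 GT/s, 8b/10b. Gen2: 5 GT/s, 8b/10b. Gen3: 8 GT/s, 128b/130b.
 * Returns -1.0 for generations this sketch does not cover. */
static double pcie_lane_mbytes_per_s(int gen)
{
    switch (gen) {
    case 1: return 2500.0 * 8.0 / 10.0 / 8.0;    /* 250 MB/s */
    case 2: return 5000.0 * 8.0 / 10.0 / 8.0;    /* 500 MB/s */
    case 3: return 8000.0 * 128.0 / 130.0 / 8.0; /* about 985 MB/s */
    default: return -1.0;
    }
}

/* Rate of a whole link with the given number of lanes. */
static double pcie_link_mbytes_per_s(int gen, int lanes)
{
    double per_lane = pcie_lane_mbytes_per_s(gen);
    assert(lanes > 0);
    return per_lane < 0.0 ? -1.0 : per_lane * lanes;
}

/* Simplified share per device when several equally busy devices
 * sit behind the same external link. */
static double pcie_share_mbytes_per_s(int gen, int lanes, int active_devices)
{
    double link = pcie_link_mbytes_per_s(gen, lanes);
    assert(active_devices > 0);
    return link < 0.0 ? -1.0 : link / active_devices;
}
```

Assuming Gen2 purely for illustration, an internal x16 slot gives 8000 MB/s per direction and an x8 cable 4000 MB/s; four equally busy devices behind that cable would get about 1000 MB/s each.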

A significant advantage MR-IOV has over PCIe switches with partitioning is MRA capable devices. These devices can be controlled by multiple hosts at the same time, like SR-IOV for multiple hosts, as described in the section about MR-IOV, 2.4.3. This advantage is, however, limited by the fact that few or no MRA capable devices are currently available. Further, we hope that device lending can apply to the virtual devices created by an SR-IOV device. If so, such a device could be shared among the hosts. This has also been demonstrated to work, see section 2.10.3. This would make our solution one of the very few that can share PCIe devices between multiple hosts. We believe it also has the potential to be one of the cheapest.

For our NTB based device lending to work with unmodified drivers, we would need to fool the drivers into thinking they were interacting with a local device. Since the drivers interact directly with the PCI subsystem, we needed a way to get the PCI subsystem to play along.

/* Low-level architecture-dependent routines */
struct pci_ops {
        int (*read)(struct pci_bus *bus, unsigned int devfn, int where,
                    int size, u32 *val);
        int (*write)(struct pci_bus *bus, unsigned int devfn, int where,
                     int size, u32 val);
};

Code snippet 4.1: pci_ops structure in the Linux kernel (include/linux/pci.h)
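Because pci_ops is a table of function pointers, configuration space accesses can in principle be redirected by supplying different functions. The following user-space sketch illustrates that mechanism only; it is not the implementation described in this thesis. The types are stand-ins for the kernel ones, and `shadow_cfg`, `shadow_read`, `shadow_write`, and `shadow_ops` are hypothetical names. The byte buffer stands in for whatever mechanism would supply the remote device's configuration space, and real configuration registers have read-only fields and side effects that the buffer does not model.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* User-space stand-ins for the kernel types, for illustration only. */
struct pci_bus { void *sysdata; };

struct pci_ops {
    int (*read)(struct pci_bus *bus, unsigned int devfn, int where,
                int size, u32 *val);
    int (*write)(struct pci_bus *bus, unsigned int devfn, int where,
                 int size, u32 val);
};

/* Same values as the kernel's PCIBIOS_* return codes. */
#define PCIBIOS_SUCCESSFUL          0x00
#define PCIBIOS_BAD_REGISTER_NUMBER 0x87

#define CFG_SPACE_SIZE 256

/* Hypothetical per-bus state: a copy of one device's configuration space. */
struct shadow_cfg { uint8_t bytes[CFG_SPACE_SIZE]; };

static int valid_access(int where, int size)
{
    return (size == 1 || size == 2 || size == 4) &&
           where >= 0 && where + size <= CFG_SPACE_SIZE;
}

/* Serve a config read from the shadow copy (little-endian register layout). */
static int shadow_read(struct pci_bus *bus, unsigned int devfn, int where,
                       int size, u32 *val)
{
    struct shadow_cfg *cfg = bus->sysdata;
    (void)devfn; /* single device in this sketch */
    if (!valid_access(where, size))
        return PCIBIOS_BAD_REGISTER_NUMBER;
    *val = 0;
    for (int i = 0; i < size; i++)
        *val |= (u32)cfg->bytes[where + i] << (8 * i);
    return PCIBIOS_SUCCESSFUL;
}

/* Store a config write into the shadow copy. */
static int shadow_write(struct pci_bus *bus, unsigned int devfn, int where,
                        int size, u32 val)
{
    struct shadow_cfg *cfg = bus->sysdata;
    (void)devfn;
    if (!valid_access(where, size))
        return PCIBIOS_BAD_REGISTER_NUMBER;
    for (int i = 0; i < size; i++)
        cfg->bytes[where + i] = (uint8_t)(val >> (8 * i));
    return PCIBIOS_SUCCESSFUL;
}

static struct pci_ops shadow_ops = {
    .read  = shadow_read,
    .write = shadow_write,
};
```

A caller that only goes through `shadow_ops.read` and `shadow_ops.write` cannot tell whether the values come from local hardware or from the buffer, which is the property the drivers would need to be fooled.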