
stream performance 0.3 IPC. This is the average performance of a single-threaded superscalar processor, as shown in Section 4.5.4. In this case, the processor needs from 100 to about 450 streams to sustain the required throughput.

In fact, the theoretical requirements indicated in Figure 5.1 increase in a real environment. Dependencies among threads reduce the linear scalability of performance, especially for architectures with tens to hundreds of streams.
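The stream counts quoted above follow from dividing the aggregate instruction throughput by the per-stream rate. The sketch below illustrates that arithmetic only. The aggregate figures (30 and 135 IPC) are back-computed here from 100 and 450 streams at 0.3 IPC and are not read from Figure 5.1, and the `efficiency` parameter is a hypothetical stand-in for the scalability loss caused by thread dependencies.

```python
from fractions import Fraction
from math import ceil


def streams_needed(aggregate_ipc, per_stream_ipc="0.3", efficiency="1"):
    """Smallest number of streams that sustains aggregate_ipc.

    Arguments are decimal strings so that the division is exact.
    efficiency is a hypothetical factor in (0, 1] that scales down
    what each stream actually delivers.
    """
    delivered = Fraction(per_stream_ipc) * Fraction(efficiency)
    if delivered <= 0:
        raise ValueError("per-stream throughput must be positive")
    return ceil(Fraction(aggregate_ipc) / delivered)


# Aggregate IPC values back-computed from the text (assumption).
low = streams_needed("30")
high = streams_needed("135")
# Same lower bound with a hypothetical 20% scalability loss.
degraded = streams_needed("30", efficiency="0.8")
```

Under this simple model, any efficiency below 1 raises the stream count above the theoretical value, which is the qualitative effect described for real environments.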

5.2 Workload Breakdown

Parallel network applications assign packets to software contexts (i.e. threads). The majority of network systems use the same number of threads as hardware contexts (i.e. streams). In this section we discuss the workload of a given thread assigned to a particular stream.

Figure 5.2: Snort packet processing loop

Figure 5.2 depicts the packet processing loop of Snort. According to the workload classification proposed in Section 4.2 (i.e. self–contained, stateless, stateful), there are three clearly differentiated stages: decoding, preprocessing, and rule matching. Once the stream finishes processing the packet, the system releases the thread and checks whether any packet is waiting to be processed (dotted line).

Figure 5.3 presents the multithreaded Snort design that we develop in our implementation. We replicate the packet processing loop per thread, and the loop is not split across threads. A packet received from the network card (i.e. packet capturing) is placed in memory, where it waits to be processed. The system assigns the packet to a given thread, which runs on a particular stream. Processing starts by decoding the packet header and payload to initialize a number of data structures (i.e. the decoding stage). The decoding

76 Principles of Parallel Stateful Processing

[Figure 5.3 diagram: the Read Packet, Decoding, Preprocessing, Rule Matching loop is replicated per thread, with a thread boundary between the replicas.]

Figure 5.3: Multithreaded Snort packet processing

stage presents self–contained workload, since it only needs data from the packet itself. The preprocessing engine receives the packet information and performs specialized tasks. Preprocessors can behave as either stateless or stateful processing, depending on the configuration. Nevertheless, throughout the experiments of this thesis we enable only stateful preprocessors in order to emphasize stateful processing. Stateful preprocessing keeps track of previously processed packets. Finally, the packet is scrutinized for signature matching. This stage is categorized as stateless workload, since external information is required (e.g. rules, signatures, keywords) but there is no need to keep track of previous packet processing.
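The per-thread loop described above can be sketched as follows. This is a minimal illustration, not Snort's code: the packet format (`flow|payload`), the `SIGNATURES` list, the `flow_state` table, and all function names are hypothetical. Each worker thread runs the whole decode, preprocess, and rule-match sequence for a packet before fetching the next one, so the loop is never split across threads.

```python
import queue
import threading

SIGNATURES = [b"attack", b"exploit"]  # hypothetical read-only rule data
flow_state = {}                       # hypothetical shared per-flow state
flow_lock = threading.Lock()


def decode(raw):
    # Self-contained: uses only the bytes of the packet itself.
    flow_id, _, payload = raw.partition(b"|")
    return {"flow": flow_id, "payload": payload}


def preprocess(pkt):
    # Stateful: reads and updates state left by earlier packets of the flow.
    with flow_lock:
        flow_state[pkt["flow"]] = flow_state.get(pkt["flow"], 0) + 1
        pkt["seq_in_flow"] = flow_state[pkt["flow"]]
    return pkt


def rule_match(pkt):
    # Stateless: consults external rule data, keeps nothing between packets.
    return [sig for sig in SIGNATURES if sig in pkt["payload"]]


def worker(packets, alerts):
    while True:
        raw = packets.get()          # check for a waiting packet
        if raw is None:              # sentinel: no more packets
            break
        pkt = preprocess(decode(raw))
        hits = rule_match(pkt)
        if hits:
            alerts.put((pkt["flow"], hits))


def run(raw_packets, n_threads=2):
    packets, alerts = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(packets, alerts))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for raw in raw_packets:
        packets.put(raw)
    for _ in threads:
        packets.put(None)            # one sentinel per worker
    for t in threads:
        t.join()
    found = []
    while not alerts.empty():
        found.append(alerts.get())
    return sorted(found)


result = run([b"a|hello", b"b|an attack", b"a|exploit attack"])
```

Only the preprocessing step takes a lock in this sketch, which reflects why the stateful stage is the one that introduces dependencies among threads.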

The description of Snort packet processing shows that stateful DPI comprises several workload categories. This observation can be extended to other stateful DPI applications, although with different proportions of each category. In contrast, layer 2–3 applications present a single workload category (e.g. IP forwarding presents stateless workload).

We base our studies on different configurations of Snort as a representative range of stateful DPI applications. We select a number of stateful preprocessors that show different behavior:


• perfmonitor: collects a wide range of packet processing statistics intended for network administrators.

• stream4: provides TCP stream reassembly and stateful analysis capabilities to track simultaneous TCP streams and to ignore stateless attacks.

• frag3: an IP defragmentation module that applies target-based host modeling, an anti–evasion technique that relies on information about how an individual target's IP stack operates.

Each workload presents a different configuration of enabled preprocessors. Table 5.1 lists the enabled preprocessors for each workload identifier. In all configurations we employ the default rule set for rule matching, which includes a total of 3291 rules.

Enabled Preprocessors              Workload ID
Perfmonitor                        Mix-1
Stream4                            Mix-2
Frag3                              Mix-3
Perfmonitor - Stream4              Mix-12
Perfmonitor - Frag3                Mix-13
Stream4 - Frag3                    Mix-23
Perfmonitor - Stream4 - Frag3      Mix-123

Table 5.1: Workload Mixes
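The workload identifiers in Table 5.1 encode the enabled preprocessors as digits (1 for Perfmonitor, 2 for Stream4, 3 for Frag3). The following sketch makes that naming scheme explicit; the function and variable names are illustrative and do not come from Snort.

```python
from itertools import combinations

# Digit-to-preprocessor mapping read off Table 5.1.
PREPROCESSORS = {"1": "perfmonitor", "2": "stream4", "3": "frag3"}


def enabled_preprocessors(workload_id):
    """Return the preprocessors enabled for an ID such as 'Mix-13'."""
    prefix, _, digits = workload_id.partition("-")
    if prefix != "Mix" or not digits or any(d not in PREPROCESSORS for d in digits):
        raise ValueError(f"unknown workload ID: {workload_id!r}")
    return [PREPROCESSORS[d] for d in digits]


def all_workload_ids():
    """Enumerate every non-empty combination, in the order of Table 5.1."""
    keys = sorted(PREPROCESSORS)
    return ["Mix-" + "".join(combo)
            for size in range(1, len(keys) + 1)
            for combo in combinations(keys, size)]


mix_13 = enabled_preprocessors("Mix-13")
ids = all_workload_ids()
```

Three preprocessors give seven non-empty combinations, which matches the seven rows of the table.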

Figure 5.4 shows the Snort workload distribution per packet processing stage. The X–axis indicates the stateful DPI workload mix according to the Snort configuration; the mixes differ in the enabled preprocessors. The top graph shows the number of instructions per packet on the Y–axis. The decoding stage (the bars with diagonal lines) shows a similar workload (about 2K instructions per packet) regardless of the Snort configuration, since there are no directives for setting up the Snort decoder. In contrast, the preprocessing workload (the black bars) ranges from 2K up to about 8.3K instructions per packet, due to the different preprocessor configurations. The rule–matching stage (the gray bars) also shows significant workload variations, although all configurations use the same rule set. The reason is that preprocessing provides further knowledge about the packet, which allows the system to skip subsequent preprocessing or rule matching for a given packet. We can observe that the configurations with reduced preprocessing workload (i.e. Mix–1, Mix–3,


Mix–13) present a higher rule–matching workload than the remaining configurations. Thus, the rule–matching workload is sensitive to the preprocessing configuration.
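The relation between per-stage instruction counts and a percentage breakdown per stage can be illustrated with a short calculation. The counts below are hypothetical: 2000 instructions matches the approximate decoding cost and the lower bound of preprocessing quoted above, while the rule-matching figure is invented for the example and is not a measurement from this thesis.

```python
def stage_shares(instructions):
    """Convert per-stage instruction counts into percentages of the total."""
    total = sum(instructions.values())
    if total <= 0:
        raise ValueError("total instruction count must be positive")
    return {stage: 100 * count / total for stage, count in instructions.items()}


# Hypothetical per-packet instruction counts for one workload mix.
example = {"decoding": 2000, "preprocessing": 2000, "rule_matching": 16000}
shares = stage_shares(example)
```

Because the decoding cost is roughly constant, its share shrinks whenever preprocessing or rule matching grows, even though its absolute instruction count does not change.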

[Figure 5.4 panels: (a) Instructions per Packet, Y-axis from 0 to 60000; (b) Workload Distribution, Y-axis from 0% to 100%. Both panels show Mix-1, Mix-2, Mix-3, Mix-12, Mix-13, Mix-23, Mix-123, and Avg on the X-axis, with Decoding, Preprocessing, and Rule-Matching series.]

Figure 5.4: Snort workload distribution according to the processing stages

In addition, enabling more preprocessors does not increase the workload per packet linearly. Part of the knowledge provided by a given preprocessor (e.g. data structures) is used by other preprocessors. For example, the preprocessing workloads of Mix–1, Mix–3, and Mix–13, or of Mix–1, Mix–2, and Mix–12, show marginal differences, since they share part of the processing workload. In addition, the remaining preprocessing