The Tuple Space Search technique, originally proposed in [127], is a dynamic classification algorithm that builds upon the good lookup performance of hash tables in order to process incoming network packets. Tuple Space Search builds upon the observation that the rules $R_1, \dots, R_n$ in a $d$-dimensional rule set $R$ can be partitioned into $m$ equivalence classes $C_1, \dots, C_m$ based on the rules' structural properties. These equivalence classes are effectively sub rule sets and can, by exploiting the structure of all rules within one specific equivalence class, quickly be searched independently. One important aspect behind this approach is that, in typical rule sets, the number of equivalence classes is much smaller than the number of rules, i.e., $m \ll n$ [127].
In order to determine the equivalence classes, a prefix rule
$$R_i = \left( h_1 \in x^i_1/y^i_1, \dots, h_d \in x^i_d/y^i_d, a_i \right) \tag{4.1}$$
is mapped to the $d$-tuple $t_i = \left( y^i_1, \dots, y^i_d \right)$. Two prefix rules $R_i$ and $R_j$ are said to be in the same equivalence class if their corresponding tuples $t_i$ and $t_j$ are equal, i.e., $t_i = t_j$. However, if a rule is not in prefix format, it must first be converted into a set of prefix rules before the tuple mapping can take place. Unfortunately, this procedure increases preprocessing time and will, in many cases, blow up the size of the stored rule set, as a non-prefix rule often cannot be mapped one-to-one to a single prefix rule. Alternatively, the non-prefix rules can be kept in a separate list which is searched linearly during the classification process. This approach avoids the increase of the rule set size, but can result in poor classification performance if there are too many non-prefix rules. In the remainder of this section, however, we will assume that every rule $R_i$ is in prefix format in order to focus on the actual classification algorithm.
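The tuple mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `PrefixRule` type, the function names, and the example rules are hypothetical and not taken from [127]; each subnet check is assumed to be an (address, prefix length) pair over 32-bit fields.

```python
from collections import defaultdict
from typing import NamedTuple


class PrefixRule(NamedTuple):
    """Hypothetical rule type: one (address, prefix_length) pair per field."""
    checks: tuple   # ((x_1, y_1), ..., (x_d, y_d))
    action: str     # a_i


def rule_tuple(rule):
    """Map a prefix rule to its d-tuple of prefix lengths (y_1, ..., y_d)."""
    return tuple(length for _, length in rule.checks)


def equivalence_classes(rules):
    """Group rules with equal tuples into the same equivalence class."""
    classes = defaultdict(list)
    for rule in rules:
        classes[rule_tuple(rule)].append(rule)
    return dict(classes)


# Made-up two-dimensional example rules (source prefix, destination prefix).
rules = [
    PrefixRule(((0x0A000000, 8), (0xC0A80000, 16)), "allow"),   # 10.0.0.0/8,  192.168.0.0/16
    PrefixRule(((0x0B000000, 8), (0xAC100000, 16)), "deny"),    # 11.0.0.0/8,  172.16.0.0/16
    PrefixRule(((0x0A010000, 16), (0xC0A80100, 24)), "allow"),  # 10.1.0.0/16, 192.168.1.0/24
]
classes = equivalence_classes(rules)
```

In this example the first two rules share the tuple (8, 16) and therefore fall into one class, while the third rule forms its own class with the tuple (16, 24), so n = 3 rules yield m = 2 classes.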
After the tuples have been extracted from every rule, a hash map $M_t$ is created for every distinct tuple $t$. Subsequently, every rule $R_i$ is inserted into the hash map $M_{t_i}$ that corresponds to the rule's tuple $t_i$. Here, a rule $R_i$'s hash key is computed using the relevant leftmost $y^i_j$ bits of the net address $x^i_j$ from the rule's subnet checks, which we denote by $x^i_j|_{y^i_j}$. For a given hash function $f$, the hash key $k(R_i)$ for the prefix rule $R_i$ can be obtained by hashing the concatenated relevant parts of the subnet checks, i.e.,
$$k(R_i) = f\left( x^i_1|_{y^i_1} \, x^i_2|_{y^i_2} \dots x^i_d|_{y^i_d} \right). \tag{4.2}$$
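As a rough sketch of the key computation in (4.2) and the construction of the hash maps, consider the following Python fragment. Several assumptions are made here for illustration only: fields are 32 bits wide, a Python tuple of the truncated addresses stands in for the bit-string concatenation, Python's built-in dictionary hashing plays the role of $f$, and a rule's position in the input list is used as its priority. The function names are hypothetical.

```python
from collections import defaultdict

WIDTH = 32  # assumed field width in bits (e.g. IPv4 addresses)


def leftmost_bits(value, length, width=WIDTH):
    """Return the leftmost `length` bits of a `width`-bit value."""
    return value >> (width - length)


def rule_key(checks):
    """Collect the relevant bits of every subnet check into one key.

    The tuple stands in for the concatenation in (4.2); the dict applies
    its own hash function to it, which takes the place of f.
    """
    return tuple(leftmost_bits(addr, length) for addr, length in checks)


def build_hash_maps(rules):
    """Create one hash map per distinct tuple and insert every rule.

    `rules` is a list of (checks, action) pairs; the list index serves as
    the priority (lower index = higher priority), which is an assumption.
    """
    maps = defaultdict(lambda: defaultdict(list))
    for priority, (checks, action) in enumerate(rules):
        t = tuple(length for _, length in checks)
        maps[t][rule_key(checks)].append((priority, action))
    return maps


rules = [
    ([(0x0A000000, 8), (0xC0A80000, 16)], "allow"),   # 10.0.0.0/8,  192.168.0.0/16
    ([(0x0B000000, 8), (0xAC100000, 16)], "deny"),    # 11.0.0.0/8,  172.16.0.0/16
    ([(0x0A010000, 16), (0xC0A80100, 24)], "allow"),  # 10.1.0.0/16, 192.168.1.0/24
]
maps = build_hash_maps(rules)
```

Because all rules within one hash map share the same prefix lengths, the truncated fields have fixed widths within that map, so the key identifies the relevant address bits unambiguously.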
After all $(k(R_i), R_i)$ pairs have been stored in the hash maps, an incoming packet $p$ can be classified by linearly probing the $m$ hash maps. In order to search for rules that match the packet $p$ within a hash map $M_{t=(y_1, \dots, y_d)}$, a hash key $k(p)$ is constructed using $p$'s header fields analogously to the rule hashing procedure, i.e.,
$$k(p) = f\left( h^p_1|_{y_1} \, h^p_2|_{y_2} \dots h^p_d|_{y_d} \right). \tag{4.3}$$
Subsequently, $k(p)$ is used to locate all rules in $M_t$ that could potentially match $p$, which are those rules that also hash to $k(p)$. Of these rules, the most highly prioritized matching rule $R_{M_t}$ is extracted. Note that, in general, this process must be repeated for all $m$ hash maps, since the rule hashing process does not preserve the initial rule ordering. Finally, after all hash maps have been traversed, the overall most highly prioritized matching rule $R_{i^*}$ is determined from the rules that are most highly prioritized in their respective hash maps. Both the computation of the hash maps from a two-dimensional initial rule set $R$ and the classification operation for a packet $p$ are illustrated in Figure 4.2.
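The classification procedure can be sketched as follows. This is a minimal, self-contained illustration rather than the implementation from [127]: it assumes 32-bit fields, uses the rule's list index as its priority (lower index wins), and uses the tuple of truncated header fields directly as the dictionary key, so rules found under a key are guaranteed to match. An implementation that stores only a hash value would still have to verify each candidate against the packet. All names and example rules are hypothetical.

```python
WIDTH = 32  # assumed field width in bits


def key_for(values, lengths, width=WIDTH):
    """Truncate each field to its leftmost y_j bits, as in (4.2) and (4.3)."""
    return tuple(v >> (width - y) for v, y in zip(values, lengths))


def build(rules):
    """Build {tuple -> {key -> [(priority, action), ...]}} from prefix rules."""
    maps = {}
    for priority, (checks, action) in enumerate(rules):
        addrs = [addr for addr, _ in checks]
        t = tuple(length for _, length in checks)
        maps.setdefault(t, {}).setdefault(key_for(addrs, t), []).append((priority, action))
    return maps


def classify(maps, packet):
    """Probe every hash map and return the best (priority, action), or None."""
    best = None
    for t, table in maps.items():            # one probe per distinct tuple
        candidates = table.get(key_for(packet, t), [])
        if candidates:
            local_best = min(candidates)     # highest-priority rule within M_t
            if best is None or local_best < best:
                best = local_best            # overall best across all maps
    return best


rules = [
    ([(0x0A010000, 16), (0xC0A80100, 24)], "allow"),  # 10.1.0.0/16, 192.168.1.0/24
    ([(0x0A000000, 8), (0xC0A80000, 16)], "deny"),    # 10.0.0.0/8,  192.168.0.0/16
    ([(0x00000000, 0), (0x00000000, 0)], "drop"),     # catch-all rule
]
maps = build(rules)

hit_specific = classify(maps, (0x0A010203, 0xC0A80107))  # 10.1.2.3 -> 192.168.1.7
hit_general = classify(maps, (0x0A090909, 0xC0A80505))   # 10.9.9.9 -> 192.168.5.5
hit_default = classify(maps, (0x08080808, 0x01010101))   # 8.8.8.8  -> 1.1.1.1
```

The first packet matches rules in all three hash maps, which shows why every map has to be probed: the best match is only known after the per-map winners have been compared.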
Like Linear Search, which was discussed in Section 4.1, Tuple Space Search has a memory footprint linear in the number of rules, since every rule is stored in a single hash map, plus a small bookkeeping overhead introduced by the hash maps. However, if we assume amortized O(1) hash map lookup performance, probing m hash maps is still significantly faster than linearly scanning n rules (if m ≪ n). Moreover, Tuple Space Search allows for dynamic rule set updates that do not require a rebuild of the entire search data structure, as rule insertions or deletions can be performed through the corresponding hash map operations. These performance characteristics are summarized in Table 4.2.
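To illustrate why updates do not require a rebuild, the following sketch performs rule insertion and deletion purely through dictionary operations. The `TupleSpace` class and its method names are hypothetical and chosen for this example only; 32-bit fields and unique integer priorities are assumed.

```python
class TupleSpace:
    """Hypothetical container: tuple -> {key -> {priority: action}}."""

    def __init__(self, width=32):
        self.width = width
        self.maps = {}

    def _locate(self, checks):
        """Return the rule's tuple and its key within the matching hash map."""
        t = tuple(length for _, length in checks)
        key = tuple(addr >> (self.width - length) for addr, length in checks)
        return t, key

    def insert(self, priority, checks, action):
        """Insert a rule; a new hash map is created only for an unseen tuple."""
        t, key = self._locate(checks)
        self.maps.setdefault(t, {}).setdefault(key, {})[priority] = action

    def delete(self, priority, checks):
        """Delete a rule and drop containers that became empty."""
        t, key = self._locate(checks)
        bucket = self.maps[t][key]
        del bucket[priority]
        if not bucket:
            del self.maps[t][key]
        if not self.maps[t]:
            del self.maps[t]   # m shrinks when a tuple has no rules left


space = TupleSpace()
space.insert(0, [(0x0A000000, 8), (0xC0A80000, 16)], "allow")
space.insert(1, [(0x0B000000, 8), (0xAC100000, 16)], "deny")
space.insert(2, [(0x0A010000, 16), (0xC0A80100, 24)], "allow")
maps_after_inserts = len(space.maps)

space.delete(2, [(0x0A010000, 16), (0xC0A80100, 24)])
maps_after_delete = len(space.maps)
```

Each operation touches only the one hash map that belongs to the rule's tuple; all other maps remain unchanged.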
A practical implementation of the Tuple Space Search algorithm can be found in Open vSwitch [105]. According to the authors of [105], Tuple Space Search was selected over faster classification algorithms due to its small storage requirements as well as its capability for quick rule set updates. Furthermore, although Tuple Space Search was primarily designed for use on general-purpose CPU systems [127], it has successfully been implemented and evaluated on a GPU platform [138]. On such a platform, all hash maps can be queried in parallel, which further accelerates the lookup operation.

Fig. 4.2: Sketch for Tuple Space Search preprocessing and classification operations.

| Classification operation | Data structure creation | Data structure update | Memory requirements |
|---|---|---|---|
| O(m) | O(d · n) | O(1) | O(d · n + m) |

n: number of rules, d: number of fields, m: number of distinct tuples

Tab. 4.2: (Amortized) Tuple Space Search performance characteristics.