A search through an array is sometimes also called a table search, particularly if the keys are themselves structured objects, such as arrays of numbers or characters. The latter is a frequently encountered case; the character arrays are called strings or words. Let us define a type String as
String = ARRAY M OF CHAR
and let order on strings x and y be defined as follows:
(x = y) ≡ (Aj: 0 ≤ j < M : x_j = y_j)
(x < y) ≡ (Ei: 0 ≤ i < M : ((Aj: 0 ≤ j < i : x_j = y_j) & (x_i < y_i)))
In order to establish a match, we evidently must find all characters of the comparands to be equal. Such a comparison of structured operands therefore turns out to be a search for an unequal pair of comparands, i.e. a search for inequality. If no unequal pair exists, equality is established. Assuming that the words are quite short, say fewer than 30 characters, we shall use a linear search in the following solution.
In most practical applications, one wishes to consider strings as having a variable length. This is accomplished by associating a length indication with each individual string value. Using the type declared above, this length must not exceed the maximum length M. This scheme allows for sufficient flexibility for many cases, yet avoids the complexities of dynamic storage allocation. Two representations of string lengths are most commonly used:
1. The length is implicitly specified by appending a terminating character which does not otherwise occur. Usually, the non-printing value 0X is used for this purpose. (It is important for the subsequent applications that it be the least character in the character set).
2. The length is explicitly stored as the first element of the array, i.e. the string s has the form s = s_0, s_1, s_2, ... , s_{N-1}
where s_1 ... s_{N-1} are the actual characters of the string and s_0 = CHR(N). This solution has the advantage that the length is directly available, and the disadvantage that the maximum length is limited to the size of the character set, that is, to 256 in the case of the ASCII set.
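The two representations can be illustrated with a minimal Python sketch. It is not part of the original text: the helper names, the value M = 8, and the use of 0X as padding in the second scheme are assumptions made only for illustration.

```python
M = 8  # hypothetical maximum array length, as in String = ARRAY M OF CHAR
TERMINATOR = "\x00"  # the non-printing value 0X


def to_terminated(text):
    """Scheme 1: the characters followed by the terminator, padded to M cells."""
    if len(text) > M - 1:
        raise ValueError("no room left for the terminating character")
    return list(text) + [TERMINATOR] * (M - len(text))


def terminated_length(s):
    """The length is implicit: scan until the terminator is found."""
    i = 0
    while s[i] != TERMINATOR:
        i += 1
    return i


def to_prefixed(text):
    """Scheme 2: s[0] = CHR(N), where N counts s[0] plus the characters."""
    n = len(text) + 1
    if n > M:
        raise ValueError("string exceeds the maximum length M")
    # Padding with 0X is an assumption; the unused cells could hold anything.
    return [chr(n)] + list(text) + [TERMINATOR] * (M - n)


def prefixed_length(s):
    """The length is explicit: read it from the first element."""
    return ord(s[0]) - 1
```

In the first scheme the length costs a scan over the string, whereas in the second it is a single array access, which is the trade-off described above.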
For the subsequent search algorithm, we shall adhere to the first scheme. A string comparison then takes the form
i := 0;
WHILE (x[i] = y[i]) & (x[i] # 0X) DO i := i+1 END
The terminating character now functions as a sentinel. The loop invariant is
Aj: 0 ≤ j < i : x_j = y_j ≠ 0X
and the resulting condition is therefore
((x_i ≠ y_i) OR (x_i = 0X)) & (Aj: 0 ≤ j < i : x_j = y_j ≠ 0X)
It establishes a match between x and y, provided that x_i = y_i, and it establishes x < y, if x_i < y_i.
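The comparison loop can be tried out with the following Python sketch, which is an illustration rather than the author's program. The function name compare and its three-valued result are assumptions. Both arguments are assumed to end with the terminator 0X.

```python
TERMINATOR = "\x00"  # the value 0X, assumed to be the least character


def compare(x, y):
    """Return -1, 0 or 1 as x < y, x = y or x > y.

    Both x and y must contain a terminating TERMINATOR character.
    """
    i = 0
    # Invariant: x[j] == y[j] != TERMINATOR for all 0 <= j < i.
    while x[i] == y[i] and x[i] != TERMINATOR:
        i += 1
    # Here: x[i] != y[i], or x[i] == TERMINATOR.
    if x[i] == y[i]:
        return 0  # both strings ended at the same position
    return -1 if x[i] < y[i] else 1
```

Because the terminator is the least character, a proper prefix compares as smaller: compare("ab\x00", "abc\x00") meets the pair (0X, "c") at i = 2 and reports x < y.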
We are now prepared to return to the task of table searching. It calls for a nested search, namely a search through the entries of the table, and for each entry a sequence of comparisons between components. For example, let the table T and the search argument x be defined as
T: ARRAY N OF String; x: String
Assuming that N may be fairly large and that the table is alphabetically ordered, we shall use a binary search. Using the algorithms for binary search and string comparison developed above, we obtain the following program segment.
L := 0; R := N;
WHILE L < R DO
  m := (L+R) DIV 2; i := 0;
  WHILE (T[m,i] = x[i]) & (x[i] # 0X) DO i := i+1 END ;
  IF T[m,i] < x[i] THEN L := m+1 ELSE R := m END
END ;
IF R < N THEN i := 0;
  WHILE (T[R,i] = x[i]) & (x[i] # 0X) DO i := i+1 END
END
(* (R < N) & (T[R,i] = x[i]) establish a match *)

1.9.4. Straight String Search
A frequently encountered kind of search is the so-called string search. It is characterized as follows. Given an array s of N elements and an array p of M elements, where 0 < M < N, declared as
s: ARRAY N OF Item
p: ARRAY M OF Item
string search is the task of finding the first occurrence of p in s. Typically, the items are characters; then s may be regarded as a text and p as a pattern or word, and we wish to find the first occurrence of the word in the text. This operation is basic to every text processing system, and there is obvious interest in finding an efficient algorithm for this task. Before paying particular attention to efficiency, however, let us first present a straightforward searching algorithm. We shall call it straight string search.
A more precise formulation of the desired result of a search is indispensable before we attempt to specify an algorithm to compute it. Let the result be the index i which points to the first occurrence of a match of the pattern within the string. To this end, we introduce a predicate P(i, j):
P(i, j) = Ak : 0 ≤ k < j : s_{i+k} = p_k
Then evidently our resulting index i must satisfy P(i, M). But this condition is not sufficient. Because the search is to locate the first occurrence of the pattern, P(k, M) must be false for all k < i. We denote this condition by Q(i).
Q(i) = Ak : 0 ≤ k < i : ~P(k, M)
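The two predicates can be written down directly as executable checks. The Python sketch below is only an illustration of the specification, not an efficient algorithm. Passing s and p as explicit arguments and the helper name first_occurrence are assumptions.

```python
def P(s, p, i, j):
    """P(i, j): the first j items of p agree with s starting at index i."""
    return all(s[i + k] == p[k] for k in range(j))


def Q(s, p, i):
    """Q(i): no complete match of p starts at any index k < i."""
    M = len(p)
    return all(not P(s, p, k, M) for k in range(i))


def first_occurrence(s, p):
    """Smallest i satisfying P(i, M) and Q(i), or -1 if there is none."""
    N, M = len(s), len(p)
    for i in range(N - M + 1):
        if P(s, p, i, M):
            # Q(i) holds here, since every smaller index failed P(k, M).
            return i
    return -1
```

Note that P(i, 0) and Q(0) are vacuously true, which is what allows them to serve as loop invariants from the very first iteration.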
The posed problem immediately suggests formulating the search as an iteration of comparisons, and we propose the following approach:
i := -1;
REPEAT INC(i); (* Q(i) *) found := P(i, M)
UNTIL found OR (i = N-M)
The computation of P again results naturally in an iteration of individual character comparisons. When we apply DeMorgan's theorem to P, it appears that the iteration must be a search for inequality among corresponding pattern and string characters.
P(i, j) = (Ak : 0 ≤ k < j : s_{i+k} = p_k) = (~Ek : 0 ≤ k < j : s_{i+k} ≠ p_k)
The result of the next refinement is a repetition within a repetition. The predicates P and Q are inserted at appropriate places in the program as comments. They act as invariants of the iteration loops.
i := -1;
REPEAT INC(i); j := 0; (* Q(i) *)
  WHILE (j < M) & (s[i+j] = p[j]) DO (* P(i, j+1) *) INC(j) END
  (* Q(i) & P(i, j) & ((j = M) OR (s[i+j] # p[j])) *)
UNTIL (j = M) OR (i = N-M)
The term j = M in the terminating condition indeed corresponds to the condition found, because it implies P(i, M). The term i = N-M, if reached with j < M, implies Q(N-M) together with ~P(N-M, M), and thereby the nonexistence of a match anywhere in the string. If the iteration continues with j < M, then it must do so with s_{i+j} ≠ p_j. This implies ~P(i, j+1) and hence ~P(i, M), which together with Q(i) implies Q(i+1), which establishes Q(i) after the next incrementing of i.
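For readers who wish to run the algorithm, here is a rough Python transcription of the nested loops. It is a sketch, not the original program. The function name straight_search and the return value -1 for "not found" are assumptions, and the REPEAT ... UNTIL loop is emulated with while True and break.

```python
def straight_search(s, p):
    """Return the index of the first occurrence of p in s, or -1 if absent."""
    N, M = len(s), len(p)
    if not 0 < M <= N:
        # The text assumes 0 < M < N; M = N is also handled by this sketch.
        raise ValueError("pattern must be non-empty and no longer than the text")
    i = -1
    while True:  # REPEAT
        i += 1
        j = 0  # Q(i) holds here
        while j < M and s[i + j] == p[j]:
            j += 1
        # Here: P(i, j) and (j == M or s[i + j] != p[j])
        if j == M or i == N - M:  # UNTIL
            break
    return i if j == M else -1
```

The index i + j never exceeds N-1, because i ≤ N-M and j < M whenever s[i + j] is inspected.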
Analysis of straight string search. This algorithm operates quite effectively if we can assume that a mismatch between character pairs occurs after at most a few comparisons in the inner loop. This is likely to be the case if the cardinality of the item type is large. For text searches with a character set size of 128 we may well assume that a mismatch occurs after inspecting only 1 or 2 characters. Nevertheless, the worst-case performance is rather alarming. Consider, for example, that the string consists of N-1 A's followed by a single B, and that the pattern consists of M-1 A's followed by a B. Then on the order of N*M comparisons are necessary to find the match at the end of the string. As we shall subsequently see, there fortunately exist methods that drastically improve this worst-case behaviour.
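The worst case can be checked by counting character comparisons. The Python sketch below instruments the same nested loops. The function name count_comparisons and the sample sizes N = 20 and M = 5 are assumptions chosen for illustration. For this particular input every one of the N-M+1 alignments costs exactly M comparisons, giving (N-M+1)*M in total.

```python
def count_comparisons(s, p):
    """Run straight string search; return (index or -1, comparisons made)."""
    N, M = len(s), len(p)
    comparisons = 0
    i = -1
    while True:
        i += 1
        j = 0
        while j < M:
            comparisons += 1  # one comparison of s[i + j] with p[j]
            if s[i + j] != p[j]:
                break
            j += 1
        if j == M or i == N - M:
            break
    return (i if j == M else -1), comparisons


N, M = 20, 5  # hypothetical sizes
s = "A" * (N - 1) + "B"  # N-1 A's followed by a single B
p = "A" * (M - 1) + "B"  # M-1 A's followed by a B
index, comparisons = count_comparisons(s, p)
print(index, comparisons)  # prints "15 80", i.e. N-M and (N-M+1)*M
```

By contrast, searching the same text for a pattern whose first character is not A costs only one comparison per alignment, which is the favourable behaviour assumed for ordinary text.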