The collection of known methods for generating random geometric objects (points, lines, etc.) and random combinatorial objects (trees, graphs, networks) is far too large to be surveyed here in any detail. As a general rule these generation methods work by combining techniques for generating numbers, permutations, and samples, as described in the previous section. To illustrate the variety of generation methods available, a short list of techniques for generating random unweighted graphs follows.

• Random uniform graph G(n,p). This is an undirected graph on n vertices, such that each of the n(n−1)/2 possible edges is generated independently with probability p. Start with an empty graph on n vertices. Consider each edge (i, j) in sequence, and with probability p insert the edge in the graph. The following code works for undirected graphs.

    for (i=1; i<=n; i++)
      for (j=i+1; j<=n; j++)
        // with probability p, insert edge (i, j) in the graph

    prob[1...n] = table of probabilities
    for (i=1; i<=n; i++) {
      R[i] = n*prob[i];               // scale probabilities
      if (R[i] >= 1) H.insert(i);
      else L.insert(i);
    }

    while (H.notEmpty()) {
      k = H.select();                 // any element
      j = L.select();                 // any element
      table[j].alias = k;             // j is done
      table[j].prob = R[j];
      L.delete(j);
      R[k] = R[k]+(R[j]-1);           // adjust probability
      if (R[k] <= 1) {
        H.delete(k);
        L.insert(k);
      }
    }

Figure 5.6. Initializing table values for the alias method.

• Random uniform graph G(n,k). To generate a random undirected graph with exactly k edges, assign each edge to an integer 1 ... m, where m = n(n−1)/2. Draw a sample of size k without replacement from 1 ... m and insert the corresponding sampled edges in the graph.

• Nearest-neighbor graph P(n,k). A random (directed) nearest-neighbor graph contains n vertices. Each vertex is assigned to a random coordinate point (x, y) in the unit square. For each vertex, insert edges to its k nearest neighbors, according to some distance metric.

• Random proximity graph P(n,δ). Assign each of n vertices to a random point in the unit square. Then, for each vertex, add an edge to all neighbors within distance δ.

• Grid proximity graph G(n,k,d,e). This graph is built in a grid of n vertices arranged in a k by n/k rectangle: each vertex has edges to d random neighbors within e rows or columns away. Consider vertices row-by-row and column-by-column: if vertex v_rc in row r, column c, has already been assigned f edges, generate d − f additional edges to random neighbors in higher-numbered rows

• Random acyclic graph G(n,k,d,e). The preceding grid technique can also be used to construct a random acyclic graph, by generating d random edges only in “forward” directions r + 1 through r + e.

• Random rooted binary tree B(n). To generate a random rooted binary tree, create a permutation of the integers 1 ... n and insert them into the tree according to the binary search tree property. These trees are not uniformly distributed because some shapes are more likely than others to occur.
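Three of the generators in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the function names and the rng parameter are hypothetical, vertices are numbered 1 ... n for the two uniform graphs and 0 ... n−1 for the proximity graph, and the G(n,k) version materializes all n(n−1)/2 candidate edges, which is only reasonable for small n.

```python
import itertools
import math
import random

def random_gnp(n, p, rng=random):
    """G(n,p): keep each of the n(n-1)/2 possible edges independently with probability p."""
    return [(i, j) for i, j in itertools.combinations(range(1, n + 1), 2)
            if rng.random() < p]

def random_gnk(n, k, rng=random):
    """G(n,k): draw exactly k distinct edges, sampled without replacement."""
    all_edges = list(itertools.combinations(range(1, n + 1), 2))
    return rng.sample(all_edges, k)

def random_proximity(n, delta, rng=random):
    """P(n,delta): random points in the unit square, joined when within distance delta."""
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2)
             if math.dist(pts[i], pts[j]) <= delta]
    return pts, edges
```

Passing a seeded random.Random instance as rng makes a generated test graph reproducible.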

For ideas on how to generate inputs for a specific algorithmic problem, consult the experimental literature or search the Internet for problem-specific input testbeds.

5.2.6 Nonindependence

Another category of random generator of interest in algorithm research produces nonindependent random variates R_i, where the probability that a particular value r = R_i is generated is supposed to depend on recently generated values, in some well-defined way.

To take a simple example, suppose data structure D must respond to a random sequence of (insert, delete) operations using random keys. This application requires that an item with key k cannot be deleted unless it has been previously inserted. Thus the probability that delete(k) is generated next in the sequence depends on whether insert(k) has previously appeared. The problem is to generate a random sequence of insert(k), delete(k) operations, for k ∈ 1 ... n, such that every delete(k) operation appears after its corresponding insert(k) operation.

One simple approach is to generate a random permutation of doubled elements 1, 1, 2, 2, ..., n, n. For each integer k in the permutation, check whether this is the first or second appearance and generate insert(k) or delete(k) accordingly.
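The doubled-permutation approach might look like the following Python sketch. The function name, the tuple encoding of operations, and the rng parameter are assumptions made for illustration.

```python
import random

def insert_delete_sequence(n, rng=random):
    """Return a list of ("insert", k) / ("delete", k) operations for keys 1..n."""
    doubled = [k for k in range(1, n + 1) for _ in range(2)]
    rng.shuffle(doubled)                 # random permutation of 1,1,2,2,...,n,n
    seen = set()
    ops = []
    for k in doubled:
        if k in seen:
            ops.append(("delete", k))    # second appearance of k
        else:
            seen.add(k)
            ops.append(("insert", k))    # first appearance of k
    return ops
```

Every key is inserted exactly once and deleted exactly once, so the sequence always has length 2n and leaves the data structure empty.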

Another approach that creates more variety of sequences is to start by generating random line segments on an interval. First create a collection of n random line segments within the real interval [0,1) by generating a random pair of endpoints (x_i, y_i) for each segment. Assign a random key k ∈ 1 ... n to each segment; label the left endpoint with insert(k) and the right endpoint with delete(k). Sort the labels by coordinates and generate corresponding insert/delete operations. The method of generating endpoints can be adjusted to control how much overlap occurs in the line segments, and therefore the maximum size of the data structure. For example, to ensure that all inserts precede all deletes, generate left endpoints from [0,.5) and right endpoints from [.5,1).
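A minimal Python sketch of the line-segment approach follows. It makes two assumptions that the text leaves open: keys are assigned by a random permutation, so each key labels exactly one segment, and the two endpoints of a segment are drawn independently and swapped if they come out in the wrong order. The names segment_sequence, left, right, and rng are hypothetical.

```python
import random

def segment_sequence(n, left=(0.0, 1.0), right=(0.0, 1.0), rng=random):
    """Generate insert/delete operations from n random segments.

    left and right are the (low, high) ranges from which the two
    endpoints of each segment are drawn.
    """
    keys = list(range(1, n + 1))
    rng.shuffle(keys)                    # assumption: one segment per key
    events = []
    for k in keys:
        x = rng.uniform(*left)
        y = rng.uniform(*right)
        if x > y:
            x, y = y, x                  # make x the left endpoint
        events.append((x, 0, "insert", k))
        events.append((y, 1, "delete", k))
    events.sort()                        # on equal coordinates, inserts sort first
    return [(op, k) for _, _, op, k in events]
```

Calling segment_sequence(n, left=(0.0, 0.5), right=(0.5, 1.0)) gives the case described above in which all inserts precede all deletes, so the data structure grows to its maximum size n.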

Another type of nonindependence of interest in algorithm research is locality of reference. In many application scenarios a sequence of address references or key requests exhibits locality: temporal locality means that if key k appears in the sequence it is more likely to appear again soon; spatial locality means that key values near k are likely to be requested soon.

One simple approach to generating a sequence with locality is to define a probability distribution D on the differences between successive keys in the sequence. For example, suppose the problem is to generate a sequence of keys K_i uniformly from the integer range [1,10]. Let D_i denote a sequence of difference variates generated randomly according to some probability function δ(d). Generate the initial key K_0 uniformly at random and generate subsequent keys K_{i+1} according to

    K_{i+1} = ((K_i + D_i − 1) % 10) + 1.    (5.1)

The modulus function % is used to wrap key values into the range [0,9], and the trick with −1/+1 keeps the distribution centered at K_i.

An example density function for differences is shown in the following figure on the left. This function is defined by δ(d) = 1/5 − abs(d)/25 (using the absolute value function abs) and is peaked at 0. Assuming that K_i = 8, the density function for K_{i+1} using the preceding formula is shown on the right. This probability density is peaked at 8.

[Figure: left panel, density of the differences d (horizontal axis −4 to 5); right panel, density of the next key from key 8 (horizontal axis 1 to 10).]
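Formula (5.1) translates almost directly into code. The Python sketch below assumes a triangular difference distribution on d = −4, ..., 4 with weights proportional to 5 − |d|, that is, 1/5 − |d|/25, chosen here so that the probabilities sum to 1. The names DIFFS, WEIGHTS, and locality_sequence are hypothetical.

```python
import random

DIFFS = list(range(-4, 5))                 # d = -4, ..., 4
WEIGHTS = [5 - abs(d) for d in DIFFS]      # proportional to 1/5 - |d|/25

def locality_sequence(length, rng=random):
    """Generate keys in 1..10 whose successive differences follow WEIGHTS."""
    key = rng.randint(1, 10)               # K_0 uniform on 1..10
    keys = [key]
    for _ in range(length - 1):
        d = rng.choices(DIFFS, weights=WEIGHTS)[0]
        key = ((key + d - 1) % 10) + 1     # formula (5.1)
        keys.append(key)
    return keys
```

Python's % operator returns a nonnegative result for a positive modulus, so the wraparound works even when K_i + D_i − 1 is negative. In C or Java the remainder can be negative in that case, and the expression would need an adjustment such as adding 10 before taking the remainder.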

This difference probability δ(d) can be easily replaced with another function as appropriate to the model. It can be tricky, however, to find a suitable replacement for function (5.1) that constructs K_{i+1} from K_i. It is possible to prove a theorem that (5.1) preserves the uniform distribution of the original key: that is, the distribution of every key K_i, when averaged over all possible starting keys K_0, remains uniform. This property does not necessarily hold when other functions for generating K_{i+1} from K_i are used. The danger is that the distribution of K_i will drift away from uniform as i increases. For example, the scaled function that follows imposes an asymptotic distribution on K_i that is heavier in the middle, so that late in the sequence, 5 is more likely to appear than 1 or 10, no matter how the initial key is generated.

[Figure: density of the next key from key 8 (horizontal axis 1 to 10).]
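The uniformity claim for (5.1) can be checked exactly for a concrete difference distribution. The Python sketch below uses rational arithmetic and assumes the triangular weights 1/5 − |d|/25 on d = −4, ..., 4. For contrast it also applies a hypothetical clamping rule, which is not the scaled function discussed in the text but is another plausible-looking replacement that fails to preserve uniformity.

```python
from fractions import Fraction

# Assumed difference distribution: delta(d) = 1/5 - |d|/25 for d = -4..4.
DELTA = {d: Fraction(5 - abs(d), 25) for d in range(-4, 5)}

def step_wrap(k, d):
    return ((k + d - 1) % 10) + 1          # formula (5.1)

def step_clamp(k, d):
    return min(10, max(1, k + d))          # hypothetical alternative rule

def advance(dist, step):
    """Apply one generation step to an exact distribution (key -> probability)."""
    nxt = {k: Fraction(0) for k in range(1, 11)}
    for k, pk in dist.items():
        for d, pd in DELTA.items():
            nxt[step(k, d)] += pk * pd
    return nxt

uniform = {k: Fraction(1, 10) for k in range(1, 11)}
after_wrap = advance(uniform, step_wrap)    # still uniform
after_clamp = advance(uniform, step_clamp)  # mass moves onto keys 1 and 10
```

With the wraparound rule every target key receives each difference d from exactly one source key, so its probability is (1/10) times the sum of δ(d), which is 1/10. With clamping, the boundary keys 1 and 10 collect all of the out-of-range moves and the distribution is no longer uniform after a single step.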

This hazard, that an initially uniform distribution will be skewed via a series of random incremental steps, is not limited to the problem of generating distributions with locality. Panny [14] describes a long history of failed schemes for preserving the initial properties of random binary search trees over a sequence of random insertions and deletions. Under the usual definition, a random binary search tree (BST) of size n is created by one of the following equivalent processes: (1) select a permutation of keys 1 ... n uniformly at random and insert the keys into the tree in permutation order or (2) generate n uniform reals from (0, 1) and insert them into the tree in generation order.
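For small n, process (1) can be enumerated exhaustively to see which tree shapes it produces and how often. The Python sketch below stores a tree as nested tuples; this representation and the function names are choices made for illustration.

```python
from collections import Counter
from itertools import permutations

def insert(tree, key):
    """Insert key into a BST stored as (left, key, right); None is the empty tree."""
    if tree is None:
        return (None, key, None)
    left, root, right = tree
    if key < root:
        return (insert(left, key), root, right)
    return (left, root, insert(right, key))

def shape(tree):
    """Erase the keys, leaving only the shape of the tree."""
    if tree is None:
        return None
    left, _, right = tree
    return (shape(left), shape(right))

def shape_counts(n):
    """Count how many of the n! insertion orders produce each tree shape."""
    counts = Counter()
    for perm in permutations(range(1, n + 1)):
        tree = None
        for key in perm:
            tree = insert(tree, key)
        counts[shape(tree)] += 1
    return counts
```

For n = 3 there are five shapes: the balanced shape is produced by two of the six permutations (2, 1, 3 and 2, 3, 1), and each of the four path-like shapes by one. A random BST in this sense is therefore not uniform over shapes.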

One common approach to studying insert/delete costs in binary search tree algorithms is to start by generating an initial random BST of size n by method (2) and then to apply a sequence of alternating random insertions and deletions, so that n stays constant. A simple method is to generate an insert key uniformly at random from [0,1], and then a random delete key uniformly from the set of already inserted keys. However, this approach fails to preserve the initial distribution of tree shapes; for example, Eppinger [6] showed experimentally that the average internal path length first decreases and then increases over time, a property that was later proved. Because of this phenomenon, the measured performance of a given BST algorithm may be more an artifact of the key generation scheme than of the algorithm itself. Many seemingly reasonable generation schemes have similar flaws; see Panny [14] for tips on avoiding this pitfall.
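A test harness for this kind of experiment might be sketched as follows in Python. The text does not name a deletion algorithm; the sketch assumes the common successor-based (Hibbard-style) deletion, and the names experiment, n, steps, and rng are hypothetical. It records the internal path length after every insert/delete pair and makes no claim about what the resulting curve looks like.

```python
import random

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard BST insertion; returns the root of the updated tree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right

def delete(root, key):
    """Successor-based deletion (an assumption; the text names no algorithm)."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    elif root.left is None:
        return root.right
    elif root.right is None:
        return root.left
    else:
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key                      # copy successor key up
        root.right = delete(root.right, succ.key)
    return root

def internal_path_length(root):
    """Sum of the depths of all nodes, with the root at depth 0."""
    total, stack = 0, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            total += depth
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return total

def inorder(node):
    """Keys in sorted order; useful for validating the tree."""
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def experiment(n, steps, rng=random):
    """Build a random BST by method (2), then alternate inserts and deletes."""
    keys = [rng.random() for _ in range(n)]
    root = None
    for k in keys:
        root = insert(root, k)
    ipl = [internal_path_length(root)]
    for _ in range(steps):
        new_key = rng.random()
        root = insert(root, new_key)
        keys.append(new_key)
        victim = keys.pop(rng.randrange(len(keys)))  # uniform over current keys
        root = delete(root, victim)
        ipl.append(internal_path_length(root))
    return root, keys, ipl
```

Averaging the recorded path lengths over many independent runs, each with its own seed, is one way to look for the drift described above.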

5.3 Chapter Notes

This chapter has addressed two practical aspects of experimental algorithmics: how to develop a test environment that supports correct, efficient, and well-documented experiments; and how to generate random inputs and structures according to a variety of distributions.

Here are the guidelines presented in this chapter.

5.1 Stand on the shoulders of giants: use the experimental resources available to

5.2 Apply your best validation and verification techniques to build confidence in test program correctness.

5.3 Test parameters that change frequently should be specified at run time rather than compile time.

5.4 Write self-describing output files that require minimal reformatting before being submitted for data analysis.

5.5 Document your experiments. You should be able to return to your test environment a month later and know exactly how to replicate your own results.

5.4 Problems and Projects

1. Implement your favorite array-based algorithm, such as quicksort, binary search, heapsort, or mergesort. As you write, insert comments with loop invariants into every loop and run through the three verification questions for loop invariants. Can you spot any errors?

2. Design a suite of inputs and input generators to validate the code written for question 1. Include an assert procedure in the program and run your tests. Did you find any errors? Swap programs with a friend and run your verification and validation tests on the friend’s code. Did either of you find errors that were previously missed?

3. Consider the problem of reformatting the output from a timing utility such as time or gprof into an input format suitable for your favorite statistical analysis package. How much work must be done by hand? Can you write a formatting tool that is faster and less prone to errors?

4. Revisit an experimental project that you carried out at least a month ago. How much do you remember about the tests? Can you replicate every experiment that you performed earlier? Can you reconstruct the meaning and purpose of every data file? What could you have done to document your experiments better?

5. Implement some of the statistical tests of randomness listed in Knuth [10] or Devroye [5] and use them to check the random number generator provided by your operating system. Does it pass?

6. Apply tests of randomness to evaluate the sequence of low-order bits generated by the RNG. At what point (what bit size) does the generator start to fail the tests?

7. How would you generate n points uniformly at random in the unit circle? How would you generate n points uniformly on the circumference of the circle? How would you generate n points uniformly inside and on the surface of

8. Use the lookup method and the alias method to implement a generator for Zipf’s distribution. How do they compare, in terms of time and space usage? Read about statistical tests of randomness for nonuniform distributions (for example, in Knuth [10]) and apply those tests to the generators. Do they pass?

9. Many approximation algorithms for NP-hard optimization problems on graphs have a guaranteed bound on solution quality under the assumption that the edge weights obey the triangle inequality: that is, for each triangle of edges (x, y), (x, z), (y, z) the sum of weights on two edges is at least equal to the weight on the third edge. This ensures that every edge represents the shortest path between its endpoints. Design and implement an algorithm to generate random graphs that obey the triangle inequality. Does it cover the space of all such graphs? (Note: There exist graphs that obey the triangle inequality that cannot be embedded into geometric space.) Is each such graph equally likely to be generated?

10. In GPS applications, a street or road in a roadmap comprises a sequence of connected line segments that represent its location in a satellite image. Design and implement a suite of random generators that model different types of maps: street grids in cities, superhighways, rural roads, and following various terrains (mountains, rivers, lakes, etc.).

11. Design and implement a random generator for variates that are both nonuniform (for example generated by Zipf’s distribution) and nonindependent, displaying temporal but not spatial locality. Use it to generate a “text” of random words (strings of varying length determined by Zipf’s law) to evaluate your favorite set data structure implementations. How closely does performance on your generated inputs match performance on real words in English text?

Bibliography

[1] Adamic, Lada A., and Bernardo A. Huberman, “Zipf’s law and the Internet,” Glottometrics Vol 3, pp. 143–150, 2002. Available from: www.ram-verlag.de/journal.htm.

[2] Bentley, Jon Louis, and James B. Saxe, “Generating sorted lists of random numbers,” ACM Transactions on Mathematical Software, Vol 6, No 3, pp. 359–364, September 1980.

[3] Bentley, Jon Louis, “Ten Commandments for Experiments on Algorithms,” in “Tools for Experiments on Algorithms,” in R. Rashid, ed., CMU Computer Science: A 25th Anniversary Commemorative, ACM Press Anthology Series, 1991.

[4] Bratley, Paul, Bennet L. Fox, and Linus E. Schrage, A Guide to Simulation, Springer-Verlag, 1983.

[5] Devroye, Luc, Non-Uniform Random Variate Generation, Springer-Verlag, New York, 1986. Available from: cg.scs.carleton.ca/~luc/rnbookindex.html for an open-copyright Web edition.

[6] Eppinger, Jeffrey L., “An empirical study of insertion and deletion in binary search trees,” Communications of the ACM, Vol 26, Issue 9, September 1983.

[7] Fan, C. T., Mervin E. Muller, and Ivan Rezucha, “Development of sampling plans by using sequential (item by item) selection techniques and digital computers,” Journal of the American Statistical Association, 57, pp. 387–402, 1962.

[8] Gent, Ian P., Stuart A. Grant, Ewen MacIntyre, Patrick Prosser, Paul Shaw, Barbara M. Smith, and Toby Walsh, How Not to Do It, Research Report 97.27, School of Computer Studies, University of Leeds, May 1997. Available from: www.cs.st-andrews.ac.uk/~ipg/pubs.html.

[9] Jones, T. G., “A note on sampling a tape file,” Communications of the ACM, Vol 5, No 6, p. 343, June 1962.

[10] Knuth, Donald E., The Art of Computer Programming: Vol 2, Seminumerical Algorithms, Addison Wesley, 1981.

[11] Law, Averill M., and W. David Kelton, Simulation Modeling and Analysis, 3rd ed., McGraw-Hill, 2000.

[12] L’Ecuyer, Pierre, “Tables of linear congruential generators of different sizes and good lattice structure,” Mathematics of Computation, Vol 68, No 225, pp. 249–260, January 1999.

[13] Mehlhorn, Kurt, Stefan Näher, Michael Seel, Raimund Seidel, Thomas Schilz, Stefan Schirra, and Christian Uhrig, “Checking geometric programs or verification of geometric structures,” Proceedings of the Twelfth Annual Symposium on Computational Geometry SGC’96, pp. 159–195, 1996.

[14] Panny, Wolfgang, “Deletions in random binary search trees: A story of errors,” Journal of Statistical Planning and Inference, Vol 140, Issue 8, pp. 2334–45, August 2010.

[15] Park, Stephen K., and Keith W. Miller, “Random number generators: Good ones are hard to find,” Communications of the ACM, Vol 31, Issue 10, pp. 1192–1201, October 1988.

[16] Press, William H. et al., Numerical Recipes: The Art of Scientific Computing, 3rd ed., Cambridge University Press, 2007. An electronic book is available from the publisher at www.nr.com.

[17] Ripley, Brian D., Stochastic Simulation, Wiley Series in Probability and Mathematics, Wiley & Sons, 1987.
