When we first initialized numpy.ndarray objects by using numpy.zeros, we provided an optional argument for the memory layout. This argument specifies, roughly speaking, which elements of an array get stored in memory next to each other. When working with small arrays, this has hardly any measurable impact on the performance of array operations. However, when arrays get large the story is somewhat different, depending on the operations to be implemented on the arrays.
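To make the notion of "stored next to each other" concrete, the following is a minimal sketch that inspects the strides and contiguity flags of two small arrays created with order='C' and order='F'. The shape (2, 3), the float64 dtype, and the variable names are assumptions chosen purely for illustration; they do not come from the chapter.

```python
import numpy as np

# Two small arrays with the same shape but different memory layouts
# (shape and dtype are illustrative choices).
c_arr = np.zeros((2, 3), dtype=np.float64, order='C')  # row-major
f_arr = np.zeros((2, 3), dtype=np.float64, order='F')  # column-major

# strides: bytes to step in memory to advance one index along each axis
print(c_arr.strides)  # (24, 8): the last axis is adjacent in memory
print(f_arr.strides)  # (8, 16): the first axis is adjacent in memory

# The flags attribute reports which layout an array has.
print(c_arr.flags['C_CONTIGUOUS'], c_arr.flags['F_CONTIGUOUS'])  # True False
print(f_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # False True

# Reading the elements in memory order (order='K') shows the difference.
a = np.arange(6).reshape(2, 3)            # [[0, 1, 2], [3, 4, 5]]
memory_order = np.asfortranarray(a).ravel(order='K')
print(memory_order)  # [0 3 1 4 2 5]: columns are stored one after another
```

In the C-like layout a step along the last axis moves by one element (8 bytes here), whereas in the Fortran-like layout a step along the first axis does.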

To illustrate this important point for memory-wise handling of arrays in science and finance, consider the following construction of multidimensional numpy.ndarray objects:

In [133]: x = np.random.standard_normal((5, 10000000))
          y = 2 * x + 3  # linear equation y = a * x + b
          C = np.array((x, y), order='C')
          F = np.array((x, y), order='F')
          x = 0.0; y = 0.0  # memory cleanup

In [134]: C[:2].round(2)

Out[134]: array([[[-0.51, -1.14, -1.07, ...,  0.2 , -0.18,  0.1 ],
                  [-1.22,  0.68,  1.83, ...,  1.23, -0.27, -0.16],
                  [ 0.45,  0.15,  0.01, ..., -0.75,  0.91, -1.12],
                  [-0.16,  1.4 , -0.79, ..., -0.33,  0.54,  1.81],
                  [ 1.07, -1.07, -0.37, ..., -0.76,  0.71,  0.34]],

                 [[ 1.98,  0.72,  0.86, ...,  3.4 ,  2.64,  3.21],
                  [ 0.55,  4.37,  6.66, ...,  5.47,  2.47,  2.68],
                  [ 3.9 ,  3.29,  3.03, ...,  1.5 ,  4.82,  0.76],
                  [ 2.67,  5.8 ,  1.42, ...,  2.34,  4.09,  6.63],
                  [ 5.14,  0.87,  2.27, ...,  1.48,  4.43,  3.67]]])

Let's look at some really fundamental examples and use cases for both types of ndarray objects:

In [135]: %timeit C.sum()

Out[135]: 10 loops, best of 3: 123 ms per loop

In [136]: %timeit F.sum()

Out[136]: 10 loops, best of 3: 123 ms per loop

When summing up all elements of the arrays, there is no performance difference between the two memory layouts. However, consider the following example with the C-like memory layout:

In [137]: %timeit C[0].sum(axis=0)

Out[137]: 10 loops, best of 3: 102 ms per loop

In [138]: %timeit C[0].sum(axis=1)

Out[138]: 10 loops, best of 3: 61.9 ms per loop

Summing across the five rows (axis=0), which adds five large vectors element by element and returns a single large results vector with 10,000,000 entries, is slower in this case than summing along each row (axis=1), which returns only five results. This is due to the fact that, in the C-like layout, the elements of each row are stored next to each other. With the Fortran-like memory layout, the relative performance changes considerably:

In [139]: %timeit F.sum(axis=0)

Out[139]: 1 loops, best of 3: 801 ms per loop

In [140]: %timeit F.sum(axis=1)

Out[140]: 1 loops, best of 3: 2.23 s per loop

In [141]: F = 0.0; C = 0.0 # memory cleanup

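In this case, summing over the first axis (axis=0) performs better than summing over the second axis (axis=1). In the Fortran-like layout the first index varies fastest, so the elements combined by the axis=0 sum are stored next to each other in memory, which explains the relative performance advantage. Overall, however, the operations are much slower in absolute terms than with the C-like layout; note that these timings are for the full array F, whereas the C-like timings above are for the slice C[0].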
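Outside an IPython session, where %timeit is not available, a comparison along these lines can be sketched with the standard-library timeit module. The array size (5 x 200,000, far smaller than in the text), the seed, and the helper name best_time are assumptions made for illustration. No timings are quoted here because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

rng = np.random.default_rng(seed=0)        # seed chosen arbitrarily
x = rng.standard_normal((5, 200_000))      # much smaller than in the text
C = np.array(x, order='C')                 # C-like (row-major) copy
F = np.array(x, order='F')                 # Fortran-like (column-major) copy


def best_time(func, repeat=5, number=10):
    """Hypothetical helper: best average time per call in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


timings = {}
for name, arr in (('C', C), ('F', F)):
    for axis in (0, 1):
        # default arguments bind the current array and axis to the lambda
        timings[(name, axis)] = best_time(lambda a=arr, ax=axis: a.sum(axis=ax))

for (name, axis), seconds in timings.items():
    print(f'{name}-layout, sum(axis={axis}): {seconds * 1e3:.3f} ms')

# The layout affects speed only, not the numerical results.
print(np.allclose(C.sum(axis=0), F.sum(axis=0)))  # True
```

Both arrays hold the same values, so any difference in the printed timings comes from the memory layout alone.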

Conclusions

Python provides, in combination with NumPy, a rich set of flexible data structures. From a finance point of view, the following can be considered the most important ones:

Basic data types

In finance, the classes int, float, and str provide the atomic data types.

Standard data structures

The classes tuple, list, dict, and set have many application areas in finance, with list being the most flexible workhorse in general.

Arrays

A large class of finance-related problems and algorithms can be cast to an array setting; NumPy provides the specialized class numpy.ndarray, which offers convenience and compactness of code as well as high performance.

This chapter shows that both the basic data structures and the NumPy ones allow for highly vectorized implementation of algorithms. Depending on the specific shape of the data structures, care should be taken with regard to the memory layout of arrays. Choosing the right approach here can speed up code execution by a factor of two or more.
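As a small reminder of what a vectorized implementation looks like in both settings, the sketch below evaluates the linear equation y = a * x + b from the layout example, once with a list comprehension and once with a numpy.ndarray. The input values and variable names are made up for illustration.

```python
import numpy as np

a, b = 2.0, 3.0                      # coefficients of y = a * x + b
values = [0.5, 1.0, 1.5, 2.0]        # illustrative input data

# Basic data structures: a list comprehension avoids an explicit loop body.
y_list = [a * v + b for v in values]

# NumPy: the same expression is applied to the whole array at once.
x = np.array(values)
y_arr = a * x + b

print(y_list)  # [4.0, 5.0, 6.0, 7.0]
print(y_arr)   # [4. 5. 6. 7.]
```

The NumPy version applies the expression to the whole array in a single statement.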

Further Reading

This chapter focuses on those issues that might be of particular importance for finance algorithms and applications. However, it can only represent a starting point for the exploration of data structures and data modeling in Python. There are a number of valuable resources available to go deeper from here. Here are some Internet resources to consult:

The Python documentation is always a good starting point: http://www.python.org/doc/.

For details on NumPy arrays as well as related methods and functions, see http://docs.scipy.org/doc/.

The SciPy lecture notes are also a good source to get started: http://scipy-lectures.github.io/.

Good references in book form are:

Goodrich, Michael et al. (2013): Data Structures and Algorithms in Python. John Wiley & Sons, Hoboken, NJ.

Langtangen, Hans Petter (2009): A Primer on Scientific Programming with Python. Springer Verlag, Berlin, Heidelberg.

[18] The Cython library brings static typing and compiling features to Python that are comparable to those in C. In fact, Cython is a hybrid language of Python and C.

[19] Here and in the following discussion, terms like float, float object, etc. are used interchangeably, acknowledging that every float is also an object. The same holds true for other object types.

[20] Cf. http://en.wikipedia.org/wiki/Double-precision_floating-point_format.

[21] It is not possible to go into details here, but there is a wealth of information available on the Internet about regular expressions in general and for Python in particular. For an introduction to this topic, refer to Fitzgerald, Michael (2012): Introducing Regular Expressions. O'Reilly, Sebastopol, CA.

Chapter 5. Data Visualization

Use a picture. It’s worth a thousand words.

— Arthur Brisbane (1911)

This chapter is about basic visualization capabilities of the matplotlib library. Although there are many other visualization libraries available, matplotlib has established itself as the benchmark and, in many situations, a robust and reliable visualization tool. It is both easy to use for standard plots and flexible when it comes to more complex plots and customizations. In addition, it is tightly integrated with NumPy and the data structures that it provides.

This chapter mainly covers the following topics:

2D plotting

From the simplest to some more advanced plots with two scales or different subplots; typical financial plots, like candlestick charts, are also covered.

3D plotting

A selection of 3D plots useful for financial applications is presented.

This chapter cannot be comprehensive with regard to data visualization with Python and matplotlib, but it provides a number of examples for the most basic and most important capabilities for finance. Other examples are also found in later chapters. For instance, Chapter 6 shows how to visualize time series data with the pandas library.
