Theano is somewhat similar to a compiler but with the added bonuses of being able to express, manipulate, and optimize mathematical expressions as well as run code on CPU and GPU. Since 2010, Theano has improved release after release and has been adopted by several other Python projects as a way to automatically generate efficient computational models on the fly.
In Theano, you first define the function you want to run by specifying variables and transformations using a pure Python API. This specification is then compiled to machine code for execution.
As a first example, let's examine how to implement a function that computes the square of a number. The input will be represented by a scalar variable, a, and then we will transform it to obtain its square, indicated by a_sq. In the following code, we will use the T.scalar function to define the variable and use the normal ** operator to obtain a new variable:
import theano.tensor as T
import theano as th
a = T.scalar('a')
a_sq = a ** 2
print(a_sq)
# Output:
# Elemwise{pow,no_inplace}.0
As you can see, no specific value is computed and the transformation we apply is purely symbolic. In order to use this transformation, we need to generate a function. To compile a function, you can use the th.function utility that takes a list of the input variables as its first argument, and the output transformation (in our case a_sq) as its second argument:
compute_square = th.function([a], a_sq)
Theano will take some time and translate the expression to efficient C code and compile it, all in the background! The return value of th.function will be a ready-to-use Python function and its usage is demonstrated in the next line of code:
compute_square(2)
# Result:
# 4.0
Unsurprisingly, compute_square correctly returns the input value squared. Note, however, that the return type is not an integer (like the input type) but a floating point number. This is because the Theano default variable type is float64. You can verify that by inspecting the dtype attribute of the a variable:
a.dtype
# Result:
# float64
Theano's behavior is very different from what we saw with Numba. Theano doesn't compile generic Python code, nor does it perform any type inference; defining Theano functions requires a more precise specification of the types involved.
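The define-then-compile workflow can be illustrated without Theano at all. The following is a toy sketch in plain Python, not Theano's implementation: the Sym class and the compile_function helper are hypothetical names introduced only for this illustration, and the "compilation" step merely builds a Python lambda, whereas Theano generates and compiles C code.

```python
# Toy illustration of "build a symbolic expression, then compile it".
# Sym and compile_function are hypothetical names for this sketch;
# they are not part of Theano.

class Sym:
    """A symbolic expression stored as a Python source fragment."""

    def __init__(self, source):
        self.source = source

    def __pow__(self, exponent):
        # Applying ** builds a new expression; no number is computed here.
        return Sym("({} ** {!r})".format(self.source, exponent))

    def __add__(self, other):
        right = other.source if isinstance(other, Sym) else repr(other)
        return Sym("({} + {})".format(self.source, right))

    def __repr__(self):
        return "Sym<{}>".format(self.source)


def compile_function(inputs, output):
    """Turn a symbolic output into a callable taking the given inputs."""
    arg_names = ", ".join(var.source for var in inputs)
    # This toy only evaluates a lambda built from the expression text.
    return eval("lambda {}: {}".format(arg_names, output.source))


a = Sym("a")
a_sq = a ** 2                  # still symbolic
print(a_sq)                    # Sym<(a ** 2)>

compute_square = compile_function([a], a_sq)
print(compute_square(2.0))     # 4.0
```

As in Theano, printing a_sq shows a description of the operation rather than a value; a number is produced only when the compiled function is called.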
The real power of Theano comes from its support for array expressions. Defining a one-dimensional vector can be done with the T.vector function; the returned variable supports broadcasting operations with the same semantics as NumPy arrays. For instance, we can take two vectors and compute the element-wise sum of their squares, as follows:
a = T.vector('a')
b = T.vector('b')
ab_sq = a**2 + b**2
compute_square = th.function([a, b], ab_sq)
compute_square([0, 1, 2], [3, 4, 5])
# Result:
# array([  9.,  17.,  29.])
The idea is, again, to use the Theano API as a mini-language to combine various NumPy array expressions that will be compiled to efficient machine code.
One of the selling points of Theano is its ability to perform arithmetic simplifications and automatic gradient calculations. For more information, refer to the official documentation (http://deeplearning.net/software/theano/introduction.html).
To demonstrate Theano functionality on a familiar use case, we can implement our parallel calculation of pi again. Our function will take a collection of two random coordinates as input and return the pi estimate. The input random numbers will be defined as vectors named x and y, and we can test their position inside the circle using a standard element-wise operation, whose result we will store in the hit_test variable:
x = T.vector('x')
y = T.vector('y')
hit_test = x ** 2 + y ** 2 < 1
At this point, we need to count the number of True elements in hit_test, which can be done by taking its sum (it will be implicitly cast to integer). To obtain the pi estimate, we finally need to calculate the ratio of hits to the total number of trials. The calculation is illustrated in the following code block:
hits = hit_test.sum()
total = x.shape[0]
pi_est = 4 * hits/total
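The same arithmetic can be cross-checked in plain Python. A point drawn uniformly from the square [-1, 1] x [-1, 1] falls inside the unit circle with probability pi/4 (the ratio of the two areas), so 4 * hits / total approximates pi. The sketch below is not a Theano program; the estimate_pi function and its n_trials and seed parameters are names chosen for this illustration, and Python 3 division is assumed.

```python
import random


def estimate_pi(n_trials, seed=0):
    """Monte Carlo estimate of pi from points drawn uniformly in [-1, 1]^2."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n_trials)]
    ys = [rng.uniform(-1, 1) for _ in range(n_trials)]

    # Each comparison yields a bool; sum() counts the True values because
    # bool is a subclass of int in Python.
    hits = sum(x ** 2 + y ** 2 < 1 for x, y in zip(xs, ys))
    total = len(xs)
    return 4 * hits / total


print(estimate_pi(30000))
```

Summing the Boolean results mirrors the implicit cast to integer that happens when hit_test.sum() is evaluated.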
We can benchmark the execution of the Theano implementation using th.function and the timeit module. In our test, we will pass two arrays of size 30,000 and use the timeit.timeit utility to execute the calculate_pi function multiple times:
import timeit
import numpy as np

calculate_pi = th.function([x, y], pi_est)
x_val = np.random.uniform(-1, 1, 30000)
y_val = np.random.uniform(-1, 1, 30000)

res = timeit.timeit("calculate_pi(x_val, y_val)",
                    "from __main__ import x_val, y_val, calculate_pi",
                    number=100000)
print(res)
The serial execution of this function takes about 10 seconds. Theano is capable of automatically parallelizing the code by implementing element-wise and matrix operations using specialized packages, such as OpenMP and the Basic Linear Algebra Subprograms (BLAS). Parallel execution can be enabled using configuration options.
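Note that timeit.timeit returns the total time for all repetitions, so the roughly 10 seconds measured over 100,000 calls corresponds to about 0.1 ms per call. The sketch below shows how a per-call figure can be derived. It is only an illustration: the work function is a hypothetical stand-in for calculate_pi, and the globals argument (available from Python 3.5) is used as an alternative to the "from __main__ import" setup string.

```python
import timeit


def work(values):
    # Stand-in for calculate_pi; any callable can be timed the same way.
    return sum(v * v for v in values)


values = list(range(1000))
n_calls = 1000

# globals= lets the timed statement see the names it refers to.
total_seconds = timeit.timeit("work(values)",
                              globals={"work": work, "values": values},
                              number=n_calls)
per_call = total_seconds / n_calls
print("total: {:.4f} s, per call: {:.2e} s".format(total_seconds, per_call))
```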
In Theano, you can set up configuration options by modifying variables in the theano.config object at import time. For example, you can issue the following commands to enable OpenMP support:
import theano
theano.config.openmp = True
theano.config.openmp_elemwise_minsize = 10
The parameters relevant to OpenMP are as follows:
openmp_elemwise_minsize: This is an integer number that represents the minimum size of the arrays where element-wise parallelization should be enabled (the overhead of the parallelization can harm performance for small arrays)
openmp: This is a Boolean flag that controls the activation of OpenMP compilation (the Theano documentation lists its default as False, so it needs to be enabled explicitly)
The number of threads assigned to OpenMP execution can be controlled by setting the OMP_NUM_THREADS environment variable before executing the code.
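Besides assigning to theano.config, Theano also reads its options from the THEANO_FLAGS environment variable, written as comma-separated key=value pairs. The following sketch assembles an environment that sets both THEANO_FLAGS and OMP_NUM_THREADS for a child process. The theano_env helper and the script name in the final comment are hypothetical, introduced only for this example.

```python
import os


def theano_env(n_threads, minsize=1000):
    """Return a copy of the environment configured for OpenMP execution."""
    env = dict(os.environ)
    # OMP_NUM_THREADS is read when the OpenMP runtime initializes, so it is
    # simplest to set it in the environment of the process being launched.
    env["OMP_NUM_THREADS"] = str(n_threads)
    # THEANO_FLAGS takes comma-separated key=value pairs.
    env["THEANO_FLAGS"] = "openmp=True,openmp_elemwise_minsize={}".format(minsize)
    return env


env = theano_env(4)
print(env["OMP_NUM_THREADS"], env["THEANO_FLAGS"])
# The dictionary can then be passed to a child process, for example:
# subprocess.run([sys.executable, "some_script.py"], env=env)
```

Building a separate dictionary leaves the current process environment untouched, which makes it easy to launch the same script with different thread counts.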
We can now write a simple benchmark to demonstrate the OpenMP usage in practice. In a file test_theano.py, we will put the complete code for the pi estimation example:
# File: test_theano.py
import numpy as np
import theano.tensor as T
import theano as th

th.config.openmp_elemwise_minsize = 1000
th.config.openmp = True

x = T.vector('x')
y = T.vector('y')

hit_test = x ** 2 + y ** 2 <= 1
hits = hit_test.sum()
total = x.shape[0]  # total number of trials
pi_est = 4 * hits/total