Linear programs express decision problems as a mathematical model in which requirements are represented by a linear objective function and linear equality and inequality constraints. In vector-matrix notation, linear programs take the following form:
\[
\begin{aligned}
\min\ & c^T x, \\
\text{s.t.}\ & Ax \le b, \\
& x \ge 0,
\end{aligned}
\]
where $x$ is an $(n \times 1)$ vector of decisions and $c$, $b$ and $A$ are known data of sizes $(n \times 1)$, $(m \times 1)$ and $(m \times n)$ respectively. This data might represent demand counts, supply levels, productivity measures and so on. The quantity we wish to minimise with respect to the decision $x$ is captured by the objective function $c^T x$, which might summarise total costs or, say, incomplete work over a planning horizon. An optimal solution to the linear program, $x^*$, must belong to the feasible set of decisions
\[
F = \{x \in \mathbb{R}^n \mid Ax \le b,\ x \ge 0\}
\]
and satisfy
\[
c^T x \ge c^T x^* \quad \text{for all } x \in F \setminus \{x^*\}.
\]
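As a minimal sketch of the form above, the following solves a small two-variable linear program with SciPy's `linprog` routine. The data `c`, `A` and `b` are made-up illustrative values, not taken from any application discussed here, and the HiGHS backend is simply one reasonable solver choice.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative data (assumed for this sketch): minimise c^T x subject to A x <= b, x >= 0.
c = np.array([-1.0, -2.0])          # objective coefficients (n = 2)
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])          # constraint matrix (m = 2, n = 2)
b = np.array([4.0, 6.0])            # right-hand sides

# bounds=(0, None) for every variable encodes the non-negativity constraint x >= 0.
result = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * len(c), method="highs")

x_star = result.x                   # optimal decision vector x*
optimal_value = result.fun          # optimal objective value c^T x*
print(x_star, optimal_value)
```

For this data the feasible set is a polygon, and the reported solution can be checked by hand by evaluating the objective at each of its corner points.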
Since their introduction by George B. Dantzig in 1947, linear programs have been extensively applied to practical decision problems. With extensions to integer decision variables and the use of non-linear functions in the objective and constraints, a more general class of mathematical programs emerged:
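\[
\begin{aligned}
\min\ & f(x), \\
\text{s.t.}\ & g_i(x) = b_i, \quad \text{for } i \in \{1, \dots, m\}, \\
& x \in \mathbb{R}^n.
\end{aligned}
\]
Some decision problems involve relationships that are inherently non-linear, and hence benefit from being modelled as a non-linear program in which some or all of the functions $f$ and $g_i$ are non-linear. Examples include economies of scale in manufacturing or the drop in signal strength with distance from a transmitter.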
Certain functional forms for $f$ and $g_i$, due to their role in model formulation and convenient mathematical properties, are predominant in mathematical programming. Linear functions are by far the most widely applicable in formulation and define linear programs, which are particularly easy to solve. More generally, linear programs belong to the important class of convex optimisation problems, in which $f$ and $g_i$ (for $i \in \{1, \dots, m\}$) are convex functions and the feasible region is a convex set. A real-valued function $f(x)$ defined over points $(x_1, \dots, x_n)$ is said to be a convex function if and only if, for any two points $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$,
\[
f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)
\]
for all $\lambda \in [0, 1]$. When the inequality is strict for all $x \ne y$ and $\lambda \in (0, 1)$, the function is said to be strictly convex (Dantzig and Thapa, 1997). The feasible region of a non-linear program is a convex set provided it is specified by less-than-or-equal-to constraints involving convex functions (equality constraints preserve convexity when the functions involved are linear). Convex optimisation problems benefit from the guarantee that every local minimum is in fact a global minimum. This property renders convex optimisation problems considerably easier and faster to solve than their non-convex counterparts.
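The defining inequality can be probed numerically. The sketch below samples random pairs of points and values of $\lambda$ and looks for a violation of the inequality. The function names, sampling range and tolerance are illustrative assumptions. Note that such a search can only refute convexity by finding a counterexample; failing to find one is not a proof that a function is convex.

```python
import random

def breaks_convexity(f, x, y, lam, tol=1e-9):
    """True if f(lam*x + (1-lam)*y) exceeds lam*f(x) + (1-lam)*f(y) by more than tol."""
    mix = [lam * xi + (1.0 - lam) * yi for xi, yi in zip(x, y)]
    return f(mix) > lam * f(x) + (1.0 - lam) * f(y) + tol

def find_counterexample(f, dim, trials=2000, seed=0):
    """Sample points in [-5, 5]^dim and return the first (x, y, lam) violating convexity."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        y = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
        lam = rng.random()
        if breaks_convexity(f, x, y, lam):
            return x, y, lam
    return None  # no violation found among the sampled points

def sum_of_squares(v):
    """A convex function: the squared Euclidean norm."""
    return sum(t * t for t in v)

def negated_sum_of_squares(v):
    """A concave (hence non-convex) function used as a contrast."""
    return -sum_of_squares(v)

convex_result = find_counterexample(sum_of_squares, dim=2)
concave_result = find_counterexample(negated_sum_of_squares, dim=2)
print(convex_result is None, concave_result is not None)
```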
Another important consideration when formulating mathematical programs that can be solved quickly is whether some or all decision variables are required to be integers. The associated class of integer programs is generally much harder to solve. Roughly speaking, the efficient solution methods used to search the single continuous and convex solution space (present in continuous convex optimisation problems) cannot be applied to the disjoint integer solution space.
2.4.2 Stochastic Linear Programming
In the optimisation problems discussed above, all inputs were assumed to be deterministic in nature. In many real problems, however, it is not reasonable to assume that the problem parameters $c$, $A$, $b$ and $g_i$ are known with certainty. The future productivity of a worker or the demand experienced at different points in time, for example, are better modelled by random variables and hence best characterised by probability distributions (King and Wallace, 2012).
The aim of Stochastic Programming is to find optimal decisions for problems which involve uncertain data. Uncertainty can be represented in terms of random experiments with outcomes $\omega$. The values that the various random variables take, denoted by the vector $\xi$, are known only after the random experiment, so that $\xi = \xi(\omega)$.
Models in which some decisions are delayed until after information about uncertain quantities has been disclosed are referred to as recourse problems and form a powerful area of stochastic programming.
We can recognise decisions as falling into two groups (Birge and Louveaux, 1997):
1. First-stage decisions, which have to be made before the experiment or before the uncertain information is realised and available; and
2. Second-stage decisions, which can be made after the experiment.
In general recourse program notation, $x$ traditionally represents the first stage decisions and $y(\omega, x)$ the second stage decisions. We summarise the sequence of events with
\[
x \to \xi(\omega) \to y(\omega, x).
\]
The classical two-stage stochastic linear program with recourse, originating with Dantzig (1955) and Beale (1955), is then the problem defined by
\[
\begin{aligned}
\min\ & c^T x + E_{\xi}\left[\min\ q(\omega)^T y(\omega, x)\right], \\
\text{s.t.}\ & Ax = b, \\
& T(\omega) x + W(\omega) y(\omega, x) = h(\omega), \\
& x,\ y(\omega, x) \ge 0.
\end{aligned}
\tag{2.4.1}
\]
Our first-stage or here-and-now decision $x$ does not respond to the outcome of $\xi$ in any way since it is determined before any information relating to uncertain data has become available. Associated with the first stage problem are the vectors $c$, $b$ and matrix $A$.
In the second stage, any random event (from a set of possible events $\Omega$) may be realised. For a given realisation $\omega$, the problem data $q(\omega)$, $h(\omega)$, $T(\omega)$ and $W(\omega)$ become known, at which point the second stage decision $y(\omega, x)$ must be made. By definition, the single random event $\omega$ influences several random variables; here these are the components of $\xi$.
We can understand the goal of such models as identifying a first stage solution well-positioned against all possible outcomes in the second stage so that advantageous outcomes of ξ can be exploited without major vulnerability to disadvantageous ones.
The objective function contains both a deterministic term $c^T x$ and the expectation of the second stage objective $q(\omega)^T y(\omega, x)$ taken over all realisations of $\xi$. This second stage term is the more difficult to compute since, for each $\omega$, $y(\omega, x)$ is itself the solution to a linear program. To solve stochastic programs, we therefore need to discretise the continuous distribution of the stochastic variables $\xi$ effectively, summarising it using a finite set of samples or 'scenarios'. We wish to discretise the distribution using as few scenarios as possible, without losing its key properties. This discretisation problem is discussed in more detail in the following subsection.
Discretisation of the expectation forming the second-stage sub-problem allows us to define the deterministic equivalent linear program associated with the original continuous problem. This notion is sometimes used to stress and clarify the 'program within a program' structure of model (2.4.1).
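Defining the second stage value function for a given realisation $\omega$ as
\[
Q(x, \xi(\omega)) = \min_{y} \{ q(\omega)^T y(\omega, x) \mid W(\omega) y(\omega, x) = h(\omega) - T(\omega) x,\ y(\omega, x) \ge 0 \},
\]
the expected second stage value function over a discrete scenario set $S$ is then
\[
Q(x) = \sum_{s \in S} p_s \, Q(x, \xi(\omega_s)),
\]
where $\omega_s$ denotes the outcome associated with scenario $s$ and $p_s \in [0, 1]$ is the probability associated with each scenario $s \in S$, with $\sum_{s \in S} p_s = 1$.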
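To make the 'one linear program per scenario' structure concrete, the sketch below evaluates the expected second stage value function at a fixed first stage decision by solving each scenario's second stage problem with SciPy's `linprog`. The toy data are assumptions chosen for illustration: a single first stage quantity $x$, a random demand appearing in $h$, and two recourse variables (shortfall covered at unit cost 3, surplus disposed of at unit cost 0.5). Only $h$ varies across scenarios here, although in the general model $q$, $T$ and $W$ may vary as well.

```python
import numpy as np
from scipy.optimize import linprog

def second_stage_value(x, q, W, T, h):
    """Q(x, xi): minimise q^T y subject to W y = h - T x, y >= 0."""
    res = linprog(q, A_eq=W, b_eq=h - T @ x,
                  bounds=[(0, None)] * len(q), method="highs")
    if res.status != 0:
        raise ValueError(f"second stage LP not solved: {res.message}")
    return res.fun

def expected_second_stage_value(x, scenarios):
    """Probability-weighted sum of the scenario values Q(x, xi_s)."""
    return sum(p * second_stage_value(x, q, W, T, h)
               for p, q, W, T, h in scenarios)

# Hypothetical data: x + shortfall - surplus = demand in every scenario.
q = np.array([3.0, 0.5])            # recourse costs (shortfall, surplus)
W = np.array([[1.0, -1.0]])         # recourse matrix
T = np.array([[1.0]])               # technology matrix linking x to stage two
scenarios = [(p, q, W, T, np.array([d]))
             for p, d in [(0.25, 2.0), (0.5, 4.0), (0.25, 6.0)]]

x = np.array([4.0])                 # a fixed first stage decision
value = expected_second_stage_value(x, scenarios)
print(value)
```

With $x = 4$, the three scenario values are 1 (surplus of 2), 0 and 6 (shortfall of 2), so the weighted sum can be verified by hand.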
We then have the so-called deterministic equivalent program
\[
\begin{aligned}
\min\ & c^T x + Q(x), \\
\text{s.t.}\ & Ax = b, \\
& x \ge 0.
\end{aligned}
\tag{2.4.2}
\]
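One way to see model (2.4.2) as a single large linear program is to give every scenario its own copy of the second stage variables and weight their costs by the scenario probabilities. The sketch below assembles and solves this extensive form with SciPy's `linprog` for a hypothetical toy problem (one first stage quantity with unit cost 1, three demand scenarios, shortfall and surplus recourse costs of 3 and 0.5). All data are assumptions for illustration, and the first stage constraints $Ax = b$ are omitted because the toy problem has none.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for a toy two-stage problem.
c = np.array([1.0])                 # first stage cost
q = np.array([3.0, 0.5])            # second stage costs (shortfall, surplus)
T = np.array([[1.0]])               # technology matrix
W = np.array([[1.0, -1.0]])         # recourse matrix
demands = [2.0, 4.0, 6.0]           # scenario right-hand sides h_s
probs = [0.25, 0.5, 0.25]           # scenario probabilities p_s

n_x, n_y, n_s = len(c), len(q), len(demands)
m = W.shape[0]                      # number of second stage constraints per scenario

# Variable layout: [x, y_1, y_2, ..., y_S]; scenario costs are weighted by p_s.
cost = np.concatenate([c] + [p * q for p in probs])

# One block row per scenario: T x + W y_s = h_s.
A_eq = np.zeros((n_s * m, n_x + n_s * n_y))
b_eq = np.zeros(n_s * m)
for s, d in enumerate(demands):
    rows = slice(s * m, (s + 1) * m)
    cols = slice(n_x + s * n_y, n_x + (s + 1) * n_y)
    A_eq[rows, :n_x] = T
    A_eq[rows, cols] = W
    b_eq[rows] = d

res = linprog(cost, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(cost), method="highs")
x_opt = res.x[:n_x]                 # optimal first stage decision
print(x_opt, res.fun)
```

The constraint matrix has a block structure in which only the first stage columns couple the scenarios, and the number of variables and constraints grows linearly with the number of scenarios.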
This model's name gives away the fact that it is essentially one very large-scale version of a standard deterministic linear program, and writing it in this form opens up a range of decomposition-based solution techniques which exploit its underlying structure. Indeed, we can solve increasingly large two-stage deterministic equivalent programs using Benders Decomposition, which can be viewed as Dantzig-Wolfe Decomposition (or Column Generation) applied to the dual problem. For more information on these techniques see Bertsimas and Tsitsiklis (1997).
Two-stage stochastic programs can be extended to multiple stages with a simple amendment of the linear program above. However, the additional decision stages result in a scenario tree which quickly explodes in size. Despite the progress made in solving two-stage stochastic programs, multi-stage programs remain difficult to solve for more than a few stages of decision making and a handful of scenarios.
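As a rough indication of this growth, suppose (purely as an illustrative assumption) that the scenario tree is balanced, with $T$ decision stages and the same number $k$ of possible outcomes revealed between each pair of consecutive stages. The number of scenarios (leaves) and the total number of nodes in the tree are then:

```latex
\[
|S| = k^{T-1},
\qquad
\text{nodes} = \sum_{t=0}^{T-1} k^{t} = \frac{k^{T} - 1}{k - 1}.
\]
% Example: k = 10 outcomes per stage and T = 5 stages give
% |S| = 10^4 = 10000 scenarios and (10^5 - 1)/9 = 11111 nodes.
```

Each node carries its own copy of the stage decision variables, so the size of the deterministic equivalent grows geometrically with the number of stages.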