Extensions of HMC methods to discrete spaces have been considered in recent years [Zhang et al., 2016, Nishimura et al., 2017, Dinh et al., 2017]. We explain HMC methods on discrete spaces within the framework described in Section 4.3.1. The piecewise deterministic proposal map $S_\tau$ can be defined in various ways depending on
the structure of the sample space. In this subsection, we illustrate a simple example of sampling from a discrete sample space with a $d$-dimensional lattice-like structure, that is, when the space is defined as a product $X = X_1 \times X_2 \times \cdots \times X_d$ where each
component $X_i$, $i \in 1{:}d$, is a set with a finite or countable number of elements. We
suppose each space component $X_i$, $i \in 1{:}d$, is isomorphic to one of three types of
sets: $\mathbb{Z}$, $\mathbb{Z}^+$, or $\{1, \dots, k\}$ for some $k$. That is, we assume that for each $i \in 1{:}d$,
there exist a set $A_i$ of one of the three types just mentioned and a bijective map
$\iota_i : A_i \to X_i$. If $x_i = \iota_i(a) \in X_i$ for some $a \in A_i$ and if $a + 1$ exists in $A_i$, we define
the next element of $x_i$ as $x_i^+ := \iota_i(a + 1)$ and say that $x_i^+$ exists in $X_i$. Note that $x_i^+$ may
not exist if $A_i = \{1, \dots, k\}$ and $x_i = \iota_i(k)$. Likewise, we define the previous element
of $x_i$ as $x_i^- := \iota_i(a - 1)$, if $a - 1$ exists in $A_i$.
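We define the momentum variable to be an element of $V = \{-1, 0, 1\}^d$. Each
entry of $v$ represents the direction in which the particle moves in each coordinate. We define the one-step evolution map $S_1 : X \times V \to X \times V$ as $S_1(x, v) := (\tilde{x}, \tilde{v})$, where
the $i$-th coordinates of $\tilde{x}$ and $\tilde{v}$ are defined as
$$
(\tilde{x}_i, \tilde{v}_i) =
\begin{cases}
(x_i^+, v_i) & \text{if } v_i = 1 \text{ and } x_i^+ \text{ exists,} \\
(x_i, -v_i) & \text{if } v_i = 1 \text{ and } x_i^+ \text{ does not exist,} \\
(x_i^-, v_i) & \text{if } v_i = -1 \text{ and } x_i^- \text{ exists,} \\
(x_i, -v_i) & \text{if } v_i = -1 \text{ and } x_i^- \text{ does not exist,} \\
(x_i, v_i) & \text{if } v_i = 0.
\end{cases}
$$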
Here, $x_i$ and $v_i$ denote the $i$-th components of $x$ and $v$. The evolution map $S_\tau$ for
$\tau \in \mathbb{Z}^+$ is defined as the $\tau$-fold composition of $S_1$, that is, $S_\tau := S_1^\tau$. We take the
counting measure on $X$ as the reference measure with respect to which the density
$\bar{\pi}(x)$ is defined. We also take the counting measure on $V$ as the reference measure on $V$. The density of the velocity distribution $\psi(v)$ is taken such that $\psi(v) = \psi(-v)$ for all $v \in V$. One possible choice is to take $\psi$ to be a function of $\|v\|_0 := \sum_{i=1}^{d} \mathbb{1}[v_i \neq 0]$ only. In this case, $\psi(\cdot)$ defines a distribution on the number of nonzero components of $v$, and all elements of $V$ with the same number of nonzero components are equally likely.
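The construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: each map $\iota_i$ is taken to be the identity, each index set $A_i$ is described by a hypothetical pair of bounds (with `None` meaning unbounded on that side), and the names `step_coordinate`, `S1`, `S` and `sample_velocity` are introduced here for illustration only. The velocity sampler first draws the number of nonzero components from user-supplied weights and then picks the positions and signs uniformly, which gives one symmetric choice of $\psi$ that depends on $v$ only through $\|v\|_0$.

```python
import random
from typing import Optional, Sequence, Tuple

# Hypothetical encoding of A_i by (lower, upper) bounds; None = unbounded.
# For example Z -> (None, None), a half-line -> (1, None), {1,...,k} -> (1, k).
Bounds = Tuple[Optional[int], Optional[int]]


def step_coordinate(a: int, v: int, bounds: Bounds) -> Tuple[int, int]:
    """Apply the coordinate-wise rule of the one-step map to (a, v)."""
    lo, hi = bounds
    if v == 0:
        return a, v                      # no motion in this coordinate
    target = a + v                       # next (v = 1) or previous (v = -1) element
    exists = (lo is None or target >= lo) and (hi is None or target <= hi)
    if exists:
        return target, v                 # move and keep the direction
    return a, -v                         # stay and reverse the direction


def S1(x: Sequence[int], v: Sequence[int], bounds: Sequence[Bounds]):
    """One-step evolution map on the product space."""
    pairs = [step_coordinate(a, vi, b) for a, vi, b in zip(x, v, bounds)]
    return tuple(p[0] for p in pairs), tuple(p[1] for p in pairs)


def S(x, v, bounds, tau: int):
    """tau-fold composition of S1."""
    for _ in range(tau):
        x, v = S1(x, v, bounds)
    return x, v


def sample_velocity(d: int, weights: Sequence[float], rng: random.Random):
    """Draw v in {-1, 0, 1}^d with a density depending only on the number of
    nonzero entries; weights[m] is proportional to P(number of nonzeros = m)."""
    m = rng.choices(range(d + 1), weights=weights)[0]
    v = [0] * d
    for i in rng.sample(range(d), m):    # uniformly chosen positions
        v[i] = rng.choice((-1, 1))       # uniformly chosen signs
    return tuple(v)
```

For instance, with `bounds = [(None, None), (1, None), (1, 3)]`, the state `x = (0, 1, 3)`, `v = (1, -1, 1)` moves freely in the first coordinate and is reflected at the boundary in the other two.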
We check that the above construction satisfies the two conditions presented in Section 4.3.1. If we take $R(x) = -\mathrm{id}$ for all $x \in X$, (4.4) and (4.6) are easily satisfied. Since $R(x)$ is a bijection, it preserves the counting measure on $V$. We can also check that $T \circ S_1$ is a self-inverse map, so condition (4.7) follows. By Lemma IV.3, we
see that $S_\tau$ is a bijection, so the counting measure on $X \times V$ is preserved by $S_\tau$.
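The self-inverse property can be verified by brute force on a small finite lattice. The sketch below assumes that every component is of the form $\{1, \dots, k_i\}$ with identity $\iota_i$, and that $T$ denotes the velocity flip $(x, v) \mapsto (x, R(x)v) = (x, -v)$, which is our reading of Section 4.3.1; the function names are hypothetical. It is a numerical sanity check on small cases, not a proof.

```python
from itertools import product


def S1(x, v, sizes):
    """One-step map on {1..k_1} x ... x {1..k_d} with reflection at the ends."""
    new_x, new_v = [], []
    for a, vi, k in zip(x, v, sizes):
        target = a + vi
        if vi != 0 and not (1 <= target <= k):
            new_x.append(a)              # neighbour does not exist: stay
            new_v.append(-vi)            # and reverse the direction
        else:
            new_x.append(target)         # move (or stay when vi == 0)
            new_v.append(vi)
    return tuple(new_x), tuple(new_v)


def T(x, v):
    """Velocity flip, assuming R(x) = -id."""
    return x, tuple(-vi for vi in v)


def check_T_S1(sizes):
    """Return (T o S1 is self-inverse, S1 is a bijection) on the full state space."""
    d = len(sizes)
    positions = product(*(range(1, k + 1) for k in sizes))
    states = [(x, v) for x in positions for v in product((-1, 0, 1), repeat=d)]

    self_inverse = True
    for x, v in states:
        y, w = T(*S1(x, v, sizes))       # first application of T o S1
        z, u = T(*S1(y, w, sizes))       # second application of T o S1
        if (z, u) != (x, v):
            self_inverse = False

    images = {S1(x, v, sizes) for x, v in states}
    bijective = images == set(states)    # a surjection of a finite set onto itself
    return self_inverse, bijective
```

Calling `check_T_S1((3, 4))` enumerates all $12 \times 9 = 108$ states of a $3 \times 4$ lattice and tests both properties on each of them.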
So far, the algorithm just described does not appear to be related to HMC. We now show how it may be seen as an HMC algorithm. The main idea is to interpret the uniform (0, 1) random variable $\Lambda$ drawn at each iteration of Algorithm 4
in terms of the kinetic energy of the particle. The connection is made by setting the initial kinetic energy $K$ equal to $-\log \Lambda$. In this HMC formulation, we let each component
of the momentum $v$ take any real value, so the momentum space becomes $V = \mathbb{R}^d$. The Hamiltonian of a particle at location $x$ with momentum $v$ is defined as
$$
H(x, v) := -\log \pi(x) + \sum_{i=1}^{d} |v_i|.
$$
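We assume that the kinetic energy $K := \sum_{i=1}^{d} |v_i| = \|v\|_1$ is equally shared by all momentum components with nonzero magnitude. That is, we let $v = \frac{K}{\|v\|_0} \bigl(\operatorname{sign}(v_1), \dots, \operatorname{sign}(v_d)\bigr)$, where $\operatorname{sign}(a) = 1$ if $a \in \mathbb{R}$ is positive, $\operatorname{sign}(a) = -1$ if $a$ is negative, and $\operatorname{sign}(0) = 0$.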
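As a small illustration of the equal-sharing convention, the following sketch rebuilds a momentum vector from a kinetic energy $K$ and a sign pattern. The function name `share_energy` is hypothetical, and the case in which all components are zero is rejected here because $K / \|v\|_0$ is then undefined; how that case should be treated is not specified above.

```python
def sign(a: float) -> int:
    """Return 1, -1 or 0 according to the sign of a."""
    return (a > 0) - (a < 0)


def share_energy(K: float, v):
    """Return the momentum with the same sign pattern as v whose nonzero
    components all have magnitude K / ||v||_0, so that ||result||_1 = K."""
    signs = [sign(vi) for vi in v]
    n_nonzero = sum(1 for s in signs if s != 0)
    if n_nonzero == 0:
        raise ValueError("K / ||v||_0 is undefined when v has no nonzero entry")
    return tuple(K / n_nonzero * s for s in signs)
```

For example, with $K = 3$ and sign pattern $(1, 0, -1, 1)$, three components are nonzero, so each carries magnitude $1$.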
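The Hamiltonian equation of motion (4.9) at discrete time $t \in \mathbb{Z}$ is interpreted in the following way. The first equation $\frac{dx}{dt} = \frac{\partial H}{\partial v}$ is interpreted as
$$
x_i(t + 1) - x_i(t) = \frac{\partial \|v\|_1}{\partial v_i} = \operatorname{sign}(v_i).
$$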
The second equation $\frac{dv}{dt} = -\frac{\partial H}{\partial x}$ is interpreted as
$$
\|v(t + 1)\|_1 - \|v(t)\|_1 = \log \pi\{x(t + 1)\} - \log \pi\{x(t)\},
$$
such that
$$
v(t + 1) = \frac{\|v(t)\|_1 - \log \pi\{x(t)\} + \log \pi\{x(t + 1)\}}{\|v\|_0} \bigl(\operatorname{sign}(v_1), \dots, \operatorname{sign}(v_d)\bigr).
$$
From this relation, we can easily see that the Hamiltonian of the system is preserved along the path, that is, $H\{x(t), v(t)\} = H\{x(t + 1), v(t + 1)\}$. A state $(x, v)$ is physically admissible if the kinetic energy $K = \|v\|_1 \geq 0$. Under the relation $K(0) = -\log \Lambda$,
the condition $K(t) \geq 0$ is equivalent to
$$
\Lambda = \exp\{-K(0)\} = \exp\bigl[-K(t) - \log \pi\{x(0)\} + \log \pi\{x(t)\}\bigr] \leq \frac{\pi\{x(t)\}}{\pi\{x(0)\}}.
$$
This agrees with the acceptability criterion in Algorithm 4. Since Λ is a uniform (0, 1) random variable, the initial kinetic energy K(0) is distributed according to the exponential distribution with unit rate parameter. This formulation agrees with the Laplacian Hamiltonian Monte Carlo formulation considered in Zhang et al. [2016] and Nishimura et al. [2017].
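The energy bookkeeping can be checked numerically with a short sketch. It is restricted to one dimension on $\{1, \dots, k\}$ for brevity, takes $\pi$ as a hypothetical table of unnormalized weights, and tracks $K(t)$ as a signed quantity so that inadmissible states show up as negative values. The function `simulate` and its parameters are introduced here for illustration; the sketch does not reproduce Algorithm 4, only the relation between $K(t) \geq 0$ and $\Lambda \leq \pi\{x(t)\} / \pi\{x(0)\}$.

```python
import math


def simulate(weights, x0, v0, lam, tau):
    """Follow the discrete dynamics on {1, ..., k} for tau steps.

    weights[x - 1] is the unnormalized target mass at x, v0 is in {-1, 0, 1},
    and lam plays the role of the uniform (0, 1) draw Lambda.
    Returns the positions x(t) and kinetic energies K(t) for t = 0, ..., tau.
    """
    k = len(weights)

    def log_pi(x):
        return math.log(weights[x - 1])

    x, v = x0, v0
    K = -math.log(lam)                   # initial kinetic energy K(0)
    xs, Ks = [x], [K]
    for _ in range(tau):
        target = x + v
        if v != 0 and not (1 <= target <= k):
            v = -v                       # reflect at the boundary
            x_new = x
        else:
            x_new = target
        K += log_pi(x_new) - log_pi(x)   # change in kinetic energy per step
        x = x_new
        xs.append(x)
        Ks.append(K)
    return xs, Ks


def admissible_and_acceptable(weights, x0, v0, lam, tau):
    """Compare K(t) >= 0 with lam <= pi(x(t)) / pi(x(0)) along the path."""
    xs, Ks = simulate(weights, x0, v0, lam, tau)
    admissible = [K >= 0 for K in Ks]
    acceptable = [lam <= weights[x - 1] / weights[x0 - 1] for x in xs]
    return admissible, acceptable
```

With `weights = [1.0, 2.0, 4.0, 0.5]`, `x0 = 2`, `v0 = 1` and `lam = 0.5`, the path visits the low-mass state `4`, where the kinetic energy becomes negative and the ratio test fails at the same time, and then returns to an admissible state after reflection. Exact ties between `lam` and the ratio are avoided in this example because floating-point rounding could make the two tests disagree there.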