
The types of memory supported by our formalization of CUDA’s memory model are constant, global, local, and shared memory. All four are addressable by PTX programs, whereas registers are not. We define the domain StateSpace to denote one of the four addressable memory types. Additionally, we need two subsets of StateSpace: the memory types supported by atomic memory operations and the memory types that reside in DRAM.

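$$\mathit{StateSpace} = \{\texttt{.global}, \texttt{.local}, \texttt{.shared}, \texttt{.const}\}$$

$$ss_a \in \mathit{StateSpace}_{atomic} = \{\texttt{.global}, \texttt{.shared}\} \subset \mathit{StateSpace}$$

$$ss_d \in \mathit{StateSpace}_{device} = \{\texttt{.global}, \texttt{.const}, \texttt{.local}\} \subset \mathit{StateSpace}$$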
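As an illustration only, the state-space domains map naturally onto a Python enumeration with two derived subsets. The Python names (`StateSpace`, `STATE_SPACE_ATOMIC`, `STATE_SPACE_DEVICE`) are our own choices, not part of the formalization.

```python
from enum import Enum


class StateSpace(Enum):
    """The four PTX-addressable state spaces (registers are not addressable)."""
    GLOBAL = ".global"
    LOCAL = ".local"
    SHARED = ".shared"
    CONST = ".const"


# StateSpace_atomic: state spaces supported by atomic memory operations.
STATE_SPACE_ATOMIC = frozenset({StateSpace.GLOBAL, StateSpace.SHARED})

# StateSpace_device: state spaces that reside in DRAM.
STATE_SPACE_DEVICE = frozenset({StateSpace.GLOBAL, StateSpace.CONST, StateSpace.LOCAL})

# Both sets are proper subsets of StateSpace, as in the definition.
assert STATE_SPACE_ATOMIC < frozenset(StateSpace)
assert STATE_SPACE_DEVICE < frozenset(StateSpace)
```

With these definitions, `STATE_SPACE_ATOMIC & STATE_SPACE_DEVICE` contains only `StateSpace.GLOBAL`.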

As mentioned above, the domain PhysMemAddr is used to address a location in any of the memory types. We define the following subsets of this address space, each of which identifies locations in one specific memory type. For the addressable memory types, each address corresponds to one byte of memory; for registers, each address identifies a register in an SM’s register file.

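$$\mathit{GlobalAddr} \subset \mathit{PhysMemAddr} \qquad \mathit{SharedAddr} \subset \mathit{PhysMemAddr}$$

$$\mathit{ConstAddr} \subset \mathit{PhysMemAddr} \qquad \mathit{RegAddr} \subset \mathit{PhysMemAddr}$$

$$\mathit{LocalAddr} \subset \mathit{PhysMemAddr}$$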

Global, local, and constant memory, however, are all mapped to locations in DRAM. We therefore assume that global, local, and constant addresses uniquely identify a location in DRAM, i.e. the same address cannot be used by more than one of the DRAM memory types.

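$$\mathit{GlobalAddr} \cap \mathit{LocalAddr} = \emptyset \qquad \mathit{GlobalAddr} \cap \mathit{ConstAddr} = \emptyset$$

$$\mathit{LocalAddr} \cap \mathit{ConstAddr} = \emptyset$$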

Moreover, the number of addresses of each memory type is determined by the amount of memory available on the graphics card:

$$|\mathit{GlobalAddr}| = \mathit{GlobalMemSize} \qquad |\mathit{SharedAddr}| = \mathit{SharedMemSize}$$

$$|\mathit{ConstAddr}| = \mathit{ConstMemSize} \qquad |\mathit{RegAddr}| = \mathit{RegFileSize}$$
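To make the disjointness and size constraints concrete, the following sketch models each DRAM address set as a contiguous Python `range`. The toy sizes and the back-to-back layout are hypothetical; the report leaves the concrete layout of PhysMemAddr abstract, and contiguity is an assumption of this sketch only. No size constraint for LocalAddr is stated above, so its toy size is arbitrary.

```python
from itertools import combinations

# Hypothetical toy sizes in bytes; real values depend on the graphics card.
GLOBAL_MEM_SIZE = 4096
CONST_MEM_SIZE = 256
LOCAL_TOY_SIZE = 1024  # arbitrary: no |LocalAddr| constraint is given

# Assumption: each DRAM address set is one contiguous range, laid out back to back.
GLOBAL_ADDR = range(0, GLOBAL_MEM_SIZE)
LOCAL_ADDR = range(GLOBAL_ADDR.stop, GLOBAL_ADDR.stop + LOCAL_TOY_SIZE)
CONST_ADDR = range(LOCAL_ADDR.stop, LOCAL_ADDR.stop + CONST_MEM_SIZE)


def ranges_overlap(a: range, b: range) -> bool:
    """True if two step-1 ranges share at least one address."""
    return max(a.start, b.start) < min(a.stop, b.stop)


def pairwise_disjoint(*spaces: range) -> bool:
    """True if no address belongs to more than one of the given address sets."""
    return not any(ranges_overlap(a, b) for a, b in combinations(spaces, 2))


# GlobalAddr, LocalAddr and ConstAddr must be pairwise disjoint ...
assert pairwise_disjoint(GLOBAL_ADDR, LOCAL_ADDR, CONST_ADDR)
# ... and |GlobalAddr| = GlobalMemSize, |ConstAddr| = ConstMemSize.
assert len(GLOBAL_ADDR) == GLOBAL_MEM_SIZE
assert len(CONST_ADDR) == CONST_MEM_SIZE
```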

Shared memory is organized into banks such that successive 32-bit words are assigned to successive banks. Memory operations affecting different shared memory banks can be serviced concurrently; if more than one 32-bit word in the same bank is read or written, a bank conflict occurs and the accesses are serialized. We only define abstract functions that implicitly fix the bank layout and resolve bank conflicts; we give no concrete implementations because bank conflicts merely affect program performance. By contrast, we do have to consider bank locking to correctly define the semantics of shared memory atomic operations, even though it is not officially documented when and how shared memory banks are locked.
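The report deliberately keeps the bank layout abstract. Purely to illustrate the word interleaving described above, the sketch below assumes 4-byte words and a configurable bank count (default 32, a hypothetical value) and checks whether a set of byte addresses accessed together would cause a bank conflict. It also assumes that several accesses to one and the same word do not conflict (broadcast), which the report does not state.

```python
WORD_SIZE = 4  # bytes per 32-bit word


def bank_of(addr: int, num_banks: int) -> int:
    """0-based bank holding the 32-bit word that contains byte address addr."""
    return (addr // WORD_SIZE) % num_banks


def has_bank_conflict(addrs: list[int], num_banks: int = 32) -> bool:
    """True if two different 32-bit words in the same bank are accessed together.

    Assumption: repeated accesses to one and the same word are not a conflict.
    """
    words_in_bank: dict[int, set[int]] = {}
    for addr in addrs:
        words_in_bank.setdefault(bank_of(addr, num_banks), set()).add(addr // WORD_SIZE)
    return any(len(words) > 1 for words in words_in_bank.values())
```

For example, with 32 banks, byte addresses 0 and 128 lie in different words of bank 0 and therefore conflict, whereas addresses 0, 4, 8, and 12 hit four different banks.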

Unlike atomic operations on global memory [9, Table 105], shared memory atomic operations are not guaranteed to be atomic because of the way the PTX compiler uses the locking mechanism to implement them. Therefore, we cannot simply omit the concept of shared memory banks, even though, apart from this one case, they are relevant only for performance considerations. Consequently, a shared memory bank is either locked or unlocked:

$$\mathit{Lock} = \mathbb{B}$$

We identify shared memory banks by their bank index. The maximum number of shared memory banks is determined by the compute capability of the underlying hardware.

$$\mathit{BankIdx} = \{1, \ldots, \mathit{SharedMemBanks}\}$$

The abstract function banks returns the indices of the banks that are accessed when a given number of bytes, starting at the specified address, is read from or written to shared memory.

$$\mathit{banks} : \mathit{SharedAddr} \times \mathit{MemOpSize} \to \mathit{BankIdx}^*$$
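One possible concrete instance of banks, under the same word-interleaving assumption (4-byte words, successive words in successive banks), is sketched below. It returns 1-based indices to match BankIdx, lists each bank once in order of first access, and uses 32 as a hypothetical default for SharedMemBanks; none of these choices is prescribed by the report, which keeps banks abstract.

```python
def banks(addr: int, size: int, shared_mem_banks: int = 32) -> list[int]:
    """Indices in {1, ..., shared_mem_banks} of the banks touched when `size`
    bytes starting at byte address `addr` are read or written.

    Assumes 4-byte words mapped round-robin onto the banks.
    """
    if size <= 0:
        return []
    first_word = addr // 4
    last_word = (addr + size - 1) // 4
    touched: list[int] = []
    for word in range(first_word, last_word + 1):
        bank = word % shared_mem_banks + 1  # shift to 1-based BankIdx
        if bank not in touched:
            touched.append(bank)
    return touched
```

For instance, an 8-byte access at address 124 spans words 31 and 32 and yields `[32, 1]`.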

The formalization of a memory is a function that maps an address to a byte or a register value. For shared memory, we also have to keep track of each bank’s lock state. Similarly, we keep track of each register’s blocked state. We neglect CUDA’s concept of constant memory banks for the reasons outlined in section 2.3.2.

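$$\mathit{GlobalMem} = \mathit{GlobalAddr} \to \mathit{Byte} \qquad \mathit{LocalMem} = \mathit{LocalAddr} \to \mathit{Byte}$$

$$\mathit{ConstMem} = \mathit{ConstAddr} \to \mathit{Byte} \qquad \mathit{RegFile} = \mathit{RegAddr} \to \mathit{RegValue}$$

$$\mathit{SharedMem} = (\mathit{SharedAddr} \to \mathit{Byte}) \times (\mathit{BankIdx} \to \mathit{Lock})$$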
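In an executable model, a function with a finite domain such as SharedAddr → Byte can be represented by a dictionary. The sketch below (class and field names are hypothetical) pairs such a byte store with a per-bank lock map, mirroring SharedMem, and treats Lock as Python's `bool`. Starting with all banks unlocked is likewise an assumption of the sketch.

```python
from dataclasses import dataclass, field

# Address -> byte map, usable for GlobalMem, LocalMem, and ConstMem as well.
ByteStore = dict[int, int]


@dataclass
class SharedMem:
    """SharedMem = (SharedAddr -> Byte) x (BankIdx -> Lock), with Lock = bool."""
    data: ByteStore = field(default_factory=dict)
    locks: dict[int, bool] = field(default_factory=dict)

    @classmethod
    def unlocked(cls, shared_mem_banks: int) -> "SharedMem":
        """Empty shared memory whose banks 1..shared_mem_banks are all unlocked."""
        return cls(locks={i: False for i in range(1, shared_mem_banks + 1)})

    def is_locked(self, bank: int) -> bool:
        return self.locks[bank]
```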

With the above formalization of the different memory types, we can now define the memory environment on which we base the semantics of memory programs. It comprises global, local, and constant memory as well as the L2 cache. In addition, we define a domain that encapsulates the memories found on each streaming multiprocessor, i.e. shared memory, the register file, and the L1 cache; the memory environment contains one such processor memory environment for each streaming multiprocessor.

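$$\mathit{ProcMem} = \mathit{SharedMem} \times \mathit{L1Cache} \times \mathit{RegFile}$$

$$\eta \in \mathit{MemEnv} = \mathit{GlobalMem} \times \mathit{ConstMem} \times \mathit{LocalMem} \times \mathit{L2Cache} \times \mathit{ProcMem}^{\mathit{NumProcessors}}$$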

The streaming multiprocessors are numbered consecutively. These numbers serve as processor indices and uniquely identify each processor. We assume that the position of a processor's memory in the list of processor memories in the memory environment corresponds to the processor's index.

$$pid \in \mathit{ProcIdx} = \{0, \ldots, \mathit{NumProcessors} - 1\}$$

As with caches, we now formalize the operations on the memory environment that the memory program semantics later relies on, namely reading and writing each type of memory. To do so, we first define the following helper functions, which reduce the number of projections required in the rules and functions in the remainder of this chapter. They give access to a processor’s L1 cache, shared memory, or register file.

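$$\mathit{l1Cache} : \mathit{MemEnv} \times \mathit{ProcIdx} \to \mathit{L1Cache} \qquad \mathit{l1Cache}(\eta, pid) = \eta_{\overrightarrow{procMem},\, pid,\, l1Cache}$$

$$\mathit{regFile} : \mathit{MemEnv} \times \mathit{ProcIdx} \to \mathit{RegFile} \qquad \mathit{regFile}(\eta, pid) = \eta_{\overrightarrow{procMem},\, pid,\, regFile}$$

$$\mathit{sharedMem} : \mathit{MemEnv} \times \mathit{ProcIdx} \to \mathit{SharedMem} \qquad \mathit{sharedMem}(\eta, pid) = \eta_{\overrightarrow{procMem},\, pid,\, sharedMem}$$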
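The memory environment and the three projection helpers can be sketched with immutable Python dataclasses, where a tuple indexed by the processor index plays the role of ProcMem^NumProcessors. The cache components are opaque placeholders (typed `Any`) because their structure is defined elsewhere; all class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProcMem:
    """ProcMem = SharedMem x L1Cache x RegFile: the memories of one SM."""
    shared_mem: Any   # (SharedAddr -> Byte) x (BankIdx -> Lock)
    l1_cache: Any     # placeholder for the L1 cache model
    reg_file: dict    # RegAddr -> RegValue


@dataclass(frozen=True)
class MemEnv:
    """MemEnv = GlobalMem x ConstMem x LocalMem x L2Cache x ProcMem^NumProcessors."""
    global_mem: dict
    const_mem: dict
    local_mem: dict
    l2_cache: Any
    proc_mem: tuple  # proc_mem[pid] belongs to the SM with index pid


def l1_cache(env: MemEnv, pid: int) -> Any:
    return env.proc_mem[pid].l1_cache


def reg_file(env: MemEnv, pid: int) -> dict:
    return env.proc_mem[pid].reg_file


def shared_mem(env: MemEnv, pid: int) -> Any:
    return env.proc_mem[pid].shared_mem
```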

The function $\mathit{read}_{device}$ reads a value from device memory, i.e. global, constant, or local memory. It returns a list of bytes whose length depends on the size of the memory request. Additionally, we check whether the given address actually belongs to the specified state space; if it does not, ⊥ is returned. To make the definition more readable, we use the abbreviated list concatenation syntax $b_1 \ldots b_n$ instead of $b_1 :: \ldots :: b_n$.

$$\mathit{read}_{device} : \mathit{MemEnv} \times \mathit{StateSpace}_{device} \times \mathit{PhysMemAddr} \times \mathit{MemOpSize} \to (\mathit{Byte}^*)_\bot$$

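$$
\begin{aligned}
\mathit{read}_{device}(\eta, \texttt{.global}, pAddr, size) &= \eta_{globalMem}(pAddr + 0) \ldots \eta_{globalMem}(pAddr + size - 1)\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{GlobalAddr}\\
\mathit{read}_{device}(\eta, \texttt{.local}, pAddr, size) &= \eta_{localMem}(pAddr + 0) \ldots \eta_{localMem}(pAddr + size - 1)\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{LocalAddr}\\
\mathit{read}_{device}(\eta, \texttt{.const}, pAddr, size) &= \eta_{constMem}(pAddr + 0) \ldots \eta_{constMem}(pAddr + size - 1)\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{ConstAddr}\\
\mathit{read}_{device}(\eta, ss_d, pAddr, size) &= \bot \quad \text{otherwise}
\end{aligned}
$$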
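The case analysis of $\mathit{read}_{device}$ translates into a short Python function. In this sketch, ⊥ is represented by `None`, the state space by its PTX name as a string, and the environment by a dictionary with keys `globalMem`, `localMem`, and `constMem`; these representations and the toy address ranges are assumptions made for illustration. Each byte store is expected to hold a value for every address of its space, mirroring the total functions GlobalMem, LocalMem, and ConstMem.

```python
from typing import Optional

# Hypothetical, pairwise disjoint toy address sets for the DRAM state spaces.
GLOBAL_ADDR = range(0x000, 0x100)
LOCAL_ADDR = range(0x100, 0x180)
CONST_ADDR = range(0x180, 0x1C0)

# State space -> (component of the environment, admissible addresses).
DEVICE_SPACES = {
    ".global": ("globalMem", GLOBAL_ADDR),
    ".local": ("localMem", LOCAL_ADDR),
    ".const": ("constMem", CONST_ADDR),
}


def read_device(env: dict, ss: str, p_addr: int, size: int) -> Optional[list[int]]:
    """Return the bytes at p_addr, ..., p_addr + size - 1, or None (bottom)
    if any of these addresses lies outside the address set of `ss`."""
    if ss not in DEVICE_SPACES:
        return None
    component, addr_set = DEVICE_SPACES[ss]
    addrs = range(p_addr, p_addr + size)
    if not all(a in addr_set for a in addrs):
        return None  # "otherwise" case of the definition
    mem = env[component]
    return [mem[a] for a in addrs]
```

A read that straddles the boundary between two state spaces, such as four bytes at address 0xFE requested as `.global`, falls into the "otherwise" case and yields `None`.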

We define a similar function for shared memory reads. However, a shared memory read can access several different locations concurrently, so $\mathit{read}_{shared}$ takes a list of addresses and a list of access sizes and returns a list of read values. We assume that the function that translates the shared memory accesses of threads into memory operations handles bank conflicts by splitting conflicting requests into two or more conflict-free requests. Therefore, $\mathit{read}_{shared}$ can safely ignore the possibility of bank conflicts. Additionally, non-atomic reads of shared memory ignore bank locks; hence $\mathit{read}_{shared}$ succeeds even if an accessed bank is locked.

$$\mathit{read}_{shared} : \mathit{MemEnv} \times \mathit{ProcIdx} \times \mathit{PhysMemAddr}^* \times \mathit{MemOpSize}^* \to ((\mathit{Byte}^*)^*)_\bot$$

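$$
\begin{aligned}
\mathit{read}_{shared}(\eta, pid, \varepsilon, \varepsilon) &= \varepsilon\\
\mathit{read}_{shared}(\eta, pid, pAddr :: \overrightarrow{pAddr}, size :: \overrightarrow{size}) &= (s(pAddr + 0) \ldots s(pAddr + size - 1)) :: \mathit{read}_{shared}(\eta, pid, \overrightarrow{pAddr}, \overrightarrow{size})\\
&\qquad \text{if } s = \pi_1\,\mathit{sharedMem}(\eta, pid) \wedge \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{SharedAddr}\\
\mathit{read}_{shared}(\eta, pid, \overrightarrow{pAddr}, \overrightarrow{size}) &= \bot \quad \text{otherwise}
\end{aligned}
$$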
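A corresponding sketch of $\mathit{read}_{shared}$ processes the request lists iteratively instead of recursively. It receives the byte store $\pi_1\,\mathit{sharedMem}(\eta, pid)$ directly rather than η and pid, which keeps the example self-contained; it ignores bank locks just as the definition does; and it returns `None` for ⊥. Treating an out-of-range request anywhere in the list as making the whole result ⊥, and rejecting lists of different lengths (which match neither of the first two equations), is our reading of the rules. The toy address range is hypothetical.

```python
from typing import Optional

SHARED_ADDR = range(0x000, 0x400)  # hypothetical toy SharedAddr


def read_shared(shared_bytes: dict[int, int], p_addrs: list[int],
                sizes: list[int]) -> Optional[list[list[int]]]:
    """Serve several conflict-free shared memory reads at once.

    Bank locks are not consulted: non-atomic reads succeed on locked banks.
    """
    if len(p_addrs) != len(sizes):
        return None  # no equation for lists of different lengths
    results: list[list[int]] = []
    for p_addr, size in zip(p_addrs, sizes):
        addrs = range(p_addr, p_addr + size)
        if not all(a in SHARED_ADDR for a in addrs):
            return None  # bottom propagates to the whole result
        results.append([shared_bytes[a] for a in addrs])
    return results
```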

Finally, we define two functions that enable memory programs to write to global, local, and shared memory. Threads are not allowed to write to constant memory; only the host program can, and the semantics of the host program are outside the scope of this report.

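$$\mathit{write}_{device} : \mathit{MemEnv} \times \mathit{StateSpace}_{device} \times \mathit{PhysMemAddr} \times \mathit{MemOpSize} \times \mathit{Byte}^* \to \mathit{MemEnv}_\bot$$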

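$$
\begin{aligned}
\mathit{write}_{device}(\eta \,.\, \mathit{globalMem}, \texttt{.global}, pAddr, size, \vec{b}) &= \eta \,/\, \mathit{globalMem}[pAddr + 0 \ldots pAddr + size - 1 \mapsto \vec{b}_0 \ldots \vec{b}_{size-1}]\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{GlobalAddr}\\
\mathit{write}_{device}(\eta \,.\, \mathit{localMem}, \texttt{.local}, pAddr, size, \vec{b}) &= \eta \,/\, \mathit{localMem}[pAddr + 0 \ldots pAddr + size - 1 \mapsto \vec{b}_0 \ldots \vec{b}_{size-1}]\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{LocalAddr}\\
\mathit{write}_{device}(\eta, ss_d, pAddr, size, \vec{b}) &= \bot \quad \text{otherwise}
\end{aligned}
$$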
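Analogously, $\mathit{write}_{device}$ can be sketched as a functional update that returns a new environment and leaves the old one untouched. As before, `None` stands for ⊥, the environment is a dictionary of byte stores, and the address ranges are hypothetical. Writes to `.const` fall into the final case and yield `None`, matching the rule that threads cannot write constant memory. Rejecting a byte list whose length differs from `size` is an extra guard of this sketch; the definition simply assumes the two agree.

```python
from typing import Optional

# Hypothetical toy address sets (constant memory is not writable by threads).
GLOBAL_ADDR = range(0x000, 0x100)
LOCAL_ADDR = range(0x100, 0x180)

WRITABLE_SPACES = {
    ".global": ("globalMem", GLOBAL_ADDR),
    ".local": ("localMem", LOCAL_ADDR),
}


def write_device(env: dict, ss: str, p_addr: int, size: int,
                 data: list[int]) -> Optional[dict]:
    """Return a copy of env in which data[0..size-1] is stored at
    p_addr, ..., p_addr + size - 1, or None (bottom) if the write is invalid."""
    if ss not in WRITABLE_SPACES or len(data) != size:
        return None  # .const, unknown spaces, or mismatched byte count
    component, addr_set = WRITABLE_SPACES[ss]
    addrs = range(p_addr, p_addr + size)
    if not all(a in addr_set for a in addrs):
        return None
    new_mem = dict(env[component])  # copy: the old environment is kept
    new_mem.update(zip(addrs, data))
    return {**env, component: new_mem}
```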

Again, several shared memory locations can be written concurrently, bank conflicts are already accounted for, and locked banks do not prevent a non-atomic write.

$$\mathit{write}_{shared} : \mathit{MemEnv} \times \mathit{ProcIdx} \times \mathit{PhysMemAddr}^* \times \mathit{MemOpSize}^* \times (\mathit{Byte}^*)^* \to \mathit{MemEnv}_\bot$$

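$$
\begin{aligned}
\mathit{write}_{shared}(\eta, pid, \varepsilon, \varepsilon, \varepsilon) &= \eta\\
\mathit{write}_{shared}(\eta \,.\, \overrightarrow{procMem}, pid, pAddr :: \overrightarrow{pAddr}, size :: \overrightarrow{size}, \vec{b} :: \vec{\vec{b}}) &= \mathit{write}_{shared}(\eta \,/\, \overrightarrow{procMem}[pid \mapsto \overrightarrow{procMem}_{pid} \,/\, s], pid, \overrightarrow{pAddr}, \overrightarrow{size}, \vec{\vec{b}})\\
&\qquad \text{where } s_0 = \mathit{sharedMem}(\eta, pid) \wedge s = s_0 \,/\, \pi_1(s_0)[pAddr + 0 \ldots pAddr + size - 1 \mapsto \vec{b}_0 \ldots \vec{b}_{size-1}]\\
&\qquad \text{if } \{pAddr + 0, \ldots, pAddr + size - 1\} \subseteq \mathit{SharedAddr}\\
\mathit{write}_{shared}(\eta, pid, \overrightarrow{pAddr}, \overrightarrow{size}, \vec{\vec{b}}) &= \bot \quad \text{otherwise}
\end{aligned}
$$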
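Finally, $\mathit{write}_{shared}$ can be sketched in the same style. The environment layout (`env["procMem"][pid]["sharedMem"]` holding a pair of byte store and lock map) and the toy address range are hypothetical. The writes are applied in list order to a copy of the processor's byte store, the bank locks are carried over unchanged because non-atomic writes ignore them, and `None` (⊥) is returned if any request leaves SharedAddr or the three lists differ in length. Rejecting a byte list whose length differs from its size is an extra guard of this sketch.

```python
from typing import Optional

SHARED_ADDR = range(0x000, 0x400)  # hypothetical toy SharedAddr


def write_shared(env: dict, pid: int, p_addrs: list[int], sizes: list[int],
                 values: list[list[int]]) -> Optional[dict]:
    """Apply several conflict-free shared memory writes of processor pid.

    Assumed layout: env["procMem"][pid]["sharedMem"] == (byte_store, locks).
    Returns a new environment, or None (bottom) if any write is invalid.
    """
    if not (len(p_addrs) == len(sizes) == len(values)):
        return None  # no equation matches lists of different lengths
    byte_store, locks = env["procMem"][pid]["sharedMem"]
    new_store = dict(byte_store)  # functional update on a copy
    for p_addr, size, b in zip(p_addrs, sizes, values):
        addrs = range(p_addr, p_addr + size)
        if len(b) != size or not all(a in SHARED_ADDR for a in addrs):
            return None
        new_store.update(zip(addrs, b))
    # Locked banks do not block non-atomic writes; locks are kept as they are.
    new_procs = list(env["procMem"])
    new_procs[pid] = {**new_procs[pid], "sharedMem": (new_store, locks)}
    return {**env, "procMem": new_procs}
```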
