Currently, S-Net supports multiple execution layers, e.g. PTHREAD [47] and the Light-weight Parallel Execution Layer (LPEL) [112].
The PTHREAD layer maps each S-Net entity to a dedicated Portable Operating System Interface [for Unix] (POSIX) thread [60]. As the threads in this layer are managed by the Operating System (OS) scheduler, the layer may suffer from cache- and context-switch-related overheads. This is due to the nature of S-Net, in which the number of active entities tends to quickly exceed the number of available cores. The read/write operations on streams in this layer are protected by POSIX mutexes.
The LPEL was designed to overcome the problems faced by the PTHREAD layer. The LPEL creates as many threads as there are available processing cores; these threads are called workers. Each S-Net entity is mapped to a lightweight coroutine, which moves scheduling decisions from the OS level to the LPEL level. In addition, LPEL makes it possible to collect monitoring information, such as the execution time of box entities or buffer usage. The read/write operations on streams in this layer are protected by atomic processor instructions, which emulate semaphores. Since our work revolves around LPEL, we limit our discussion to this execution layer. The execution layer also provides a scheduler to map computation to resources. The LPEL provides two schedulers, described by Prokesch [112] and Nga [103]. The former is a decentralised scheduler in which the task-to-core mapping is done statically, while the latter is a centralised scheduler with demand-based task priorities. We describe the latter in detail, as we use it in our work.
2.2.4.1 LPEL - A Stream Execution Layer with Efficient Scheduling
The LPEL is an execution layer designed for S-Net which allows collecting monitoring information and provides control over mapping and scheduling of tasks. In addition, it provides task- and stream-management functionality. LPEL adopts a user-level threading scheme providing the necessary threading and communication mechanisms in user-space. It builds upon the services provided by the OS or virtual hardware, such as kernel-level threading, context switching in user-space, atomic instructions, and timestamping.
Figure 2.8 shows an abstract design of the LPEL. In LPEL, each core is modelled as a worker. A special worker is called the conductor. A task is ready when it has all of its data available on its input stream and its output stream is not full. The LPEL scheduler does not map tasks to workers permanently. Instead, ready tasks are stored in a queue called the Central Task Queue (CTQ), which the conductor is responsible for managing. The scheduler uses the notion of data demands on streams to derive the task priority. The built-in monitoring framework is used to retrieve stream state information at runtime and to analyse data demands. When a worker is free, it sends a request for a new task to the conductor. On each such request, the conductor retrieves the task with the highest priority and sends it to the worker. While the worker is executing the task, the conductor updates the CTQ, and also updates task priorities if need be, without interrupting the workers.
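The request/dispatch cycle around the CTQ can be illustrated with a minimal sketch. It is not the LPEL implementation: the class name `CentralTaskQueue`, its methods, and the use of plain numbers as priorities are assumptions made for illustration, and the demand-based priority formula itself is not reproduced here.

```python
import heapq
import itertools


class CentralTaskQueue:
    """Hypothetical CTQ: ready tasks ordered by priority, highest first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO among equals
        self._entries = {}                 # task -> its live heap entry

    def push(self, task, priority):
        # heapq is a min-heap, so the priority is negated.
        entry = [-priority, next(self._counter), task, True]
        self._entries[task] = entry
        heapq.heappush(self._heap, entry)

    def update_priority(self, task, priority):
        # Lazy deletion: mark the old entry stale and push a fresh one,
        # so a priority update never has to re-sort the whole queue.
        old = self._entries.get(task)
        if old is not None:
            old[3] = False
        self.push(task, priority)

    def pop_highest(self):
        """Return the ready task with the highest priority, or None."""
        while self._heap:
            _, _, task, valid = heapq.heappop(self._heap)
            if valid:
                del self._entries[task]
                return task
        return None


# The conductor answers each worker request with the best ready task.
ctq = CentralTaskQueue()
ctq.push("box_a", 1)
ctq.push("box_b", 5)
ctq.push("box_c", 3)
ctq.update_priority("box_a", 10)  # e.g. demand on box_a's output rose
dispatch_order = [ctq.pop_highest() for _ in range(3)]
```

In this sketch the priority update is independent of the workers, which mirrors the statement that the conductor adjusts priorities without interrupting them.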
[Figure 2.8: Abstract design of the LPEL. Labels: Worker, Conductor, CPU, Mailbox, Central Task Queue, Ready Task, Running Task, Monitoring.]

All conductor-worker communication takes place via mailboxes. Communication occurs only between the conductor and a worker; worker-worker communication is not allowed. A mailbox essentially consists of a message queue, in which messages are enqueued by the conductor or the workers and dequeued only by the owning worker or conductor. As the workers access the conductor's mailbox concurrently, care must be taken to keep mailbox operations free of corruption: atomic operations or PTHREAD mutexes are used to protect the critical region.
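A mailbox of this kind can be sketched as a message queue guarded by a lock. The sketch below is an illustration only: it uses Python's `threading.Condition` in place of the atomic operations or PTHREAD mutexes of the real layer, and the names `Mailbox`, `send`, `receive`, and `demo` are hypothetical.

```python
import threading
from collections import deque


class Mailbox:
    """Hypothetical mailbox: many senders, a single owning receiver."""

    def __init__(self):
        self._messages = deque()
        self._cond = threading.Condition()  # a lock plus wait/notify

    def send(self, message):
        with self._cond:                    # critical region
            self._messages.append(message)
            self._cond.notify()

    def receive(self):
        with self._cond:
            while not self._messages:       # owner waits for a message
                self._cond.wait()
            return self._messages.popleft()


def demo(num_workers=4, requests_per_worker=100):
    conductor_box = Mailbox()

    def worker(worker_id):
        # Each worker posts task requests to the conductor's mailbox.
        for seq in range(requests_per_worker):
            conductor_box.send((worker_id, seq))

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    total = num_workers * requests_per_worker
    received = [conductor_box.receive() for _ in range(total)]
    for t in threads:
        t.join()
    return received


received = demo()
```

Every request arrives exactly once, and the requests of any single worker arrive in the order in which they were sent, even though the workers enqueue concurrently.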
The streams in LPEL are uni-directional and implemented as FIFO buffers. The read/write operations are protected by atomic operations or PTHREAD mutexes. Reading from an empty stream puts the task into a blocked state. Since the streams are not bounded, writing always succeeds.
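The stream semantics can be sketched with a Python generator standing in for a lightweight coroutine. This is a single-threaded illustration under stated assumptions: the synchronisation by atomic operations or mutexes is omitted, and `Stream`, `BLOCKED`, and the `doubler` task are hypothetical names.

```python
from collections import deque

BLOCKED = object()  # marker a task yields when it cannot proceed


class Stream:
    """Hypothetical uni-directional, unbounded FIFO stream."""

    def __init__(self):
        self._buf = deque()

    def write(self, item):
        self._buf.append(item)          # unbounded: always succeeds

    def try_read(self):
        if self._buf:
            return True, self._buf.popleft()
        return False, None

    def drain(self):
        items = list(self._buf)
        self._buf.clear()
        return items


def doubler(inp, out, count):
    """Task: read `count` items, write each one doubled."""
    done = 0
    while done < count:
        ok, item = inp.try_read()
        if not ok:
            yield BLOCKED               # empty input: give up the worker
            continue
        out.write(item * 2)
        done += 1


def run_demo():
    src, dst = Stream(), Stream()
    task = doubler(src, dst, 3)
    states = [next(task) is BLOCKED]    # nothing to read yet
    src.write(1)
    src.write(2)
    states.append(next(task) is BLOCKED)  # consumed 1 and 2, blocked again
    src.write(3)
    finished = next(task, "finished") == "finished"
    return states, finished, dst.drain()


states, finished, output = run_demo()
```

The task blocks only on reads; no write in the sketch can fail or block, which reflects the unbounded streams described above.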
2.2.4.2 Distributed S-Net with LPEL
As described earlier, the S-Net language is extended with the concepts of nodes and placement to support distributed systems. Each PE in the distributed system is equipped with its own S-Net RTS and LPEL/PTHREAD execution layer. There is no shared memory through which LPEL workers on different PEs could communicate. For this reason, the centralised scheduler cannot be used. Prokesch [112] implements a decentralised scheduler for LPEL that features a local scheduler for each worker. This scheduler is well suited to such a distributed scenario, as an alternative to the PTHREAD-based execution layer.
The compiler first takes the S-Net program with placement annotations and generates the PE-specific CRI code. Each PE runs its own instance of the S-Net RTS and of the LPEL layer with a decentralised scheduler. Each PE uses the CRI code to create LPEL tasks and streams as usual. The task-to-PE mapping is fixed at compile time and does not change at run time. The PE-specific LPEL instance with its decentralised scheduler controls the scheduling of tasks within the PE.
The S-Net RTS provides three components: the Input Manager (IM), the Output Manager (OM), and the Data Fetcher (DF). When a stream crosses a PE boundary, it is registered with a manager: an input stream with the IM, and an output stream with the OM. The task reading from or writing to the stream performs its operations as normal, while the IM and OM transparently move messages between PEs.
Let us now look at a simple example: tasks t1 and t2 are connected serially and are located on different PEs, PE1 and PE2 respectively. When t1 produces a message on its output stream, the OM on PE1 reads the message and sends it via MPI to the IM of the corresponding PE, PE2 in our case. The IM on PE2 also uses MPI to receive the message. Once it has the message, it writes it to the corresponding stream, the input stream of t2 in our case.
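The t1/t2 example can be traced with a small sketch. It does not use MPI: an in-memory `Network` class with one inbox per PE stands in for the MPI transport, and all class and method names, as well as the stream identifier 7, are assumptions made for illustration.

```python
import queue
from collections import deque


class Network:
    """In-memory stand-in for MPI: one inbox per PE (an assumption)."""

    def __init__(self, pe_ids):
        self._inboxes = {pe: queue.Queue() for pe in pe_ids}

    def send(self, dest_pe, payload):
        self._inboxes[dest_pe].put(payload)

    def recv(self, pe):
        return self._inboxes[pe].get_nowait()  # raises queue.Empty if idle


class OutputManager:
    def __init__(self, network):
        self._network = network
        self._routes = []  # (local output stream, destination PE, stream id)

    def register(self, stream, dest_pe, stream_id):
        self._routes.append((stream, dest_pe, stream_id))

    def pump(self):
        """Forward every pending message; return how many were sent."""
        sent = 0
        for stream, dest_pe, stream_id in self._routes:
            while stream:
                self._network.send(dest_pe, (stream_id, stream.popleft()))
                sent += 1
        return sent


class InputManager:
    def __init__(self, network, pe_id):
        self._network = network
        self._pe_id = pe_id
        self._streams = {}  # stream id -> local input stream

    def register(self, stream_id, stream):
        self._streams[stream_id] = stream

    def pump(self, count):
        """Receive `count` messages and write each to its local stream."""
        for _ in range(count):
            stream_id, message = self._network.recv(self._pe_id)
            self._streams[stream_id].append(message)


def demo():
    net = Network(["PE1", "PE2"])
    t1_out, t2_in = deque(), deque()   # the two ends of one logical stream
    om1 = OutputManager(net)
    im2 = InputManager(net, "PE2")
    om1.register(t1_out, "PE2", stream_id=7)
    im2.register(7, t2_in)

    t1_out.append("msg-a")             # t1 writes as usual
    t1_out.append("msg-b")
    im2.pump(om1.pump())               # OM on PE1 sends, IM on PE2 receives
    return list(t2_in), list(t1_out)


delivered, pending = demo()
```

Neither task refers to the network in this sketch: t1 only appends to its output stream and t2 would only read from its input stream, which corresponds to the transparency of the IM and OM.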
the field data is sent instead, to avoid unnecessary data transfers. The representation consists of a Unique data Identifier (UID) and the location of the PE where the actual data is held. This mechanism avoids transferring the data until a task actually needs it. At that point, the DF sends a fetch request to the IM of the PE where the data is kept. The DF uses the UID and location information to identify the data it needs and the PE that holds it. The IM that receives the data request informs the DF on the same PE, which retrieves the requested data and sends it to the PE that asked for it. The local LPEL scheduler does not control the IM, OM, or DF, as they are implemented as kernel-level threads and are scheduled by the OS scheduler. One benefit of this design is that it prevents deadlocks; on the other hand, it may increase OS-level context switches.
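The fetch-on-demand mechanism can be sketched as follows. The sketch rests on several assumptions: a shared dictionary stands in for the network, the IM/DF interaction is collapsed into direct method calls, and the local caching of fetched data is our simplification rather than a documented property of the runtime. `DataRef`, `PE`, and their methods are hypothetical names.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class DataRef:
    uid: int        # unique data identifier
    location: str   # name of the PE that holds the actual data


class PE:
    _uids = itertools.count(1)   # globally unique UIDs for this sketch

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster   # name -> PE, stands in for the network
        self.store = {}          # uid -> field data held on this PE
        self.fetches = 0         # number of remote fetches performed
        cluster[name] = self

    def publish(self, data):
        """Keep the data locally and hand out a lightweight reference."""
        uid = next(PE._uids)
        self.store[uid] = data
        return DataRef(uid, self.name)

    def serve_fetch(self, uid):
        # Role of the owning PE's DF: look up and return the data.
        return self.store[uid]

    def resolve(self, ref):
        # Role of the requesting PE's DF: fetch only when a task needs it.
        if ref.uid not in self.store:
            owner = self.cluster[ref.location]
            self.store[ref.uid] = owner.serve_fetch(ref.uid)
            self.fetches += 1
        return self.store[ref.uid]


def demo():
    cluster = {}
    pe1, pe2, pe3 = (PE(n, cluster) for n in ("PE1", "PE2", "PE3"))
    ref = pe1.publish(list(range(1000)))
    forwarded = ref                  # PE2 passes the reference on untouched
    data = pe3.resolve(forwarded)    # PE3 needs the data: one fetch
    pe3.resolve(forwarded)           # already local: no further fetch
    return len(data), pe2.fetches, pe3.fetches


size, pe2_fetches, pe3_fetches = demo()
```

In this sketch PE2 never transfers the 1000-element field, because it only forwards the reference; the data moves once, from PE1 to PE3.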