

Real-time scheduling for multicores usually assumes a set of independent tasks. Lately, there has been an increasing trend toward scheduling multi-threaded applications (parallel tasks). This growing interest is motivated by modern real-time applications whose performance requirements cannot easily be met without multi-threaded applications in which the threads can run in parallel. Examples include unmanned aerial vehicles and self-driving cars, which must process data from different sensors, including video cameras. Today, even many hobby toys and models, such as quadcopters, need sophisticated real-time computation.

Researchers have devised schedulability analyses for a variety of different system models. To ease classification, we distinguish between two orthogonal concerns: the scheduling model and the task model. In the thread scheduling model, parallel threads of the same task are scheduled independently, whereas in gang scheduling, the threads must execute concurrently.
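To make the distinction concrete, the following minimal sketch contrasts the two scheduling models at a single scheduling instant. The function names, the `(name, threads)` representation, and the policy of letting a lower-priority gang run when a higher-priority one does not fit are all illustrative assumptions, not taken from any of the cited works.

```python
def thread_dispatch(ready, m):
    """Thread scheduling: threads of a task are placed independently.

    ready: list of (task_name, ready_threads) in decreasing priority order.
    m: number of processors. Returns {task_name: threads running now}.
    """
    running, free = {}, m
    for name, threads in ready:
        k = min(threads, free)  # a task may run with only some of its threads
        if k > 0:
            running[name] = k
            free -= k
    return running


def gang_dispatch(ready, m):
    """Gang scheduling: a task runs only if ALL its threads fit at once.

    Assumption: a lower-priority gang may use processors that a
    higher-priority gang could not fill (one possible policy).
    """
    running, free = {}, m
    for name, threads in ready:
        if threads <= free:  # all-or-nothing placement
            running[name] = threads
            free -= threads
    return running


ready = [("t1", 3), ("t2", 2), ("t3", 1)]
print(thread_dispatch(ready, 4))  # t2 runs with only one of its two threads
print(gang_dispatch(ready, 4))    # t2 is skipped; t3 fits on the last core
```

With four processors, thread scheduling splits `t2` across time, whereas gang scheduling leaves `t2` waiting until two processors are free simultaneously.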

A parallel task can have more threads than there are processors available in the system. Parallel tasks are commonly modeled using a fork-join structure or a directed acyclic graph (DAG). In the fork-join structure, an application starts with a single thread and then forks multiple threads. After that, the task can keep alternating between a single sequential thread and multiple parallel threads. The application usually also ends with a single thread. In a DAG, threads are modeled as nodes in the graph, and dependencies are modeled as directed edges. Edges represent precedence constraints: a predecessor node must be executed before a successor node.
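As an illustration of the DAG model, the sketch below represents a task as per-node execution times plus predecessor lists and computes two quantities that DAG analyses typically rely on: the total work (sum of all node execution times) and the critical-path length (longest chain of dependent nodes). The function name, the dictionary layout, and the example values are assumptions made for this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def work_and_span(wcet, preds):
    """Return (total work, critical-path length) of a DAG task.

    wcet:  {node: worst-case execution time}
    preds: {node: list of predecessor nodes} (missing key = no predecessors)
    """
    graph = {v: preds.get(v, ()) for v in wcet}
    finish = {}
    # static_order() yields every node after all of its predecessors.
    for v in TopologicalSorter(graph).static_order():
        start = max((finish[p] for p in graph[v]), default=0)
        finish[v] = start + wcet[v]
    return sum(wcet.values()), max(finish.values())


# A fork-join task expressed as a DAG: one thread forks into three, then joins.
wcet = {"fork": 1, "a": 4, "b": 2, "c": 3, "join": 1}
preds = {"a": ["fork"], "b": ["fork"], "c": ["fork"], "join": ["a", "b", "c"]}
work, span = work_and_span(wcet, preds)
print(work, span)  # 11 6
```

The example also shows that a fork-join task is a special case of a DAG: the fork node precedes every parallel node, and every parallel node precedes the join node.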

Most related work considers thread scheduling. Work considering the fork-join task model includes [88, 92, 141, 48, 120]. The DAG task model is analyzed in [32, 92, 93]. The authors of [109, 31] extended the work on the classical DAG model by introducing the conditional DAG model, where subtask execution depends on the conditional path of execution through the program.

As a first work on scheduling parallel real-time tasks, Lakshmanan et al. [88] proposed a scheduling algorithm for the OpenMP fork-join structure. Nonetheless, the work in [88] is restrictive, as a task has to fork into the same number of threads each time. Consequently, Saifullah et al. [141] relaxed the model of [88].

Some works have proposed scheduling schemes based on assigning intermediate deadlines to individual threads of the same parallel task. Nelissen et al. [120] proposed techniques to determine the intermediate artificial deadlines while minimizing the number of processors needed to schedule the whole task set. In addition, threads are treated as if they were independent sequential sporadic tasks. Unlike the previously reviewed works, Baruah et al. [32] proposed a schedulability analysis for a single parallel task expressed as a DAG, whereas Li et al. [92] considered the same model for multiple tasks. Nonetheless, all of these works are concerned only with pure CPU scheduling, and none of them considers contention on shared resources such as shared cache and main memory.

Federated scheduling of the classical DAG parallel-task model was introduced in [93]. It generalizes the partitioned sequential task model by allocating a dedicated set of cores to each task with utilization higher than one. A fundamental assumption under these models is that parallel threads of a task can be scheduled independently with no synchronization penalty. In contrast, the work of Alhammad et al. [16] is the first on real-time scheduling for parallel tasks to consider the interference from other threads caused by contended accesses to main memory. The technique used in that work to provide thread isolation is an improvement over [15] that avoids contention without the need for hardware arbitration. It differs from [15] in that it considers multi-threaded applications instead of independent tasks. In addition, it uses a profiling scheme similar to [104] to divide the application into segments. Each thread is segmented into three consecutive phases (prefetch, execution, write-back), with synchronization happening at the boundaries of the memory phases (prefetch and write-back). Threads share data through main memory, as the write-back phase flushes the cache content to main memory before the next thread starts its prefetch phase. It is also observed that the technique scales better with the number of cores than the contention-based scheme.
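The idea of dedicating cores to a high-utilization task can be sketched as follows. The text above does not state how many cores are allocated; the formula used here, ceil((C - L) / (D - L)) with total work C, critical-path length L, and deadline D, is the one commonly associated with federated scheduling of DAG tasks and is included as an assumption. The function name and the integer time units are likewise illustrative.

```python
def federated_cores(work, span, deadline):
    """Dedicated cores for one heavy DAG task (utilization above one).

    Assumed formula: ceil((work - span) / (deadline - span)).
    All arguments are integers in the same time unit.
    """
    if span >= deadline:
        raise ValueError("critical path must be shorter than the deadline")
    if work <= deadline:
        raise ValueError("task is not heavy: utilization does not exceed one")
    num, den = work - span, deadline - span
    return -(-num // den)  # integer ceiling division, avoids float rounding


# Work 11, critical path 6, deadline 8: utilization 11/8 > 1.
print(federated_cores(11, 6, 8))  # 3
```

Tasks with utilization of at most one are not handled by this function; under the federated approach described above they are treated as sequential tasks on the remaining cores.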

In addition, Alhammad et al. [17] extended the work on federated scheduling in [93] by considering the cost of accessing memory. In essence, the authors introduce optimization algorithms that assign computation and memory budgets to each parallel task in the system in order to improve overall system schedulability.

Tessler et al. [165] introduced an application-level scheduling strategy that takes advantage of cache memory to tighten the WCET of a parallel task. In particular, the proposed method permits threads to execute across conflict-free regions and blocks those threads that would create an unnecessary cache conflict. The WCET bound is determined for the entire set of m threads, rather than treating each thread as a distinct task. The method relies on the calculation of conflict-free regions, which are found by a static analysis of the task object.

The need to gang schedule related threads of the same parallel application concurrently was first discussed in [123]. The performance benefits of gang scheduling have been studied in [60, 77, 150] and in many other works that looked into co-scheduling in general-purpose computing. In the real-time domain, the existing literature that considers gang scheduling distinguishes among three task models. In the rigid model [81, 55, 64], the (constant) number of threads required by a task is fixed off-line. In the moldable case [38], the number of threads assigned to each job is decided at run-time by the scheduler, but kept constant during the execution of the job. In the malleable case [47], the scheduler can change the number of assigned threads during the job's execution.

Our proposed bundled scheduling for parallel tasks, as discussed in Chapter 7, considers systems where the number of parallel threads is dictated by the application's execution, so that global scheduling decisions do not influence the way the application is executed. Therefore, we only compare our approach against the rigid model. A schedulability analysis for rigid tasks under the EDF scheduling policy was introduced in [81], whereas [55] provided a utilization-based schedulability test for EDF. In contrast, our work targets fixed priorities. Furthermore, as noted in [55], [81] contains a mistake in the way carry-in interference is bounded, while [55] itself is limited to implicit deadlines, rather than constrained deadlines as considered in our approach. Finally, [64] proposed an optimal, off-line, slot-based scheduling algorithm for strictly periodic rigid tasks, but the framework does not naturally extend to sporadic tasks.
