The works described above focus on static scheduling, which cannot completely solve the problem of executing applications under scenarios that change at runtime. Below we summarize some research efforts that deal with dynamic program flows. These approaches range from architectural support for dynamic scheduling to dynamic reconfiguration management and dynamic ordering of tasks.

1.4. Background and related work

⋆ Microarchitecture support for dynamic scheduling

Noguera and Badia [NB04] presented an approach to the problem of non-preemptive multitasking on reconfigurable architectures. It is based on microarchitecture support for dynamic scheduling that includes a hardware-based configuration prefetching unit.

Their work targets a heterogeneous architecture that includes a general-purpose processor, an array of dynamically reconfigurable logic (DRL) blocks, and shared memory resources. Each DRL block can be independently configured. The architecture supports multiple configurations running concurrently.

They proposed a hardware-software co-design methodology for dynamically reconfigurable systems which is divided into three stages: application stage, static stage, and dynamic stage. The application stage is focused on the system specification. The static stage includes four phases: extraction, estimation, hardware-software partitioning, and hardware and software synthesis. The extraction phase obtains the task graph representation from the system specification, including the dependencies between tasks and their priorities. This phase also identifies independent tasks that are mutually exclusive (resembling the temporal partitioning proposed by Purna and Bhatia which was outlined above). The estimation phase provides information about delay and area to the following phase. This information is obtained by applying high-level synthesis and profiling tools. The hardware-software partitioning phase decides which tasks will be executed in reconfigurable hardware and which in software. The dynamic scheduling results depend heavily on the quality of the hardware-software partitioning, which helps to reduce the runtime reconfiguration overhead. The dynamic stage includes execution scheduling and DRL multicontext scheduling. Both of them run in parallel and base their functionality on events found in the event stream, i.e. tasks that are ready to be executed because all their dependencies have been completed. The execution scheduler and the DRL multicontext scheduler algorithms are mapped to hardware, implementing the multitasking

Chapter 1. Introduction

support unit, which stores the event stream. Tasks are executed in the DRL blocks or in the CPU, and their execution is triggered by the multitasking support unit. By observing a number of events in advance, the execution scheduler assigns events to functional units and decides their execution order. The DRL multicontext scheduler is used to minimize the reconfiguration overhead. It is in charge of deciding which DRL block must be reconfigured, and which reconfiguration context (task) must be loaded in the DRL block. The dynamic scheduler attempts to minimize the reconfiguration overhead by overlapping the execution

⋆ Dynamic reconfiguration management

Resano et al. [RMVC05] developed a reconfiguration manager to reduce the reconfiguration overhead for highly dynamic applications with very tight deadlines. Their target is a heterogeneous multi-processor platform that includes one or more instruction set processors (ISPs), reconfigurable processing units (RPUs), and ASICs. This platform adopts a network-on-chip interconnection model (ICN) [MBV⁺02] for the reconfigurable units. This ICN model partitions an FPGA into an array of identical tiles that serve as RPUs. Each RPU can accommodate one task. A task can be loaded onto an RPU while the other tasks continue normal execution.

The reconfiguration manager runs under a hybrid runtime scheduling environment called task concurrency management (TCM). This scheduling is split into two phases: design time and runtime. At design time the scheduler explores the design space for each task and generates a small set of schedules with different energy-performance trade-offs. At runtime the scheduler selects the most suitable schedule for each task from all the schedules determined at design time. Two different techniques are performed both at design time and runtime: prefetching and replacement.

Design-time prefetching tags each node of a task graph with a weight that represents how critical that node’s execution is. An initial schedule is obtained by performing an as-late-as-possible scheduling. Runtime prefetching takes the initial schedule and applies to it a heuristic based on list scheduling. Among the configurations that are ready for loading, runtime prefetching selects the one with the highest weight. After the runtime scheduler selects the schedule, it identifies the tasks that are currently located in the FPGA and are reusable. With this information, the runtime replacement algorithm creates a replacement list in which the tasks that will be executed sooner are always at the beginning of the list.
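The two runtime decisions described above can be illustrated with a minimal sketch. It is not the implementation of [RMVC05]: the `Config` record, its field names, and the weight values are hypothetical, and choosing the eviction victim from the tail of the replacement list is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    weight: int    # criticality weight tagged at design time (illustrative values)
    next_use: int  # position in the selected schedule where the task runs next


def select_prefetch(ready):
    """Return the ready configuration with the highest weight, or None."""
    return max(ready, key=lambda c: c.weight, default=None)


def replacement_list(resident):
    """Order resident, reusable tasks so that those executed sooner come first."""
    return sorted(resident, key=lambda c: c.next_use)


def choose_victim(resident):
    """Assumption: evict the task at the tail of the replacement list."""
    ordered = replacement_list(resident)
    return ordered[-1] if ordered else None


configs = [Config("fft", 3, 2), Config("fir", 7, 1), Config("dct", 5, 4)]
print(select_prefetch(configs).name)               # configuration loaded next
print([c.name for c in replacement_list(configs)]) # soonest-used first
print(choose_victim(configs).name)                 # candidate for replacement
```

Keeping the soonest-used tasks at the head of the list means that a tile holding a task needed far in the future is the first one considered for overwriting, which is one plausible way to preserve reuse.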

This work is focused on reducing the reconfiguration overhead for fine-grained reconfigurable architectures. This is achieved by prefetching the critical configuration based on a weighted list. However, this technique needs to be supported by a data prefetching technique that provides appropriate data to the configurations loaded in advance. Our dynamic scheduling algorithms select the configurations and data to be loaded in advance according to the runtime conditions, seeking to reduce the computation stalls caused by unavailable configurations and data.

⋆ Dynamic mapping and ordering of tasks

Yang and Catthoor [YC04] addressed the problem of task mapping and ordering under performance/cost trade-offs. They deal with embedded real-time applications with large amounts of instruction- and data-level parallelism, which have to be mapped onto a multi-processor platform, typically containing several reconfigurable and programmable components (general-purpose processor, DSP or ASIP), on-chip memory, I/O and other ASICs.

Performance/cost trade-off exploration is translated into a Pareto-based optimization problem [YC03]. The Pareto-optimal concept comes from multiobjective optimization problems, where more than one conflicting optimization objective exists. A solution is Pareto-optimal when no other solution is at least as good in every objective and strictly better in at least one. Their approach explores the potentially available design space at design time but defers the selection step until runtime, i.e. the system cost function is optimized at runtime based on pre-computed performance-cost Pareto curves, satisfying the real-time constraints.
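As an illustration of this split between design time and runtime, the sketch below filters a set of candidate points down to its Pareto front and then picks a point at runtime. It is a simplified reading, not the algorithm of [YC03] or [YC04]: the representation of a point as an `(execution_time, cost)` pair, the function names, the sample values, and the rule "cheapest point that meets the deadline" are all assumptions made for the example.

```python
def pareto_front(points):
    """Design time: keep the (time, cost) points that no other point dominates.

    Both coordinates are minimized. A point q dominates p when q is no worse
    than p in both coordinates and differs from p.
    """
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def select_at_runtime(front, deadline):
    """Runtime: cheapest pre-computed point whose time meets the deadline."""
    feasible = [p for p in front if p[0] <= deadline]
    return min(feasible, key=lambda p: p[1], default=None)


candidates = [(10, 9), (12, 6), (15, 6), (20, 3), (11, 10)]
front = pareto_front(candidates)
print(front)                         # pre-computed trade-off curve
print(select_at_runtime(front, 14))  # point chosen for a deadline of 14
```

Because the expensive exploration is done once at design time, the runtime step reduces to a scan over a short list of points.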

In their design methodology, the applications are represented as a set of concurrent thread frames (TFs), each of which exhibits a single thread of control. Each of these TFs consists of many thread nodes (TNs), each of which can be regarded as an independent and schedulable code section.

Scheduling is done in two phases. Given a thread frame, the design-time scheduler explores all the different mapping and ordering possibilities, and generates a Pareto-optimal set, where every point represents a different mapping and ordering combination. The runtime scheduler works at the granularity of thread frames. It considers all active thread frames, trying to satisfy their time constraints while minimizing the system cost. After the runtime scheduler selects a Pareto point, it decides the execution order of the thread nodes and on which processor to execute them.

To allow new thread frames to join the running applications at any moment, the authors have wrapped every thread frame into an object, which contains an initializer, a scheduler and a specific TF data structure. The scheduler keeps a set of function pointers. Every Pareto point simply corresponds to a different set of values for these pointers. Whenever a new TF enters the system, its initializer is first called to register the TF with the system. Then, for a given Pareto point, the scheduler resets its pointers to the desired TNs in the appropriate order. The runtime system module runs like a middleware layer which separates the application from the lower-level RTOS.
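The object wrapper described above can be pictured with the following sketch. It only illustrates the idea of a Pareto point as a set of pointer values; the class name `ThreadFrame`, its methods, the dictionary used as a system registry, and the processor labels are hypothetical and are not taken from [YC04].

```python
class ThreadFrame:
    """Hypothetical wrapper: initializer, scheduler, and TF-specific data."""

    def __init__(self, name, nodes, pareto_points):
        self.name = name
        self.nodes = nodes                  # thread-node name -> callable
        self.pareto_points = pareto_points  # point id -> ordered (node, processor) pairs
        self.pointers = []                  # current set of "function pointers"

    def initialize(self, registry):
        """Initializer: register this thread frame with the system."""
        registry[self.name] = self

    def schedule(self, point_id):
        """Scheduler: reset the pointers to the TNs of the chosen Pareto point."""
        self.pointers = [
            (self.nodes[node], processor)
            for node, processor in self.pareto_points[point_id]
        ]

    def run(self):
        """Call the thread nodes in the order fixed by the current pointers."""
        return [fn(processor) for fn, processor in self.pointers]


nodes = {"a": lambda p: f"a@{p}", "b": lambda p: f"b@{p}"}
points = {
    "fast": [("a", "dsp"), ("b", "cpu")],
    "cheap": [("b", "cpu"), ("a", "cpu")],
}
registry = {}
tf = ThreadFrame("tf0", nodes, points)
tf.initialize(registry)
tf.schedule("fast")
print(tf.run())
tf.schedule("cheap")
print(tf.run())
```

Switching between Pareto points only rewrites the pointer list, so the thread nodes themselves stay untouched when the runtime scheduler changes its choice.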

This work deals with applications that have to be mapped onto a multi-processor platform. When the architecture contains a large number of processors, a significant overhead can be expected for the runtime scheduler because the processor assignment becomes more complex. In addition, control-flow instructions within a thread node are not considered.

