This thesis takes inspiration from work carried out at the Ohio State University by Srinivasan et al. Their paper “A Robust Scheduling Strategy for Moldable Scheduling of Parallel Jobs” (Srinivasan, Krishnamoorthy, & Sadayappan, 2003) has strongly influenced the system design and testing approach adopted in this thesis.
Where most algorithms rely on users recommending a partition size and then choose the best option at submit or schedule time, Srinivasan et al.’s approach also incorporates the scalability characteristics of the algorithm to improve the allocation decision. Adopting the Downey model as a framework, Srinivasan et al. show that letting the scheduler decide the variance and then allocate resources improves turnaround times for jobs in an FCFS queue with aggressive backfilling and a Fair Use policy. The results also show improved usage efficiency and a more robust system.
The initial premise of their paper is that, up to that point, mouldable algorithms were still rigid because they depended on user input and did not take application performance characteristics into account. Using real trace logs from a small system at the San Diego Supercomputer Center and the NAS benchmarks as the running algorithms, Srinivasan et al. create a mouldable workload. Working backwards, the execution time in the trace is matched to the NAS benchmark results for the application. This gives the authors a point on the scalability curve derived from the Downey model for that application. The point is defined by the application, the number of cores requested and the execution time. The Downey model is then used to calculate execution times at other node allocations.
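The extrapolation step described above can be illustrated with a minimal sketch. It assumes the piecewise speedup function of Downey's model, parameterised by the average parallelism A and the variance in parallelism sigma. The function names and the parameter values are illustrative only and are not taken from Srinivasan et al.

```python
def downey_speedup(n, A, sigma):
    """Speedup on n processors under Downey's model.

    A is the average parallelism, sigma the variance in parallelism.
    """
    if n < 1 or A < 1 or sigma < 0:
        raise ValueError("require n >= 1, A >= 1 and sigma >= 0")
    if sigma <= 1:
        # Low-variance branch of the model.
        if n <= A:
            return A * n / (A + sigma * (n - 1) / 2)
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2))
        return A
    # High-variance branch of the model.
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return A


def runtime_at(n, n0, t0, A, sigma):
    """Estimate the runtime on n processors from one trace point (n0, t0)."""
    # The trace point fixes the implied single-processor runtime.
    t1 = t0 * downey_speedup(n0, A, sigma)
    return t1 / downey_speedup(n, A, sigma)


# Hypothetical job: the trace records 8 cores and 1000 s; A and sigma are assumed.
estimates = {n: runtime_at(n, n0=8, t0=1000.0, A=16, sigma=0.5) for n in (2, 4, 8, 16)}
```

The single trace point fixes the scale of the curve, while A and sigma fix its shape, so any error in the assumed A and sigma propagates to every extrapolated runtime.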
With this scalability information and the aggressive backfilling algorithm, Srinivasan et al. are able to improve performance across all classes of jobs. The classes are divided by workload weight, which is calculated from the number of cores requested and the execution time. The authors maintain that FCFS systems cause fragmentation and under-utilisation, and that introducing backfilling in mouldable scheduling leads to poor turnaround times for some workload weights (Srinivasan et al., 2003). It is assumed they mean that large jobs are delayed despite mouldable scheduling, but they do not state this explicitly.
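A minimal sketch of such a classification follows. The text above only states that the weight is calculated from the cores requested and the execution time, so the product used here (core-seconds), the class labels and the thresholds are all assumptions made for illustration, not values from Srinivasan et al.

```python
from bisect import bisect_right

# Hypothetical class boundaries in core-seconds (assumed, not from the paper).
THRESHOLDS = (3600, 86400)
LABELS = ("light", "medium", "heavy")


def workload_weight(cores, runtime_s):
    """Assumed weight: cores requested multiplied by execution time."""
    return cores * runtime_s


def weight_class(weight, thresholds=THRESHOLDS, labels=LABELS):
    """Map a weight to a class label using sorted thresholds."""
    return labels[bisect_right(thresholds, weight)]


# A 4-core job running for 10 minutes and a 16-core job running for 2 hours.
small = weight_class(workload_weight(4, 600))
large = weight_class(workload_weight(16, 7200))
```

Grouping jobs in this way allows turnaround times to be compared per class, which is how a penalty on one class can be seen even when the overall average improves.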
Srinivasan et al.’s work signals a step change in thinking when it comes to mouldable scheduling. Taking out the rigidity within existing mouldable scheduling approaches by allowing the system to make allocation decisions using application performance characteristics is a major step towards autonomic workload management. While they did not emphasise it, their approach to creating synthetic logs to test mouldable schedulers is novel and very important. Other papers from around the same time have not clearly documented how they created their workloads or how those workloads correlate to real observable workloads on HPC systems.
However, some of the assumptions made lead to shortfalls in the approach. The primary one is the use of the Downey model to evaluate the scalability of workloads. The Downey model is a system-agnostic approach to determining application scalability. It does not take into account network speeds and overheads (e.g. in clusters), the location of the data (e.g. in clusters and clouds), or multi-tenancy nodes (e.g. in clouds). Even though the authors used the NAS benchmarks, the results are not used to find the true scalability across the range of allocations. Instead, they use the user-specified core allocations in the trace logs to make assumptions about both the size of the dataset and the workload information. Observation of workloads on the University of Huddersfield Queensgate Grid reveals that allocations of 2 or 4 nodes are very common and cover a large range of dataset sizes.¹ Moreover, two jobs using the same application can have execution times in the same range but different workload and dataset sizes. This is why Srinivasan et al.’s assumptions seem to introduce larger errors into their testing.
Srinivasan et al. also do not take queue wait times into account in their system workload. This oversight is compounded by the fact that the authors’ assumption of under-utilisation under an FCFS scheduling strategy rests on assertions made by Krallmann et al. in their 1999 paper entitled “On the Design and Evaluation of Job Scheduling Algorithms”. That paper does not take mouldable scheduling into account; it tests and refers only to rigid workloads, where only a single user-specified allocation can be made for any job (Krallmann, Schwiegelshohn, & Yahyapour, 1999).
In a mouldable environment the scheduling algorithm should be able to fit jobs into vacant spaces if doing so improves the turnaround time. A mouldable scheduler needs to include the queue time of a job in the “total time” for the job. Given this and the ability to mould a job, the chances of large swathes of the system sitting idle are very small. The exception would usually occur if the system assumes that the minimum allocatable node count is greater than 50% of the system. In a system with a heterogeneous user base, the introduction of backfilling will also remove the small percentages of the system that remain idle.
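The “total time” argument can be made concrete with a short sketch. It assumes the scheduler already holds an estimated queue wait and an estimated runtime for each candidate allocation; the class name, function name and figures are hypothetical and serve only to illustrate the selection rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    nodes: int
    wait_s: float  # estimated time in the queue before this allocation is free
    run_s: float   # estimated runtime at this allocation

    @property
    def total_s(self):
        # Turnaround time as seen by the user: queue wait plus execution.
        return self.wait_s + self.run_s


def best_allocation(candidates):
    """Pick the candidate with the lowest total time; ties go to fewer nodes."""
    return min(candidates, key=lambda c: (c.total_s, c.nodes))


# Hypothetical job: 8 nodes would run fastest but are not free for an hour.
options = [
    Candidate(nodes=8, wait_s=3600.0, run_s=1800.0),
    Candidate(nodes=4, wait_s=0.0, run_s=3200.0),
    Candidate(nodes=2, wait_s=0.0, run_s=6000.0),
]
choice = best_allocation(options)
```

In this example the 4-node allocation is chosen because it starts immediately, even though the 8-node allocation has the shortest runtime. The same rule also fills the vacant nodes that a rigid request would have left idle.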
¹ Section 6.3.2 shows how many classes of datasets there are for one application and one workload.
4.6 Summary
This chapter presented an outline of the various scheduling algorithms, policies and strategies. At a fundamental level, there are scheduling algorithms that can be applied to many scheduling strategies. This thesis is focused on a mouldable scheduling strategy and will take into account First Come First Served and First Come First Served with Backfilling scheduling algorithms.
As efforts to build large, complex and more autonomous schedulers continue, Artificial Intelligence (AI) techniques such as fuzzy logic, heuristics and machine learning are being adopted. These improve the overall scheduling performance of HPC systems with minimal operator input.
Thus far this thesis has outlined the key characteristics of RCS, Job Management Systems and scheduling techniques. The next chapter explores the University of Huddersfield research computing environment and analyses the typical users and workloads handled.
Chapter 5
University of Huddersfield Research Computing Infrastructure
5.1 Introduction
Since 2009 the University of Huddersfield has invested heavily in developing its in-house Research Computing Infrastructure (RCI). As the University has moved towards being more research-led, giving researchers access to Research Computing Systems (RCS) type machines became essential. Within the University environment the computer clusters and other systems had to support a wide array of applications. The roadmap for research computing at the University is set by the High Performance Computing Research Group (HPC-RG), which is a group of academics from different disciplines and members of the University’s IT services. The day-to-day management of the RCI is handled by the High Performance Computing Resource Centre (HPC-RC).
The systems that have been deployed to support research computing, along with the software stack and supported applications, are outlined in Section 5.2. The user profile and system utilisation are discussed in Section 5.3.
Figure 5.1: The Beowulf cluster Eridani: cold aisle
The need to provide Quality of Service (QoS) to our users and improve resource utilisation has led to the development of a new mouldable job scheduler.
To evaluate the mouldable scheduler and the intelligent mouldable scheduler, described in detail in Sections 8 and 9.2 respectively, a snapshot of a real workload from an existing system is used. This snapshot forms the input for the simulations in which the tool is tested and evaluated. The analysis of the Workload Sample (described in Section 5.4) evaluates resource utilisation in detail. Job arrival and completion rates are presented to illustrate the QoS inherent in the system.