3.1 Historia y cifras de Instituciones que la dictan
3.1.1 Duoc UC
3.1.1.2 Asignatura Taller de Televisión
4.3.1
Score-P
Score-P worked “out of the box” in my initial configuration using gcc and
g++ 4.8.5 with OpenMPI, which was a positive experience. Later attempts to build Score-P on other systems and with other compilers revealed its dependen- cies on Qt libraries, which are graphical libraries that are not always available on “headless” servers that are exclusivly accessed via ssh sessions. This is be- cause Score-P chooses to build its graphical interface as a standard part of its build despite the fact that it is software meant to run on systems that often not often offer a windowing system. Furthermore, while it uses an autotools based configure script, it makes non-standard use of its options. For example, the script provides an option, --without-cube, cube being the name of the graphi- cal user interface. Suprisingly, use of this option does not build Score-P “without” cube. Instead, it indicates that the user does not have any pre-existing copy of cubefor Score-P to use, in which case it builds a bundled version of cube. This type of abuse of language, while common, is still offensive. Attempts to compile Score-P with more modern versions of gcc, including a version of gcc 6 failed to compile due to problems that appeared to be related to the more strict in- terpretations of the C++ standards implemented in more recent version of the GNU compiler collection [22]. This is also not uncommon, many libraries still fail to be C++11 compliant, even though gcc 6 moved to the C++14 standard and the current development version of the GNU compiler collection is version 8. These serious portability and compatibility issues make the fact that you must build Score-P for every compiler collection that you may use a perilous task if the researcher is responsible for building and maintaining the Score-P software collection themselves.
These criticisms aside, since I was fortunate enough to have a working instal- lation for the duration of the study of the 3D ABC flow simulation, I was able to collect data and analyze it. This analysis yielded very useful information. The ability to scope the data collection using user defined regions and the filter file allowed me to cut down on the size of the profile collected, which kept the data sets for my experiments at a very reasonable size. Beyond file size considerations, any ability to reduce the amount of “noise” that the user has to sort through in order to find information of interest is highly valuable.
I did find the cube graphical interface relatively easy to use and understand and did not experience any failures of the software to function as advertised. Once the data was imported into the cube viewer, sorting options were hidden from plain view. Options to sort function calls on their metrics are discovered by right-clicking in a certain region of the display. This is not obvious, but useful and easy once found. There was no search function, but selection of a function call in the context tree view also selected in in the flat view, and the opposite case worked as well, which helped identify whereresetLevelswas being called. The break down of metrics over switches and nodes also provided good insight into load balance and the network topology of the nodes provisioned for a job.
The the text in the “Help → Getting Started” menu option of Cube-4.3.5 alludes to a feature for comparing data across experiment databases, “As per- formance tuning of parallel applications usually involves multiple experiments to
compare the effects of certain optimization strategies, Cube includes a feature de- signed to simplify cross-experiment analysis.” This feature is not named in the help text. Further investigation revealed the existence of another command line tool built when cube is built called cube cmp [23], but it is not usable for weakscaling type trials because it requires that they be run on identical “system dimensions”, a phrase used to describe the number of MPI ranks.
There is no data export option in the graphical interface of cube but there is an additional command line tool called cube dump which is advertised as being able to export to csv, gnuplot, and a binary data format understood by the statistical programming language R [23]. I never investigated this because I chose to do the simplest and fastest thing I could do to get any insight on the weakscaling of the program, which was to simply copy values from the screen onto a spread sheet.
In summary, the Score-P toolkit is a feature rich toolkit that can be used to gain significant insight into the performance of parallel codes, and exhibits many of the same portability, compatibility, and usability problems that beset most scientific software.
4.3.2
HPCToolkit
HPCToolkit, once built, can be used to run and measure any application, regardless of how it was compiled. For this reason, I only ever built HPCToolkit once, so I cannot attest to its compatibility with various tool chains. The fact that I, as a user, do not have to worry about rebuilding this tool if my envi- ronment changes is in and of itself very valuable. Additionally, all components of the HPCToolkit environment are free, which is a major consideration if the researcher wants to analyze the trace data of a program and the institution they are affiliated with does not have a license for Vampir. Even if the funds can be made available, the process of obtaining funds itself takes valuable resources, and if the researcher themselves must make this petition for funds, the process of obtaining funds stands in direct competition for the researcher’s time that may have been spent getting results. Other positive elements to using HPCToolkit includes an active and responsive mailing list that puts users in direct contact with the core developers. These core developers are also productive writers and have many easily found publications and presentations available to augment the official documentation available on the HPCToolkit website.
The multiple stages that the data must go through to produce a usable data set makes the workflow relatively complicated and fragile to script. The fact that the
hpcprof and hpcprof-mpi were intermittently unreliable produced situations in which it was unclear what state the data was in if some of the processing was done in the job submission script along with the simulation.
Given a well formed database, I found using the profile viewer hpcviewer
mostly useful for understanding the call tree. For all analysis comparing the vari- ous sized trials to one another and for generating graphs I exported the “Caller’s View” (which corresponds to the “Flat View” of Score-P) to the “csv” media type which can be imported into most spreadsheet software. Although HPCToolkit offers the ability to calculate ”derived metrics” and to open multiple databases, the derived metric creation wizard does not appear to offer the option to create
metrics that pull data from multiple databases. Another challenge is the inter- mingling of the MPI theads into the data of the main program. This makes some of the metrics HPCToolkit calculates for the user of questionable use, because in the case of the experiments I ran, two thirds of the threads never spent any time in the function calls of interest, skewing the data for some metrics on the order of 10×107. There is an ability to filter based on the threads, but this removes the columns of the metrics of interest from the view.
To summarize, HPCToolkit offers the user a good deal of information that they may be able to use to help them identify problem areas in their code. It avoids many compatibility issues by using the sampling approach as opposed to instrumentation. The sheer amount of data and the possibly non-obvious intermixing of data about different threads poses challenges to the user’s ability to understand the data collected. Proficient use of HPCToolkit requires the user to educate themselves about various hardware counters function and meaning, and to understand more about the inner workings of the MPI implementation they are using.