Xavier Teruel's publication

The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT

Germán Llort, Antonio Filgueras, Daniel Jiménez-González, Harald Servat, Xavier Teruel, Estanislao Mercadal, Carlos Álvarez, Judit Giménez, Xavier Martorell, Eduard Ayguadé and Jesús Labarta. The Secrets of the Accelerators Unveiled: Tracing Heterogeneous Executions Through OMPT. In Proceedings of IWOMP 2016. 12th International Workshop on OpenMP. (pp. 217-236). Nara, Japan.

This work extends previous efforts in the field to expose detailed information from the OpenMP and OmpSs runtimes regarding the activity and performance of task-based parallel applications.

Abstract

Heterogeneous systems are an important trend in the future of supercomputers, yet they can be hard to program and developers still lack powerful tools to gain understanding about how well their accelerated codes perform and how to improve them.

Having different types of hardware accelerators available, each with their own specific low-level APIs to program them, there is not yet a clear consensus on a standard way to retrieve information about the accelerator’s performance. To improve this scenario, OMPT is a novel performance monitoring interface that is being considered for integration into the OpenMP standard. OMPT allows analysis tools to monitor the execution of parallel OpenMP applications by providing detailed information about the activity of the runtime through a standard API. For accelerated devices, OMPT also facilitates the exchange of performance information between the runtime and the analysis tool. We implement part of the OMPT specification that refers to the use of accelerators both in the Nanos++ parallel runtime system and the Extrae tracing framework, obtaining detailed performance information about the execution of the tasks issued to the accelerated devices to later conduct insightful analysis.

Our work extends previous efforts in the field to expose detailed information from the OpenMP and OmpSs runtimes, regarding the activity and performance of task-based parallel applications. In this paper, we focus on the evaluation of FPGA devices studying the performance of two common kernels in scientific algorithms: matrix multiplication and Cholesky decomposition. Furthermore, this development is seamlessly applicable for the analysis of GPGPU accelerators and Intel Xeon Phi co-processors operating under the OmpSs programming model.

News & Events

OpenMP F2F 2018-2

A good week in Bordeaux, France, focused on closing open topics in the specification. Many of the topics discussed during the past weeks have been shaping up and are in good condition to go through the committee’s votes.

OpenMP tasking at ISC 2018

An advanced tutorial on the tasking model of the OpenMP standard. The course includes recent additions from OpenMP 4.5, and all the lecturers are members of the OpenMP language committee.

INTERTWinE F2F 2018

The last face-to-face meeting before the end of the project. We reported the progress of the different work packages and held several technical sessions in a one-and-a-half-day meeting in Stockholm, Sweden.

PATC (May, 2017)

The tutorial will motivate the audience on the need for portable, efficient programming models that put less pressure on program developers while still getting good performance for clusters and clusters with GPUs.

OpenMP F2F 2018-1

We have made very good progress towards the future OpenMP 5.0 specification. We discussed many issues and voted on several tickets that have already been included in the specification (or will be in the short term).

OmpSs demos at SC 2017

Two short demos at the exhibition center showing the basic concepts of the OmpSs programming model. See you at the BSC booth (#1975).