Polyhedral Optimization With Runtime Information
To broaden the scope of polyhedral optimization opportunities, runtime information can be exploited. A polyhedral optimizer requires affine functions defining loop bounds, memory accesses, and branching conditions. Unfortunately, this information is not always available at compile time, which is where runtime information can help. I have been deeply involved in the APOLLO project, an Automatic speculative POLyhedral Loop Optimizer. In this context, my work focused on applying memoization techniques to avoid, as much as possible, the expensive cost of polyhedral optimization.
In the CORSE team, I am now working in the same research direction. My current main research activity focuses on modeling non-affine programs in the polyhedral model based on dynamic analyses of program executions. The main idea behind this line of research is to over-approximate, with a polyhedral representation, the parts of the program that do not fit the model.
Publications: IMPACT 2017, CCPE 2017, IMPACT 2018 and PPoPP 2019
Memory Profiling on NUMA Architectures
The memory subsystem of modern multi-core architectures is becoming more and more complex with the increasing number of cores integrated in a single computer. This complexity creates profiling needs: software developers must be able to understand how their programs use the memory subsystem. Modern processors come with hardware profiling features that help build tools addressing these needs. Regarding memory profiling, many processors provide means to sample read and write memory accesses. Unfortunately, these hardware profiling mechanisms are often very complex to use and specific to each micro-architecture. During my thesis I developed numap, a library dedicated to profiling, through sampling, the memory subsystem of modern multi-core architectures. numap is portable across many Intel micro-architectures and comes with a clean application programming interface that makes it easy to build profiling tools on top of it. On top of numap, I am working on a memory profiler and runtime manager called NumaMMa. NumaMMa provides original visual representations of the memory behavior of multithreaded applications. From this information, efficient thread and memory placement policies can be computed.
Publications: SAMOS 2016 and ICPP 2018
Dataflow Programs Profiling
Dataflow programming languages are adequate for expressing many applications quite naturally and have proven to be a good approach for taking advantage of the intrinsic parallelism of modern multi-core architectures. During my PhD thesis, I proposed different mechanisms to analyze the performance of dataflow applications running on multi-core architectures. The first one consists of extensions to the so-called Static DataFlow (SDF) model of computation that make it possible to identify at runtime which parts of the application are bottlenecks.
The second mechanism proposed in my PhD work concerns the memory profiling of dataflow applications executed on Non-Uniform Memory Access (NUMA) architectures. Leveraging the memory sampling capabilities of modern hardware, I studied how low-level profiling information can be related to the high-level abstractions provided by dataflow programming languages. From the reported information, the dataflow application developer and/or the dataflow runtime developer can efficiently map the application onto the target architecture. I worked in collaboration with the SCI-STI-MM Multimedia Group at Ecole Polytechnique Fédérale de Lausanne (EPFL) to integrate the results of this work into the Turnus dataflow profiler. My profiler is now fully integrated in Turnus.
Publications: MES 2013, PDP 2015, PhD Thesis and EMSOFT-WiP 2017
Resource Allocation For Many-core Architectures
During my postdoctoral position at LIRMM, I was deeply involved in the DreamCloud FP7 European project. The main objective of DreamCloud is to enable dynamic resource allocation in many-core embedded and high-performance systems while providing appropriate guarantees on performance and energy efficiency. In this project, I proposed and developed SystemC simulators dedicated to Network-on-Chip (NoC) based distributed-memory many-core architectures. I also worked on allocation strategies leveraging artificial neural network (ANN) techniques.
In the CORSE team, I am now working in the context of heterogeneous systems including CPUs, GPUs, and FPGAs. I am particularly interested in both programming concepts that ease targeting such architectures and runtime mechanisms for efficient resource allocation.
Publications: RAPIDO 2016, DREAMCLOUD 2016, RECOSOC 2016, MCSOC 2016 and RECOSOC 2018 (Best Paper Award)