Memory Profiling on NUMA Architectures
The memory subsystem of modern multi-core architectures grows more complex as more cores are integrated into a single machine. Software developers therefore need profiling tools to understand how their programs use the memory subsystem. Modern processors provide hardware profiling features that such tools can build on, and many of them can sample read and write memory accesses. Unfortunately, these hardware mechanisms are often very complex to use and specific to each micro-architecture. During my thesis I developed numap, a library dedicated to sampling-based profiling of the memory subsystem of modern multi-core architectures. numap is portable across many Intel micro-architectures and comes with a clean application programming interface that makes it easy to build profiling tools on top of it. On top of numap, I am working on a memory profiler and runtime manager called NumaMMa. NumaMMa provides original visual representations of the memory behavior of multithreaded applications. From this information, efficient thread and memory placement policies can be computed.
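To illustrate how sampled memory accesses can feed a placement policy, here is a minimal sketch. It is not NumaMMa's algorithm and does not use the numap API: the sample format (pairs of thread identifier and NUMA node holding the accessed page) and the function name `preferred_nodes` are assumptions made for this example. It simply places each thread on the node it accesses most often.

```python
from collections import Counter, defaultdict


def preferred_nodes(samples):
    """Return, for each thread, the NUMA node it accessed most often.

    `samples` is a hypothetical iterable of (thread_id, numa_node) pairs,
    one pair per sampled memory access. Ties go to the lowest node number.
    """
    per_thread = defaultdict(Counter)
    for thread_id, node in samples:
        per_thread[thread_id][node] += 1
    return {
        thread_id: min(counts, key=lambda n: (-counts[n], n))
        for thread_id, counts in per_thread.items()
    }


if __name__ == "__main__":
    # Made-up samples: thread 0 mostly touches node 0, thread 1 mostly node 1.
    demo = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0)]
    print(preferred_nodes(demo))
```

A real policy would also have to weigh access latency, memory bandwidth contention, and the cost of migrating threads or pages, which this sketch ignores.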
Polyhedral Optimization at Runtime
Dataflow Programs Profiling
Dataflow programming languages express many applications quite naturally and have proven to be a good way to exploit the intrinsic parallelism of modern multi-core architectures. During my thesis, I proposed several mechanisms for analyzing the performance of dataflow applications running on multi-core architectures. The first one consists of extensions to the so-called Static DataFlow (SDF) computation model that make it possible to identify, at runtime, which parts of the application are bottlenecks.
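As background on the SDF model, the sketch below solves the standard balance equations of a small graph (for each edge, tokens produced per iteration equal tokens consumed) to obtain how many times each actor fires per iteration. It then ranks actors by firings multiplied by a measured firing time. This ranking is a simplification chosen for illustration, not the extension proposed in my thesis; the actor names, rates, and timings are made up.

```python
from fractions import Fraction
from math import lcm


def repetition_vector(actors, edges):
    """Smallest integer firing counts satisfying the SDF balance equations.

    `edges` holds (producer, consumer, tokens_produced, tokens_consumed).
    """
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along the edges
        changed = False
        for src, dst, produced, consumed in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * produced / consumed
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * consumed / produced
                changed = True
    if set(rate) != set(actors):
        raise ValueError("graph is not connected")
    for src, dst, produced, consumed in edges:
        if rate[src] * produced != rate[dst] * consumed:
            raise ValueError("inconsistent rates: no valid schedule")
    scale = lcm(*(r.denominator for r in rate.values()))
    return {actor: int(rate[actor] * scale) for actor in actors}


def busiest_actor(repetitions, firing_time):
    """Actor with the largest total execution time per graph iteration."""
    return max(repetitions, key=lambda a: repetitions[a] * firing_time[a])


if __name__ == "__main__":
    actors = ["src", "filter", "sink"]
    edges = [("src", "filter", 2, 3), ("filter", "sink", 1, 2)]
    reps = repetition_vector(actors, edges)
    print(reps)
    print(busiest_actor(reps, {"src": 1.0, "filter": 4.0, "sink": 5.0}))
```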
The second mechanism proposed in my PhD work concerns the memory profiling of dataflow applications executed on Non-Uniform Memory Access (NUMA) architectures. Leveraging the memory sampling capabilities of modern hardware, I studied how low-level profiling information can be related to the high-level abstractions provided by dataflow programming languages. From the reported information, the dataflow application developer and/or the dataflow runtime developer can map the application efficiently onto the target architecture. I worked in collaboration with the SCI-STI-MM Multimedia Group at Ecole Polytechnique Fédérale de Lausanne (EPFL) to integrate the results of this work into the Turnus dataflow profiler. My profiler is now fully integrated in Turnus.
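One way to relate low-level samples to dataflow abstractions is to attribute each sampled address to the communication buffer that contains it. The sketch below does this with a sorted index and binary search. It is only an illustration: the buffer table, the buffer names, and the helper functions are hypothetical and do not reflect the Turnus implementation.

```python
import bisect


def build_index(buffers):
    """Sort (start_address, size, name) records by start address.

    Buffers are assumed not to overlap.
    """
    ordered = sorted(buffers)
    starts = [start for start, _size, _name in ordered]
    return starts, ordered


def attribute(address, index):
    """Return the name of the buffer containing `address`, or None."""
    starts, ordered = index
    i = bisect.bisect_right(starts, address) - 1  # last buffer starting at or before address
    if i >= 0:
        start, size, name = ordered[i]
        if address < start + size:
            return name
    return None


if __name__ == "__main__":
    # Made-up FIFO buffers connecting three actors.
    index = build_index([
        (0x2000, 0x80, "fifo_b_to_c"),
        (0x1000, 0x100, "fifo_a_to_b"),
    ])
    for sampled in (0x1010, 0x2040, 0x3000):
        print(hex(sampled), attribute(sampled, index))
```

Counting the attributed samples per buffer and per NUMA node then gives a view of memory traffic expressed in terms of the dataflow graph rather than raw addresses.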
Resource Allocation For Many-core Architectures
During my postdoctoral position at LIRMM, I was deeply involved in the DreamCloud FP7 European project. The main objective of DreamCloud is to enable dynamic resource allocation in many-core embedded and high-performance systems while providing appropriate guarantees on performance and energy efficiency. In this project, I proposed and developed SystemC simulators dedicated to Network-on-Chip (NoC) based distributed-memory many-core architectures. I also worked on allocation strategies leveraging artificial neural network (ANN) techniques.