Research Projects

Memory Profiling on NUMA Architectures

The memory subsystem of modern multi-core architectures grows more complex as more cores are integrated into a single machine. This complexity creates a need for profiling tools that let software developers understand how programs use the memory subsystem. Modern processors provide hardware profiling features on which such tools can be built. For memory profiling in particular, many processors can sample read and write memory accesses. Unfortunately, these hardware mechanisms are often very complex to use and specific to each micro-architecture. During my thesis I developed numap, a library dedicated to profiling, through sampling, the memory subsystem of modern multi-core architectures. numap is portable across many Intel micro-architectures and comes with a clean application programming interface that makes it easy to build profiling tools on top of it. On top of numap, I am working on a memory profiler and runtime manager called NumaMMa. NumaMMa provides original visual representations of the memory behavior of multithreaded applications. From this information, efficient thread and memory placement policies can be computed.

Polyhedral Optimization at Runtime

Runtime information can broaden the scope of polyhedral optimization. A polyhedral optimizer needs loop bounds, memory accesses and branching conditions to be defined by affine functions. Unfortunately, this information is not always available at compile time, and runtime information can fill the gap. I am currently working within the APOLLO framework, an Automatic speculative POLyhedral Loop Optimizer. In this context, my work focuses on two subjects. First, I am investigating how the choices made by memory allocators affect opportunities for polyhedral optimization. I am particularly interested in how data could be reorganized in memory to enable polyhedral optimization. The second research path concerns the polyhedral optimization of dynamic languages such as JavaScript. Because such languages are increasingly used for compute-intensive applications, it is necessary to investigate how traditional compiler loop optimizations could be applied in this context.
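To make the affine requirement concrete, here is a minimal sketch of a runtime check that decides whether the addresses touched by a two-level loop nest follow an affine function a*i + b*j + c. It is an illustration only, not APOLLO's actual mechanism: the function name fit_affine_access, the address tables and the sample sizes are all hypothetical. The two inputs also hint at why allocator choices matter: rows that happen to be laid out contiguously give an affine access pattern, while scattered rows do not.

```python
from typing import Callable, Optional, Tuple


def fit_affine_access(
    access: Callable[[int, int], int], n_i: int, n_j: int
) -> Optional[Tuple[int, int, int]]:
    """Return (a, b, c) if access(i, j) == a*i + b*j + c on the whole
    n_i x n_j iteration domain, otherwise None (hypothetical helper)."""
    if n_i < 2 or n_j < 2:
        return None  # not enough points to infer both coefficients
    # Infer the candidate coefficients from three observed iterations.
    c = access(0, 0)
    a = access(1, 0) - c
    b = access(0, 1) - c
    # Verify the candidate on every iteration of the sampled domain.
    for i in range(n_i):
        for j in range(n_j):
            if access(i, j) != a * i + b * j + c:
                return None
    return (a, b, c)


# Assumed example: a 4x8 array reached through a table of row addresses.
# Here the allocator happened to place the rows back to back.
contiguous_rows = [100 + 8 * r for r in range(4)]
affine = fit_affine_access(lambda i, j: contiguous_rows[i] + j, 4, 8)

# Same program, but the rows were allocated at unrelated addresses.
scattered_rows = [300, 100, 700, 20]
non_affine = fit_affine_access(lambda i, j: scattered_rows[i] + j, 4, 8)

print(affine)      # (8, 1, 100)
print(non_affine)  # None
```

In the second case, copying the rows into one contiguous block would make the same loop nest affine again, which is the kind of data reorganization mentioned above.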

Dataflow Programs Profiling

Dataflow programming languages express many applications quite naturally and have proven to be a good approach for exploiting the intrinsic parallelism of modern multi-core architectures. During my thesis, I proposed different mechanisms for analyzing the performance of dataflow applications running on multi-core architectures. The first one consists of extensions to the so-called Static DataFlow (SDF) computation model that make it possible to identify at runtime which parts of the application are bottlenecks.
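For readers unfamiliar with SDF, the sketch below shows the property that makes the model statically analyzable: every actor produces and consumes a fixed number of tokens per firing, so the number of firings per graph iteration follows from the balance equations. This illustrates the base model only, not the extensions proposed in the thesis; the function name repetition_vector and the three-actor graph are hypothetical, and the graph is assumed to be connected.

```python
from fractions import Fraction
from math import gcd


def repetition_vector(actors, edges):
    """Solve the SDF balance equations.

    edges is a list of (src, dst, produced, consumed) tuples.
    Returns {actor: firings per iteration}, or None if the graph is
    disconnected or its rates are inconsistent.
    """
    # Fix the first actor's rate to 1 and propagate along the edges.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        return None  # some actor is not connected to the first one
    # Every edge must balance: tokens produced == tokens consumed.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            return None
    # Scale the fractional rates to integers.
    scale = 1
    for r in rate.values():
        scale = scale * r.denominator // gcd(scale, r.denominator)
    return {a: int(rate[a] * scale) for a in actors}


# Assumed pipeline: A produces 2 tokens, B consumes 3 and produces 1,
# C consumes 2.
firings = repetition_vector(
    ["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]
)
print(firings)  # {'A': 3, 'B': 2, 'C': 1}
```

Comparing such statically known firing counts with what is observed at runtime is one plausible starting point for locating bottlenecks.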

The second mechanism proposed in my PhD work concerns the memory profiling of dataflow applications executed on Non-Uniform Memory Access (NUMA) architectures. Leveraging the memory sampling capabilities of modern hardware, I studied how low-level profiling information can be related to the high-level abstractions provided by dataflow programming languages. From the reported information, the dataflow application developer and/or the dataflow runtime developer can map the application efficiently onto the target architecture. I worked in collaboration with the SCI-STI-MM Multimedia Group at Ecole Polytechnique Fédérale de Lausanne (EPFL) to integrate the results of this work into the Turnus dataflow profiler. My profiler is now fully integrated into Turnus.

Resource Allocation For Many-core Architectures

During my postdoctoral position at LIRMM, I was deeply involved in the DreamCloud FP7 European project. The main objective of DreamCloud is to enable dynamic resource allocation in many-core embedded and high-performance systems while providing appropriate guarantees on performance and energy efficiency. In this project, I proposed and developed SystemC simulators dedicated to Network-on-Chip (NoC) based distributed-memory many-core architectures. I also worked on allocation strategies leveraging artificial neural network (ANN) techniques.