Speaker
Jimmy Aguilar Mena
(MHPC - ICTP)
Description
This thesis presents two experiences of hybrid programming applied to condensed matter and high energy physics. The two projects differ in various aspects, but both aim to analyse the benefits of using accelerated hardware to speed up calculations in current scientific research scenarios.
The first project enables massive parallelism in a simulation of the Anderson localisation phenomenon in a disordered quantum system. The code represents a Hamiltonian in momentum space, diagonalizes the corresponding matrix using linear algebra libraries, and finally analyses the energy-level spacing statistics averaged over several realisations of the disorder.
The implementation combines different parallelization approaches in a hybrid scheme. The averaging over the ensemble of disorder realisations exploits massive parallelism with a master-slave configuration based on both multi-threading and the Message Passing Interface (MPI). This framework is designed and implemented to interface easily with similar applications commonly adopted in scientific research, for example Monte Carlo simulations. The diagonalization uses multi-core and GPU hardware through the MAGMA, PLASMA or MKL libraries. Access to the libraries is modular, to guarantee portability, maintainability and extensibility in the near future.
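The ensemble-averaging pattern described above can be illustrated with a minimal sketch. It is not the thesis code: a thread pool stands in for the MPI plus multi-threading layer, uncorrelated random levels stand in for the eigenvalues returned by the diagonalization, and the spacing ratio is just one possible level-spacing statistic chosen for illustration. The names `one_realisation` and `ensemble_average` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def one_realisation(seed, n_levels=200):
    """Worker task: mean spacing ratio for one toy disorder realisation."""
    rng = random.Random(seed)
    # Assumption: random uncorrelated levels replace the real eigenvalues
    # that a diagonalization routine would return.
    levels = sorted(rng.uniform(-1.0, 1.0) for _ in range(n_levels))
    spacings = [b - a for a, b in zip(levels, levels[1:])]
    # Ratio of consecutive spacings, always in [0, 1].
    ratios = [
        min(s, t) / max(s, t)
        for s, t in zip(spacings, spacings[1:])
        if max(s, t) > 0.0
    ]
    return sum(ratios) / len(ratios)


def ensemble_average(seeds, workers=4):
    """Master: hand one seed to each worker task and average the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the seeds, so the average is
        # reproducible regardless of the number of workers.
        results = list(pool.map(one_realisation, seeds))
    return sum(results) / len(results)


if __name__ == "__main__":
    print(ensemble_average(range(20)))
```

Because each realisation depends only on its seed, the tasks are independent and the same structure carries over to any scheme where a master distributes seeds and collects one number per realisation.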
The second project is the development of a Kalman Filter, including its porting to GPU architectures and autovectorization, for online LHCb triggers. The developed codes provide information about the viability and advantages of applying GPU technologies in the first triggering step of the Large Hadron Collider beauty experiment (LHCb).
The optimisations introduced in both the CPU and GPU codes delivered a significant speedup of the Kalman Filter. The two GPU versions, in CUDA and OpenCL, perform similarly and are suitable to be considered for the upgrade and for the corresponding implementations in the Gaudi framework.
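For readers unfamiliar with the technique, the predict and update steps of a Kalman filter can be sketched in a few lines. This is a scalar, constant-state toy model, not the track-fitting filter developed in the thesis; the function name `kalman_1d` and all its parameters are hypothetical.

```python
def kalman_1d(measurements, meas_var, process_var, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state model (illustrative only).

    x is the state estimate and p its variance; both are assumptions of
    this toy model rather than quantities taken from the LHCb code.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, so only the
        # uncertainty grows, by the process noise variance.
        p = p + process_var
        # Update: weight the measurement residual by the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates, p


if __name__ == "__main__":
    est, var = kalman_1d([5.0] * 50, meas_var=1.0, process_var=0.0)
    print(est[-1], var)
```

Each iteration depends only on the previous state, but separate filters (for example, one per track) are independent of each other, which is the kind of structure that vectorization and GPU execution can exploit.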
In both projects we implement optimisation techniques in the CPU code. This report presents extensive benchmark analyses of the correctness and performance of both projects.