Feb 24 – 26, 2016
SISSA, International School for Advanced Studies
Europe/Rome timezone

MHPC Thesis: Hybrid Parallelisation Strategies for Boundary Element Methods

Feb 25, 2016, 4:25 PM
20m
Aula Magna a Paolo Budinich (SISSA, International School for Advanced Studies)

Via Bonomea 265, 34136 Trieste, Italy

Speaker

Mr Nicola Giuliani (MHPC - SISSA)

Description

Whenever a mathematical problem admits a boundary integral representation, it can be straightforwardly discretised by Boundary Element Methods (BEM). In this work, we present an efficient hybrid parallel solver for fluid-structure interaction (FSI) problems based on collocation BEM. The major bottlenecks of a serial BEM implementation are the computational cost and the memory required, respectively, to assemble and to store the full BEM matrices: both the assembly CPU time and the memory storage scale with the square of the number of degrees of freedom, N. We present two different strategies to parallelise BEM implementations. The first uses an MPI strategy, in which we distribute both the assembly workload and the storage requirements among different processors, maintaining the classical BEM structure (and algorithmic complexity). This approach leads to optimal strong and weak scalability for the matrix assembly cycles and the matrix-vector multiplication, although the overall algorithm remains of order O(N^2). In the second strategy, we employ a Fast Multipole Method (FMM) to reduce the computational cost and memory footprint of the BEM solution to O(N), and we use a hybrid MPI and multi-threaded parallelisation strategy. This implementation combines direct BEM close-range interactions with FMM long-range couplings, and represents the state of the art in parallel BEM solvers. The BEM-FMM algorithm calls for a hybrid solution, since it inherently requires a lot of communication among different processors. We address the main parallelisation techniques to be used in a hybrid parallel BEM-FMM implementation, for which we used Intel Threading Building Blocks (TBB) to handle multicore platforms and MPI for the communication between different processors. We present strong and weak scalability results, together with an optimality result on how to properly set the hierarchical FMM space subdivision.
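
The first (MPI) strategy can be illustrated with a minimal sketch. It is not the solver presented in the talk. It assumes a block-row distribution of the dense matrix, uses the 3D Laplace kernel 1/(4 pi |x - y|) as a stand-in for the actual BEM integrals, sets the singular diagonal entries to zero as a placeholder, and replaces the MPI processes with a plain loop over ranks. All function names are hypothetical. Each rank assembles and stores only about N/P of the N rows, so its cost and memory are roughly N^2/P, while the total work remains O(N^2).

```python
import math


def row_range(n_rows, n_ranks, rank):
    """Contiguous block of rows owned by `rank`; block sizes differ by at most one."""
    base, extra = divmod(n_rows, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def kernel(x, y):
    """3D Laplace kernel 1 / (4 pi |x - y|); an assumed stand-in for the BEM integrals."""
    return 1.0 / (4.0 * math.pi * math.dist(x, y))


def assemble_rows(points, start, stop):
    """Assemble rows [start, stop) of the dense N x N matrix.

    The diagonal is set to zero as a placeholder: a real collocation BEM
    would evaluate a singular integral there.
    """
    n = len(points)
    return [
        [0.0 if i == j else kernel(points[i], points[j]) for j in range(n)]
        for i in range(start, stop)
    ]


def local_matvec(rows, x):
    """Multiply the locally owned rows by the full vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in rows]


def distributed_matvec(points, x, n_ranks):
    """Simulate the row-distributed product y = A x.

    Each loop iteration stands in for one MPI process; concatenating the
    local results plays the role of gathering y across processes.
    """
    y = []
    for rank in range(n_ranks):
        start, stop = row_range(len(points), n_ranks, rank)
        rows = assemble_rows(points, start, stop)  # about N^2 / n_ranks entries
        y.extend(local_matvec(rows, x))
    return y
```

Calling `distributed_matvec(points, x, 4)` on a list of distinct 3D points gives the same vector as `distributed_matvec(points, x, 1)`, since the rows are only partitioned and never recomputed differently. In an actual MPI code every process would need the full vector `x` and would hold only its own block of rows.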
