Parallelisation Techniques for the Dual Reciprocity and Time-Dependent Boundary Element Method Algorithms

Tim Bashford, Kelvin Donne, Arnaud Marotin, Ala Al-Hussany

Faculty of Architecture, Computing and Engineering, University of Wales Trinity Saint David, Swansea, Wales, United Kingdom

Pages: 395-403
DOI: https://doi.org/10.2495/CMEM-V5-N3-395-403

OPEN ACCESS

Abstract: 

The Dual Reciprocity BEM (DRBEM) and the Time-Dependent BEM (TDBEM) are considered in the context of radiative and time-dependent thermal transport, respectively. To achieve sensible solution times for realistic 3D problems with large meshes, a range of optimisation techniques is considered and several parallelisation techniques are applied: shared memory using multi-core threading, Graphics Processing Unit (GPU) acceleration using CUDA, and distributed memory on a high-performance cluster using MPI. Particular consideration is given to practical methods for inverting large dense matrices.

Keywords: 

BEM, CUDA, MPI, threading
