A Hybrid Parallelization Strategy of a CFD Code for Turbomachinery Applications / M. Giovannini; M. Marconcini; A. Arnone; A. Dominguez. - ELECTRONIC. - (2015), pp. 1-13. (Paper presented at the 11th European Conference on Turbomachinery Fluid Dynamics and Thermodynamics, ETC 2015, held in Madrid, Spain, March 23-26, 2015).

A Hybrid Parallelization Strategy of a CFD Code for Turbomachinery Applications

GIOVANNINI, MATTEO; MARCONCINI, MICHELE; ARNONE, ANDREA
2015

Abstract

This paper presents the serial optimization and the parallelization of the TRAF code, a 3D multi-row, multi-block CFD solver for the RANS/URANS equations. The serial optimization was carried out through a critical review of the most time-consuming routines, in order to exploit the vectorization capability of modern CPUs while preserving code accuracy. The code was parallelized for both distributed and shared memory systems, following the current trend in computing clusters. Performance was assessed on several architectures, ranging from simple multi-core PCs to a small slow-network cluster and high performance computing (HPC) clusters (the CINECA PLX cluster and the Intel Grizzly Silver cluster). Code performance is presented and discussed for pure MPI, pure OpenMP, and hybrid OpenMP-MPI parallelism, considering typical turbomachinery applications: a steady-state multi-row compressor analysis and an unsteady computation of a low pressure turbine (LPT) module. Notably, the paper can provide code developers with relevant guidelines for selecting a parallelization strategy without requiring a specific background in parallelization or HPC.
2015
Conference Proceedings
11th European Conference on Turbomachinery Fluid Dynamics and Thermodynamics, ETC 2015
Madrid, Spain
March 23-26, 2015
Goal 7: Affordable and clean energy
M. Giovannini; M. Marconcini; A. Arnone; A. Dominguez

Documents in FLORE are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this resource: https://hdl.handle.net/2158/958790
Citations
  • PMC: N/A
  • Scopus: 25
  • ISI (Web of Science): 27