Skip to main content



​​​​​​​​​​​Please be advised this schedule is tentative and may be subject to change. Please check our website regularly to see the updates!​​
  • MondayMay 22
  • TuesdayMay 23
  • WednesdayMay 24
8:45 AM

Welcoming Remarks

9:00 AM

Design of Numerical Algorithms for Partial Differential Equations on Next-Generation Computer Architectures

The end of Dennard scaling, resulting in the use of low-power processor technologies in HPC, is leading to the deployment of large-scale parallelism on a single compute node to obtain high performance. By 2024, the cost of flops on HPC-capable systems, in terms of dollars and power, is expected to decrease by a factor of 100, relative to what it was in 2014. However, the performance of memory systems for these processors will not keep pace, due to power constraints. The ratio of the peak aggregate flop rate to the bandwidth between DRAM and the floating point units, as well as ratio of flop rate to overall size of DRAM, will be significantly less than what we have been used to for the last 20 years. Traditional numerical methods for partial differential equations have low arithmetic intensity (AI). Such methods can only obtain a small fraction of peak performance on such systems: performance is limited by rate at which data can get to the floating point units. In this talk, we will give several examples of how these new architectures lead to new tradeoffs in the design of numerical algorithms for PDE, trading flops for bytes in a way that reduces the overall data traffic and memory footprint required to obtain a given level of accuracy in a calculation. We will focus our attention on structured-grid and particle methods; in those cases, the most obvious example of this approach is the use of methods that are higher-order accurate than the ones customarily used (mostly second order) in multi-physics calculations. High-order methods can, in principle, obtain a given level of accuracy for fewer degrees of freedom, and the arithmetic intensity increases with the order of accuracy of the method. The design of new discretization methods is central to the success of this endeavor. However, there are other issues that need to be addressed in order for this approach to be successful. The design of high-order accurate methods that will retain their apparent advantages for realistic applications is nontrivial, and significant effort is required to realize the theoretically predicted performance by careful implementation.

 Phillip Colella
Professor of EECS, UC Berkeley
10:30 AM

Large-Eddy Simulations of Wall Bounded Turbulent Flows

We report recent progress on wall-resolved and wall-modeled large-eddy simulation (LES) of wall-bounded turbulent flows, including model development and applications. The sub-grid-scale (SGS) model employed is the stretched spiral vortex SGS model, originally proposed by Misra & Pullin (Phys. Fluids 1997). The present wall-model is an extension of the original one proposed by Pullin & co-workers for LES of high Reynolds number flows in a channel, turbulent boundary layer, and a pipe. Our extension to the wall model includes three modifications: relaxation of the log-law assumption, extension into two-dimensional wall model and reformulation in generalized curvilinear coordinates. Five flow configurations are examined in detail. The first case is that of a zero pressure gradient turbulent boundary layer in which our objective is to address whether the mean velocity profile obeys a log-law or a power-law. The second case is that of a separated and re-attached turbulent boundary layer; where the two-dimensional wall model is shown to reasonably predict the wall stress in the separated flow. The third case is flow over period hill, a benchmark case for LES modeling. The fourth and perhaps more challenging case is that of a flow over a circular cylinder with emphasis on reproduction of the drag crisis at Re~O(105) and quantification of skin friction on the surface. The final case is that of flows over airfoils, which is a prerequisite step for applying the present curvilinear wall model on cases with even more complex geometry. In all these, our emphasis is on validation and quantitative comparisons with experimental data when available. Acknowledgement: The work was supported by the KAUST office of Competitive Research Funds under Award No. URF/1/1394-01. The Shaheen-Cray XC40 at KAUST was utilized for all the simulations. Collaborators: Dale Pullin, Wan Cheng, Wei Zhang and Wei Gao

 Ravi Samtaney
Professor of Mechanical Engineering, KAUST
11:05 AM

Project Simulations Based on a Lattice-Boltzmann Method

The talk will present simulations based on a Lattice-Boltzmann Method, which allows efficient handling of large-scale and complex geometries. Since the turbulent flow needs to be spatially and temporally resolved, the simulations are usually carried out on a large number of computational cores for adequate turnaround times. Methodology and HPC requirements will be explained and examples range from aerospace to oil & gas applications.

 Benjamin Duda
Senior Aerospace Engineer, Exa Corporation
11:35 AM


 ​Matteo Parsani
Assistant Professor of AMCS, KAUST
12:00 PM

End of Morning Session

2:00 PM

CFD @ GE ... Usage & Trends

CFD is a key technology enabler for obtaining improved performance for land-based and airborne turbomachinery engines at GE. We will present the state-of-the-art ways GE utilizes CFD in the design process for turbomachinery blade rows and discuss the complementary strategies related to high performance computing. We will discuss the interplay between RANS and LES in the engineering process. We will close with a discussion of future challenges and highlight ways academic research can impact industrial CFD.

 Brian E. Mitchell
Senior Principal Engineer, GE GLOBAL Research
3:30 PM

HPC for Multiphase Separation Flows

Saudi Aramco R&DC scientists are working on enhancing the performance of multiphase gravity separator vessels (GSV), utilizing CFD to better design the internals of these separator vessels, and enhance their separation efficiency. The multiphase CFD modeling of such an application is computationally demanding, as it needs to address the many involved phenomena, in addition to the relatively large domain sizes. Typical Saudi Aramco vessels are 40 m in length and 4 m in diameter. Typical complex features of the multiphase separation process are the poly-dispersed nature of the oil emulsion, complex emulsion rheology, transient nature of the process, and the need to resolve turbulence structures that affect droplets coalescence and breakage, etc. These competing phenomena put a high demand on computational resources. Utilization of enough parallel processes of an HPC system, such as KAUST’s supercomputer Shaheen II, should expedite the turnover rate of such simulations, which will enable Saudi Aramco engineers to explore more design options, and forecast performance of the multiphase separators under foreseen conditions. In collaboration with the KAUST Supercomputing Lab, Saudi Aramco R&DC scientists are investigating the acceleration of the turnover rate of these particular simulations when using a parallel multiphase CFD code, such as ANSYS Fluent, to predict the complex flow inside Saudi Aramco GSVs, and running the simulations on a massively parallel HPC resource such as Shaheen II. The results of the scalability study performed on Shaheen II will be presented in this talk. The results show excellent scalability until 16k cores. The results of arguably the largest run of FLUENT ever, on any supercomputer, 200k cores, will be presented as well, conditional on the results being available on the date of the presentation.

 Ehab Elsaadawy
Computational Modeling Specialist, Saudi ARAMCO
4:05 PM

Towards Exascale Simulations of Turbulent Combustion with Complex Chemistry

Advances in massively parallel computing hardware enabled high fidelity simulations of turbulent combustion at unprecedented physical scales with chemical complexities. The presentation will start with an overview of current status in the KARFS (KAUST Adaptive Reacting Flows Solver) project, a new direct numerical simulation (DNS) code for reacting flows for hybrid CPU+X architecture. The new code has been applied to a number of physical studies, such as turbulent premixed flame at high Reynolds/Karlovitz numbers, bluff-body flame stabilization, and auto-ignition of premixed reactants in the presence of temperature fluctuations. Key challenges and scientific findings from these studies will be discussed.

 Hong Im
Professor of Mechanical Engineering, KAUST
4:40 PM

A Fully-Coupled Model for the DNS of Particulate Flow

In many applications the simulated fluid incorporates a rather small amount of particles and each single particle needs to be resolved. In biological systems the high viscosity of the surrounding cytosol plays a dominant role since the hydrodynamic interaction between the fluid and the particle serves as dedicated communication pathway within the overall system. So called direct numerical simulation is necessary. We derived a fully coupled and stable method for an appropriate formulation of the interacting forces. Many methods introduce external forces to describe the interaction between fluid and particles. Such an approach typically leads to a saddle point problem. We utilize a finite volume discretization of the Navier-Stokes equations. Its theoretical treatment as a so-called Petrov-Galerkin method enables to develop a fully coupled model, naturally incorporating the particles into the equations. By rigorous extension of the solution space and also the test space of the bilinear operator the internal forces can be formulated in a rather natural way on the basis of the underlying model equations. Our approach leads to an implicit representation of the according forces. The main benefit is the avoidance of a saddle point formulation. Consequently, the solving procedure is conducted in one step and the coupling of the interaction forces can be preserved. Furthermore, in contrast to other approaches, no stabilizing terms need to be introduced to achieve stability: By exploiting the comparability to a finite element method our method can even be proved to be stable. In addition, the immersed boundary is represented as sharp interface in the discrete equations. Second order convergence in space was reached. Finally, the resulting discrete scheme is consistent with the original, unrestricted equations without particles.

 Susanne Höllbacher
Research Associate, Johann-Wolfgang-Goethe University
5:10 PM

End of Afternoon Session

8:30 AM

Treatment Planning and the Movement of Circulating Tumor Cells in the Bloodstream

10:00 AM

Sustained Petascale Production-Mode DNS of Secondary Flows with a Very High-Order IMEX Solver on Three of the Most Powerful Supercomputers in the World

Corners in rectangular ducts induce flows in the cross-stream plane, so-called secondary flows that enhance the cross-sectional mixing. These secondary flows are turbulence driven. It is an open question as to whether Reynolds numbers greater than 30,000 strengthen or dampen the secondary flows, and how flow properties are affected. We demonstrate a tool capable of analyzing these questions by performing highly efficient extreme scale direct numerical simulations of secondary flows using very high-order spectral element methods. Single-node efficiency is achieved by auto-generated assembly implementations of small matrix multiplies and key vector-vector operations, aggressive loop merging, and selective single precision evaluations. Scalability is maintained by bounded nearest neighbor communication and global coarse solvers based on algebraic multigrid. Using 294,912 cores (9216 nodes) on Trinity XC40, we demonstrate greater than 50% of peak memory bandwidth sustained over the entire solver (500 TB/s in aggregate), sustained petascale performance, and a 30% reduction in power consumption compared to native Fortran on Shaheen XC40. Similar performance is achieved on the emerging Xeon Phi architecture. On 3072 nodes of Cori we achieved 378 TFLOP/s with an aggregated bandwidth of 310 TB/s, corresponding to 2.11x faster time-to-solution compared to the same number of Haswell nodes.

 ​Matteo Parsani
Assistant Professor of AMCS, KAUST
10:35 AM

Towards Greener Aviation with Python at Petascale

Accurate simulation of unsteady turbulent flow is critical for improved design of greener aircraft that are quieter and more fuel-efficient. We demonstrate application of PyFR, a Python based computational fluid dynamics solver, to petascale simulation of such flow problems. Rationale behind algorithmic choices, which offer increased levels of accuracy and enable sustained computation at up to 58% of peak DP-FLOP/s on unstructured grids, will be presented in the context of modern hardware. A range of software innovations will also be detailed, including use of runtime code generation via a bespoke domain specific language, which enables PyFR to efficiently target multiple platforms, including heterogeneous systems, via a single implementation. Finally, we will show results from a full-scale simulation of flow over a low-pressure turbine blade cascade, along with weak/strong scaling statistics from the Piz Daint and Titan supercomputers, and performance data demonstrating sustained computation at up to 13.7 DP-PFLOP/s.

 Freddie David Witherden
Postdoctoral Fellow, Stanford University
11:10 AM

Shared Memory Parallelization of the Flux Kernel of PETSc-FUN3D

Shared memory parallelization of the flux kernel of PETSc-FUN3D, an unstructured tetrahedral mesh Euler code previously characterized for distributed memory SPMD for thousands of nodes, is hybridized with shared memory SIMD for hundreds of threads per node. We explore thread-level performance optimizations on state-of-the-art multi- and many-core Intel processors, including second generation Intel Xeon Phi (Knights Landing). While linear algebraic kernels are bottlenecked by memory bandwidth for even modest numbers of cores sharing a common memory, the flux kernel, which arises in the control volume discretization of the conservation law residuals and in the formation of the preconditioner for the Jacobian, is compute-intensive and effectively exploits contemporary multi-core hardware. We study its performance on the Xeon Phi in three thread affinity modes, namely scatter, compact, and balanced, with different configurations of memory and cluster modes on Knights Landing, with various code optimizations to improve alignment and reduce cache coherency penalties. The optimizations employed to reduce the data motion and cache coherency protocol penalties are expected to be of value other unstructured applications as many-core architecture evolves.

11:45 AM

A Landau Collision Integral Solver with Adaptivety on Emerging Architectures

The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library ( We present algorithms and techniques that efficiently utilize emerging architectures by minimizing memory movement and being amenable to vector processing. We present performance results on the second generation Intel Xeon Phi, Knights Landing, processor.

 Mark Adams
Research Scientist, LBNL/Columbia University
12:20 PM

End of Morning Session

12:30 PM

Poster Competition

Sponsored by the KAUST Industry Collaboration Program (KICP)

12:30 PM


Sponsored by the KAUST Industry Collaboration Program (KICP)

2:20 PM

Bound-Preserving High Order Schemes for Convection-Dominated Problems

We give a survey of our recent work with collaborators on the construction of uniformly high order accurate discontinuous Galerkin (DG) and weighted essentially non-oscillatory (WENO) finite volume (FV) and finite difference (FD) schemes which satisfy a strict maximum principle for nonlinear scalar conservation laws, passive convection in incompressible flows, and nonlinear scalar convection-diffusion equations, and preserve positivity for density and pressure for compressible Euler systems. A general framework (for arbitrary order of accuracy) is established to construct a simple scaling limiter for the DG or FV method involving only evaluations of the polynomial solution at certain quadrature points. The bound preserving property is guaranteed for the first order Euler forward time discretization or strong stability preserving (SSP) high order time discretizations under suitable CFL condition. One remarkable property of this approach is that it is straightforward to extend the method to two and higher dimensions on arbitrary triangulations. We will emphasize recent developments including including high order bound-preserving schemes for relativistic hydrodynamics, high order discontinuous Galerkin Lagrangian schemes, and high order discontinuous Galerkin methods for radiative transfer equations. Numerical tests demonstrating the good performance of the scheme will be reported.

 Chi-Wang Shu
Professor of Mathematics, Brown University
3:45 PM

Recent Progress in Space-Time Conservation Element and Solution Element Method

Space-time conservation element and solution element (CE/SE) method is a unique finite-volume-type method for solving hyperbolic conservation laws. This approach has several attractive properties, including: (i) unified treatment of the space and time derivatives such that only one step is required for high-accuracy time marching; (ii) a compact stencil regardless of the order of the accuracy; (iii) easiness of extension to any arbitrary shape of polygonal elements; and (iv) straightforward and efficient implementation of non-reflective boundary condition. Since its inception, the CE/SE method has been successfully used in the computation of compressible flows. This presentation will introduce some recent progress in this method, including: (i) upwind CE/SE schemes and (ii) bound-preserving CE/SE schemes. Applications to hypersonic flows, detonation of gaseous mixtures, and multi-fluids flows will also be discussed.

 Hua Shen
Postdoctoral Fellow, KAUST
4:20 PM

Hierarchical Boltzmann Simulations and M(odel)-Refinement

Gas flow in thermal non-equilibrium can be described by kinetic theory and Boltzmann equation at the expense of strongly increased computational costs. A hierarchical simulation approach for Boltzman›s equation should provide a single numerical framework in which a coarse representation can be used to compute gas flows as accurate and efficient as in computational fluid dynamics (CFD), but a subsequent refinement allows to successively improve the result to the complete Boltzmann result. The hierarchical nature of the method can then be used to provide efficient model error estimates to increase the predictivity of a CFD computation. This talk will give a proof-of-concept approach for such a framework for the case of the steady linearized Boltzmann equation using Hermite discretization, that is, moment equations and an implicit discontinuous Galerkin formulation.

 Manuel Torrilhon
Professor of Mathematics, RWTH Aachen University
4:50 PM

Panel discussion

5:35 PM

End of Afternoon Session

8:30 AM

Quantifying Inflow and Turbulence Model Form Uncertainties in (RANS) Simulations of the Flow and Dispersion of a Passive Tracer in Downtown Oklahoma City

10:00 AM

Geophysical Flow Modeling on Adaptive Quadtree Meshes

We demonstrate our success in incorporating GeoClaw (LeVeque, George, Berger, Mandli), a widely used code for simulation of tsunamis, debris flow, flooding, storm surges and so on, into ForestClaw, an adaptive quadtree code based on the highly scalable library p4est (C. Burstedde). This new adaptive mesh framework uses the patch-based algorithms first developed by Berger, Colella, LeVeque and others and extends them to a non-overlapping hierarchy of grids occupying the leaves of a quadtree mesh. This new framework brings parallel compute power to the GeoClaw code (previously parallelized using OpenMP) and allows us to run GeoClaw simulations on large scale parallel computing environments. We show results from simulations of the 1976 Teton Dam failure in eastern Idaho, and show that we can achieve excellent agreement with historical data such as arrival times and flooding boundaries. If time permits, we will also discuss our experience in incorporating legacy Cartesian single grid/serial codes into modern adaptive frameworks. In particular, we discuss some of the pitfalls in porting codes which rely heavily on modern Fortran (F90) constructs such as modules. In this part of the talk, we will discuss our recent work porting Ash3d, a serial, single grid code developed by the USGS Cascade Volcanic Observatory (Vancouver, WA) for simulating volcanic ash dispersion, to ForestClaw.

 Donna Calhoun
Associate Professor of Mathematics, Boise State University
10:35 AM

An Introduction to Methods with the Summation by Parts Property with a Small Sampling of Recent Developments

Classical finite-difference operators having the summation-by-parts (SBP) property were originally constructed with the goal of mimicking finite-element linear stability proofs. The crucial feature of such operators is that their constituent matrices are high-order approximations to integral bilinear forms and therefore lead to high-order approximations to integration by parts. The combination of SBP operators with appropriate weak imposition of boundary and inter-element coupling results in an SBP framework that allows for a one-to-one correspondence between discrete and continuous stability proofs and thereby naturally guides the construction of robust algorithms. Since their introduction in the early 70s, much of the concentration in the SBP community has been on developing rigorous means of imposing inter-element coupling and boundary conditions for a variety of PDEs including the Navier-Stokes equations. Recently, there have been a number of exciting developments in the SBP community. The SBP framework has been extended to include nodal discontinuous and continuous Galerkin methods, the flux-reconstruction method, and has been shown to have a subcell finite-volume interpretation. SBP operators can now be constructed on non-tensor product nodal distributions consequently intruding the ability to construct SBP schemes on unstructured meshes. Nonlinearly robust schemes can be constructed by enforcing discrete entropy stability and dual consistent schemes have been developed that lead to superconvergent functional estimates, etc. The appealing feature of the SBP concept is that it provides a rigorous mathematical framework within which to construct flexible and robust numerical methods with advantageous properties. In this talk I will give a brief introduction to the SBP concept and highlight a number of recent developments.

11:10 AM

Kinetic Energy Balanced DG Schemes Based on SBP Operators on Gauss-Legendre Nodes

Regarding high order discretizations of fluid equations, recent approaches in the simulation of compressible turbulent flow are based on so-called split forms of conservation laws instead of the divergence form in order to guarantee the preservation of secondary physical quantities such as kinetic energy. Conservation of the primary quantities is then achieved by space discretizations with the so-called summation-by-parts (SBP) property. It has been shown that nodal DG schemes based on Gauss-Lobatto nodes and a lumped mass matrix posses this property. Within the framework of a generalized summation-by-parts property, a kinetic energy preserving(KEP)-DG scheme may also be constructed on Gauss-Legendre nodes. This variant is potentially more accurate and may be also more efficient than its Gauss-Lobatto counterpart for certain applications. In comparison to standard DG schemes, a better representation of the kinetic energy spectrum can been shown for homogeneous isotropic turbulence. Furthermore, we may extend the KEP-DG scheme on Gauss-Legendre nodes to moving fluid grids resulting from an ALE formulation of the fluid equations and thus to problems involving mechanical fluid structure interaction.

 Sigrun Ortleb
Research Group Leader, University of Kassel
11:40 AM


 ​Matteo Parsani
Assistant Professor of AMCS, KAUST
12:00 PM

End of Morning Session

1:30 PM

Afternoon Musicale

2:30 PM

Predictive CFD, Past, Present, and Future

*talk will be given by Dr. Freddie Witherden The talk will review the evolution of CFD over the last four decades, including the transition to progressively higher fidelity models and progressively more complex geometry. Industrial applications of CFD will be discussed with regard both to its predictive capability and human and computational costs. There is now a need to move beyond the current plateau of Reynolds averaged Navier Stobes (RANS) simulations towards large eddy simulations (LES). The final part of the talk will present some current developments of high order methods using unstructured meshes which offer a route to achieving this goal.

 Antony Jameson
Professor of Aeronautics and Astronautics, Stanford University
 Freddie David Witherden
Postdoctoral Fellow, Stanford University
4:00 PM

The SpectralDNS Project

In this talk I will present the spectralDNS project ( In spectralDNS, we implement Navier-Stokes solvers for Direct Numerical Simulations of turbulent flows. The solvers are implemented in the high-level Python language, and this is done without compromising significantly on speed. I will show a Navier-Stokes solver that is merely a script at 100 lines, but that still scales for thousands of processors at the KAUST Supercomputing Laboratory. The talk will address challenges and advantages with using high-level languages for high performance computing.

 Mikael Mortensen
Associate Professor of Mathematics, University of Oslo
4:35 PM

Algorithmic Adaptations to Extreme Scale

We analyze the contributions of the PCCFD workshop talks to the agenda of exascale algorithms, as articulated in, for instance, the G-8 country exascale software roadmap. Instead of squeezing out flops – the traditional goal of algorithmic optimality, which once served as a reasonable proxy for all associated costs – algorithms must now squeeze synchronizations, memory, and data transfers, while extra flops on locally cached data represent only small costs in time and energy. After decades of programming model stability with bulk synchronous processing, new programming models and new algorithmic capabilities (to make forays into, e.g., data assimilation, inverse problems, and uncertainty quantification) must be co-designed with the hardware. We briefly recap the architectural constraints and application opportunities. We then mention two types of tasks, each of occupies a large portion of all scientific computing cycles: large dense symmetric/Hermitian systems (covariances, Hamiltonians, Hessians, Schur complements) and large sparse Poisson/Helmholtz systems (solids, fluids, electromagnetism, radiation diffusion, gravitation). We examine progress in porting solvers for these tasks to the hybrid distributed-shared programming environment, including the GPU and the MIC architectures that make up the cores of the top scientific systems “on the floor” and “on the books.”

 David E. Keyes
Professor of AMCS, KAUST
5:05 PM

End of Afternoon Session

​​​​​​​​​​​​​​​​​Please be advised this schedule is tentative and may be subject to change. Please check our website regularly to see the updates!​​