GPU Technology Conference

March 24-27, 2014 | San Jose, California

GPU Technology Conference Schedule Planner


HANDS-ON LAB


S4791 - Hands-on Lab: Building a Sparse Linear Solver using CUDA Libraries

Sharan Chetlur ( CUDA Software Engineer, NVIDIA )

In this hands-on session, we will construct a sparse iterative solver using CUDA library routines. We will use the standard CUBLAS and CUSPARSE libraries to construct a simple yet performant solver without writing any custom CUDA kernels. We will walk through an example of how to set up and use various CUBLAS and CUSPARSE APIs to implement the SSOR (Symmetric Successive Over-Relaxation) algorithm. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
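
The abstract itself contains no code; as a hedged illustration of the kind of CUBLAS/CUSPARSE building blocks such a solver is assembled from (written against the current generic cuSPARSE API, whose names differ from the CUDA 6.0-era routines shown at GTC 2014), computing the residual r = b - A*x and its norm might look like this:

```cpp
// Sketch: residual r = b - A*x with cuSPARSE + cuBLAS, the kind of building
// block an SSOR/Krylov loop is assembled from. Assumes d_csrRowPtr,
// d_csrColInd, d_csrVal, d_x, d_b, d_r are device arrays already populated
// for an n x n CSR matrix with nnz nonzeros; nrm_out is a host pointer.
#include <cuda_runtime.h>
#include <cusparse.h>
#include <cublas_v2.h>

void residual_norm(int n, int nnz,
                   int *d_csrRowPtr, int *d_csrColInd, double *d_csrVal,
                   double *d_x, double *d_b, double *d_r, double *nrm_out)
{
    cusparseHandle_t sp;  cusparseCreate(&sp);
    cublasHandle_t   bl;  cublasCreate(&bl);

    cusparseSpMatDescr_t A;
    cusparseDnVecDescr_t x, r;
    cusparseCreateCsr(&A, n, n, nnz, d_csrRowPtr, d_csrColInd, d_csrVal,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_64F);
    cusparseCreateDnVec(&x, n, d_x, CUDA_R_64F);
    cusparseCreateDnVec(&r, n, d_r, CUDA_R_64F);

    // r = A * x
    double one = 1.0, zero = 0.0, minus_one = -1.0;
    size_t bufSize = 0;  void *buf = nullptr;
    cusparseSpMV_bufferSize(sp, CUSPARSE_OPERATION_NON_TRANSPOSE, &one, A, x,
                            &zero, r, CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&buf, bufSize);
    cusparseSpMV(sp, CUSPARSE_OPERATION_NON_TRANSPOSE, &one, A, x,
                 &zero, r, CUDA_R_64F, CUSPARSE_SPMV_ALG_DEFAULT, buf);

    // r = b - A*x  (scale then axpy keeps it to two cuBLAS calls)
    cublasDscal(bl, n, &minus_one, d_r, 1);
    cublasDaxpy(bl, n, &one, d_b, 1, d_r, 1);

    // ||r||_2 -- the convergence check of the iterative solver
    cublasDnrm2(bl, n, d_r, 1, nrm_out);

    cudaFree(buf);
    cusparseDestroySpMat(A);  cusparseDestroyDnVec(x);  cusparseDestroyDnVec(r);
    cusparseDestroy(sp);      cublasDestroy(bl);
}
```

The SSOR sweep itself would add forward and backward sparse triangular solves (cusparseSpSV in the current generic API) around a residual computation of this kind.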

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Computational Fluid Dynamics; Computational Physics; Numerical Algorithms & Libraries; Manufacturing

Day: Wednesday, 03/26
Time: 14:00 - 15:20
Location: Room 230A

S4792 - Hands-on Lab: Leveraging Accelerated Core Algorithms Using NVIDIA AmgX

Marat Arsaev ( Systems Software Engineer, NVIDIA )
Marat's expertise includes image and video processing as well as software optimization and acceleration using GPUs. Prior to joining NVIDIA, Marat was a Software Developer at MSU Graphics & Media Lab. Marat received his degree in Computer Science from Moscow State University.

AmgX is a new, flexible, and easy-to-use NVIDIA GPU-accelerated high-performance sparse linear solver library. It features a variety of popular solvers as well as user-defined solver configurations such as nested solvers and preconditioners. Come and learn how easy it is to use the library in your application, configure the solver, and get maximum performance out of it. You will also learn how to solve your linear system on multiple GPUs using our library. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
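
AmgX exposes a C API built around a create/configure/setup/solve sequence. The sketch below is an assumption based on the library's documented pattern, not material from the session; "solver_config.json" is a placeholder filename, and the exact signatures should be checked against the AmgX reference manual.

```cpp
/* Minimal sketch of the AmgX C API create/setup/solve pattern. Host-side CSR
 * arrays h_row_ptr/h_col_ind/h_val describe an n x n system with nnz
 * nonzeros; h_b is the right-hand side and h_x the initial guess/solution.
 * "solver_config.json" stands in for one of the JSON configurations shipped
 * with AmgX. */
#include <amgx_c.h>

void solve_with_amgx(int n, int nnz,
                     const int *h_row_ptr, const int *h_col_ind,
                     const double *h_val, const double *h_b, double *h_x)
{
    AMGX_initialize();

    AMGX_config_handle    cfg;
    AMGX_resources_handle rsrc;
    AMGX_matrix_handle    A;
    AMGX_vector_handle    b, x;
    AMGX_solver_handle    solver;

    AMGX_config_create_from_file(&cfg, "solver_config.json");
    AMGX_resources_create_simple(&rsrc, cfg);

    /* dDDI = device matrix, double matrix/vector values, 32-bit indices */
    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);

    /* upload the CSR system and vectors (scalar, i.e. 1x1 blocks) */
    AMGX_matrix_upload_all(A, n, nnz, 1, 1, h_row_ptr, h_col_ind, h_val, NULL);
    AMGX_vector_upload(b, n, 1, h_b);
    AMGX_vector_upload(x, n, 1, h_x);

    AMGX_solver_setup(solver, A);      /* build hierarchy / factorizations */
    AMGX_solver_solve(solver, b, x);   /* run the configured solver */
    AMGX_vector_download(x, h_x);      /* copy the solution back to the host */

    AMGX_solver_destroy(solver);
    AMGX_vector_destroy(x);
    AMGX_vector_destroy(b);
    AMGX_matrix_destroy(A);
    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
}
```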

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Computational Structural Mechanics; Computational Fluid Dynamics; Computational Physics; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 15:30 - 16:50
Location: Room 230A


TALK


S4299 - Fast Solvers for Linear Systems on the GPU

Cornelis Vuik ( Professor, Delft University of Technology )
Cornelis received his Master's in Applied Mathematics from TU Delft and his Ph.D. from Utrecht University. Since 2010, Cornelis has served as an Associate Editor of the SIAM Journal on Scientific Computing (SISC). Cornelis has authored more than 130 ISI papers, is the Co-investigator of an Erasmus Mundus Master program, Director of the Delft Centre for Computational Science and Engineering, and Scientific Director of the 3TU.AMI Applied Mathematics Institute.

Some examples are given of solving large linear systems arising from practical/industrial applications. The methods are based on preconditioned Krylov subspace methods. Most building blocks are easily implemented on the GPU. The most involved operation is the preconditioner. In this talk three variants are discussed: (1) Neumann series, (2) deflation techniques, and (3) recursive red-black ordering. The methods are applied to multi-phase flow and a ship simulator application and show speedups of a factor of 30-40.

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Fluid Dynamics; Supercomputing; Manufacturing

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL21D

S4209 - Multi-Block GPU Implementation of a Stokes Equations Solver for Absolute Permeability Computation

Nicolas Combaret ( Software Engineer, FEI Visualization Sciences Group )
Nicolas Combaret is a Software Engineer at FEI Visualization Sciences Group where he works mainly on Avizo XLab simulation extensions of Avizo® Fire software application. These extensions provide efficient numerical simulation capabilities to compute material physical properties from a 3D digital image. Nicolas obtained his Ph.D. degree in Physical Chemistry of Condensed Matter from Bordeaux I University.

The goal of this session is to show a multi-block implementation of a Stokes equations solver in Avizo® Fire for absolute permeability computation. The challenges of computing such a complex property in a general-purpose software application will first be defined to explain the basis of this work. A Stokes equations solver, developed for GPGPU computing, will be presented. Details about the multi-block approach, which allows dealing with large datasets in acceptable time on one GPU, will be given. Examples and performance metrics will finally be shown before emphasizing future perspectives.

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Computational Fluid Dynamics; Computational Physics

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room LL20B

S4250 - PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

Peter Vincent ( Lecturer, Imperial College London )
Peter Vincent is a Lecturer and EPSRC Early Career Fellow in the department of Aeronautics at Imperial College London, working at the interface between mathematics, computing, fluid dynamics, and aeronautical engineering. He holds a 1st class undergraduate degree from the department of Physics at Imperial College (graduating top of the year), and a PhD from the department of Aeronautics at Imperial College in the field of CFD. Prior to his appointment as a Lecturer, Peter served as a Postdoctoral Scholar in the department of Aeronautics and Astronautics at Stanford University, where he developed novel high-order numerical methods for CFD, and implemented them for massively-parallel many-core Graphical Processing Units (GPUs).

Discover how GPUs are being used to accelerate high-fidelity computational fluid dynamics (CFD) simulations on unstructured grids. In this talk I will (i) introduce the flux reconstruction approach to high-order methods; a discretization that is particularly well-suited to many-core architectures, (ii) introduce our massively parallel implementation, PyFR, which through a combination of symbolic manipulation and run-time code generation is able to easily target NVIDIA GPU hardware and, (iii) showcase some of the high-fidelity, unsteady, simulations undertaken using PyFR on both desktop and HPC systems.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210A

S4598 - GPU Acceleration of CFD in Industrial Applications Based on OpenFOAM

Bjoern Landmann ( Development Engineer, FluiDyna GmbH )
Bjoern Landmann has been a development engineer at FluiDyna GmbH, Munich, Germany, since 2011. His research interests include computational multiphysics, high-performance computing, turbulence, and aeroacoustics.

CFD calculations in an industrial context prioritize fast turn-around times - a requirement that can be addressed by porting parts of the CFD calculation to the GPU, leading to a hybrid CPU/GPU approach. In a first step, the GPU library Culises has been developed, allowing the GPU-based solution of large-scale linear systems of equations that are in turn set up by MPI-parallelized CFD codes (e.g. OpenFOAM) on the CPU. In this session we will address a second step, which consists of porting the construction of the linear system to the GPU as well, while pre- and post-processing remain on the CPU. Aiming for industrial applications in the automotive sector, the approach will be aligned with the simpleFOAM solver of OpenFOAM. As the setup of the linear system consumes up to 40-50% of computational time in typical automotive cases, this approach can further increase the acceleration of CFD computations.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Automotive; Manufacturing

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 210A

S4727 - Large Scale Reservoir Simulation Utilizing Multiple GPUs

Garfield Bowen ( Simulator Lead, Ridgeway Kite Software )
Garf Bowen is a leading figure in reservoir simulation, serving as a committee member on the SPE (Society of Petroleum Engineers) Reservoir Simulation Symposium and as a member of the editorial committee of a number of SPE journals. Garf has been associated with the ECLIPSE reservoir simulator development group since 1987 and has contributed as scientific and technical authority and innovation champion to successor projects and bespoke developments for a range of clients. Originally a mathematician, he has built a career around developing elegant solutions to the practical problems facing reservoir engineers on a daily basis. Garf is currently leading the development of a new simulator for Ridgeway Kite, a UK-based software start-up venture.

Reservoir simulation has a long history as a tool used by reservoir engineers to plan and optimize (oil & gas) field developments. These simulations are inevitably 3-dimensional and transient and hence require considerable computing resources. Traditional simulators are typically constrained by memory bandwidth. The GPU architecture provides access to greater bandwidth once the simulator is parallel. However, the memory constraints on a GPU limit the problem size that can be tackled. In this presentation we describe a paradigm where we utilize a single GPU if the problem fits into memory and simply scale to multiple GPUs as the memory requirement grows. The practicality is demonstrated by running a 32-million-cell case on 32 Tesla GPUs.

Session Level: All
Session Type: Talk
Tags: Energy Exploration; Computational Fluid Dynamics; Computer Aided Design

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL20B

S4341 - Harnessing the Power of Titan with the Uintah Computational Framework

Alan Humphrey ( Software Developer and Ph.D. Student, Scientific Computing and Imaging Institute, University of Utah )
Alan Humphrey is a software developer at the Scientific Computing and Imaging Institute and also a Ph.D. student at the University of Utah where he works with Dr. Martin Berzins on improving the performance and scalability of the Uintah Computational Framework. Alan has been primarily involved in extending Uintah to run on hybrid CPU/GPU systems with the development of Uintah's prototype CPU-GPU task scheduler and most recently, Uintah's Unified Multi-threaded heterogeneous task scheduler and runtime system. Much of this work has been in preparation for using Uintah to solve large-scale energy related problems for the DOE NETL project using the entire Titan system with GPUs at Oak Ridge National Laboratory. He has considerable experience with heterogeneous systems and has done work with Uintah on TitanDev and NSF Keeneland. Much of Alan's past research has been focused on formal verification of concurrent systems, specifically the Message Passing Interface (MPI) and dynamic verification tools like In-situ Partial Order (University of Utah) - and its integration within the Eclipse Parallel Tools Platform (PTP). Alan has also been involved with the Eclipse PTP project since 2009.

This presentation will discuss how directed acyclic graph (DAG) approaches provide a powerful abstraction for solving challenging engineering problems and how using this abstraction and DAG approach, computational frameworks such as Uintah can be extended with relative ease to efficiently leverage GPUs, even at scale. Attendees will learn how frameworks like Uintah are able to shield the application developer from the complexities of the deep memory hierarchies and multiple levels of parallelism found in heterogeneous supercomputers such as Titan.

Session Level: Beginner
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL21A

S4186 - Optimizing an LBM Code for Compute Clusters with Kepler GPUs

Jiri Kraus ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jiri Kraus is a developer in NVIDIA's European Developer Technology team. As a consultant for GPU HPC applications at the NVIDIA Jülich Applications Lab, Jiri collaborates with local developers and scientists at the Jülich Supercomputing Centre and the Forschungszentrum Jülich. Before joining NVIDIA Jiri worked on the parallelization and optimization of scientific and technical applications for clusters of multicore CPUs and GPUs at Fraunhofer SCAI in St. Augustin. He holds a Diploma in Mathematics from the University of Cologne, Germany.

To fully utilize a GPU cluster, both the single-GPU code and the inter-GPU communication need to be efficient. In this session, an LBM code applying a D2Q37 model is used as a case study to explain by example how both targets can be met. The compute-intensive collide kernel of the LBM code is optimized for Kepler, specifically considering the large amount of state needed per thread due to the complex D2Q37 model. To achieve efficient inter-GPU communication, CUDA-aware MPI was used. We explain how this was done and present performance results on an InfiniBand cluster with GPUDirect RDMA.
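
"CUDA-aware MPI" here means that device pointers can be passed directly to MPI calls, with the library (and GPUDirect RDMA) handling the transfer. A minimal, generic halo-exchange sketch, assuming a 1-D decomposition and an MPI build with CUDA support, is shown below; it is an illustration, not the code from the session.

```cpp
// Sketch of a CUDA-aware halo exchange: device buffers are handed straight
// to MPI, with no explicit cudaMemcpy to host staging buffers. Assumes a 1-D
// decomposition with left/right neighbour ranks and a field of
// nx_local + 2*halo doubles (ghost layers at both ends).
#include <mpi.h>
#include <cuda_runtime.h>

void exchange_halos(double *d_field, int nx_local, int halo,
                    int left, int right, MPI_Comm comm)
{
    MPI_Request req[4];

    // Receive into the ghost layers (device memory).
    MPI_Irecv(d_field,                   halo, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(d_field + halo + nx_local, halo, MPI_DOUBLE, right, 1, comm, &req[1]);

    // Send the outermost owned cells (also device memory).
    MPI_Isend(d_field + halo,            halo, MPI_DOUBLE, left,  1, comm, &req[2]);
    MPI_Isend(d_field + nx_local,        halo, MPI_DOUBLE, right, 0, comm, &req[3]);

    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);
}
```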

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL20C

S4285 - Scaling Soft Matter Physics to Thousands of GPUs in Parallel

Alan Gray ( Research Architect, EPCC, The University of Edinburgh )
Dr Alan Gray was awarded a Ph.D. at The University of Glasgow in Theoretical Particle Physics in 2003, winning the 2004 Ogden Prize for the best UK thesis in particle physics phenomenology. He furthered this work under a fellowship at The Ohio State University, and since joining EPCC in 2005 he has been involved with a wide range of HPC-related projects: lately his research has focused on massively parallel GPU-accelerated supercomputing, making significant contributions to several scientific areas including condensed matter physics, genetics, and musical acoustics. He has authored a large number of refereed and highly-cited publications.

Discover how to adapt a real, complex application such that it can efficiently utilize thousands of GPUs in parallel. We describe our successes in combining CUDA with MPI to simulate a wide variety of complex fluids of key importance to everyday life. We are careful to present our work in a generalizable way, such that others can learn from our experience, follow our methodology and even re-use our highly efficient communication library. We detail our efforts to maximize both performance and maintainability, noting that we support both CPU and GPU versions (where the latter is 3.5-5 times faster comparing equal numbers of GPUs and fully-utilized CPUs). We present our work to carefully schedule and overlap lattice based operations and halo-exchange communication mechanisms, allowing excellent scaling to at least 8,192 GPUs in parallel on the Titan supercomputer.
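
The communication library itself is not shown in the abstract; as an assumed, generic illustration of the overlap described, the interior update can run on its own CUDA stream while the halo exchange is in flight, with the cells adjacent to the ghosts updated afterwards.

```cpp
// Hedged sketch (not the authors' library): overlap the interior update with
// the halo exchange. 1-D lattice with one ghost cell at each end:
// index 0 = left ghost, 1..n = owned cells, n+1 = right ghost.
#include <mpi.h>
#include <cuda_runtime.h>

__global__ void update_interior(double *out, const double *in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= 2 && i <= n - 1)                    // cells that need no halo data
        out[i] = 0.5 * (in[i - 1] + in[i + 1]);  // stand-in stencil
}

__global__ void update_edges(double *out, const double *in, int n) {
    // the two cells adjacent to the ghosts, updated after the exchange
    out[1] = 0.5 * (in[0] + in[2]);
    out[n] = 0.5 * (in[n - 1] + in[n + 1]);
}

void step(double *d_in, double *d_out, int n, int left, int right,
          MPI_Comm comm, cudaStream_t s_bulk)
{
    // interior update proceeds on s_bulk while halos are exchanged
    update_interior<<<(n + 255) / 256, 256, 0, s_bulk>>>(d_out, d_in, n);

    MPI_Request req[4];
    MPI_Irecv(d_in,         1, MPI_DOUBLE, left,  0, comm, &req[0]);
    MPI_Irecv(d_in + n + 1, 1, MPI_DOUBLE, right, 1, comm, &req[1]);
    MPI_Isend(d_in + 1,     1, MPI_DOUBLE, left,  1, comm, &req[2]);  // CUDA-aware MPI
    MPI_Isend(d_in + n,     1, MPI_DOUBLE, right, 0, comm, &req[3]);
    MPI_Waitall(4, req, MPI_STATUSES_IGNORE);

    update_edges<<<1, 1>>>(d_out, d_in, n);      // depends on received halos
    cudaDeviceSynchronize();
}
```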

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics; Computational Physics

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL20C

S4495 - Plasma Turbulence Simulations: Porting Gyrokinetic Tokamak Solver to GPU Using CUDA

Praveen Narayanan ( Applied Engineer, NVIDIA )
Praveen currently works in the Applied Engineering group at NVIDIA. His roles include porting and benchmarking GPU solvers and creating demos for trade shows. After working on combustion and fire-related problems using perturbation theory and computation in graduate school, he transitioned to benchmarking and performance analysis of HPC fusion codes during his postdoc, before taking up GPU computing at NVIDIA.

The porting process of a large-scale Particle-In-Cell solver (GTS) to the GPU using CUDA is described. We present weak scaling results run at scale on Titan which show a speedup of 3-4x for the entire solver. Starting from a performance analysis of computational kernels, we systematically proceed to eliminating the most significant bottlenecks in the code - in this case, the PUSH step, which constitutes the 'gather' portion of the gather-scatter algorithm that characterizes this PIC code. Points that we think might be instructive to developers include: (1) using the PGI CUDA Fortran infrastructure to interface between CUDA C and Fortran; (2) memory optimizations - creation of a device memory pool, and pinned memory; (3) a demonstration of how communication causes performance degradation at scale, with implications for shifter performance in general PIC solvers, and why we need algorithms that handle communication in particle shifters more effectively; (4) use of textures and LDG for irregular memory accesses.
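
As a generic illustration of point (4), and not the GTS code itself, irregular read-only gathers in a PIC "push" step can be routed through the Kepler read-only data cache with __ldg:

```cpp
// Illustrative gather ("push"-like) kernel: each particle reads field values
// at scattered grid indices. __ldg() routes the irregular reads through the
// read-only data cache on Kepler (sm_35+); the const __restrict__ qualifiers
// alone often achieve the same effect.
#include <cuda_runtime.h>

__global__ void push_particles(const double * __restrict__ field,
                               const int    * __restrict__ cell_of_particle,
                               double *velocity, double dt, int num_particles)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_particles) return;

    int cell = cell_of_particle[p];   // irregular, particle-dependent index
    double e = __ldg(&field[cell]);   // cached read-only load of the field
    velocity[p] += dt * e;            // simplified acceleration update
}
```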

Session Level: Beginner
Session Type: Talk
Tags: Computational Physics; Computational Fluid Dynamics; Supercomputing

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 212A

S4669 - Supercharging Engineering Simulations at Mercury Marine with NVIDIA GPUs

Arden Anderson ( Technical Specialist - Computational Analysis, Mercury Marine )
Arden Anderson is responsible for structural analysis, crashworthiness, and vessel performance simulations at Mercury Marine. He also determines computing hardware requirements and has helped Mercury Marine transition to High Performance Computing (HPC). Prior to joining Mercury Marine in 2005, Mr. Anderson spent three years as an Engineering Analyst at Lawrence Livermore National Laboratory simulating blast loading and hypervelocity impact. At LLNL he was exposed to world class HPC environments, including the fastest computer in the world at that time (BlueGene/L). Mr. Anderson holds a BS and MS in Engineering Mechanics from the University of Wisconsin – Madison.

Mercury Marine will discuss their recent evaluation of NVIDIA GPUs for accelerating performance of Abaqus FEA. As part of the talk, Arden will highlight the critical metrics for the evaluation and how they chose between having the GPUs at the local desktop or installed in the back-room cluster. Arden will also discuss the business impact for the company from using a GPU-accelerated FEA implementation. Lastly, Arden will discuss what Mercury sees as the future potential for leveraging GPUs as part of their design workflow.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Computational Structural Mechanics; Clusters & GPU Management; Computational Fluid Dynamics

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210H

S4371 - AMR Based on a Space-Filling Curve for Stencil Applications

Takayuki Aoki ( Professor, Tokyo Institute of Technology )
Takayuki Aoki received a BSc in Applied Physics, an MSc in Energy Science, and a Dr.Sci. (1989) from Tokyo Institute of Technology. He has been a professor at Tokyo Institute of Technology since 2001 and the deputy director of the Global Scientific Information and Computing Center since 2009. He received the Minister's award of the Ministry of Education, Culture, Sports, Science & Technology in Japan and many awards and honors in GPU computing, scientific visualization, and other areas. He was the leader of the team that won the Gordon Bell Prize in 2011 and was recognized as a CUDA Fellow by NVIDIA in 2012.

AMR (Adaptive Mesh Refinement) is an efficient method capable of assigning a mesh with the proper resolution to local areas. It has great advantages from the viewpoint of computational cost and memory usage for practical stencil applications such as computational fluid dynamics. With the octree data structure, the refinement process is recursive and the computation is carried out on the leaf meshes. By using bigger leaves than those used on the CPU, we can assign a CUDA block to a leaf with a sufficient number of threads. We show a GPU implementation in which the leaves are connected by the Hilbert space-filling curve and discuss the overhead of the data management.
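
The talk orders the leaves along a Hilbert space-filling curve. As a simpler stand-in that illustrates the same idea of linearizing octree leaves with a space-filling curve, a 3-D Morton (Z-order) key can be computed with standard bit interleaving; note this is not the Hilbert ordering used by the authors.

```cpp
// Morton (Z-order) key for a leaf at integer coordinates (x, y, z), each up
// to 21 bits. A Hilbert key, as used in the talk, preserves locality better
// but is more involved to compute; the interleaving idea is the same.
#include <cstdint>

__host__ __device__ inline uint64_t expand_bits(uint64_t v) {
    // spread the low 21 bits of v so that two zero bits separate
    // consecutive bits (standard bit-interleaving constants)
    v &= 0x1fffffULL;
    v = (v | v << 32) & 0x001f00000000ffffULL;
    v = (v | v << 16) & 0x001f0000ff0000ffULL;
    v = (v | v <<  8) & 0x100f00f00f00f00fULL;
    v = (v | v <<  4) & 0x10c30c30c30c30c3ULL;
    v = (v | v <<  2) & 0x1249249249249249ULL;
    return v;
}

__host__ __device__ inline uint64_t morton3d(uint32_t x, uint32_t y, uint32_t z) {
    return (expand_bits(z) << 2) | (expand_bits(y) << 1) | expand_bits(x);
}
```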

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Fluid Dynamics; Supercomputing

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL20D

S4518 - Accelerating Dissipative Particle Dynamics Simulation on Kepler: Algorithm, Numerics and Application

Yu-Hang Tang ( Ph.D. Student, Brown University )
Yu-Hang Tang is a Ph.D. student in the Division of Applied Mathematics at Brown University. He got his bachelor's degree in Polymer Science at Zhejiang University, China. Following one year of study at the Center for Biophysics and Computational Biology at University of Illinois at Urbana-Champaign, he started his Ph.D. research in applied mathematics at Brown University. His current interests are various particle-based simulation techniques including molecular dynamics, dissipative particle dynamics and smooth particle hydrodynamics. He is also devoted to the development of massively parallel algorithms.

The talk focuses on the implementation of a highly optimized dissipative particle dynamics (DPD) simulation code in CUDA, which achieves a 20x speedup on a single Kepler GPU over 12 Ivy Bridge cores. We will introduce a new pair-searching algorithm that is parallel, deterministic, atomics-free, and capable of generating a strictly ordered neighbor list. Such a neighbor list leads to optimal memory efficiency when combined with proper particle reordering schemes. We also propose an in-situ generation scheme for Gaussian random numbers that has better performance without losing quality. In addition, details will be given on how to design custom transcendental functions that fit specifically to our DPD functional form. The code is scalable and can run on over a thousand nodes on the Titan supercomputer. Demonstrations of large-scale DPD simulations of vesicle assembly and red blood cell suspension hydrodynamics using our code will be given.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Fluid Dynamics; Supercomputing; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL21E

S4274 - High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster

Pierre Kestener ( Research Engineer, CEA )
Pierre Kestener is a research engineer at the "Maison de la Simulation", a publicly funded research and service unit in high-performance computing located on the Paris-Saclay campus.

A wide range of major astrophysical problems can be investigated by means of computational fluid dynamics methods, and performing numerical simulations of Magneto-Hydrodynamics (MHD) flows using realistic setup parameters can be very challenging. We will first report on the technical expertise gained in developing the Ramses-GPU code, designed for efficient use of large clusters of GPUs in solving MHD flows. We will illustrate how challenging state-of-the-art highly resolved simulations requiring hundreds of GPUs can provide new insights into real-case applications: (1) the study of the Magneto-Rotational Instability and (2) high Mach number MHD turbulent flows.

Session Level: Beginner
Session Type: Talk
Tags: Astronomy & Astrophysics; Computational Fluid Dynamics; Supercomputing

Day: Wednesday, 03/26
Time: 17:00 - 17:25
Location: Room LL21F

S4329 - Speeding-up NWChem on Heterogeneous Clusters

Antonino Tumeo ( Research Scientist, Pacific Northwest National Laboratory )
Antonino Tumeo received the M.S. degree in Informatic Engineering, in 2005, and the Ph.D. degree in Computer Engineering, in 2009, from Politecnico di Milano in Italy. Since February 2011, he has been a research scientist at Pacific Northwest National Laboratory (PNNL). He joined PNNL in 2009 as a post doctoral research associate. Previously, he was a post doctoral researcher at Politecnico di Milano. His research interests are modeling and simulation of high performance architectures, hardware-software codesign, power/performance characterization of high performance embedded systems, FPGA prototyping and GPGPU computing.

Learn the approaches that we implemented to accelerate NWChem, one of the flagship high performance computational chemistry tools, on heterogeneous supercomputers. In this talk we will discuss the new domain specific code generator, the auto-tuners for the tensor contractions, and the related optimizations that enable acceleration of the Coupled-Cluster methods module for single- and multi-reference formulations of NWChem.

Session Level: Intermediate
Session Type: Talk
Tags: Quantum Chemistry; Computational Fluid Dynamics; Supercomputing; Clusters & GPU Management

Day: Wednesday, 03/26
Time: 17:00 - 17:25
Location: Room LL21E

S4672 - Acceleration of Multi-Grid Linear Solver Inside ANSYS FLUENT Using AmgX

Sunil Sathe ( Senior Software Developer, ANSYS Inc )
Sunil Sathe is a senior software developer at ANSYS Inc., where he works on the high-performance computing aspects of the ANSYS FLUENT flow solver. He focuses mainly on improving the performance and scalability of the ANSYS FLUENT solver. Prior to joining ANSYS Inc., Sunil was a research scientist at Rice University where he developed finite element methods for simulation of fluid-structure interactions. Sunil has a Ph.D. in Mechanical Engineering from Rice University.

The solution of the linear equation systems arising from discretization of the flow equations can be a major time-consuming portion of a flow simulation. In the context of the ANSYS FLUENT flow solver, especially when using the coupled solver, the linear solver takes a major chunk of the simulation time. In order to improve performance and let users take advantage of available GPU hardware, we provide a mechanism in ANSYS FLUENT to offload the linear solver onto a GPU using NVIDIA's multi-grid AmgX solver. In this talk we present a top-level view of the architectural design of integrating the AmgX solver into ANSYS FLUENT. We also present some preliminary performance results obtained from our first offering of AmgX inside ANSYS FLUENT release 15.0.

Session Level: Beginner
Session Type: Talk
Tags: Computational Fluid Dynamics

Day: Thursday, 03/27
Time: 09:00 - 09:25
Location: Room LL20B

S4269 - Interactive Sandbox: Modelling and Visualization of Nature Phenomena on Hand-Made Landscapes

Maxim Rud ( Student, Tomsk Polytechnic University )
Maxim Rud is a fourth-year student studying mechatronics and robotics at Tomsk Polytechnic University. Since his first year, he has taken part in robotics competitions, programming robots to perform different tasks. In his second year, he started working on a project called "Interactive Sandbox". His work involves fluid dynamics simulation, graphics rendering, and GPU parallel programming. Maxim has also held several internships in Europe and presented the results of his project at conferences and startup events in Russia and the USA.

Create your own world and use the power of GPU programming to visualize it. Build a unique landscape with your hands with the help of a device called the "Interactive Sandbox" and study real-time modelled and realistically visualized natural phenomena, such as volcanic eruptions, floods, weather, and the changing of seasons. You will learn about using GPUs to increase the performance of modelling and visualization, find out how to implement real-time simulation of fluid flow over varying bottom topography, and discover an efficient and fast method of filtering Microsoft Kinect data.

Session Level: Beginner
Session Type: Talk
Tags: Combined Simulation & Real-Time Visualization; Real-Time Graphics Applications; Computational Fluid Dynamics; Virtual & Augmented Reality

Day: Thursday, 03/27
Time: 09:30 - 09:55
Location: Room 211A

S4649 - PyFR: Technical Challenges of Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

Freddie Witherden ( Ph.D. Student, Imperial College London )
Freddie Witherden studied Physics with Theoretical Physics at Imperial College London from 2008 to 2012, earning an MSci degree with first-class honours. His master's thesis was on the development of a parallel Barnes-Hut type treecode for simulating laser-plasma interactions. Currently, he is a PhD candidate in the department of Aeronautics at Imperial College London under the supervision of Dr Peter Vincent. Outside of academia, Freddie is the chief technology officer of the news analytics firm Newsflo Ltd. He also has a keen interest in digital forensics, being the primary author of the libforensic1394 library.

Learn how to develop efficient highly-scalable GPU codes faster through use of the Python programming language. In this talk I will describe our accelerated massively parallel computational fluid dynamics (CFD) code, PyFR, and outline some of the techniques employed to reduce development time and enhance performance. Specifically, it will be shown how even complex algorithms – such as those employed for performing CFD on unstructured grids – can be constructed in terms of efficient matrix-matrix multiplications. Moreover, general advice will be given on how best to integrate CUDA and MPI. Furthermore, I will demonstrate how Python can be used both to simplify development and bring techniques such as run-time kernel generation to the mainstream. Examples of these techniques, as utilized in PyFR, will be given throughout.
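
PyFR itself is written in Python with run-time generated GPU kernels; to keep the examples in this document in one language, the core point above, that the per-element operators reduce to dense matrix-matrix products, is sketched below as a single cuBLAS GEMM call (an illustration, not PyFR code).

```cpp
// The "operator matrix times state matrix" pattern at the heart of flux
// reconstruction: a small, constant operator A (m x k) applied to the state
// of all elements B (k x n_elements) is one dense GEMM. Column-major storage
// as expected by cuBLAS; d_A, d_B, d_C are device pointers.
#include <cublas_v2.h>

void apply_operator(cublasHandle_t h, int m, int n_elements, int k,
                    const double *d_A, const double *d_B, double *d_C)
{
    const double one = 1.0, zero = 0.0;
    // C (m x n_elements) = A (m x k) * B (k x n_elements)
    cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                m, n_elements, k,
                &one, d_A, m, d_B, k,
                &zero, d_C, m);
}
```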

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Numerical Algorithms & Libraries; Supercomputing

Day: Thursday, 03/27
Time: 09:30 - 09:55
Location: Room LL20B

S4207 - PARALUTION: A Library for Iterative Sparse Methods on Multi-core CPUs and GPUs

Dimitar Lukarski ( Post-Doctoral Researcher, Uppsala University, Sweden )
Highly-Rated Speaker
Dimitar Lukarski holds a post-doc research position at the Department of Information Technology, Uppsala Universitet, Sweden. He works on interdisciplinary topics in the area of parallel numerical methods and emerging hardware such as GPUs and multi-core CPUs. His research focus is mainly on robust and fine-grained parallel sparse solvers and preconditioners. Dimitar received his Bachelor's degree from Technical University of Sofia / Bulgaria, Master's degree from Technical University of Karlsruhe / Germany, and doctoral degree from Karlsruhe Institute of Technology (KIT) / Germany in 2006, 2008 and 2012, respectively.

Dive deep into sparse iterative solvers on GPUs without touching CUDA, with advanced preconditioning techniques and full portability of your program to CPUs! Learn how the PARALUTION library handles these features. The library provides various Krylov subspace and algebraic/geometric multigrid solvers, including ILU and approximate-inverse preconditioners/smoothers. You will investigate the design of the library in detail, learn about its key techniques for fine-grained parallelism, and take note of the latest performance benchmarks on multi-core CPUs, GPUs, and Xeon Phi. Source code examples will be presented to show the ease of use. Finally, the talk will give insight on how to directly integrate PARALUTION into your application using the C++ API or with the supplied plug-ins for FORTRAN, Deal.II, OpenFOAM, Elmer, and Agros2D.
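
As a rough sketch of the C++ API usage pattern mentioned above, reconstructed from the library's documented examples rather than from the session (method names such as ReadFileMTX, MoveToAccelerator, and get_nrow are assumptions to be verified against the PARALUTION user manual):

```cpp
// Assumed PARALUTION usage sketch: read a matrix, build a preconditioned CG
// solver, move everything to the accelerator (GPU), and solve.
#include <paralution.hpp>
using namespace paralution;

int main() {
    init_paralution();

    LocalMatrix<double> mat;
    LocalVector<double> x, rhs;
    mat.ReadFileMTX("matrix.mtx");          // system matrix in MatrixMarket format
    x.Allocate("x", mat.get_nrow());
    rhs.Allocate("rhs", mat.get_nrow());
    x.Zeros();
    rhs.Ones();

    CG<LocalMatrix<double>, LocalVector<double>, double> ls;       // Krylov solver
    Jacobi<LocalMatrix<double>, LocalVector<double>, double> p;    // preconditioner
    ls.SetOperator(mat);
    ls.SetPreconditioner(p);
    ls.Build();

    // the same code runs on the CPU if these calls are omitted
    mat.MoveToAccelerator();
    x.MoveToAccelerator();
    rhs.MoveToAccelerator();
    ls.MoveToAccelerator();

    ls.Solve(rhs, &x);

    stop_paralution();
    return 0;
}
```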

Session Level: Beginner
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Supercomputing; Computational Fluid Dynamics

Day: Thursday, 03/27
Time: 10:00 - 10:25
Location: Room LL21D

S4565 - Weather Prediction Code Written by a High-Productivity Framework for Multi-GPU Computing

Takashi Shimokawabe ( Assistant Professor, Tokyo Institute of Technology )
Takashi Shimokawabe is currently an assistant professor at the Global Scientific Information and Computing Center (GSIC), Tokyo Institute of Technology (Tokyo Tech), and a member of the Aoki laboratory in GSIC. His primary research interests are general-purpose computing on graphics processing units (GPGPU), computational fluid dynamics, and high-performance computing. His group was awarded the 2011 Gordon Bell Prize Special Achievements in Scalability and Time-to-Solution for peta-scale phase-field simulations (T. Shimokawabe et al.). He received his Ph.D. in Energy Science from Tokyo Tech in 2012 and his M.S. in Physics from Tokyo Tech in 2007.

Numerical weather prediction is one of the major applications in high-performance computing and is accelerated on GPU supercomputers. Obtaining good parallel efficiency with more than a thousand GPUs often requires skillful programming, for example, using both MPI for inter-node communication and NVIDIA GPUDirect for intra-node communication. The Japan Meteorological Agency is developing a next-generation high-resolution meso-scale weather prediction code, ASUCA. We are implementing it on a multi-GPU platform by using a high-productivity framework for mesh-based applications. Our framework automatically translates user-written functions that update a grid point and generates both GPU and CPU code. The framework can also hide the complicated implementation of the efficient communication described above. In this presentation, we will show the implementation of the weather prediction code using this framework and its performance evaluation on the TSUBAME 2.5 supercomputer at Tokyo Institute of Technology.

Session Level: Intermediate
Session Type: Talk
Tags: Climate, Weather, Ocean Modeling; Computational Fluid Dynamics; Supercomputing; Recommended Press Session – HPC-Science

Day: Thursday, 03/27
Time: 10:00 - 10:25
Location: Room 212B

S4762 - Simulation Really Does Imitate Life: Modeling a Human Heart Valve and other FSI Applications with GPU Technology

Wayne Mindle ( Director of Sales & Marketing, CertaSIM, LLC )
Dr. Mindle is currently the Director of Sales & Marketing at CertaSIM, LLC, the US and Canadian distributor of the IMPETUS Afea Solver. In addition, he is the Benchmark Manager for IMPETUS Afea. He obtained his Ph.D. from Northwestern University in the area of Applied Mechanics, more specifically Finite Element Analysis as applied to nonlinear explicit transient dynamic problems. He has worked for several major aerospace companies and a consulting company for the FAA, and prior to his association with CertaSIM, spent 15 years at LSTC as the lead technical sales engineer.

Fluid-structure interaction is one of the most challenging areas for numerical simulation. By itself, modeling fluid flow is complicated enough, but adding the interaction with a deformable structure makes it even more challenging. One particular method, SPH (Smoothed Particle Hydrodynamics), is especially suited to GPU processing: it is a particle-based Lagrangian continuum method that can run completely on the GPU. Improvements to the classic SPH solver have led to an extremely accurate and robust solver that can better capture the pressure field for violent water impacts. To solve complicated FSI problems, an equally robust and accurate finite element solver needs to be part of the coupled solution. One particular application is modelling a real human heart valve, something that has not been done until now. Results using the latest NVIDIA GPU, the K40, will be shown for the heart valve model along with other FSI applications.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Computational Structural Mechanics

Day: Thursday, 03/27
Time: 10:00 - 10:25
Location: Room LL20B

S4594 - Unstructured Grid CFD Kernels for Gas Turbine Design

Tobias Brandvik ( Research Fellow, University of Cambridge )
Tobias Brandvik is a Research Fellow at the Whittle Laboratory. He obtained his MEng degree in 2007 and his PhD in 2012, both from the University of Cambridge. Tobias's current research is focused on how best to use emerging multi-core architectures for scientific computing. This includes both creating tools to ease the porting of legacy applications and investigating the possibilities offered by the greater computational power of multi-core processors in real design settings.

Learn about a new approach to developing large-scale Computational Fluid Dynamics (CFD) software for parallel processors such as GPUs. The session focuses on two topics: (1) the use of automatic source code generation for CFD kernels on unstructured grids to achieve close to optimal performance while maintaining code readability, and (2) case studies of advanced gas turbine simulations on clusters with 100s of GPUs.

Session Level: Beginner
Session Type: Talk
Tags: Computational Fluid Dynamics; Supercomputing; Computer Aided Design

Day: Thursday, 03/27
Time: 14:00 - 14:25
Location: Room LL20B

S4544 - Harnessing GPUs to Overcome Conventional Fluid-Particle Interaction Simulation Limitations

Adam Sierakowski ( Ph.D. Student, The Johns Hopkins University )
I graduated with honors from The Johns Hopkins University in 2010 with a B.S. in Mechanical Engineering, a concentration in Aerospace Engineering, and a minor in Mathematics. I learned about scientific computing and GPUs through a series of summer internships at The Johns Hopkins University Applied Physics Laboratory during my undergraduate studies. I am currently working on my Ph.D. under Professor Andrea Prosperetti in the Mechanical Engineering Department at The Johns Hopkins University, focusing on computationally simulating large-scale fluid-particle interactions. In my time away from science, I am a nationally-ranked triathlete and enjoy teaching others how to swim, cycle, and run.

Are you interested in decreasing the runtime of your 24-hour flow simulation to nine minutes? This is the story of how GPUs achieved a 150x speedup and made Physalis into a viable computational tool for investigating the behavior of large fluid-particle systems. The Physalis method is the only known means of applying near-perfect boundary conditions to spherical particles in a coarse Cartesian finite-difference flow solver, but it suffers from a debilitating computational requirement. GPU technology enables us to overcome this limitation so we can investigate the underlying physics behind natural phenomena like dust storms and energy-generation technologies such as fluidized bed reactors. We will discuss concepts of the design of a GPU finite-difference incompressible Navier-Stokes flow solver, introduce the algorithm behind the Physalis method, and evaluate the current and future capabilities of this GPU fluid-particle interaction code.

Session Level: Beginner
Session Type: Talk
Tags: Computational Fluid Dynamics; Numerical Algorithms & Libraries

Day: Thursday, 03/27
Time: 14:30 - 14:55
Location: Room LL20B

S4881 - Challenges Simulating Real Fuel Combustion Kinetics: The Role of GPUs

Matthew McNenly ( Staff Researcher, Lawrence Livermore National Laboratory )
Matthew McNenly is a staff researcher at Lawrence Livermore National Laboratory. He is currently the principal investigator of a project funded under the DOE Vehicle Technologies Program (Advanced Combustion Engines subprogram, project ACE-076). The focus of his project is to develop the next generation of combustion algorithms that will bring predictive simulation to engine designers in industry. He received a B.S.E. in Aerospace Engineering from the University of Michigan. After his undergraduate degree, he spent a year at the General Motors R&D Center in Warren, MI in their basic research wind tunnel. He returned to the University of Michigan for graduate studies in Aerospace Engineering funded by the DOE Computational Science Graduate Fellowship. He earned an M.S. in applied mathematics and a Ph.D. in Aerospace Engineering, focusing on new algorithms to accelerate microscale gas flow simulations. His research interests include micro/nanoscale transport phenomena, low-pressure material processing, chemically reacting fluid dynamics, and numerical methods.

There is a growing need in internal combustion (IC) engine design to resolve the complicated combustion kinetics in simulations. Without more predictive simulation tools in the design cycle, the cost of development will consume new concepts as it becomes harder to meet the performance and emission targets of the future. The combustion kinetics of real transportation fuels involve thousands of components, each of which can react through thousands of intermediate species and tens of thousands of reaction paths. GPUs show promise in delivering more physical accuracy (per $) to the IC engine design process. Specifically, GPU acceleration of nearly a factor of ten is demonstrated for the integration of multiple chemical source terms in a reacting fluid dynamics simulation. This speedup is achieved by reorganizing the thermodynamics and chemical reaction functions and by updating the sparse matrix functions using NVIDIA's latest GLU library.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Automotive

Day: Thursday, 03/27
Time: 15:00 - 15:25
Location: Room LL20B

S4417 - Quickly Applying GPU Acceleration to Barracuda: An MP-PIC CAE Software

Andrew Larson ( Software Developer, CPFD Software )
Andrew studied computer science and mathematics as both an undergraduate and graduate student. After teaching Computer Science for 2 years in Viet Nam and Mathematics for 1 year in Iowa, he joined CPFD Software to continue specializing in GPU acceleration. First exposed to CUDA as a graduate student tasked with accelerating a Eulerian solver in the QUIC-URB code out of Los Alamos, he now intensely digs through the source of Barracuda VR, reducing execution times with GPU magic!

Learn about the challenges and possibilities of applying CUDA to a Multi-Phase Particle-In-Cell code base through (1) An applied approach to parallelizing Barracuda VR, a CAE MP-PIC code, (2) Achieved speed-ups of operation types specific to MP-PIC codes (in double-precision), (3) Focused discussion on the crux of MP-PIC, i.e. mapping Lagrangian data to the Eulerian grid and (4) Demonstrated speed-up and future expectations.
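
Point (3) above, mapping Lagrangian particle data onto the Eulerian grid, is the classic scatter step of PIC-type methods. A minimal, assumed CUDA sketch (not Barracuda code) deposits each particle's mass into its containing cell with an atomic add:

```cpp
// Minimal particle-to-grid deposition: each particle scatters its mass into
// the cell that contains it. Collisions between particles in the same cell
// are resolved with atomicAdd; double-precision atomicAdd requires sm_60+,
// older GPUs would use a compare-and-swap loop or single precision instead.
#include <cuda_runtime.h>

__global__ void deposit_to_grid(const double *px, const double *py,
                                const double *mass, int num_particles,
                                double *grid, int nx, int ny, double cell_size)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_particles) return;

    // cell indices, clamped to the grid
    int i = min(max(int(px[p] / cell_size), 0), nx - 1);
    int j = min(max(int(py[p] / cell_size), 0), ny - 1);

    atomicAdd(&grid[j * nx + i], mass[p]);   // scatter: many particles -> one cell
}
```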

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics

Day: Thursday, 03/27
Time: 15:30 - 15:55
Location: Room LL20B

S4117 - Fast Fixed-Radius Nearest Neighbor Search on the GPU: Interactive Million-Particle Fluids

Rama Hoetzlein ( Graphics Devtech, NVIDIA )
Rama Hoetzlein is a graphics research scientist working in the areas of physical simulation, procedural animation, and scientific visualization, focusing on methods that utilize GPU-based computation. In January 2013, he started at NVIDIA as a Graphics Devtech.

Nearest neighbor search is the key to efficient simulation of many discrete physical models. This talk focuses on a novel, efficient fixed-radius NNS that uses counting sort accelerated with atomic GPU operations and requires only two kernel calls. As a sample application, fluid simulations based on smoothed particle hydrodynamics (SPH) make use of NNS to determine interacting fluid particles. The counting-sort NNS method achieves a performance gain of 3-5x over a previous radix-sort NNS, which allows for interactive SPH fluids of 4 million particles at 4 fps on current hardware. The technique presented is generic and easily adapted to other domains, such as molecular interactions or point cloud reconstructions.
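
The abstract summarizes the method; the counting phase of such a scheme, shown here as a generic sketch rather than the speaker's code, bins particles into uniform grid cells with atomicAdd, after which a prefix sum and a reorder pass complete the sort:

```cpp
// Phase 1 of a counting-sort neighbor search: count particles per cell and
// remember each particle's insertion slot. A prefix sum over cell_counts and
// a second kernel that writes particles to their slots complete the reorder,
// after which neighbors are found by scanning the 27 surrounding cells.
// Generic sketch; the cell size equals the fixed search radius.
#include <cuda_runtime.h>

__global__ void count_particles(const float3 *pos, int num_particles,
                                float cell_size, int3 grid_dim,
                                int *cell_counts,   // one counter per cell
                                int *particle_slot, // slot of particle within its cell
                                int *particle_cell) // cell id of each particle
{
    int p = blockIdx.x * blockDim.x + threadIdx.x;
    if (p >= num_particles) return;

    int cx = min(max(int(pos[p].x / cell_size), 0), grid_dim.x - 1);
    int cy = min(max(int(pos[p].y / cell_size), 0), grid_dim.y - 1);
    int cz = min(max(int(pos[p].z / cell_size), 0), grid_dim.z - 1);
    int cell = (cz * grid_dim.y + cy) * grid_dim.x + cx;

    particle_cell[p] = cell;
    particle_slot[p] = atomicAdd(&cell_counts[cell], 1);  // counting-sort "count"
}
```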

Session Level: Advanced
Session Type: Talk
Tags: Computational Fluid Dynamics; Molecular Dynamics; Numerical Algorithms & Libraries; Performance Optimization

Day: Thursday, 03/27
Time: 16:00 - 16:25
Location: Room LL20B

S4418 - Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Christopher Stone ( Owner, Computational Science and Engineering, LLC )
Dr. Christopher Stone received his PhD from Georgia Tech in 2003 and has been the owner of Computational Science and Engineering, LLC since 2006. His professional research and development interests include combustion modeling, computational fluid dynamics (CFD), iterative methods for sparse linear systems, and numerical integration methods. He has been developing parallel GPU algorithms in CUDA since 2008.

Explore the latest techniques for accelerating combustion simulations with finite-rate chemical kinetics using GPUs. In this session we will compare the performance of different numerical methods for solving stiff and non-stiff ODEs and discuss the compromises that must be made between parallel throughput and numerical efficiency. Learn techniques used to (1) manage variable integration costs across the concurrent ODEs and (2) reduce thread divergence caused by non-linear iterative solvers.
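
As a hedged, generic illustration of the one-thread-per-ODE-system layout discussed here (using a simple non-stiff RK4 step rather than the stiff solvers covered in the session):

```cpp
// One thread integrates one independent ODE system (e.g., the chemistry of
// one CFD cell) over a step dt with classical RK4. In real kinetics, variable
// per-system cost and thread divergence arise because each system may need a
// different number of substeps or Newton iterations.
#include <cuda_runtime.h>

#define NSP 4   // species per system (illustrative)

__device__ void rhs(double t, const double *y, double *dy) {
    // stand-in right-hand side: simple linear decay y' = -y
    for (int i = 0; i < NSP; ++i) dy[i] = -y[i];
    (void)t;
}

__global__ void rk4_step(double *Y, int num_systems, double t, double dt) {
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s >= num_systems) return;

    double y[NSP], k1[NSP], k2[NSP], k3[NSP], k4[NSP], tmp[NSP];
    for (int i = 0; i < NSP; ++i) y[i] = Y[s * NSP + i];

    rhs(t, y, k1);
    for (int i = 0; i < NSP; ++i) tmp[i] = y[i] + 0.5 * dt * k1[i];
    rhs(t + 0.5 * dt, tmp, k2);
    for (int i = 0; i < NSP; ++i) tmp[i] = y[i] + 0.5 * dt * k2[i];
    rhs(t + 0.5 * dt, tmp, k3);
    for (int i = 0; i < NSP; ++i) tmp[i] = y[i] + dt * k3[i];
    rhs(t + dt, tmp, k4);

    for (int i = 0; i < NSP; ++i)
        Y[s * NSP + i] = y[i] + dt / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]);
}
```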

Session Level: Advanced
Session Type: Talk
Tags: Computational Fluid Dynamics; Numerical Algorithms & Libraries; Supercomputing; Computational Physics

Day: Thursday, 03/27
Time: 16:30 - 16:55
Location: Room LL20B


SPECIAL EVENT


S4952 - Hangout: Top 5 Poster Presenters

Marcin Thrust ( Postdoctoral Fellow, University of Montreal )
David Han ( Ph.D. Candidate, University of Toronto )
Thouis Jones ( Senior Scientist, Harvard University, School of Engineering and Applied Sciences )

Session Level: All
Session Type: Special Event
Tags: Computational Fluid Dynamics; Performance Optimization; Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 13:00 - 14:00
Location: Concourse Pod A
