GPU Technology Conference

March 24-27, 2014 | San Jose, California
Slidecasts of GTC sessions are available now for conference registrants – please “Sign In” to view.
PDFs of presentation slides will be available by mid-April. Registrants must log in to view slidecasts and PDFs.
For non-registrants, this GTC content will be available at the end of April on GTC On Demand.

GPU Technology Conference Schedule Planner


KEYNOTE


S4736 - Opening Keynote

Jen-Hsun Huang ( Co-Founder, President and CEO, NVIDIA )
Jen-Hsun Huang co-founded NVIDIA in 1993 and has served since its inception as president, chief executive officer and a member of the board of directors. Under his leadership, NVIDIA invented the graphics processing unit (GPU) in 1999. Since then, it has consistently set new standards in visual computing with breathtaking, interactive graphics available on devices ranging from smartphones and tablets to notebooks and workstations. NVIDIA's expertise in programmable GPUs has led to breakthroughs in parallel processing that make supercomputing inexpensive and widely accessible. The company holds more than 5,000 U.S. patents granted or pending, including ones covering designs and insights fundamental to modern computing. Prior to founding NVIDIA, Huang worked at LSI Logic and Advanced Micro Devices. He holds a BSEE degree from Oregon State University (OSU) and an MSEE degree from Stanford University. He also was awarded an honorary doctorate from OSU.

Don't miss the opening keynote featuring Jen-Hsun Huang, Co-Founder, President, and CEO of NVIDIA. Hear about what's next in visual computing, and preview disruptive technologies and exciting demonstrations across industries.

Session Level: All
Session Type: Keynote
Tags: Recommended for All Press

Day: Tuesday, 03/25
Time: 09:00 - 10:50
Location: Hall 3

S4884 - Keynote: Using NVIDIA GPUs for Feature Film Production at Pixar

Dirk Van Gelder ( Engineering Lead, Pixar )
Dirk Van Gelder joined Pixar Animation Studios in 1997 as a software engineer for the Academy Award®-nominated film A Bug's Life and the Academy Award®-winning short film Geri's Game, working on animation software and the studio's first use of subdivision surfaces. Dirk has worked on software for every Pixar movie since, including the ground-up rewrite of the studio's proprietary animation system, Presto. Currently Dirk leads the Character and GPU teams in the Pixar Studio Tools Department.
Danny Nahmias ( Technical Director, Pixar )
Danny is a specialist in 3D computer graphics, virtual reality, computer vision, and medical imaging. He has applied these skills in the medical imaging, automotive, arts, architecture, and, currently, entertainment industries. Danny is always looking for challenging problems and uses his expertise to solve them in creative ways.

This presentation will show how Pixar uses GPU technology to empower artists in the animation and lighting departments. By providing our artists with high-quality, interactive visual feedback, we enable them to spend more time making creative decisions. Animators interactively pose characters in order to create a performance. When features like displacement, fur, and shadows become critical for communicating the story, it is vital to be able to represent these visual elements in motion at interactive frame rates. We will show Presto, Pixar's proprietary animation system, which uses GPU acceleration to deliver real-time feedback during the character animation process, using examples from Pixar's recent films. Lighting artists place and adjust virtual lights to create the mood and tone of the scene as well as guide the audience's attention. A physically-based illumination model allows these artists to create visually-rich imagery using simpler and more direct controls. We will demonstrate our interactive lighting preview tool, based on this model, built on NVIDIA's OptiX framework, and fully integrated into our new Katana-based production workflow.

Session Level: All
Session Type: Keynote
Tags: Media & Entertainment Summit; Recommended for All Press

Day: Wednesday, 03/26
Time: 11:00 - 12:00
Location: Hall 3

S4780 - Keynote: Video Games and the Future of Cognitive Enhancement

Adam Gazzaley ( Associate Professor, UCSF )
Dr. Adam Gazzaley obtained an M.D. and a Ph.D. in Neuroscience at the Mount Sinai School of Medicine in New York, completed clinical residency in Neurology at the University of Pennsylvania, and postdoctoral training in cognitive neuroscience at UC Berkeley. He is the founding director of the Neuroscience Imaging Center at the UC San Francisco, an Associate Professor in Neurology, Physiology and Psychiatry, and Principal Investigator of a cognitive neuroscience laboratory. His laboratory studies neural mechanisms of perception, attention and memory, with an emphasis on the impact of distraction and multitasking on these abilities. His unique research approach utilizes a powerful combination of human neurophysiological tools, including functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and transcranial stimulation (TES). A major accomplishment of his research has been to expand our understanding of alterations in the aging brain that lead to cognitive decline. His most recent studies explore how we may enhance our cognitive abilities via engagement with custom designed video games, neurofeedback and TES. Dr. Gazzaley has authored over 80 scientific articles, delivered over 300 invited presentations around the world, and his research and perspectives have been consistently profiled in high-impact media, such as The New York Times, New Yorker, Wall Street Journal, TIME, Discover, Wired, PBS, NPR, CNN and NBC Nightly News. Recently, he wrote and hosted the nationally televised, PBS-sponsored special "The Distracted Mind with Dr. Adam Gazzaley". Awards and honors for his research include the Pfizer/AFAR Innovations in Aging Award, the Ellison Foundation New Scholar Award in Aging, and the Harold Brenner Pepinsky Early Career Award in Neurobehavioral Science.

A fundamental challenge of modern society is the development of effective approaches to enhance brain function and cognition in both healthy and impaired individuals. For the healthy, this serves as a core mission of our educational system, and for the cognitively impaired it is a critical goal of our medical system. Unfortunately, there are serious and growing concerns about the ability of either system to meet this challenge. I will describe an approach developed in our lab that uses custom-designed video games to achieve meaningful and sustainable cognitive enhancement (e.g., Anguera, et al. Nature 2013), as well as the next stage of our research program, which uses video games integrated with technological innovations in software (e.g., brain computer interface algorithms, GPU computing) and hardware (e.g., virtual reality headsets, mobile EEG, transcranial electrical brain stimulation) to create a novel personalized closed-loop system. I will share with you a vision of the future in which high-tech is used as an engine to enhance our brain's information processing systems, thus reducing our reliance on non-specific drugs to treat neurological and psychiatric conditions and allowing us to better target our educational efforts.

This keynote will be preceded by the announcement of the CUDA Center of Excellence Achievement Award winner, the Best Poster winner, and the new CUDA Fellows, followed by the launch announcement of the Global Impact Award. (Award ceremony duration: approximately 15 minutes.)

Session Level: All
Session Type: Keynote
Tags: Medical Imaging & Visualization; Video & Image Processing; Recommended for All Press

Day: Thursday, 03/27
Time: 10:30 - 12:00
Location: Hall 3


HANDS-ON LAB


S4802A - Hands-on Lab: Developing GPU-Accelerated Applications with MATLAB

Dan Doherty ( Partner Manager, MathWorks )
Prior to working as Partner Manager, Dan was a Product Manager at MathWorks for over 5 years, focusing on MATLAB and core math and data analysis products. Dan received a B.S.E. and M.S.E. in Mechanical Engineering from the University of New Hampshire, where his research focused on prediction of cutting forces during CNC machining.

Learn how you can use MATLAB to develop GPU-accelerated applications without having to learn the intricacies of GPU architectures or low-level GPU computing libraries. Following a brief introduction to MATLAB you will work through exercises that show: (1) Using GPU-enabled MATLAB functions to accelerate large matrix operations; (2) Minimizing overhead associated with data transfer to the GPU; (3) Integrating CUDA kernels in MATLAB. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 230B

S4868 - Hands-on Lab: Signal Processing with cuFFT

Jason Cohen ( Software Engineer, Developer Tools, NVIDIA )
Jason Cohen develops performance analysis tools for GPU programming. Currently the primary developer of the CUDA profiler in Nsight Visual Studio, he contributes to all layers of software from drivers to user interfaces, and has developed tool features such as the NVTX annotation library and kernel-replay profiling. Jason holds a B.S. in Computer Science and a B.S. and M.S. in Electrical and Computer Engineering from Carnegie Mellon University.

This lab will provide a guided example of developing applications using GPU-accelerated FFTs in C/C++. The process begins with prototyping an algorithm in MATLAB. Next, the algorithm is ported directly to C/C++ using CUFFTW first for convenience, and then cuFFT for production-quality performance. Finally, optimization techniques for maximizing GPU usage will be explored. Emphasis will be placed on using CUDA profiling tools to monitor GPU usage, take accurate measurements, and empirically verify all claims about performance at each step. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
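
For orientation, here is a minimal sketch of the cuFFT pattern the lab builds on (a sketch, not the lab's actual material; error checking omitted for brevity): create a plan, execute it, destroy it.

```cpp
// Minimal cuFFT sketch: 1D complex-to-complex forward FFT, in place.
#include <cufft.h>
#include <cuda_runtime.h>

#define SIGNAL_SIZE 1024

int main(void) {
    cufftComplex *d_signal;
    cudaMalloc(&d_signal, sizeof(cufftComplex) * SIGNAL_SIZE);
    // ... fill d_signal with input samples (e.g., cudaMemcpy from host) ...

    cufftHandle plan;
    cufftPlan1d(&plan, SIGNAL_SIZE, CUFFT_C2C, 1);           // 1 batch
    cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);   // in-place

    cufftDestroy(plan);
    cudaFree(d_signal);
    return 0;
}
```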

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Numerical Algorithms & Libraries

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 230A

S4788 - Hands-on Lab: Rapid Multi-GPU Programming with CUDA Libraries

Nikolay Markovskiy ( Compute DevTech Engineer, NVIDIA )
Nikolay is an HPC engineer with experience in scientific research and software development focusing on computational techniques related to physics, chemistry, and biology.

Learn how to use CUDA libraries for quick, high-level programming on multiple GPUs. We will accelerate Octave, using NVBLAS to provide drop-in acceleration on the GPU. We will walk through configuration of the library to run on multiple GPUs. We will then move on to use the extended (XT) library interfaces in cuBLAS and cuFFT, specifically using large-matrix support in cuBLAS-XT and single and batched transforms across multiple GPUs using cuFFT-XT. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
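
As a hedged illustration of the XT-style interfaces mentioned above (assuming a CUDA 6-era toolkit and two visible GPUs; this is a sketch, not the lab's exercise), the cublasXt API accepts ordinary host buffers and tiles the work across the selected devices:

```cpp
// cuBLAS-XT sketch: multi-GPU SGEMM on plain host buffers.
// Error checks omitted for brevity.
#include <cublasXt.h>
#include <stdlib.h>

int main(void) {
    const size_t n = 8192;
    float *A = (float*)malloc(n * n * sizeof(float));
    float *B = (float*)malloc(n * n * sizeof(float));
    float *C = (float*)malloc(n * n * sizeof(float));
    // ... initialize A and B ...

    cublasXtHandle_t handle;
    cublasXtCreate(&handle);
    int devices[2] = {0, 1};                  // use GPUs 0 and 1
    cublasXtDeviceSelect(handle, 2, devices);

    const float alpha = 1.0f, beta = 0.0f;
    // The library transfers tiles of A, B, C to the GPUs internally.
    cublasXtSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                  &alpha, A, n, B, n, &beta, C, n);

    cublasXtDestroy(handle);
    free(A); free(B); free(C);
    return 0;
}
```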

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 13:00 - 14:20
Location: Room 230A

S4798 - Hands-on Lab: Getting Started with Parallel Programming

Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bringup and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

Come and see how easy it is to get started programming for a massively parallel NVIDIA GPU. We'll explore the three main techniques: "drop-in" accelerated libraries, directives, and CUDA-enabled languages. In addition, you'll get resources and guidance on next steps. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
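
To give a flavor of the third technique (a sketch, not the lab's material), a complete first CUDA C program can be this small:

```cpp
// Minimal CUDA kernel sketch: SAXPY, y = a*x + y, one thread per element.
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    // ... copy input data into x and y ...
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);   // 256 threads per block
    cudaDeviceSynchronize();
    cudaFree(x); cudaFree(y);
    return 0;
}
```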

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 13:00 - 14:20
Location: Room 230B

S4799 - Hands-on Lab: Introduction to Python Acceleration

Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bringup and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

Python is one of the fastest-growing languages today, if not the fastest. There is great community support and many tools available, and the ability to quickly iterate on algorithms has made it very popular in the scientific community. In this hands-on tutorial, we'll see how to get the performance of a compiled language by using Continuum Analytics' NumbaPro compiler to accelerate Python code on the GPU. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 14:30 - 15:50
Location: Room 230B

S4933 - Hands-on Lab: CUDA Application Development Life Cycle with NVIDIA® Nsight™ Eclipse Edition

Satish Salian ( Sr. Mgr. CUDA Tools and Developer Experience, NVIDIA )
Satish Salian is a Senior Software Engineering Manager responsible for CUDA developer tools, GPU system tools and CUDA developer experience at NVIDIA. He leads the overall strategy, direction and development of the CUDA tools ecosystem and engineering support for CUDA developers. Satish has been part of the NVIDIA team since 2001 and has also been involved in the development of NVIDIA's Graphics and display tools and related NVAPI SDK. Satish received his Bachelor's degree in Computer Engineering from University of Pune, India.

CUDA application development made easy with NVIDIA's integrated development environment on Linux and Mac. Here's your opportunity to go through a step-by-step, hands-on exercise on editing, compiling, debugging and profiling a CUDA application using Nsight™ Eclipse Edition. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers; Debugging Tools & Techniques

Day: Tuesday, 03/25
Time: 14:30 - 15:50
Location: Room 230A

S4790 - Hands-on Lab: Numerical Integration in CUDA

Carl Ponder ( DevTech Engineer, NVIDIA )
Highly-Rated Speaker
Carl is a DevTech Engineer at NVIDIA where he focuses on CUDA application tuning and performance. Carl received his Ph.D. in Computer Science from the University of California, Berkeley.

Evaluating integrals is an important part of modelling physical systems. For sufficiently complex systems, integrals as closed-form expressions are difficult to derive or do not exist, so numerical approximation is the method of choice. In this session we will survey methods of numerical integration -- tiling, Monte Carlo and transforms -- and discuss their efficiencies and the characteristics of their approximation error. We will work through some simple hands-on exercises of integrating the Gaussian function, estimating Pi, and measuring the volume of a multidimensional polytope. You will gain some practice writing simple CUDA code and using the cuRAND library to generate high-quality random numbers in parallel, which are also applicable to other areas such as randomized simulation. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
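
A minimal sketch of the Monte Carlo Pi estimate described above, using cuRAND's host API to fill device buffers with uniform samples (error checks omitted; the lab's actual code may differ):

```cpp
// Monte Carlo Pi sketch: sample points in the unit square and count
// how many land inside the quarter circle.
#include <curand.h>
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void countInside(const float *x, const float *y, int n, int *hits) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && x[i] * x[i] + y[i] * y[i] <= 1.0f)
        atomicAdd(hits, 1);
}

int main(void) {
    const int n = 1 << 24;
    float *x, *y; int *hits;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMalloc(&hits, sizeof(int));
    cudaMemset(hits, 0, sizeof(int));

    curandGenerator_t gen;
    curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
    curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);
    curandGenerateUniform(gen, x, n);   // uniform samples in (0,1]
    curandGenerateUniform(gen, y, n);

    countInside<<<(n + 255) / 256, 256>>>(x, y, n, hits);
    int h;
    cudaMemcpy(&h, hits, sizeof(int), cudaMemcpyDeviceToHost);
    printf("pi ~= %f\n", 4.0 * h / n);

    curandDestroyGenerator(gen);
    cudaFree(x); cudaFree(y); cudaFree(hits);
    return 0;
}
```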

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Numerical Algorithms & Libraries; Finance

Day: Tuesday, 03/25
Time: 16:00 - 17:20
Location: Room 230A

S4800 - Hands-on Lab: CUDA Fortran: Getting Started

Mathew Colgrove ( Dev Tech Software Engineer, NVIDIA )
Mathew Colgrove is a Dev Tech Software Engineer with NVIDIA's Portland Group team. Mat's primary role is to help users port code to accelerators using OpenACC and CUDA Fortran, as well as assisting with general programming questions. Prior to his current position, he was the Quality Assurance manager responsible for both building and maintaining PGI's proprietary automated testing environments. Mat is also NVIDIA's SPEC (www.spec.org) representative on the CPU and HPG committees.

This tutorial will cover various aspects of writing code in CUDA Fortran, which is the Fortran interface to the CUDA architecture. Topics covered will include a basic introduction to parallel programming concepts using CUDA, performance measurements and metrics, and some basic optimization techniques. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 16:00 - 17:20
Location: Room 230B

S4793 - Hands-on Lab: Image Processing Using NPP

Yang Song ( Senior Software Engineer, NVIDIA )
Yang Song is the technical lead for NVIDIA's NPP library. As technical lead, he is responsible for NPP's overall design and schedule, and he is currently focused on high performance implementations of image codecs. He joined the NPP team originally as an intern in 2010, and returned full-time in 2011. Yang received his Ph.D in Electrical Engineering from University of Arizona in 2011, with a dissertation focused on hardware implementation of an H.264 codec. As a graduate student, he received a Chinese Government Award for Outstanding Student Abroad, and published a number of journal articles leading to technology disclosures through the University of Arizona. He received his MS and BS degrees from Nanjing University of Science and Technology, China.

Learn how to use the NVIDIA Performance Primitives (NPP) library to solve image and signal processing problems. The workshop covers a simple but complete example of automatic contrast adjustment of an image. Topics covered include the specification of input and output data formats, data alignment and memory management for high performance, and the flexibility of regions-of-interest for processing. Users will experience how simple it is to instantiate NPP primitives and how effectively they leverage GPU power for image processing. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
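
To give a feel for the NPP usage pattern (a hedged sketch; the specific contrast-adjustment primitives used in the lab may differ), NPP manages pitched image allocations and operates over an explicit region of interest:

```cpp
// NPP sketch: allocate a pitched 8-bit grayscale image and apply a
// pointwise operation over a region of interest.
#include <npp.h>

int main(void) {
    int step;                                   // row pitch in bytes, set by NPP
    NppiSize roi = {640, 480};
    Npp8u *img = nppiMalloc_8u_C1(roi.width, roi.height, &step);

    // ... copy image data into img (e.g., cudaMemcpy2D) ...

    // Subtract a constant from every pixel in the ROI, in place, with
    // saturation -- e.g., as one step of a contrast adjustment.
    nppiSubC_8u_C1IRSfs(32, img, step, roi, 0);

    nppiFree(img);
    return 0;
}
```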

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Video & Image Processing

Day: Wednesday, 03/26
Time: 09:00 - 10:20
Location: Room 230A

S4801 - Hands-on Lab: Using Unified Memory in CUDA 6

Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bringup and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

Prior to the release of CUDA 6, programmers accelerating C or C++ code on an NVIDIA GPU had to manually deal with memory allocation and synchronization between the CPU and GPU memory spaces. This requirement meant it took longer to get code accelerated on the GPU, and in some cases of complex data structures, made it nearly impossible to do manually. With Unified Memory, the task of memory management can be left to the underlying driver and software, leaving the programmer to concentrate on writing kernels and optimizing code. In this hands-on lab, we'll explore different use cases of Unified Memory and its benefits. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
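
A minimal sketch of the Unified Memory pattern described above (requires CUDA 6 and a supported GPU; not the lab's actual exercise): one cudaMallocManaged pointer is valid on both host and device, with no explicit copies.

```cpp
// Unified Memory sketch: the same pointer is used by CPU code and a kernel.
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n, float s) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= s;
}

int main(void) {
    const int n = 1024;
    float *data;
    cudaMallocManaged(&data, n * sizeof(float));   // visible to CPU and GPU

    for (int i = 0; i < n; i++) data[i] = (float)i;   // host writes directly

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaDeviceSynchronize();    // required before the host reads again

    printf("data[10] = %f\n", data[10]);
    cudaFree(data);
    return 0;
}
```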

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 09:00 - 10:20
Location: Room 230B

S4791 - Hands-on Lab: Building a Sparse Linear Solver using CUDA Libraries

Sharan Chetlur ( CUDA Software Engineer, NVIDIA )

In this hands-on session, we will construct a sparse iterative solver using CUDA library routines. We will use the standard cuBLAS and cuSPARSE libraries to construct a simple yet performant solver without writing any custom CUDA kernels. We will walk through an example of how to set up and use various cuBLAS and cuSPARSE APIs to implement the SSOR (Symmetric Successive Over-Relaxation) algorithm. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
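
As a hedged sketch of the library-only approach (CUDA 6-era cuSPARSE/cuBLAS APIs; CSR setup and error checks omitted, and the SSOR details are left to the lab), one step of an iterative loop combines a cuSPARSE matrix-vector product with cuBLAS vector updates:

```cpp
// Sparse solver building blocks: SpMV with cuSPARSE, vector ops with cuBLAS.
#include <cublas_v2.h>
#include <cusparse_v2.h>

// Assumes d_val/d_rowPtr/d_colInd hold an m x m CSR matrix on the device,
// and d_x, d_y are device vectors of length m.
void spmv_step(cusparseHandle_t sp, cublasHandle_t bl,
               int m, int nnz,
               const double *d_val, const int *d_rowPtr, const int *d_colInd,
               const double *d_x, double *d_y, double *residual) {
    cusparseMatDescr_t descr;
    cusparseCreateMatDescr(&descr);   // defaults: general matrix, zero-based

    const double one = 1.0, zero = 0.0, minusOne = -1.0;
    // y = A * x
    cusparseDcsrmv(sp, CUSPARSE_OPERATION_NON_TRANSPOSE, m, m, nnz,
                   &one, descr, d_val, d_rowPtr, d_colInd, d_x, &zero, d_y);
    // y = y - x (forming a residual-like quantity)
    cublasDaxpy(bl, m, &minusOne, d_x, 1, d_y, 1);
    // ||y||_2, usable as a convergence check
    cublasDnrm2(bl, m, d_y, 1, residual);

    cusparseDestroyMatDescr(descr);
}
```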

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Computational Fluid Dynamics; Computational Physics; Numerical Algorithms & Libraries; Manufacturing

Day: Wednesday, 03/26
Time: 14:00 - 15:20
Location: Room 230A

S4802B - Hands-on Lab: Developing GPU-Accelerated Applications with MATLAB

Dan Doherty ( Partner Manager, MathWorks )
Prior to working as Partner Manager, Dan was a Product Manager at MathWorks for over 5 years, focusing on MATLAB and core math and data analysis products. Dan received a B.S.E. and M.S.E. in Mechanical Engineering from the University of New Hampshire, where his research focused on prediction of cutting forces during CNC machining.

Learn how you can use MATLAB to develop GPU-accelerated applications without having to learn the intricacies of GPU architectures or low-level GPU computing libraries. Following a brief introduction to MATLAB you will work through exercises that show: (1) Using GPU-enabled MATLAB functions to accelerate large matrix operations; (2) Minimizing overhead associated with data transfer to the GPU; (3) Integrating CUDA kernels in MATLAB. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 14:00 - 15:20
Location: Room 230B

S4792 - Hands-on Lab: Leveraging Accelerated Core Algorithms Using NVIDIA AmgX

Marat Arsaev ( Systems Software Engineer, NVIDIA )
Marat's expertise includes image & video processing and software optimization and acceleration using GPUs. Prior to joining NVIDIA, Marat was a Software Developer at MSU Graphics & Media Lab. Marat received his degree in Computer Science from Moscow State University.

AmgX is a new, flexible and easy-to-use NVIDIA GPU-accelerated high-performance sparse linear solver library. It features a variety of popular solvers as well as user-defined solver configurations such as nested solvers and preconditioners. Come and learn how easy it is to use the library in your application, configure the solver, and get maximum performance out of it. You will also learn how to solve your linear system on multiple GPUs using the library. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
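
A hedged sketch of the AmgX C API flow (call names from the public AmgX interface; matrix upload and error handling abbreviated, and the configuration string is illustrative, not the lab's actual setup):

```cpp
// AmgX sketch: configure, build, and run a GPU sparse solver.
#include <amgx_c.h>

int main(void) {
    AMGX_initialize();

    // Solver behavior is driven by a configuration string or JSON file.
    AMGX_config_handle cfg;
    AMGX_config_create(&cfg, "config_version=2, solver=AMG, max_iters=100");

    AMGX_resources_handle rsrc;
    AMGX_resources_create_simple(&rsrc, cfg);

    AMGX_matrix_handle A;
    AMGX_vector_handle x, b;
    AMGX_matrix_create(&A, rsrc, AMGX_mode_dDDI);   // device, double, int
    AMGX_vector_create(&x, rsrc, AMGX_mode_dDDI);
    AMGX_vector_create(&b, rsrc, AMGX_mode_dDDI);
    // ... AMGX_matrix_upload_all(A, ...) and AMGX_vector_upload(b, ...) ...

    AMGX_solver_handle solver;
    AMGX_solver_create(&solver, rsrc, AMGX_mode_dDDI, cfg);
    AMGX_solver_setup(solver, A);
    AMGX_solver_solve(solver, b, x);   // x now holds the solution

    AMGX_solver_destroy(solver);
    AMGX_vector_destroy(x); AMGX_vector_destroy(b);
    AMGX_matrix_destroy(A);
    AMGX_resources_destroy(rsrc);
    AMGX_config_destroy(cfg);
    AMGX_finalize();
    return 0;
}
```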

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Computational Structural Mechanics; Computational Fluid Dynamics; Computational Physics; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 15:30 - 16:50
Location: Room 230A

S4794 - Hands-on Lab: Optimizing CUDA Application Performance with Visual Profiler

Sandarbh Jain ( Software Engineer, NVIDIA )
Sandarbh Jain is an Engineer in the CUDA Developer Tools group at NVIDIA. He is primarily responsible for CUDA performance analysis tools. Sandarbh received his Bachelor's degree in Computer Engineering from Jamia Millia Islamia, India.

This hands-on session takes you through the various steps involved in optimizing your CUDA application. NVIDIA's CUDA Visual Profiler, a cross-platform performance profiling tool that gives developers vital feedback for optimizing CUDA C/C++ applications, will be used on sample application code to identify the various performance limiters and assist in fine-tuning the code. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Performance Optimization

Day: Wednesday, 03/26
Time: 17:00 - 18:20
Location: Room 230A

S4803 - Hands-on Lab: Getting Started with OpenACC

Michael Wolfe ( Compiler Engineer, NVIDIA )
Highly-Rated Speaker
Michael Wolfe has been a compiler engineer at The Portland Group since joining in 1996, where his responsibilities and interests have included deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and was a co-founder and lead compiler engineer at Kuck and Associates, Inc., prior to that. He earned a PhD in Computer Science from the University of Illinois, and has published one textbook, "High Performance Compilers for Parallel Computing", a monograph, "Optimizing Supercompilers for Supercomputers", and many technical papers.

Learn how to use OpenACC directives to quickly start accelerating your applications. You will learn how to identify your GPU, what language features you can use, the most common directives to insert, how to build your program, and how to run your program. Small sample programs and self-guided exercises will be provided. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
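
To show the flavor of what the lab covers (a sketch, not the lab's exercises), a single OpenACC directive is enough to offload a loop; the compiler generates the GPU kernel and the data movement:

```cpp
// OpenACC sketch: offload a vector add with one directive.
// Build with an OpenACC compiler, e.g. pgcc/pgc++ with -acc.
void vec_add(int n, const float *a, const float *b, float *c) {
    // copyin/copyout clauses describe the data movement explicitly.
    #pragma acc parallel loop copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```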

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 17:00 - 18:20
Location: Room 230B

S4795 - Hands-on Lab: Doing Great Things with OpenCV

Kirill Kornyakov ( Senior Software Engineer, Itseez )
Kirill Kornyakov has been a member of the core OpenCV development team for the last four years. He works at Itseez (Nizhny Novgorod, Russia), where he leads the development of the OpenCV library for the Android operating system, with a focus on performance optimization for the NVIDIA Tegra platform. He also works on the implementation of real-time computer vision algorithms, mainly Computational Photography and Advanced Driver Assistance Systems (ADAS). Kirill has B.Sc. and M.Sc. degrees from Nizhny Novgorod State University, Russia.

Computer vision is developing fast and finding new applications in areas such as driver assistance, computational photography, augmented reality and many others. The OpenCV library (http://opencv.org) allows developers to rapidly prototype new algorithms, but real-time performance remains one of the key challenges, which is why acceleration technologies like CUDA are becoming crucial, especially on embedded and mobile devices. In this tutorial we will study how computer vision applications can be optimized using CUDA. We will consider several examples and make them work faster using CUDA-optimized primitives, and we will also share some performance tips for developers. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
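
A hedged sketch of the pattern this tutorial teaches, using the OpenCV 2.4-era gpu module (the tutorial's actual examples may differ): upload once, chain CUDA-accelerated primitives on the device, download once.

```cpp
// OpenCV GPU-module sketch: grayscale load, GPU blur + edge detection.
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

int main(void) {
    cv::Mat src = cv::imread("input.jpg", cv::IMREAD_GRAYSCALE);

    cv::gpu::GpuMat d_src, d_blur, d_edges;
    d_src.upload(src);                            // host -> device once

    cv::gpu::GaussianBlur(d_src, d_blur, cv::Size(5, 5), 1.5);
    cv::gpu::Canny(d_blur, d_edges, 50.0, 150.0);

    cv::Mat edges;
    d_edges.download(edges);                      // device -> host once
    cv::imwrite("edges.jpg", edges);
    return 0;
}
```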

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Machine Learning & AI

Day: Thursday, 03/27
Time: 09:00 - 10:20
Location: Room 230A

S4870 - Hands-on Lab: Getting More Parallelism Out of Multiple GPUs

Justin Luitjens ( Developer Technologies Engineer, NVIDIA )
Highly-Rated Speaker
Justin has been with NVIDIA for 3 years where he has focused on accelerating customer applications.

Multi-GPU systems provide higher performance per dollar than single-GPU systems, which has led to a large increase in their adoption. This workshop will teach you the basics of multi-GPU applications. We will start with an application that utilizes a single GPU, and together we will extend it to work efficiently on multiple GPUs. Topics covered will include dispatching work, communicating between GPUs, and avoiding race conditions efficiently, along with other best practices. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
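
A minimal sketch of the dispatch pattern such an application uses (not the workshop's actual code): select each device in turn, queue asynchronous work, then synchronize all devices.

```cpp
// Multi-GPU dispatch sketch: one independent chunk of work per device.
#include <cuda_runtime.h>

__global__ void work(float *chunk, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] *= 2.0f;
}

int main(void) {
    int ndev = 0;
    cudaGetDeviceCount(&ndev);

    const int nPerDev = 1 << 20;
    float *buf[16];                       // assumes at most 16 GPUs
    for (int d = 0; d < ndev; d++) {
        cudaSetDevice(d);                 // subsequent calls target GPU d
        cudaMalloc(&buf[d], nPerDev * sizeof(float));
        // Kernel launches are asynchronous, so this loop queues work on
        // every GPU before any of it completes.
        work<<<(nPerDev + 255) / 256, 256>>>(buf[d], nPerDev);
    }
    for (int d = 0; d < ndev; d++) {      // then wait for all devices
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaFree(buf[d]);
    }
    return 0;
}
```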

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Thursday, 03/27
Time: 09:00 - 10:20
Location: Room 230B

S4796 - Hands-on Lab: Parallel Programming: OpenACC Profiling

Michael Wolfe ( Compiler Engineer, NVIDIA )
Highly-Rated Speaker
Michael Wolfe has been a compiler engineer at The Portland Group since joining in 1996, where his responsibilities and interests have included deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and was a co-founder and lead compiler engineer at Kuck and Associates, Inc., prior to that. He earned a PhD in Computer Science from the University of Illinois, and has published one textbook, "High Performance Compilers for Parallel Computing", a monograph, "Optimizing Supercompilers for Supercomputers", and many technical papers.

Profile your OpenACC applications using the PGI pgprof profiler, the NVIDIA Visual Profiler, and the NVIDIA compute profiler. Learn how to correlate events in the profile to constructs in your program, to allow you to optimize for better performance. Small sample programs and self-guided exercises will be provided. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Thursday, 03/27
Time: 14:00 - 15:20
Location: Room 230A

S4871A - Hands-on Lab: Using Logan: Mobile Super Computing

Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bringup and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

The Tegra K1 processor brings the main power unit of top supercomputers to the mobile space: a Kepler-based GPU. The ability to program this GPU using the CUDA platform is going to revolutionize the space of mobile processing applications, from face recognition to machine learning in autonomous robots. In this hands-on lab, we'll learn how to access the developer board with the Tegra K1, as well as use the OpenCV library and CUDA-enabled C/C++ to accelerate a computer vision task. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers; Mobile Applications

Day: Thursday, 03/27
Time: 14:00 - 15:20
Location: Room 230B

S4797 - Hands-on Lab: Accelerate Your C++ Code with Thrust Library

Maxim Milakov ( Senior DevTech Engineer, NVIDIA )
Maxim spends his time driving GPU adoption with key application developers, providing support for NVIDIA solutions and technologies (CUDA, OpenACC), ensuring the best possible performance of GPU computing applications on current and next-generation architectures, collaborating with the architecture and software teams at NVIDIA to influence the design of next-generation architectures, and educating a wide range of developers on parallel computing with NVIDIA accelerators. Maxim received his Bachelor's degree from Lomonosov Moscow State University.

Building parallel programs is easy with Thrust's power tools like parallel map, sort, and reduce. This session is a beginner-level tutorial; you will use Thrust's containers and algorithms to create a set of points on a 2D plane and classify them into quadrants. Familiarity with STL containers and algorithms is helpful but not required. By the end of the session you will be able to use basic Thrust algorithms to accelerate your C++ code on GPU, and you will have a solid foundation from which you can learn more advanced techniques. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.
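
A sketch of the kind of exercise described above (the lab's actual classification scheme may differ): Thrust's count_if with counting iterators generates and classifies points entirely on the GPU, with no hand-written kernel.

```cpp
// Thrust sketch: generate random 2D points and count those in quadrant I.
#include <thrust/count.h>
#include <thrust/random.h>
#include <thrust/iterator/counting_iterator.h>
#include <iostream>

struct in_first_quadrant {
    __host__ __device__ bool operator()(unsigned int seed) const {
        thrust::default_random_engine rng(seed);
        thrust::uniform_real_distribution<float> u(-1.0f, 1.0f);
        float x = u(rng), y = u(rng);   // one random point per "index"
        return x > 0.0f && y > 0.0f;
    }
};

int main(void) {
    const int n = 1 << 20;
    // count_if evaluates the functor for each of the n indices on the GPU.
    int hits = thrust::count_if(thrust::counting_iterator<unsigned int>(0),
                                thrust::counting_iterator<unsigned int>(n),
                                in_first_quadrant());
    std::cout << hits << " of " << n << " points in quadrant I\n";
    return 0;
}
```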

Session Level: Beginner
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers

Day: Thursday, 03/27
Time: 15:30 - 16:50
Location: Room 230A

S4871B - Hands-on Lab: Using Logan: Mobile Super Computing

Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bringup and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

The Tegra K1 processor brings the main power unit of top supercomputers to the mobile space: a Kepler-based GPU. The ability to program this GPU using the CUDA platform is going to revolutionize the space of mobile processing applications, from face recognition to machine learning in autonomous robots. In this hands-on lab, we'll learn how to access the developer board with the Tegra K1, as well as use the OpenCV library and CUDA-enabled C/C++ to accelerate a computer vision task. Be prepared for this hands-on lab by installing the suggested software at bit.ly/gtc14labs on your system.

Session Level: Intermediate
Session Type: Hands-on Lab
Tags: Programming Languages & Compilers; Mobile Applications

Day: Thursday, 03/27
Time: 15:30 - 16:50
Location: Room 230B


TUTORIAL


S4165 - CUDA Optimization with NVIDIA® Nsight™ Eclipse Edition: A Case Study

Julien Demouth ( Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Julien is a Developer Technology Engineer at NVIDIA where he works on the optimization of CUDA applications. Julien has a PhD in Computer Science from INRIA in France.
Cliff Woolley ( Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Cliff Woolley is a senior developer technology engineer with NVIDIA. He received his master's degree in Computer Science from the University of Virginia in 2003, where he was among the earliest academic researchers to explore the use of GPUs for general purpose computing. Today he works with developers of high-performance computing applications to fine-tune their algorithms for the CUDA Platform, and he is one of the lead authors of developer documentation in the CUDA Toolkit for application tuning and best practices.

In this session, we will study a real CUDA application and use NVIDIA® Nsight™ Eclipse Edition on Linux to optimize the performance of the code. Attendees will learn a method for analyzing their code and how to use the tools to apply those ideas.

Session Level: Intermediate
Session Type: Tutorial
Tags: Performance Optimization

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 220B

S4578 - CUDA Debugging with Command Line Tools

Vyas Venkataraman ( Senior Engineer - CUDA tools, NVIDIA )
Highly-Rated Speaker
Vyas Venkataraman is a Senior Engineer in the CUDA developer tools group at NVIDIA. He is primarily responsible for CUDA-MEMCHECK, and also works on developer tool support on new GPU architectures. He joined NVIDIA in 2010 from Boston University where he was doing research on abstractions for high level modeling of synthesizable communicating systems. Vyas received his PhD, M.S. and B.S. degrees from the College of Engineering at Boston University.

CUDA debugging tools CUDA-GDB and CUDA-MEMCHECK provide a whole new feature set to help improve your CUDA application development cycle. This session is a detailed walk-through of the key debugger features and advanced techniques on using printf, CUDA-GDB and MEMCHECK together to improve overall code productivity on Linux and MacOS platforms.
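
For reference, the in-kernel printf technique mentioned above looks like this (a sketch; requires compute capability 2.0 or later, and the binary can then be run under cuda-gdb or cuda-memcheck, built with nvcc -g -G for full debug info):

```cpp
// Device-side printf sketch: report only suspicious values from a kernel.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void inspect(const int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] < 0)    // print only the values we care about
        printf("thread %d: data[%d] = %d\n", i, i, data[i]);
}

int main(void) {
    const int n = 256;
    int *d_data;
    cudaMalloc(&d_data, n * sizeof(int));
    cudaMemset(d_data, 0xFF, n * sizeof(int));  // all bits set -> negative ints
    inspect<<<1, n>>>(d_data, n);
    cudaDeviceSynchronize();    // flushes the device-side printf buffer
    cudaFree(d_data);
    return 0;
}
```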

Session Level: Intermediate
Session Type: Tutorial
Tags: Debugging Tools & Techniques

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 210D

S4654 - Detailed Overview of NVENC Encoder API

Swagat Mohapatra ( Senior Software Lead, NVIDIA )
Swagat received his Bachelor's in Electrical Engineering from IIT Kharagpur and joined the NVIDIA Video team in 2006. For the past 3 years he has been working on video encoders. He is responsible for the SW encoder SDK, driver, and encoder microcode development.
Abhijit Patait ( Sr. Manager, System Software, NVIDIA )
Abhijit Patait has been leading NVIDIA's GPU multimedia team for the past 4 years. His team is responsible for supporting the multimedia (audio and video) functionality in the NVIDIA GPU driver for Windows, the NVENC SDK and the GRID SDK. Prior to NVIDIA, Abhijit held several engineering and management positions working in the areas of baseband signal processing, telecom and VoIP systems design, and audio/DSP processing. Abhijit holds an MSEE degree from the University of Missouri-Rolla and an MBA from the Haas School of Business, University of California at Berkeley.

This session gives a detailed overview of the NVENC encoder interface and the video encoding capabilities of current (Kepler) and future (Maxwell) generations of NVIDIA GPUs. We will present how to correctly use the encoder interface to take advantage of the hardware capabilities and software APIs used for encoding. The tutorial will detail the steps to create an HW encoder session and use the encoder asynchronously, and will demonstrate how NVENC can be used in applications such as transcoding, low-latency streaming and virtualization. Additionally, we will give an overview of some of the new features and recent improvements in Maxwell GPUs, particularly related to encoder performance and quality.

Session Level: Beginner
Session Type: Tutorial
Tags: Video & Image Processing; Media & Entertainment

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 211A

S4699 - Part 1: An Introduction to CUDA Programming (Presented by Acceleware)

Chris Mason ( Product Manager , Acceleware Ltd. )
Chris is the Product Manager for Acceleware's GPU accelerated electromagnetic product line. He is responsible for the successful development and launch of Acceleware products used by companies world-wide. Chris has 9 years of experience in developing commercial applications for the GPU and has delivered over 20 CUDA courses to students in a diverse range of industries. His previous experience also includes parallelization of algorithms on digital signal processors (DSPs) for cellular phones and base stations. Chris has a Masters in Electrical Engineering from Stanford University.

Join us for an informative introduction to CUDA programming. The tutorial will begin with a brief overview of CUDA and data-parallelism before focusing on the GPU programming model. We will explore the fundamentals of GPU kernels, host and device responsibilities, CUDA syntax and thread hierarchy. A programming demonstration of a simple CUDA kernel will be provided.

Session Level: Beginner
Session Type: Tutorial
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 220C

S4710 - Session 1: Introduction to Productive GPU Programming (Presented by ArrayFire)

Umar Arshad ( Senior Software Engineer, CUDA Training Specialist, ArrayFire )
Umar Arshad is an engineer at ArrayFire where he primarily works on improving concurrency in ArrayFire and applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon where he was involved with improving performance and designing large-scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the Program Chair of the university's ACM chapter.

Excited to get started with GPU computing? Learn about the best practices and tools to quickly get started with GPUs. We will introduce you to the latest advancements available in the CUDA ecosystem and describe how to efficiently use them. You will walk away with the knowledge of the right tools to get started with increased productivity and cutting edge libraries to accelerate your applications using GPUs. Some of the libraries discussed will include cuBLAS, cuFFT, ArrayFire and Thrust.

Session Level: Beginner
Session Type: Tutorial
Tags: Programming Languages & Compilers; Debugging Tools & Techniques; Performance Optimization; Numerical Algorithms & Libraries

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 210B

S4721 - NVIDIA Rendering Innovations within Autodesk Maya and 3ds Max

Peter de Lappe ( Technical Product Manager, NVIDIA )
Peter's primary focus is on the OptiX and SceniX accelerators, where he manages the design and production of CUDA-based programs using OptiX, SceniX, CompleX and PhysX. Peter is also a veteran character animator, animation director and character technical director, as well as a CG modeling, lighting and rendering TD. He is well versed in 3D animation software, especially the three major DCC packages (Softimage, Maya and 3ds Max), as well as many compositing packages (Flame/Inferno, Shake, Combustion, etc.). Peter has a background in architecture and industrial design, with solid film and print experience as well as a long list of film, television and game credits.
Bart Gawboy ( Trainer and Interface Designer, NVIDIA )
Barton trains artists and TDs all over the world in how to use mental ray. He also works on special projects, whether it is working closely with the studio users, or with Julia in refining the Autodesk Maya integration interface. He takes what he learns from training and end-user interaction to improve ARC products' ease-of-use, while retaining their flexibility. He has over 25 years of experience in computer graphics, in both hardware and software design and management, as well as in-the-trenches production experience, from Dreamworks to his own companies. Barton has a BA and BE in electrical engineering modified with music from Dartmouth College.
Julia Flototto ( Senior Developer Technology Engineer, NVIDIA )
Julia Floetotto is a Software engineer at NVIDIA Advanced Rendering Center in Berlin, Germany. She focuses on the development and integration of mental ray in Autodesk Maya in close collaboration with the Autodesk Maya rendering team. She has ten years of experience in computer graphics and software engineering at NVIDIA and previously mental images. Julia received her Ph.D. in Computer Science from INRIA Sophia Antipolis, France.
David Hackett ( Lead Lighter and Technical Director, The Mill )
David is currently lead lighter and technical director at The Mill. A Maya and mental ray user since 2000, he also blogs with developer Brenton Rayner on elementalray.wordpress.com, helping users achieve better renders using mental ray in Autodesk Maya. David's background includes college instruction as well as work on commercials and major motion pictures.
Jonathan Beals ( Lead Look Development and Lighting Artist, Hinge Digital )
Jonathan Beals has been the lead look development and lighting artist at Hinge Digital for four years. Previous to that, he worked with Adidas as a technical artist developing prototype footwear and apparel for the Olympics, World Cup, and the US Open tournaments. Jonathan is an advanced user of Maya and Mental Ray, as well as an avid user of Nuke. He received his BA at the Art Institute of Portland.

Come learn about the latest rendering capabilities of Autodesk 3ds Max and Maya from the makers of NVIDIA mental ray and iray. Moderated by Phil Miller, Director, Advanced Rendering at NVIDIA, this course will focus on artist workflow and also discuss the science behind the features, along with recent studio work showcasing how it is used in production, making this of value to anyone producing or even appreciating 3D rendering and animation.

Session Level: Intermediate
Session Type: Tutorial
Tags: Rendering & Animation; Ray Tracing; Media & Entertainment

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 210C

S4874 - Languages, Libraries and Development Tools for GPU Computing

Will Ramey ( Senior Product Manager, GPU Computing, NVIDIA )
Highly-Rated Speaker
As NVIDIA's senior product manager for GPU Computing, Will helps define and promote platforms, libraries and developer tools for the CUDA parallel computing platform. Prior to joining NVIDIA in 2003, he managed an independent game studio and developed advanced technology for the entertainment industry as a product manager and software engineer. He holds a BA in Computer Science from Willamette University and completed the Japan Studies Program at Tokyo International University. Outside of work, Will learns something new every day, usually from his two kids. He enjoys hiking, camping, swimming, spending time with his wonderful wife, and playing The Game.

Get a head start on the conference with this introduction to key technologies for GPU Computing. This tutorial will cover the key features of major programming language solutions, libraries and development tools for GPU computing that are available today. You will also learn which sessions to attend to learn more about each of the topics covered.

Session Level: Beginner
Session Type: Tutorial
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 212A

S4877 - Flap Higher Than the Birds: Differentiate Your Android Game and Allegorithmic Substance

Andrew Edelsten ( Manager, Tegra Developer Technologies, NVIDIA )
Andrew has 15 years of experience making computer games and managing web and data centers, with even a short stint as a commercial lawyer. He moved to NVIDIA four years ago and manages a team of Tegra and Android specialists who assist developers in enhancing their games and apps for NVIDIA's Tegra processor.
Sebastien Deguy ( CEO, Allegorithmic )
Dr. Sébastien Deguy is Founder and CEO at Allegorithmic, the company behind the Substance texturing technology and product line. Sébastien has a computer science background with a specialization in mathematics, random processes, simulation, computer vision and image synthesis. He is also an award-winning director and producer of traditional and animated short films.

Android continues its meteoric rise as the world's dominant mobile operating system. Every day, developers large and small discover new ways to delight users, but getting noticed is increasingly difficult. The latest NVIDIA® Tegra® K1 processors provide developers with a host of new features to differentiate their titles and get them flying above the rest of the crowd. During this session, discover the new CPU, GPU, and multimedia features the latest Tegra processors offer and learn how to use them to enhance and extend your applications. As an example of the type of differentiation the Tegra K1 makes possible, Allegorithmic and RUST Ltd will provide a hands-on demo of physically based shading (PBR), dynamic texturing and high-resolution GPU-based particle throwing using the latest Allegorithmic Substance texturing pipeline.

Session Level: Intermediate
Session Type: Tutorial
Tags: Mobile Summit; Mobile Applications; Game Development

Day: Monday, 03/24
Time: 09:00 - 10:20
Location: Room 210E

S4160 - CUDA Optimization with NVIDIA Nsight™ Visual Studio Edition: A Case Study

Julien Demouth ( Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Julien is a Developer Technology Engineer at NVIDIA where he works on the optimization of CUDA applications. Julien has a PhD in Computer Science from INRIA in France.

In this session, we will study a real CUDA application and use Nsight™ Visual Studio Edition on Windows to optimize the performance of the code. Attendees will learn a method for analyzing their code and how to use the tools to apply those ideas.

Session Level: Intermediate
Session Type: Tutorial
Tags: Performance Optimization

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 220B

S4167 - Introduction to Accelerated Computing Using Directives

Jeff Larkin ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jeff Larkin is a software engineer in NVIDIA's Developer Technology group, where he helps developers profile and optimize scientific applications. Prior to joining NVIDIA, Jeff worked as a performance engineer at Cray Inc.

OpenACC and OpenMP 4.0 provide directive-based approaches to rapidly accelerating applications for GPUs and other parallel architectures. This tutorial serves as an introduction to programming with OpenACC 2.0 and OpenMP 4.0. Participants will learn how to apply compiler directives to an existing application to parallelize it for accelerated architectures. No prior GPU experience is required for this tutorial.

Session Level: Beginner
Session Type: Tutorial
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 212A

S4324 - Topics in GPU-Based Video Processing

Thomas True ( Senior Applied Engineer for Professional Video, NVIDIA )
Thomas True is a Senior Applied Engineer for Professional Video in NVIDIA's Professional Solutions Group where for the past 10 years he has focused on the use of GPUs in broadcast, video and film applications ranging from pre-visualization to post production and live to air. Prior to joining NVIDIA, Thomas was an Applications Engineer at SGI. Thomas has an M.S degree from the Graphics Lab at Brown University and a B.S. degree from the Rochester Institute of Technology.

The GPU is a high-performing, floating-point parallel processor with extremely high memory bandwidth, which makes it ideally suited for video and image processing applications. This tutorial will present the latest techniques for optimal GPU-based video processing.

Session Level: Intermediate
Session Type: Tutorial
Tags: Video & Image Processing; Performance Optimization; Media & Entertainment; Real-Time Graphics Applications

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 211A

S4413 - NumbaPro: High-Level GPU Programming in Python for Rapid Development

Siu Kwan Lam ( Software Engineer, Continuum Analytics, Inc )
Siu Kwan Lam has B.S. and M.S. degrees in Computer Engineering from San Jose State University. He has researched TCP covert channel detection for NSF STC TRUST and taught CUDA at San Jose State University during his senior year. At Continuum Analytics, he is the primary developer for NumbaPro and maintains the open-source LLVMPY project.
Travis Oliphant ( Co-founder & CEO, Continuum Analytics, Inc )
Travis has a Ph.D. from the Mayo Clinic and B.S. and M.S. degrees in Mathematics and Electrical Engineering from Brigham Young University. Since 1997, he has worked extensively with Python for numerical and scientific programming, most notably as the primary developer of the NumPy package, and as a founding contributor of the SciPy package. He is also the author of the definitive "Guide to NumPy". Travis was an assistant professor of Electrical and Computer Engineering at BYU from 2001-2007, where he taught courses in probability theory, electromagnetics, inverse problems, and signal processing. He also served as Director of the Biomedical Imaging Lab, where he researched satellite remote sensing, MRI, ultrasound, elastography, and scanning impedance imaging. From 2007-2011, Travis was the President at Enthought, Inc. During his tenure there, the company grew from 15 to 50 employees, and Travis worked with well-known Fortune 50 companies in finance, oil-and-gas, and consumer-products. He was involved in all aspects of the contractual relationship, including consulting, training, code-architecture, and development. As CEO of Continuum Analytics, Travis engages customers, develops business strategy, and guides technical direction of the company. He actively contributes to software development, and engages with the wider open source community in the Python ecosystem.

Learn about high-level GPU programming with NumbaPro to reduce development time and produce high-performance data-parallel code with the ease of Python. This tutorial is for beginning to intermediate CUDA programmers who already know Python. Attendees will learn about (1) high-level Python decorators that turn simple Python functions into data-parallel GPU kernels without any knowledge of the CUDA architecture; (2) CUDA library bindings that can be used as a drop-in to speed up existing applications; and (3) reusing existing CUDA C/C++ code in Python with JIT linking.

Session Level: Beginner
Session Type: Tutorial
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 210D

S4616 - UI Composer for Automotive HMIs - Part 1: What, Why, and How

Stephen Mendoza ( Automotive Artist, NVIDIA )
Stephen Mendoza is an automotive proof of concept artist in NVIDIA's embedded software group. Using UI Composer Studio and Architect, Stephen works with automotive customers and internal R&D groups to bring next-generation user interfaces and experiences to modern vehicles. His past experience includes building 3D interactive applications for the aerospace industry and developing forensic animations for vehicle accident reconstruction firms. An avid gamer and fine artist in his free time, Stephen's interests include modding games, developing abstract user interface concepts, and acrylic painting. He has a Bachelor of Arts degree from the Art Institute of Colorado.
Gavin Kistner ( Product Designer, NVIDIA )
Gavin Kistner is the Product Designer for NVIDIA's UI Composer suite, bringing a user-focused approach to the customer tools. From feature design to interface layout to Lua scripting to documentation, he brings love and polish to every aspect of the software. Gavin has spent the past 10 years working on 3D user interfaces and tools, both at NVIDIA and Anark Corporation before that. Prior to 3D, Gavin spent 10 years working on application usability and development on the Web since its inception. He holds degrees in Computer Science and Electrical Engineering from Duke University, but has since realized that the more you can stand on the shoulders of giants and express high level abstractions, the farther you can see and reach.

An in-depth session discussing the creation of digital instrument clusters and IVI systems with NVIDIA's UI Composer Studio design software. Attendees will gain insight into how UI Composer Studio solves for photorealistic lighting and materials on the Tegra K1 automotive platform. Other areas of discussion include ways UI Composer Studio reduces the amount of time between design iterations.

Session Level: Beginner
Session Type: Tutorial
Tags: Automotive; Debugging Tools & Techniques; In-Vehicle Infotainment (IVI) & Safety; Digital Product Design & Styling; Recommended Press Session – Auto

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 210G

S4618 - NVIDIA VisualFX SDK: Enabling Cinematic Effects in Games

Monier Maher ( Director, VisualFX, NVIDIA )
Monier Maher
Monier Maher wrote his first games in the '80s and got back to game development when he co-founded AGEIA Technologies in 2002 to accelerate game physics. At NVIDIA, Monier continues to pursue innovative solutions for creating games that give the player an experience unmatched anywhere else in the game industry.
Nathan Reed ( Devtech Software Engineer, NVIDIA )
Nathan Reed
Nathan Reed is a graphics programmer, an amateur physicist, and a sci-fi nerd. He got his start in the game industry at Sony’s Sucker Punch studio, working on rendering for the Infamous series. Since joining NVIDIA in 2013, he’s been researching and developing graphics techniques for the GameWorks middleware libraries.
Simon Green ( Principal Software Engineer, NVIDIA )
Simon Green
Simon Green is a principal software engineer in the Developer Technology group at NVIDIA. He started graphics programming on the Sinclair ZX-81, which had 1 KB of RAM and a screen resolution of 64 by 48 pixels, and has been trying to improve the quality of real-time graphics ever since. He received a B.S. in computer science from the University of Reading, U.K. in 1994. His research interests include cellular automata, physically-based simulation and analogue synthesizers.
Tae-Yong Kim ( Senior R&D Engineer, NVIDIA )
Tae-Yong Kim
Tae-Yong Kim is currently a senior R&D engineer in NVIDIA's PhysX group. He works on researching and developing NVIDIA PhysX/APEX technology such as APEX Clothing, Fluid, and Destruction. His current focus includes bringing a new APEX Fur module to games and real-time applications. Prior to joining NVIDIA, he developed various simulation and rendering technologies for Rhythm and Hues Studios. His tools were used in the production of Hollywood movies such as "Chronicles of Narnia", "Superman Returns", and "Mummy 3". In 2010, he served as a DITS committee member for the Academy of Motion Picture Arts and Sciences.

The NVIDIA VisualFX SDK provides game developers a turnkey solution for enabling cinematic effects like interactive fire and smoke, fur, waves, global illumination and more in games. All these complex, realistic effects are provided in an easy-to-use SDK to facilitate integration and tuning in any given game engine. In this session we will provide an overview of the different VisualFX SDK modules, the roadmap, and some case studies on how they have been used successfully.

Session Level: Beginner
Session Type: Tutorial
Tags: Game Development; Visual Effects & Simulation; Rendering & Animation; Recommended for All Press

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 210C

S4700 - Part 2: GPU Architecture & The CUDA Memory Model (Presented by Acceleware)

Chris Mason ( Product Manager, Acceleware Ltd. )
Chris is the Product Manager for Acceleware's GPU accelerated electromagnetic product line. He is responsible for the successful development and launch of Acceleware products used by companies world-wide. Chris has 9 years of experience in developing commercial applications for the GPU and has delivered over 20 CUDA courses to students in a diverse range of industries. His previous experience also includes parallelization of algorithms on digital signal processors (DSPs) for cellular phones and base stations. Chris has a Masters in Electrical Engineering from Stanford University.

Explore the memory model of the GPU! The session will begin with an essential overview of the GPU architecture and thread cooperation before focusing on the different memory types available on the GPU. We will define shared, constant and global memory and discuss the best locations to store your application data for optimized performance. Features available in the Kepler architecture such as the shuffle instruction, shared memory configurations and Read-Only Data Cache are introduced and optimization techniques discussed. A programming demonstration of shared and constant memory will be delivered.
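
For readers who want a concrete picture before attending, here is a minimal sketch of the three memory spaces the session defines; the kernel, its names and sizes are illustrative, not Acceleware's course material.

    #include <cuda_runtime.h>

    __constant__ float kGain;                    // constant memory: cached, read-only on device

    __global__ void scale(const float *in, float *out, int n)
    {
        __shared__ float tile[256];              // shared memory: one low-latency tile per block
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            tile[threadIdx.x] = in[i];           // read from global memory (device DRAM)
            __syncthreads();                     // all loads visible before anyone reads the tile
            out[i] = kGain * tile[threadIdx.x];  // write result back to global memory
        }
    }

    // Host side: cudaMemcpyToSymbol(kGain, &gain, sizeof(float)); then scale<<<blocks, 256>>>(...);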

Session Level: Intermediate
Session Type: Tutorial
Tags: Programming Languages & Compilers; Performance Optimization

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 220C

S4711 - Session 2: Fast, Parallel Algorithms for Computer Vision and Machine Learning with GPUs (Presented by ArrayFire)

Umar Arshad ( Senior Software Engineer, CUDA Training Specialist, ArrayFire )
Umar Arshad is an engineer at ArrayFire, where he primarily works on improving concurrency in ArrayFire and in applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon, where he was involved with improving performance and designing large-scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the program chair of the university's ACM chapter.

Working on image processing, computer vision, or machine learning? Learn best practices for implementing parallel versions of popular algorithms on GPUs. Instead of reinventing the wheel, you will learn where to find and how to use excellent versions of these algorithms already available in CUDA and ArrayFire libraries. You will walk away equipped with the best tools and knowledge for implementing accelerated image processing and machine learning. This session will also include information about programming CUDA on Tegra mobile devices for computer vision applications.

Session Level: Beginner
Session Type: Tutorial
Tags: Computer Vision; Machine Learning & AI; Video & Image Processing; Numerical Algorithms & Libraries

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 210B

S4873 - Image and Vision Processing on Tegra

Elif Albuz ( Manager of Vision Software, NVIDIA )
Elif Albuz is the manager of Mobile Vision Software at NVIDIA, leading computer vision projects on advanced driver assistance, computational photography and augmented reality on Tegra GPUs. Before the Computer Vision group, she led the CUDA FFT library; designed new algorithms for motion estimation, super-resolution and frame-rate up-conversion and accelerated them on NVIDIA GPUs; designed architecture for error concealment and adaptive quantization in video codec hardware; and implemented low-level code for H.264 and MPEG-2 codecs. Prior to joining NVIDIA, she worked at Sony Electronics, leading the DVD decoder firmware stack used in DVD players and the PS2, implementing a real-time OS for multi-processor systems and accelerating H.264 using SIMD in the Multimedia Research Labs. Elif Albuz holds a dual degree in Electrical Engineering and Computer Science with a focus on Artificial Intelligence and Robotics, and a Master's degree in Electrical Engineering with a focus on content-based image retrieval and parallel architectures.

Processing live and offline camera frames, images and video streams, and extracting semantic information from them enables various applications on mobile and embedded platforms. Image and vision computing algorithms are inherently highly parallel, and fast processing of these algorithms enables new paradigms in embedded and mobile applications. Tegra K1 is built to address data-parallel embedded and mobile applications, with a CUDA-enabled GPU, an image signal processing engine, a NEON-enabled quad-core ARM CPU, and hardware encode and decode accelerators. Tegra software libraries wrap all of this capability and expose it to developers. In this session, an overview of the software libraries and architecture relevant to image and vision computing on Tegra platforms will be presented.

Session Level: All
Session Type: Tutorial
Tags: Mobile Summit; Computer Vision; Automotive; Computational Photography; Recommended Press Session – Mobile

Day: Monday, 03/24
Time: 10:30 - 11:50
Location: Room 210E

S4200 - Advanced Accelerated Computing Using Directives

Jeff Larkin ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jeff Larkin
Jeff Larkin is a software engineer in NVIDIA's Developer Technology group, where he helps developers profile and optimize scientific applications. Prior to joining NVIDIA, Jeff worked as a performance engineer at Cray Inc.

This tutorial will expand upon the participants' experience with accelerator directives (OpenACC and OpenMP) by focusing on performance optimization and interoperability with other programming models. Participants will learn about the multiple levels of parallelism that can be expressed in OpenACC and OpenMP and how to apply them to their application code. They will also learn how asynchronous execution improves application performance. Finally, they will learn how compiler directives interoperate with other accelerated computing technologies such as CUDA C, CUDA Fortran, and libraries.
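
As a taste of what such directives look like, here is a minimal OpenACC sketch; the loop and the async queue number are illustrative, not drawn from the tutorial's materials.

    void saxpy(int n, float a, const float *restrict x, float *restrict y)
    {
        // gang and vector express two levels of parallelism; async(1) queues the
        // region so the host could do other work before the matching wait
        #pragma acc parallel loop gang vector async(1)
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
        #pragma acc wait(1)
    }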

Session Level: Intermediate
Session Type: Tutorial
Tags: Programming Languages & Compilers

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 212A

S4244 - How to Visualize Your GPU-Accelerated Simulation Results

Peter Messmer ( Senior HPC DevTech Engineer, NVIDIA )
Highly-Rated Speaker
Peter Messmer
Peter Messmer joined NVIDIA in 2011 after spending more than 15 years developing HPC and GPU-accelerated applications for industry and government clients, mainly in the area of plasma and EM simulations, data analysis and visualization. In his role as senior devtech engineer at NVIDIA, Peter works with HPC users around the globe, supporting them in accelerating their scientific discovery process by taking advantage of GPUs in their applications. Peter holds an MSc and PhD in Physics from ETH Zurich, Switzerland, with specialization in kinetic plasma physics and nonlinear optics.

Learn how to take advantage of GPUs to visualize the results of your GPU-accelerated simulation! This session will cover a broad range of visualization and analysis techniques allowing you to investigate your data on the fly. Starting with some basic CUDA/OpenGL interoperability, we will introduce more sophisticated data models that allow you to take advantage of widely used tools like ParaView and VisIt to visualize your GPU-resident data. Topics such as parallel compositing, remote visualization and application steering will be addressed to let you take full advantage of the GPUs installed in your supercomputing system.
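
A minimal sketch of the CUDA/OpenGL interoperability starting point, assuming a vertex buffer object `vbo` already created by the application and a hypothetical simulation kernel; error checking is omitted.

    #include <GL/gl.h>                   // OpenGL types (platform headers may differ)
    #include <cuda_gl_interop.h>

    __global__ void advanceParticles(float4 *pos, int n);  // hypothetical simulation kernel

    GLuint vbo;                          // an existing vertex buffer object created by the app
    cudaGraphicsResource *vboRes;

    void registerOnce()
    {
        cudaGraphicsGLRegisterBuffer(&vboRes, vbo, cudaGraphicsMapFlagsWriteDiscard);
    }

    void stepSimulation(int n)
    {
        float4 *pos; size_t bytes;
        cudaGraphicsMapResources(1, &vboRes, 0);                    // hand the buffer to CUDA
        cudaGraphicsResourceGetMappedPointer((void **)&pos, &bytes, vboRes);
        advanceParticles<<<(n + 255) / 256, 256>>>(pos, n);         // write positions in place
        cudaGraphicsUnmapResources(1, &vboRes, 0);                  // give it back to OpenGL
    }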

Session Level: Intermediate
Session Type: Tutorial
Tags: Large Scale Data Visualization & In-Situ Graphics; Scientific Visualization; Supercomputing; Combined Simulation & Real-Time Visualization

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 210C

S4626 - GPU Ray Tracing and Advanced Rendering Solutions from NVIDIA

Phillip Miller ( Director, Advanced Rendering, NVIDIA )
Highly-Rated Speaker
Phillip Miller
Mr. Miller directs product management for NVIDIA's Advanced Rendering offerings, ranging from Iray and mental ray, which ship within leading products in design and entertainment, to the OptiX ray tracing framework used extensively within private and commercial applications. He has been working on leading software products for 20 years, including the 3D animation efforts at Autodesk and the web design products at Adobe. He holds a Master of Architecture from the University of Illinois and is a registered architect.

Learn how GPU computing is revolutionizing performance and possibilities in both interactive and production rendering. The latest capabilities of NVIDIA's Advanced Rendering solutions will be explored and demonstrated, along with what's possible with the latest in NVIDIA OptiX for accelerating custom ray tracing solutions. Trends in the industry, along with guidelines for configuring optimal rendering, will also be discussed.

Session Level: Beginner
Session Type: Tutorial
Tags: Rendering & Animation; Ray Tracing; Manufacturing; Media & Entertainment; Recommended Press Session – Digital Manufacturing

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 210D

S4671 - See the Big Picture: Scalable Visualization Solutions for High Resolution Displays

Doug Traill ( Solutions Architect, NVIDIA )
Highly-Rated Speaker
Doug Traill
Doug Traill is a senior solutions architect at NVIDIA responsible for scalable visualization solutions. In this role he works with systems integrators and end customers to help design and implement complex visualization systems. During his career, Doug has helped design and build some of the world's largest visualization centers, simulators and planetariums.

Large-format, high-resolution displays are being used everywhere from corporate conference rooms to supercomputing facilities. NVIDIA Quadro SVS solutions provide many features that make it easier to install and use these large-scale displays. Attendees of this tutorial will learn how to configure Quadro graphics for thin-bezel panels, edge-blended projectors, and stereoscopic and immersive displays.

Session Level: Beginner
Session Type: Tutorial
Tags: Collaborative & Large Resolution Displays

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 210A

S4701 - Part 3: Asynchronous Operations & Dynamic Parallelism in CUDA (Presented by Acceleware)

Dan Cyca ( Chief Technology Officer, Acceleware Ltd. )
Dan has extensive experience working with GPUs, clusters and multi-core solutions. He is responsible for the development of Acceleware's high performance software applications for the engineering and energy industries. Dan joined Acceleware in 2004 as a software developer to build the company's first product, an electromagnetic solver for the GPU. Dan has also played a fundamental role in developing Acceleware's CUDA training materials and teaching the content to companies around the world. Prior to Acceleware, Dan's experience included developing 'C-to-hardware' compilers, and implementing digital signal processing and encryption algorithms on FPGAs. Dan has an M. Sc. in Electrical Engineering from the University of Calgary.

This tutorial dives deep into asynchronous operations and how to maximize throughput on both the CPU and GPU with streams. We will demonstrate how to build a CPU/GPU pipeline and how to design your algorithm to take advantage of asynchronous operations. The second part of the session will focus on dynamic parallelism. A programming demo involving asynchronous operations and dynamic parallelism will be included.
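
A minimal sketch of the two-stream pipeline idea, with placeholder buffers and kernel (not Acceleware's demo code); the host buffers must be pinned (cudaMallocHost) for the copies to truly overlap.

    __global__ void process(const float *in, float *out, int n);   // placeholder kernel

    void pipeline(float *hIn, float *hOut, float *dIn[2], float *dOut[2],
                  int nChunks, int chunkN)
    {
        cudaStream_t s[2];
        for (int i = 0; i < 2; ++i) cudaStreamCreate(&s[i]);
        for (int c = 0; c < nChunks; ++c) {
            int j = c % 2;                       // alternate streams so chunk c's compute
            cudaMemcpyAsync(dIn[j], hIn + c * chunkN, chunkN * sizeof(float),
                            cudaMemcpyHostToDevice, s[j]);          // overlaps chunk c+1's copy
            process<<<(chunkN + 255) / 256, 256, 0, s[j]>>>(dIn[j], dOut[j], chunkN);
            cudaMemcpyAsync(hOut + c * chunkN, dOut[j], chunkN * sizeof(float),
                            cudaMemcpyDeviceToHost, s[j]);
        }
        cudaDeviceSynchronize();
        for (int i = 0; i < 2; ++i) cudaStreamDestroy(s[i]);
    }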

Session Level: Intermediate
Session Type: Tutorial
Tags: Programming Languages & Compilers; Performance Optimization

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 220C

S4712 - Session 3: Advanced CUDA Optimizations (Presented by ArrayFire)

Umar Arshad ( Senior Software Engineer, CUDA Training Specialist, ArrayFire )
Umar Arshad is an engineer at ArrayFire, where he primarily works on improving concurrency in ArrayFire and in applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon, where he was involved with improving performance and designing large-scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the program chair of the university's ACM chapter.

In this session, we will examine instruction-level parallelism (ILP) and Kepler-specific optimizations, including shuffle instructions and dynamic parallelism. We will also equip you with knowledge of important profiling and debugging tools to improve GPU utilization and kernel performance.
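
For context, a typical use of Kepler's shuffle instruction is a warp-level reduction that needs no shared memory; this small sketch uses the pre-CUDA-9 __shfl_down intrinsic current at the time (requires sm_30 or newer) and is an illustration, not the session's material.

    __inline__ __device__ float warpReduceSum(float val)
    {
        for (int offset = 16; offset > 0; offset >>= 1)
            val += __shfl_down(val, offset);   // pull the value from the lane `offset` away
        return val;                            // lane 0 now holds the sum of the whole warp
    }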

Session Level: Advanced
Session Type: Tutorial
Tags: Performance Optimization; Programming Languages & Compilers; Debugging Tools & Techniques; Numerical Algorithms & Libraries

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 220B

S4726 - Intro to Virtualization 101

Luke Wignall ( Senior Solution Architect, NVIDIA )
Luke Wignall
Luke came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke now sees the ability to bring the GPU to the end-user experience as the missing "special sauce" that brings virtual desktops to the next level.
Jared Cowart ( Senior Solution Architect, NVIDIA )
Jared Cowart
TBD

This session introduces the audience to the concepts of server, desktop, and application virtualization. The audience will learn about the key concepts and technologies used in virtualization, as a foundation for understanding how NVIDIA is leading the way in adding a key ingredient: a superior end-user experience. Jared and Luke will discuss the business reasons that make virtualization such a powerful answer, the important considerations before moving into virtualization, and what typical environments look like, with live demos.

Session Level: Beginner
Session Type: Tutorial
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 211A

S4806 - UI Composer for Automotive HMIs - Part 2: Building Content

Stephen Mendoza ( Automotive Artist, NVIDIA )
Stephen Mendoza is an automotive proof of concept artist in NVIDIA's embedded software group. Using UI Composer Studio and Architect, Stephen works with automotive customers and internal R&D groups to bring next-generation user interfaces and experiences to modern vehicles. His past experience includes building 3D interactive applications for the aerospace industry and developing forensic animations for vehicle accident reconstruction firms. An avid gamer and fine artist in his free time, Stephen's interests include modding games, developing abstract user interface concepts, and acrylic painting. He has a Bachelor of Arts degree from the Art Institute of Colorado.
Gavin Kistner ( Product Designer, NVIDIA )
Gavin Kistner
Gavin Kistner is the Product Designer for NVIDIA's UI Composer suite, bringing a user-focused approach to the customer tools. From feature design to interface layout to Lua scripting to documentation, he brings love and polish to every aspect of the software. Gavin has spent the past 10 years working on 3D user interfaces and tools, both at NVIDIA and Anark Corporation before that. Prior to 3D, Gavin spent 10 years working on application usability and development on the Web since its inception. He holds degrees in Computer Science and Electrical Engineering from Duke University, but has since realized that the more you can stand on the shoulders of giants and express high level abstractions, the farther you can see and reach.

A continuation of Part 1, this is a hands-on, interactive demonstration of content creation using UI Composer. The audience will be guided through the steps to build a data-driven virtual automotive gauge. In order to actively participate in this session, attendees are asked to bring their own Windows laptop with UI Composer installed. UI Composer is available for free from http://uicomposer.nvidia.com/

Session Level: Beginner
Session Type: Tutorial
Tags: Automotive; Debugging Tools & Techniques; In-Vehicle Infotainment (IVI) & Safety; Digital Product Design & Styling; Recommended Press Session – Auto

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 210G

S4906 - Mobile GPU Compute with Tegra K1

Amit Rao ( Senior Manager, NVIDIA )
Amit Rao
Amit Rao is an experienced manager with a proven track record of growing teams from scratch and developing expertise in multiple domains as both technical lead and manager. He is execution-oriented, guiding both people and technology to ensure well-planned, on-schedule delivery of goals. Amit has experience on desktops and embedded systems, ranging from research on game engines to implementation of graphics and compute standards on various hardware architectures, and is strongly focused on performance-critical and real-time applications. His specialties include 3D real-time graphics, heterogeneous computing and game engine development.
Mark Ebersole ( CUDA Educator, NVIDIA )
Highly-Rated Speaker
As CUDA Educator at NVIDIA, Mark Ebersole teaches developers the benefits of GPU computing using the NVIDIA CUDA parallel computing platform and programming model. With more than ten years of experience as a systems programmer, Mark has spent much of his time at NVIDIA as a GPU systems diagnostics programmer, developing a tool to test, debug, validate, and verify GPUs from pre-emulation through bring-up and into production. Before joining NVIDIA, he worked at IBM developing Linux drivers for the IBM iSeries server. Mark holds a BS degree in math and computer science from St. Cloud State University.

An in-depth session that explores how sophisticated mobile applications can harness the power of GPU compute using the Kepler GPU in the Tegra K1 SOC. Topics to be covered include: (1) an overview of the GPU compute capability of Tegra K1; (2) a review of the various GPU compute APIs and their relative strengths and weaknesses, including CUDA, RenderScript, OpenGL Compute Shaders and OpenCL; (3) getting up and running on the Tegra Development Platform with GPU compute; (4) principles and considerations for programming with CUDA on Tegra; and (5) walk-throughs of GPU compute coding examples using CUDA for Tegra K1.
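
As a flavor of item (4), standard CUDA C compiles unchanged for the Kepler GPU in Tegra K1; the SAXPY kernel below is a generic illustration, not taken from the session.

    __global__ void saxpy(int n, float a, const float *x, float *y)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
        if (i < n) y[i] = a * x[i] + y[i];
    }

    // Launch: saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);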

Session Level: Intermediate
Session Type: Tutorial
Tags: Mobile Summit; Recommended Press Session – Mobile

Day: Monday, 03/24
Time: 13:00 - 14:20
Location: Room 210E

S4236 - Multi GPU Programming with MPI (Part I+II+III)

Jiri Kraus ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jiri Kraus
Jiri Kraus is a developer in NVIDIA's European Developer Technology team. As a consultant for GPU HPC applications at the NVIDIA Jülich Applications Lab, Jiri collaborates with local developers and scientists at the Jülich Supercomputing Centre and the Forschungszentrum Jülich. Before joining NVIDIA Jiri worked on the parallelization and optimization of scientific and technical applications for clusters of multicore CPUs and GPUs at Fraunhofer SCAI in St. Augustin. He holds a Diploma in Mathematics from the University of Cologne, Germany.
Peter Messmer ( Senior Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Peter Messmer
Peter joined NVIDIA in 2011 after spending more than 15 years developing HPC- and GPU-accelerated applications for industry and government clients. In his role as senior devtech engineer at NVIDIA, he supports HPC users around the globe in using GPUs to accelerate their scientific discovery processes. Peter holds an MSc and PhD in physics from ETH Zurich, Switzerland, with specialization in kinetic plasma physics and nonlinear optics.

In this session you will learn how to program GPU clusters using the message passing interface (MPI) and OpenACC or CUDA. Part I of this session will explain how to get started, giving a quick introduction to MPI and how it can be combined with OpenACC or CUDA. Part II will explain more advanced topics like GPU-aware MPI and how to overlap communication with computation to hide communication times. Finally, Part III will cover how to use the NVIDIA performance analysis tools in an MPI environment and give an overview of third-party tools specifically designed for GPU clusters.
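
A minimal sketch of the combination Part II covers: a GPU-aware MPI library accepts device pointers directly, so a halo exchange can skip explicit staging through host memory. Function and variable names here are hypothetical.

    #include <mpi.h>

    void exchangeHalo(float *d_send, float *d_recv, int n, int left, int right)
    {
        MPI_Request req[2];
        // d_send and d_recv are device pointers; a GPU-aware MPI handles them directly
        MPI_Irecv(d_recv, n, MPI_FLOAT, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(d_send, n, MPI_FLOAT, right, 0, MPI_COMM_WORLD, &req[1]);
        // ...launch interior-point kernels here to overlap communication with computation...
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
    }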

Session Level: Beginner
Session Type: Tutorial
Tags: Supercomputing; Performance Optimization

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 220B

S4455 - Multi-GPU Rendering

Ingo Esser ( Devtech Platform Engineer, NVIDIA )
Ingo Esser
Ingo Esser is a senior devtech engineer in NVIDIA's Professional Solutions Group, where he helps ISVs improve their rendering algorithms. These ISVs mostly work in the automotive and oil-and-gas domains, where rendering complex surfaces or visualizing large datasets is an issue. He has a Diploma in Computer Science from the chair for Computer Graphics and Multimedia at RWTH Aachen, Germany.
Shalini Venkataraman ( Senior Applied Engineer, NVIDIA )
Highly-Rated Speaker
Shalini Venkataraman
Shalini Venkataraman is a Senior Applied Engineer in NVIDIA’s Professional Solutions Group where she works on using GPUs to solve large-scale imaging and visualization problems in Medical, Oil&Gas and Scientific Computing domains. Prior to that she was a researcher at various High Performance Computing centers in the US and Singapore. Her interests are in parallel and large data visualization. She has a MS in Computer Science from the Electronic Visualization Lab, University of Illinois-Chicago and BS from the National University of Singapore.

With more workstation applications utilizing more efficient rendering pipelines and rendering larger scenes with more complex fragment shaders, GPUs can become the bottleneck in a system. The first part of this talk will be a refresher on multi-GPU programming basics to scale your rendering tasks. We show how to target individual GPUs programmatically, as well as how to structure your application using multiple threads and OpenGL contexts while handling synchronization and data transfer. The second part will dive into the details of designing a rendering pipeline that can efficiently utilize a multi-GPU setup by splitting rendering tasks into a set of phases. These phases represent a set of threads that distribute the rendering load across a set of GPUs. The talk will cover how to set up a multithreaded application using C++11 constructs, how to analyze and debug the performance of a graphics application, how to do PCIe transfers efficiently, and how to optimally distribute workload across different GPUs.
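
A skeletal sketch of the thread-per-GPU structure the talk describes, using C++11 threads; `renderPhase` is a hypothetical stand-in for one pipeline phase, and device selection is shown with the CUDA runtime rather than the OpenGL context-affinity mechanisms the talk itself covers.

    #include <cuda_runtime.h>
    #include <thread>
    #include <vector>

    void renderPhase(int gpu);       // hypothetical: this GPU's share of the pipeline

    void worker(int gpu)
    {
        cudaSetDevice(gpu);          // bind this thread's GPU work to one device
        renderPhase(gpu);
    }

    int main()
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        std::vector<std::thread> pool;
        for (int g = 0; g < n; ++g) pool.emplace_back(worker, g);
        for (auto &t : pool) t.join();   // phases meet here before final compositing
        return 0;
    }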

Session Level: Intermediate
Session Type: Tutorial
Tags: Rendering & Animation; Real-Time Graphics Applications; Large Scale Data Visualization & In-Situ Graphics; Media & Entertainment

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 210A

S4596 - Accelerating Ray Tracing Using OptiX

David McAllister ( OptiX Engineering Manager, NVIDIA )
Highly-Rated Speaker
David McAllister
David McAllister is the engineering manager of NVIDIA's OptiX ray tracing engine and has been in the OptiX group for four years. Before OptiX, he was a GPU architect from the time he joined NVIDIA in 2000, working on GPUs from the GeForce 3 through Fermi. David received his Ph.D. in Computer Science from UNC Chapel Hill and has been in the computer graphics industry since 1989. He resides in Salt Lake City, Utah, USA.
Jan Tománek ( Owner/CEO, AAA Studio-FurryBall )
Jan Tománek is a film producer, director and owner of the Art And Animation Studio (AAA studio). AAA studio was founded in 1990 as one of the first private studios in the Czech Republic. The company is a family-run business based on creative freedom. After producing the first East-European CGI feature movie in 2008, Jan decided to develop his own in-house GPU renderer, FurryBall, based on DirectX 11. In 2012, AAA studio finished the sequel "Goat story 2", completely rendered on GPUs with much better quality than the previous movie and 10 times faster. Jan's philosophy is that a movie is always fake, and that renderers not constrained by total realism allow more artistic freedom.
James Bigler ( Sr. Software Engineer, NVIDIA )
James Bigler is currently working for NVIDIA as a Sr. Software Engineer developing OptiX, a GPU accelerated ray tracing framework. His work with ray tracing dates back to 2000 at the University of Utah where he worked under Dr. Steven Parker researching and developing parallel ray tracing applications for rendering and scientific visualization. Since coming to NVIDIA in 2008, James has strived to bring more ray tracing awesomeness to everyone through OptiX. James holds a B.S. and M.S. in Computer Science from the University of Utah.

OptiX is the foremost platform for GPU ray tracing. It exposes the extreme ray tracing performance of the GPU to typical developers, while hiding most of the complexity usually associated with ray tracing. This session will cover everything developers need to get started with ray tracing in OptiX, including OptiX C and C++ APIs, the execution model, acceleration structures, programmable entry points, and best practices. We will also cover exciting customer use cases and the new OptiX Prime API that provides to-the-metal ray tracing without shading or recursion.
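
For orientation, host-side setup in the OptiX 3.x C++ API looks roughly like the sketch below; the PTX file name, program name and buffer name are placeholders, and error handling is omitted.

    #include <optixu/optixpp_namespace.h>

    void render(unsigned width, unsigned height)
    {
        optix::Context ctx = optix::Context::create();
        ctx->setRayTypeCount(1);                 // one ray type (e.g., radiance)
        ctx->setEntryPointCount(1);              // one programmable entry point
        optix::Program raygen =
            ctx->createProgramFromPTXFile("pinhole_camera.ptx", "pinhole_camera");
        ctx->setRayGenerationProgram(0, raygen); // runs once per launch index (pixel)
        optix::Buffer out =
            ctx->createBuffer(RT_BUFFER_OUTPUT, RT_FORMAT_FLOAT4, width, height);
        ctx["output_buffer"]->set(out);
        ctx->launch(0, width, height);           // trace the frame
    }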

Session Level: Intermediate
Session Type: Tutorial
Tags: Ray Tracing; Rendering & Animation; Media & Entertainment; Manufacturing

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 210D

S4702 - Part 4: Essential CUDA Optimization Techniques (Presented by Acceleware)

Dan Cyca ( Chief Technology Officer, Acceleware Ltd. )
Dan has extensive experience working with GPUs, clusters and multi-core solutions. He is responsible for the development of Acceleware's high performance software applications for the engineering and energy industries. Dan joined Acceleware in 2004 as a software developer to build the company's first product, an electromagnetic solver for the GPU. Dan has also played a fundamental role in developing Acceleware's CUDA training materials and teaching the content to companies around the world. Prior to Acceleware, Dan's experience included developing 'C-to-hardware' compilers, and implementing digital signal processing and encryption algorithms on FPGAs. Dan has an M. Sc. in Electrical Engineering from the University of Calgary.

Learn how to optimize your algorithms for NVIDIA GPUs. This informative tutorial will provide an overview of the improved performance analysis tools available in CUDA 6.0 and key optimization strategies for compute-, latency- and memory-bound problems. The session will include techniques for ensuring peak utilization of CUDA cores by choosing the optimal block size. For compute-bound algorithms we will discuss how to improve branching efficiency and cover intrinsic functions and loop unrolling. For memory-bound algorithms, optimal access patterns for global and shared memory will be presented, highlighting the differences between the Fermi and Kepler architectures. The session will include code examples throughout and a programming demonstration highlighting the optimal global memory access pattern, which is applicable to all GPU architectures.
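
To make the global-memory point concrete, the contrast below shows a coalesced kernel versus a strided one; these are generic illustrations, not the session's demonstration code.

    __global__ void coalesced(const float *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i];          // consecutive threads touch consecutive words:
    }                                        // each memory transaction is fully used

    __global__ void strided(const float *in, float *out, int n, int stride)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i * stride < n)
            out[i * stride] = in[i * stride];   // wastes most of every memory transaction
    }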

Session Level: Intermediate
Session Type: Tutorial
Tags: Performance Optimization

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 220C

S4713 - Session 4: Deploying Your CUDA Applications Into The Wild (Presented by ArrayFire)

Umar Arshad ( Senior Software Engineer, CUDA Training Specialist, ArrayFire )
Umar Arshad is an engineer at ArrayFire, where he primarily works on improving concurrency in ArrayFire and in applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon, where he was involved with improving performance and designing large-scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the program chair of the university's ACM chapter.

Excited about CUDA but concerned about deployment? In this session, you will learn best practices for deploying your CUDA application and about how to resolve issues that commonly arise in the process. You will learn about scaling your application to multiple GPUs to handle large amounts of data (such as streams and/or files on disk). You will also learn about deploying your CUDA based applications in the cloud using Node.js, containers via Docker, etc.

Session Level: Intermediate
Session Type: Tutorial
Tags: Numerical Algorithms & Libraries; Clusters & GPU Management; Big Data Analytics & Data Algorithms; Mobile Applications

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 210B

S4783 - Virtual is Better than Physical – Delivering a Delightful User Experience from a Virtual Desktop

Kenneth Fingerlos ( Solutions Architect, Lewan Technology )
Kenneth Fingerlos has been consulting on server and desktop virtualization for over five years, working with customers in healthcare, manufacturing, education, software development, energy, architecture and civil engineering. Kenneth currently works for a VMware Premier and Citrix Platinum partner in Denver, Colorado.

Desktop virtualization has been around for years, with a large number of very good reasons to deploy it. However, in the effort to control costs and deliver desktops over poor connections and skinny pipes, IT admins have often resorted to delivering sub-par user experiences. This session focuses on technologies that allow delivery of stunning, responsive, rich user experiences from virtual desktops without breaking the bank. With a focus on user experience, this session delves into I/O, graphics, memory, CPU, and how to get the most smiles for your dollar. It includes specific discussion of GPU virtualization, I/O optimization, and flash storage for virtual desktop environments, including configurations for Citrix XenApp, XenDesktop, and VMware View.

Session Level: Intermediate
Session Type: Tutorial
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 211A

S4825 - Tegra K1 Developer Tools for Android: Unleashing the Power of the Kepler GPU with NVIDIA's Latest Developer Tools Suite

Sebastien Domine ( Sr. Director SW Engineering - Developer Tools, NVIDIA )
Highly-Rated Speaker
Sebastien Domine
Sébastien is the Sr. Director of Developer Technology Tools at NVIDIA. He runs various software engineering teams and oversees the development of software products dedicated to easing developers' lives and fostering the creation of applications that can take advantage of the GPU. Prior to NVIDIA, he worked on PC games at GameFX/THQ and 3D digital content creation tools at Katrix and Nichimen Graphics. He holds a Diplôme d'Ingénieur in Computer Science from EPITA, Paris, France.

The audience will learn about the latest developer tools suite specifically designed to unleash the power of Tegra K1 for Android application developers. The broad scope of this tutorial spans from advanced graphics to compute and multi-core CPU tools to enable developers to fully take advantage of the heterogeneous computing horsepower available. More specifically, compute developers will learn about the tools available to program CUDA on Tegra K1. Graphics developers will be introduced to the new Tegra Graphics Debugger for Tegra K1. This new mobile graphics development tool supports all the advanced features that Tegra K1 has to offer, via OpenGL ES 2.0, 3.0 and OpenGL 4.3. Finally, game developers will see how to manage their Android build configuration and debugging sessions all within the latest Visual Studio 2013, profile their application to identify hot spots and corresponding call stacks with our brand new release of Tegra System Profiler.

Session Level: Intermediate
Session Type: Tutorial
Tags: Mobile Summit; Debugging Tools & Techniques; Performance Optimization; Game Development

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 210E

Tutorial
 

TALK

Presentation
Details

S4704 - Data Processing and Analytics for Defense

Christopher White ( Program Manager, DARPA )
Christopher White
Dr. Chris White joined DARPA as a program manager in August 2011. His focus is on developing the enabling technology required for efficiently processing, analyzing and visualizing large volumes of data in a military, mission-oriented context. Dr. White previously served DARPA as its country lead for Afghanistan and in-theater member of the Senior Executive Service supporting the commander of the NATO International Security Assistance Force, the Combined Joint Staff branch for Intelligence, the Afghan Threat Finance Cell and the regional military commands. Prior to joining DARPA as government staff, Dr. White was a researcher in DARPA's Information Innovation Office where he created techniques to better understand, measure and model social media and large networks of information. Dr. White was a Research Fellow at Harvard University's School of Engineering and Applied Sciences and the Johns Hopkins University's Human Language Technology Center of Excellence, researching large-scale data analytics for graphs and networks, natural language processing, machine learning and statistical methods for heterogeneous sources in real-world applications. Dr. White holds Ph.D. and M.S. degrees in Electrical Engineering from the Johns Hopkins University and a B.S. in Electrical Engineering from Oklahoma State University.

Join this session to learn about the DARPA XDATA program, created by the DoD to efficiently process and analyze vast amounts of mission-oriented information for Defense activities. Data science programs in the DoD aim to meet the challenges of big data by developing computational techniques and software tools for processing and analyzing this information. As part of this exploration, the XDATA program aims to address the need for scalable algorithms for processing and visualizing imperfect and incomplete data. Because of the variety of DoD users, XDATA also intends to create human-computer interaction tools that can easily be customized for different missions. Finally, to enable large-scale data processing in a wide range of potential settings, XDATA plans to release open source software toolkits to enable collaboration among the applied mathematics, computer science and data visualization communities.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI; Defense; Large Scale Data Analytics; Recommended Press Session – HPC-Science

Day: Monday, 03/24
Time: 09:30 - 09:55
Location: Room 210F

S4617 - A High Level API for Fast Development of High Performance Graphic Analytics on GPUs

Zhisong Fu ( CUDA Researcher, SYSTAP )
Zhisong Fu
Zhisong Fu is a CUDA researcher at SYSTAP, LLC where he works on efficient GPU graph processing based on Merrill's BFS code. Zhisong is a Ph.D. candidate in the School of Computing at the University of Utah, Salt Lake City and received his Bachelor of Science in Computer Science from Zhejiang University in Hangzhou, China.

The goal of this session is to demonstrate how our high-level abstraction enables developers to quickly develop high-performance graph analytics programs on GPUs, with up to 3 billion edges traversed per second on a Tesla or Kepler GPU. High-performance graph analytics are critical for a large range of application domains. The SIMT architecture of GPUs and the irregular nature of graphs make it difficult to develop efficient graph analytics programs. In this session, we present an open source library that provides a high-level abstraction for efficient graph analytics with minimal coding effort. We use several specific examples to show how to use our abstraction to implement efficient graph analytics in a matter of hours.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 10:00 - 10:25
Location: Room 210F

S4602 - Generating Optimized CUDA Code from Parallel Patterns

HyoukJoong Lee ( Ph.D. candidate, Stanford University )
HyoukJoong Lee is a Ph.D. candidate in electrical engineering at Stanford University. His research interests include parallel programming models and compilers for heterogeneous architectures. He has an MS in electrical engineering from Stanford University.

Using high-level languages for GPU programming improves programmer productivity, but the compiler must apply GPU-specific optimizations to match the performance of manually optimized kernels. In this talk, we explore building a compiler based on structured parallel patterns to generate efficient code for GPUs. In particular, we describe techniques for mapping nested parallel patterns onto the GPU and for using shared memory. We compare the performance of our compiler with manually written kernels and show the impact of the optimizations applied.

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 10:30 - 10:55
Location: Room 210F

S4611 - Speeding Up GraphLab Using CUDA

Vishal Vaidyanathan ( Partner, Royal Caliber )
Vishal Vaidyanathan
Vishal graduated from Stanford University in 2007 with a Ph.D. in Computational Chemistry and an M.S. in Financial Mathematics. He developed the first Folding@Home client that used GPUs to accelerate biomolecular simulations by 50 times over what was previously possible. From 2007-2009 Vishal worked at Goldman Sachs developing the first fully automated high frequency trading solution for the US Treasury desk in New York. Subsequently as co-founder of a startup in Silicon Valley, he developed low-latency trading systems and HFT strategies for futures contracts. Vishal joined Royal Caliber as a partner in 2012.

We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that further speedup is possible when using multiple GPUs within a box, and that processing of large graphs is possible: with the latest Tesla cards, over 48 GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
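
For readers new to GAS, the sketch below shows the gather and apply steps of PageRank written as plain CUDA kernels; it illustrates the programming model only and is not Royal Caliber's framework code.

    __global__ void gather(const int *src, const int *dst, const float *rank,
                           const int *outDeg, float *accum, int nEdges)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per edge
        if (e < nEdges)
            atomicAdd(&accum[dst[e]], rank[src[e]] / outDeg[src[e]]);  // gather along edges
    }

    __global__ void apply(float *rank, const float *accum, int nVerts)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per vertex
        if (v < nVerts) rank[v] = 0.15f + 0.85f * accum[v];            // per-vertex apply
    }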

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 11:00 - 11:25
Location: Room 210F

S4624 - GPU-Optimized Deep Learning Networks for Automatic Speech Recognition

Jessica Ray ( Computer Scientist - Human Language Technology, MIT Lincoln Laboratory )
Jessica Ray
Ms. Jessica Ray is a staff member at the MIT Lincoln Laboratory in the Human Language Technology group. Her work is in automatic speech and keyword recognition, with a focus on low-resource languages. After receiving her B.S. in Computer Science and Mathematics from the University of Massachusetts Amherst in May 2012, Ms. Ray joined MIT Lincoln Laboratory in June 2012.

In this talk, we compare the implementation of deep learning networks [1] on traditional x86 processors with the implementation on NVIDIA Tesla K20 GPU Accelerators for the purposes of training Restricted Boltzmann Machines [2] and for deep network back propagation in a large-vocabulary speech recognition task (automatic transcription of TED talks). Two GPU implementations are compared: 1) a high-level implementation using Theano [3] and 2) a native implementation using low-level CUDA BLAS libraries. We describe the scaling properties of these implementations in comparison to a baseline batched-x86 implementation as a function of training data size. We also explore the development time tradeoffs for each of the implementations.
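
For reference, the building block of such a native implementation is a dense matrix product on the GPU; a call of the kind below (with hypothetical names, computing hidden activations H = V x W for a mini-batch) is what cuBLAS provides.

    #include <cublas_v2.h>

    void propagateUp(cublasHandle_t h, int batch, int nVis, int nHid,
                     const float *dV, const float *dW, float *dH)
    {
        const float alpha = 1.0f, beta = 0.0f;
        // Column-major GEMM: dH (batch x nHid) = dV (batch x nVis) * dW (nVis x nHid)
        cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, batch, nHid, nVis,
                    &alpha, dV, batch, dW, nVis, &beta, dH, batch);
    }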

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Performance Optimization; Defense; Large Scale Data Analytics

Day: Monday, 03/24
Time: 11:30 - 11:55
Location: Room 210F

S4608 - Extending Python for High-Performance Data-Parallel Programming

Siu Kwan Lam ( Software Engineer, Continuum Analytics, Inc )
Siu Kwan  Lam
Siu Kwan Lam has B.S. and M.S. degrees in Computer Engineering from San Jose State University. He has researched TCP covert channel detection for NSF STC TRUST and taught CUDA at San Jose State University during his senior year. At Continuum Analytics, he is the primary developer for NumbaPro and maintains the open-source LLVMPY project.

Our objective is to design a high-level data-parallel language extension to Python on GPUs. This language extension cooperates with the CPython implementation and uses Python syntax for describing data-parallel computations. The combination of rich library support and language simplicity makes Python ideal for subject matter experts to rapidly develop powerful applications. Python enables fast turnaround time and flexibility for custom analytic pipelines to react to immediate demands. However, CPython has been criticized as being slow and the existence of the global interpreter lock (GIL) makes it difficult to take advantage of parallel hardware. To solve this problem, Continuum Analytics has developed LLVM based JIT compilers for CPython. Numba is the open-source JIT compiler. NumbaPro is the proprietary compiler that adds CUDA GPU support. We aim to extend and improve the current GPU support in NumbaPro to further increase the scalability and portability of Python-based GPU programming.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 13:00 - 13:25
Location: Room 210F

S4609 - High-Performance Graph Primitives on GPU: Design and Implementation of Gunrock

Yangzihao Wang ( Ph.D. Student, UC Davis )
Yangzihao Wang
Yangzihao Wang is a Computer Science Ph.D. student at UC Davis, advised by Prof. John Owens. His main research interests are (1) the structure of parallelism and locality in irregular algorithms such as graph algorithms on the GPU, and (2) exascale computing and data analysis using multiple GPUs.

Gunrock is a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. The talk will share experience on how to design the framework and APIs for computing efficient graph primitives on GPUs. We will focus on the following two aspects: 1) Details of the implementations of several graph algorithms on GPUs. 2) How to abstract these graph algorithms using general operators and functors on GPUs to improve programmer productivity.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 13:30 - 13:55
Location: Room 210F

S4753 - Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus ( Associate Professor / Research Scientist, New York University / Facebook )
Rob Fergus is an Associate Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University. He is also a Research Scientist at Facebook, working in their AI Research Group. He received a Masters in Electrical Engineering with Prof. Pietro Perona at Caltech, before completing a PhD with Prof. Andrew Zisserman at the University of Oxford in 2005. Before coming to NYU, he spent two years as a post-doc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William Freeman. He has received several awards including a CVPR best paper prize, a Sloan Fellowship & NSF Career award and the IEEE Longuet-Higgins prize.

This talk will describe recent progress in object recognition using deep convolutional networks. Over the last 18 months, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component to training these models. The talk will conclude with a live demo.

Session Level: Beginner
Session Type: Talk
Tags: Machine Learning & AI; Computer Vision; Recommended Press Session – HPC-Science; Recommended for All Press

Day: Monday, 03/24
Time: 16:00 - 16:50
Location: Room 210B

S4171 - Efficient GPU-Friendly Pre-Conditioners for Large-Scale Finite Element Analysis

Krishnan Suresh ( Associate Professor, University of Wisconsin )
Krishnan Suresh
Krishnan Suresh is currently an Associate Professor in the Department of Mechanical Engineering, University of Wisconsin, Madison. He graduated in 1998 from Cornell with a Ph.D. in Mechanical Engineering, and later served as an Engineering Manager at Kulicke and Soffa Industries (1998-2002). His research interests are in optimization and high-performance computing. He has co-authored over 35 journal papers and several conference papers, two of which have received best-paper awards from ASME.

The goal of this session is to introduce a new GPU-friendly pre-conditioner, specifically for finite-element applications. The pre-conditioner is assembly-free in that neither the finite-element stiffness matrix nor the pre-conditioner is assembled (ever!). The memory footprint is therefore extremely small, and the GPU implementation is, in most cases, compute-bound. A CUDA implementation will be discussed, followed by examples of finite element problems with tens of millions of degrees of freedom. It is assumed that registrants are already familiar with finite element techniques.
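
To illustrate what "assembly-free" means in general (this is a schematic of the matrix-free idea, not the speaker's actual pre-conditioner), the product y = K x can be formed element by element so the global stiffness matrix K is never stored; fully assembly-free codes may even recompute the element matrices on the fly rather than store elemK.

    __global__ void applyStiffness(const int *elemNodes, const float *elemK,
                                   const float *x, float *y,
                                   int nElems, int nodesPerElem)
    {
        int e = blockIdx.x * blockDim.x + threadIdx.x;   // one thread per element
        if (e >= nElems) return;
        for (int a = 0; a < nodesPerElem; ++a) {
            float sum = 0.0f;
            for (int b = 0; b < nodesPerElem; ++b)       // local matrix times local x
                sum += elemK[(e * nodesPerElem + a) * nodesPerElem + b]
                     * x[elemNodes[e * nodesPerElem + b]];
            atomicAdd(&y[elemNodes[e * nodesPerElem + a]], sum);  // scatter; K never assembled
        }
    }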

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Structural Mechanics; Computer Aided Design

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room LL21D

S4327 - Hierarchical Algorithms on Heterogeneous Architectures: Adaptive Multigrid Solvers for LQCD on GPUs

M Clark ( HPC Compute Engineer, NVIDIA )
M Clark
Dr. Clark's background is in high energy physics, having completed doctoral research in Monte Carlo algorithms for lattice quantum chromodynamics in 2005, graduating from the University of Edinburgh. Upon subsequently moving to Boston University, Dr. Clark focused upon adaptive multi-grid algorithms and symplectic integrators. It was during this time that research was initiated into harnessing GPUs for lattice QCD computation; this research has since evolved into the QUDA library. Dr. Clark spent 2009-2011 at Harvard University, continuing to work on algorithms for GPUs and many-core processors, with a focus on signal processing. Dr. Clark moved to NVIDIA in 2011, and continues to work at the interface between applications, algorithms and parallel computation.

Learn how GPUs are using advanced multigrid-solver algorithms to revolutionize the study of sub-nuclear physics. Lattice quantum chromodynamics (LQCD) is the study of quarks and gluons, the constituent particles that make up protons and neutrons. Owing to the computationally demanding nature of these calculations, GPUs are an increasingly popular platform for deployment, where a single calculation can require thousands of GPUs working in tandem for months. There has been much progress to date in developing scalable sparse linear solver algorithms, utilizing well-known mathematical methods such as mixed precision, domain decomposition and pipelining to improve performance, allowing efficient use of large GPU installations such as Blue Waters and Titan. However, there has been less focus on deploying 'mathematically optimal' linear solvers that have optimal O(N) complexity. In this work we utilize the QUDA framework to deploy adaptive multigrid solvers on GPUs; in particular, we describe the architecture abstractions that allow for deployment on heterogeneous systems utilizing both GPUs and CPUs. We discuss in general the suitability of heterogeneous architectures for hierarchical algorithms, and compare performance against a highly optimized CPU implementation.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room 212A

S4379 - OpenGL 4.4 Scene Rendering Techniques

Christoph Kubisch ( Developer Technology Engineer, NVIDIA )
Christoph Kubisch
Christoph Kubisch is a Developer Technology Engineer for NVIDIA Corporation, where he focuses on OpenGL real-time rendering techniques suitable for CAD/DCC and scientific applications. He collaborates with external partners and NVIDIA's internal teams to optimize current and future rendering algorithms. Prior to joining NVIDIA, Christoph was a researcher on hardware accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. Furthermore, he has worked as technical artist creating game art, technology and DCC plugin development.
Markus Tavenrath ( Senior Software Engineer Developer Technology, NVIDIA )
Markus   Tavenrath
Markus Tavenrath finished his studies in computer science with a focus on computer graphics in 2008. He was one of the first to use ray tracing on CUDA, for his diploma thesis, which brought him straight to NVIDIA. There he primarily worked on GPU ray tracing for SceniX, NVIDIA's scenegraph technology, which was showcased at SIGGRAPH 2008. Afterwards he applied his experience to implement parts of OptiX, improve SceniX and develop several ray tracing demos. In close cooperation with external partners, he improved rendering performance and scenegraph usability as a developer technology engineer. Now he is using this knowledge to experiment with future rendering technologies that bring high interactivity to complex scenes. This work includes both CPU and GPU strategies for solving typical scenegraph operations related to rendering.

OpenGL 4.4 provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of the features and their effect on real-life models. Furthermore we will showcase how more work for rendering a scene can be off-loaded to the GPU, such as efficient occlusion culling or matrix calculations.
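
One example of the feature set in question is multi-draw indirect (core since GL 4.3), which lets a GPU-resident command buffer, possibly written by a culling pass, drive the submission of an entire scene; the sketch below is illustrative rather than taken from the talk.

    typedef struct {
        GLuint count;            // indices per draw
        GLuint instanceCount;
        GLuint firstIndex;
        GLint  baseVertex;
        GLuint baseInstance;
    } DrawElementsIndirectCommand;

    // cmdBuf holds one command per object and may be filled on the GPU,
    // so the CPU issues the whole scene with a single call:
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, cmdBuf);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, 0, numObjects, 0);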

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Performance Optimization; Media & Entertainment

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room 210C

S4414 - High-Resolution Facial Performance Capture Using CUDA

Jerome Courchay ( Research Engineer, Telecom SudParis )
Jerome Courchay
Jérôme Courchay was born in France in 1977. He received a Ph.D. in computer vision in 2010. Jérôme works in the areas of stereovision and spatio-temporal reconstruction.

Learn how to use the GPU to accelerate 3D registration with a Kinect or similar device in order to capture highly detailed facial performance in real time or at interactive speeds. We describe the energy-based approach that we borrowed from the paper by Hao Li et al. published at SGP 2008. We also explain why we can benefit from GPU computation power and achieve higher quality and more detail at interactive speeds. Finally, we elaborate on how real-time performance can be achieved by improving our CUDA-based implementation.

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Real-Time Graphics Applications; Game Development; Virtual & Augmented Reality

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 212B

S4443 - Towards Real-Time Nanostructure Prediction with GPUs

Abhinav Sarje ( Postdoctoral Researcher, Lawrence Berkeley National Laboratory )
Abhinav Sarje is presently a postdoctoral researcher in the Future Technologies Group at the Lawrence Berkeley National Lab. His research interests are in parallel algorithms and computing, high-performance scientific computing, algorithms and applications on emerging parallel architectures including multi/many core CPUs and GPUs. He completed his doctoral studies in computer engineering at Iowa State University.

Nanostructure prediction at synchrotron light sources through X-ray scattering requires compute-intensive analysis of massive amounts of data, making it an ideal example of a Big Compute and Big Data application. In this session you will learn how hybrid computing with NVIDIA GPUs and multi-core CPUs is making faster-than-ever data analysis at light sources possible. Two major components of such analyses will be covered: (1) forward simulations, and (2) inverse modeling. Software tools developed at Berkeley Lab for this purpose will serve as case studies. Implementation details and code-optimization strategies for these software tools on massively parallel GPU clusters will be given, along with a performance study on state-of-the-art supercomputers.

Session Level: All
Session Type: Talk
Tags: Supercomputing; Computational Physics; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room LL21A

S4460 - Peer-to-Peer Molecular Dynamics and You

Scott LeGrand ( Principal Engineer, Amazon Web Services )
Highly-Rated Speaker
Scott LeGrand is currently a principal engineer at Amazon Web Services. He developed the first molecular modeling system for home computers, Genesis, in 1987; Folderol, the distributed computing project targeted at the protein folding problem, in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar, the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts; it currently accounts for ~2.6 petaFLOPS of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is developing life science services on Amazon's Elastic Compute Cloud (EC2).

Recent code optimization within AMBER has improved single-node performance by up to 30% and multi-GPU scaling by up to 70%. The latter was achieved by aggressive use of peer-to-peer copies and RDMA. This has unleashed new time-scale regimes for sampling and simulation on low-end GPU clusters, beating every known software-based molecular dynamics codebase in existence at the time of submission. This talk will cover, first, how AMBER's already efficient single-node performance was made even more so; next, the challenge not only of enabling peer-to-peer copies between GPUs but also of obtaining hardware capable of supporting them; and finally, up-to-the-minute results using MVAPICH2 and OpenMPI for RDMA directly between GPUs on separate nodes connected by dual-rail FDR InfiniBand.
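
As background, a minimal standalone sketch (ours, not AMBER's code) of the peer-to-peer mechanism the multi-GPU scaling relies on, copying a buffer directly between devices 0 and 1:

    #include <cstdio>
    #include <cuda_runtime.h>

    int main()
    {
        int can01 = 0, can10 = 0;
        cudaDeviceCanAccessPeer(&can01, 0, 1);
        cudaDeviceCanAccessPeer(&can10, 1, 0);
        if (!can01 || !can10) {
            printf("P2P not supported between devices 0 and 1\n");
            return 0;
        }

        const size_t bytes = 1 << 20;
        float *buf0, *buf1;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0); // flags argument must be 0
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMalloc(&buf1, bytes);

        // Direct GPU-to-GPU transfer, no staging through host memory.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }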

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms; Supercomputing

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room LL21E

S4506 - Indexing Documents on GPU - Can You Index Web in Real Time?

Michael Frumkin ( Sr. Computer Architect, NVIDIA )
Michael Frumkin
Michael has contributed to many areas of parallel and high-performance computing: a parallel image inspection algorithm (KLA-Tencor), benchmarking of parallel and distributed applications (NASA), performance modeling of parallel applications on many-core systems (Intel), and search-engine optimization and the implementation of control mechanisms in software-defined networks (Google).

An index of web documents provides the basis for search and decision making. Traditionally, GPUs are used to run applications that have a lot of parallelism and a small degree of divergence. We show that GPUs are also able to outperform CPUs on an application that has a large degree of parallelism but medium divergence. Specifically, we concentrate on the text processing used to index web documents. We present indexing algorithms for both GPU and CPU and show that the GPU outperforms the CPU on two common workloads. We argue that a medium-sized GPU-enabled cluster would be able to index all internet documents in one day. Indexing web documents on the GPU opens a new area for GPU computing. Companies that provide search services spend a lot of cycles on indexing. Faster and more energy-efficient indexing on the GPU may provide a valuable alternative to the CPU-only clusters used today.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210B

S4528 - Enabling Efficient Use of UPC and OpenSHMEM PGAS Models on GPU Clusters

Dhabaleswar K. (DK) Panda ( Professor, The Ohio State University )
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 300 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group (http://mvapich.cse.ohio-state.edu), is currently being used by more than 1,960 organizations worldwide (in 70 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 130,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) Models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop applications with dynamic communication patterns. However, the existing UPC and OpenSHMEM standards do not allow communication calls to be made directly on GPU device memory. Data has to be moved to the CPU before PGAS models can be used for communication. This talk discusses simple extensions to the OpenSHMEM and UPC models that address this issue. They allow direct communication from GPU memory and enable runtimes to optimize data movement using features like CUDA IPC and GPUDirect RDMA, in a way that is transparent to the application developer. We present designs which focus on performance and truly one-sided communication. We use application kernels to demonstrate the use of the extensions and performance impact of the runtime designs, on clusters with GPUs.

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room LL20D

S4571 - Applications of GPU Computing to Mission Design and Satellite Operations at NASA's Goddard Space Flight Center

Abel Brown ( Principal Systems Engineer, A.I. Solutions )
Abel Brown
Abel holds degrees in Mathematics and Physics as well as a PhD in the field of Geodesy & Geophysics from The Ohio State University. For the past eight years Abel has been developing distributed software frameworks and administering high-performance computing clusters. He has deployed and managed many sensor networks around the world in Antarctica, South America, and Greenland. Abel is dually appointed to the Magnetospheric Multiscale (MMS) Ground System and Conjunction Assessment development teams and manages numerous research projects at a.i. solutions on GPU computing, image analytics, and advanced satellite perturbation techniques. As a co-author, Abel recently contributed to the PNAS publication "Greenland Rising", which was featured in WIRED Magazine's "Best Scientific Figures of 2012".

The computational intensity required for modern-day space missions is quickly outgrowing existing CPU capabilities. The Magnetospheric Multiscale (MMS) mission is the first NASA mission to fly four satellites in formation, and thus has uniquely challenging design and operational requirements, namely mitigation of collision scenarios involving space debris and/or the formation with itself. By design, no more than 1 in 1000 unsafe close approaches may go undetected, while operationally no more than 1 in 20 alarms raised may be false, so as to minimize science interruptions. The confidence intervals required to satisfy such requirements pose daunting computational demands which, operationally, cannot be met using traditional CPU solutions. Here it is demonstrated how GPU-accelerated solutions are being deployed, for the first time, at the NASA Goddard Space Flight Center (GSFC) to meet operational MMS mission requirements. Additional applications to Space Situational Awareness and mission design are discussed.

Session Level: All
Session Type: Talk
Tags: Defense; Numerical Algorithms & Libraries; Supercomputing; Scientific Visualization; Recommended for All Press

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210D

S4694 - 10 Billion Parameter Neural Networks in Your Basement

Adam Coates ( Post-doctoral Researcher, Stanford University )
Adam Coates
Adam Coates received his Ph.D. in Computer Science from Stanford University in 2012. He is currently a post-doctoral researcher at Stanford and a Visiting Scholar at Indiana University, Bloomington. His research focuses on scaling up deep learning algorithms to enable machines to make more accurate and complex predictions from learned experience. His interests and prior work touch topics in computer vision, reinforcement learning, and robotics.

See how a cluster of GPUs has enabled our research group to train Artificial Neural Networks with more than 10 billion connections. "Deep learning" algorithms, driven by bigger datasets and the ability to train larger networks, have led to advancements in diverse applications including computer vision, speech recognition, and natural language processing. After a brief introduction to deep learning, we will show how neural network training fits into our GPU computing environment and how this enables us to duplicate deep learning results that previously required thousands of CPU cores.

Session Level: Beginner
Session Type: Talk
Tags: Machine Learning & AI; Computer Vision; Supercomputing; Recommended Press Session – HPC-Science; Recommended for All Press

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room LL21B

S4696 - Maximizing TTI RTM Throughput with Kepler

David Wade ( Senior Analyst, Seismic Imaging Development, Statoil )
David Wade
David Wade is an HPC developer who has specialized in seismic imaging since 2011. He graduated with a Master's degree in Physics from the University of Durham in 2009.

Reverse Time Migration (RTM) is a key algorithm for exploration in the oil & gas industry. We present an end-to-end implementation of TTI RTM running on the latest generation of Kepler GPUs. Further, we discuss the benefits and remaining challenges of achieving maximum throughput with this algorithm.
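
For orientation, the computational core of any RTM code is a finite-difference wave-propagation stencil; a deliberately simplified 2D, second-order acoustic sketch (ours, not Statoil's TTI kernel) looks like this:

    __global__ void waveStep(const float* __restrict__ prev,
                             const float* __restrict__ curr,
                             float* __restrict__ next,
                             const float* __restrict__ vel2, // v^2 * dt^2 / h^2, precomputed
                             int nx, int nz)
    {
        int ix = blockIdx.x * blockDim.x + threadIdx.x;
        int iz = blockIdx.y * blockDim.y + threadIdx.y;
        if (ix < 1 || ix >= nx - 1 || iz < 1 || iz >= nz - 1) return;

        int i = iz * nx + ix;
        // 5-point Laplacian; production TTI kernels use long anisotropic
        // stencils plus shared-memory/register tiling for throughput.
        float lap = curr[i - 1] + curr[i + 1] + curr[i - nx] + curr[i + nx]
                  - 4.0f * curr[i];
        next[i] = 2.0f * curr[i] - prev[i] + vel2[i] * lap;
    }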

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Performance Optimization

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room LL20B

S4743 - First In Vivo Medical Images Using Photon-Counting, Real-Time GPU Reconstruction

Augustus Lowell ( VP of Engineering Technology, Triple Ring Technologies )
Augustus Lowell
Gus Lowell has over 20 years of experience in systems and software engineering, product specification development, and product planning, with involvement in the medical device field, semiconductor processing, and internet servers. His background includes expertise in pipelined data processing, image and signal processing algorithm development, distributed processing, embedded systems, real-time and event-driven systems, fault-detection and safety-critical systems, and structured analysis and object-oriented design. He has designed high-speed digital imaging processors and systems, as well as analog/digital, data-acquisition, machine-control, and sensor interfaces. He is skilled in coding in a number of assembly languages as well as C++/C/Pascal/Fortran. Gus has held senior engineering and project management positions at both Fortune 100 and start-up companies, including Abbott Laboratories, Tetris Systems, NexRay, and in the military space program for the United States Air Force. Gus holds three patents and has authored a number of scientific papers. He received his BS in Electrical Engineering from the Massachusetts Institute of Technology.

Triple Ring Technologies has worked on several generations of a cardiology imaging system. The unique x-ray imaging chain allows for up to a 20x reduction in radiation exposure and 3D localization. Each generational improvement in image quality required a 10x or greater increase in the number of computations needed to process the images. With sample rates of nearly 1 Msps and high-density detectors comprising over 200,000 elements, the latest-generation system generates 160 billion samples per second and processes them into real-time images useful to a cardiologist. Historically, the processing elements used to achieve the required computation rates were built from pipelined parallel processing stages in state-of-the-art FPGAs, or from exotic massively parallel processor arrays. The latest generation of NVIDIA GPUs has changed this. We have recently implemented a novel image processor using an array of nine GPUs. We will show the first cardiac imaging study using this approach.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; Medical Imaging & Visualization; Clusters & GPU Management; Computational Physics

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room LL21F

S4816 - Real-Time Fluid Dynamics

Maciej Matyka
Maciej Matyka
A scientist who has worked for many years on fluid flow through porous media and on fluid flow in computer graphics. He has implemented and experimented with many solvers for the Navier-Stokes equations, e.g. the marker-and-cell method, the lattice Boltzmann method and smoothed particle hydrodynamics, and used these techniques in demos: the tree effect in Crush (1st-place demo at Breakpoint), the river flow in Bremen (3rd-place intro at Breakpoint) and the main theme in Pord Tuo (3rd-place demo at Revision).

Simulating and rendering fluid flow at interactive frame rates is a challenging task. It requires algorithmic optimization, compromises on physics, and a proper choice of numerical model. I will discuss these points with a focus on two techniques useful in games and demos: animating aerodynamics with simplified Navier-Stokes equations, and the lattice Boltzmann method. Recent projects based on compute shaders will be presented.
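
As one concrete building block, a minimal sketch of a Jacobi iteration for the pressure Poisson equation, the kind of per-cell kernel such solvers are built from (written in CUDA here for concreteness; the projects mentioned use compute shaders, and div is assumed pre-scaled by the squared cell size):

    __global__ void jacobiPressure(const float* __restrict__ pIn,
                                   const float* __restrict__ div,
                                   float* __restrict__ pOut,
                                   int nx, int ny)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < 1 || x >= nx - 1 || y < 1 || y >= ny - 1) return;

        int i = y * nx + x;
        // New pressure = average of the four neighbors minus divergence.
        pOut[i] = 0.25f * (pIn[i - 1] + pIn[i + 1] +
                           pIn[i - nx] + pIn[i + nx] - div[i]);
    }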

Session Level: Advanced
Session Type: Talk
Tags: Real-Time Graphics Applications; Visual Effects & Simulation; Performance Optimization; NVScene

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room 230C

S4831 - GAIA: The GPU-Accelerated Distributed Database and Computational Framework Solving the Infinite Data Problem

Nima Negahban ( CTO, GIS Federal )
Nima has developed enterprise-scale platforms and leading technologies across a wide spectrum of market sectors, ranging from biotechnology to high-speed trading systems. Beyond cutting-edge technical expertise, Nima has a proven track record of leveraging his technical prowess to create compelling business strategies. Nima holds a BS in computer science from the University of Maryland.

GAIA is a distributed database and computational framework designed to leverage the GPU. GAIA's unique semantic type system, coupled with its near real-time processing, query, and visualization capabilities, has made it the solution for government agencies coming to grips with querying and visualizing high-volume data streams. GAIA has been deployed to multiple government agencies, including the Army, Navy, and DHS.

Session Level: All
Session Type: Talk
Tags: Defense; Supercomputing; Big Data Analytics & Data Algorithms; Cloud Visualization

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210A

S4848 - Project Tango: Giving Mobile Devices a Human-Scale Understanding of Space and Motion

Johnny Lee ( Technical Program Lead, Google )
Johnny Lee is a Technical Program Lead at the Advanced Technology and Projects (ATAP) group at Google. He leads Project Tango, which is a focused effort to bring computer vision and advanced sensor fusion to mobile platforms. Previously, he helped Google X explore new projects as Rapid Evaluator and was a core algorithms contributor to the original Xbox Kinect. His YouTube videos demonstrating Wii remote hacks have surpassed 15 million views, and his TED talk on them became one of the most popular TED talk videos. In 2008, he received his PhD in Human-Computer Interaction from Carnegie Mellon University and has been recognized in MIT Technology Review’s TR35.

Project Tango is a focused effort to harvest research from the last decade of work in computer vision and robotics and concentrate that technology into a mobile platform. It uses computer vision and advanced sensor fusion to estimate the position and orientation of the device in real time, while simultaneously generating a 3D map of the environment. We will discuss the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also show demonstrations of how the technology could be used in both gaming and non-gaming applications. This is just the beginning and we hope you will join us on this journey. We believe it will be one worth taking.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Game Development; Virtual & Augmented Reality; Computer Vision; Recommended Press Session – Mobile

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210E

S4887 - Project Lighthouse: Final-Frame Rendering with a GPU

Mike Romey ( Head of Pipeline, ZOIC Studios )
Mike Romey
Romey is at the helm of ZOIC Studios’ pipeline, ensuring that all aspects of the facility’s operations are running smoothly. He oversees custom software and pipeline processes that span a broad spectrum of facility management, visual effects, and production disciplines as well as leading teams in crafting solution-based proprietary technologies. This in turn provides ZOIC with some of the most sophisticated pipeline workflow and data reporting tools in the industry.

Zoic Studios will discuss our joint efforts with Chaos Group and NVIDIA to fully realize a final-frame GPU rendering pipeline in production. This unique talk identifies the critical pinch points of an existing V-Ray pipeline and the level of performance contemporary GPUs must deliver to not only meet but greatly exceed the expectations of quick-turn episodic television production. Romey will include case-study statistics from Zoic's juggernaut ZEUS virtual production pipeline, which typically averages 300-400 shots per two-week cycle for some of today's most demanding visual effects TV shows, films, commercials and games. Discussion will include a candid evaluation of final-frame rendering on the GPU: its performance, deployment cost and effects on current and future production cycles. Additionally, a diary of technical events and challenges from this project will be discussed, as well as a roadmap for the future development needed to meet the insatiable demand for rendering visual effects.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Recommended Press Session – Media & Entertainment

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room 211A

S4903 - The State of the Industry: How GPU Technologies Are Set to Empower the VDI Experience

Gunnar Berger ( Research Director, Gartner )
Gunnar Berger
Gunnar Berger is a Research Director for the Gartner for Technical Professionals, Cloud and Virtualization Strategies team. He covers server and client virtualization and private/public delivered desktops (DaaS). Mr. Berger spends considerable time with end-user organizations, advising them on architecture and best practices for both server virtualization and desktop transformation initiatives. Mr. Berger has worked with client virtualization technologies since 1999 and is recognized as a thought leader in the end-user computing space. Mr. Berger has spent the majority of his career as a specialized consultant focused in what is now called end-user computing. He specializes in desktop virtualization, which includes DaaS, storage, networking, server, personalization, virtualization and GPU virtualization technologies.

Hear about the state of the VDI industry and how the GPU is set to transform the way enterprises see and use the virtual desktop. In this presentation attendees will: (1) learn about typical VDI use cases and how virtual technologies are changing the paradigm; (2) understand the different GPU technologies that exist for SBC and VDI workloads; and (3) see visual demonstrations of each graphics technology.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Recommended Press Session – Graphics Virtualization

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room 210F

S4935 - Digital Workflow In Automotive Design

Tyler Blake ( Design Manager, Ford Motor Company )
As Design Manager at Ford Motor Company, Tyler Blake has been instrumental in the designs of several award-winning concept cars and best-selling production vehicles such as the F150 and Edge. Most recently, Tyler played a key role in the exterior design of the 2015 Mustang. As a member of Ford's Strategic Concepts Group, he is at the forefront of digital workflow. NVIDIA graphics cards are used in every aspect of his design process.

This talk will provide an overview of digital workflow in an automotive design studio. Learn how contemporary car design studios combine an array of GPU-based digital tools with classical clay sculpting techniques to rapidly visualize ideas and deliver vehicles to market faster. Get insight into the design development process behind concept vehicles and production cars such as the Ford Mustang and F-150. Render farms, Autodesk Alias, Autodesk Maya, Autodesk Showcase, Adobe Photoshop, Adobe After Effects, Digidesign Pro Tools, Bunkspeed Drive, ICEM, and Catia are some of the NVIDIA-driven technologies to be discussed.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Computer Aided Design; Ray Tracing; Automotive

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210G

S4958 - Application Optimized GPU System Solutions: Winning Strategies for Selecting Best Platforms (Presented by Super Micro)

Don Clegg ( VP, Marketing and Business Development, Super Micro Computer, Inc )

Power Budget Challenges? Space Limitations? CAPEX and OPEX Constraints? Unmet Performance Expectations? Compressed Deployment Schedules? Server Management Interoperability? Serviceability? Density? Compatibility? What's YOUR biggest hardware challenge when deciding upon YOUR ideal GPU System Architecture?

Choosing the right hardware platform plays a tremendous role in the success of any GPU Solution endeavor. Supermicro has the broadest portfolio of GPU system building blocks in the industry. This session will give an overview of the many GPU System choices Supermicro provides to engineers and solutions architects. Strengths, optimizations, and competitive advantages within each of the Supermicro GPU product families will be reviewed so that the audience can gain a better understanding of how to select the best platform to meet their unique demands.

Session Level: All
Session Type: Talk
Tags: Desktop & Application Virtualization; Supercomputing; Performance Optimization

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room LL20C

S4226 - Array-Oriented Python on the GPU with Parakeet

Alexander Rubinsteyn ( Ph.D. Student, NYU )
Alex Rubinsteyn is a Computer Science Ph.D. student at NYU. His interests are a high variance mixture distribution around programming language implementation and machine learning.

Python is quickly becoming the "glue" language of choice for scientific and numerical computing. For performance-critical algorithms, however, programmers still have to offload computations into compiled code. Parakeet is a runtime compiler for a numerical subset of Python which lifts this productivity burden. Parakeet intercepts calls into array-oriented Python functions and transparently compiles them into CUDA implementations. Come learn how to use Parakeet to gain orders of magnitude performance improvements over Python/NumPy programs.

Session Level: Beginner
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room LL20D

S4228 - Fast N-body Methods as a Compute-Bound Preconditioner for Sparse Solvers on GPUs

Rio Yokota ( Research Scientist, KAUST )
Rio Yokota
Rio Yokota is a Research Scientist in the Strategic Initiative for Extreme Computing at KAUST. He currently works on fast multipole methods (FMM), and their implementation on large scale heterogeneous architectures. During his PhD, he worked on the implementation of fast multipole methods on special purpose machines such as MDGRAPE-3, and then on GPUs after CUDA was released. During his post-doc he continued to work on FMM, and was part of the team that won the Gordon Bell prize for price/performance in 2009 using 760 GPUs. During his postdoc with Lorena Barba at Boston University he developed an open source parallel FMM code -- ExaFMM. He is now running this code on full nodes of the TSUBAME 2.0 and K computer in Japan, and also on Mira, Titan, and Stampede in the US.

Learn how to unleash the full power of GPUs on one of the more difficult problems -- preconditioning in sparse solvers -- by using fast N-body methods as a preconditioner. Fast N-body methods have been able to achieve a high percentage of peak performance since the early days of GPU computing. However, their successful applications have been limited to astrophysics and molecular dynamics, where the physics itself is naturally described by a collection of discrete points. Mathematically, there is nothing that prevents the use of fast N-body methods as a solver for a more general class of PDEs. This would not have been a good idea back when flops were expensive, since it essentially turns the sparse matrix into a dense matrix of the same size before hierarchically grouping the off-diagonal blocks. But now that flops are becoming comparatively cheap, the notion of a "compute-bound preconditioner" sounds more attractive than ever. We will demonstrate how competitive such a preconditioner actually is on Kepler.
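
A minimal sketch of the idea, under the assumption (ours, not necessarily the speaker's exact formulation) that the off-diagonal coupling is approximated through a free-space Green's function G:

    (M^{-1} f)_i \;=\; \sum_{j=1}^{N} G(x_i, x_j)\, f_j

The sum is exactly an N-body evaluation, so the FMM applies this dense operator matrix-free in O(N) work inside each Krylov iteration, e.g. solving the right-preconditioned system A M^{-1} y = b and recovering x = M^{-1} y.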

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Physics

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room LL21D

S4319 - Power-Aware Software on ARM

Paul Fox ( Graduate Fellow, EM Photonics, Inc. )
Paul Fox has four years of experience in high-performance computing and numerical linear algebra. He earned his bachelor's degree in electrical engineering at Grove City College, and is pursuing a master's degree at the University of Delaware. He contributed to the CUDA GPU linear algebra library, and has done research in autotuning for hybrid computing as well as low-power optimizations. His research interests include hybrid and high-performance computing, modeling and scientific computing, and languages and compilers.

Learn how to optimize your software application with power-awareness to decrease the size, weight and power of the overall system. Advancements in processing technology have provided considerable gains in performance and power savings. The latest generation of mobile processors enables smartphones that can remain idle for days, or remain operable for an entire transcontinental flight under heavy use. These advancements have mainly been achieved with low-power-by-design approaches, which allow processors to consume less energy when not in use. Unfortunately, situations requiring persistent use, such as navigation, severely limit the benefits of existing designs. Come see how EM Photonics is optimizing software to be more "power-aware" to benefit soldiers in the field, and how these techniques may benefit your application.

Session Level: Advanced
Session Type: Talk
Tags: Defense; Mobile Applications

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 210A

S4381 - Real-Time 3D Pose Estimation of Hundreds of Objects

Karl Pauwels ( Postdoctoral Research Fellow, University of Granada, Spain )
Karl Pauwels
Dr. Karl Pauwels received the M.Sc. in Commercial Engineering, the M.Sc. in Artificial Intelligence, and the Ph.D. in Medical Sciences from the Katholieke Universiteit Leuven, Belgium. He is currently a Marie Curie postdoctoral research fellow at the Computer Architecture and Technology Department of the University of Granada, Spain. His main research interest is real-time computer vision in the context of autonomous navigation and dexterous manipulation of complex objects. He takes inspiration from biological vision to more easily exploit the parallelism provided by GPUs.

Discover how hundreds of objects can be simultaneously located and tracked in 3D through the real-time combination of visual simulation and visual perception. A tight integration of GPU graphics and compute has allowed us to continuously update a 3D scene model on the basis of dense visual cues, while at the same time feeding back information from this model to facilitate the cue estimation process itself. In this session we will describe (1) the low-level dense motion and stereo engine that can exploit such model feedback, (2) the 6DOF pose (location and orientation) estimation of hundreds of rigid objects at 40 Hz, and (3) how the same framework enables multi-camera and/or complex articulated object tracking. Throughout the session, we will pay special attention to implementation and system integration aspects of our real-time demonstrator system.

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Mobile Applications; Video & Image Processing; Machine Learning & AI

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 212B

S4462 - GPUs and Regular Expression Matching for Big Data Analytics

Alon Shalev Housfater ( Software Developer, IBM )
Alon Shalev Housfater has been a software engineer at IBM's Hardware Acceleration Laboratory since 2011. He specializes in applying computational acceleration technology to enterprise computing. Alon holds a PhD in Electrical and Computer Engineering from the University of Toronto, where he studied fundamental performance limits of broadcast communication systems.

Regular-expression-based pattern matching is a key enabling technology for a new generation of big data analytics. We'll describe several key use cases that require high-throughput, low-latency regular expression pattern matching. A new GPU-based regular expression technology will be introduced and its basic performance characteristics presented. We'll demonstrate that the GPU enables impressive performance gains in pattern matching tasks and compare its performance against latest-generation processors. Finally, we'll examine the key challenges in using such accelerators in large software products and highlight open problems in GPU implementations of pattern matching.
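
To make the approach concrete, a hedged sketch of one common GPU matching scheme (ours; not necessarily IBM's engine): the expression is compiled to a DFA table on the host, and each thread then streams one input record through the table:

    __global__ void dfaMatch(const unsigned char* __restrict__ data,
                             const int* __restrict__ offsets,   // record boundaries, length numRecords+1
                             const int* __restrict__ dfa,       // next-state table, numStates x 256
                             const bool* __restrict__ accepting,
                             int* __restrict__ matched,
                             int numRecords)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r >= numRecords) return;

        int state = 0; // start state
        for (int i = offsets[r]; i < offsets[r + 1]; ++i)
            state = dfa[state * 256 + data[i]];

        matched[r] = accepting[state] ? 1 : 0;
    }

Divergence here comes only from unequal record lengths, which is one reason this irregular-looking workload can still suit the GPU.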

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 210B

S4632 - Exploring the Earth in 3D: Multiple GPUs for Accelerating Inverse Imaging

Chris Leader ( Graduate Student, Department of Geophysics, Stanford )
Chris Leader is currently a 5th-year student in Stanford Earth Sciences working on both acquisition and HPC applications for seismic exploration. He has also worked on these topics at Shell International in Houston and Total in Pau, France. He holds an MSc from Imperial College London and a BA from Oxford University.

Discover how we can harness the power of multiple GPUs to explore the Earth with seismic data. A wave-equation-based inversion process is used to turn these data into a high-fidelity image; for contemporary datasets, however, this requires around 10^18 operations, if not more. GPUs can ease this computational bottleneck, but they introduce two further limiting factors: exacerbated disk accesses and global memory limitations. These can be addressed by manipulating the domain boundaries and by decomposing the problem across multiple GPUs. We will show you how we can create detailed seismic images without these traditional restrictions.

Session Level: Beginner
Session Type: Talk
Tags: Energy Exploration; Computational Physics; Scientific Visualization; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room LL20B

S4642 - Large Scale Visualization at Boeing

Christopher Senesac ( Senior Process Architect, Boeing )
Christopher Senesac
Christopher Senesac joined The Boeing Company in 1993. He is currently a Senior Process Architect for Engineering Systems Integration Process and Tools at Boeing South Carolina responsible for coordinating the development, deployment and support of visualization tools and processes for Boeing South Carolina. He has held that position for the past 3 years and has 20 years of experience implementing visualization inside Boeing. He also led the architecture and implementation of Visualization at Boeing Philadelphia Rotorcraft prior to his current assignment. Chris brings more than 32 years of experience to the manufacturing and visualization arena.

Boeing designs and builds complex aerospace products that challenge conventional notions of 'scale'. Scale problems occur throughout the product life cycle. Sales and marketing work with unit costs in the hundreds of millions, designers and engineers explore hundreds of alternatives from dozens of different viewpoints, manufacturing builds and assembles products with millions of parts, and support must be able to provide spares and repair expertise for a product with a 50+ year life span. This session explores the application of large-scale visualization across the product life cycle as enabled by GPU technology.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Large Scale Data Visualization & In-Situ Graphics; Manufacturing; Recommended Press Session – Digital Manufacturing; Recommended for All Press

Day: Tuesday, 03/25
Time: 13:30 - 14:20
Location: Room 210H

S4670 - The Operational Impact of GPUs on ORNL's Cray XK7 Titan

Jim Rogers ( Director of Operations, National Center for Computational Sciences, Oak Ridge National Laboratory )
Jim Rogers is the Director of Operations for the National Center for Computational Sciences at Oak Ridge National Laboratory. The NCCS provides full facility and operations support for three petaFLOP-scale systems including Titan, a 27PF Cray XK7. Jim has a BS in Computer Engineering, and has worked in high performance computing systems acquisition, integration, and operation for more than 25 years.

With a peak computational capacity of more than 27PF, Oak Ridge National Lab's Cray XK7, Titan, is currently the largest computing resource available to the US Department of Energy. Titan contains 18,688 individual compute nodes, where each node pairs one commodity x86 processor with a single NVIDIA Kepler GPU. When compared to a typical multicore solution, the ability to offload substantive amounts of work to the GPUs provides benefits with significant operational impacts. Case studies show time-to-solution and energy-to-solution that are frequently more than 5 times more efficient than the non-GPU-enabled case. The need to understand how effectively the Kepler GPUs are being used by these applications is augmented by changes to the Kepler device driver and the Cray Resource Utilization software, which now provide a mechanism for reporting valuable GPU usage metrics for scheduled work and memory use, on a per job basis.

Session Level: All
Session Type: Talk
Tags: Supercomputing; Performance Optimization; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room LL21A

S4675 - Embedding CUDA

Dustin Franklin ( GPGPU Applications Engineer, GE Intelligent Platforms )
Highly-Rated Speaker
Dustin is a GPU expert in the defense & aerospace industry. Originally a 3D rendering architect for games & simulations, he changed focus in 2005 to GPGPU. Dustin has years of experience in deploying high-performance CUDA applications onto rugged platforms like UAVs, tanks, and attack helicopters. Currently he works for GE as a GPGPU Applications Engineer and lives near Washington DC.

Rugged GPUs are bringing leading-edge performance and mission-critical reliability to platforms with harsh operating environments. Follow advances in GPU technology that unlock real-time CUDA capabilities for low-latency GPU applications. Learn how to architect systems with GPUDirect and 3rd-party IO devices and interconnects for efficient data streaming and increased scalability. Tune your CUDA kernels and control logic for low-latency asynchronous behavior with response times down into the microseconds. Explore embedded GPU applications in signal processing, imaging, avionics, vetronics, and shipboard systems.

Session Level: All
Session Type: Talk
Tags: Defense; Signal & Audio Processing; Video & Image Processing; Real-Time Graphics Applications

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 210D

S4876 - From Play to Presence (Presented by Unity)

Paul Tham ( Lead Developer , Unity Technologies )
Paul Tham is a Lead Developer at Unity Technologies specializing in non-game features aimed at the professional market. At Unity, he is the developer of the Cluster Rendering feature that allows Unity to render in sync across multiple machines and multiple screens, which allows Unity to be used in VR domes and cubes. He is also the development lead for a group of programmers responsible for features such as CAD and GIS data interoperability, multi-monitor support and other web-related technologies. Before Unity, Paul worked at LucasArts and Ubisoft developing highly scalable backend features for AAA online games.

Unity launched in 2005 as a 3D games authoring tool for the Mac. Nine years later, enabled by the ubiquity of GPU technologies, with over 2.5 million registered developers and 350 million Unity Web Player installs, Unity powers thousands of games on mobile devices, web browsers and desktop OSs. However, Unity is not just about games. Unity is increasingly used by manufacturing companies to engage, educate and explain new products and to visualize new developments. Porsche and Lego use Unity to power their online 3D configurators, NASA used Unity to educate the populace about the Mars rover, and the US Army uses Unity to train for maintenance on the Apache helicopter. Siemens is using Unity to teach maintenance engineers how to maintain renewable energy solutions such as wind farms, while research centers like the Vienna University of Technology are using Unity to create advanced prostheses. In the AEC industries, companies such as Arch Virtual are pioneering the successful use of real-time visualization in architecture for prestigious projects such as the $85 million Rutgers University School of Business development. This talk uses industry case studies to illustrate Unity's journey from creating entertainment products to enabling designers and researchers to create and innovate new product experiences.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Recommended Press Session – Digital Manufacturing

Day: Tuesday, 03/25
Time: 13:30 - 14:20
Location: Room 210G

S4904 - Mobile Depth Sensing Methods and What They're Good For

Gur Arie Bittan ( CTO, Mantis Vision )
Gur Arie Bittan
Widely recognized as a visionary of the digital video market, Gur was immersed in video from an early age. At 12 he completed his first system integration project and quickly developed into one of Israel’s leading experts on video integration solutions. The leap from 2D to 3D was a natural progression, as he expanded his activities from supporting existing systems to developing new ones. Prior to co-founding the company, he served as the VP of Business Development at DVDemand, and as the Technical Manager for Primus D&A.

An introduction to the main mobile depth sensing technologies and an understanding of their limitations and value to different kinds of mobile apps (gesture, AR, modeling, content creation). In this session we will give an overview of the inherent differences between time-of-flight depth sensing (SoftKinetic), passive triangulation (multi-aperture stereo, shape from motion) and active triangulation/coded-light methods (Mantis Vision, PrimeSense), and in particular how their performance differences affect different kinds of mobile apps.

Session Level: All
Session Type: Talk
Tags: Mobile Summit

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 210E

S4213 - Kokkos: A Manycore Device Performance Portability Library for C++ HPC Applications

H. Carter Edwards ( Principal Member of Technical Staff, Sandia National Laboratories )
Highly-Rated Speaker
H. Carter Edwards
H. Carter Edwards: 1979-1982, BS Aerospace Engineering, University of Texas at Austin. 1982-1991, LinCom Corporation, contractor at Johnson Space Center, Houston TX. 1991-1993, MS Aerospace Engineering, University of Texas at Austin. 1993-1997, PhD Computational and Applied Mathematics University of Texas at Austin. 1998-current, Sandia National Laboratories.

Discover how the Kokkos library enables you to develop HPC scientific applications that are performance portable across disparate manycore devices such as NVIDIA Kepler and Intel Xeon Phi. Portable programming models such as OpenMP, OpenACC, OpenCL, and Thrust focus on parallel execution but fail to address the memory access patterns critical for achieving best performance. Thus codes must be extensively rewritten to meet device-specific memory access pattern requirements; e.g., data structures and loops transformed from array-of-structures patterns to structure-of-arrays patterns. We address this issue by integrating compile-time polymorphic data layout with parallel execution. We will present manycore performance portability of the LAMMPS molecular dynamics code and Trilinos/Tpetra linear solvers implemented with MPI+Kokkos, run on clusters with Intel Xeon Phi and NVIDIA Kepler devices.
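
The layout issue is easy to see in plain CUDA (an illustration of the problem, not the Kokkos API): with an array-of-structures, consecutive threads read addresses 12 bytes apart, while the structure-of-arrays form gives fully coalesced loads; Kokkos picks the appropriate layout per device at compile time:

    struct ParticleAoS { float x, y, z; };  // CPU-friendly layout

    __global__ void scaleAoS(ParticleAoS* p, float s, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { p[i].x *= s; p[i].y *= s; p[i].z *= s; } // strided accesses
    }

    __global__ void scaleSoA(float* x, float* y, float* z, float s, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) { x[i] *= s; y[i] *= s; z[i] *= s; }       // coalesced accesses
    }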

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Numerical Algorithms & Libraries; Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room LL21A

S4287 - High Performance Numerical Algorithms for Seismic and Reservoir Simulations

Hatem Ltaief ( Research Scientist, KAUST )
Hatem Ltaief
Hatem received his PhD in Computer Science from the University of Houston in 2007. He was a Research Scientist in the Innovative Computing Laboratory in the EECS Department at the University of Tennessee, Knoxville until 2010. He joined KAUST, Saudi Arabia, in January 2011 and is currently a Computational Scientist at the Supercomputing Laboratory. He was the recipient of an NVIDIA CUDA Research Center Award in 2012. He is part of the European Exascale Software Initiative (EESI) to build a European vision and roadmap to address the challenges of the new generation of massively parallel systems. His research interests include parallel numerical algorithms, fault-tolerant algorithms, parallel programming models and performance optimization for multicore architectures and hardware accelerators.
Rio Yokota ( Research Scientist, KAUST )
Rio Yokota
Rio Yokota is a Research Scientist in the Strategic Initiative for Extreme Computing at KAUST. He currently works on fast multipole methods (FMM), and their implementation on large scale heterogeneous architectures. During his PhD, he worked on the implementation of fast multipole methods on special purpose machines such as MDGRAPE-3, and then on GPUs after CUDA was released. During his post-doc he continued to work on FMM, and was part of the team that won the Gordon Bell prize for price/performance in 2009 using 760 GPUs. During his postdoc with Lorena Barba at Boston University he developed an open source parallel FMM code -- ExaFMM. He is now running this code on full nodes of the TSUBAME 2.0 and K computer in Japan, and also on Mira, Titan, and Stampede in the US.

Learn how to leverage current numerical algorithms for solving challenging reservoir and seismic simulation problems on GPUs using: (1) a novel preconditioning technique based on massively parallel, compute-intensive fast N-body methods; (2) an optimized implementation of the sparse matrix-vector multiplication used during the iterative solver phase, which exploits the existing structure of the sparse matrix; and (3) a synchronization-reducing algorithm for stencil-based computation during explicit time integration.
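
For reference, the baseline that component (2) optimizes is the CSR sparse matrix-vector product; a generic one-row-per-thread CUDA version (ours, not the authors' structure-exploiting kernel) is:

    __global__ void spmvCsr(int nRows,
                            const int* __restrict__ rowPtr,
                            const int* __restrict__ colIdx,
                            const float* __restrict__ vals,
                            const float* __restrict__ x,
                            float* __restrict__ y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nRows) return;

        float sum = 0.0f;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            sum += vals[j] * x[colIdx[j]];
        y[row] = sum;
    }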

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL20B

S4299 - Fast Solvers for Linear Systems on the GPU

Cornelis Vuik ( Professor, Delft University of Technology )
Cornelis Vuik
Cornelis received his Master's in Applied Mathematics from TU Delft and his Ph.D. from Utrecht University. Since 2010, Cornelis has served as an Associate Editor of the SIAM Journal on Scientific Computing (SISC). Cornelis has authored more than 130 ISI papers, is the co-investigator of an Erasmus Mundus Master program, and is Director of the Delft Centre for Computational Science and Engineering and Scientific Director of the 3TU.AMI Applied Mathematics Institute.

Some examples are given of solving large linear systems coming from practical/industrial applications. The methods are based on preconditioned Krylov subspace methods. Most building blocks are easily implemented on the GPU; the most involved operation is the preconditioner. In this talk three variants are discussed: (1) Neumann series, (2) deflation techniques, and (3) recursive red-black ordering. The methods are applied to multi-phase flow and a ship simulator application and show speedups of a factor of 30-40.
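
As a rough sketch of variant (1), with notation ours: writing D = diag(A), a truncated Neumann series approximates the inverse as

    M^{-1} \;=\; \sum_{k=0}^{m} \left(I - D^{-1} A\right)^{k} D^{-1} \;\approx\; A^{-1},

so applying the preconditioner costs only m sparse matrix-vector products and vector updates, exactly the building blocks that map well onto the GPU.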

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Fluid Dynamics; Supercomputing; Manufacturing

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL21D

S4320 - Packet-based Network Traffic Monitoring & Analysis with GPUs

Wenji Wu ( Network Researcher, Fermilab )
Wenji Wu
Wenji Wu received his doctorate in computer engineering from the University of Arizona, Tucson, in 2003. He is currently a network researcher at Fermi National Accelerator Laboratory. His research interests include high-performance networking, operating systems, and distributed systems.

In high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring & analysis system, which analyzes network traffic on a per-packet basis. We implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis. The library consists of various CUDA kernels, which can be combined in various ways to perform monitoring and analysis tasks. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability. Multiple examples will be given to demonstrate how to use GPUs to analyze network traffic.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Computational Physics; Supercomputing; Numerical Algorithms & Libraries; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room 210B

S4401 - Real-Time Affine-Invariant Feature Extraction: Object Recognition Under Extreme Viewpoint Change

Valeriu Codreanu ( Postdoctoral Researcher, Eindhoven University of Technology )
Valeriu Codreanu
Valeriu Codreanu is a postdoctoral researcher at the Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen. Before joining the team in Groningen, Valeriu received his PhD in Electrical Engineering from the Polytechnic University of Bucharest in 2011 with a thesis proposing efficient cooperation between multi-threaded and vector processors. His general research interests lie in the field of energy-efficient computing systems, ranging from theory to architecture design and to programming such systems. His current interests revolve around software techniques to make efficient use of CPU-GPU systems and automatic ways of generating high quality parallel code, with the goal of making parallel programming easier.

Learn how to efficiently design affine-invariant feature extractors using GPU hardware for the purpose of robust object recognition. Local feature extraction from images is one of the main topics in pattern matching and computer vision in general. Some of the best feature extractors such as SIFT and SURF are scale, rotation, and translation invariant, but fall short when illumination and viewpoint change are taken into account. To increase the viewpoint-invariance of SIFT, the fully affine-invariant ASIFT was developed, but this came with a very high computational cost. We present results from using our simple image transformation framework to achieve real-time affine-invariant object recognition, while also being scalable in terms of the number of GPU devices used. Participants in this session will learn more about this high-performance CUDA solution for adding viewpoint-invariance to any feature extractor, relying on the hardware features of modern GPU devices.
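
A sketch of the transformation stage under our simplifying assumptions (single-channel float image, inverse affine map supplied by the host; not the authors' code); a feature extractor such as SIFT would then run on each warped image:

    __global__ void affineWarp(const float* __restrict__ src, int sw, int sh,
                               float* __restrict__ dst, int dw, int dh,
                               float a00, float a01, float a10, float a11,
                               float tx, float ty)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= dw || y >= dh) return;

        // Inverse-map the destination pixel into the source image.
        float sx = a00 * x + a01 * y + tx;
        float sy = a10 * x + a11 * y + ty;
        int ix = (int)sx, iy = (int)sy;

        float v = 0.0f;
        if (ix >= 0 && ix < sw - 1 && iy >= 0 && iy < sh - 1) {
            float fx = sx - ix, fy = sy - iy;   // bilinear interpolation
            v = (1 - fx) * (1 - fy) * src[iy * sw + ix]
              + fx * (1 - fy)       * src[iy * sw + ix + 1]
              + (1 - fx) * fy       * src[(iy + 1) * sw + ix]
              + fx * fy             * src[(iy + 1) * sw + ix + 1];
        }
        dst[y * dw + x] = v;
    }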

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Video & Image Processing

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room 212B

S4421 - GPU Computing with MATLAB

Andy Thé ( Sr. Product Marketing Manager - Image Processing , MathWorks )
Andy Thé
Andy holds a B.S. in Electrical Engineering from Georgia Institute of Technology and a B.A. in Business from Kennesaw State University. Before joining MathWorks, Andy spent 12 years as a field applications engineer focused on embedded processors at Texas Instruments, and 3 years as a software marketing manager for real-time software at IntervalZero.

Learn how to use NVIDIA GPUs to accelerate computationally intensive MATLAB applications in areas such as image processing, signal processing, and computational finance. We will use an image processing example to demonstrate how you can speed up your MATLAB code by using built-in GPU enabled functionality or by replacing key computations with CUDA kernels. We will also illustrate how MATLAB can be used as a development environment and test framework for CUDA kernel evaluation, visualization, and validation.

Session Level: Beginner
Session Type: Talk
Tags: Programming Languages & Compilers; Video & Image Processing; Medical Imaging & Visualization

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room LL21F

S4438 - What's new in OpenACC 2.0 and OpenMP 4.0

Jeff Larkin ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jeff Larkin
Jeff Larkin is a software engineer in NVIDIA's Developer Technology group, where he helps developers profile and optimize scientific applications. Prior to joining NVIDIA, Jeff worked as a performance engineer at Cray Inc. Jeff represents NVIDIA to both the OpenACC and OpenMP organizations.

In 2013, both OpenACC and OpenMP released significant updates to their respective standards to better support GPUs. This talk will discuss what's new in each standard and how these features simplify GPU programming.
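
For a flavor of the two directive styles, a minimal illustrative SAXPY (ours, not from the talk), written once with OpenACC 2.0 and once with OpenMP 4.0 target directives:

    void saxpy_acc(int n, float a, const float* x, float* y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    void saxpy_omp(int n, float a, const float* x, float* y)
    {
        #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }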

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL20D

S4481 - High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast Video

Peter Walsh ( Chief Emerging Technology Engineer, ESPN )
Peter Walsh
Peter Walsh is Chief Emerging Technology Engineer in the Emerging Technology Department at ESPN. His primary focus is the development of systems for the enhancement of broadcast video through the insertion of virtual graphics, which requires the combination of rendering, video processing and real-time computer vision. His work leverages the GPU, taking advantage of the CUDA computing platform.

Discover how to architect a system for real-time GPU processing of broadcast video. Learn how the current generation of GPU hardware and the CUDA computing platform can be used to support simultaneous video acquisition, transfer of video from CPU to GPU, processing on the GPU, transfer of video from the GPU back to the CPU, and processing on the CPU. The architecture described achieves high throughput by pipelining these operations while maintaining the flexibility for easy reconfiguration. A common buffer mechanism will be described for both CPU and GPU memory. This buffer mechanism also supports buffers with different line pitches, which may be required depending on the hardware configuration. In addition to video processing, the interoperation between graphics and CUDA processing is also addressed within the same framework.
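
A minimal sketch of the pipelining idea (an assumed structure of ours, not ESPN's code): with page-locked host frames and two CUDA streams, the upload and download of one frame overlap the kernel work on another:

    #include <cuda_runtime.h>

    __global__ void processFrame(unsigned char* d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] = 255 - d[i]; // placeholder per-byte work
    }

    // hostFrames[f] must be allocated with cudaHostAlloc so the
    // asynchronous copies really run asynchronously.
    void runPipeline(unsigned char** hostFrames, int numFrames, int frameBytes)
    {
        cudaStream_t stream[2];
        unsigned char* devFrame[2];
        for (int s = 0; s < 2; ++s) {
            cudaStreamCreate(&stream[s]);
            cudaMalloc(&devFrame[s], frameBytes);
        }
        for (int f = 0; f < numFrames; ++f) {
            int s = f & 1; // ping-pong between the two streams
            cudaMemcpyAsync(devFrame[s], hostFrames[f], frameBytes,
                            cudaMemcpyHostToDevice, stream[s]);
            processFrame<<<(frameBytes + 255) / 256, 256, 0, stream[s]>>>(
                devFrame[s], frameBytes);
            cudaMemcpyAsync(hostFrames[f], devFrame[s], frameBytes,
                            cudaMemcpyDeviceToHost, stream[s]);
        }
        for (int s = 0; s < 2; ++s) {
            cudaStreamSynchronize(stream[s]);
            cudaStreamDestroy(stream[s]);
            cudaFree(devFrame[s]);
        }
    }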

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing; Real-Time Graphics Applications; Recommended Press Session – Media & Entertainment; Recommended for All Press

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room 211B

S4561 - Virtual Screening of One Billion Compound Libraries Using Novel GPU-Accelerated Cheminformatics Approaches

Olexandr Isayev ( Research Scientist, University of North Carolina at Chapel Hill )
Olexandr Isayev
Olexandr Isayev is a Research Scientist at UNC Eshelman School of Pharmacy at the University of North Carolina at Chapel Hill. His research interests include the broad areas of cheminformatics, molecular dynamics and materials science. Olexandr earned his Ph.D. in Theoretical Chemistry in 2008 and worked at Case Western Reserve University and US Army ERDC prior to joining UNC this year.

Recent years have seen an unprecedented growth of chemical databases, incorporating tens of millions of available compounds and up to 170 billion synthetically feasible chemical compounds. They offer unprecedented opportunities for discovering novel molecules with the desired therapeutic and safety profile. However, current cheminformatics technologies and software relying on conventional CPUs are not capable of handling, characterizing, and virtually screening such "Big Data" chemical libraries. We present the first proof-of-concept study of GPU-accelerated cheminformatics software capable of calculating chemical descriptors for a billion-molecule library. Furthermore, we demonstrate the ability of GPU-based virtual screening software to rapidly identify compounds with specific properties in extremely large virtual libraries. We posit that in the era of the big data explosion in chemical genomics, GPU computing represents an effective and inexpensive architecture for developing and employing a new generation of cheminformatics methods and tools.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL21E

S4610 - OpenGL: 2014 and Beyond

Cass Everitt ( Principal Engineer, NVIDIA )
Cass Everitt
Cass discovered the magic of 3D graphics as an EE undergraduate at Mississippi State University late in the Cretaceous period, back when everyone thought z buffers and Gouraud shading were the M. bermensis's knees. (M. bermensis is the Cretaceous predecessor to the modern bee.) He has spent most of his career in various engineering capacities at NVIDIA, mostly focusing on the continued evolution of OpenGL and the amazing GPUs that implement it. He has a weakness for homogeneous coordinates and is a particular fan of negative w.
Seth Williams ( Mobile Developer Technologies Engineer, NVIDIA )
Seth Williams
Seth Williams is a Mobile Developer Technologies Engineer at NVIDIA. In the past 7 years there he's worn a few hats. He has worked as a Cg Runtime developer, a Cg shader-wrangler for OpenGL and D3D APIs, and as an OSX OpenGL driver developer, and now helps game developers directly to migrate their titles to mobile platforms and improve performance on Tegra SOCs. Prior to NVIDIA he worked as an OpenGL driver developer at 3Dlabs, and at Intel on their compilers and parallel programming tools. Seth holds a Bachelor of Science in Computer Engineering from University of Illinois at Urbana-Champaign.

Learn techniques for efficiently using the GPU and detecting and eliminating driver overhead. See the direction that OpenGL is heading in to embrace multi-threaded, multi-core CPU app designs. Also, the GPU can construct and update app rendering data structures to require very little CPU intervention. We will also explore subdivision surfaces and how to get them automatically GPU accelerated with a new extension. And hand-in-glove with subdivision surfaces is PTEX support in OpenGL. Finally, while OpenGL is the most broadly available open API for 3D graphics, it's also the most fragmented. We will explore Regal, an open source library that illustrates how to de-fragment the OpenGL landscape and keep your graphics back end code from becoming a patchwork of platform #ifdefs.

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Performance Optimization; Game Development; Rendering & Animation

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room 210C

S4651 - Deep Learning Meets Heterogeneous Computing

Ren Wu ( Distinguished Scientist, Baidu )
Ren Wu
Dr. Ren Wu is a distinguished scientist at Baidu. He was the lead architect for Heterogeneous System Architecture (HSA), and before that he was the Principal Investigator of the CUDA Research Center at HP Labs. Dr. Wu is renowned for pioneering the idea of using GPUs to accelerate big data analytics, as well as for his work on GPU-accelerated large-scale clustering algorithms. At Baidu, Dr. Wu is leading the effort to build the company's heterogeneous computing platform - a turbo engine to power Baidu's business and to unlock a new kind of intelligence.

The rise of the internet, especially the mobile internet, has accelerated the data explosion - a driving force behind the great success of deep learning in recent years. Behind the scenes, heterogeneous high-performance computing is another key enabler of that success. In this talk, we will share some of the work we did at Baidu and highlight how big data, deep analytics, and high-performance heterogeneous computing can work together with great success.

Session Level: All
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms; Supercomputing; Video & Image Processing; Recommended for All Press

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room LL21B

S4693 - Accelerating Low-Lying Eigenmode Deflation for Lattice QCD Fermion Inverters on GPUs

Alexei Strelchenko ( Computational Physics Developer, Fermi National Accelerator Laboratory )
Alexei Strelchenko is a computational physics developer at the Fermi National Accelerator Laboratory. He received a Ph.D. degree from Leipzig University, Germany. From 2009 to 2012 he worked on the Lattice QCD on GPU Architectures project at the Computation-based Science and Technology Research Centre (CaSToRC), and also participated in the European PRACE-1IP (Partnership for Advanced Computing in Europe) and LinkSCEEM-2 projects. His main research interests are computational physics, general-purpose computing on graphics processing units (GPGPU), and Lattice QCD.

Learn how to leverage the power of GPUs to accelerate the solution of large sparse linear systems with multiple right-hand sides by means of the incremental eigCG algorithm. For a given Hermitian system with multiple right-hand sides, this algorithm allows one (1) to incrementally compute a number of small-magnitude eigenvalues and corresponding eigenvectors while solving the first few systems with standard Conjugate Gradient (CG), and then (2) to reuse the computed eigenvectors to deflate the CG solver for the remaining systems. In this session we will discuss implementation aspects of the technique and analyze its efficiency using lattice QCD fermion matrix inversions as an example.
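
As background, the deflation step (2) amounts to projecting each new right-hand side onto the computed low modes to form a better starting guess for CG. The sketch below shows that step alone, in plain real arithmetic for clarity; production lattice QCD code works with complex fields and runs on the GPU.

// Illustrative sketch of the deflation step only. Given approximate
// eigenpairs (lambda_i, v_i) of the Hermitian matrix A, form
//   x0 = sum_i v_i * (v_i . b) / lambda_i
// so CG starts with the low-mode content of the solution already resolved.
#include <vector>
#include <cstddef>

std::vector<double> deflatedGuess(
    const std::vector<std::vector<double>> &evecs,  // v_i, orthonormal
    const std::vector<double> &evals,               // lambda_i > 0
    const std::vector<double> &b)                   // right-hand side
{
    std::vector<double> x0(b.size(), 0.0);
    for (std::size_t i = 0; i < evecs.size(); ++i) {
        double dot = 0.0;                           // v_i . b
        for (std::size_t k = 0; k < b.size(); ++k)
            dot += evecs[i][k] * b[k];
        double coeff = dot / evals[i];
        for (std::size_t k = 0; k < b.size(); ++k)
            x0[k] += coeff * evecs[i][k];
    }
    return x0;  // pass to CG as the initial guess
}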

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room 212A

S4703 - GPU Computing for Cognitive Robotics

Martin Peniak ( Cognitive Robotics Researcher, CUDA Lecturer, Plymouth University )
Martin Peniak is a Ph.D. candidate at Plymouth University who recently finished working in the NVIDIA architecture research group in Santa Clara. His Ph.D. research was supported by the FP7 iTalk and Poeticon++ projects investigating the acquisition of cognitive skills in humanoid robots. Martin previously collaborated with the ACT (Advanced Concepts Team) of the ESA (European Space Agency), where he researched the application of evolutionary robotics approaches to planetary rover control. Martin has extensive experience in parallel GPU programming and machine learning, and he is the main developer of MarsRoverSimulator and of Aquila: Software Architecture for Cognitive Robotics.

Learn how GPU Computing impacts cognitive robotics in the areas of: (1) software development, (2) action and language acquisition in humanoid robots, and (3) biologically-inspired 3D object recognition. The presentation will feature the latest state-of-the-art results and videos from each area mentioned above.

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Machine Learning & AI

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL21C

S4725 - Delivering High-Performance Remote Graphics with NVIDIA GRID Virtual GPU

Andy Currid ( System Architect, Distinguished Engineer, NVIDIA )
Andy Currid
Andy Currid is a system architect in NVIDIA's GPU Virtualization team, where he has led the implementation of NVIDIA's Virtual GPU architecture and other virtualization-based projects, and was formerly lead software architect for NVIDIA's server chipsets. Prior to NVIDIA, Andy worked on embedded systems, high-speed networking, and IP storage at Wind River Systems, portable workstation development at Tadpole Technology, and mainframe I/O subsystems at Fujitsu-ICL.

Learn how to deploy and optimize high-performance remote graphics applications using NVIDIA GRID Virtual GPU. This session will include an architectural overview of GRID Virtual GPU, which provides true hardware virtualization and sharing of the GPU between multiple virtual machines, a walkthrough of Virtual GPU setup on Citrix XenServer with remote graphics, and examples of how to tune the configuration for optimum remote graphics performance.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Media & Entertainment Summit; Cloud Visualization; Recommended Press Session – Graphics Virtualization

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room 210F

S4779 - Advances in Chaos V-Ray RT Towards GPU Production Rendering

Vladimir "Vlado" Koylazov ( Co-Founder and Chief Technology Officer, Chaos Group )
Vladimir
Vlado Koylazov is co-Founder and Chief Technology Officer of Chaos Group. He oversees the development of all software products, including the V-Ray rendering engine and its integrations in Autodesk 3ds Max, Maya and Softimage. Vlado graduated with a Bachelor's Degree in Informatics and a Master's Degree in Computer Graphics from FMI, Sofia University.

Learn about the recent advances in V-Ray RT GPU for photorealistic production and interactive rendering. The talk will follow Chaos Software's R&D process for V-Ray RT GPU towards the goal of delivering production-quality final-frame rendering on the GPU, as well as improving the performance of the interactive renderer. The various obstacles along the way and the resulting solutions will be discussed. The talk will offer a behind-the-scenes glimpse into the exciting world of GPU programming and should provide valuable insight for other software developers, as well as for GTC attendees with an interest in photorealistic rendering, ray tracing, distributed calculations, and programming in CUDA.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room 211A

S4818 - Technical History Lesson: Demo Effects on the Classic Amiga

Aske Simon Christensen
Aske  Simon Christensen
Aske Simon Christensen has been coding demo effects on the Amiga since 1992 and on the PC since 2005. He has a strong fascination with size-limited demos and is one of the authors of the Crinkler compression tool, which is widely used in the demoscene for compressing 4-kilobyte intros. Professionally, he has a PhD in computer science and has worked for ARM building the shader compiler for the ARM Mali series of GPUs. He is currently employed in the bioinformatics industry.

This talk will take the audience 25 years back in time to the childhood of real-time graphics. We will take a technical dive into the Amiga graphics hardware to discover how a machine of such modest computational power could produce such amazing graphical effects for its time. We will look at several classical demo effects and discuss how they were achieved.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; NVScene

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room 230C

S4872 - Take GPU Power to the Maximum for Vision and Depth Sensor Processing: From NVIDIA's K1 Tegra to GPUs on the Cloud

Chen Sagiv ( CEO, SagivTech Ltd. )
Chen  Sagiv
Chen is co-founder and CEO of SagivTech Ltd., an Israel-based company that specializes in computer vision and GPU computing. Chen brings to SagivTech over 15 years of experience in the image processing industry. Chen holds a PhD in Applied Mathematics from Tel Aviv University, with specializations in texture analysis, filter banks and optimization problems.
Eri Rubin ( Head of Development, SagivTech Ltd. )
Eri  Rubin
Eri is the head of development at SagivTech, with 15 years of experience as a software developer and eight years of experience in CUDA development. Prior to joining SagivTech, Eri was a Team Leader of CUDA Development at OptiTex. He worked as a Senior Graphics Developer for IDT-E Toronto on animated movies and TV specials. Eri is now in the final stage of completing his M.Sc. in HPC at the Hebrew University of Jerusalem.

Over the last six months SagivTech has been intensively developing CUDA code on the K1 Tegra for mobile computer vision applications that require immense computing power. In this talk we will share our joint effort together with NVIDIA and Mantis Vision to port the core algorithms of Mantis Vision's depth camera to NVIDIA K1 Tegra. We will also introduce SceneNet, a project funded by the European Commission that uses the power of crowd sourcing, in the form of multiple mobile phone users, to create a higher quality 3D video scene experience. We will discuss SagivTech's vision to exploit the compute power of the hybrid platform composed of NVIDIA's K1 Tegra and discrete GPUs in the cloud for computationally intensive, online and interactive applications. We will conclude with some take home tips on writing CUDA on the K1 Tegra.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Computer Vision

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room 210E

S4879 - HP and NVIDIA: Delivering Innovative HPC Solutions (Presented by HP)

Ed Turkel ( Group Manager, HPC Segment Product Management, HP )
Ed manages the worldwide product marketing team for the High Performance Computing (HPC) business at Hewlett Packard. The HPC business delivers integrated solutions for HPC with maximum performance and efficiency, enabling innovative research, engineering and analytics. Ed's team is responsible for developing HP's solutions and go-to-market strategy for HPC, working closely with HP's customers to develop the solutions that enable them to best achieve their business and research outcomes. Ed has almost 35 years of experience in HPC, including 30 years with HP, in various technical, marketing and business roles.

High Performance Computing is characterized by user demand for increasing levels of performance to accomplish their science, engineering, or analytics workloads. These demands for performance growth are becoming more limited by the power, space and cost of deployment of new systems. For years, HP has partnered with NVIDIA to develop HPC solutions that are purpose-built for performance and scalability, while delivering innovative energy and space efficiency, with a focus on customer ROI. This session will showcase HP and NVIDIA's latest technologies and solutions in use today by leaders in the HPC community, plus trends for the future.

Session Level: All
Session Type: Talk
Tags: Supercomputing; Clusters & GPU Management; Scientific Visualization

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room LL20C

S4133 - OpenMM Molecular Dynamics on Kinases: Key Cancer Drug Targets Revealed with New Methods and GPU Clusters

Vijay Pande ( Professor and Director, Stanford University )
Highly-Rated Speaker
Vijay Pande
Prof. Pande is currently the Director of the Program in Biophysics, Director of the Folding@home Distributed Computing project, and a Professor of Chemistry and (by courtesy) of Structural Biology and of Computer Science at Stanford University. His current research centers on the development and application of novel cloud computing simulation techniques to address problems in chemical biology. In particular, he has pioneered novel distributed computing methodology to break fundamental barriers in the simulation of kinetics and thermodynamics of proteins and nucleic acids. As director of the Folding@home project (http://folding.stanford.edu), Prof. Pande has, for the first time, directly simulated protein folding dynamics with quantitative comparisons with experiment, often considered a "holy grail" of computational biology. His current research also includes novel computational methods for drug design, especially in the areas of protein misfolding and related diseases such as Alzheimer's Disease. Prof. Pande received a BA in Physics from Princeton University in 1992. Prof. Pande has won numerous awards, including the Michael and Kate Bárány Award for Young Investigators from the Biophysical Society (2012), the Thomas Kuhn Paradigm Shift Award, American Chemical Society (2010), Fellow of the American Physical Society (2008), the Irving Sigal Young Investigator Award from the Protein Society (2006), the MIT Indus Global Technovator's Award (2004), a Henry and Camille Dreyfus Teacher-Scholar award (2003), being named to MIT's TR100 (2002), and being named a Frederick E. Terman Fellow (2002).

Learn how to use GPU-enabled molecular dynamics codes, parallelized on a cluster of 100 GPUs, to sample key conformational transitions. When applied to protein kinase molecules, key targets of anti-cancer drugs, these methods reveal new insights into how to target new drugs to these systems.

Session Level: Beginner
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms; Bioinformatics & Genomics; Computational Physics; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 14:30 - 15:20
Location: Room LL21E

S4179 - Bringing Digital Fur to Computer Games

Tae-Yong Kim ( Senior R&D Engineer, NVIDIA )
Tae-Yong Kim
Tae-Yong Kim is currently a senior R&D engineer at NVIDIA's PhysX group. He works on researching and developing NVIDIA PhysX/Apex technology such as APEX Clothing, Fluid, and Destruction. His current focus includes bringing a new APEX Fur module to games and realtime applications. Prior to joining NVIDIA, he developed various simulation and rendering technologies for Rhythm and Hues Studios. His tools were used for the production of Hollywood movies such as "Chronicles of Narnia", "Superman Returns", "Mummy 3". In 2010, he served as a DITS committee member for the Academy of Motion Picture Arts and Sciences.

Fur rendering is one of the most important, but computationally expensive, tasks in digitally creating animal characters for films and games. We explain how features of recent GPUs can be used to create visually realistic rendering and simulation of fur and hair. Our fur technology consists of (1) an authoring pipeline to prepare hair assets in artist-friendly tools, (2) a simulation engine to move hairs on skinned, animated characters, and (3) a rendering and tessellation engine that creates millions of hair primitives on the fly, all inside the GPU. We also share real-world challenges we faced in integrating the fur module into highly anticipated upcoming games such as The Witcher 3 and Call of Duty: Ghosts.

Session Level: Advanced
Session Type: Talk
Tags: Game Development; Combined Simulation & Real-Time Visualization; Real-Time Graphics Applications; Visual Effects & Simulation

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 210A

S4204 - Multifrontal Sparse QR Factorization on the GPU

Tim Davis ( Professor, University of Florida )
Tim Davis
Tim Davis is a professor in Computer and Information Science and Engineering at the University of Florida. He is a Fellow of the Society for Industrial and Applied Mathematics (SIAM), in recognition of his work on sparse matrix algorithms. His software for sparse direct methods appears in hundreds of applications in industry, academia, and government labs, including MATLAB (x=A\b), Mathematica, NASTRAN, Cadence, Mentor Graphics, Google Ceres (StreetView, PhotoTours), IBM, Berkeley Design Automation, Xyce, and many others. For a full CV, see http://www.cise.ufl.edu/~davis/background.html .

Sparse matrix factorization involves a mix of regular and irregular computation, which is a particular challenge when trying to obtain high performance on the highly parallel general-purpose computing cores available on graphics processing units (GPUs). We present a sparse multifrontal QR factorization method that meets this challenge, and is up to eleven times faster than a highly optimized method on a multicore CPU. Our method is unique compared with prior methods, since it factorizes many frontal matrices in parallel and keeps all the data transmitted between frontal matrices on the GPU. A novel bucket scheduler algorithm extends the communication-avoiding QR factorization for dense matrices, by exploiting more parallelism and by exploiting the staircase form present in the frontal matrices of a sparse multifrontal method. Peak performance is over 80 Gflops on a Fermi-generation Tesla C2070, in double precision. This is joint work with Nuri Yeralan and Sanjay Ranka.

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 14:30 - 15:20
Location: Room LL21D

S4209 - Multi-Block GPU Implementation of a Stokes Equations Solver for Absolute Permeability Computation

Nicolas Combaret ( Software Engineer, FEI Visualization Sciences Group )
Nicolas Combaret is a Software Engineer at FEI Visualization Sciences Group where he works mainly on Avizo XLab simulation extensions of Avizo® Fire software application. These extensions provide efficient numerical simulation capabilities to compute material physical properties from a 3D digital image. Nicolas obtained his Ph.D. degree in Physical Chemistry of Condensed Matter from Bordeaux I University.

The goal of this session is to show a multi-block implementation of a Stokes equations solver in Avizo® Fire for absolute permeability computation. The challenges of computing such a complex property in a general-purpose software application will first be outlined to explain the basis of this work. A Stokes equations solver developed for GPGPU computing will then be presented. Details will be given of the multi-block approach, which allows large datasets to be handled in acceptable time on a single GPU. Finally, examples and performance metrics will be shown, and future perspectives discussed.

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Computational Fluid Dynamics; Computational Physics

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room LL20B

S4309 - Getting Maximum Performance in CATIA Live Rendering

Tim Lawrence ( VP Operations/Engineering , BOXX Technologies )
Co-Founder of BOXX Technologies, Tim currently serves as vice president of operations, responsible for managing system operations, manufacturing and product development. Prior to founding BOXX, Tim was the founder of Cooperative PCs, a personal computer and consultant business specializing in high-performance system design.

Rendering performance is determined by the available hardware: the more GPU power, the faster the rendering, and hence the higher the production throughput. Make sure you have the correct hardware and configuration, and use certified solutions for all software. Use benchmark tools to determine your modeling horsepower requirements. Configuration has a direct impact on rendering performance, so what are the best hardware and software configurations for CATIA? We will cover best practices for rendering in CATIA and steps to make your process optimal.

Session Level: Intermediate
Session Type: Talk
Tags: Digital Manufacturing Summit; Rendering & Animation; Clusters & GPU Management; Performance Optimization

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 210H

S4325 - GPU Accelerated 3D Point Clouds Generation from Stereo Images

Bingcai Zhang ( Tech Fellow, BAE Systems )
Bingcai Zhang
Dr. Zhang is a technical fellow at BAE Systems, the premier global defense and aerospace company. He joined BAE Systems in September 1995 right out of the University of Wisconsin-Madison, where he earned his Ph.D. in engineering and an M.S. in computer science. His research interests are: (1) geospatial information technology and 3D mapping; (2) robot vision and unmanned systems; and (3) 3D geoweb search. He has held positions as chief architect, chief photogrammetrist, R&D manager, and technical fellow with BAE Systems.

Automatic image understanding and object recognition/extraction have many applications in geospatial intelligence, remote sensing, image processing, and robotics. However, the radiometric properties and spectral characteristics of image pixels are very complex and variable. We take a new approach by extracting 3D objects from 3D point clouds, which are generated from stereo images. Our approach bypasses the complexity of image pixel properties and directly uses the invariant 3D properties of any 3D object. One of the critical technologies of this approach is the generation of 3D point clouds from stereo images: the point clouds must be accurate, and the generation process must be fast. We have developed a GPU-accelerated "Automatic Spatial Modeler" (ASM) application that generates accurate 3D point clouds from stereo images. ASM matches every pixel and generates very dense and accurate 3D point clouds. The advanced image matching algorithms in ASM are based on many years of R&D at a global defense, aerospace and security company.
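
As background on the geometry involved, the hypothetical CUDA sketch below back-projects a dense disparity map into 3D points using the standard pinhole relations Z = f*B/d, X = (u-cx)*Z/f, Y = (v-cy)*Z/f. Parameter names are illustrative; ASM's actual matching and filtering pipeline is far more sophisticated.

// Hedged sketch: one thread per pixel converts disparity to a 3D point.
#include <cuda_runtime.h>

__global__ void disparityToPoints(const float *disp, float3 *pts,
                                  int width, int height,
                                  float f, float baseline,
                                  float cx, float cy)
{
    int u = blockIdx.x * blockDim.x + threadIdx.x;
    int v = blockIdx.y * blockDim.y + threadIdx.y;
    if (u >= width || v >= height) return;

    int idx = v * width + u;
    float d = disp[idx];
    if (d <= 0.0f) {                        // no match for this pixel
        pts[idx] = make_float3(0.f, 0.f, 0.f);
        return;
    }
    float Z = f * baseline / d;             // depth from disparity
    pts[idx] = make_float3((u - cx) * Z / f, (v - cy) * Z / f, Z);
}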

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Computer Vision

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 210D

S4446 - Graphics and Computer Vision for Live Augmented Reality: The 34th America's Cup

Tim Heidmann ( Chief Instigator, Serious Intent LLC )
Highly-Rated Speaker
Tim Heidmann
Tim is a software architect specializing in applying new technology to creative applications, most recently in graphics and tracking for live television. He has worked with America's Cup technology chief and sailor Stan Honey and his team on several innovative projects in the past, including the prototype of the first-down line for football and the glowing puck in hockey. Previously, Tim was an evangelist at Silicon Graphics, with responsibility for the development of the animation and special effects markets.

For the 2013 America's Cup sailboat races, the event tech team tracked the yachts, marks, and HDTV helicopter cameras with unprecedented accuracy, enabling a real-time augmented reality graphics system called AC LiveLine. This was used extensively throughout the over 100 hours of international live television broadcast. In 2012, it received the Emmy for Technical Achievement in Sports Broadcast. Visuals provided identification of the yachts, details of the course, graphical display of tactical information, and a number of detailed insights into wind, course, and currents. GPU technology was pivotal in solving the problems of simulation, display, tracking, and visual processing inherent in such a complex project. This talk builds on a talk from last year's conference, and includes new topics such as using computer vision techniques to fine-tune the positioning of the yachts, corner tracking to eliminate the jitter of graphics relative to the video, and accelerated particle system techniques to simulate and display visual effects.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Virtual & Augmented Reality; Video & Image Processing; Recommended Press Session – Media & Entertainment

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 211B

S4453 - GPU-Based Lattice QCD Simulations as Thermometer for Heavy-Ion Collisions

Mathias Wagner ( Postdoc, Bielefeld University & Indiana University )
Theoretical physicist Dr. Mathias Wagner is currently working in the physics department at Indiana University. After receiving his PhD in 2009 at Technical University Darmstadt he moved on to Bielefeld University in 2010. There he focussed on CUDA implementations of Lattice QCD simulations. He is a member of the team administrating the Bielefeld GPU cluster and is the PI of the CUDA Research Center in Bielefeld. At Indiana University he continues working on high-performance Lattice QCD simulations on GPUs, intensively collaborating with researchers from the National Center for Supercomputing Applications at the University of Illinois and the developers of the QUDA library.

See how advances in GPU computing enable us to simulate Quantum Chromodynamics and learn about fundamental properties of strongly interacting matter, i.e., quarks and gluons, at finite temperatures. With the advances in hardware and algorithms, these simulations have reached a level that allows for quantitative comparison with experimental data from heavy-ion colliders. Discover how the Kepler architecture helps us boost the performance of the simulations and reach a new level of precision. I will discuss selected optimizations for the Kepler K20 cards and modifications that prepare the code for the Titan supercomputer. Furthermore, I will compare our in-house code with available libraries such as the QUDA library and discuss the pros and cons of each.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Supercomputing; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 212A

S4474 - Scaling OpenACC Across Multiple GPUs

Michael Wolfe ( Compiler Engineer, NVIDIA )
Highly-Rated Speaker
Michael Wolfe
Michael Wolfe has been a compiler engineer at The Portland Group since joining in 1996, where his responsibilities and interests have included deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and was a cofounder and lead compiler engineer at Kuck and Associates, Inc., prior to that. He earned a PhD in Computer Science from the University of Illinois, and has published one textbook, "High Performance Compilers for Parallel Computing", a monograph, "Optimizing Supercompilers for Supercomputers", and many technical papers.

Learn how to scale your OpenACC application across multiple GPUs. This example-based presentation will cover three methods of using multiple GPUs. First, you can use MPI with OpenACC to program a different GPU from each MPI process. You can even share data on the GPU across MPI processes when you have multiple MPI processes on a single node. Second, you can use OpenMP with OpenACC, assigning a different GPU to each OpenMP thread. If you have more CPU threads than GPUs, you can share some GPUs across multiple threads. Third, even a single thread or process can distribute data and computation across multiple GPUs, as the sketch below illustrates. By dynamically selecting the device, you can easily split or replicate data across multiple devices.
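
A minimal sketch of that third method, assuming the OpenACC runtime API from openacc.h and illustrative names throughout: one host thread walks the available NVIDIA devices, selects each in turn with acc_set_device_num, and launches an asynchronous chunk of a SAXPY on it.

// One host thread, many GPUs: split the index range across devices.
#include <openacc.h>

void saxpy_multi(int n, float a, const float *x, float *y)
{
    int ngpus = acc_get_num_devices(acc_device_nvidia);
    int chunk = (n + ngpus - 1) / ngpus;

    for (int g = 0; g < ngpus; ++g) {
        acc_set_device_num(g, acc_device_nvidia);   // select a GPU
        int lo = g * chunk;
        int hi = lo + chunk < n ? lo + chunk : n;
        // async so the host loop can move on and feed the next device
        #pragma acc parallel loop async copyin(x[lo:hi-lo]) copy(y[lo:hi-lo])
        for (int i = lo; i < hi; ++i)
            y[i] = a * x[i] + y[i];
    }
    for (int g = 0; g < ngpus; ++g) {               // drain all devices
        acc_set_device_num(g, acc_device_nvidia);
        #pragma acc wait
    }
}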

Session Level: Advanced
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room LL20D

S4639 - TOPS: Real-Time Automotive Appearance Evaluation for Non-Prototype Design

Daisuke Ide ( System Engineer, Honda R&D, Japan )
Daisuke Ide
Daisuke is a system engineer at Honda R&D Japan where he works on the in-house CAD and visualization systems for styling design.

At Honda, our aim is to utilize and evaluate computer-generated appearance in the same way as a physical model. For CG-based design we need physically accurate results, not just artistic representations, and we also need real-time performance. To accomplish this, we developed a rendering software solution called "TOPS". TOPS is already deployed to field users, and we have now achieved real-time performance with a GPU cluster. As a result, we are able to obtain "physically correct rendering based on measured data" at real-time rates.

Session Level: Intermediate
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Recommended Press Session – Digital Manufacturing; Recommended for All Press

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 210G

S4695 - A Real-Time Defocus Deblurring Method for Semiconductor Manufacturing

Tsutomu Sakuyama ( Imaging Technology Engineer, Dainippon Screen Mfg. Co., Ltd. )
Tsutomu Sakuyama
Tsutomu Sakuyama is an imaging technology engineer at Dainippon Screen Mfg. Co., Ltd. His focus is on image processing and general-purpose computing applications.

This session will present a real-time defocus deblurring method for industrial semiconductor-manufacturing equipment. Many studies have proposed fast deblurring methods for natural and medical images, but these methods are difficult to apply in such equipment for two reasons. First, most approaches require the distance between the imaging device and the object, which cannot be obtained in most cases. Second, the processing must finish within a constant cycle time determined by the specification of the equipment - this is what 'real-time' means for production purposes. In this session, we propose a deblurring method that satisfies these constraints.
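
The speakers' method is not spelled out in the abstract; as background only, a classical non-blind approach is frequency-domain Wiener deconvolution, F = conj(H)*G / (|H|^2 + K), where G is the FFT of the blurred image, H the FFT of the defocus PSF, and K a noise-dependent constant. The pointwise frequency-domain step parallelizes trivially, as the hedged kernel below sketches; the forward and inverse FFTs would typically use cuFFT.

// Background sketch (not the speakers' method): pointwise Wiener filter.
#include <cuComplex.h>
#include <cuda_runtime.h>

__global__ void wienerFilter(const cuComplex *G, const cuComplex *H,
                             cuComplex *F, int n, float K)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    cuComplex h = H[i];
    float mag2 = h.x * h.x + h.y * h.y;           // |H|^2
    cuComplex num = cuCmulf(cuConjf(h), G[i]);    // conj(H) * G
    float denom = mag2 + K;                       // K damps noise blow-up
    F[i] = make_cuComplex(num.x / denom, num.y / denom);
}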

Session Level: All
Session Type: Talk
Tags: Computer Vision; Video & Image Processing; Computational Photography; Real-Time Graphics Applications

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 212B

S4855 - Creating CONSTRUCT: How NVIDIA GPUs are Defining a New Filmmaking Paradigm

Kevin Margo ( Independent Director, VFX/CG Supervisor, Blur Studio )
Kevin Margo
Kevin Margo is the director of the hit sci-fi short film "Grounded". He joined Blur Studio in 2003 as a scene assembly, lighting and compositing artist and has since moved into the studio's VFX/CG Supervisor role. Recent work includes the prologue for Thor 2: The Dark World and the David Fincher-produced Halo 4 "Scanned" cinematic trailer.

Kevin Margo will describe how he is using Chaos Group's V-Ray RT Renderer in his upcoming short film "CONSTRUCT", a CG-animated short film with final-production frames rendered entirely on NVIDIA GPUs. Follow early development of this project with behind-the-scenes breakdowns, concepts, motion-capture fight choreography, models, look development, and final-render clips from a segment of the short film. Focusing on both the creative and technical demands of the film, Kevin will show how GPU technology is enabling his small team of artists, working on nights and weekends in a short period of time to achieve excellent results while trailblazing new film-making workflows possible only due to recent gains in GPU rendering and performance.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Rendering & Animation; Recommended Press Session – Media & Entertainment

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 211A

S4900 - Integrating Computer Vision Sensor Innovations into Mobile Devices

Eli Savransky ( Principal Architect - CTO Office, NVIDIA )
Eli Savransky is a Principal Engineer in the CTO office of NVIDIA's mobile business unit. Eli runs special projects with strategic partners and generally steers the company in advanced technology directions. Before joining NVIDIA, Eli worked for fourteen years at Intel, where he started the Core microarchitecture project and served as one of the top architects. His last task there was to serve as the chief architect for a future line of microprocessors.

Integrating innovative computer vision sensors into new mobile devices is a very complex endeavor, especially when popular devices like tablets and smartphones already offer a very high level of functionality and user expectations are sky-high. In this talk, we will discuss the challenges and opportunities of developing a new device with advanced sensor capabilities.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Computer Vision; Game Development; Virtual & Augmented Reality; Recommended Press Session – Mobile

Day: Tuesday, 03/25
Time: 14:30 - 14:55
Location: Room 210E

S4247 - A GPU-Based Free-Viewpoint Video System for Surgical Training

Pierre Boulanger ( Professor, University of Alberta )
Pierre Boulanger
Dr. Boulanger worked for 18 years at the National Research Council of Canada as a senior research officer where his primary research interest was in 3D computer vision, rapid product development, and virtualized reality systems. He now has a double appointment as a professor at the University of Alberta's Department of Computing Science and at the Department of Radiology and Diagnostic Imaging. His main research topic and teaching is on virtualized reality systems. He is also principal investigator for stereo IPTV at TRLabs. In 2004, Dr. Boulanger was awarded an iCORE/TRLabs industrial chair in Collaborative Virtual Environment and is now the new CISCO chair in healthcare solutions. He has published more than 270 scientific papers in various Journals and Conferences. He is on the editorial board of two major academic journals. Dr. Boulanger is also on many international committees and frequently gives lectures on rapid product development and virtualized reality. He is the Director of the Advanced Man Machine Interface Laboratory. He is also the scientific director of the Servier Virtual Cardiac Center. On the commercial side, Dr. Boulanger is the president of PROTEUS Consulting Inc. an Alberta-based consulting firm specialized in Virtual Reality Applications.

In this presentation, we propose a novel GPU-based algorithm capable of generating free viewpoints from a network of fixed HD video cameras. This free-viewpoint TV system consists of two main sub-systems: a real-time depth estimation sub-system, which extracts a disparity map from a network of cameras, and a synthetic viewpoint generation sub-system that uses the disparity map to interpolate new views between the cameras. In this system, we use a space-sweep algorithm to estimate depth information, which is amenable to parallel implementation. The view generation sub-system generates new synthetic images from 3D vertices and renders them from an arbitrary viewpoint specified by the user. Both steps are computationally intensive, but the computations can easily be divided from each other and thus can be efficiently implemented in parallel using CUDA. A surgical training application is presented.
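
A hedged sketch of the space-sweep idea: for each pixel and each depth hypothesis, compare the reference view against a second view warped according to that hypothesis. For brevity the warp below is a purely horizontal shift (as for rectified cameras); a real camera network would apply a per-plane homography, and the resulting cost volume would then be reduced by an argmin over planes.

// Illustrative plane-sweep cost evaluation, one thread per pixel.
#include <cuda_runtime.h>

__global__ void sweepCost(const unsigned char *ref, const unsigned char *src,
                          float *cost,           // [numPlanes x h x w]
                          int w, int h, int numPlanes, float planeStep)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    for (int p = 0; p < numPlanes; ++p) {
        int shift = (int)(p * planeStep);          // hypothesis-dependent warp
        int xs = x - shift;
        float c = 1e9f;                            // out-of-bounds penalty
        if (xs >= 0)
            c = fabsf((float)ref[y * w + x] - (float)src[y * w + xs]);
        cost[(p * h + y) * w + x] = c;             // later: argmin over p
    }
}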

Session Level: Beginner
Session Type: Talk
Tags: Computer Vision; Video & Image Processing; Virtual & Augmented Reality; Medical Imaging & Visualization; Recommended for All Press

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 212B

S4250 - PyFR: Bringing Next Generation Computational Fluid Dynamics to GPU Platforms

Peter Vincent ( Lecturer, Imperial College London )
Peter Vincent
Peter Vincent is a Lecturer and EPSRC Early Career Fellow in the department of Aeronautics at Imperial College London, working at the interface between mathematics, computing, fluid dynamics, and aeronautical engineering. He holds a 1st class undergraduate degree from the department of Physics at Imperial College (graduating top of the year), and a PhD from the department of Aeronautics at Imperial College in the field of CFD. Prior to his appointment as a Lecturer, Peter served as a Postdoctoral Scholar in the department of Aeronautics and Astronautics at Stanford University, where he developed novel high-order numerical methods for CFD, and implemented them for massively-parallel many-core Graphical Processing Units (GPUs).

Discover how GPUs are being used to accelerate high-fidelity computational fluid dynamics (CFD) simulations on unstructured grids. In this talk I will (i) introduce the flux reconstruction approach to high-order methods, a discretization that is particularly well-suited to many-core architectures; (ii) introduce our massively parallel implementation, PyFR, which through a combination of symbolic manipulation and run-time code generation is able to easily target NVIDIA GPU hardware; and (iii) showcase some of the high-fidelity, unsteady simulations undertaken using PyFR on both desktop and HPC systems.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210A

S4385 - Order Independent Transparency in OpenGL

Christoph Kubisch ( Developer Technology Engineer, NVIDIA )
Christoph Kubisch
Christoph Kubisch is a Developer Technology Engineer for NVIDIA Corporation, where he focuses on OpenGL real-time rendering techniques suitable for CAD/DCC and scientific applications. He collaborates with external partners and NVIDIA's internal teams to optimize current and future rendering algorithms. Prior to joining NVIDIA, Christoph was a researcher on hardware accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. Furthermore, he has worked as technical artist creating game art, technology and DCC plugin development.

Rendering many transparent surfaces is still a challenge in real-time rendering. With hardware features exposed in OpenGL 4, it is possible to minimize the number of geometry passes and create transparency effects with order-independent drawing. Several techniques with different quality, performance, and memory usage characteristics will be presented. The approaches will be evaluated on two different scenarios: hair and CAD model rendering.
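
One popular OpenGL 4 technique in this family is per-pixel linked lists (an A-buffer variant): an atomic counter hands out fragment nodes, a shader storage buffer stores them, and a per-pixel image holds list heads that fragments swap in with imageAtomicExchange; a resolve pass then sorts and blends each list. The hedged sketch below shows only the host-side plumbing, and whether this particular variant is among those presented is an assumption.

// Host-side setup for per-pixel linked-list OIT (fragment/resolve shaders
// not shown). Assumes a GL 4.2+ context; names are illustrative.
#include <GL/glew.h>

GLuint atomicCounter, nodeBuffer, headImage;

void setupOIT(int width, int height, int maxNodes)
{
    // Atomic counter: next free node index, bumped per fragment in GLSL.
    glGenBuffers(1, &atomicCounter);
    glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, atomicCounter);
    glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), nullptr,
                 GL_DYNAMIC_DRAW);

    // Node pool: packed color, depth, and next-pointer per fragment.
    glGenBuffers(1, &nodeBuffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, nodeBuffer);
    glBufferData(GL_SHADER_STORAGE_BUFFER, maxNodes * 4 * sizeof(GLuint),
                 nullptr, GL_DYNAMIC_DRAW);

    // Per-pixel head pointers, written with imageAtomicExchange in GLSL.
    glGenTextures(1, &headImage);
    glBindTexture(GL_TEXTURE_2D, headImage);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_R32UI, width, height);
    glBindImageTexture(0, headImage, 0, GL_FALSE, 0, GL_READ_WRITE, GL_R32UI);
}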

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Performance Optimization

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210C

S4459 - Parallel Lossless Compression Using GPUs

Evangelia Sitaridi ( Ph.D. Candidate, Columbia University )
Evangelia Sitaridi
Evangelia Sitaridi is a Ph.D. candidate in the Computer Science Department of Columbia University in the City of New York. Her Ph.D. research focuses on database processing using GPUs. Before starting her Ph.D. she graduated with B.Sc. and M.Sc. degrees from the Department of Informatics and Telecommunications of the University of Athens. During the summers of 2012 and 2013 she interned at IBM Almaden and TJ Watson.

Given the high cost of enterprise data storage, compression is becoming a major concern for the industry in the age of Big Data. Attendees can learn how to efficiently offload data compression to the GPU, leveraging its superior memory and compute resources. We focus on the DEFLATE algorithm, a combination of the LZSS and Huffman entropy coding algorithms used in common compression formats like gzip. Both algorithms are inherently serial, and trivial parallelization methods are inefficient. We show how to parallelize these algorithms efficiently on GPUs and discuss trade-offs between compression ratio and increased parallelism to improve performance. We conclude our presentation with a head-to-head comparison against a multi-core CPU implementation, demonstrating up to half an order of magnitude performance improvement using a single Kepler GPU. This is joint work with IBM researchers Rene Mueller and Tim Kaldewey.
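
To see the core trade-off the abstract alludes to, consider the simplest way to expose parallelism in an inherently serial compressor: cut the input into independent chunks. The hedged CPU-side sketch below (using zlib and std::thread for brevity, not the authors' GPU code) compresses chunks concurrently; because each chunk is a separate stream with no shared history, smaller chunks mean more parallelism but a worse compression ratio.

// Chunked parallel compression: parallelism vs. ratio in miniature.
#include <zlib.h>
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct Chunk { std::vector<Bytef> out; uLongf len; };

void compressChunks(const Bytef *data, size_t total, size_t chunkSize,
                    std::vector<Chunk> &chunks)
{
    size_t n = (total + chunkSize - 1) / chunkSize;
    chunks.resize(n);
    std::vector<std::thread> workers;
    for (size_t i = 0; i < n; ++i) {
        workers.emplace_back([&, i] {
            size_t off = i * chunkSize;
            uLong inLen = (uLong)std::min(chunkSize, total - off);
            chunks[i].len = compressBound(inLen);
            chunks[i].out.resize(chunks[i].len);
            // Each chunk is an independent zlib stream: no shared history,
            // so matches cannot cross chunk boundaries.
            compress2(chunks[i].out.data(), &chunks[i].len,
                      data + off, inLen, Z_DEFAULT_COMPRESSION);
        });
    }
    for (auto &w : workers) w.join();
}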

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210B

S4517 - Latest Advances in MVAPICH2 MPI Library for NVIDIA GPU Clusters with InfiniBand

Dhabaleswar K. (DK) Panda ( Professor, The Ohio State University )
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 300 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,085 organizations worldwide (in 72 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 183,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

Learn about the latest developments in MVAPICH2 library that simplifies the task of porting Message Passing Interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2 supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit, providing optimized performance on different GPU node configurations. These optimizations are integrated transparently under standard MPI API, for better programmability. Recent advances in MVAPICH2 include designs for MPI-3 RMA using GPUDirect RDMA, framework for MPI Datatype processing using CUDA kernels, support for heterogeneous clusters with GPU and non-GPU nodes, and more. We use the popular OSU micro-benchmark suite and example applications to demonstrate how developers can effectively take advantage of MVAPICH2 in applications using MPI and CUDA/OpenACC. We provide guidance on issues like processor affinity to GPU and network that can significantly affect the performance of MPI applications that use MVAPICH2.
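
The basic convenience a CUDA-aware MPI such as MVAPICH2 provides is that device pointers can be passed straight to MPI calls, with staging or GPUDirect RDMA handled inside the library. A minimal sketch, with illustrative sizes and device selection:

// CUDA-aware MPI: no explicit cudaMemcpy to a host bounce buffer.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    cudaSetDevice(rank % 2);             // illustrative: 2 GPUs per node

    const int n = 1 << 20;
    float *dbuf;
    cudaMalloc(&dbuf, n * sizeof(float));

    if (rank == 0)                       // device pointer passed directly
        MPI_Send(dbuf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(dbuf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);

    cudaFree(dbuf);
    MPI_Finalize();
    return 0;
}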

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Performance Optimization; Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room LL21A

S4539 - Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migration

Max Grossman ( Intern, Repsol )
Highly-Rated Speaker
Max Grossman
Max Grossman is a researcher at Repsol USA and a graduate student in the Habanero Multicore Research Group at Rice University. His work for the past 10 years has focused on schedulers and programming models for heterogeneous and distributed systems, motivated by work on applications in geophysics, astrophysics, and medical imaging.

This talk explores a computationally demanding seismic imaging algorithm: Kirchhoff Migration. In particular, we focus on memory and bandwidth management. Supporting large data sets via efficient bandwidth and memory utilization is a must given the richness of current data acquisition systems. This work builds on past work presented at GTC2013 on adaptive scheduling in distributed and GPU-centric platforms. Our implementation has been extended to support larger granularity tasks and larger data sets. This enables more effective utilization of network bandwidth but requires the implementation of GPU "virtual memory", as data usually exceed device capacity. The discussion will cover both stages of Kirchhoff Migration: travel time calculation and seismic data migration. Each stage presents unique challenges to effective memory utilization. This talk will include in-depth analysis of performance, scheduling, and bandwidth metrics generated by the target application under real world workloads.
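
While the talk's scheduler is more elaborate, the underlying pattern of a GPU "virtual memory" is tiling a host-resident working set and overlapping PCIe transfers with compute. A hedged sketch of that pattern with double-buffered CUDA streams follows; processTile is a hypothetical stand-in for a migration kernel, and hostData is assumed to be pinned.

// Double-buffered tiling: copy tile t+1 while tile t computes.
#include <cuda_runtime.h>

__global__ void processTile(float *tile, size_t n) { /* per-tile work */ }

void streamTiles(float *hostData /* pinned */, size_t total, size_t tileSize)
{
    float *dev[2]; cudaStream_t s[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc(&dev[i], tileSize * sizeof(float));
        cudaStreamCreate(&s[i]);
    }
    size_t numTiles = (total + tileSize - 1) / tileSize;
    for (size_t t = 0; t < numTiles; ++t) {
        int b = t & 1;                       // alternate buffers/streams
        size_t off = t * tileSize;
        size_t n = (off + tileSize <= total) ? tileSize : total - off;
        cudaMemcpyAsync(dev[b], hostData + off, n * sizeof(float),
                        cudaMemcpyHostToDevice, s[b]);
        processTile<<<(unsigned)((n + 255) / 256), 256, 0, s[b]>>>(dev[b], n);
        cudaMemcpyAsync(hostData + off, dev[b], n * sizeof(float),
                        cudaMemcpyDeviceToHost, s[b]);
    }
    for (int i = 0; i < 2; ++i) {
        cudaStreamSynchronize(s[i]);
        cudaStreamDestroy(s[i]); cudaFree(dev[i]);
    }
}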

Session Level: Advanced
Session Type: Talk
Tags: Energy Exploration; Computational Physics; Supercomputing

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room LL20B

S4622 - Virtual Automotive: Projection Mapped Graphics for Automotive Design

Roy Anthony ( R&I Design, Christie Digital )
Roy Anthony
A member of the Christie Research & Innovation group, Roy C. Anthony leads research initiatives creating new concepts and technologies designed to benefit Christie's extensive, global customer base. A direct collaborator with third-party strategic partners and academic research consortiums, Roy leads outreach programs and extensive hands-on development and application of technology solutions. He has contributed to several film- and display-technology-related academic consortiums, has acted as Chair of ACM/SIGGRAPH K/W, and currently serves as Chair of the SIGGRAPH 2014 Production Sessions, part of the upcoming ACM/SIGGRAPH Computer Animation Festival in Vancouver.
Kevin Moule ( R&I Software Engineer, Christie Digital )
Kevin  Moule
Kevin Moule has been working at Christie since 2008, currently in the role of Senior Product Developer, Software, with the Research and Innovation group. He is primarily focused on developing machine-vision-based systems for automatically warping and blending multi-projector displays. Prior to joining Christie he was a researcher at the University of Waterloo, developing software for visualization in design and manufacturing. He received a Master of Mathematics and a Bachelor of Mathematics from the University of Waterloo.

Explore the challenges and issues that arose during the design and implementation of a projection-mapped model of a 1/5th-scale Audi R8. We will cover adapting a traditional rendering pipeline to generate image content suitable for projection-mapped graphics: a projection-mapped model requires rendering from a single view point, yet the ideal view placement is one suited to the physical setup of the projectors and the surface. We will also cover efficiently warping and blending the results into a seamless image; even with content rendered from an ideal eye point, the physical setup of the system (placement of the projectors and car) requires that some warping be applied to precisely align the content to the car. Finally, we will explore the next steps in applying the above to a full-scale car, and the challenges in taking projection-mapped graphics to scale.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Collaborative & Large Resolution Displays; Real-Time Graphics Applications; Automotive; Recommended Press Session – Auto

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room 210G

S4637 - Simulation On Demand: Using GRID as a Platform to Develop Simulation-Based Training with a Distributed Team

Joshua Lewis ( Chief Technology Officer, Check-6 Training Systems )
Joshua Lewis
Joshua "Doc" Lewis began his career at Lockheed Martin working in R&D. He founded and led the team developing the F-35 Pilot Training Aid, a system using video game technology to train the next generation of America’s fighter pilots to operate the world's most advanced aircraft. Doc currently serves as Check-6 Training Systems' Chief Technology Officer. In this role he is responsible for creating and maintaining the company's technology strategy, managing the technical staff, advising executive management, and performing research. Doc holds a B.S. in Aeronautical Science, a Masters’ in Software Engineering, and a Ph.D. in Modeling and Simulation. He is also a private pilot with an instrument rating and an A&P certificate.

How do you deploy a complex virtual simulation application to users across the continental US with a small support staff and minimal cost? Centralize it! Check-6 serves the energy industry by creating and deploying innovative solutions that blend interactive courseware, knowledge and skills assessment, and simulation-based training. The discussion will be on the process and benefits of using GRID as a centralized platform to build simulation-based procedural training for industrial systems using gaming technology. Topics will include the training development lifecycle, assets and models appropriate for training-oriented simulations, and the GRID assessment, deployment, and configuration process including some unexpected challenges and benefits.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Digital Manufacturing Summit; Energy Exploration

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room 210F

S4641 - Lattice QCD using MILC and QUDA: Accelerating Calculations at the High-Energy Frontier

Justin Foley ( Software Developer, Microway and NVIDIA )
Justin Foley
Ph.D. in computational particle physics (Lattice QCD). A core contributor to the QUDA library for performing Lattice QCD calculations on GPUs.

Lattice Quantum ChromoDynamics (QCD) is a numerical treatment of the theory of the strong nuclear force. Calculations in this field can answer fundamental questions about the nature of matter, provide insight into the evolution of the early universe, and play a crucial role in the search for new theories of fundamental physics. However, massive computational resources are needed to achieve these goals. In this talk, we describe how NVIDIA GPUs are powering Lattice QCD calculations involving the MILC code suite and the QUDA library. This code base has allowed lattice applications to access unparalleled compute power on leadership-class facilities such as Blue Waters and Titan.

Session Level: Advanced
Session Type: Talk
Tags: Computational Physics

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 212A

S4648 - On Designing Accelerator-Based System Architectures for Demanding Signal Processing Applications

Bracy Elton ( Senior Computational Scientist, Dynamics Research Corporation, High Performance Technologies Group )
Dr. Bracy Elton is presently a Senior Computational Scientist in the High Performance Technologies Group of Dynamics Research Corporation, where he holds a position as a Signal/Image Processing On-Site in the US DOD High Performance Computing Modernization Program's User Productivity Enhancement, Technology Transfer and Training (DOD HPCMP PETTT) activity. He has 23+ years of experience in computational science across multiple disciplines, experience that includes working for three supercomputer manufacturers and the Ohio Supercomputer Center. Double majoring in Mathematics and Computer Science, Dr. Elton received a B.S. Cum laude from Pacific Lutheran University. He received a M.S. and a Ph.D. in Computer Science from the University of California, Davis.
Ross Smith ( Computational Scientist, ACE On-Site, Dynamics Research Corporation, High Performance Technologies Group )
Ross Smith
Dr. Ross Smith is a Computational Scientist with Dynamics Research Corporation (DRC) as an DOD HPCMP PETTT Advanced Computational Environments On-Site. His experience spans a wide range of technical subjects and passion for learning. Dr. Smith received his B.S. in Mechanical Engineering in 2003 when he graduated magna cum laude from Rose-Hulman Institute of Technology. He continued his education at Case Western Reserve University where he received his M.S. and Ph.D. in Electrical Engineering focusing on nanofiltration within BioMEMS. In 2010, Dr. Smith joined High Performance Technologies, Inc. (later acquired by DRC) where he has brought his enthusiasm and expertise to aid government research projects in a wide range of disciplines.

The advent of (1) high-performance PCIe 3.0-compatible accelerators that provide for direct accelerator-to-accelerator communication, e.g., via NVIDIA GPUDirect RDMA, (2) PCIe 3.0 switches and devices, e.g., from PLX Technology, and (3) PCIe 3.0 high-bandwidth network adaptors and switches, such as those for 40 Gb/s Ethernet and 56 Gb/s FDR InfiniBand, presents opportunities for designing systems that enable demanding signal processing applications, such as real-time image and radar processing, and domain decomposition approaches for fluid dynamics. We combine the above and present ideas for designing systems onto which such demanding signal processing applications can be mapped.

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Signal & Audio Processing; Supercomputing

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room 210D

S4732 - GPU-Optimized Deep Learning Networks for Automatic Speech Recognition

Jessica Ray ( Computer Scientist - Human Language Technology, MIT Lincoln Laboratory )
Ms. Jessica Ray is a staff member at the MIT Lincoln Laboratory in the Human Language Technology group. Her work is in automatic speech and keyword recognition, with a focus on low-resource languages. After receiving her B.S. in Computer Science and Mathematics from the University of Massachusetts Amherst in May 2012, Ms. Ray joined MIT Lincoln Laboratory in June 2012.

In this talk, we compare the implementation of deep learning networks [1] on traditional x86 processors with the implementation on NVIDIA Tesla K20 GPU Accelerators for the purposes of training Restricted Boltzmann Machines [2] and for deep network back propagation in a large-vocabulary speech recognition task (automatic transcription of TED talks). Two GPU implementations are compared: 1) a high-level implementation using Theano [3] and 2) a native implementation using low-level CUDA BLAS libraries. We describe the scaling properties of these implementations in comparison to a baseline batched-x86 implementation as a function of training data size. We also explore the development time tradeoffs for each of the implementations.
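
Whichever path is taken, Theano or hand-written kernels, the dominant cost in RBM training and back-propagation is dense matrix multiplication, which maps onto a single cuBLAS GEMM. A minimal sketch of one layer's forward pass, with illustrative names and column-major convention:

// Activations(hidden x batch) = W(hidden x visible) * Batch(visible x batch)
#include <cublas_v2.h>
#include <cuda_runtime.h>

void layerForward(cublasHandle_t h,
                  const float *dW, const float *dBatch, float *dAct,
                  int hidden, int visible, int batch)
{
    const float alpha = 1.0f, beta = 0.0f;
    // Single GEMM carries the bulk of the training FLOPs.
    cublasSgemm(h, CUBLAS_OP_N, CUBLAS_OP_N,
                hidden, batch, visible,
                &alpha, dW, hidden,
                dBatch, visible,
                &beta, dAct, hidden);
    // A sigmoid nonlinearity kernel would follow for RBM hidden units.
}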

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Performance Optimization; Defense

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room LL21B

S4829 - Embedded Vision: Enabling Smarter Mobile Apps and Devices

Jeff Bier ( Founder, Embedded Vision Alliance )
Jeff Bier is founder of the Embedded Vision Alliance (www.Embedded-Vision.com). "Embedded vision" refers to the incorporation of visual intelligence into systems, creating "machines that see and understand." The Embedded Vision Alliance is an industry partnership formed to enable the market for embedded vision technology by inspiring and empowering design engineers to create more capable and responsive products through integration of vision capabilities. The Alliance provides training videos, tutorial articles, code examples, and an array of other resources (all free of charge) on its web site, www.Embedded-Vision.com. Jeff is also co-founder and president of BDTI (www.BDTI.com), a trusted resource for independent analysis and specialized engineering services in the realm of embedded digital signal processing technology. Jeff oversees BDTI's benchmarking and analysis of chips, tools, and other technology. Jeff is also a key contributor to BDTI's engineering services, which focus on developing optimized software and system using embedded digital signal processing.

For decades, computer vision technology was found mainly in university laboratories and a few niche applications. Today, virtually every tablet and smartphone is capable of sophisticated vision functions such as hand gesture recognition, face recognition, gaze tracking, and object recognition. These capabilities are being used to enable new types of applications, user interfaces, and use cases for mobile devices. We illuminate the key drivers behind the rapid proliferation of vision capabilities in mobile devices, and highlight some of the most innovative processor architectures, sensors, tools and APIs being used for mobile vision.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Computer Vision; Computational Photography; Virtual & Augmented Reality

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210E

S4830 - CUDA 6 and Beyond

Mark Harris ( Chief Technologist, GPU Computing, NVIDIA )
Mark Harris is Chief Technologist for GPU Computing at NVIDIA, where he works as a developer advocate and helps drive NVIDIA's GPU computing software strategy. His research interests include parallel computing, general-purpose computation on GPUs, physically based simulation, and real-time rendering. Mark founded www.GPGPU.org while he was earning his PhD in computer science from the University of North Carolina at Chapel Hill. Mark brews his own beer and cures his own bacon in Brisbane, Australia, where he lives with his wife and daughter.

CUDA is NVIDIA's parallel computing platform and programming model. CUDA 6 dramatically increases developer productivity with the introduction of Unified Memory, which simplifies memory management by automatically migrating data between the CPU and GPU. Unified Memory and other new features in CUDA tools and libraries make GPU computing easier than ever before. In this talk you'll hear about these features and get insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.
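
A minimal sketch of what Unified Memory changes in practice: a single cudaMallocManaged allocation is touched by the CPU, processed by a kernel, and read back by the CPU, with no explicit cudaMemcpy anywhere.

// CUDA 6 Unified Memory: one pointer, visible to both CPU and GPU.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *x, int n, float a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main()
{
    const int n = 1 << 20;
    float *x;
    cudaMallocManaged(&x, n * sizeof(float)); // single shared allocation
    for (int i = 0; i < n; ++i) x[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);
    cudaDeviceSynchronize();                  // required before CPU access

    printf("x[0] = %f\n", x[0]);              // CPU reads migrated data
    cudaFree(x);
    return 0;
}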

Session Level: All
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room 220C

S4832 - Chrome on Mobile at 60 FPS: Now and In the Future

Nat Duca ( Software Engineer, Google )
Nat Duca is an engineer on the Chrome team, focused on cross-stack optimizations that make silky-smooth HTML5 experiences possible.

Chrome for Android's GPU compositing architecture renders trillions of frames a day at 60fps. On one hand, it is tasked with rendering the most common desktop pages: content designed for desktop-class hardware capabilities that, without careful handling, would bring a phone to its knees. On the other hand, it needs to render dynamic, touch-driven mobile applications, also at 60fps. In this talk, we will explain how this architecture works, reviewing the design choices we've had to make around CPU/GPU alternatives, power efficiency, and memory usage. We will close with a discussion of where we're going in the future, especially around usage of the GPU for rasterization of web content.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Web Acceleration; Recommended Press Session – Mobile

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room LL21C

S4880 - Building Photo-Real Virtual Reality from Real Reality, Byte by Byte

Scott Metzger ( Co-Founder, Nurulize )
Scott Metzger is a cofounder of Nurulize, which is focused on developing tools for high-end VR content creation. Metzger has 14 years of experience as a Visual Effects Artist/Supervisor specializing in CG rendering for film, commercials, television, and music videos at Los Angeles studios including Luma Pictures, MPC LA, Method Studios, Digital Domain, Alcon Entertainment and Hydraulx. He is considered an innovator of high-resolution photo-real environment capture for offline rendering and real-time applications.

This talk describes the process for creating and viewing immersive VR environments of extraordinary visual quality, building on work that was used to launch NVIDIA's Quadro K6000 at SIGGRAPH 2013. It will review how laser scanning, 3D modeling, HDR image capture, and 3D paint tools were combined with high-frame rate playback to create highly-interactive worlds. Metzger will also review the Mari workflow used for early production of Rise, a theatrical release now in production, and demonstrate the results on an Oculus Rift head-mounted display.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 211B

S4882 - First Glimpse into the OpenPOWER Software Stack with Big Data Workload Example (Presented by IBM)

Ken Rozendal ( Distinguished Engineer, IBM )
Ken Rozendal is the chief architect for IBM's Linux Technology Center. Previously, he was lead architect for IBM's Linux kernel development and earlier for IBM's AIX kernel development organization. He created the original version of the CD-ROM filesystem in AIX and worked on the AIX design for diskless systems, the virtual memory and filesystem support for SMP systems, workload management, and application checkpoint/restart.
Keith Campbell ( Senior Software Engineer, IBM )
Keith Campbell is a senior software engineer at IBM, working in the Hardware Acceleration Laboratory since 2012. He specializes in applying computational acceleration technology to enterprise computing. Keith received a Bachelor of Mathematics degree from the University of Waterloo in Waterloo, Ontario, with majors in Computer Science and Combinatorics & Optimization.

The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables unique innovation across the full hardware and software stack. OpenPOWER ecosystem partners and developers now have more choice, control and flexibility to optimize at any level of the technology from the processor on up for next-generation, hyperscale and cloud datacenters. Integrating support for NVIDIA GPUs on the POWER platform enables high performance enterprise and technical computing applications such as Big Data and analytics workloads. This presentation will cover the software stack and developer tools for OpenPOWER, the planned support for CUDA, and a proof of concept showing GPU acceleration. This proof of concept will be available as a demo in the IBM booth.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers; Debugging Tools & Techniques; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room LL20D

S4899 - Leveraging GPUs on Amazon Web Services for Media and Entertainment

John Phillips ( Sr. Product Manager, Amazon Web Services )
John Phillips
John Phillips, a Sr. Product Manager on the Amazon EC2 team, is responsible for GPU instances among other instance families and features. Prior to joining Amazon Web Services, John received his MBA from Harvard University. Before business school, John worked for several years in the financial services industry. John received a B.S.E. in Electrical Engineering from Princeton University.
Jules Urbach ( Founder and CEO, OTOY Inc. and LightStage )
Jules attended Harvard-Westlake high school in LA before being accepted to Harvard University. However, he decided to defer his acceptance to Harvard (indefinitely as it turned out) to go make video games. He made his first game, Hell Cab - Time Warner Interactive, at age 18. Six years after he created Hell Cab, he founded Groove Alliance. Groove ended up creating the first game ever available on Shockwave.com (Real Pool). A decade later, Jules is busy working on his latest two companies, OTOY and Lightstage, to revolutionize 3D rendering.

Amazon EC2 now offers the G2, a new NVIDIA GPU instance type capable of running 3D graphics and GPU compute workloads in the AWS cloud. In this session, attendees will learn about the adoption and evolution of media workflows on the AWS cloud, including how OTOY and AWS have teamed up to bring the full power of NVIDIA GPUs to media and entertainment. Hosted on AWS' infrastructure, OTOY's ORBX-powered AMIs offer M&E professionals a fully customizable "PC in the Cloud," making sophisticated 3D design, rich-media and creative applications available through a web browser. In this presentation, OTOY and AWS will provide a snapshot of industry-specific use cases, including how ORBX streaming technology can be coupled with real-time 3D rendering via OTOY's OctaneRender.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Recommended Press Session – Media & Entertainment

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 211A

S4936 - Diving for Dollars: Liquid Cooling for Better TCO (Presented by Penguin Computing)

Philip Pokorny ( Chief Technology Officer, Penguin Computing )
Philip Pokorny is responsible for all aspects of Penguin Computing's hardware products. He joined Penguin Computing in February 2001 as an engineer and has steadily taken on more responsibility in Penguin's hardware organization. He brings a wealth of both customer and engineering experience to the design, development and support of Penguin Computing's products. Prior to Penguin Computing, he worked for 14 years in various engineering and system administration positions with Cummins, Inc. and its electronics startup Cummins Electronics. He participated in the development of internal network standards, deployed and managed a multi-site network of multi-protocol routers and supported a diverse mix of office and engineering workers with a variety of server and desktop operating systems. He has contributed code to Open Source projects including the Linux kernel, lm_sensors and LCDproc. Philip graduated in 1987 from Rose-Hulman Institute of Technology with BS degrees in Math and Electrical Engineering with a second major in Computer Science.

Penguin Computing is the largest private supplier of complete high performance computing (HPC) solutions in North America and builds and operates the leading specialized public HPC cloud service, Penguin Computing on Demand (POD). Penguin Computing also applies its core expertise in the field of distributed large-scale enterprise computing, delivering scale-out compute, storage, virtualization, and cloud solutions for organizations looking to take advantage of modern open data center architectures. Attend this session to learn about Penguin's liquid cooling, which delivers significant reductions in power consumption across CPUs and GPUs as well as savings in OPEX, CAPEX, and floor space.

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Clusters & GPU Management; Energy Exploration

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room LL20C

S4145 - High Frequency Elastic Seismic Modeling on GPUs Without Domain Decomposition

Thor Johnsen ( HPC Expert, Chevron )
Thor Johnsen
Thor Johnsen has 20 years of experience as a software developer in the oil and gas industry. He worked for Schlumberger for 17 years in various roles on projects targeting all forms of seismic data, EM data, and borehole logs, and as an HPC expert for WesternGeco from 2009 until 2011. He currently works as an HPC expert for Chevron ETC and loves the challenge of parallel designs of complicated algorithms.

What if you want to do FDTD modeling on a dataset that cannot possibly fit into GPU memory? This session explores design patterns that take advantage of two levels of the GPU memory hierarchy that are often overlooked, host memory and disk, thereby greatly expanding the size of problems that can be handled. Two seismic modeling kernels were implemented: acoustic TTI with variable density and elastic triclinic. We show that these GPU kernels can handle extremely large datasets without domain decomposition (tens of billions of cells) while also taking full advantage of the computational throughput of 16 Kepler GPUs, achieving 20-30x better throughput than highly optimized CPU code running on a dual-socket Sandy Bridge server. We also show that this design pattern can be applied to other numerical methods that have a concept of timestepping and exhibit good spatial locality, such as Lattice Boltzmann methods for fluid flow modeling.
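As a rough illustration of the general pattern (our sketch, not Chevron's code), a domain held in pinned host memory can be streamed through two small device buffers, double-buffered across two CUDA streams so transfers overlap compute; a real FDTD kernel would also exchange halo cells between slabs, which is omitted here:

    #include <cuda_runtime.h>

    __global__ void step(float *slab, int cells) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < cells) slab[i] *= 0.99f;     // stand-in for the real FDTD update
    }

    int main() {
        const int nSlabs = 64;
        const int cells  = 1 << 20;          // whole domain far exceeds one device buffer
        float *host;
        cudaMallocHost(&host, (size_t)nSlabs * cells * sizeof(float)); // pinned; init omitted
        float *dev[2]; cudaStream_t s[2];
        for (int b = 0; b < 2; ++b) {
            cudaMalloc(&dev[b], cells * sizeof(float));
            cudaStreamCreate(&s[b]);
        }
        for (int k = 0; k < nSlabs; ++k) {
            int b = k & 1;                   // double buffering across two streams
            cudaMemcpyAsync(dev[b], host + (size_t)k * cells,
                            cells * sizeof(float), cudaMemcpyHostToDevice, s[b]);
            step<<<(cells + 255) / 256, 256, 0, s[b]>>>(dev[b], cells);
            cudaMemcpyAsync(host + (size_t)k * cells, dev[b],
                            cells * sizeof(float), cudaMemcpyDeviceToHost, s[b]);
        }
        cudaDeviceSynchronize();
        cudaFreeHost(host);
        return 0;
    }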

Session Level: Advanced
Session Type: Talk
Tags: Energy Exploration; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 15:30 - 16:20
Location: Room LL20B

S4266 - Concurrency and Overlapping: Fully Exploit Every Single GPU for Large-Scale Machine Learning

Yun Zhu ( Ph.D. Candidate, Georgia State University )
Yun Zhu
Yun Zhu is currently a Ph.D. student in Computer Science Department at Georgia State University. He received his BS at East China University of Science & Technology and MS at Emory University. His research interests include machine learning algorithm and high performance computing.

Learn how to tackle a large-scale machine learning model that exceeds the device memory capacity of a GPU, without using additional GPUs and while maintaining performance competitive with configurations that keep everything on board. (1) Remove the device memory limitation by splitting the parameter set into small parts that fit in device memory and updating them one by one. (2) Neutralize the overhead caused by the extra transfers of small parameter subsets between host and device using (a) overlapping of memory copies and kernel execution and (b) concurrent kernels. (3) Determine the computational limit of the GPU and select the optimal concurrency configuration. An example will be given using a Restricted Boltzmann Machine (RBM) with billions of parameters.
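As a rough sketch of steps (1) and (2) (our illustration, not the speaker's code; the update kernel is a stand-in for the real RBM computation), parameter chunks are cycled through two device buffers while a copy stream prefetches the next chunk ahead of the compute stream:

    #include <cuda_runtime.h>

    __global__ void update(float *params, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) params[i] += 0.01f;              // stand-in for the real RBM update
    }

    int main() {
        const int nChunks = 16;
        const int chunk   = 1 << 20;                // full parameter set exceeds device memory
        float *h_params;
        cudaMallocHost(&h_params, (size_t)nChunks * chunk * sizeof(float)); // pinned
        float *d_buf[2]; cudaEvent_t ready[2], freed[2];
        cudaStream_t copyS, compS;
        cudaStreamCreate(&copyS); cudaStreamCreate(&compS);
        for (int b = 0; b < 2; ++b) {
            cudaMalloc(&d_buf[b], chunk * sizeof(float));
            cudaEventCreate(&ready[b]); cudaEventCreate(&freed[b]);
        }
        for (int k = 0; k < nChunks; ++k) {
            int b = k & 1;
            cudaStreamWaitEvent(copyS, freed[b], 0);  // don't overwrite a buffer still in use
            cudaMemcpyAsync(d_buf[b], h_params + (size_t)k * chunk,
                            chunk * sizeof(float), cudaMemcpyHostToDevice, copyS);
            cudaEventRecord(ready[b], copyS);
            cudaStreamWaitEvent(compS, ready[b], 0);  // compute waits only for its own chunk
            update<<<(chunk + 255) / 256, 256, 0, compS>>>(d_buf[b], chunk);
            cudaMemcpyAsync(h_params + (size_t)k * chunk, d_buf[b],
                            chunk * sizeof(float), cudaMemcpyDeviceToHost, compS);
            cudaEventRecord(freed[b], compS);
        }
        cudaDeviceSynchronize();
        cudaFreeHost(h_params);
        return 0;
    }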

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room LL21B

S4311 - Lanczos Algorithm Using CUDA for Lattice QCD

Hyung-Jin Kim ( Research Associate, Brookhaven National Laboratory )
Hyung-Jin Kim received his Ph.D. from Seoul National University in the Republic of Korea (2012).

Computing a set of eigenvalues and eigenvectors of the matrix to be inverted is key to accelerating matrix inversion, and the Lanczos algorithm is one of the best-known methods for this problem. However, the routine is heavily dominated by data-access I/O, so it can become another bottleneck in the overall sequence. Although the GPU's FLOPS-to-bandwidth ratio is not ideal, the GPU still holds a substantial memory-bandwidth advantage over the CPU. We are implementing the Lanczos algorithm in CUDA and will show preliminary performance results on multi-GPU clusters.
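For orientation, here is a minimal sketch of one Lanczos iteration on the GPU using cuBLAS (our illustration, not the speaker's implementation; a dense GEMV stands in for the sparse QCD operator, and reorthogonalization is omitted):

    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    // One Lanczos step: w = A*v - beta_prev*v_prev; alpha = <w,v>;
    // w -= alpha*v; beta = ||w||; v_next = w/beta (left in d_w).
    // d_A is a dense n x n matrix (column-major); for the first step,
    // pass beta_prev = 0 and a zeroed d_vprev.
    void lanczos_step(cublasHandle_t h, int n, const double *d_A,
                      const double *d_vprev, const double *d_v, double *d_w,
                      double beta_prev, double *alpha, double *beta) {
        const double one = 1.0, zero = 0.0;
        cublasDgemv(h, CUBLAS_OP_N, n, n, &one, d_A, n, d_v, 1, &zero, d_w, 1);
        double nb = -beta_prev;
        cublasDaxpy(h, n, &nb, d_vprev, 1, d_w, 1);    // w -= beta_prev * v_prev
        cublasDdot(h, n, d_w, 1, d_v, 1, alpha);       // alpha = <w, v>
        double na = -(*alpha);
        cublasDaxpy(h, n, &na, d_v, 1, d_w, 1);        // w -= alpha * v
        cublasDnrm2(h, n, d_w, 1, beta);               // beta = ||w||
        double inv = 1.0 / *beta;
        cublasDscal(h, n, &inv, d_w, 1);               // normalize: v_next = w / beta
    }

The alpha/beta pairs collected across iterations form the tridiagonal matrix whose eigenvalues approximate those of A; a small host-side eigensolver finishes the job.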

Session Level: Advanced
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 212A

S4318 - Performance Impact of Dynamic Parallelism on Clustering Algorithms on GPUs

Michela Taufer ( Associate Professor, University of Delaware )
Michela Taufer
Dr. Taufer is the David L. and Beverly J.C. Mills Chair of Computer and Information Sciences and an associate professor in the same department at the University of Delaware. She earned her master's degree in Computer Engineering from the University of Padova (Italy) and her doctoral degree in Computer Science from the Swiss Federal Institute of Technology (Switzerland). From 2003 to 2004 she was a La Jolla Interfaces in Science Training Program (LJIS) Postdoctoral Fellow at the University of California San Diego (UCSD) and The Scripps Research Institute (TSRI), where she worked on interdisciplinary projects in computer systems and computational chemistry. From 2005 to 2007, she was an Assistant Professor at the Computer Science Department of the University of Texas at El Paso (UTEP). She joined the University of Delaware in 2007 as an Assistant Professor and was promoted to Associate Professor with tenure in 2012.

Discover and quantify the performance gains of dynamic parallelism for clustering algorithms on GPUs. Dynamic parallelism effectively eliminates the superfluous back and forth communication between the GPU and CPU through nested kernel computations. The change in performance is measured using two well-known clustering algorithms that exhibit data dependencies: the K-means clustering and the hierarchical clustering. K-means has a sequential data dependence wherein iterations occur in a linear fashion, while the hierarchical clustering has a tree-like dependence that produces split tasks. Analyzing the performance of these data-dependent algorithms gives us a better understanding of the benefits or potential drawbacks of CUDA 5's new dynamic parallelism feature.
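For readers unfamiliar with the feature, here is a minimal illustration (not the study's code) of CUDA dynamic parallelism, which requires CUDA 5 or later and a compute capability 3.5 device:

    #include <cuda_runtime.h>

    __global__ void refine(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 0.5f;          // stand-in for a per-cluster refinement
    }

    __global__ void cluster(float *data, int n) {
        // The decision to do more work is made on the device, and the child
        // grid is launched directly, with no round trip to the CPU.
        if (threadIdx.x == 0 && blockIdx.x == 0) {
            refine<<<(n + 255) / 256, 256>>>(data, n);
            cudaDeviceSynchronize();         // device-side wait for the child grid
        }
    }

    int main() {
        float *d;
        cudaMalloc(&d, 1024 * sizeof(float));
        cluster<<<1, 32>>>(d, 1024);
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }

    // Build with: nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt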

Session Level: Beginner
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room LL21D

S4383 - GPU Cluster with Proprietary Interconnect Utilizing GPU Direct Support for RDMA

Toshihiro Hanawa ( Associate Professor, The University of Tokyo )
Toshihiro Hanawa
Toshihiro Hanawa received the M.S. and Ph.D. degrees in computer science from Keio University in 1995 and 1998. He was an assistant professor at Tokyo University of Technology, Japan, from 1998 to 2007, a research fellow at the Center for Computational Sciences (CCS), University of Tsukuba, from 2007 to 2008, and an associate professor at CCS from 2008 to 2013. Since December 2013, he has been a project associate professor at the Information Technology Center, The University of Tokyo. He joined the HA-PACS project, supported by the MEXT special fund program from 2011 to 2013 at the University of Tsukuba, and belongs to the "Research and Development on Unified Environment of Accelerated Computing and Interconnection for Post-Petascale Era" project supported by JST/CREST, Japan, from 2012 to 2018. His research interests include computer architecture and interconnection networks. Dr. Hanawa is a member of IEEE CS and IPSJ.

Learn how our proprietary interconnect works and achieves good performance on a GPU cluster using GPU Direct support for RDMA. We promote the HA-PACS project at the Center for Computational Sciences, University of Tsukuba, both to build up the HA-PACS base cluster system as a commodity GPU cluster and to develop an experimental system based on the Tightly Coupled Accelerators (TCA) architecture, a proprietary interconnect connecting GPUs across nodes using the GPU Direct support for RDMA mechanism. In this session, we describe the TCA architecture and the design and implementation of PEACH2, which realizes the TCA architecture using FPGAs. We also introduce APIs for the TCA cluster environment and show application performance on the TCA cluster.

Session Level: All
Session Type: Talk
Tags: Supercomputing; Clusters & GPU Management

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room LL21A

S4392 - From CPU to GPU: Optimizing 3D Depthmap and Filtering

Tim Droz ( VP & GM North America, SoftKinetic )
Highly-Rated Speaker
Tim Droz
Tim heads the SoftKinetic U.S. organization delivering 3D time-of-flight (TOF) and gesture solutions to international customers such as Intel and Texas Instruments. Prior to SoftKinetic, Tim was VP Platform Engineering and head of the Entertainment Solutions Business Unit (ESBU) at Canesta, developing the Kinect 2 TOF sensor (acquired by Microsoft). Tim's pioneering work extends into all aspects of the gesture and 3D ecosystem including 3D sensors, gesture-based middleware and applications.

The advent of 3D technologies has created a particular strain on processing resources for embedded platforms such as Tegra. 3D depthmap generation and filtering have traditionally been processed on the CPU, but by offloading these capabilities to the much more robust GPU, up to a quarter of the bottleneck created in processing 3D images can be eliminated. In this session, learn how utilizing the GPU to free up processing resources allows for a leaner, faster developer experience. We will also discuss how to manage and overcome the most difficult part of GPU processing: synchronization.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Computer Vision; Numerical Algorithms & Libraries; Programming Languages & Compilers; Recommended Press Session – Mobile

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 210E

S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD

John Stone ( Senior Research Programmer, Associate Director CUDA Center of Excellence, University of Illinois )
Highly-Rated Speaker
John Stone
John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and Associate Director of the NVIDIA CUDA Center of Excellence at the University of Illinois. Mr. Stone is the lead developer of VMD, a high performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. Mr. Stone was named an NVIDIA CUDA Fellow in 2010. Mr. Stone also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing. Prior to joining the University of Illinois in 1998, Mr. Stone helped develop the award-winning MPEG Power Professional line of video compression tools at Heuris.

We present recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray XK7 supercomputers. This talk will focus on recent algorithm developments and the applicability and efficient use of new CUDA features on state-of-the-art Kepler GPUs. We will present the latest performance results for GPU accelerated trajectory analysis runs on Cray XK7 petascale systems and GPU-accelerated workstation platforms. We will conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Supercomputing; Big Data Analytics & Data Algorithms; Scientific Visualization

Day: Tuesday, 03/25
Time: 15:30 - 16:20
Location: Room LL21E

S4447 - Rhythm: Harnessing Data Parallel Hardware for Server Workloads

Sandeep Agrawal ( Student, Duke University )
Sandeep Agrawal
Sandeep Agrawal is a third-year Ph.D. student in the Computer Science Department at Duke University. His interests broadly lie in the field of computer and systems architecture, especially in the area of energy and performance optimization.

We present Rhythm, a framework for high-throughput servers that exploits similarity across web service requests to improve server throughput and energy efficiency. Present work in data center efficiency primarily focuses on scale-out, with off-the-shelf hardware used for individual machines, leading to inefficient usage of energy and area. Rhythm improves upon this by harnessing data parallel hardware to execute "cohorts" of web service requests, grouping requests together based on similar control flow and using intelligent data layout optimizations. An evaluation of the SPECWeb Banking workload for future server platforms on the GTX Titan achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex A9.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 210B

S4552 - Practical Real-Time Voxel-Based Global Illumination for Current GPUs

Alexey Panteleev ( Application Performance Engineer, NVIDIA )
Alexey Panteleev
Alexey Panteleev is a software/hardware engineer and scientist with broad expertise and a focus on optimization and performance. Since 2010, he has been working at NVIDIA as a GPU compute and graphics application performance engineer. In 2013, Alexey received a Ph.D. in computer architecture from the Moscow Engineering and Physics Institute. His interests include real-time graphics, low-level programming, hardware simulation and RTL design, data visualization, and uncommon programming languages.

This session describes the work of making the voxel-based global illumination (GI) approach practical for use in games running on current-generation graphics hardware such as Kepler. Based upon Cyril Crassin's research, a library has been developed that allows applications to render GI effects for large and fully dynamic scenes at 30 frames per second or more, producing soft diffuse indirect lighting and blurry specular reflections, and providing emissive material support. During the session, Alexey will talk about the cone tracing GI algorithm in general and get into the details of scene representation, efficient multi-resolution voxelization, and indirect light gathering.

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Performance Optimization; Mobile Applications; Game Development

Day: Tuesday, 03/25
Time: 15:30 - 16:20
Location: Room 210C

S4570 - Accelerating 3D Reconstruction from Range Images with a Novel Cyclic Scheme

Christopher Schroers ( Research Assistant, Saarland University )
Christopher Schroers
Christopher Schroers received an M.Sc. degree in Visual Computing from Saarland University, Saarbrücken, Germany in 2011. Since then he has been a research assistant in the Mathematical Image Analysis Group, Saarland University.

Attend this session to get a deep understanding of variational range image integration methods. Such approaches are able to deal with a substantial amount of noise and outliers, while regularizing and thus creating smooth 3D reconstructions. See how incorporating a new direction-dependent smoothing behavior yields better control of the smoothing with respect to the local structure of the unknown surface, and thus state-of-the-art results. Also, learn how the integration can be accelerated with a novel and generic cyclic scheme named Fast Jacobi. Fast Jacobi is essentially a modified Jacobi over-relaxation (JOR) method, where the relaxation parameter is not fixed but varied in a cyclic way. Because of this, Fast Jacobi is much more efficient than JOR but still as simple to implement and perfectly suited for parallelization. Furthermore, the Fast Jacobi scheme is also applicable to a large range of other PDE-based image analysis problems.
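To make the idea concrete, here is a hedged sketch of a JOR sweep with a cyclically varying relaxation parameter; the kernel is a generic dense-system illustration, and the cycle values are placeholders rather than the parameters derived in the talk:

    // One JOR sweep over a dense n x n system A x = b.
    __global__ void jor_step(const float *A, const float *b,
                             const float *x_old, float *x_new,
                             int n, float omega) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float sum = 0.0f;
        for (int j = 0; j < n; ++j)
            if (j != i) sum += A[i * n + j] * x_old[j];   // off-diagonal row sum
        float jacobi = (b[i] - sum) / A[i * n + i];       // plain Jacobi value
        x_new[i] = (1.0f - omega) * x_old[i] + omega * jacobi;
    }

    // Host side: sweep with a short, repeating cycle of relaxation parameters
    // (the values below are hypothetical, not the cycle derived in the talk).
    // float cycle[4] = {0.6f, 1.0f, 1.4f, 1.8f};
    // for (int k = 0; k < iters; ++k)
    //     jor_step<<<grid, block>>>(d_A, d_b, d_x[k & 1], d_x[1 - (k & 1)],
    //                               n, cycle[k % 4]);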

Session Level: Beginner
Session Type: Talk
Tags: Computer Vision; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 212B

S4598 - GPU Acceleration of CFD in Industrial Applications Based on OpenFOAM

Bjoern Landmann ( Development Engineer, FluiDyna GmbH )
Bjoern Landmann has been a development engineer at FluiDyna GmbH, Munich, Germany, since 2011. His research interests include computational multiphysics, high-performance computing, turbulence, and aeroacoustics.

CFD calculations in an industrial context prioritize fast turn-around times - a requirement that can be addressed by porting parts of the CFD calculation to the GPU, leading to a hybrid CPU/GPU approach. In a first step, the GPU library Culises has been developed, allowing the GPU-based solution of large-scale linear systems of equations that are in turn set up by MPI-parallelized CFD codes (e.g. OpenFOAM) on the CPU. In this session we will address a second step, which consists of porting the construction of the linear system to the GPU as well, while pre- and post-processing remain on the CPU. Aiming for industrial applications in the automotive sector, the approach will be aligned with the simpleFOAM solver of OpenFOAM. As the setup of the linear system consumes up to 40-50% of computational time in typical cases in the automotive industry, this approach can further increase the acceleration of CFD computations.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Fluid Dynamics; Automotive; Manufacturing

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 210A

S4766 - From Tent-Pole to Indie: How OctaneRender is Changing the Industry

Jules Urbach ( Founder and CEO, OTOY Inc. and LightStage )
Jules Urbach
Jules attended Harvard-Westlake high school in LA before being accepted to Harvard University. However, he decided to defer his acceptance to Harvard (indefinitely as it turned out) to go make video games. He made his first game, Hell Cab - Time Warner Interactive, at age 18. Six years after he created Hell Cab, he founded Groove Alliance. Groove ended up creating the first game ever available on Shockwave.com (Real Pool). A decade later, Jules is busy working on his latest two companies, OTOY and Lightstage, to revolutionize 3D rendering.

OTOY has built a production pipeline from high-quality content creation to delivery of photorealistic graphics in real-time. In this session, attendees will get a sneak peek under the hood as Jules Urbach unveils the 2014 roadmap for OctaneRender. The unbiased, physically-based OctaneRender, which draws on the power of NVIDIA GPUs, promises to revolutionize visual effects. Attendees will learn how the VFX community is using OctaneRender to transform their production pipelines. OctaneRender uses NVIDIA GPUs to deliver a rich feature set and interactive, final-quality previews to artists and TDs, allowing them to set cameras, lights and materials without time-consuming iterations. From tent-pole to indie, Octane has been adopted by some of the industry's leading production designers.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 211A

S4810 - NVIDIA Path Rendering: Accelerating Vector Graphics for the Mobile Web

Mark Kilgard ( Principal Graphics Software Engineer, NVIDIA Corporation )
Mark Kilgard works on GPU-accelerated web graphics, OpenGL, and programmable shading languages. Mark wrote numerous important OpenGL extension specifications and implemented the popular OpenGL Utility Toolkit (GLUT) for developing portable OpenGL examples and demos. Mark co-authored the book "The Cg Tutorial: the definitive guide to programmable real-time graphics." Mark is an NVIDIA Distinguished Inventor named on over 40 graphics-related patents. Mark's Karaoke rendition of Dolly Parton's "9 to 5" can't be beat.

Come see how NVIDIA is transforming your web browser into a fully GPU-accelerated experience. NVIDIA Path Rendering provides GPU-acceleration for web graphics standards such as Scalable Vector Graphics (SVG), HTML 5 Canvas, PDF documents, and font rendering. On mobile devices, screen resolutions and densities vary, so vector graphics is a natural way to deploy 2D graphics experiences such as games, maps, and traditional web pages. Watch as we demonstrate accelerated SVG viewers and web browsers on Tegra devices. We do this with an OpenGL extension available on all of NVIDIA's latest desktop and mobile GPUs.
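The extension's core "stencil, then cover" idiom looks roughly like this (calls taken from the published NV_path_rendering specification; assumes an OpenGL context with a stencil buffer and GLEW for extension loading):

    #include <GL/glew.h>
    #include <cstring>

    // Draw a filled five-point star via the NV_path_rendering extension.
    void drawStar() {
        GLuint pathObj = glGenPathsNV(1);
        const char *svg = "M100,180 L40,10 L190,120 L10,120 L160,10 Z";
        glPathStringNV(pathObj, GL_PATH_FORMAT_SVG_NV, (GLsizei)strlen(svg), svg);

        glEnable(GL_STENCIL_TEST);
        glStencilFillPathNV(pathObj, GL_COUNT_UP_NV, 0x1F);  // pass 1: stencil the fill
        glStencilFunc(GL_NOTEQUAL, 0, 0x1F);                 // pass 2: shade covered pixels
        glStencilOp(GL_KEEP, GL_KEEP, GL_ZERO);
        glColor3f(0.0f, 0.5f, 1.0f);
        glCoverFillPathNV(pathObj, GL_BOUNDING_BOX_NV);
        glDeletePathsNV(pathObj, 1);
    }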

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Real-Time Graphics Applications; In-Vehicle Infotainment (IVI) & Safety; Web Acceleration

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room LL21C

S4142 - Easy Multi-GPU Programming with CUDArrays

Javier Cabezas ( Ph.D. Student, Barcelona Supercomputing Center )
Javier Cabezas
Javier Cabezas is a Ph.D. student in the Computer Architecture department at Universitat Politècnica de Catalunya (UPC) and works as a researcher at the Barcelona Supercomputing Center (BSC). He received B.S. and M.S. degrees in computer science from UPC in 2006 and 2008, respectively. He has extensive experience in GPU programming and has participated in the development of other products that ease development for GPUs, such as GMAC.

Learn how to boost your productivity with CUDArrays. CUDArrays is a user-level library that eases the development of CUDA programs by offering a multi-dimensional array data type that can be used both in host and device code. This data type relieves programmers of the burden of managing multi-dimensional arrays through flat "C"-style memory allocations. Moreover, in systems with several GPUs and P2P memory access support, CUDArrays transparently distributes the computation across several GPUs. Using data access pattern information provided by the compiler, the runtime automatically determines how to partition (or replicate) the arrays to minimize the number of accesses to other GPUs' memories. Results show that linear speedups can be achieved in most cases. Examples will be provided for different types of scientific computations.
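To illustrate the burden being removed, here is the flat-indexing idiom plain CUDA forces on multi-dimensional data, with a CUDArrays-style container sketched in comments (hypothetical syntax; the library's actual API may differ):

    // Plain CUDA: 2D data squeezed through a flat allocation, with
    // index arithmetic written by hand in every kernel.
    __global__ void scale_flat(float *a, int rows, int cols, float s) {
        int r = blockIdx.y * blockDim.y + threadIdx.y;
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (r < rows && c < cols)
            a[r * cols + c] *= s;     // manual row-major indexing
    }

    // With a CUDArrays-style container (hypothetical), the indexing, layout,
    // and multi-GPU distribution move into the data type:
    //
    // __global__ void scale(array<float, 2> a, float s) {
    //     ...
    //     a(r, c) *= s;              // library resolves layout and placement
    // }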

Session Level: Beginner
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room LL21D

S4260 - Crash, Boom, Bang! Leveraging Game Physics and Graphics APIs for Scientific Computing

Peter Messmer ( Senior HPC DevTech Engineer, NVIDIA )
Highly-Rated Speaker
Peter Messmer
Peter Messmer joined NVIDIA in 2011 after spending more than 15 years developing HPC and GPU accelerated applications for industry and government clients, mainly in the area of plasma and EM simulations, data analysis and visualization. In his role as senior devtech engineer at NVIDIA, Peter is working with HPC users around the globe, supporting them in accelerating their scientific discovery process by taking advantage of GPUs in their applications. Peter holds an MSc and a PhD in Physics from ETH Zurich, Switzerland, with specialization in kinetic plasma physics and nonlinear optics.

In this talk, you will learn how to use the game and visualization wizard's tool chest to accelerate your scientific computing applications. NVIDIA's game physics engine PhysX and the ray tracing framework OptiX offer a wealth of functionality often needed in scientific computing applications. However, due to their different target audiences, these frameworks are generally not very well known to the scientific computing communities. High-frequency electromagnetic simulations, particle simulations in complex geometries, and discrete element simulations are all examples of applications that could immediately benefit from these frameworks. Based on examples, we will talk about the basic concepts of these frameworks, introduce their strengths and their approximations, and show how to take advantage of them from within a scientific application.

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Supercomputing; Numerical Algorithms & Libraries; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room 212A

S4284 - Debugging PGI CUDA Fortran and OpenACC on GPUs with Allinea DDT

Sebastien Deldon ( Senior Software Engineer, PGI )
Sebastien Deldon
Sebastien Deldon has worked for PGI since 2004 on benchmark performance analysis and tuning, compiler optimization implementations, profiler/debugger enhancements, and LLVM back-end code generation for various CPUs (ARM/x86/others) and NVIDIA GPUs.
David Lecomber ( CEO, Allinea )
David Lecomber
Dr David Lecomber is the CEO and a founder of Allinea Software and leads the research and development teams of its software products. He has played a role in parallel and high performance computing for over two decades. He holds a DPhil from Oxford University, where his research interests included programming models for concurrency and general purpose parallel computing, and he subsequently held research and teaching positions within the university. In 2002 he helped found Allinea Software to create tools for the forthcoming parallel era. The Allinea team has a passion for making the development of multi-process or multi-threaded codes easier - from the desktop through to supercomputers with millions of cores. David led the development team that created the world's first Petascale debugger and profiler - the only commercial software tools to have been proven to run concurrently on over 100,000 cores - and now in production use at over 700,000 cores.

PGI CUDA Fortran and OpenACC compilers are used extensively to take advantage of CUDA and NVIDIA GPUs within Fortran applications - and we will present new work that enables developers to debug the GPU kernels on the GPUs interactively. This work brings the benefits of true debugging to developers using CUDA and OpenACC for Fortran, including the ability to examine device state and memory and to control GPU threads. It supplies a vital but previously missing weapon in the armory of GPU developers and is made available in the Allinea DDT 4.2.1 and PGI 14.1 releases.

Session Level: Intermediate
Session Type: Talk
Tags: Debugging Tools & Techniques

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room LL20D

S4294 - Creating Particle Simulations with the Fluidix API

Adam MacDonald ( Software Engineer, OneZero Software )
Adam MacDonald
Adam MacDonald began developing physics simulation software in 2003 with St. Francis Xavier University in Nova Scotia, while pursuing a degree in software engineering and a master's in computer science at the University of New Brunswick. He is the owner of OneZero Software in Fredericton, NB, Canada, and has worked on a range of projects including GPU-based tools for backscatter tomography, physical and biological simulation research, agent-based simulation with AI, and hardware development for pressure mapping systems and thermal conductivity tools.

This talk will provide an introduction to Fluidix, describe the binary tree-based neighbor search algorithm capable of handling 20 million particles/triangles per second, and walk through the process of designing and visualizing a complex SPH fluid simulation with dynamic and deformable 3D mesh-based boundary conditions. Fluidix is a CUDA-powered general particle simulation library and software platform which makes it easy for developers to design, run, and visualize any type of particle simulation for scientific research, real-time interactive systems, or visual effects. With an intuitive C++ API and integrated development environment abstracting template classes, data transfers, and parallel algorithms, it provides complete application flexibility and state-of-the-art performance to users with only a basic knowledge of programming.

Session Level: Beginner
Session Type: Talk
Tags: Visual Effects & Simulation; Computational Physics; Media & Entertainment

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210A

S4393 - Exploiting Application Scalability Using GPU Direct RDMA and Infiniband on Heterogeneous Clusters

Filippo Spiga ( HPC Application Specialist, HPCS, University of Cambridge )
Filippo is an HPC Application Specialist working at the High Performance Computing Service (HPCS) at the University of Cambridge. Previously he worked in top-level research institutes and high performance computing centres (ICHEC, CINECA, CERN) and in enterprise R&D (IBM Research), as well as in wide multi-institutional collaborations (PRACE and EUAsiaGrid). At the University of Cambridge he works together with academics to help them exploit the benefits of GPUs in their research. He is also one of the main developers of GPU-accelerated Quantum ESPRESSO. His main interests include general GPGPU programming, numerical algorithms for GPUs, development of mixed multi-core CPU and GPU code, and scientific application porting.

One of the main limitations on application scalability on heterogeneous clusters is the fact that, prior to any communication, data has to be transferred from device memory to host memory. NVIDIA introduced GPU Direct RDMA, which provides a direct peer-to-peer data path between GPU memory and the InfiniBand card. We recently designed a new GPU cluster to be high-performance and highly scalable. The system consists of 128 Ivy Bridge Intel nodes and 256 NVIDIA K20s. In order to exploit GPU Direct RDMA on all GPUs, each node is equipped with two Mellanox FDR Connect-IB cards, each sharing the same PCIe bus as one GPU. Thus the system represents the most scalable GPU cluster architecture possible today. The aim of this talk is to present improvements in application performance, showing some best practices to properly exploit GPU Direct. The pool of applications includes codes from both the scientific open-source community and industrial partners of HPCS in the domains of molecular modeling, electronic structure, and CFD.
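At the application level, the payoff looks roughly like the following sketch, assuming a CUDA-aware MPI build (e.g., MVAPICH2 or Open MPI with CUDA support): device pointers are passed straight to MPI, with no staging through host memory.

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        const int n = 1 << 20;
        float *d_buf;
        cudaMalloc(&d_buf, n * sizeof(float));   // device memory, never staged on the host
        if (rank == 0)
            MPI_Send(d_buf, n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(d_buf, n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        cudaFree(d_buf);
        MPI_Finalize();
        return 0;
    }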

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room LL21A

S4426 - Accelerating 3D Facial Modeling Using ArrayFire, OpenCV and CUDA

Umar Arshad ( Senior Engineer, ArrayFire )
Umar Arshad is an engineer at ArrayFire, where he primarily works on improving concurrency in ArrayFire and in applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon, where he was involved with improving performance and designing large-scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the program chair of the ACM chapter at the university.

This session will discuss the lessons learned during the development of a facial modeling application used for glasses.com. The application made use of OpenCV and OpenMP to create a 3D representation of a person's face. Glasses.com wanted to improve the performance of the application to reduce run-times and hardware costs. We will discuss the performance requirements and the techniques used to meet their goals. Attendees will leave having learned how NVIDIA's Visual Profiler is essential to profiling multi-threaded applications.

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 212B

S4515 - Real-Time Geospatial Processing with NVIDIA® GPUs and CUDA-Based GIS Library

Srinivas Reddy ( CTO, SRIS )
Mr. Reddy is CTO at SRIS, a leading technology provider focusing primarily on GPU technology. He is responsible for overseeing the design and development of applications on Geospatially-Enabled Supercomputers and architecting next generation HPCs and HTCs using GPUs. He currently leads efforts to bring innovative solutions from various vendor partnerships to his clients. Mr. Reddy was a Senior Solutions Architect at IBM's Advanced Solutions, a leading provider of massive analytics serving federal markets. In 2009, Mr. Reddy received the prestigious NISC Chairman's Thought Leadership Award in recognition of his outstanding service, commitment to excellence, innovative approaches and solutions resulting in a positive impact to his customers' missions, high-quality performance, and noteworthy dedication to the realization of his client's objectives. Mr. Reddy has a B.S. degree in Zoology from Louisiana Tech University and a dual MBA/MS in Logistics and Transportation Management from Smith School of Business at University of Maryland at College Park.

GPUs have been highly successful in gaming technology and have become nearly universal in mobile devices, tablets, and laptops in their role as graphics processors and GIS platforms. User communities have been using GIS algorithms and libraries on CPUs for real-time computations. Since ours was a data-centric problem, we proposed that developing these GIS algorithms on NVIDIA GPUs should increase processing speed and efficiency. To test this hypothesis, we designed an architecture that included a Dell R720 with two NVIDIA K20x GPUs. We found that these algorithms lend themselves perfectly to implementation on GPUs. Furthermore, GPUs became extremely important for their ability to process large amounts of data in real-time. Final testing revealed that processing speed increased by 200X (200/sec to 20,000/sec). In this session we will lay out the pitfalls of the current methods, our proposed architecture, and some details about the findings.

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210D

S4640 - The Future of Entertainment: Immersive Reality System

Matteo Garibotti ( Research Fellow, University of Genoa )
Matteo Garibotti
Matteo Garibotti is a research fellow in The Physical Structure of Perception and Computation (PSPC) group of the University of Genoa. Born in Genoa, Italy, on June 5th, 1987, he received an MSc in Electrical Engineering in 2011. He has worked in the PSPC Lab since 2012 on the development of new techniques for displaying stereoscopic 3D environments, enabling the creation of new systems for 3D augmented reality. He is the author or co-author of several peer-reviewed scientific papers and co-inventor of an international patent, and has tutored or supervised several B.Sc. and M.Sc. theses carried out at the PSPC Lab.

Sooner or later, the worlds of videogames and cinema will converge: we will have videogames with the quality of movies, and movies with the interactivity of videogames. With a novel stereoscopic 3D visualization technique we developed and patented (www.truedynamic3d.com), we are able to create an immersive reality system in which the user perceives a virtual world completely merged with the real world. This will lead to a new generation of entertainment content, where movies are no longer confined to the frame of the monitor but surround the user.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Virtual & Augmented Reality; Game Development; Computer Vision

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 211B

S4656 - Machine Learning with GPUs: Fast Support Vector Machines without the Coding Headaches

Stephen Tyree ( Graduate Student, Washington University in St. Louis )
Stephen Tyree is a PhD student in the Department of Computer Science and Engineering at Washington University in St. Louis. He holds a Bachelors degree in computer science and mathematics and a Masters degree in computer science, both from the University of Tulsa. His research focuses on parallel and approximate methods for fast and scalable machine learning.

Speeding up machine learning algorithms has often meant tedious, bug-ridden programs tuned to specific architectures, all written by parallel programming amateurs. But machine learning experts can leverage libraries such as cuBLAS to greatly ease the burden of development and make fast code widely available. We present a case study in parallelizing Kernel Support Vector Machines, powerful machine-learned classifiers which are very slow to train on large data. In contrast to previous work which relied on hand-coded exact methods, we demonstrate that a recent approximate method can be compelling for its remarkably simple implementation, portability, and unprecedented speedup on GPUs.
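As one concrete instance of the library-based approach (a hedged sketch, not the speakers' code): the bulk of an RBF kernel matrix K(x,y) = exp(-gamma * ||x - y||^2) can be computed with a single cuBLAS GEMM, since ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y.

    #include <cublas_v2.h>

    // d_X holds n points of dimension d in row-major order; d_G receives the
    // n x n Gram matrix G = X X^T (symmetric, so row/column order is moot).
    void gram_matrix(cublasHandle_t h, int n, int d,
                     const float *d_X, float *d_G) {
        const float one = 1.0f, zero = 0.0f;
        // Row-major X viewed column-major is X^T (d x n), so G = (X^T)^T X^T.
        cublasSgemm(h, CUBLAS_OP_T, CUBLAS_OP_N, n, n, d,
                    &one, d_X, d, d_X, d, &zero, d_G, n);
    }

    // A small elementwise kernel then finishes the job:
    // K[i*n+j] = expf(-gamma * (G[i*n+i] + G[j*n+j] - 2.0f * G[i*n+j]));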

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room LL21B

S4674 - Parallel Decomposition Strategies in Modern GPU

Sean Baxter ( Research Scientist, NVIDIA )
Highly-Rated Speaker
Sean Baxter
Sean is a research scientist at NVIDIA.

Learn strategies to decompose algorithms into parallel and sequential phases. These strategies make algorithmic intent clear while enabling performance portability across device generations. Examples include scan, merge, sort, and join.
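A small example of one such decomposition, using Thrust (our illustration, not the speaker's code): a parallel scan computes per-item output offsets, after which each work item proceeds sequentially and independently.

    #include <thrust/device_vector.h>
    #include <thrust/scan.h>

    int main() {
        thrust::device_vector<int> counts(4);
        counts[0] = 3; counts[1] = 1; counts[2] = 4; counts[3] = 2;
        thrust::device_vector<int> offsets(4);
        // Parallel phase: exclusive prefix sum gives 0, 3, 4, 8.
        thrust::exclusive_scan(counts.begin(), counts.end(), offsets.begin());
        // Sequential phase: work item k now owns the independent output range
        // [offsets[k], offsets[k] + counts[k]) and can fill it on its own.
        return 0;
    }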

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210B

S4686 - NVIDIA GRID for VDI: How To Design And Monitor Your Implementation

Florian Becker ( Sr. Director, Strategic Alliances, Lakeside Software Inc. )
Florian Becker
Florian Becker leads the global alliance with Citrix and the Citrix ecosystem for Lakeside Software. Prior to that, Florian led the worldwide consulting solutions team at Citrix Systems, where he and his team developed the implementation best practices and professional services offering for virtual application and desktop use cases for Citrix Consulting Services and System Integrators. With more than 15 years of experience in the high tech and software industry and experience in user-focused design, Florian is uniquely positioned to bridge the worlds of IT and the end-user. A physicist by training, Florian holds a Master of science in Information Systems from the University of Miami and completed the Customer Focused Innovation curriculum at Stanford University.
Ben Murphy ( Senior Applications Engineer and Product Manager, Lakeside Software Inc. )
Ben Murphy is a Senior Applications Engineer and Product Manager for the MarketPlace program at Lakeside Software, the makers of SysTrack. His primary work focuses on mathematical data analysis and reporting to transform data collected from distributed end points and servers into meaningful recommendations and information. Currently he is engaged in ongoing work with NVIDIA to provide both data-driven assessments of GPU needs in production environments and ongoing insight into GPU consumption and performance for both physical and virtual systems.

Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics-accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameters and GPU utilization, and use the data to accurately size and scale the vGPU implementation in VDI use cases. Monitor virtual GPUs to proactively detect changes in the performance requirements of the end-user community, to manage the end-user experience, and to pinpoint performance bottlenecks in the environment.

Session Level: Beginner
Session Type: Talk
Tags: Graphics Virtualization Summit; Remote Graphics & Cloud-Based Graphics; Big Data Analytics & Data Algorithms; Computer Aided Design; Recommended Press Session – Graphics Virtualization

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room 210F

S4742 - The Display of ALL

Kobi Ben Tzvi ( Co-Founder and CEO, Mishor3D )
Kobi Ben Tzvi is the Co-Founder & CEO of Mishor3D. Kobi earned his B.Sc. in Information Systems from the Technion Institute of Technology. After a ten-year career as a system engineer, developer, and project manager in the defense industry, Kobi decided it was time for a change and started developing the Shadow Box Application.

Heads-Up Displays (HUD), once a unique and expensive technology available only in the cockpits of multimillion-dollar airplanes, are now finding their way into many passenger cars, giving the driver the ability to access visually displayed information in closer proximity to forward scene events relative to conventional instrument panel displays. Contextual and augmented reality HMI (sometimes also referred to as contact analogue) is an HMI approach that intends to superimpose virtual markings, indications, and other information onto the driver's actual view of the real world, providing an improved, more intuitive HMI to the driver. While AR HMI can be partially implemented with limited results in a 2D fashion (for example, over a video screen), it is the use of a Heads-Up Display that provides the driver with the three-dimensional spatial impression needed for the best user experience. Come see how it is done.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; In-Vehicle Infotainment (IVI) & Safety; Recommended Press Session – Digital Manufacturing

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210H

S4747 - VRender: Pixar's GPU-Accelerated Volume Renderer

Florian Hecht ( Technical Director, Pixar )
Florian Hecht
Florian Hecht is a graphics software engineer at Pixar working on speeding up rendering and lighting workflows, in particular using GPU technology. He joined Pixar in 2011 after doing graphics research at UC Berkeley. He received an MS in computer science from the University of Karlsruhe, Germany, as well as one from the Georgia Institute of Technology.

Pixar has developed an interactive, progressive renderer to speed up the workflows involving volumetric special effects for our feature films. The renderer has been implemented with NVIDIA's CUDA and makes full use of available GPU performance and memory. The renderer supports various area light sources and modifiers and implements a physically-based shading model using multiple importance sampling. It is used not just to create a preview but to produce the final frames for compositing in our movies. We'll talk about how the renderer is structured in terms of rendering phases and corresponding kernels on the GPU. We'll discuss how data is laid out and accessed in memory and how we deal with memory limitations. We'll go into detail about how various features of the renderer work, such as customizable shaders, motion blur, shadows from surfaces, deep data output, screen space caching, and guaranteed frame-rate interactivity.

Session Level: Beginner
Session Type: Talk
Tags: Media & Entertainment Summit; Recommended Press Session – Media & Entertainment; Recommended for All Press

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 211A

S4812 - An Introduction To the Possibilities of VR and AR For Digital Artists

Visa-Valtteri Pimia
Visa-Valtteri  Pimia
Known in the demoscene as visy, Visa-Valtteri has experimented with demo art since 2001 and he has been interested in the artistic possibilities of computers since laying his eyes on the Commodore 64. Working with Trilobit and Bilotrip, he is best known from demos for 8-bit devices such as Nintendo NES and Atari 2600 and has been homebrewing everything from games to glitchfests on all kinds of imaginable computer and console platforms. Professionally he's worked on game tech and mobile apps for companies like SCEE, Samsung and Remedy among others. He is a passionate gamer and helps run the indie game studio Hyperspace Yard.

VR is finally here despite all the hard years of birthing pains and problems along the way. Now that the hardware is up to the task, the real challenge lies in the VR software experiences. What does this mean for digital artists? How could we build meaningful immersive virtual realities for a single person to experience in the realm of digital art? What about getting rid of that ugly HDMI cable to the PC? This presentation introduces some ideas for interaction and exploration of a VR art piece, and outlines the technological requirements for a believable VR experience in the rendering and audio side of things. The presentation also touches on the latest advancements of AR technology and how it can be harnessed for similar uses.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; Virtual & Augmented Reality; NVScene

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room 230C

S4837 - WebGL, HTML5 and How the Mobile Web Was Won

Tony Parisi ( Founder, Vizi )
Tony Parisi is an entrepreneur and career CTO/software architect. He has developed international standards and protocols, created noteworthy software products, and started and sold technology companies. Tony's passion for innovating is exceeded only by his desire to build great products. Tony is the co-creator of the VRML and X3D ISO standards for networked 3D graphics, and continues to innovate in 3D technology. Tony is the co-chair of the San Francisco WebGL Meetup (www.meetup.com/WebGL-Developers-Meetup), a founder of the Rest3D working group (http://www.rest3d.org/) and a member of the Khronos COLLADA working group creating glTF, the new file format standard for 3D web and mobile applications. Tony is also the author of O'Reilly Media's first authoritative book on WebGL, WebGL Up and Running (2012), and the upcoming Programming 3D Applications in HTML5 and WebGL (O'Reilly 2013). Tony is currently a partner in a stealth web media startup and has a consulting practice developing social games, game platforms and web applications for San Francisco Bay Area clients.

After years of fear, uncertainty and doubt, the jury is now in: HTML5 is the platform of choice for building cross-platform, connected applications for desktop and mobile. The advanced programming, animation and multimedia capabilities of modern web browsers, combined with hardware-accelerated 3D rendering provided by WebGL, represents a combination with limitless possibilities. With these technologies, developers can create immersive 3D games, integrated 2D/3D presentations, product displays, social media sites and more, all coded in JavaScript and running in the browser. This awesome power is also available to mobile devices: WebGL is now built into Android, and there are quality adapter libraries for use in developing hybrid applications (native + WebKit) for iOS. With HTML5 and WebGL, developers can build high-performance mobile 3D applications and web sites rivaling native implementations, in a fraction of the time. Join 3D pioneer and WebGL guru Tony Parisi as he explores the technology, busts the myths and tells us where it's really at for creating the next generation of 3D web and mobile applications.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Virtual & Augmented Reality; Game Development; Web Acceleration; Recommended Press Session – Mobile

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room LL21C

S4847 - Using AR Capabilities to build new User Paradigms with Wearable Glasses

Raymond Lo ( CTO, Meta Company )
Raymond Lo is currently the CTO and co-founder of Meta Company (www.spaceglasses.com). During his master's and Ph.D. studies at the University of Toronto, Raymond and his supervisor Prof. Steve Mann, who is often described as the 'father of wearable computing,' developed numerous Extreme Dynamic Range digital eyeglasses solutions that allow wearers to see in extreme-contrast scenes, often over a million-to-one contrast ratio, such as welding. Later, he developed several wearable eyeglasses with range-imaging capabilities, which led him to the current development of Spaceglasses. His expertise in computer image processing and wearable/mobile computing has been pushing Meta forward quickly in such a competitive domain.

Learn how to develop novel user interfaces with Spaceglasses, the world's first wearable augmented reality glasses to utilize an optical see-through stereoscopic display and a time-of-flight range-imaging camera for hand tracking and recognition of the user's environment. With the ability to track the wearer's hands and surrounding surfaces in 3D space, a new form of human-computer interaction can be enabled by turning everyday objects into interactive surfaces with augmented graphics. Example augmented reality applications will be given to show what can be developed with our current AR-enabled eyeglasses.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Virtual & Augmented Reality; Computer Vision; Signal & Audio Processing

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210E

S4849 - Citrix 3D Engineering Cloud: A Practical Approach (Presented by IBM)

Bret Bailey ( IT Specialist - Industrial Sector - Electronics Industry , IBM )
Bret has a Bachelor of Science and an MBA, with 30-plus years of information technology experience focused on IT solutions to business problems. Bret has spent the last 3 years exploring, testing, and building 3D graphics engineering solutions in the cloud, or Virtual Desktop Infrastructure (VDI), for high-end graphics applications.

In today's fast-changing business environment, companies are looking for ways to deliver better designs faster and cheaper while creating high-quality products across an ecosystem of partners. To succeed, a company must transform its design processes by converting engineering silos into shared engineering clouds that improve collaboration, standardize processes and create a secure environment for sharing designs across operations and organizations, including partners and suppliers. The 3D Engineering Cloud Solution is a high performance visual computing environment for organizations that have large 3D-intensive graphics requirements and want to improve collaboration while protecting their assets and reducing costs. The 3D Engineering Cloud Solution is made possible by a partnership between IBM, Citrix, and NVIDIA. This combination creates a unique 3D engineering environment in the Cloud.

Session Level: All
Session Type: Talk
Tags: Clusters & GPU Management; Computer Aided Design; Graphics Virtualization Summit; Remote Graphics & Cloud-Based Graphics

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room LL21F

S4938 - Embedded Development For Tegra K1

Jesse Clayton ( Senior Manager DevTech Automotive/Embedded, NVIDIA )
Jesse Clayton is responsible for DevTech for automotive and embedded at NVIDIA. He has worked in the graphics industry for 9 years, spanning systems software, high performance computing, and computer vision. His current focus is on computer vision solutions for the automotive and embedded industries.

The Tegra K1 is a powerful SoC that will be leveraged across many industries. It is based on the same Kepler architecture as the world's fastest gaming systems and most efficient supercomputers, and it brings supercomputing power to mobile and embedded devices. Jesse Clayton from NVIDIA will walk through the embedded development process for Tegra K1. The talk will cover the platform, programming paradigm, and development tools, and provide details on the Tegra K1 architecture relevant to embedded applications.

Session Level: All
Session Type: Talk
Tags: Automotive; Computer Vision; Machine Learning & AI; Defense

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room 220C

S4215 - GPU-Accelerated Large-Scale Dense Subgraph Detection

Andy Wu ( Research Scientist, Xerox Research Center )
Andy Wu
Andy Wu is a researcher working on large-scale data analytics projects at XRCW (Xerox Research Center Webster, NY). In 2011, he graduated with a PhD in Computer Science from Washington State University, Pullman, WA. The focus of his PhD research was solving large-scale computational biology problems using high-performance computers; his research interests include parallel computing, string algorithms, graph algorithms and computational biology. He has published several papers in top-ranked conferences and journals, including TPDS, SC, ICPP, Nature Genetics, etc. He is utilizing GPUs to solve large-scale graph analytics problems at XRCW.

The large-scale dense subgraph detection problem has been an active research area for decades, with numerous applications in the web and bioinformatics domains, and numerous algorithms have been designed to tackle this graph kernel. Due to computational limitations, however, traditional approaches are infeasible when dealing with large-scale graphs with millions or billions of vertices. In this presentation, we propose a GPU-accelerated algorithm to solve the large-scale dense subgraph detection problem. It successfully maps the irregular graph clustering problem onto the GPGPU platform, and extensive experimental results demonstrate strong scalability on GPU computing platforms.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Bioinformatics & Genomics

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210B

S4262 - GGAS: Global GPU Address Spaces for Efficient Communication in GPU Clusters

Holger Fröning ( JProf, University of Heidelberg )
Holger Fröning
Holger Fröning is a junior professor at the Department of Mathematics and Computer Science at the Ruprecht-Karls University of Heidelberg, and leads the Computer Engineering Group. His research interests include parallel computing, computer architecture, interconnection networks and hardware design, with a recent focus on application-specific heterogeneous computing and resource aggregation. He is spokesman of the "Advanced Computer Architecture" research group, and helps the EXTOLL company as a research consultant. From 2008 to 2011 he contributed as a senior researcher to the Parallel Architectures Group at the Technical University of Valencia (Spain), led by Jose Duato. He received his MSc and PhD degrees in 2001 and 2007, respectively, from the University of Mannheim, Germany. He has published papers in prestigious peer-reviewed conferences and journals, and contributed to several books. He has served as a program committee member and reviewer for conferences, journals and workshops. He is co-organizer of the "Heterogeneous Unconventional Cluster Architecture and Applications (HUCAA)" workshop at the ICPP conference.

Modern GPUs are powerful high-core-count processors, widely used to accelerate computationally intensive general-purpose tasks. For peak performance, GPUs are distributed throughout the cluster. Current solutions typically combine the bulk-synchronous task model of GPUs with message passing semantics, which significantly increases complexity and requires the CPUs to communicate among distributed GPUs. This talk presents Global GPU Address Spaces (GGAS), which span the device memories of GPUs at the cluster level for sharing and aggregation purposes. GGAS allow low-overhead synchronization and efficient data movement between GPUs, and confine control flow to the GPU domain for all computation and communication tasks. Both aspects contribute to time and energy savings. In addition, GGAS maintain the GPU's bulk-synchronous programming model by relying on a thread-collective communication model, which significantly reduces the complexity of parallel programming on distributed GPUs.
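
GGAS itself relies on custom cluster interconnect hardware, so it cannot be reproduced with stock CUDA alone. As a rough single-node analogy only (a sketch, not the speaker's implementation, and assuming a peer-capable GPU pair with unified virtual addressing), CUDA peer-to-peer access likewise lets a kernel on one GPU store directly into another GPU's memory with no CPU in the data path:

    #include <cuda_runtime.h>

    // Single-node analogy to GGAS-style remote stores: with peer access
    // enabled, a kernel on GPU 0 writes straight into GPU 1's memory.
    __global__ void remote_store(float *peer_buf, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) peer_buf[i] = 2.0f * i;   // store lands in GPU 1's memory
    }

    int main() {
        float *buf1;
        cudaSetDevice(1);
        cudaMalloc(&buf1, 1024 * sizeof(float));  // allocated on GPU 1
        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);         // map GPU 1 into GPU 0's view
        remote_store<<<4, 256>>>(buf1, 1024);     // GPU 0 writes remotely
        cudaDeviceSynchronize();
        return 0;
    }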

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Programming Languages & Compilers; Clusters & GPU Management

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL21A

S4307 - GPU Accelerated Parallel Simulated Annealing for Fitting Molecular Dynamics Potentials

Pierre-Yves Taunay ( Research Programmer, The Pennsylvania State University )
Pierre-Yves Taunay
Pierre-Yves Taunay obtained his Master of Science in Aerospace Engineering from The Pennsylvania State University in 2012, along with a "General Engineer" degree from the Ecole Centrale de Nantes, France. Since then he has been working as a Research Programmer at the Research Computing and Cyberinfrastructure unit of The Pennsylvania State University. His current research focuses on high performance computing for large-scale engineering and scientific applications such as molecular dynamics, fluid dynamics, and plasma physics, using Graphics Processing Units (GPUs) and the Message Passing Interface (MPI) standard.

This work presents a parallel simulated annealing implementation for fitting molecular dynamics potentials. In our implementation, each GPU is given a random set of Lennard-Jones parameters sigma and epsilon, and performs separately a molecular dynamics simulation. A derived quantity, the structure factor, is then compared to experimental data and determines the quality of the fitting parameters. Information about the best fit is exchanged across GPUs at a fixed number of iterations. The choice of random parameters is then restarted in the vicinity of the best parameter set. Using GPUs, a larger parameter set can be explored in a given time as molecular dynamics simulations benefit greatly from GPU acceleration.
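
A minimal sketch of the periodic exchange step described above, assuming one MPI rank drives each GPU; the function and variable names are ours for illustration, not taken from the authors' code. MPI's MINLOC reduction locates the rank with the lowest fitting error, and that rank broadcasts its parameter pair:

    #include <mpi.h>

    // Illustrative exchange step: find the best (sigma, epsilon) across
    // ranks, then restart everyone's sampling in its vicinity.
    void exchange_best(double params[2] /* sigma, epsilon */, double my_err,
                       int my_rank, MPI_Comm comm) {
        struct { double err; int rank; } local, best;
        local.err  = my_err;
        local.rank = my_rank;
        // Rank with the lowest structure-factor error wins
        MPI_Allreduce(&local, &best, 1, MPI_DOUBLE_INT, MPI_MINLOC, comm);
        // Everyone adopts the winning parameter set
        MPI_Bcast(params, 2, MPI_DOUBLE, best.rank, comm);
    }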

Session Level: Beginner
Session Type: Talk
Tags: Molecular Dynamics; Numerical Algorithms & Libraries; Supercomputing

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL21E

S4433 - Real-Time Spectral Rendering

C. Allen Waddle ( Ph.D. Candidate, University of Victoria )
Allen Waddle is currently a Ph.D. candidate in the Department of Computer Science at the University of Victoria in British Columbia, Canada. In 2009 he received B.Sc. degrees from the same university in Combined Physics and Biochemistry and in Combined Computer Science and Mathematics. His research interests include high performance computing on GPUs and real-time, physically based computer graphics.

The sensation of color is the result of light interacting with materials and an observer's visual system. In computer graphics this process is usually modeled by multiplying RGB triplets. While this practice is simple and efficient, it does not predict color reliably, nor can it model phenomena that are spectral in nature, such as iridescence. Spectral rendering overcomes these shortcomings, but the cost of multiplying spectra of lights and materials is prohibitive for real-time applications. Instead, we show how to pose and solve an optimization problem that constructs a low-dimensional basis in which spectra can be multiplied on a GPU at a cost that is only a fraction above the cost of RGB rendering. This method can be used in applications seeking to predict colors accurately in real time, examples of which can be found in optics, architectural rendering, industrial design, manufacturing and product marketing.

Session Level: Advanced
Session Type: Talk
Tags: Real-Time Graphics Applications; Visual Effects & Simulation; Rendering & Animation

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210C

S4586 - Reasoning About Memory Performance Using Index-Digit Notation

Brandon Lloyd ( Software Engineer, NVIDIA )
Brandon Lloyd
Brandon got his Ph.D. at the University of North Carolina at Chapel Hill for his work with shadows. He worked at Microsoft Research for several years doing GPGPU work, including contributions to the DirectX11 FFT library. He now works at NVIDIA in the OptiX group working on GPU raytracing.

Achieving good memory performance in CUDA for algorithms on arrays with non-trivial access patterns, such as transpose or FFT, requires careful attention to shared memory bank conflicts, global memory coalescing, and on older GPUs, partition camping. Thinking about memory performance issues in the native multi-dimensional problem domain can sometimes be challenging. Index-digit notation provides an abstract representation of memory access patterns that can make reasoning about solutions to memory performance issues easier. In this session learn how to resolve bank conflicts, coalescing, and partition camping by performing simple transformations in index-digit notation. Applications to transpose and FFT will be discussed.
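
As one concrete example of the kind of fix the notation helps derive (this is the standard padded-tile technique, not code from the talk): padding each shared-memory row by one element shifts the bank digit of every row, so the column-order reads after a transpose no longer collide.

    #define TILE 32

    // Padded-tile transpose of a w x h row-major matrix; launch with
    // a (TILE, TILE) block. The "+1" pad makes the post-transpose
    // column reads bank-conflict-free.
    __global__ void transpose(float *out, const float *in, int w, int h) {
        __shared__ float tile[TILE][TILE + 1];
        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < w && y < h)
            tile[threadIdx.y][threadIdx.x] = in[y * w + x];   // coalesced load
        __syncthreads();
        x = blockIdx.y * TILE + threadIdx.x;                  // swapped block coords
        y = blockIdx.x * TILE + threadIdx.y;
        if (x < h && y < w)
            out[y * h + x] = tile[threadIdx.x][threadIdx.y];  // coalesced store
    }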

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL21D

S4661 - Halide: A Language for Portable High-Performance Image Processing

Andrew Adams ( Software Engineer, Google )
Highly-Rated Speaker
Andrew Adams
Andrew Adams is a software engineer at Google, where he works on the Halide compiler. Andrew did his doctoral work at Stanford under Marc Levoy, where he worked on programmable cameras, light fields, and fast bilateral filtering. He then moved to MIT to work with Fredo Durand and Jonathan Ragan-Kelley on Halide, before rejoining Marc at Google in February 2013.

Learn how Halide can help you write a single implementation of an image processing routine that achieves performance comparable with hand-tuned assembly on ARM, x86, and GPUs. Halide factors an imaging pipeline into two parts: A pure functional description of the algorithm; and a separate 'schedule' which specifies how to vectorize, parallelize, tile, fuse or inline the stages of the pipeline. The schedule varies per architecture but the algorithm does not, and changing the schedule is guaranteed not to change the result.
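
For a flavor of that separation, here is a sketch adapted from Halide's published blur example (illustrative of the open-source release's API, not of the session's material):

    #include "Halide.h"
    using namespace Halide;

    Func make_blur(ImageParam input) {
        Func blur_x, blur_y;
        Var x, y, xi, yi;

        // Algorithm: what is computed, written once for all targets.
        blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1)) / 3;

        // Schedule: how it runs on this target. Changing the schedule
        // cannot change the result, only the performance.
        blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);
        blur_x.compute_at(blur_y, x).vectorize(x, 8);
        return blur_y;
    }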

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Computational Photography; Programming Languages & Compilers; Video & Image Processing

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 211A

S4680 - Exploiting the GPU for High Performance Geospatial Situational Awareness Involving Massive and Dynamic Data Sets

Bart Adams ( Software Engineering Manager, Luciad )
Bart  Adams
Dr. Bart Adams, Software Engineering Manager, has been with Luciad since April 2009. He holds an MSc and Ph.D. in Engineering (University of Leuven, Belgium) and spent two years as a post-doctoral researcher at Stanford University, USA. Within the R&D group of Luciad, he manages the research on novel algorithms for high-performance computation and visualization on desktop and mobile devices.

Geospatial Situational Awareness (SA) engines face stringent accuracy and performance requirements. Large volumes of static and dynamic data need to be analyzed and visualized, in both 2D and 3D in various geographic projections, at sub-centimeter accuracy and interactive update rates. In contrast to game engines, where this data can be pre-processed and stored in optimized data structures, the data comes in any form and needs to be interpreted on-the-fly. This talk will discuss these challenges and the advanced GPU rendering techniques and algorithms that address them. We will show that by exploiting the GPU, terabytes of terrain and imagery data, in combination with highly dynamic data streams that can contain millions of tracks and multiple radar feeds as well as orthorectified UAV video streams, can be handled on a world-scale theater at update rates of over 60Hz.

Session Level: Advanced
Session Type: Talk
Tags: Defense; Combined Simulation & Real-Time Visualization; Big Data Analytics & Data Algorithms; Desktop & Application Virtualization

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210D

S4727 - Large Scale Reservoir Simulation Utilizing Multiple GPUs

Garfield Bowen ( Simulator Lead, Ridgeway Kite Software )
Garfield Bowen
Garf Bowen is a leading figure in reservoir simulation, serving as a committee member on the SPE (Society of Petroleum Engineers) Reservoir Simulation Symposium and as a member of the editorial committee of a number of SPE journals. Garf has been associated with the ECLIPSE reservoir simulator development group since 1987 and has contributed as scientific and technical authority and innovation champion to successor projects and bespoke developments for a range of clients. Originally a mathematician, he has built a career around developing elegant solutions to the practical problems facing reservoir engineers on a daily basis. Garf is currently leading the development of a new simulator for Ridgeway Kite, a UK-based software start-up venture.

Reservoir simulation has a long history as a tool used by reservoir engineers to plan and optimize (oil & gas) field developments. These simulations are inevitably 3-dimensional and transient and hence require considerable computing resources. Traditional simulators are typically constrained by the bandwidth to memory. The GPU architecture gives access to greater bandwidth once the simulator is parallel. However, the memory constraints on a GPU limit the problem size that can be tackled. In this presentation we describe a paradigm where we utilize a single GPU if the problem fits into its memory, and simply scale to multiple GPUs as the memory requirement grows. The practicality is demonstrated by running a 32-million-cell case on 32 Tesla GPUs.
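
A minimal sketch of the sizing decision (illustrative only; the simulator's actual partitioning logic is surely richer): query a device's memory and spread the cell arrays over just enough GPUs.

    #include <cuda_runtime.h>

    // Assumes a node of identical GPUs; queries the current device.
    int gpus_needed(size_t bytes_per_cell, size_t ncells) {
        size_t free_b, total_b;
        cudaMemGetInfo(&free_b, &total_b);            // per-device capacity
        size_t need = bytes_per_cell * ncells;
        int n = (int)((need + free_b - 1) / free_b);  // ceiling divide
        return n < 1 ? 1 : n;                         // one GPU if it fits
    }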

Session Level: All
Session Type: Talk
Tags: Energy Exploration; Computational Fluid Dynamics; Computer Aided Design

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL20B

S4739 - Using GPUs to Accelerate Learning to Rank

Alexander Shchekalev ( Senior Developer, Yandex )
Alexander Shchekalev graduated from Saint Petersburg State University in 2011 with a Master's degree in Information Technology, where he researched methods of machine learning and their implementation on the GPU. From 2008 to 2011 he worked at Mogmo on the development of a computer vision system for automatic recognition of structured documents. Since 2011, Alexander has worked at Yandex, where his activities center on research in the fields of GPGPU and machine-learned ranking.

Machine learning is a powerful tool for processing large amounts of data. Learning to rank plays a key role in many information retrieval problems and constructs a ranking model from training data. Ensemble methods allow us to make a trade-off between the quality of the obtained model and the computational time of the learning process, and many of these algorithms lend themselves to parallel processing of data. We describe the task of machine-learned ranking and consider the MatrixNet algorithm, based on decision tree boosting. We present a GPU-optimized implementation of this method, which performs more than 20 times faster than the CPU-based version while retaining the same ranking quality.

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL21B

S4740 - A Day in the Life of a Designer at Gulfstream

Sean Thornton ( Concept Visualization Designer, Industrial Design Department , Gulfstream Aerospace Corporation )
After graduating from the Savannah College of Art and Design in 1996 with a bachelor of fine arts in Computer Art, I began my career at Savannah-based Gulfstream Aerospace Corp. in the Interior Design Department as a Visualization Designer, providing 2D and 3D interior and exterior renderings for private business jet customers. In 2006, I joined the New Product Development Design Group, a multi-disciplinary group consisting of engineers and technical illustrators working on the then-secret G650. Soon, we added the diverse competencies and talents from the automotive design world, adopted new processes in data sharing and data review, and incorporated some new visualization software from Bunkspeed. Gulfstream engineers build data in Dassault Systèmes' Catia program; I import and build on engineering data in Autodesk® Alias and then render in (Bunkspeed's) Hyperdrive/Hypershot. By leveraging the strengths of these varied applications, my rendering process improved in speed, quality, and ease of execution with dependable results. With the addition of some Adobe favorites (Photoshop, Illustrator, etc.), the design and visualization loop of conception to completion is closed. I am still in Savannah working on various special projects from big to small, in the aircraft cabin and out. Our group and commissions continue to grow, and Bunkspeed Drive has played a significant role in generating the powerful images that keep our business moving into the future. I am never sure what challenges my next assignment will hold, but I feel secure in being able to complete any task at hand.

On any given day in the world of product development, a project may fall within a range from small latches in an open environment to full-scale, large production facilities with rigs, machinery, storage bins, and even tool boxes for senior leadership's possible approval. Our studio produces renderings for space, color, texture, feasibility, assembly and anything in-between. Working in both print and digital presentation media, our team of engineers, artists, interface designers, and surface modelers pool our resources to create highly detailed, photo-realistic imagery to aid our leaders in seeing the studio's vision. Our studio's information pipeline, software and equipment are the backbone for the thousands of images produced annually. Despite the occasional setbacks from cross-disciplinary collaboration and approval processes, having the ability to create quick, high-fidelity images is instrumental in our business.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Digital Product Design & Styling; Ray Tracing; Manufacturing; Recommended for All Press

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210H

S4805 - Streamlined Transmission of 3D Assets with glTF

Fabrice Robinet ( Engineering Manager, MontageJS )
Fabrice Robinet works for MontageJS on seamlessly integrating 3D content on the web. Fabrice is also the COLLADA Working Group Chair at Khronos and lead for glTF (graphics library Transmission Format). Prior to joining the MontageJS team, Fabrice worked as an engineer at Apple where he co-created the Scene Kit framework.

This session explores issues around delivering real-time 3D content into mobile and web applications by considering the following questions: (1) Images have JPEG and music has MP3; why not a format to deliver 3D content? (2) When designing a format for delivery, we can't ignore the underlying graphics API (GL), so wouldn't the most efficient engine formats eventually converge on the same kind of design? (3) Once content is baked and ready to be consumed by GL, how can we improve transfer rates with dedicated compression? (4) Wouldn't it be great to have a declarative way to represent GL content, so that developers can easily build a data-driven engine? (5) Why not centralize these common and so far redundant efforts to design a delivery and runtime format that is truly efficient for GL APIs? During this show-and-tell presentation, glTF (graphics library Transmission Format) will be introduced. Following an overview of the ecosystem, an introduction to glTF's design and catchy demos from different implementations will be shown. Finally, compression results leveraging Open3DGC will be shared.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Real-Time Graphics Applications; Game Development; Web Acceleration

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room LL21C

S4826 - Augmented Reality Gets Deep: The Impact of 3-D and Depth Cameras on Visualization

Rajesh Narasimha ( Sr. Computer Vision Researcher, Metaio )
Rajesh Narasimha
Rajesh Narasimha is a Senior researcher at Metaio and heads a research and development team in Dallas. Prior to this he spent 5 years as a Research Staff at Texas Instruments R&D Center. He obtained his Ph.D. in Electrical and Computer Engineering from Georgia Institute of Technology. His active areas of research are computer vision and machine learning.

Learn how the introduction and future integration of embedded 3-D cameras will affect computer vision and augmented reality experiences. This session will look at 3-D camera companies like Primesense and Softkinetic as well as the enabling technologies that take advantage of them. Learn also how this technology can be applied in automotive, manufacturing, and even consumer sectors.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Virtual & Augmented Reality; Automotive; Digital Product Design & Styling

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210E

S4833 - Speeding Innovation for the Global Multiscreen Market with Virtualization through Software-defined Video Processing

Jesse Rosenzweig ( Co-Founder, Elemental )
Jesse brings a deep background in video systems and software to Elemental, which he co-founded and where he oversees the Cloud Development, Web Development, and R&D departments. Prior to Elemental, Jesse worked at Pixelworks, Qualcomm, and Ericsson. He earned a B.S. in computer science from the University of Colorado, Boulder.

The biggest challenge facing multiscreen content providers is keeping pace. Last year, Elemental, the leading multiscreen content delivery solutions supplier and pioneer of the use of GPUs to optimize video streaming over IP networks, responded with breakneck-paced innovation. In just under 11 months, Elemental launched the most complete HEVC codec implementation, supported the first real-time 4Kp60 live transmissions, and delivered hybrid ground-to-cloud solutions to major brand customers, including the industry's only cloud-bursting workflow to feature ground and cloud clusters working together with full feature parity. The key enabler: flexible software built upon high-performance, programmable hardware. This presentation will explore how Software-Defined Video Processing (SDVP) and the ubiquity of GPUs as virtual machines on the ground and in the Cloud provide the optimal core for large-scale media deployment architectures. It will also explore challenges with current virtual machine solutions and address customer uncertainty about virtualized video processing reliability and functionality.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Remote Graphics & Cloud-Based Graphics; Recommended Press Session – Media & Entertainment

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 211B

S4118 - Smackdown GPU Optimized VDI Solutions: 2014 Edition

Ruben Spruijt ( CTO, PQR )
Ruben Spruijt
Ruben Spruijt is CTO of PQR and focuses primarily on enterprise mobility, virtualisation and cloud management. He is actively involved in determining PQR’s vision and strategy. Ruben is a Microsoft Most Valuable Professional (MVP), Citrix Technology Professional (CTP) and VMware vExpert, and is the only European with all three virtualisation awards. With his expertise, he is able to give customers sound advice, motivate his colleagues and write blogs, articles and opinion pieces on a regular basis. Through his presentations at several national and international congresses, Ruben shares his thoughts and knowledge on application and desktop delivery, and on virtualisation solutions. If you have feedback or questions for Ruben, send him an email at rsp@pqr.nl or follow him on Twitter @rspruijt.

Get up to speed with GPU-optimized VDI solutions. More and more customers see the benefits of server-hosted desktop virtualization solutions such as VDI, and there are several important players in this market space whose solutions, from a marketing perspective, have a lot in common. This presentation is based on industry analysis and customer cases, and covers the good, the bad, and the ugly of VDI and the various use-case scenarios for GPUs in desktop virtualization. It gives an overview of technology partners such as NVIDIA, MainFrame2 and Teradici; the technical differences between Citrix, Microsoft and VMware VDI solutions and their approaches to delivering graphical, resource-intensive applications to end users; and finally tips on how to choose the right solution. Join Ruben Spruijt (MVP, CTP and vExpert; CTO at PQR) as he shares opinions and well-argued positions on these topics. Want to receive industry insights? Follow Ruben on Twitter @rspruijt.

Session Level: Intermediate
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Recommended Press Session – Graphics Virtualization

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room 210F

S4155 - Portability and Performance: A Functional Language for Stencil Operations

Gerhard Zumbusch ( Professor, Friedrich-Schiller Universität Jena )
Gerhard Zumbusch
Gerhard is a professor of scientific computing in the mathematics department of the University of Jena. He works on efficient numerical algorithms, parallel computing and the solution of partial differential equations. He graduated from TU München, received his PhD from FU Berlin and his habilitation from the University of Bonn.

A new programming language designed for stencil operations in explicit finite difference and image processing applications is introduced. Learn to use a small domain-specific functional language that allows numerical schemes to be expressed in a short and portable way. Objects are immutable functions without storage type and side effects. The results are independent of the order of instructions and of decisions to redundantly re-compute partial results. The scheduling of instructions, the storage layout, the partition into GPU kernels and the memory management are all left to the compiler. Learn about the parallel patterns used by the compiler to create high-performance implementations of a numerical scheme for a specific problem size and hardware configuration. These include data layout for effective vectorization, strategies to re-compute or cache intermediate results, sliding-window and space-time tiling of the iteration space, and list-scheduling to create code blocks for off-loading; these strategies are useful in general.
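
For reference, this is the kind of tiled kernel such a compiler can emit from a one-line functional stencil definition. The hand-written CUDA sketch below (our own illustration, assuming a fixed 256-thread block) shows the shared-memory caching and halo handling that the talk's compiler automates:

    // One explicit step of the 1D heat equation:
    // out[i] = u[i] + c*(u[i-1] - 2*u[i] + u[i+1]); assumes blockDim.x == 256.
    __global__ void heat_step(float *out, const float *in, int n, float c) {
        __shared__ float s[256 + 2];       // tile plus two halo cells
        int i  = blockIdx.x * blockDim.x + threadIdx.x;
        int li = threadIdx.x + 1;          // local index, shifted past the halo
        if (i < n) s[li] = in[i];
        if (threadIdx.x == 0 && i > 0)
            s[0] = in[i - 1];                               // left halo
        if (threadIdx.x == blockDim.x - 1 && i + 1 < n)
            s[li + 1] = in[i + 1];                          // right halo
        __syncthreads();
        if (i > 0 && i < n - 1)
            out[i] = s[li] + c * (s[li - 1] - 2.f * s[li] + s[li + 1]);
    }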

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL20D

S4163 - Heterogeneous CPU+GPU Molecular Dynamics Engine in CHARMM

Antti-Pekka Hynninen ( Research Scientist, National Renewable Energy Laboratory )
Antti-Pekka Hynninen
Antti-Pekka Hynninen is a research scientist at the National Renewable Energy Laboratory where his main focus is the software development of the CHARMM Molecular Dynamics (MD) engine. Antti-Pekka recently rewrote the CHARMM MD engine core functions to be 2x faster than the previous version and he implemented a modern domain decomposition algorithm in CHARMM that enables parallel simulations on hundreds of CPU cores. Prior to joining NREL, Antti-Pekka was a post-doctoral researcher at Princeton University where he did research on Monte Carlo simulations of charged colloids. Antti-Pekka holds a PhD in physics from Utrecht University, The Netherlands.

This presentation provides a first glimpse of a heterogeneous CPU+GPU Molecular Dynamics (MD) engine in CHARMM. In the MD engine, the GPU is used for the calculation of the direct part of the non-bonded force calculation, while the CPU takes care of the rest of the work (reciprocal force calculation, bonded force calculation, integration, etc.). The MD engine is built around the CHARMM domain decomposition code enabling massively parallel MD simulations on multiple CPU+GPU nodes. The new MD engine outperforms the CPU code by a factor of 8 or more.
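
Schematically, the overlap can be arranged with an asynchronous stream, so the CPU's reciprocal and bonded work proceeds while the GPU computes. All names in this sketch are placeholders of ours, not CHARMM's real API:

    // Placeholder declarations; definitions live elsewhere in a real code.
    __global__ void nonbonded_direct(const float4 *xyz, float4 *f); // GPU part
    void compute_reciprocal_and_bonded(float4 *f_cpu);              // CPU part
    void combine_and_integrate(const float4 *f_gpu, const float4 *f_cpu);

    void md_step(cudaStream_t s, dim3 grid, dim3 block,
                 const float4 *d_xyz, float4 *d_f,
                 float4 *h_f_gpu, float4 *h_f_cpu, size_t nbytes) {
        // Launch the direct non-bonded work asynchronously on the GPU
        nonbonded_direct<<<grid, block, 0, s>>>(d_xyz, d_f);
        cudaMemcpyAsync(h_f_gpu, d_f, nbytes, cudaMemcpyDeviceToHost, s);
        // Meanwhile the CPU computes reciprocal and bonded terms
        compute_reciprocal_and_bonded(h_f_cpu);
        cudaStreamSynchronize(s);      // wait for the GPU's contribution
        combine_and_integrate(h_f_gpu, h_f_cpu);
    }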

Session Level: Beginner
Session Type: Talk
Tags: Molecular Dynamics; Supercomputing; Computational Physics; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21E

S4439 - ATCOM: A Real-Time Image Enhancement Platform for Surveillance

Eric Kelmelis ( CEO, EM Photonics, Inc. )
Eric Kelmelis is CEO and Co-Founder of EM Photonics. For over 10 years, EM Photonics has focused on computational acceleration and efficient high performance computing, primarily in the fields of scientific computing and image processing. Mr. Kelmelis has bachelor's and master's degrees in Electrical Engineering from the University of Delaware and has more than 50 technical papers, 3 patents, and a book chapter. He also currently serves as chair of the Modeling and Simulation conference at SPIE's Defense, Security, and Sensing Symposium and as a Visiting Instructor at the University of Delaware.

Learn how GPUs can be applied to real-time, real-world image processing applications. Images and videos recorded at long distances (greater than 1 mile) often suffer degradation due to the atmospheric turbulence between the subject and camera, which severely limits the quality of data that is captured by high-end imaging systems. We will discuss the practical considerations of keeping up with real-time video, including kernel performance and pipelining, and effectively using multiple GPUs in a real-time context. We have optimized specifically for the Kepler warp-shuffle instruction and will go in depth on the performance boosts offered by this new technology.
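
The warp-shuffle instruction mentioned above lets threads in a warp exchange register values directly, turning the reductions that pervade image processing into a few instructions with no shared memory or block-wide synchronization. A generic sketch, not ATCOM's code (CUDA 9 and later spell the intrinsic __shfl_down_sync(0xffffffff, v, off)):

    // Warp-wide sum using Kepler's shuffle: registers only.
    __inline__ __device__ float warp_sum(float v) {
        for (int off = 16; off > 0; off >>= 1)
            v += __shfl_down(v, off);   // add the value held 'off' lanes up
        return v;                       // lane 0 ends up with the warp total
    }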

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Video & Image Processing

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 210D

S4440 - GPU-Based Visualization for Flight Simulation

Tim Woodard ( Director of Research and Development, Diamond Visionics )
Tim Woodard
Mr. Tim Woodard is the Director of Research and Development at Diamond Visionics, with over 17 years of experience specializing in the design and development of software architectures for real-time PC-based image generation, advanced C++, and modern OpenGL techniques. Mr. Woodard has received patents for the real-time simulator database generation technology which forms the basis of Diamond Visionics' GenesisRTX™ worldwide database generation system. GenesisRTX™ provides high-fidelity generation, visualization, and manipulation of visual databases at run-time directly from source data on low-cost PC-based platforms, eliminating the need for traditionally labor-intensive off-line database production processes.

Learn about the unique challenges which arise in designing modern visualization systems for use in real-time flight simulation and how recent GPU advancements are helping to address them. Scene generation tasks that have traditionally required extensive pre-computation can now be performed on the GPU. This has numerous advantages including instant feedback, greater scene complexity and fidelity, and allows for hardware consolidation.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; Combined Simulation & Real-Time Visualization; Virtual & Augmented Reality; Collaborative & Large Resolution Displays

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 210C

S4525 - Building Random Forests on the GPU with PyCUDA

Alexander Rubinsteyn ( Ph.D. Student, NYU )
Alex Rubinsteyn is a Computer Science Ph.D. student at NYU. His interests are a high variance mixture distribution around programming language implementation and machine learning.

Random Forests have become an extremely popular machine learning algorithm for making predictions from large and complicated data sets. The currently highest performing implementations of Random Forests all run on the CPU. We implemented a Random Forest learner for the GPU (using PyCUDA and runtime code generation) which outperforms the currently preferred libraries (scikits-learn and wiseRF). The "obvious" parallelization strategy (using one thread-block per tree) results in poor performance. Instead, we developed a more nuanced collection of kernels to handle various tradeoffs between the number of samples and the number of features.

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21B

S4535 - Accelerating HPL on Heterogeneous Clusters with NVIDIA GPUs

Dhabaleswar K. (DK) Panda ( Professor, The Ohio State University )
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 300 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group (http://mvapich.cse.ohio-state.edu), is currently being used by more than 2,085 organizations worldwide (in 72 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 183,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

Learn about the design and use of a hybrid High-Performance Linpack (HPL) benchmark to measure the peak performance of heterogeneous clusters with GPU and non-GPU nodes. HPL continues to be used as the yardstick for ranking supercomputers around the world. Many clusters, of different scales, are being deployed with only a subset of nodes equipped with NVIDIA GPU accelerators. Their true peak performance is not reported due to the lack of a version of HPL that can take advantage of all the CPU and GPU resources available. We discuss a simple yet elegant approach of a fine-grain weighted MPI process distribution to balance the load between CPU and GPU nodes. We use techniques like process reordering to minimize communication overheads. We use a real-world cluster, Oakley at the Ohio Supercomputer Center, to evaluate our approach. On a heterogeneous configuration with 32 GPU and 192 non-GPU nodes, we achieve up to 50% of the combined theoretical peak and up to 80% of the combined actual peak performance of the GPU and non-GPU nodes.
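
One illustrative way to realize such a weighted distribution (a hypothetical helper of ours, not the authors' code) is to hand each node a process count proportional to its measured DGEMM throughput, so CPU-only and GPU nodes retire their share of the panels in roughly the same time:

    /* Hypothetical sizing helper for a fine-grain weighted distribution. */
    int procs_for_node(double node_gflops, double cluster_gflops, int nprocs) {
        double share = node_gflops / cluster_gflops;
        int p = (int)(share * nprocs + 0.5);   /* round to nearest */
        return p > 0 ? p : 1;                  /* every node gets at least one */
    }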

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21A

S4607 - Smoke & Mirrors: Advanced Volumetric Effects for Games

Simon Green ( Principal Software Engineer, NVIDIA )
Simon Green
Simon Green is a principal software engineer in the Developer Technology group at NVIDIA. He started graphics programming on the Sinclair ZX-81, which had 1 KB of RAM and a screen resolution of 64 by 48 pixels, and has been trying to improve the quality of real-time graphics ever since. He received a B.S. in computer science from the University of Reading, U.K. in 1994. His research interests include cellular automata, physically-based simulation and analogue synthesizers.
Nuttapong Chentanez ( Senior PhysX Researcher, NVIDIA )
Nuttapong  Chentanez
Nuttapong Chentanez obtained his PhD in computer science from the University of California, Berkeley. His main research interests include liquid and deformable body simulation for both real-time and off-line applications.

Learn how to add volumetric effects to your game engine - smoke, fire and explosions that are interactive, more realistic, and can actually render faster than traditional sprite-based techniques. Volumetrics remain one of the last big differences between real-time and offline visual effects. In this talk we will show how volumetric effects are now practical on current GPU hardware. We will describe several new simulation and rendering techniques, including new solvers, combustion models, optimized ray marching and shadows, which together can make volumetric effects a practical alternative to particle-based methods for game effects.
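
At the core of most volumetric renderers is a ray-march loop with front-to-back compositing. A generic sketch of the idea (our illustration, not the speakers' solver; sample_density is a placeholder for, e.g., a 3D texture fetch):

    __device__ float sample_density(float3 p);  // placeholder volume lookup

    // March a ray through a density volume, accumulating grey emission
    // and opacity front-to-back, with an early out once nearly opaque.
    __device__ float4 ray_march(float3 o, float3 d, float tmax, float dt) {
        float4 c = make_float4(0.f, 0.f, 0.f, 0.f);
        for (float t = 0.f; t < tmax && c.w < 0.99f; t += dt) {
            float3 p = make_float3(o.x + t*d.x, o.y + t*d.y, o.z + t*d.z);
            float rho = sample_density(p);
            float a = 1.f - __expf(-rho * dt);   // opacity of this segment
            float w = (1.f - c.w) * a;           // weight behind what's seen
            c.x += w * rho; c.y += w * rho; c.z += w * rho;
            c.w += w;
        }
        return c;
    }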

Session Level: Advanced
Session Type: Talk
Tags: Game Development; Visual Effects & Simulation; Rendering & Animation

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room 210A

S4612 - Speeding Up GraphLab Using CUDA

Vishal Vaidyanathan ( Partner, Royal Caliber )
Vishal Vaidyanathan
Vishal graduated from Stanford University in 2007 with a Ph.D. in Computational Chemistry and an M.S. in Financial Mathematics. He developed the first Folding@Home client that used GPUs to accelerate biomolecular simulations by 50 times over what was previously possible. From 2007-2009 Vishal worked at Goldman Sachs developing the first fully automated high frequency trading solution for the US Treasury desk in New York. Subsequently as co-founder of a startup in Silicon Valley, he developed low-latency trading systems and HFT strategies for futures contracts. Vishal joined Royal Caliber as a partner in 2012.

We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that further speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible: with the latest Tesla cards, over 48 GB of GPU memory can be available within a single box. Example algorithms will include pagerank, bfs, and sssp. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
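
In the GAS formulation, PageRank's gather and apply phases reduce to a per-vertex loop over in-edges. A deliberately simplified CSR sketch of the idea (one thread per vertex; real frameworks balance skewed degree distributions far more carefully):

    // Gather contributions from in-neighbors, then apply the update.
    __global__ void gather_apply(const int *row_ptr, const int *col_idx,
                                 const float *rank, const int *out_deg,
                                 float *next, int n) {
        int v = blockIdx.x * blockDim.x + threadIdx.x;
        if (v >= n) return;
        float acc = 0.0f;
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e)   // gather phase
            acc += rank[col_idx[e]] / out_deg[col_idx[e]];
        next[v] = 0.15f + 0.85f * acc;                      // apply phase
    }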

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 210B

S4643 - Looking Back into the Future

Daniel Simon ( Creator, Cosmic Motors & The Timeless Racer )
Highly-Rated Speaker
Daniel Simon has been drawing since the age of 3, and has never stopped since. After attending the internationally acclaimed car design college in Pforzheim, Germany, he enjoyed five years of explosively creative work at Volkswagen's advance studio in Spain. In 2005, Simon worked as Senior Designer for Bugatti Automobiles. In 2007, his first book, Cosmic Motors, was published, opening the doors to Hollywood and leading to his role as vehicle concept designer on the feature film 'Tron: Legacy'. Simon now works in Los Angeles, California, on a variety of concept design projects for Hollywood productions and clients. Clients today include Warner Brothers, Universal, Disney, Lotus, and even Formula 1. Recently, Simon focused his style on vehicles for Hollywood blockbuster films like 'Captain America' and 'Oblivion'. His best-selling book 'Cosmic Motors' sets new standards for automotive visualization and concept design. With his brand-new book series 'The Timeless Racer', Simon takes us on a visual journey to explore a racing world as it could have existed between 1916 and 2615, entirely rendered on NVIDIA GPUs.

Fascinated with both the history and future of vehicles, renowned designer Daniel Simon enjoys seeing the bigger picture of time when approaching his projects. Accompanied by images and clips of his highly detailed vehicle work on films such as Tron: Legacy, Captain America and Oblivion, and from his upcoming book The Timeless Racer, Simon will share his thought process and tools for creating some of the most detailed and believable concepts in transportation & entertainment.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Manufacturing; Rendering & Animation; Recommended Press Session – Auto

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room 210G

S4646 - High-Performance Video Encoding Using NVIDIA GPUs

Abhijit Patait ( Sr. Manager, System Software, NVIDIA )
Abhijit Patait
Abhijit Patait has been leading NVIDIA's GPU multimedia team for the past 4 years. His team is responsible for supporting the multimedia (audio and video) functionality in the NVIDIA GPU driver for Windows, the NVENC SDK and the GRID SDK. Prior to NVIDIA, Abhijit held several engineering and management positions working in the areas of baseband signal processing, telecom and VoIP systems design, audio/DSP processing, etc. Abhijit holds an MSEE degree from the University of Missouri-Rolla and an MBA from the Haas School of Business, University of California at Berkeley.

This session is intended to provide a broad overview of the video encoding capabilities of the current (Kepler) and future (Maxwell) generations of NVIDIA GPUs. We will cover the hardware capabilities and software APIs used for video encoding, along with recent improvements in encoder performance and quality. We will also give a quick overview of how NVIDIA video encoding can be used in various applications such as transcoding, low-latency applications, virtualization, streaming, etc.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Cloud Visualization; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 211B

S4664 - In-Place Array Transposition and Fast Array of Structure Accesses

Bryan Catanzaro ( Senior Research Scientist, NVIDIA )
Bryan Catanzaro
Ph.D., UC Berkeley. Bryan works in the Programming Systems and Applications Research group at NVIDIA where he focuses on machine learning algorithms for GPUs.

We'll present a new algorithm for in-place array transposition. The algorithm is useful for in-place transposes of large matrices, as well as in-place conversions between Arrays of Structures and Structures of Arrays. The simple structure of this algorithm enables full memory bandwidth accesses to Arrays of Structures. We'll discuss the algorithm, as well as several implementations on GPUs and CPUs.
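
The Array-of-Structures problem this targets: a thread that loads one 16-byte struct directly produces strided, partially wasted memory transactions. One common staging sketch (our illustration of the general technique, not the presented in-place algorithm) routes the tile through shared memory so the global loads stay coalesced:

    struct Atom { float x, y, z, w; };  // a 16-byte AoS element

    // Sketch: assumes blockDim.x == 256 and n a multiple of 256.
    __global__ void read_aos(const Atom *in, float4 *out, int n) {
        __shared__ float4 tile[256];
        float *ftile = (float *)tile;
        const float *flat = (const float *)(in + blockIdx.x * 256);
        // Load the tile as flat floats: consecutive threads touch
        // consecutive words, so global accesses stay coalesced.
        for (int k = threadIdx.x; k < 256 * 4; k += blockDim.x)
            ftile[k] = flat[k];
        __syncthreads();
        // The per-thread strided gather now hits fast shared memory.
        int i = blockIdx.x * 256 + threadIdx.x;
        if (i < n) out[i] = tile[threadIdx.x];
    }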

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21D

S4708 - Comparing OpenMP 4.0 Device Constructs to OpenACC 2.0

James Beyer ( Compiler Engineer, Cray Inc )
James Beyer
James is a senior member of the compiler optimization team at Cray Inc. He received his Ph.D. from the University of Minnesota. He has been the Cray representative to the OpenMP ARB and language committee since 2006. James is also the author of the proposal which spawned the OpenMP accelerator subcommittee. He is co-chair of that subcommittee, was involved in the design of the original OpenACC specification, and is currently an active member of the OpenACC technical committee.

The talk will briefly introduce two accelerator programming directive sets with a common heritage: OpenACC 2.0 and OpenMP 4.0. After introducing the two directive sets, a side-by-side comparison of available features, along with code examples, will be presented to help developers understand their options as they begin programming with these two models, both of which are becoming available in production compilers.
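
For a taste of the comparison, here is the same SAXPY loop offloaded with each directive set (minimal forms of our own; both standards offer many more clauses than shown):

    /* OpenACC 2.0 */
    void saxpy_acc(int n, float a, const float *x, float *y) {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    /* OpenMP 4.0 device constructs */
    void saxpy_omp(int n, float a, const float *x, float *y) {
        #pragma omp target teams distribute parallel for \
                    map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }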

Session Level: All
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing; Performance Optimization

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room LL21F

S4716 - Interactive 3D Data Visualization of 700 GB

Tom-Michael Thamm ( Director, Software Product Management Advanced Rendering, NVIDIA )
Tom-Michael Thamm
Mr. Thamm is Director of Software Product Management at the NVIDIA Advanced Rendering Center (ARC) in Berlin, Germany, and is responsible for all software products, such as NVIDIA mental ray, NVIDIA Iray, and NVIDIA IndeX. With his team, he manages and coordinates customer support as well as general product definition and positioning. Mr. Thamm has worked for NVIDIA ARC, and before that for mental images, for over 20 years. He has led several key software projects and products, such as the new NVIDIA IndeX product for geo-spatial visualization of huge datasets. He studied Mathematics.
Jörg Mensmann ( Senior Graphics Software Engineer, NVIDIA IndeX, NVIDIA )
Joerg Mensmann is a Senior Graphics Software Engineer at NVIDIA, with a focus on large-scale and distributed volume rendering. Prior to joining NVIDIA, he was a member of the research staff in the Visualization and Computer Graphics group at the University of Münster, Germany, where he helped build a flexible visualization framework for medical volume data. He holds a Diplom degree and a PhD in Computer Science from the University of Münster.

A technical presentation of the latest version of NVIDIA IndeX(TM), with an emphasis on large volumetric data visualization. IndeX is a scalable GPU-based software framework that renders high-quality images at interactive frame rates.

Session Level: All
Session Type: Talk
Tags: Rendering & Animation; Big Data Analytics & Data Algorithms; Clusters & GPU Management; Combined Simulation & Real-Time Visualization

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL20B

S4723 - Interactive Global Illumination with NVIDIA Visual Computing Cluster

Phillip Miller ( Director, Advanced Rendering, NVIDIA )
Highly-Rated Speaker
Phillip Miller
Mr. Miller directs product management for NVIDIA Advanced Rendering offerings, ranging from the Iray and mental ray shipping within leading products in Design and Entertainment to the OptiX ray tracing framework used extensively within private and commercial applications. He has been working on leading software products for 20 years, including the 3D animation products at Autodesk and the Web Design products at Adobe. He holds a Masters of Architecture from the University of Illinois and is a registered architect.
Stefan Radig ( Senior Manager, NVIDIA )
Stefan Radig is a Senior Manager at NVIDIA ARC. He co-leads engineering for the NVIDIA Iray rendering product. He is also responsible for NVIDIA ARC's distributed computing environment, a platform for highly scalable cluster based computing software which is especially suited for GPU based applications. Stefan Radig has a 15 year background in distributed systems and network technologies. Before joining NVIDIA he worked on a highly scalable internet over satellite platform, a video streaming platform, and multimedia conferencing systems.
Ankit Patel ( Sr. Product Manager, NVIDIA )
Ankit Patel
As a Sr. Product Manager, Ankit Patel works to deliver products that allow everyone to harness the power of GPUs to help realize their dreams, whether through creative storytelling or building amazing machines.

It is now possible to have physically-based global illumination at interactive speeds with minimal noise. Come learn about the Iray Nitro software and NVIDIA's new appliance designed specifically for high-bandwidth cluster rendering, the perfect solution for companies needing digital design reviews with uncompromised realism. Hundreds of GPUs will be used to show near-linear scaling of interactive GI driven from numerous professional software client solutions. See also how cluster management is made easy, allowing users to access all or part of the cluster with minimal networking knowledge.

Session Level: Beginner
Session Type: Talk
Tags: Rendering & Animation; Automotive; Ray Tracing; Manufacturing; Recommended Press Session – Auto

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room 220C

S4729 - An Overview of FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs

Marco Aldinucci ( Researcher, Computer Science Department, University of Torino )
Marco Aldinucci has been an assistant professor at the Computer Science Department of the University of Torino since 2008. Previously, he was a researcher at the University of Pisa and the Italian National Research Agency. He is the author of over a hundred papers in international journals and conference proceedings (Google Scholar h-index 21). He has participated in over 20 national and international research projects concerning parallel and autonomic computing. He is the recipient of the HPC Advisory Council University Award 2011 and an NVIDIA Academic Research award in 2013. He has been leading the "Low-Level Virtualization and Platform-Specific Deployment" workpackage within the EU-STREP FP7 ParaPhrase (Parallel Patterns for Adaptive Heterogeneous Multicore Systems) project and the GPGPU workpackage within the IMPACT project (Innovative Methods for Particle Colliders at the Terascale), and he is the contact person for the University of Torino in the European Network of Excellence on High Performance and Embedded Architecture and Compilation. In the last year (March 2012 – March 2013) he delivered 5 invited talks at international workshops. He co-designed, together with Massimo Torquati, the FastFlow programming framework and several other programming frameworks and libraries for parallel computing. His research is focused on parallel and distributed computing.

Get an overview of how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. For a more detailed and technical review of FastFlow's parallel patterns, as well as a use case in which we show the design and effectiveness of a novel universal image filtering template based on the variational approach, please join us for S4585, FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 211A

S4808 - Real-Time Facial Motion Capture and Animation on Mobile

Emiliano Gambaretto ( Director of Research, Mixamo )
Emiliano Gambaretto
Emiliano Gambaretto obtained a PhD in Bioengineering from Politecnico di Milano (Italy) in 2011 for his research on markerless motion capture. He also received a Master's Degree in Biomedical Engineering from Politecnico di Milano and a Diplome d'Ingenieur from Ecole Centrale de Lyon (France) in 2006, and worked at the Stanford Biomotion Lab, where he was a co-inventor on a granted markerless motion capture patent. Emiliano was part of Mixamo's founding team in 2008. He's currently Director of Research at Mixamo. His job consists of designing and developing the core technologies behind Mixamo's services. That includes animation, auto-rigging, real-time face animation, and integration with 3rd-party software and industry standards.
Stefano Corazza ( CEO, Mixamo )
Stefano Corazza is the founder and CEO of Mixamo. Formerly a researcher at Stanford University and winner of the Innovators' Challenge, Stefano holds an M.Sc. in Design and two PhDs in Engineering. Previous presentations include lectures on Character Animation at FMX and Unite, and invited keynotes at SPIE and IFA.

3D animation is the art form of the present and the future, with hundreds of millions of people drawn to its emotional power in movie theaters and games every year. Mixamo recently developed a facial capture and animation technology to enable anybody to create compelling animated content that is immediately reflected on a character's face. The technology was originally developed for 3D professionals, but with the recent introduction of new-generation mobile GPU hardware supporting OpenCL APIs, such as the Tegra K1, it is now possible to port the technology to mobile devices. In the course of this presentation we will introduce numerical approaches to facial motion capture and animation that are based on a mixture of global and local models of human facial expressions and shape. The presenter will also go into the details of implementing the real-time technology on a Tegra K1 device.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Computer Vision; Game Development; Machine Learning & AI

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 210E

S4824 - A Really Realtime Raytracer

Matt Swoboda
Matt  Swoboda
Matt has a background in graphics programming in both the demoscene and the games industry. He spent 8 years in Sony Computer Entertainment's R&D group researching innovative graphics techniques on games consoles, and worked on every Sony console from Playstation 2 onwards. After moving into the live visuals world, he now works at D3 Technologies and his own production company Run Things. As a part of the legendary demogroup Fairlight he's been responsible for a number of award-winning demoscene productions, and has won more Scene.org awards than anyone else.

Realtime renderers find many important visual effects - such as reflections, refractions and global illumination - complicated, slow or impossible because they rely on ray tracing. These effects are necessary to approach the quality of an offline production render. While offline raytracers move to GPU and begin to approach "interactive" performance in the right situation, they are still not truly realtime - coping with ever-changing, moving scenes, lights and cameras, and producing a whole frame at final quality 30 times a second. This talk discusses a GPU raytracer that is focused on realtime applications throughout - the data structures, optimization for GPU compute, and the tricks and shortcuts used to achieve real time performance and some great visual effects.

Session Level: Advanced
Session Type: Talk
Tags: Ray Tracing; Real-Time Graphics Applications; Performance Optimization; NVScene

Day: Tuesday, 03/25
Time: 17:00 - 17:50
Location: Room 230C

S4902 - WebGL Magic for Mortals (Not Everyone is a Wizard)

Victor Sand ( Software Engineer, Goo Technologies )
Victor studied computer science at Linköping University in Sweden and at Stanford, California. The topic of his Master's thesis was space weather data visualization at NASA, and he's been taught computer graphics the "C and OpenGL" way. Joining the fast-moving forces of Goo Technologies, Victor is currently working in a team determined to revolutionize both computer graphics and the web itself by seamlessly combining WebGL with a decade of experience in the field.

Since the introduction of the GPU-powered web a few years back, graphics experts and hackers have been working hard to utilize the power of WebGL. The API is approaching its second version (WebGL 2.0) and the time has come to bring it to the masses. In this presentation, Goo Technologies will talk about (and show!) how they use WebGL, HTML5 and JavaScript as not only prerequisites but as the complete foundation for bringing graphics technology to a broader audience. The session will include descriptions and demos of the modern 3D API Goo Engine and the cutting-edge online 3D editor Goo Create.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Real-Time Graphics Applications; Rendering & Animation; Web Acceleration

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21C

S4188 - How to Avoid Global Synchronization by Domino Scheme

Lung-Sheng Chien ( Software Engineer, NVIDIA )
Lung-Sheng  Chien
Lung-Sheng Chien is a software engineer at NVIDIA, working on the CUBLAS and CUSPARSE libraries. Prior to NVIDIA, he was a Ph.D. student in the Department of Mathematics at National Tsing Hua University. He received his B.S. and M.S. from the Department of Computer Science at National Tsing Hua University in 2003 and 2005, respectively.

Learn how to trace a data dependence graph without global synchronization. Such a dependence graph can be built from a sparse triangular solve, incomplete Cholesky factorization or incomplete LU factorization. We will address several issues, including: 1) how to reproduce results without atomic operations; 2) how to use a single kernel to track the data dependence graph; 3) how to keep the working space small, because the GPU has limited device memory; and 4) the penalty of warp size on this latency-sensitive application.
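
One widely used flag-based pattern for this kind of dependence tracking is sketched below (a generic illustration with a hypothetical data layout, not necessarily the domino scheme itself). It assumes rows are launched in a valid topological order and that spinning threads cannot starve their own predecessors, exactly the kind of pitfall the session addresses:

    __device__ float solve_row(int row, const float *x); // hypothetical per-row work

    // Dependencies of row r live in dep_idx[dep_ptr[r] .. dep_ptr[r+1]).
    __global__ void dep_solve(const int *dep_ptr, const int *dep_idx,
                              volatile int *ready, float *x, int nrows) {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= nrows) return;
        for (int k = dep_ptr[row]; k < dep_ptr[row + 1]; ++k)
            while (ready[dep_idx[k]] == 0) ;   // spin: no global barrier needed
        x[row] = solve_row(row, x);
        __threadfence();                       // publish x[row] first...
        ready[row] = 1;                        // ...then raise our flag
    }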

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL21D

S4197 - eyeSight on Android-based Set-Top-Boxes

Gideon Shmuel ( CEO, eyeSight Technologies Ltd. )
Gideon joined eyeSight with over 20 years of knowledge and experience in the Telecom and Enterprise Software markets, bringing an impressive track record in sales and enterprise development. Gideon has been involved in delivering complex solutions in growing technology organizations.

More and more, we are seeing gesture recognition interfaces being integrated into the digital devices around us. TVs and PCs with pre-installed gesture controls are becoming a standard feature in new devices launched in the market. As a provider of gesture solutions, we will discuss the benefits of running gesture engines on the GPU, as well as how Tegra-based devices, including set-top boxes, can benefit from such touch-free interfaces.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Smart TV, Mobile & Second Screen Applications; Video & Image Processing; Computer Vision

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 210E

S4271 - Halloc: A High-Throughput Dynamic Memory Allocator for GPGPU Architectures

Andrew Adinetz ( Researcher, Julich Supercomputing Centre, Forschungszentrum Jülich )
Andrew Adinetz
Andrew V. Adinetz got his M.S. degree in Computer Science in 2006 from Lomonosov Moscow State University, and his Ph.D. in Computer Science in 2009, also from MSU. He's currently working as a researcher at Forschungszentrum Jülich (NVIDIA Application Lab, Jülich Supercomputing Centre). His current research interests include GPU programming, algorithm design for many-core architectures, high-performance computing and programming languages.

Dynamic memory management is something that is taken for granted in almost all modern CPU runtime environments. However, on GPUs it became available only recently, and has so far seen very limited adoption due to low speed and lack of scalability. We present Halloc, a malloc/free-style dynamic memory allocator for GPUs. It is built around the idea of using a hash function-based procedure to search chunk bit arrays for free blocks. We describe the algorithms used to manage slabs, the large regions of memory from which smaller blocks are allocated. An evaluation of the allocator shows that it is scalable to tens of thousands of threads and hundreds of MiB of workset, can perform up to 1.7 billion allocations/s, demonstrates stable performance in a variety of scenarios and performs 2-100x better than state-of-the-art malloc/free-style GPU memory allocators.
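The following toy device function illustrates the hash-based search idea in simplified form (our assumption of the general technique, not the actual Halloc code): each allocating thread starts probing a slab's free-bit array at a pseudo-random word chosen by a hash, so concurrent threads spread out and rarely contend on the same word.

    __device__ int alloc_block(unsigned int* bits, int nwords, unsigned int seed)
    {
        unsigned int start = (seed * 2654435761u) % nwords;   // hash picks the start word
        for (int probe = 0; probe < nwords; ++probe) {
            int w = (start + probe) % nwords;
            unsigned int old = bits[w];
            while (~old != 0u) {                              // some bit in this word is free
                int bit = __ffs(~old) - 1;                    // first zero bit
                unsigned int prev = atomicOr(&bits[w], 1u << bit);
                if ((prev & (1u << bit)) == 0u)
                    return w * 32 + bit;                      // we claimed this block
                old = prev;                                   // lost the race; retry the word
            }
        }
        return -1;                                            // slab exhausted
    }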

Session Level: Advanced
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL20D

S4338 - The Energy Case for Graph Processing on Hybrid CPU and GPU Systems

Elizeu Santos-Neto ( Ph.D. Student, Electrical & Computer Engineering, University of British Columbia )
Elizeu Santos-Neto
Elizeu's research focuses on the characterization and design of online peer production systems such as peer-to-peer networks and collaborative tagging communities. Prior to joining UBC, he received a B.Sc. in Computer Science from the Universidade Federal de Alagoas and a M.Sc. in Computer Science from the Universidade Federal de Campina Grande, in Brazil.

This work reports on a power and performance analysis of large-scale graph processing on hybrid (i.e., CPU and GPU), single-node systems. On these systems, graph processing can be accelerated by mapping the graph layout so that the algorithmic tasks exercise the processing units where they perform best; however, GPUs have a much higher TDP, so their impact on overall energy consumption is unclear. An evaluation on large real-world graphs, as well as on synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that efficiency in terms of both performance and power can be achieved.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Energy Exploration

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 210B

S4341 - Harnessing the Power of Titan with the Uintah Computational Framework

Alan Humphrey ( Software Developer and Ph.D. Student, Scientific Computing and Imaging Institute, University of Utah )
Alan Humphrey
Alan Humphrey is a software developer at the Scientific Computing and Imaging Institute and also a Ph.D. student at the University of Utah where he works with Dr. Martin Berzins on improving the performance and scalability of the Uintah Computational Framework. Alan has been primarily involved in extending Uintah to run on hybrid CPU/GPU systems with the development of Uintah's prototype CPU-GPU task scheduler and most recently, Uintah's Unified Multi-threaded heterogeneous task scheduler and runtime system. Much of this work has been in preparation for using Uintah to solve large-scale energy related problems for the DOE NETL project using the entire Titan system with GPUs at Oak Ridge National Laboratory. He has considerable experience with heterogeneous systems and has done work with Uintah on TitanDev and NSF Keeneland. Much of Alan's past research has been focused on formal verification of concurrent systems, specifically the Message Passing Interface (MPI) and dynamic verification tools like In-situ Partial Order (University of Utah) - and its integration within the Eclipse Parallel Tools Platform (PTP). Alan has also been involved with the Eclipse PTP project since 2009.

This presentation will discuss how directed acyclic graph (DAG) approaches provide a powerful abstraction for solving challenging engineering problems and how using this abstraction and DAG approach, computational frameworks such as Uintah can be extended with relative ease to efficiently leverage GPUs, even at scale. Attendees will learn how frameworks like Uintah are able to shield the application developer from the complexities of the deep memory hierarchies and multiple levels of parallelism found in heterogeneous supercomputers such as Titan.

Session Level: Beginner
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL21A

S4434 - Real-Time 4K JPEG2000 for Broadcast and Digital Cinema

Jiri Matela ( CEO, Comprimato )
Jiri Matela
Jiri Matela received B.Sc. and M.Sc. degrees in Computer Science from Masaryk University in Brno, Czech Republic, in 2007 and 2009. He is currently working toward a Ph.D. at Masaryk University, focusing on image compression, reformulations of image processing algorithms for massively parallel GPU architectures, and high-speed networks. He is a member of the team that recently received the ACM Multimedia Best Open-Source Software Award for UltraGrid, a real-time image compression and video transmission application, and that demonstrated one of the first real-time compressed transmissions of video in 8K Ultra High-Definition resolution. Jiri is the founder of Comprimato Systems, a company focusing on GPU-accelerated image compression and video codecs.

JPEG2000 is the compression standard for digital cinema post-production, and it is an emerging standard for broadcast contribution and archiving. Until now, the JPEG2000 format has been considered too computationally heavy for applications other than standardized ones such as cinema distribution. We present a successful GPU design and implementation of a JPEG2000 codec that allows real-time film compression and decompression in digital cinema and broadcast applications. Fast GPU processing will help JPEG2000 spread further as an archiving and mezzanine format.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing; Medical Imaging & Visualization

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 211B

S4458 - Speculative Atomics: A Case-Study of the GPU Optimization of the Material Point Method for Graphics

Gergely Klar ( Ph.D. Student, UCLA, Graphics Lab )
Gergely Klar
Gergely Klar is a Ph.D. student at UCLA, working on various simulation projects for physically based animation. Gergely started his Ph.D. at UCLA in 2011 as a Fulbright Science and Technology Fellow. He received his Master's degree in Computer Science from Eötvös Loránd University, Hungary, and worked with the Computer Graphics group of the Budapest University of Technology and Economics. In his free time he can be found competing at sailing races in Santa Monica Bay.

The Material Point Method's potential for computer graphics has been demonstrated at SIGGRAPH 2013, where it has been used with success in the simulation of snow. This opens up a range of exciting new opportunities for the simulation of elastic, viscous and fracturing materials. In this talk, we present an efficient, massively parallel implementation of MPM on the GPU. We show a radical new approach to remove the parallelization bottleneck imposed by the method's particles-to-grid rasterization step. The rasterization is implemented with the use of atomic instructions, while our specialized arrangement of the particles minimizes the occasions when actual synchronization is required. We demonstrate the efficiency of this speculative atomics approach by comparing it to more traditional implementations, where the rasterization step is transformed to a gather operation. This combination of atomic instructions and special arrangement is not limited to MPM, but can be used in any simulation with a particles-to-grid rasterization step.
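As background for the talk's approach, here is a bare-bones sketch of the atomics-based particles-to-grid rasterization step that speculative atomics accelerates; the kernel, its single-node splat, and all names are illustrative assumptions rather than the authors' code (real MPM splats each particle onto a neighborhood of nodes with B-spline weights).

    __global__ void p2g_scatter(const float3* pos, const float* mass,
                                float* gridMass, int np, float3 origin,
                                float h, int3 dims)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= np) return;

        // Nearest grid node for this particle.
        int i = (int)((pos[p].x - origin.x) / h);
        int j = (int)((pos[p].y - origin.y) / h);
        int k = (int)((pos[p].z - origin.z) / h);
        int node = (k * dims.y + j) * dims.x + i;

        // The atomic resolves collisions when two particles hit the same node;
        // arranging particles so neighbors rarely share nodes keeps this cheap.
        atomicAdd(&gridMass[node], mass[p]);
    }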

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Visual Effects & Simulation; Rendering & Animation; Performance Optimization

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 211A

S4488 - GooFit: Massively parallel likelihood fitting using GPUs

Rolf Andreassen ( Postdoctoral Fellow, University of Cincinnati )
Rolf Andreassen
Dr. Andreassen's research interests lie mainly in the creation of tools for physics analysis. He began developing software for HEP purposes as a summer student at CERN, writing a C++ API for the ISAJET event-generation package. As a Master's and Ph.D. student he wrote custom modules for analysing BABAR data, and later a fast simulation of the DIRC component of the SuperB experiment. Dr. Andreassen is the designer and lead developer of the GooFit fitting package, and has given courses at the University of Cincinnati and at CERN in CUDA programming and use of GooFit. He is involved with the QuarkNet outreach program, bringing high-school students and teachers to the university to gain experience with HEP theory and research.

We present the GooFit maximum likelihood fit framework, which has been developed to run effectively on general purpose graphics processing units (GPUs) to enable next-generation experimental high energy physics (HEP) research. Most analyses of data from HEP experiments use maximum likelihood fits. Some of today's analyses use fits which require more than 24 hours on traditional multi-core systems, and the next generation of experiments will require computing power two orders of magnitude greater for analyses which are sensitive to New Physics. Our GooFit framework, which has been demonstrated to run on NVIDIA GPU devices ranging from high-end Teslas to laptop GeForce GTs, uses CUDA and the Thrust library to massively parallelize the per-event probability calculation. For realistic physics fits we achieve speedups of several hundred relative to executing the same algorithm on a single CPU.
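The general pattern - not GooFit's actual internals, which the talk covers - can be sketched in a few lines of Thrust: each event's probability is evaluated independently on the GPU, and the log terms are summed in one parallel reduction. The Gaussian model below is our illustrative assumption.

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <cmath>

    struct neg_log_gauss {
        double mu, sigma;
        neg_log_gauss(double m, double s) : mu(m), sigma(s) {}
        __host__ __device__ double operator()(double x) const {
            double z = (x - mu) / sigma;
            return 0.5 * z * z + log(sigma * 2.5066282746310002);  // log(sigma*sqrt(2*pi))
        }
    };

    // Negative log-likelihood over all events: one transform, one reduction.
    double nll(const thrust::device_vector<double>& events, double mu, double sigma)
    {
        return thrust::transform_reduce(events.begin(), events.end(),
                                        neg_log_gauss(mu, sigma),
                                        0.0, thrust::plus<double>());
    }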

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 212A

S4636 - Deep Neural Networks for Visual Pattern Recognition

Dan Ciresan ( Researcher, IDSIA )
Dr. Dan Ciresan received his PhD from "Politehnica" University of Timisoara, Romania. He first worked as a postdoc before becoming a senior researcher at IDSIA, Switzerland. Dr. Ciresan is one of the pioneers of using CUDA for Deep Neural Networks (DNNs). His methods have won five international competitions on topics such as classifying traffic signs, recognizing handwritten Chinese characters, segmenting neuronal membranes in electron microscopy images, and detecting mitosis in breast cancer histology images. Dr. Ciresan has published his results in top-ranked conference proceedings and journals. His DNNs have significantly improved the state of the art on several image classification tasks.

GPU-optimized Deep Neural Networks (DNNs) excel on image classification, detection and segmentation tasks. They are the current state of the art method in many visual pattern recognition problems by a significant margin. DNNs are already better than humans at recognizing handwritten digits and traffic signs. The complex handwritten Chinese characters are recognized with almost human performance. DNNs are successfully used for automotive problems like traffic signs and pedestrian detection; they are fast and extremely accurate. DNNs help the field of connectomics by making it possible to segment and reconstruct the neuronal connections in large sections of brain tissue for the first time. This will bring a new understanding of how biological brains work. Detecting mitotic cells in breast cancer histology images can be done quickly and efficiently with DNNs. Segmenting blood vessels from retinal images with DNNs helps diagnosticians to detect glaucoma.

Session Level: Beginner
Session Type: Talk
Tags: Computer Vision; Machine Learning & AI; Medical Imaging & Visualization; Automotive

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL21B

S4706 - A GPU-Based Computational Framework for Large-Scale Critical Infrastructure Mapping Using Satellite Imagery

Dilip Patlolla ( R & D Staff, Oak Ridge National Laboratory )
Dilip Patlolla
Dilip Patlolla is a research staff member in the Geographic Information Science and Technology (GIST) Group at the Oak Ridge National Laboratory. He leads the development of large-scale critical infrastructure mapping using advanced computing methods. His primary responsibilities include opening up new domains of application for HPC, FPGAs and GPUs by researching and developing computing algorithms, and ensuring the best possible performance on current and next-generation architectures. Dilip received his MS from the University of Tennessee, Knoxville and is the recipient of ORNL's 2013 Significant Event Award.

Assessing and monitoring critical infrastructure from space is a cost-effective and efficient solution. Satellite images are now available at spatial resolutions and acquisition rates that make image-driven, large-scale mapping and monitoring of critical infrastructure a viable possibility. However, processing huge volumes of high-spatial-resolution imagery is not a trivial task. Solutions often require advanced algorithms capable of extracting, representing, modeling, and interpreting scene features that characterize spatial, structural, and semantic attributes. Furthermore, these solutions must scale to big image datasets: at half-meter pixel resolution the earth's surface comprises roughly 600 trillion pixels, and the requirement to process at this scale at repeated intervals demands highly scalable solutions. In this research, we present a GPU-based computational framework designed for identifying critical infrastructure from large-scale satellite or aerial imagery to assess vulnerable populations.

Session Level: All
Session Type: Talk
Tags: Defense; Video & Image Processing; Supercomputing; Big Data Analytics & Data Algorithms; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 210D

S4807 - HTML5 Outside the Browser Survival Guide: The Challenges of Hybrid Apps and Games

Iker Jamardo ( VP of Engineering, Ludei )
Iker Jamardo is a passionate software engineer whose main interests include real-time graphics, software architecture and product design. Since his first steps in professional software development back in 2000 he has been tied to the video game industry, where he spent 2 years developing a AAA PC game title. He then moved into research and lecturing for over 10 years at the University of Deusto in Bilbao, Spain, where he developed multiple projects and publications in multimedia, virtual and augmented reality, and accessibility technologies, working closely with companies in these areas. Wanting to return to the video game industry, he joined Ludei to help with the design and development of the core native multi-platform technology of the Cocoon engine, a Service Oriented Architecture that includes CocoonJS, a customized JavaScript virtual machine specifically tuned for native HTML5 game execution and monetization. As a company executive, he has also been involved in the definition and management of the company's different products.

Web-based apps and games are growing both in number and complexity, yet running outside the browser on a mobile device is still a challenging path full of bumps and hoops to overcome. From efficient memory management to access to native features, hybrid apps provide a great way to solve these problems and combine the advantages of both worlds: web and native. Far from the media fight over which is best, a combination of both technologies provides a much richer development experience. In this talk attendees will learn how to deal with system webview fragmentation, the poor bandwidth of native bridges, and the lack of support for certain important technologies like WebGL.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Game Development; Debugging Tools & Techniques; Web Acceleration

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL21C

S4875 - Parallel Computation and Rendering with NVIDIA IndeX

Phillip Jong ( General Manager Exploration and Geoscience Software, Shell )
Phillip Jong
Phillip has 15 years of experience in managing proprietary Subsurface Geosciences software development. He has an in-depth knowledge of the entire Hydrocarbon Development lifecycle from Exploration to Appraisal to Production with specific domain expertise in Seismic Processing, Seismic Interpretation, Quantitative Interpretation, Basin Modelling, and Geological Modelling. Phillip's interests include leveraging technical and competitive IT to impact the way we explore and produce energy resources.

Exploration of potential oil fields in the oil and gas industry has become more complex over the last decade. In collaboration with NVIDIA, Shell has developed an in-house interpretation system for large seismic data volumes. NVIDIA IndeX, a scalable parallel rendering and computation framework, has proven to be a critical component of Shell's future software solutions.

Session Level: All
Session Type: Talk
Tags: Energy Exploration; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room LL20B

S4243 - A GPU Sparse Direct Solver for AX=B

Jonathan Hogg ( Researcher, Science and Technology Facilities Council (STFC) )
Jonathan Hogg
Following completion of his Ph.D. at Edinburgh in 2009, Jonathan joined STFC, where he works with the Numerical Analysis Group of the Scientific Computing Department. He is the author of several high performance software packages in sparse and dense linear algebra, with a focus on desktop HPC.

The solution of Ax=b for sparse A is one of the core computational kernels ("dwarves") used in scientific computing. While there are many GPU iterative-methods libraries available, these can only tackle a limited range of problems due to preconditioning requirements. On the CPU, black-box direct solvers are often the first port of call for more challenging problems, yet existing GPU libraries offer them little support. We present a new direct solver library capable of performing both the factorization and the solve for symmetric problems entirely on the GPU. The talk will cover our solutions to a number of the challenges involved in making this a reality, and present results across a number of application areas including FEM and optimization.

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL20D

S4251 - Reality Check: GRID-Accelerated High-End Graphics Performance in Virtual Desktops

Bernhard Tritsch ( CTO, Bluecue Consulting )
Bernhard Tritsch
Benny Tritsch is a virtualization technology expert, IT system architect, technical communicator and market analyst. He serves as the Chief Technology Officer at Bluecue Consulting in Germany and is a frequent speaker at leading international industry events. Benny has been named a Microsoft Most Valuable Professional (MVP) for RDS every year since 2004 and has been a Citrix Technology Professional (CTP) since 2006.
Shawn Bass ( Senior Consultant, Syn-Net )
Shawn Bass
Shawn Bass, an independent consultant based in the Chicago area, is a “Citrix Technology Professional” and a “Microsoft Most Valuable Professional.” He’s been working with Citrix technologies since the WinView product days, and today, most of his clients are Fortune 500 companies (primarily in the financial services and insurance markets).

How good are today's virtual desktop remoting protocols when combined with NVIDIA GRID cards? Virtualization experts Benny Tritsch and Shawn Bass developed a unique, vendor-independent test methodology allowing them to visually compare Microsoft RemoteFX, VMware/Teradici PCoIP and Citrix HDX head-to-head under different network conditions. Join them in their session where they walk you through the results of their NVIDIA-accelerated tests. See the difference between shared and dedicated GPUs in virtual desktops running on different popular virtualization platforms.

Session Level: Intermediate
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics; Recommended Press Session – Graphics Virtualization

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room 210F

S4259 - Optimization of a CUDA-based Monte Carlo Code for Radiation Therapy

Nick Henderson ( Postdoctoral Scholar, Stanford University, Institute for Computational and Mathematical Engineering )
Nick Henderson is a Postdoctoral scholar with the CUDA Center of Excellence in the Institute for Computational and Mathematical Engineering at Stanford University.

Learn about optimization efforts in G4CU, a CUDA Monte Carlo code for radiation therapy. G4CU is based on the core algorithm and physics processes in Geant4, a toolkit for simulating particles traveling through and interacting with matter. The techniques covered will include the use of texture references for look-up tables, device configuration for different simulation components, and scheduling of work for different particle types.
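As a flavor of the first technique, here is a minimal sketch of a look-up table read through a texture reference, the classic pre-texture-object API the talk refers to; the cross-section table and the integer indexing scheme are hypothetical stand-ins for the real physics tables.

    // File-scope texture reference bound to a table in device memory.
    texture<float, 1, cudaReadModeElementType> lutTex;

    __global__ void lookup(const int* binIndex, float* xsec, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            // Reads go through the texture cache, which suits the scattered,
            // read-only access pattern of physics look-up tables.
            xsec[i] = tex1Dfetch(lutTex, binIndex[i]);
    }

    // Host side, before launching:
    //     cudaBindTexture(0, lutTex, d_table, tableBytes);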

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Medical Imaging & Visualization; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room 212A

S4422 - A New GPU-Based Level Set Method for Medical Image Segmentation

Wenzhe Xue ( Research Assistant, Mayo Clinic Arizona; Arizona State University )
Wenzhe Xue
Wenzhe Xue is working towards his Ph.D. in Biomedical Informatics at ASU, and is a research assistant in the Medical Imaging Informatics (MII) lab at Mayo Clinic Arizona, under the supervision of Dr. Ross Mitchell. Wenzhe works on developing novel GPU-based level set methods for medical image segmentation and validating them on both synthetic and real clinical image data. He aims to provide an accurate, precise, and fast tool for quantitative imaging in cancer treatment research and studies.

We have developed a new approach to measure lesion volumes in medical images using GPU programming. The approach is based on the level set method and minimizes the number of voxels included in the computational domain with unique efficiency. The underlying cost function and specifics of the level sets approach are not limited by the implementation, and multiple methods for determining the boundary progression speed are possible. We have experimented with intensity-based approaches as well as higher-order feature spaces using multiple image contrasts. We have tested our approach on synthetic images and in a clinical setting. GPU programming also enables real-time 3D rendering and visualization of the propagating level set surface volume. This GPU-enabled combination of speed and interactivity makes our approach an excellent candidate for use in oncology where change in tumor volume guides clinical decision making and assessment of treatment effectiveness.
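For readers new to the method, the toy kernel below shows the shape of one explicit level-set update step on a 2D slice: the zero level set of phi is the lesion boundary, and it moves along its normal at an image-derived speed. This is a generic textbook step under our own simplifying assumptions, not the Mayo Clinic implementation, which restricts computation to a minimal set of voxels.

    __global__ void level_set_step(const float* phi, float* phiNew,
                                   const float* speed, int w, int h, float dt)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x <= 0 || y <= 0 || x >= w - 1 || y >= h - 1) return;

        int i = y * w + x;
        // Central differences approximate |grad phi|.
        float gx = 0.5f * (phi[i + 1] - phi[i - 1]);
        float gy = 0.5f * (phi[i + w] - phi[i - w]);
        float grad = sqrtf(gx * gx + gy * gy);

        phiNew[i] = phi[i] - dt * speed[i] * grad;   // advect the interface
    }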

Session Level: Beginner
Session Type: Talk
Tags: Medical Imaging & Visualization; Video & Image Processing; Combined Simulation & Real-Time Visualization; Recommended Press Session – HPC-Science; Recommended for All Press

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL21B

S4451 - GPU Computing in .NET for Financial Risk Analytics

Ryan Deering ( Director of Quantitative Development, Chatham Financial )
Ryan Deering
Ryan leads Chatham's Quantitative Development team focusing on derivatives pricing, risk analytics, and credit risk modeling. Prior to joining Chatham, Ryan received his PhD in Mathematics from Duke University in Durham, North Carolina. His dissertation focused on signal processing with applications to speech recognition. Ryan also holds BS and MA degrees in Mathematics from Duke University.

Learn how a rapidly growing mid-sized financial company incorporated GPU computing into its quantitative finance models. Our quantitative development team faced two major obstacles in adopting GPU computing. The first obstacle is the large cost of switching away from our mature .NET development process. The other obstacle arises from the difficulty of synchronizing a slow hardware purchasing cycle with a fast software delivery cycle. We addressed these concerns by creating a hybrid linear algebra library in .NET that dynamically switches to GPU computing when CUDA hardware is available. This library allows our developers to code in .NET and focus on the mathematical and financial models without worrying about CUDA syntax. In this session we will describe how we built the library in .NET using CUBLAS, CURAND, and CUDA Runtime libraries. We will also show the performance gains from switching to GPU computing in pricing Bermudan swaptions using the Libor Market Model.

Session Level: Beginner
Session Type: Talk
Tags: Finance

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room 210C

S4470 - Porting CPU-Based Multiprocessing Algorithms to GPU for Distributed Acoustic Sensing

Steve Jankly ( Principal Technical Professional - Software Development, Halliburton / Pinnacle )
Steve Jankly
Steve is a Principal Technical Professional (Software Development) for Pinnacle, A Halliburton Service. Pinnacle provides industry-unique integration of various fiber optic technologies for fracture diagnostic and reservoir monitoring. Steve has been working as a software development professional for various fiber optic sensing solutions for more than 6 years, and prior to that, he worked with image sensors and digital imaging technologies.

This talk describes our endeavors, from start to finish, in implementing a parallelizable and computationally intensive process on a GPU for fiber optic solutions, specifically Distributed Acoustic Sensing (DAS) interrogation systems. Applications for DAS vary, and include stimulation and production monitoring, verification of downhole equipment operation, pipeline monitoring and collection of seismic imaging data. These systems can produce up to a few gigabytes per second of data which needs to be processed in real-time. We have previously utilized embedded processors, but the need for faster computation ability arose with the next-generation system, due to the increased amounts of data and more complex data processing. We will discuss the process we undertook in porting the parallelized CPU version of the algorithms to NVIDIA GPUs, utilizing CUDA C. We also explore the various GPUs tested, and provide performance metrics.

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Computational Physics

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL20B

S4477 - Accelerating the Discrete Element Method for Faceted Particles Using HOOMD-Blue

Matthew Spellings ( Doctoral Candidate, University of Michigan )
Matthew Spellings
Matthew received his undergraduate degree from Vanderbilt University. He is currently a Ph.D. candidate in Professor Sharon Glotzer's lab at the University of Michigan.

Explore the concepts behind large-scale modeling of faceted anisotropic particles. Dynamical methods are the most direct way to study the full set of properties of systems of colloidal and nanoscale particles. Classical and event-driven molecular dynamics simulations of the past have focused on behavior of isotropic particles and limited classes of anisotropic particles such as ellipsoids. In this talk, we discuss the algorithms and data structures behind a GPU-accelerated implementation of the discrete element method for polyhedral particles in HOOMD-Blue. This formulation allows us to efficiently simulate conservative and non-conservative dynamics of faceted shapes within a classical molecular dynamics framework. Research applications include studies of nucleation and growth, granular materials, glassy dynamics and active matter.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Physics

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL21E

S4494 - Preliminary Work on Fast Radix-Based k-NN Multiselect on the GPU

Roshan D'Souza ( Associate Professor, University of Wisconsin - Milwaukee )
Roshan D'Souza
Roshan D'Souza is an associate professor in the Mechanical Engineering Dept. at the University of Wisconsin-Milwaukee. He obtained a PhD in Mechanical Engineering in 2003 from the University of California, Berkeley. His research interests include parallel Monte Carlo-type algorithms with applications in systems biology, and parallel image processing.

In this presentation we describe an efficient multi-level parallel implementation of a most significant bit (MSB) radix sort-based multi-select algorithm for k-nearest-neighbor (k-NN) search. Our implementation processes multiple queries within a single kernel call, with each thread block/warp simultaneously processing a different query. Our approach is incremental and reduces memory transactions through the use of bit operators, warp voting functions, and shared memory. Benchmarks show significant improvement over a previous implementation of k-NN search on the GPU.
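To illustrate the warp-vote ingredient, here is a sketch of a single MSB radix pass of a warp-level k-select under our own simplifying assumptions (it is not the authors' code): the warp votes on the current bit, __popc counts the zero bucket, and k shrinks without any extra memory traffic.

    __device__ void radix_select_pass(unsigned int key, int bit, int& k, bool& alive)
    {
        bool zero = alive && !((key >> bit) & 1u);
        unsigned int zeros = __ballot(zero);   // one vote bit per lane
        int nzeros = __popc(zeros);

        if (k <= nzeros) {
            alive = alive && zero;             // the k smallest all have a 0 here
        } else {
            alive = alive && !zero;            // discard the zero bucket...
            k -= nzeros;                       // ...and look for the remaining k
        }
    }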

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL21D

S4620 - Dax: A Massively Threaded Visualization and Analysis Toolkit for Extreme Scale

Kenneth Moreland ( Principal Member of Technical Staff, Sandia National Laboratories )
Kenneth Moreland
Kenneth Moreland received BS degrees in computer science and in electrical engineering from the New Mexico Institute of Mining and Technology in 1997. He received MS and Ph.D. degrees in computer science from the University of New Mexico in 2000 and 2004, respectively, and currently resides at Sandia National Laboratories. Dr. Moreland specializes in large-scale visualization and graphics and has played an active role in the development of ParaView, a general-purpose scientific visualization system capable of scalable parallel data processing. His current interests include the design and development of visualization algorithms and systems to run on multi-core, many-core and future-generation computer hardware.

Visualization on today's GPU technology and at extreme scale requires massive concurrency. The Dax Toolkit is a development framework for designing visualization algorithms for such devices and for using them. Learn how to use Dax to execute classic visualization and analysis algorithms on a variety of mesh data structures, and how to adapt the templated toolkit to your own data structures. Also learn to design your own massively threaded visualization algorithms in a simplified development environment that allows you to focus on the mathematical and algorithmic design. Dax's concept and scheduling mechanisms automatically build parallel scheduling and communication code from signatures using C++.

Session Level: Beginner
Session Type: Talk
Tags: Scientific Visualization; Large Scale Data Visualization & In-Situ Graphics

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL21F

S4629 - Advanced HMI Design Workflow at PSA Peugeot Citroen

Alain Gonzalez ( Expert Workstations Graphics Technologies & 3D Imagery, PSA Peugeot Citroën )
Alain Gonzalez
A graduate of the University of Paris Sud Orsay with a Master's Degree in Computer Science & Engineering, Alain has worked in PSA Peugeot's IT department since 2000 starting as a Workstations IT Architect. Since 2009, Alain has been involved with the Expert Workstations Graphics Technologies & 3D Imagery area.
Benoit Deschamps ( Imaging Solutions - Team Leader , PSA Peugeot Citroën )
Benoit Deschamps
After receiving his Master's degree in Imaging & Multimedia from the University of Bordeaux in France, Benoit was a development engineer for an automotive company before joining PSA Peugeot Citroën's IT department in 2006 where he now serves as the Team Leader for Imaging Solutions.

PSA Peugeot Citroën's IT department is working to improve the HMI design workflow. This talk explains how to provide the tools, efficient workflows and hardware needed to design, evaluate and simulate the HMIs embedded in new car models. During the styling phase, movies must be produced to illustrate the HMI concept, and tools are needed to simulate embedded systems in real time; a powerful grid computing system and prototyping architecture will be presented to meet those requirements. During the design phase, the final HMI must be simulated within the interior environment in order to visualize reflections on the windshield, color and trim integration, and embedded display defects, and also to validate the usability of the HMI. These simulations will be demonstrated in real-time rendering software with physically based rendering for HMI, an internal PSA Peugeot Citroën development. The talk will focus on HMI, grid computing, GPU computing, embedded hardware, and physically based rendering.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Combined Simulation & Real-Time Visualization; Manufacturing; Recommended Press Session – Auto

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room 220C

S4650 - Realizing High-Performance Pipelines Using Piko

Kerry Seitz ( Graduate Student Researcher, University of California, Davis )
Kerry Seitz
Kerry Seitz is a Ph.D. student in Computer Science at the University of California, Davis, working under Prof. John Owens. He received a B.S. in Computer Science with minors in Biology and Biomathematics from Trinity University. His research interests include heterogeneous computing, general purpose computing on graphics processing units (GPGPU), programming languages, and compilers. He is especially interested in the intersection of these areas. Due to the increasingly widespread use of many-core architectures, like graphics processing units (GPUs), he is interested in making these devices easier to program so that developers can take advantage of the processing power and energy efficiency that they provide.
Anjul Patney ( Research Scientist, NVIDIA )
Anjul Patney
Anjul received his Master's and Ph.D. degrees in Electrical and Computer Engineering from the University of California, Davis, working with Prof. John D. Owens. His Bachelor's degree is in Electrical Engineering from the Indian Institute of Technology, Delhi.
Stanley Tzeng ( Computer Imaging Architect, NVIDIA )
Stanley Tzeng
Stanley Tzeng graduated from UC Davis with a Ph.D. under the guidance of Professor John Owens. His Ph.D. dissertation focused on alternative scheduling routines for discrete and embedded GPUs. He is currently working at NVIDIA as part of the mobile camera team.

We present Piko, a system abstraction that helps implement high-level algorithmic pipelines on modern parallel architectures. We define 'pipelines' as sequences of complex, dynamically scheduled kernels that combine to implement a complete application. While primarily targeted towards efficient graphics applications, the way in which Piko exposes both parallelism and locality can naturally be applied to other domains as well. The abstraction helps programmers define work granularities as the data evolves across the stages of an application. These definitions are disjoint from the underlying algorithms, which helps authors of Piko pipelines explore tradeoffs between locality and parallelism across varying application configurations and target architectures. As a consequence, Piko helps in designing high-performance software pipelines that are flexible as well as portable across architectures.

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Programming Languages & Compilers; Performance Optimization; Ray Tracing

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room LL21C

S4673 - Inside Thrust: Building Parallel Algorithms with Bulk

Jared Hoberock ( Research Scientist, NVIDIA )
Jared Hoberock
Jared Hoberock joined NVIDIA Research in October 2008. His interests include parallel programming models and physically-based rendering. Jared is the co-creator of Thrust, a high performance parallel algorithms library. While at NVIDIA, Jared has contributed to the DirectX graphics driver, Gelato, a final frame film renderer, and OptiX, a high-performance, programmable ray tracing engine. Jared received a Ph.D in computer science from the University of Illinois at Urbana-Champaign. He is a two-time recipient of the NVIDIA Graduate Research Fellowship.

Learn how to build high performance and robust CUDA kernels with Bulk, the CUDA C++ library powering Thrust's high performance algorithm implementations. Learn how virtual shared memory, cooperative algorithms, and bulk-synchronous task launch make CUDA programming easier, more productive, and fun.
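For context, the snippet below shows the user-facing Thrust call whose machinery the session dissects; the example itself is ours, and Bulk is the layer underneath that launches and coordinates the actual kernel.

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>

    int main()
    {
        thrust::device_vector<int> v(1 << 20, 1);
        // One line of Thrust; behind it, Bulk supplies the bulk-synchronous
        // kernel launch, virtual shared memory and cooperative algorithms.
        int sum = thrust::reduce(v.begin(), v.end());
        return sum == (1 << 20) ? 0 : 1;
    }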

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room LL21A

S4691 - Map-D: A GPU Database for Interactive Big Data Analytics

Thomas Graham ( Co-founder, Map-D )
Thomas  Graham
Before co-founding Map-D, Tom was a researcher at Harvard Law School where he focused on the intersection between social networks, big data and law reform. His research centered around privacy and the development of social science methodologies that allow legal scholars, governments and interest groups to more effectively incorporate social network data into their decision-making processes. Tom lived in China for many years where he studied Chinese and dabbled in Chinese cooking and calligraphy. Tom is admitted to the New York Bar and was previously an attorney with Davis Polk in Hong Kong, where he focused on capital markets and M&A across Asia's emerging markets. He is also admitted to practice law in Australia. Tom holds a LLM from Harvard Law School and a LLB, BA and Dip. Languages from Melbourne University.
Todd Mostak ( Co-founder, Map-D )
Todd Mostak
Todd was a researcher at MIT CSAIL, where he worked in the database group, before co-founding Map-D. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic and eventually working as a translator for an Egyptian newspaper. He then completed his MA in Middle East Studies at Harvard University, afterwards taking a position as a Research Fellow at Harvard’s Kennedy School of Government, focusing on the analysis of Islamism using forum and social media datasets. The impetus to build Map-D came from how slow he found conventional GIS tools to spatially aggregate and analyze large Twitter datasets.

Interactive big data analytics: using GPUs to power the world's fastest database. As part of an emerging conversation between HPC and the enterprise, this talk will focus on the future of high-performance big data analytics from enterprise, government and scientific perspectives, while tracking the challenges posed by data collection, hardware integration and interface design. But there is more at stake than data-driven cost savings: these perspectives are framed by the need to socialize and democratize high-power big data analytics to the advantage of all. Map-D is an ultra-fast GPU database that allows anyone to interactively analyze and visualize big data in real time. Built into GPU memory, Map-D's unique architecture runs 70-1000x faster than other in-memory databases and big data analytics platforms. We will also showcase Map-D's public demos, including TweetMap, which maps over 1 billion tweets in real time, and Campaign Finance Map, which unravels the influence of money on political discourse over time.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Combined Simulation & Real-Time Visualization; Supercomputing; Large Scale Data Visualization & In-Situ Graphics; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room 210B

S4745 - Now You See It: Unmasking Nuclear and Radiological Threats Around the World

Michael Sossong ( Chief Physicist and Vice President for Research & Development, Decision Sciences International Corporation )
Columbus Scholar Dr. Michael Sossong has revolutionized the state of the art for passive nuclear threat detection using cosmic ray muon tomography. He joined Decision Sciences as Director of Nuclear Research in April 2008, leading the development of a commercial multi-mode passive detector system and other proprietary scanners and methods. Previously Principal Investigator for muon tomography (MT) at Los Alamos National Laboratory (LANL), Dr. Sossong was instrumental in the creation of full-physics simulation models for MT development, the application of tomographic algorithms to MT data, and the design and construction of Decision Sciences' full-scale Multi-Mode Passive Detection System (MMPDS). Dr. Sossong has contributed to several homeland security and nuclear stockpile projects. Earlier in his career he participated in the muon g-2 experiment performed at Brookhaven National Laboratory where he was responsible for the construction, operation and analysis of data from a set of drift tube particle tracking chambers used to track positrons from the decay of stored muons. The results of the analysis of data from these chambers were used to determine several systematic corrections to the g-2 measurement and also to set a new limit on the muon electric dipole moment. Recipient of the 2011 prestigious Christopher Columbus Homeland Security Award, Dr. Sossong is published in several peer-reviewed scientific journals and has presented at domestic and international symposia and conferences. He holds several patents and copyrights. Dr. Sossong earned his Ph.D., and master's and bachelor's degrees in Physics at the University of Illinois at Urbana-Champaign.

Muon tomography (MT), researched at Los Alamos National Lab and developed into a real-world solution by Decision Sciences (DSIC), harnesses nature to protect ports of entry against the smuggling of illicit materials and weapons of mass destruction. MT makes use of the constant rain of background radiation - cosmic ray-generated charged particles (muons and electrons) - to image the contents of vehicles, cargo containers and other large volumes. Data processing requirements include the application of 3D imaging algorithms deciphering signals from millions of particle tracks within volumes with millions of elements. To meet operational and customer requirements, imaging and detection must be performed with very low latency. DSIC has applied the power of NVIDIA GPUs to the iterative reconstruction of the contents of the volume of interest, achieving performance gains of more than 100x. Hear about this revolutionary security scanning technology and how NVIDIA technologies are helping make its implementation possible.

Session Level: All
Session Type: Talk
Tags: Defense; Combined Simulation & Real-Time Visualization; Scientific Visualization; Rendering & Animation; Recommended for All Press

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room 210D

S4785 - Khronos Open API Standards for Mobile Graphics, Compute and Vision Processing

Neil Trevett ( Vice President Mobile Ecosystem, NVIDIA )
Neil has spent over thirty years in the 3D graphics industry and is currently responsible for driving the advanced apps ecosystem on NVIDIA Tegra mobile devices. Neil is also the elected President of the Khronos Group industry standards consortium where he initiated the OpenGL ES standard, helped catalyze the WebGL project and chairs the OpenCL and StreamInput working groups. Previously, as Vice President of 3Dlabs, Neil was at the forefront of the silicon revolution bringing interactive 3D to the PC, and he established the embedded graphics division of 3Dlabs to bring advanced visual processing to a wide-range of non-PC platforms. Neil was elected President for eight consecutive years of the Web3D Consortium dedicated to creating open standards for communicating real-time 3D on the Internet. Neil graduated from Birmingham University in the UK with a First Class Joint Honors B.Sc. in electronic engineering and computer science and holds several patents in the area of graphics technology.

Discover how over 100 companies cooperate at the Khronos Group to create open, royalty free standards that help define the future of mobile silicon. This session explores the role of industry standards in maximizing mobile market opportunities and provides an overview of the state-of-the-art in acceleration APIs on Android and ARM-based systems including: (1) accelerating time to productive ecosystems rather than minimizing time to proprietary specifications; (2) balancing and reconciling the opposing benefits of "differentiation" and "fragmentation"; (3) designing open standards that drive innovation while allowing room for a healthy competition; (4) overview of Khronos ecosystem APIs for graphics, computing, sensor and vision processing; and (5) accelerating advanced applications such as Augmented Reality.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Virtual & Augmented Reality; Computer Vision; Real-Time Graphics Applications; Recommended Press Session – Mobile

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room 210E

S4834 - Digital Pyrotechnics at Industrial Light & Magic with Plume

Olivier Maury ( Lead Rendering Engineer, Industrial Light & Magic )
Olivier   Maury
Olivier Maury is the Lead Rendering Engineer at Industrial Light & Magic. His expertise includes physically-based rendering and effects authoring systems. Olivier joined ILM in 2006 and has worked at large-scale VFX studios for over 13 years. He holds M.Sc. degrees in both Theoretical Physics (Université Paris-Sud 11) and Computer Science (Ingénieur Télécom ParisTech).

Fire, smoke, dust and explosions are staples of motion picture visual effects. Recent effects films like Star Trek: Into Darkness, Battleship and Transformers: Dark of the Moon contain thousands of these elements. Each requires fluid dynamics simulation and volume rendering; this is computationally intensive and takes a lot of artist time to get right. This presentation will focus on how Industrial Light & Magic tackles the challenge using its groundbreaking proprietary package, Plume. Plume is a combination of a fluid dynamics solver and a volume renderer, both of which run entirely on the GPU. It has been heavily used since it was introduced in production in 2009, quickly becoming the principal digital smoke and pyrotechnics tool used at ILM. Since its initial development, it has been used on 21 productions: 19 feature films and 2 theme park attractions. Olivier will discuss some of Plume's unique features, how it fits into ILM's digital effects pipeline, and the infrastructure in place to make it such a productive tool. He will also illustrate its productivity by going over its use in some recent productions. On February 15, 2014, Plume was recognized by the Academy of Motion Picture Arts and Sciences with a Scientific and Technical Award.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Recommended Press Session – Media & Entertainment; Recommended for All Press

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room 211A

S4184 - OpenMM: GPU Accelerated Algorithm Development for Molecular Dynamics

Peter Eastman ( Senior Software Engineer, Stanford University )
Peter Eastman
Peter Eastman has spent over a decade developing MD software and five years coding on GPUs. He is the lead author of OpenMM.

Learn how to develop molecular dynamics algorithms for a GPU without writing any GPU code. OpenMM provides a high level scripting language in which scientists describe the computation to do using mathematics, not code. The equations are automatically analyzed and transformed into highly optimized CUDA kernels. This happens at runtime and is invisible to the user. Entirely novel algorithms can be implemented in just a few lines by someone with no CUDA programming experience, yet they run at full speed on the GPU hardware. This talk will describe how to use OpenMM to dramatically simplify and accelerate MD algorithm development. It also will describe the techniques used to transform equations into optimized code, making it relevant to programmers who want to apply similar techniques to other fields.
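As an illustration of the mechanism, the fragment below uses OpenMM's C++ API to define a force purely as an algebraic expression; OpenMM parses the string at runtime and generates an optimized CUDA kernel from it. The particular Lennard-Jones expression and parameter names are our own illustrative choices.

    #include "OpenMM.h"

    void addCustomForce(OpenMM::System& system)
    {
        // The force is described as mathematics, not GPU code.
        OpenMM::CustomNonbondedForce* lj = new OpenMM::CustomNonbondedForce(
            "4*eps*((sig/r)^12-(sig/r)^6);"
            " sig=0.5*(sig1+sig2); eps=sqrt(eps1*eps2)");
        lj->addPerParticleParameter("sig");
        lj->addPerParticleParameter("eps");
        system.addForce(lj);   // the System takes ownership of the pointer
    }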

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Physics

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL21E

S4186 - Optimizing a LBM code for Compute Clusters with Kepler GPUs

Jiri Kraus ( Developer Technology Software Engineer, NVIDIA )
Highly-Rated Speaker
Jiri Kraus
Jiri Kraus is a developer in NVIDIA's European Developer Technology team. As a consultant for GPU HPC applications at the NVIDIA Jülich Applications Lab, Jiri collaborates with local developers and scientists at the Jülich Supercomputing Centre and the Forschungszentrum Jülich. Before joining NVIDIA Jiri worked on the parallelization and optimization of scientific and technical applications for clusters of multicore CPUs and GPUs at Fraunhofer SCAI in St. Augustin. He holds a Diploma in Mathematics from the University of Cologne, Germany.

To fully utilize a GPU cluster, both the single-GPU code and the inter-GPU communication need to be efficient. In this session an LBM code applying a D2Q37 model is used as a case study to explain by example how both targets can be met. The compute-intensive collide kernel of the LBM code is optimized for Kepler, specifically considering the large amount of state needed per thread due to the complex D2Q37 model. For efficient inter-GPU communication, CUDA-aware MPI is used. We explain how this was done and present performance results on an InfiniBand cluster with GPUDirect RDMA.
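The heart of the CUDA-aware MPI pattern is that device pointers are handed straight to MPI, which can then move halo data GPU-to-GPU (via GPUDirect RDMA where the fabric supports it) without manual staging through host memory. The sketch below uses our own illustrative buffer and rank names.

    #include <mpi.h>

    // d_send and d_recv are device pointers; no cudaMemcpy staging is needed.
    void exchange_halos(double* d_send, double* d_recv, int count,
                        int left, int right, MPI_Comm comm)
    {
        MPI_Request reqs[2];
        MPI_Irecv(d_recv, count, MPI_DOUBLE, left,  0, comm, &reqs[0]);
        MPI_Isend(d_send, count, MPI_DOUBLE, right, 0, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }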

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL20C

S4203 - Gesture-Based Interactive Visualization of Large-Scale Data using GPU and Latest Web Technologies

Ibrahim Demir ( Assistant Research Professor, University of Iowa )
Ibrahim Demir
Ibrahim Demir develops web-based visualization and communication tools to make it easy to see information from complex and large scale geo-spatial environmental data sets. His work ranges from crowd-funding stream sensors, citizen science projects for collecting environmental data, adoption of environmental sensors by the public, crowdsourcing flood predictions, SIRI-like knowledge engine for flooding, displaying flood warnings using augmented reality, creating a virtual flood simulation on a tabletop, and experimenting with novel devices like Google Glass, Leap Motion and Microsoft Kinect for innovative projects on scientific visualization and interaction. Demir is the architect and main developer of Iowa Flood Information System (IFIS). Demir has a PhD in Environmental Informatics from the University of Georgia, and works as a Research Professor at the University of Iowa. Demir recently received an award from National Science Foundation (NSF) and Mozilla Foundation for his application, FloodCube – National Flood Information System, at the apps for next-generation networks challenge. Demir is a member of CUAHSI Informatics Committee, University of Iowa Information Technology Advisory Committee, Elsevier Innovation Explorers Program, and InnoCentive Challenge Program, and serves at the Global Environment for Network Innovations (GENI) as a project lead. His work is supported by NSF, NASA, International Water Association (IWA), Mozilla Foundation, Google, NVIDIA, BlackBerry, and many research organizations and technology companies.

As geoscientists are confronted with increasingly massive datasets, from environmental observations to simulations, one of the biggest challenges is having the right tools to gain scientific insight from the data. Recent developments in web technologies make it easy to manage, visualize and share large data sets with the public. Novel visualization techniques and dynamic user interfaces allow users to interact with data and change parameters to create custom views, gaining insight from simulations and environmental observations. This requires intelligent knowledge discovery techniques to extract information from complex computational simulations and large data repositories. This presentation provides an overview of information visualization and communication tools developed for communicating radar and satellite-based rainfall products, utilizing the graphics processing unit and the latest web technologies in a web browser. The user interface allows users to interact with the data using hand gestures.

Session Level: Beginner
Session Type: Talk
Tags: Scientific Visualization; Large Scale Data Visualization & In-Situ Graphics; Real-Time Graphics Applications

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL21F

S4270 - Computation of Mutual Information Metric for Image Registration on Multiple GPUs

Andrew Adinetz ( Researcher, Julich Supercomputing Centre, Forschungszentrum Jülich )
Andrew Adinetz
Andrew V. Adinetz got his M.S. degree in Computer Science in 2006 from Lomonosov Moscow State University, and his Ph.D. in Computer Science in 2009, also from MSU. He's currently working as a researcher at Forschungszentrum Jülich (NVIDIA Application Lab, Jülich Supercomputing Centre). His current research interests include GPU programming, algorithm design for many-core architectures, high-performance computing and programming languages.

Because of their computational power, GPUs are widely used in the field of image processing. And while registration of brain images has previously been accelerated with GPUs, registration of human brain images presents new challenges due to the large amounts of data, with images not fitting in the memory of a single device. We present how we address these challenges with a multi-GPU approach, and describe in detail how we overcome the difficulties arising from the highly irregular communication during metric computation. Our evaluation demonstrates that adequate performance is achieved with multiple GPUs even with a high volume of communication.
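As background, mutual information is computed from a joint intensity histogram of the two images; on a single GPU the standard building block is a kernel in which each thread bins one voxel pair with an atomic increment, as in the illustrative sketch below (the multi-GPU partitioning the talk describes is built on top of such a step).

    __global__ void joint_histogram(const unsigned char* imgA,
                                    const unsigned char* imgB,
                                    unsigned int* hist,   // 256*256 bins
                                    int nvoxels)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < nvoxels)
            atomicAdd(&hist[imgA[i] * 256 + imgB[i]], 1u);
    }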

Session Level: Beginner
Session Type: Talk
Tags: Medical Imaging & Visualization; Recommended Press Session – HPC-Science; Recommended for All Press

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL21B

S4316 - Simulating Generation, Retention and Expulsion of Hydrocarbons on GPUs

Massimo Bernaschi ( Director of Technology, National Research Council of Italy )
Massimo Bernaschi
Massimo Bernaschi is with CNR, the National Research Council of Italy, as Chief Technology Officer of the Institute for Applied Computing. He is also an Adjunct Professor of Systems Programming at "Sapienza" University in Rome, a trainer in Digital Forensics at the "Sapienza" and Modena Universities, and an adviser to the Consortium for Supercomputing Applications (CASPUR). Before joining CNR in 1998, Massimo worked ten years at the IBM European Center for Scientific and Engineering Computing, where he developed the IBM PVMe product and received two Outstanding Technical Achievement Awards. His main scientific interests are parallel computing, modelling of complex systems (finance and biology), systems and network security, and high performance computing. He is the author of about 150 papers in peer-reviewed journals and international conferences. Massimo Bernaschi started working with CUDA back in 2008. He developed the first CUDA implementation of the Lattice Boltzmann method for irregular geometries and has been a pioneer in multi-GPU programming. In 2011 he received an Honorable Mention in the Gordon Bell Award for MUPHY, a multi-physics CUDA code for the simulation of bio-fluids that achieves excellent scalability up to 4,000 GPUs. In 2013 he was again a finalist for the Gordon Bell Award with a simulation of protein crowding that achieved 20 petaflops on 18,000 GPUs. He also developed CUDA codes for spin system simulations, dictionary and brute-force attacks on cryptosystems, signal processing and simulation of soft matter. In 2012 Massimo Bernaschi was named a "CUDA Fellow".

Learn how to use GPUs as batch processors to simulate thousands of independent systems that have complex dynamics but relatively limited computing requirements. By using an apparently naive approach, with a single CUDA thread simulating an entire system, it is possible to obtain excellent global performance and minimize, at the same time, the differences in the results with respect to the original serial implementation of the same application. Crucial for the success of the porting is a proper choice of data structures, which need to be designed so that the global memory of the GPU can be accessed effectively even though threads work on distinct problems. The application we present simulates products of primary migration and the expulsion of hydrocarbons from source rock, but the idea can be applied to other fields. The final result in our case is a highly scalable code that runs transparently on multiple GPUs and that can be more easily updated when the underlying model changes.
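
A minimal sketch of the one-thread-per-system idea, assuming a structure-of-arrays layout so that neighboring threads touch neighboring addresses (the coalescing concern the abstract alludes to); the dynamics shown are a placeholder, not the hydrocarbon model.

    // State of nsys independent systems stored structure-of-arrays:
    // variable k of system i lives at state[k * nsys + i]. When thread i
    // advances system i, a warp's loads of variable k are contiguous in
    // global memory and therefore coalesced.
    __global__ void stepSystems(float* state, int nsys, int nvars,
                                float dt, int nsteps) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nsys) return;
        for (int s = 0; s < nsteps; ++s)
            for (int k = 0; k < nvars; ++k) {
                float x = state[k * nsys + i];
                state[k * nsys + i] = x - dt * x;   // placeholder dynamics
            }
    }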

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Numerical Algorithms & Libraries; Computational Physics

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL20B

S4361 - GPU-Accelerated SDR Implementation of a Multi-User Detector for Satellite Return Links

Chen Tang ( Research Engineer, German Aerospace Center )
Chen Tang
Chen Tang is a Research Engineer in the Institute of Communication and Navigation at the German Aerospace Center (DLR). He holds a Master of Science in electrical engineering from the Technical University of Munich, and specializes in Software Defined Radio (SDR), CUDA programming, satellite communication networks and emergency management systems. He has conducted several CUDA programming tutorials at his institute.

In this session, a novel GPU-based Software Defined Radio (SDR) implementation of a Multi-User Detector (MUD) receiver for a transparent satellite return link is presented. In the past decade new satellite applications have emerged which require a bidirectional satellite link. Due to the scarcity and high cost of satellite frequency spectrum, it is very important to utilize the available spectrum as efficiently as possible. Efficient usage of the spectrum in the satellite return link is a challenging task, especially if multiple users are present. In previous work, MUD techniques have been widely studied to increase the spectral efficiency of the satellite return link. However, due to their high computational complexity and sensitivity to synchronization and channel estimation errors, only a few implementations of MUD for satellite communications exist. Here we will present a GPU-accelerated MUD receiver operating in real time for satellite return links, which achieves a decoding throughput of 290 Kbps.

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Signal & Audio Processing

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room 210D

S4472 - Performance Analysis and Optimization of OpenACC Applications

Michael Wolfe ( Compiler Engineer, NVIDIA )
Highly-Rated Speaker
Michael Wolfe
Michael Wolfe was a professor from 1988 through 1996, where he taught computer science classes. He has taught, and continues to present, many tutorials, and has given many conference technical presentations as well as a few invited conference talks.

Learn how to use performance analysis tools to find the bottlenecks in your OpenACC applications. With the proper performance information, and the feedback from the compiler, you can tune your application and improve overall performance. Live demonstrations will use PGI's pgprof, NVIDIA's Visual Profiler and command-line nvprof, and additional tools available to the parallel computing community.
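
By way of context, here is a minimal OpenACC kernel of the sort one might profile in the session's demonstrations; the directives and tool invocations are standard, but the example itself is mine, not the speaker's.

    // saxpy.c -- compile with, e.g., pgcc -acc -Minfo=accel saxpy.c,
    // then profile the generated kernel with: nvprof ./a.out
    #include <stdlib.h>

    void saxpy(int n, float a, const float* restrict x, float* restrict y) {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        int n = 1 << 20;
        float* x = malloc(n * sizeof(float));
        float* y = malloc(n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(n, 3.0f, x, y);
        free(x); free(y);
        return 0;
    }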

Session Level: Beginner
Session Type: Talk
Tags: Performance Optimization

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room 212B

S4523 - OpenSubdiv Update

Manuel Kraemer ( Senior Software Engineer, Pixar )
Manuel Kraemer
Manuel Kraemer is a Graphics Software Engineer at Pixar Animation Studios. Prior to that he worked as a technical director at Disney Feature Animation, Double Negative and the BBC. He is currently working on OpenSubdiv, the open source subdivision surface API.

A review of the OpenSubdiv API, detailing the tessellation techniques used to interactively display complex smooth limit surfaces. The session will also introduce the new features added to the library as well as domain problems still in the research stages.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Performance Optimization; Rendering & Animation; Game Development

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room 211A

S4524 - Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Xiaoming Chen ( PhD candidate, Tsinghua University )
Xiaoming Chen
Xiaoming Chen received the B.S. degree from the Department of Electronic Engineering, Tsinghua University, in 2009, where he is currently working towards the Ph.D. degree. His research interests include parallel EDA algorithms on CPUs and GPUs, and low-power and reliability design methodologies for VLSI. He has published 17 papers and received 3 best paper nominations.

Simulation Program with Integrated Circuit Emphasis (SPICE) simulators are widely used for transistor-level simulation in IC design and verification. The time cost of SPICE simulators is dominated by two parts: MOSFET model evaluation and the sparse linear solver. This session will talk about our work on GPU-based sparse LU factorization which is specially designed for SPICE simulation. In particular, we will introduce the challenges of mapping a sparse solver onto a GPU, our parallelization strategies of sparse LU factorization, and performance optimization approaches. Experimental results will be presented and discussed as well.

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Electronic Design Automation

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room LL20D

S4856 - Making Subdivision Surfaces an Industry Standard

David Yu ( Graphics Software Engineer, Pixar Animation Studios )

Catmull-Clark subdivision surfaces provide a powerful way to represent geometry in animation systems. At a base level they extend B-spline patches to handle arbitrary topology. And with advanced features such as semi-sharp creasing, face-varying texture coordinates, and hierarchical control meshes, they can encode higher-frequency modeling features with a surprising economy of control points. In this talk Bill Polson will cover: What is subdivision? What are the advanced features and why are they useful? How are they drawn and evaluated on CPUs and GPUs? Bill will also discuss OpenSubdiv, Pixar's open source implementation of subdivision surfaces. He'll demonstrate OpenSubdiv in action and provide an update on the project, the engineering roadmap, and industry adoption.

Session Level: Beginner
Session Type: Talk
Tags: Mobile Summit; Media & Entertainment Summit; Rendering & Animation; Recommended Press Session – Mobile

Day: Wednesday, 03/26
Time: 09:30 - 09:55
Location: Room 210E

S4148 - Experiences Porting Real Time Signal Processing Pipeline CUDA Kernels to Kepler and Windows 8

Ismayil Guracar ( Senior Key Expert, Siemens Medical Solutions USA, Inc. Ultrasound Business Unit )
Ismayil Guracar has been working in the ultrasound imaging field for over 27 years. He is currently a Senior Key Expert with the Innovations Applications Group at Siemens Healthcare, Ultrasound Business Unit in Mountain View, California. His interests include ultrasound image formation and high performance real time signal processing, especially using GPUs. He holds 64 US patents and has pioneered new ultrasound technologies in the areas of parametric and molecular imaging and contributed to the development of many successful diagnostic medical imaging products.

The move to the Kepler generation of GPU cards created new challenges and opportunities for an existing medical ultrasound imaging product with high-performance real-time signal processing kernels based on CUDA, previously running on the Fermi-based Quadro 2000 and Windows XP. The initial port to Kepler and Windows 8 required only a new driver; however, significant degradation in execution speed was noted compared to the earlier generation. I will show how the various causes of the slowdown were identified, and the strategies we developed, including increasing instruction-level parallelism (ILP), to refactor kernels to achieve the full potential of the Kepler architecture.
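
A hedged illustration of the kind of ILP refactoring the abstract alludes to: giving each thread several independent load/compute chains so Kepler's schedulers have instructions to issue while memory requests are in flight. The kernel is illustrative, not the product code.

    // Instead of one element per thread, each thread handles four
    // independent elements; the four chains have no dependences on each
    // other, increasing instruction-level parallelism per thread.
    __global__ void scale4(const float* in, float* out, int n, float g) {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * 4;
        if (i + 3 < n) {
            float a = in[i], b = in[i + 1], c = in[i + 2], d = in[i + 3];
            out[i]     = a * g;
            out[i + 1] = b * g;
            out[i + 2] = c * g;
            out[i + 3] = d * g;
        } else {
            for (; i < n; ++i) out[i] = in[i] * g;   // tail elements
        }
    }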

Session Level: Intermediate
Session Type: Talk
Tags: Medical Imaging & Visualization; Signal & Audio Processing; Performance Optimization

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room LL21B

S4189 - CUDA-Accelerated Wireless Communication Technique for Aerospace Exploration

Ying Liu ( Associate Professor, University of Chinese Academy of Sciences )
Ying Liu
Ying Liu received her B.S. degree from Peking University, China, in 1999, and the M.S. and Ph.D. degrees in computer engineering from Northwestern University, Evanston, IL, USA, in 2001 and 2005, respectively. She is currently an associate professor in the School of Computer and Control, University of Chinese Academy of Sciences. Her research interests include high-performance computing, data mining, and business intelligence. She served as a workshop chair at the International Conference on Computational Science (2007) and the International Conference on Data Mining (2007). She is a recipient of the NVIDIA Global Professor Partnership (2009), and her institution was named a CUDA Teaching Center (2012) in recognition of her contribution to CUDA course teaching in China.

Learn the concept of a typical telemetry system for aerospace exploration, and identify the bottleneck in the process flow and its corresponding computational complexity. Learn our approach to accelerating the Multiple Symbol Detection (MSD)-based demodulation method. The computational core of MSD is the "sliding correlation" problem, which calculates the correlation between a long vector and a set of short vectors. An efficient CUDA parallelization scheme is proposed to accelerate MSD: high thread-level parallelism is achieved by this scheme, and various optimization techniques are applied to improve performance. CU-MSD is implemented by adapting this sliding correlation. Good speedups are observed on data sets generated from a real aerospace PCM/FM integrated baseband system.
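
A minimal sketch of a sliding-correlation kernel, assuming one thread per offset and the short reference vector cached in shared memory (since every thread reuses it); this illustrates the problem shape, not the CU-MSD implementation.

    // out[j] = sum_k sig[j + k] * ref[k] for every offset j.
    __global__ void slidingCorr(const float* sig, int nsig,
                                const float* ref, int nref, float* out) {
        extern __shared__ float sref[];           // cached reference vector
        for (int k = threadIdx.x; k < nref; k += blockDim.x) sref[k] = ref[k];
        __syncthreads();
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if (j <= nsig - nref) {
            float acc = 0.0f;
            for (int k = 0; k < nref; ++k) acc += sig[j + k] * sref[k];
            out[j] = acc;
        }
    }
    // launch with: slidingCorr<<<blocks, threads, nref * sizeof(float)>>>(...)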

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Signal & Audio Processing

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210D

S4222 - Red Fox: An Execution Environment for Relational Query Processing on GPUs

Haicheng Wu ( Ph.D. Student, Georgia Institute of Technology )
Haicheng Wu
Haicheng Wu is a Ph.D. student in the Computer Architecture and Systems Lab (CASL) at Georgia Institute of Technology under the direction of Professor Sudhakar Yalamanchili. He received his B.S. in Electrical Engineering (EE) from Shanghai Jiao Tong University in 2006 and his M.S. in Electrical and Computer Engineering (ECE) from Georgia Institute of Technology in 2009. His research project is developing compiler tool chains for heterogeneous architectures with a focus on GPU-based systems. Specifically, he is developing a compiler, Red Fox, for accelerating large-scale data warehousing applications on cloud architectures augmented with GPU accelerators. He has received the NVIDIA Graduate Fellowship twice (2012-2014).

This session will present the Red Fox system. Attendees will leave understanding GPU performance when executing relational queries over large data sets, as typically found in data warehousing applications, as well as the automatic compilation flow of kernel fusion, which can be applied to other applications.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210B

S4285 - Scaling Soft Matter Physics to Thousands of GPUs in Parallel

Alan Gray ( Research Architect, EPCC, The University of Edinburgh )
Alan Gray
Dr Alan Gray was awarded a Ph.D. at The University of Glasgow in Theoretical Particle Physics in 2003, winning the 2004 Ogden Prize for the best UK thesis in particle physics phenomenology. He furthered this work under a fellowship at The Ohio State University, and since joining EPCC in 2005 he has been involved with a wide range of HPC-related projects: lately his research has focused on massively parallel GPU-accelerated supercomputing, making significant contributions to several scientific areas including condensed matter physics, genetics, and musical acoustics. He has authored a large number of refereed and highly-cited publications.

Discover how to adapt a real, complex application such that it can efficiently utilize thousands of GPUs in parallel. We describe our successes in combining CUDA with MPI to simulate a wide variety of complex fluids of key importance to everyday life. We are careful to present our work in a generalizable way, such that others can learn from our experience, follow our methodology and even re-use our highly efficient communication library. We detail our efforts to maximize both performance and maintainability, noting that we support both CPU and GPU versions (where the latter is 3.5-5 times faster comparing equal numbers of GPUs and fully-utilized CPUs). We present our work to carefully schedule and overlap lattice-based operations and halo-exchange communication mechanisms, allowing excellent scaling to at least 8,192 GPUs in parallel on the Titan supercomputer.
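
For readers unfamiliar with the overlap pattern the abstract describes, here is a compact sketch of one time step: the interior update runs on its own stream while boundary cells are staged out, exchanged with MPI, and restored. The kernels are trivial stand-ins for the real lattice operations, and the host halo buffer is assumed to be pinned (cudaMallocHost) so the async copies actually overlap.

    #include <mpi.h>
    #include <cuda_runtime.h>

    __global__ void updateInterior(float* f, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x + 1;  // skip halo cells
        if (i < n - 1) f[i] *= 0.5f;                        // stand-in update
    }
    __global__ void updateBoundary(float* f, int n) {
        if (threadIdx.x == 0) { f[0] *= 0.5f; f[n - 1] *= 0.5f; }
    }

    // One halo cell per side; h_halo[0..3] is pinned host memory.
    void step(float* d_f, float* h_halo, int n, int up, int down,
              cudaStream_t sInt, cudaStream_t sHalo) {
        cudaMemcpyAsync(&h_halo[0], &d_f[1],     sizeof(float),
                        cudaMemcpyDeviceToHost, sHalo);
        cudaMemcpyAsync(&h_halo[1], &d_f[n - 2], sizeof(float),
                        cudaMemcpyDeviceToHost, sHalo);

        // interior work overlaps the exchange below
        updateInterior<<<(n + 255) / 256, 256, 0, sInt>>>(d_f, n);

        cudaStreamSynchronize(sHalo);               // send buffers staged
        MPI_Sendrecv(&h_halo[0], 1, MPI_FLOAT, down, 0,
                     &h_halo[2], 1, MPI_FLOAT, up,   0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&h_halo[1], 1, MPI_FLOAT, up,   1,
                     &h_halo[3], 1, MPI_FLOAT, down, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaMemcpyAsync(&d_f[0],     &h_halo[2], sizeof(float),
                        cudaMemcpyHostToDevice, sHalo);
        cudaMemcpyAsync(&d_f[n - 1], &h_halo[3], sizeof(float),
                        cudaMemcpyHostToDevice, sHalo);
        updateBoundary<<<1, 32, 0, sHalo>>>(d_f, n);
        cudaDeviceSynchronize();
    }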

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics; Computational Physics

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL20C

S4331 - Fast and Easy GPU Offloading for Computational Finance

Lukasz Mendakiewicz ( Software Development Engineer in Test II, Microsoft Corp )
Lukasz Mendakiewicz
Łukasz Mendakiewicz is a software engineer at Microsoft, where he focuses on the customer experience with parallel programming models for C++. He is especially interested in GPGPU acceleration, and puts this passion to work on C++ AMP. He holds an M.S. in Computer Science from AGH UST in Krakow, Poland, with a thesis on implementing real-time global illumination algorithms on a GPU.

This session provides insight on how to obtain superior performance for computational finance workloads without compromising developer productivity. C++ AMP technology lets you write C++ STL-like code that runs on GPUs (and CPUs) in a platform- (Windows and Linux) and vendor-agnostic manner. The session will start with an overview of C++ AMP, dive into its features, list the various compilers that support C++ AMP, and showcase the performance characteristics of option pricing workloads written using C++ AMP. Attend this talk to see how you can write productive, easy-to-maintain code that offers superior performance, delivering the ability to write code once and exploit the hardware to its fullest.

Session Level: Intermediate
Session Type: Talk
Tags: Finance; Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210C

S4333 - Computing the Cure: Combining Sequencing and Physical Simulation on GPUs to Provide Patient Customized Cancer Treatments

Ross Walker ( Associate Professor, UCSD )
Ross Walker
Ross Walker is an Associate Research Professor at the San Diego Supercomputer Center, an Adjunct Associate Professor in the Department of Chemistry and Biochemistry at the University of California, San Diego, CEO of Verizyme Inc and an NVIDIA Fellow. He runs the Walker Molecular Dynamics Lab in San Diego where he leads a team that develops advanced techniques for Molecular Dynamics Simulations supporting work aimed at improved drug and biocatalyst design. His work includes improved Quantum Mechanical, Molecular Mechanical models, development of new force fields for simulation of lipid membranes, simulations of cellulase enzymes for improved cellulosic bioethanol production and the development of a GPU accelerated version of the AMBER Molecular Dynamics engine PMEMD.

The sequencing revolution is completely changing the landscape of cancer treatment ushering in the era of personalized medicine where individual treatments will be customized for a specific patient. Instead of simply looking at stained tumor biopsy sections under a microscope, cancer diagnosis is going high-tech by allowing sequencing of patient tumors (and patient genomes) to determine what precise molecular events cause an individual cancer. In principle, this sequence information holds the key to individually targeted therapies with enormously increased success rates in treating (and even curing) cancer. This is the "molecular oncology" revolution and it will completely change the cancer diagnosis and treatment landscape in the next decade. This talk will highlight work by scientists at MSKCC, Stanford and UCSD to build the tools needed to determine drug susceptibilities using a combination of sequencing data and *physical* simulation. This work will ultimately provide a way to compute patient customized cancer treatments.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Structural Mechanics; Bioinformatics & Genomics; Computational Physics; Recommended for All Press

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room LL21E

S4356 - GPU Renderfarm with Integrated Asset Management & Production System

Chen Quan ( Centre Manager, Multi-platform Game Innovation Centre, Nanyang Technological University (NTU) )
Dr Quan Chen obtained his BSc in Computer Science and Technology from Fudan University, China, in 2002, and his Ph.D. from Nanyang Technological University (NTU), Singapore, in 2008. He is currently Centre Co-Manager of Multi-plAtform Game Innovation Centre, NTU. His main research interest is computer animation and games technology.

We propose an integrated system combining a GPU renderfarm with an Asset Management & Production System that can greatly streamline computer graphics (CG) movie production. Two of the main advantages of our system are: 1. Our asset management system eases the difficulty of asset handling between multiple artists in a CG movie production. 2. The 3D CG assets stored in our system can be directly submitted to the GPU renderfarm for rapid rendering, without the need to manually download the assets from the asset management system. Moreover, by using a GPU renderfarm we can accelerate rendering time significantly.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Clusters & GPU Management; Remote Graphics & Cloud-Based Graphics

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 211A

S4373 - Enhanced Oil Recovery Simulation Performances on New Hybrid Architectures

Thomas Guignon ( Research Engineer, IFP Energies Nouvelles )
Thomas Guignon has been a research engineer in computer science at IFP Energies Nouvelles since 2005. He received his Ph.D. in computer science in 2003 while working at an HPC company on software and hardware management of Linux clusters. His research at IFP Energies Nouvelles focuses on improving simulation software performance, parallel linear solvers and GPU computing.

The goal of this session is to show that GPU linear solvers with highly parallel preconditioners can compete with the most advanced ones (CPR-AMG) using a classical MPI-based programming model in the context of reservoir simulation.
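
To make "highly parallel preconditioner" concrete: the simplest example is the diagonal (Jacobi) preconditioner, whose apply step is one independent operation per unknown, in contrast to the strongly sequential triangular solves of ILU-type methods. A minimal sketch, illustrative rather than IFP's solver:

    // Jacobi preconditioner apply: z = D^{-1} r. Every row is independent,
    // so the operation maps perfectly to one thread per unknown.
    __global__ void jacobiApply(const float* diag, const float* r,
                                float* z, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) z[i] = r[i] / diag[i];
    }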

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL20B

S4429 - Enabling Efficient Many-Task Computing on GPGPUs

Scott Krieder ( Ph.D. Student, Illinois Institute of Technology )
Scott Krieder
Scott Krieder is a 3rd-year Ph.D. student and 2013 Starr/Fieldhouse Research Fellow in the Department of Computer Science at the Illinois Institute of Technology. Scott works as a Research Assistant in the Data-Intensive Distributed Systems Laboratory and as a Teaching Assistant for the Department of Computer Science. In addition, Scott is a Guest Researcher at Argonne National Laboratory. His recent research contributions involve the GeMTC project, which aims to provide improved programmability and efficiency of hardware accelerators (GPGPUs, Intel Xeon Phi) in the distributed systems and high-performance computing spaces for Many-Task Computing workloads. Scott holds a Bachelor of Science in Computer Science from Creighton University (2010) and a Master of Science in Computer Science from Loyola University Chicago (2011).

Current software and hardware limitations prevent Many-Task Computing (MTC) workloads from leveraging hardware accelerators boasting many-core architectures. Some broad application classes that fit the MTC paradigm are workflows, MapReduce, high-throughput computing, and a subset of high-performance computing. MTC emphasizes using many computing resources over short periods of time to accomplish many computational tasks (both dependent and independent), where the primary metrics have traditionally been measured in seconds; this work aims to reduce this granularity to milliseconds. Learn how to enable efficient many-task computing through the use of a CUDA-based framework that (1) features a daemon kernel on the device managing compute elements, and (2) enables efficient dynamic memory management through a sub-allocator.
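
A minimal sketch of the daemon-kernel idea: a persistent kernel whose threads claim tasks from a global queue with an atomic counter, so each task costs a queue pop rather than a kernel launch. The task format and queue layout here are assumptions for illustration, not the GeMTC design.

    struct Task { int op; float arg; float result; };

    // Persistent daemon: runs until the host sets *stop or tasks run out.
    __global__ void daemon(Task* q, int* head, int nTasks,
                           volatile int* stop) {
        while (!*stop) {
            int t = atomicAdd(head, 1);            // claim the next task
            if (t >= nTasks) break;
            Task* tk = &q[t];
            if (tk->op == 0) tk->result = tk->arg * tk->arg;  // demo ops
            else             tk->result = sqrtf(tk->arg);
            __threadfence();                        // publish the result
        }
    }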

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing; Clusters & GPU Management

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL21A

S4475 - GPU-Based Multiplatform Transcoding

Mahmut Samil Sagiroglu ( Co-Founder, Erlab Software )
Mahmut Samil  Sagiroglu
Mahmut Samil Sagiroglu graduated in 1998 from the Electronic Engineering department of Istanbul University and finished his M.Sc. at the same university in 2001. He received his Ph.D. from Sabancı University in 2006. After working at several companies, he joined TÜBİTAK (The Scientific and Technological Research Council of Turkey) - UEKAE (National Electronics and Cryptology Research Institute) as a researcher in 1999. Having held roles in several projects and managed two different departments, he is currently responsible for the Computational Biology and Security Applications Department. He is also managing the Advanced Genomics and Bioinformatics Research Center Infrastructure Project. His specialty areas include bioinformatics, cryptanalysis, signal processing and electronic design.

Learn how to take advantage of the GPU for video processing and encoding in order to produce highly efficient, real-time, multiplatform video output. Contemporary trends make it mandatory to transmit digital media to all platforms. We use GPU processing and NVENC hardware encoding to produce video stream output in different formats simultaneously, with minimum latency and high quality.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 211B

S4486 - Raising the Roofline on GPU Applications with Stacked Memory

Lorena Barba ( Associate Professor of Mechanical and Aerospace Engineering, George Washington University )
Highly-Rated Speaker
Lorena A. Barba is Associate Professor of Mechanical and Aerospace Engineering at the George Washington University, in Washington DC. She has MSc and PhD degrees in Aeronautics from the California Institute of Technology and BSc and PEng degrees in Mechanical Engineering from Universidad Técnica Federico Santa María in Chile. Previous to joining GW, she was Assistant Professor of Mechanical Engineering at Boston University (2008–2013) and Lecturer/Senior Lecturer of Applied Mathematics at University of Bristol, UK (2004–2008). Prof. Barba is an Amelia Earhart Fellow of the Zonta Foundation (1999), a recipient of the EPSRC First Grant program (UK, 2007), an NVIDIA Academic Partner award recipient (2011), and a recipient of the NSF Faculty Early CAREER award (2012). She was appointed CUDA Fellow by NVIDIA Corporation (2012) and is an internationally recognized leader in computational science and engineering.

GPU applications face three potential bottlenecks: instruction throughput, memory throughput and latency. Sometimes we can refactor the algorithm to improve performance after profiling. Another approach is to use the roofline model to analyze computational kernels and identify performance limitations on specific hardware. Such analysis characterizes many important scientific algorithms as memory-bound when running on GPUs. But as we look forward to new generations endowed with stacked DRAM, we see the roof magically lifting due to reduced latencies and higher bandwidths, leading to unprecedented speed-up factors in memory-bound algorithms. With my co-author Manuel Ujaldon, NVIDIA CUDA Fellow and Professor of Computer Architecture at the University of Malaga (Spain), we are looking at how scientific algorithms may benefit from the stacked DRAM of future GPU generations. In this talk, I will present how we characterize GPU application performance via the roofline model and analyze the contribution of stacked DRAM to anticipate its impact in raising performance ceilings in future GPUs like Volta.
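
The roofline model referred to can be written in one line: attainable performance is capped either by peak compute or by memory bandwidth times arithmetic intensity. The worked numbers below are approximate public Kepler K40 specifications, my assumption rather than figures from the talk.

    % Roofline: performance P as a function of arithmetic intensity I
    P(I) = \min\left(P_{\text{peak}},\; B \times I\right),
    \qquad I = \frac{\text{flops}}{\text{bytes moved}}

    % Example (approximate K40 figures): P_peak ~ 4.3 TFLOP/s single
    % precision, B ~ 288 GB/s. A kernel with I = 1 flop/byte is capped at
    % ~288 GFLOP/s, i.e. memory-bound; stacked DRAM raises B, lifting
    % exactly this sloped part of the roof.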

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Supercomputing

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL20D

S4499 - Enabling the Next Generation of Particle Physics Experiments: GPUs for Online Track Reconstruction

Andreas Herten ( PhD Student (Physics), Forschungszentrum Jülich GmbH )
Andreas Herten
In 2010, Andreas earned his Diploma in Physics from RWTH Aachen University. Since 2011, he has been a Ph.D. student in Physics at Institut für Kernphysik (Institute for Nuclear Physics) of Forschungszentrum Jülich GmbH (Research Center Jülich) & Ruhr-Universität Bochum.

PANDA is a next generation particle physics experiment involving a novel data acquisition mechanism. Commonly, particle physics experiments read out the full detector response of particle collisions only when a fast hardware-level trigger fires. In contrast to this, PANDA uses a sophisticated event filtering scheme which involves reconstruction of the whole incoming data stream in real time (online) to distinguish signal from background events. At a rate of about 20 million events per second, a massive amount of computing power is needed in order to sufficiently reduce the incoming data rate of 100 GB/s to 2 PB/year for permanent storage. We explore the feasibility of using GPUs for this task. This talk outlines the challenges PANDA faces with data acquisition and presents the status of the GPU investigations. Different reconstruction (tracking) algorithms running on NVIDIA GPUs are shown and their features and performances highlighted.

Session Level: Beginner
Session Type: Talk
Tags: Computational Physics

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 212A

S4516 - Scientific Data Visualization on GPU-Enabled, Hybrid HPC Systems

Mel Krokos ( Senior Lecturer, University of Portsmouth )
Mel Krokos
Mel Krokos is a Senior Lecturer in Computer Graphics and Visualisation in the School of Creative Technologies, University of Portsmouth UK. He previously held postdoctoral research positions involved in the successful completion of a number of IST European projects on bio-medical visualisation. He was a developer in an open source framework for biomedical applications, and is a member of the developers' team for a visualisation toolkit for astrophysics. He is currently involved with an EU-funded project which is developing a customisation methodology for science gateways integrating European DCI infrastructures. His research interests include big data visualisation, GPU-enabled algorithms, high performance computing, scientific gateways, computational grids, computer-aided modeling, user interfaces and virtual reality. Mel serves regularly as a journal reviewer and IPC member for international conferences. He is an ACM SIGGRAPH Pioneer and member of the IEEE Computer Society.

Our session will focus on exploitation of emerging GPU-enabled, hybrid HPC architectures for scientific data visualization. We employ Splotch - a rendering algorithm that allows production of high quality imagery and supports very large-scale datasets. We summarize a previously developed CUDA implementation of Splotch referring to the underlying performance model for data transfers, computations and memory access. We subsequently focus on exploitation of HyperQ to allow GPU sharing among multiple cores within nodes, followed by an MPI-based approach to distribute workloads across multiple hybrid nodes within HPC systems. A work-offloading model is finally discussed based on MPI-2 remote memory access features for exploiting multi-node, multi-core and multi-coprocessor accelerated computations towards achieving an optimal level of parallelism. We discuss performance results using reference datasets coming from large-scale astrophysical simulations.

Session Level: Advanced
Session Type: Talk
Tags: Large Scale Data Visualization & In-Situ Graphics; Scientific Visualization; Astronomy & Astrophysics; Supercomputing

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room LL21F

S4572 - An Elegantly Simple Design Pattern for Building Multi-GPU Applications

Bob Zigon ( Sr Staff Research Engineer, Beckman Coulter )
Highly-Rated Speaker
Bob Zigon
Bob Zigon is a Sr Staff Research Engineer and has worked at Beckman Coulter for 11 years. He has degrees in Computer Science and Mathematics from Purdue University. He was the architect of Kaluza, an NVIDIA Tesla powered analysis application for flow cytometry. He's now working in particle characterization and analytical ultracentrifugation. His interests include high performance computing, numerical analysis and software development for life science.

GPU-based applications can be architected in different ways. The simplest approach has a client application tightly coupled to a single GPU. A second approach has a client application tightly coupled to multiple GPUs by way of operating system threads and GPU contexts. Finally, in scientific computing, a common pattern is to use MPI, multiple Intel cores and multiple GPUs that work cooperatively to solve a fixed problem. This session will describe a design pattern that loosely couples a client application to a collection of GPUs by way of a public domain "data structure server" called Redis. The approach works well for both fat-client and thin-client applications. The compelling aspects of the approach are 1) the ease of debugging and 2) the ease with which multiple GPUs can be added to handle increased user load.
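
A minimal sketch of a worker in this pattern, using the open-source hiredis client: block on a Redis list for a job ID, run the GPU work, then push the ID to a results list. The queue names, job format and kernel are assumptions for illustration.

    #include <hiredis/hiredis.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    __global__ void process(float* d, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;                 // stand-in for real work
    }

    int main(void) {
        redisContext* rc = redisConnect("127.0.0.1", 6379);
        if (!rc || rc->err) { fprintf(stderr, "redis connect failed\n"); return 1; }
        int n = 1 << 20;
        float* d;
        cudaMalloc(&d, n * sizeof(float));
        for (;;) {
            // block until a job id appears on the "jobs" list
            redisReply* job = (redisReply*)redisCommand(rc, "BLPOP jobs 0");
            if (!job) break;                     // connection lost
            process<<<(n + 255) / 256, 256>>>(d, n);
            cudaDeviceSynchronize();
            // BLPOP replies with [list-name, value]
            redisReply* ack = (redisReply*)redisCommand(rc, "LPUSH done %s",
                                                        job->element[1]->str);
            freeReplyObject(ack);
            freeReplyObject(job);
        }
        cudaFree(d);
        redisFree(rc);
        return 0;
    }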

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 212B

S4580 - CUDA Debugging Tools: CUDA-GDB and CUDA-MEMCHECK

Vyas Venkataraman ( Senior Engineer - CUDA Tools, NVIDIA )
Highly-Rated Speaker
Vyas Venkataraman
Vyas Venkataraman is a Senior Manager in the CUDA developer tools group at NVIDIA. He is primarily responsible for CUDA-MEMCHECK, and also works on developer tool support on new GPU architectures. He joined NVIDIA in 2010 from Boston University where he was doing research on abstractions for high level modeling of synthesizable communicating systems. Vyas received his PhD, M.S. and B.S. degrees from the College of Engineering at Boston University.

This session will cover advanced debugging techniques with CUDA-GDB and CUDA-MEMCHECK that can assist even the most advanced CUDA developer in locating program correctness issues.
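
As a taste of what these tools catch, consider the classic defect below: a kernel launched with more threads than elements and no bounds check. Running the binary under cuda-memcheck (cuda-memcheck ./app) would typically report the stray accesses as invalid global writes, and the same fault can be examined interactively under cuda-gdb. The example is mine, not from the session.

    #include <cuda_runtime.h>

    __global__ void fill(int* buf, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        buf[i] = i;     // BUG: threads with i >= n write past the allocation
    }

    int main(void) {
        int n = 1000;
        int* d;
        cudaMalloc(&d, n * sizeof(int));
        fill<<<4, 256>>>(d, n);      // 1024 threads for 1000 elements
        cudaDeviceSynchronize();
        cudaFree(d);
        return 0;
    }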

Session Level: Advanced
Session Type: Talk
Tags: Debugging Tools & Techniques

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room LL21D

S4634 - What Makes Design and Styling Critical to a Brand

David Hilton ( Auto Designer, Motorcity Europe )
David Hilton was most recently Head of Exterior Design at Bentley Motors Ltd. Mr. Hilton's portfolio includes production vehicles as well as racecars. A graduate of the University of Cincinnati and the College for Creative Studies in Detroit, he was once the chief designer at Ford Racing in Britain. He spent most of his early career with Ford in Detroit and Germany, as well as with Mazda in Japan and VW Brazil. In Europe, he was responsible for the designs of the 2002 Ford Focus RS and ST models as well as the 2007 'Car of the Year' S-Max, none of which were offered for sale in the United States. Since 2000, he has run his own design firm, Motorcity Europe in Cologne, a network of designers offering complete automotive and product design services worldwide. MCE also offers market research and branding consulting, and lists Jaguar, Kia, Peugeot, Hyundai, Spectre, McLaren and other automotive brands in its client roster. In December 2011 Bentley Motors hired David Hilton as its head of exterior design.

Most of the world's major car makers are now on the same level as far as quality and engineering are concerned. The dividing factors of design and customer USPs have never been so important. OEM brands are becoming more adventurous in self-definition for one reason: they have to in order to survive! So, what makes a brand so special? How can you identify it, and bring it to reality? This is one of the new keys to automotive success.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Manufacturing; Rendering & Animation; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room 210G

S4715 - ArcGIS Pro - 3D GIS in Virtualized Environments

John Meza ( Performance Engineering Team Lead, SW Development, ESRI )
John Meza is the Performance Engineering team lead at ESRI. In 2008 he started the team that is responsible for performance and scalability testing of the ESRI ArcGIS product suite in multiple environments. Prior to starting this team, he was a consultant in the Professional Services division of ESRI specializing in performance and scalability analysis, investigation and presentations. Earlier at ESRI he was a Developer Technology Evangelist, and also a Support Analyst for enterprise geodatabase technologies.

The ESRI user community encompasses a wide set of industries, from local and federal government and power utilities to oil and gas; many of these users are deployed in VDI environments. NVIDIA GRID cards allow those users to take advantage of the 3D graphics now available in our newest GIS analytical product, ArcGIS Pro. This presentation will show results of the performance and scalability testing of ArcGIS Pro in NVIDIA GRID-enabled VDI environments. The presentation will include: VDI vendor tools available for administration and tuning; performance and scalability metrics; and challenges in designing, developing and testing 3D applications for these environments.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics; Defense; Recommended Press Session – Graphics Virtualization

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room 210F

S4835 - System Design Considerations for GPU-Enabled HPC Solutions (Presented by Dell)

Onur Celebioglu ( Systems Engineer Manager, Dell, Inc. )
Onur Celebioglu is the engineering director for Dell's High Performance Computing (HPC) solutions engineering team. His responsibilities include design, development, integration and performance characterization of Dell's HPC and business analytics solutions. His primary areas of focus are performance analysis of scale-out systems, parallel file systems, cluster management tools, high-speed network interconnects, accelerators, and the generation of best practices on the use of these technologies. He holds an M.S. in Electrical and Computer Engineering from Carnegie Mellon University.

GPU-accelerated systems have become an integral component of the HPC ecosystem, providing a quantum leap in performance across a wide spectrum of HPC applications. However, to fully realize these gains it is important to design balanced systems, and this talk will discuss system-level considerations for various use cases. We will use HPL to analyze performance and power consumption at a system level, and compare today's GPUs to the previous generation where applicable to highlight the improvements. The goal is to give the audience information and best practices for designing GPU-enabled systems around Kepler GPUs, considering parameters such as power consumption, system size and system-level features.

Session Level: All
Session Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room LL21C

S4846 - Enabling Direct3D Content on OpenGL Platforms: Mac, Linux, Android, and Beyond!

Gavriel State ( Founder & CTO, TransGaming Inc. )
Gavriel State dreamed of being involved in video games ever since his father failed to type the Hunt The Wumpus BASIC source code into their PET. Eventually this led to him founding TransGaming in order to ensure that he'd always be able to play games other people had typed in, regardless of whether or not they chose the right kind of system to write them for. Gavriel and his team at TransGaming have since been involved in a wide variety of gaming technology products that bridge the gaps between platforms, such as Cedega, Cider, SwiftShader, and GameTreeTV.

Today's developers face unprecedented challenges in choosing which platforms to target when developing games and applications meant to be used by a wide consumer audience. Beyond the Windows desktop, there are now a huge variety of new choices: alternative desktop OS platforms such as Linux and Mac OS X; mobile devices such as phones and tablets; HTML-based web platforms, running on cloud-based servers; and a plethora of embedded CE systems, ranging from video game consoles to TV platforms. All of these platforms use some variety of OpenGL or OpenGL ES, rather than Direct3D. If you have games or other Direct3D-based content that you want to retarget to a new platform, this session will show you how to quickly and easily enable your graphics code to run on OpenGL platforms using TransGaming's shim technology.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Game Development; Media & Entertainment; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210E

S4883 - Advanced Solutions for Media & Entertainment, Engineering and Design from HP and NVIDIA (Presented by HP)

Sean Young ( Worldwide Segment Manager, AEC and Product Development, HP )
At HP, Sean is responsible for global marketing strategy and programs targeting Product Development and AEC markets. He also manages global strategic alliances with Autodesk, Bentley, Dassault Systèmes, DS SolidWorks, PTC, and Siemens PLM. Previously Sean was a product manager and marketing manager at Autodesk, where he led product management for Showcase and 3ds Max, and managed marketing for the AutoCAD Design Suite. Sean holds an MBA from Queen’s University in Ontario, Canada.

Come learn about the technology and partnership between HP and NVIDIA that empowers users around the world to design and create without limitations.

Session Level: Beginner
Session Type: Talk
Tags: Computer Aided Design; Ray Tracing; Media & Entertainment; Manufacturing

Day: Wednesday, 03/26
Time: 10:00 - 10:50
Location: Room 210A

S4173 - Fast Evaluation of the Inverse Poisson Cumulative Distribution Function

Mike Giles ( Professor of Scientific Computing, University of Oxford )
Mike Giles
Prior to joining the faculty of Oxford University, Mike was an Assistant/Associate Professor at MIT. At Oxford, Mike is also a CUDA Fellow and Director of the Oxford University CUDA Center of Excellence.

The inverse of the Poisson cumulative distribution function maps uniformly-distributed random numbers to Poisson random variates. This talk describes a fast implementation for GPUs which is based on some novel approximations of the inverse of the closely-related incomplete gamma function for the case of large Poisson rates. Both single-precision and double-precision versions have been developed, and in each case the computational cost is not much more than the cost of the corresponding function for inverting the Normal cumulative distribution function. The software is freely available as open source from http://people.maths.ox.ac.uk/gilesm/poissinv/
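
For context, the baseline such a fast implementation is measured against is the bottom-up inversion below, whose cost grows with the Poisson rate; the approximations described in the talk avoid exactly this loop for large rates. A minimal sketch, not the released code.

    // Naive inverse Poisson CDF: walk the CDF upward until it exceeds u,
    // using the recurrence p_k = p_{k-1} * lambda / k.
    __device__ int poissinv_naive(float u, float lambda) {
        float p = expf(-lambda);                 // P(X = 0)
        float cdf = p;
        int k = 0;
        while (cdf < u && k < 10 * (int)lambda + 100) {  // safety cap
            ++k;
            p *= lambda / k;
            cdf += p;
        }
        return k;
    }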

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL20D

S4178 - Killer-App Fundamentals: Massively-Parallel Data Structures, Performance to 13 PF/s, Portability, Transparency, and More

Rob Farber ( Chief Scientist, BlackDog Endeavors, LLC )
Highly-Rated Speaker
Rob Farber
Rob Farber is recognized for his work in High Performance Computing (HPC), machine learning, scalable algorithms, massively parallel computing devices such as GPUs and Intel Xeon Phi, complex dynamical systems and high-energy physics. He has co-founded two successful companies and currently acts as a consultant to Fortune 100 companies. His recent work includes achieving 13 PF/s average sustained performance on the ORNL Titan supercomputer, plus fast graph algorithms for social media "needle in the haystack" problems using NVIDIA GPUs, Intel Xeon Phi, and multi-core processors. Rob has also served on the external faculty of the Santa Fe Institute, and authored "CUDA Application Design and Development", with a follow-up book in development. As an author/teacher, Rob has 100+ peer-reviewed scientific publications.

Discover killer-app fundamentals, including how to tame dynamic parallelism with a robust-performance parallel stack that allows both host- and device-side fast memory allocation and transparent data transfer of arbitrarily complex data structures and general C++ classes. A low-wait approach (related to wait-free methods) is used to create a performance-robust parallel counter. You definitely want to use this counter for histograms! New results extending machine learning and big data analysis to 13 PF/s average sustained performance using 16,384 GPUs in the ORNL Titan supercomputer will be presented. General programming approaches for graph algorithms and identifying 100x speedups in algorithms like Kriging interpolation will be discussed. Both portability to -- and performance comparisons against -- other architectures such as Intel Xeon Phi will also be covered. Specific examples of this technology for social media analysis and brain research will be highlighted.
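
One standard low-contention counter trick, in the spirit of (though not identical to) the low-wait counter mentioned above: aggregate increments within a warp so a single atomic serves all participating lanes. Kepler-era intrinsics; a sketch, not the talk's implementation.

    // Each participating thread gets a unique slot from the shared counter,
    // but only the warp's leader issues the atomic.
    __device__ int warpCounterAdd(unsigned int* ctr) {
        unsigned int active = __ballot(1);        // mask of participating lanes
        int lane   = threadIdx.x & 31;
        int leader = __ffs(active) - 1;           // lowest participating lane
        unsigned int base = 0;
        if (lane == leader)
            base = atomicAdd(ctr, __popc(active)); // one atomic per warp
        base = __shfl((int)base, leader);           // broadcast to the warp
        return (int)base + __popc(active & ((1u << lane) - 1));
    }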

Session Level: All
Session Type: Talk
Tags: Programming Languages & Compilers; Machine Learning & AI; Supercomputing

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL21A

S4353 - Terrestrial 3D Mapping with Parallel Computing Approach

Janusz Bedkowski ( Researcher, Institute of Mathematical Machines )
From 2006 to 2012 Janusz was a researcher at the Industrial Research Institute of Automation and Measurements, Warsaw, Poland. Prior to this, Janusz was a researcher and lecturer in the Institute of Automation and Robotics at Warsaw University of Technology, Warsaw, Poland, and a researcher at the Institute of Mathematical Machines, Warsaw, Poland. Janusz's research interests are parallel computing in CUDA for robotic applications and creating simulation tools for mobile robot operators' training.

This work concerns the parallel implementation of a 3D mapping algorithm. Several nearest-neighbor search strategies are compared, and the accuracy of the final 3D map is evaluated with geodetic precision. This work can be used in several applications such as mobile robotics and spatial design. Attendees will learn how to choose a proper nearest-neighbor search strategy for 3D data registration, how to build accurate 3D maps, how to evaluate a 3D mapping system with geodetic precision, and what influence parallel programming has on performance and accuracy.
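
The simplest of the nearest-neighbor strategies one might compare is brute force, which parallelizes trivially (one thread per query point) and is often surprisingly competitive on GPUs for moderate model sizes; a minimal sketch with illustrative names:

    // For each query point, scan all model points and record the index of
    // the closest one (squared distances, so no sqrtf needed).
    __global__ void nnBruteForce(const float3* model, int nm,
                                 const float3* query, int nq, int* nnIdx) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= nq) return;
        float3 q = query[i];
        float best = 3.4e38f;
        int bestJ = -1;
        for (int j = 0; j < nm; ++j) {
            float dx = model[j].x - q.x;
            float dy = model[j].y - q.y;
            float dz = model[j].z - q.z;
            float d = dx * dx + dy * dy + dz * dz;
            if (d < best) { best = d; bestJ = j; }
        }
        nnIdx[i] = bestJ;
    }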

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 210B

S4372 - Does Antimatter Fall On The Earth? Measurement Of Antimatter Annihilation with GPUs

Akitaka Ariga ( Senior Assistant, University of Bern )
Akitaka Ariga
Akitaka Ariga, Senior Assistant at the University of Bern, Switzerland, graduated in 2008 with his Ph.D. from Nagoya University, Japan.

One of the most important unanswered questions in physics is: does antimatter fall in the same way as matter? At the European Organization for Nuclear Research (CERN, Geneva) the AEgIS experiment is underway to measure the gravitational force on antimatter, and has to reach nanometric precision in determining the free fall of antimatter. In particular, the 3D reconstruction of particle tracks produced in matter-antimatter annihilations requires a huge amount of computing resources: processing 30 TB of tomographic images per day. In this talk, the application of GPUs to the 3D tracking of particles in photo-emulsion detectors will be reported.

Session Level: All
Session Type: Talk
Tags: Computational Physics; Astronomy & Astrophysics

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 212A

S4382 - A Flexible IIR filtering Implementation for Audio Processing

Juergen Schmidt ( Senior Scientist, Technicolor Research & Innovation )
Juergen Schmidt
Jürgen Schmidt is a Senior Scientist with long-standing experience in the fields of audio processing, audio presentation technologies and software architectures. He received his Diploma from the Leibniz Universität Hannover and joined DTO in 1994. His research interests focus on audio processing for acquisition and replay, and on semantic technologies.

Infinite impulse response (IIR) filters are used in almost every signal processing area. In the field of audio applications they are used for loudspeaker equalization, crossover filtering or for sound control in mixing consoles. Modern audio applications like 3D sound require many audio channels to be processed in parallel at high precision, often implemented using high-order IIR filter chains. A straightforward implementation of IIR filters would lead to poor utilization of the GPU, though, because the sample-to-sample recursion does not map onto OpenCL's data-parallel model. In this contribution, an efficient implementation will be presented that circumvents the recursive processing problem of OpenCL. It allows the processing of more than 64 audio channels with IIR filters of order 40 or more, with scalable latency. It is implemented for all major operating systems in a flexible OpenCL/C++ framework.
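
The usual way around the recursion, sketched here in CUDA C for brevity (the talk's framework is OpenCL/C++): keep the sample loop serial inside each thread and parallelize across channels, so a high channel count hides the per-channel recurrence. A Direct Form II transposed biquad per channel; the coefficient and sample layout are illustrative.

    // One thread per audio channel; samples interleaved channel-major.
    __global__ void biquadPerChannel(const float* in, float* out,
                                     int nch, int nsamp,
                                     const float* b0, const float* b1,
                                     const float* b2, const float* a1,
                                     const float* a2) {
        int c = blockIdx.x * blockDim.x + threadIdx.x;
        if (c >= nch) return;
        float z1 = 0.0f, z2 = 0.0f;              // filter state
        for (int s = 0; s < nsamp; ++s) {        // recursion stays in-thread
            float x = in[s * nch + c];
            float y = b0[c] * x + z1;
            z1 = b1[c] * x - a1[c] * y + z2;
            z2 = b2[c] * x - a2[c] * y;
            out[s * nch + c] = y;
        }
    }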

Session Level: Intermediate
Session Type: Talk
Tags: Signal & Audio Processing

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 210D

S4437 - SPECACCEL: The New Standard for Accelerator Performance Benchmarking

Mathew Colgrove ( Dev Tech Software Engineer, NVIDIA )
Mathew Colgrove
Mathew Colgrove is a Dev Tech Software Engineer with NVIDIA's Portland Group team. Mat's primary role is to help users port code to accelerators using OpenACC and CUDA Fortran, as well as assisting with general programming questions. Prior to his current position, he was the Quality Assurance manager responsible for both building and maintaining PGI's proprietary automated testing environments. Mat is also NVIDIA's SPEC representative (www.spec.org) on the CPU and HPG committees.
Robert Henschel ( Manager, Scientific Applications and Performance Tuning, Indiana University )
Robert Henschel
Robert Henschel is leading a team of senior developers that create, optimize and maintain software for IU's high performance computing systems. He is responsible for providing support for researchers at Indiana University and users of XSEDE to enable efficient use of IU's HPC systems. As Indiana University's representative in the High Performance Group (HPG) at Standard Performance Evaluation Corporation (SPEC) he is leading IU's benchmarking efforts.

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation that produces, maintains and publishes results of standardized performance benchmarks for high-performance computers. SPEC benchmark suites produced by the SPEC High-Performance Group (HPG) are generally comprised of applications focused on scientific and technical computing, coded using standard parallel programming interfaces. SPECACCEL is the latest benchmark from SPEC HPG, and is designed to objectively compare the performance of accelerator hardware systems, accelerator programming models and accelerator-enabled compilers. This talk will give an overview of the SPECACCEL suite, the benchmark run rules and processes for reporting results, and some sample performance results. Finally, we'll take an in-depth look at a few of the benchmarks to see what they can reveal about the performance characteristics of various accelerators.

Session Level: All
Session Type: Talk
Tags: Performance Optimization; Supercomputing

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 212B

S4489 - Harnessing Irregular Parallelism: A Case Study on Unstructured Meshes

Cliff Woolley ( Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Cliff Woolley
Cliff Woolley is a senior developer technology engineer with NVIDIA. He received his master's degree in Computer Science from the University of Virginia in 2003, where he was among the earliest academic researchers to explore the use of GPUs for general purpose computing. Today he works with developers of high-performance computing applications to fine-tune their algorithms for the CUDA Platform, and he is one of the lead authors of developer documentation in the CUDA Toolkit for application tuning and best practices.

Traversal of unstructured meshes presents an interesting challenge for massively parallel processors such as GPUs. The problem offers abundant but irregular parallelism. Fortunately this irregular parallelism can still be harnessed to provide a speedup using GPUs. This talk presents our work on accelerating UMT2013, a benchmark that performs distributed 3D unstructured-mesh photon transport. UMT leverages both OpenMP and SIMD parallelism on CPUs, but neither by itself is sufficient to allow UMT to scale onto a GPU. Using the CPU and GPU together to detect and resolve sequential dependencies across the mesh, we can maximize parallelism.

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing; Computational Physics

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL20C

S4550 - Shadertoy: Do You Know What a Fragment Shader Can Do?

Pol Jeremias ( Co-Founder, Beautypi )
Pol Jeremias
Pol Jeremias, originally from Barcelona, is a rendering engineer based in San Francisco, California. At a very young age he developed an interest in computers and software development. This interest progressively became more focused on real-time rendering, photorealism and entertainment. He graduated from the University of Southern California with a Master's in Computer Science. Right after, he joined LucasArts, where he was part of the rendering team working on Star Wars 1313. Today, he is happily writing algorithms and technology for interactive entertainment.
Inigo Quilez ( Co-Founder, Beautypi )
Inigo Quilez
Inigo Quilez grew up as a teenager enjoying mountains, snow, sea and nature, but also programming fractals, graphics algorithms and all sorts of visual experiments. At the age of 18, upon discovering the underground community of the demoscene and the potential of using code and math to build beauty, he decided he would spend his life on the creative side of computer graphics. After finishing his degree in Telecom Engineering, and having worked professionally in virtual reality and real-time rendering of massive data sets in Belgium for many years, he is now employed at Pixar Animation Studios, California, doing what he likes the most - inventing techniques and formulas, painting, drawing and creating procedural imagery and animation.

Shadertoy is a website that allows developers to live-code shaders that react to music, videos and webcam using WebGL. These creations are shared with the community, making Shadertoy a great repository for finding inspiration, learning and teaching about shading, reactivity and rendering. The website has one challenge: developers have to create their content by only using one full-screen fragment shader. This restriction has pushed the boundaries of what is possible to render in only two triangles. In this session, we are going to walk the audience through Shadertoy and some of the most innovative, artistic and creative GPU algorithms that our community has been developing over the last 6 months. This includes shaders employing raymarching, procedural texturing, modelling and animation, fractal geometry, image compression and volumetric rendering.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Real-Time Graphics Applications; Rendering & Animation; Recommended Press Session – Media & Entertainment

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 211A

S4647 - Accelerating Reverse Time Migration on GPUs: A Dataflow Approach

Hicham Lahlou ( CEO, Xcelerit )
Prior to founding Xcelerit, Hicham served as a Project Leader at CTVR and as a software developer on the Ariane 5 rocket program. Hicham received his M.Sc. in Electrical Engineering from the Institut National des Sciences Appliquées de Lyon. Xcelerit was awarded Best Use of HPC in Financial Services by HPCwire in November 2012 and 2013.

Learn how to map Reverse Time Migration (RTM) applications to a dataflow model for high-performance execution on GPUs and multi-core CPUs with an improved developer experience. As oil & gas exploration is pushed towards more complex geologies, RTM has become the de-facto standard algorithm for constructing images of the Earth's subsurface from seismic wave data, and GPUs make it possible to cope with the enormous computational complexity involved. This talk shows how RTM algorithms can be modeled and implemented as dataflow graphs. The benefits of using this model for high-performance execution are detailed, e.g., the exposed levels of parallelism, memory locality, and optimization opportunities. The application code is portable between different hardware targets and the execution can be managed automatically, improving the user experience. We use a practical example to demonstrate the performance that can be achieved.

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL20B

S4728 - Full GPU Image Processing Pipeline for Camera Applications: An Overview

Fyodor Serzhenko ( CEO, Fastvideo )
Fyodor Serzhenko is CEO of the Fastvideo company. His research interests include high-speed cameras and software for high-speed imaging, and high performance computing. He graduated from the Moscow Institute of Physics and Technology in 1989 and received his Ph.D. in the physics of semiconductors in 1993.

This session will demonstrate, on a general level, how to combine fast performance and high quality in a full image processing pipeline on the GPU for camera applications in real time. We will provide an overview of the GPU image processing pipeline for cameras and its constituent parts, their suitability for the GPU architecture, an analysis of achieved results and comparison with existing implementations, and applications to machine vision, broadcasting and high speed imaging. For a more technical and detailed presentation on the GPU image processing pipeline for camera applications, please see S4151 Full GPU Image Processing Pipeline for Camera Applications.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 211B

S4784 - Monte-Carlo Simulation of American Options with GPUs

Julien Demouth ( Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Julien is a Developer Technology Engineer at NVIDIA, where he works on the optimization of CUDA applications. Julien has a Ph.D. in Computer Science from INRIA in France.

In this session we will present our work on the computation of the Greeks of multi-asset American options. We will describe our implementation of the Longstaff-Schwartz algorithm and explain the programming techniques used to obtain a very efficient code for the Andersen QE path discretization. This solution was developed in collaboration with IBM and STAC and is used to calculate the Greeks in real time on a single workstation with Tesla GPUs.
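
For orientation, here is the Monte Carlo substrate beneath such a pricer, sketched with cuRAND's device API: one thread simulates one risk-neutral GBM path. The talk's Andersen QE discretization and the Longstaff-Schwartz regression (which needs the stored paths and a backward sweep) are deliberately omitted; all names are illustrative.

    #include <curand_kernel.h>

    // Simulate terminal prices under geometric Brownian motion.
    __global__ void gbmPaths(float* ST, int npaths, int nsteps,
                             float S0, float r, float sigma, float T,
                             unsigned long long seed) {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= npaths) return;
        curandState st;
        curand_init(seed, p, 0, &st);            // one subsequence per path
        float dt = T / nsteps, S = S0;
        for (int s = 0; s < nsteps; ++s) {
            float z = curand_normal(&st);
            S *= expf((r - 0.5f * sigma * sigma) * dt
                      + sigma * sqrtf(dt) * z);
        }
        ST[p] = S;
    }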

Session Level: Intermediate
Session Type: Talk
Tags: Finance

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 210C

S4811 - Extreme Machine Learning with GPUs

John Canny ( Professor, UC Berkeley )
John Canny
John Canny is a professor of computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, and Quantcast. He also data-mines as a hobby, and led the Netflix contest for a couple of months in 2006. Other recent projects include sensorless sensing (inferring stress from mouse and mobile phone data), computer-mediated learning, and analysis of MOOC data.

BIDMach is an open-source library for GPU-accelerated machine learning. On a single GPU node, BIDMach exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/Matlab, but with qualitatively higher performance. The session will discuss: (1) Performance: BIDMach follows a "Lapack" philosophy of building high-level algorithms on fast low-level routines (like BLAS), exploiting the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives; (2) Accuracy: Monte-Carlo methods (MCMC) are the most general way to derive models, but are slow. We have developed a new approach to MCMC which provides two orders of magnitude speedup beyond the hardware gains; our "cooled" MCMC is fast and improves model accuracy; (3) Interactivity: We are developing interactive modeling/visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.
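
To illustrate the "build high-level algorithms on fast low-level routines" philosophy, here is a minimal sketch of handing a learner's heavy lifting to cuBLAS; the wrapper name and the assumption of column-major, device-resident matrices are ours:

    #include <cublas_v2.h>

    // One cuBLAS SGEMM call (C = A * B) does the dense work a higher-level
    // learning algorithm would sit on top of. Illustrative sketch only.
    void dense_multiply(cublasHandle_t handle, const float* dA,
                        const float* dB, float* dC, int m, int n, int k)
    {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    m, n, k, &alpha, dA, m, dB, k, &beta, dC, m);
    }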

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI; Scientific Visualization; Bioinformatics & Genomics

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL21F

S4885 - Efficient Parallel Computation on Android

Jason Sams ( Engineer, Google )
Jason Sams
R. Jason Sams is the creator of the Android RenderScript compute runtime. His specialty is getting the most performance possible out of the various hardware present in Android devices. Prior to Android he led the BeOS 3D driver team and later worked for NVIDIA on OpenGL performance.
Tim Murray ( Engineer, Google )
Tim is a software engineer on the RenderScript team at Google. Prior to joining Google, he managed the CUDA driver team at NVIDIA.

Mobile is very different from traditional desktop and HPC compute, which creates a new set of problems. Android's RenderScript addresses these problems by taking a different approach to compute. We will cover the problems presented by mobile and how RenderScript solves them. Results from K1 running typical RenderScript workloads will also be presented and discussed.

Session Level: Intermediate
Session Type: Talk
Tags: Mobile Summit; Mobile Applications; Programming Languages & Compilers; Recommended Press Session – Mobile

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 210E

S4817 - Procedural Content Generation and Shaders

Etienne Caron
Etienne Caron
An active member of the North American demoscene in the 90s, Etienne was a main organizer of the NAID demoparties (95/96). Now the Android lead developer for PasswordBox.com, Etienne is also the current lead of GDG Montreal-Android; GDGs are independent, Google-backed developer groups. "As an active member of the Montreal mobile developer community over the last 5 years, I've had the chance to organize and take part in multiple local hackathons and events. I've also had the pleasure of giving talks on various Android game development APIs and platforms (Unity, libgdx, Cocos2d), OpenGL, Android NDK usage, etc. Giving a technical talk is always an enriching experience, and I'm really looking forward to seeing you all at NVScene."

Demosceners are no strangers to procedural generation. In the early days computing resources were very limited, and procedural generation algorithms provided ways to generate complex and entrancing graphical constructs that went beyond simple disk space or memory limitations. With the mobile revolution upon us, we now all carry extremely powerful computers in our pockets, yet oddly we face new limitations in the form of bandwidth, storage space, and so on. Procedural generation allows us to take simple seeds and, with the right algorithm, turn them into massive universes to explore. This talk will focus on procedural content generation techniques and how to use them in concert with OpenGL ES shaders to maximum effect. Note: this talk will use the NVIDIA Shield and the Tegra tools suite for its example code, but should be of interest to anyone new to procedural content generation and shaders.
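
As a concrete taste of the technique, here is a hedged CUDA-style sketch of 1D value noise built from an integer hash, the basic seed-to-content recipe described above; the hash constants and function names are the usual illustrative choices, not from the session:

    // Derive repeatable pseudo-random lattice values from a seed, then
    // interpolate them smoothly: the core of classic value noise.
    __device__ float hash01(unsigned int x)
    {
        x ^= x >> 16; x *= 0x45d9f3bu; x ^= x >> 16;   // integer mixing
        return (x & 0xffffffu) / 16777215.0f;           // map to [0,1]
    }

    __device__ float value_noise(float x, unsigned int seed)
    {
        int i = (int)floorf(x);
        float f = x - (float)i;
        float a = hash01(seed + (unsigned int)i);
        float b = hash01(seed + (unsigned int)(i + 1));
        float t = f * f * (3.0f - 2.0f * f);   // smoothstep fade
        return a + t * (b - a);                // interpolated lattice values
    }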

Session Level: Intermediate
Session Type: Talk
Tags: Real-Time Graphics Applications; Mobile Applications; NVScene

Day: Wednesday, 03/26
Time: 11:00 - 11:50
Location: Room 230C

S4815 - Rules of Thumb for (slightly) Better Design

Thomas Mann
Thomas Mann
Pixtur has been playing around with code and design for almost three decades (but he feels much younger). As a member of demoscene groups such as Still, Bauknecht, and Haujobb he's won several competitions and awards in a variety of categories. Recently he and two other demosceners founded Framefield, a small Berlin-based agency for interactive realtime graphics.

Through his experience in the demoscene, education as an architect, and day job as a designer, Pixtur has assembled a basket of easy-to-understand and easy-to-implement rules of thumb for creating better design – with or without programming. He will cover a wide range of topics like ideation, concept, colors, composition, editing and animation – all illustrated with examples from the demoscene.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; NVScene

Day: Wednesday, 03/26
Time: 12:00 - 12:50
Location: Room 230C

S4820 - Multi-Platform Graphics Done Right

Bent Stamnes
Bent Stamnes
Bent Stamnes has been involved with real-time graphics since he became a demoscener in 1989. He has served as senior editor for the digital magazine ZINE, and is the founder of the cross-platform digital creation community Displayhack.org, as well as the co-author of the book "Demoscene: The Art of Real-Time". He has been a featured speaker on real-time graphics and the demoscene at the FMX conference on Animation, Effects, Games and Interactive Media since 2005, and at other conferences such as FITC Amsterdam and Toronto.

Since 2011, Outracks Technologies has been developing two unique pieces of technology: a new programming language ("Uno") and a tool suite ("Realtime Studio"). Both have been designed to bring wow-factor, productivity and unparalleled portability to mainstream app development on mobile, the web and native platforms - with a special focus on graphics programming. Uno unites graphics code and logic code in a single high-level language, and this talk is an introduction to Realtime Studio and its benefits for developers and artists alike.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; Programming Languages & Compilers; Mobile Applications; NVScene

Day: Wednesday, 03/26
Time: 13:00 - 13:50
Location: Room 230C

S4139 - Follow the Light: Plasma Physics on 18,000 GPUs

Richard Pausch ( Ph.D. Student, Helmholtz-Zentrum Dresden - Rossendorf )
Richard Pausch
Richard Pausch is a Ph.D. student in the Junior Group Computational Radiation Physics at Helmholtz-Zentrum Dresden - Rossendorf. His research focuses on radiative processes in relativistic plasma physics.
Guido Juckeland ( System Engineer, ZIH, Technical University Dresden )
Guido Juckeland
Guido Juckeland is one of the founders of the CUDA Center of Excellence Dresden. He helps scientists to optimize their codes for accelerator hardware such as GPUs.

We show that with today's largest supercomputers it is possible to follow the trajectories of billions of particles, computing a unique fingerprint of their dynamics. With the use of 18,000 GPUs we could compute a 'sky map' of the radiation emitted by individual electrons in a large-scale, turbulent plasma, providing unique insight into the relation between the plasma dynamics and observable radiation spectra.

Session Level: Intermediate
Session Type: Talk
Tags: Astronomy & Astrophysics; Computational Physics; Supercomputing

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL21F

S4191 - Enhancement of Tegra Tablet's Computational Performance by GeForce Desktop and Wifi

Di Zhao ( Postdoc Researcher, Ohio State University )
Di Zhao
Dr. Di Zhao is a postdoctoral researcher at Ohio State University. He received his Ph.D. in Computational Analysis and Modeling from Louisiana University in 2010, and he worked in the Department of Biomedical Informatics at Columbia University from 2010 to 2012. His research interests are GPU computing and parallel algorithms.

Learn how to develop Tablet-Wifi-Desktop based applications to enhance your tablet's performance. This talk includes: (1) a comprehensive discussion of the Tablet-Wifi-Desktop architecture; (2) an introduction to programming approaches for Tablet-Wifi-Desktop; (3) an introduction to decomposing computationally intensive problems, such as the Fast Fourier Transform for computer graphics, on Tablet-Wifi-Desktop; (4) computational results illustrating the performance enhancement of a Tegra tablet by Tablet-Wifi-Desktop.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Mobile Applications

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210E

S4201 - GPU Acceleration of Sparse Matrix Factorization in CHOLMOD

Steven Rennich ( Senior HPC Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Steven Rennich
Steven Rennich is a Sr. NVIDIA HPC Developer Technology Engineer. His primary activities include promoting the use of GPUs in computational structural mechanics and the development and optimization of parallel algorithms for direct and iterative solvers for sparse linear systems. Steve holds a Ph.D. in Aeronautics and Astronautics from Stanford University, where his research involved computational fluid mechanics and vortex system instabilities. Prior to joining NVIDIA, Steve spent many years parallelizing structural analysis and rigid body dynamics codes.
Tim Davis ( Professor, University of Florida )
Tim Davis
Tim Davis is a professor in Computer and Information Science and Engineering at the University of Florida. He is a Fellow of the Society for Industrial and Applied Mathematics (SIAM), in recognition of his work on sparse matrix algorithms. His software for sparse direct methods appears in hundreds of applications in industry, academia, and government labs, including MATLAB (x=A\b), Mathematica, NASTRAN, Cadence, Mentor Graphics, Google Ceres (StreetView, PhotoTours), IBM, Berkeley Design Automation, Xyce, and many others. For a full CV, see http://www.cise.ufl.edu/~davis/background.html.

Sparse direct solvers, and their requisite factorization step, are a critical component of computational engineering and science codes. High performance is typically achieved by reducing the sparse problem to dense sub-problems and applying dense math kernels. However, achieving high performance on a GPU is complicated due to the range of sizes of the dense sub-problems, irregular memory access patterns, and the limited communication bandwidth between the host system and the GPU. This talk will describe the high factorization performance achieved in CHOLMOD using the GPU and discuss in detail key techniques used to achieve this performance including minimizing communication and maximizing concurrency.
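
As a rough illustration of the "dense sub-problem" idea (not CHOLMOD's actual GPU path, which adds batching, streaming, and hybrid CPU/GPU scheduling), a sufficiently large frontal block could be factored on the device with cuSOLVER's dense Cholesky:

    #include <cuda_runtime.h>
    #include <cusolverDn.h>

    // Sketch under stated assumptions: dA is an n x n dense frontal block
    // already on the device; factor it in place with dense Cholesky.
    void dense_cholesky(cusolverDnHandle_t h, double* dA, int n, int lda,
                        int* dInfo)
    {
        int lwork = 0;
        cusolverDnDpotrf_bufferSize(h, CUBLAS_FILL_MODE_LOWER, n, dA, lda,
                                    &lwork);
        double* dWork = nullptr;
        cudaMalloc(&dWork, sizeof(double) * lwork);   // scratch workspace
        cusolverDnDpotrf(h, CUBLAS_FILL_MODE_LOWER, n, dA, lda,
                         dWork, lwork, dInfo);
        cudaFree(dWork);
    }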

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Big Data Analytics & Data Algorithms; Computational Structural Mechanics

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL20D

S4216 - Beyond 4k: Video Walls and Interactive Displays at High Resolutions using Multi-Machine Clusters

Erik Beaumont ( COO, Ventuz )
Highly-Rated Speaker
After getting a joint honors degree in Chinese/Computer Science, and a brief stint as a database programmer for telephone databases, Erik worked for Avid/Softimage as a senior product specialist, travelling the world and evangelising XSI. Following the acquisition of Softimage by Autodesk, he took a brief stint back at university and earned an MPA in Organisational Change from the University of Missouri, Columbia. He then returned to semi-familiar stomping grounds as the global product manager for Ventuz in Germany. There he helped Ventuz mature and grow as a product and company until he took on a leadership role as COO in early 2013.

See how companies and broadcasters are tackling the challenge of filling ever larger, ever higher-resolution displays with dynamic, well-designed content that often also needs to be interactive. In this talk, we will look at a variety of challenges, from the technical hurdles of clustering and framelocking across large video walls, projector setups, and multiple machines, to the issues of producing and running content at these very high resolutions. We will discuss how GPUs and real-time rendering are the only feasible answer going forward as resolutions increase, look into how interactivity can be achieved in these settings, and describe how we have dealt with these problems in real-world projects. We will also examine how the future of large-scale displays, whether LED walls, high-density displays or high-resolution projectors, is shaping up, and what benefits and risks these technologies might bring.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Collaborative & Large Resolution Displays; Media & Entertainment; Real-Time Graphics Applications

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room 210I

S4227 - GPU Implementation of Explicit and Implicit Finite Difference Methods in Finance

Mike Giles ( Professor of Scientific Computing, University of Oxford )
Mike Giles
Prior to joining the faculty of Oxford University, Mike was an Assistant/Associate Professor at MIT. At Oxford, Mike is also a CUDA Fellow and Director of the Oxford University CUDA Center of Excellence.

This talk will explain how to achieve excellent performance with GPU implementations of standard explicit and implicit finite difference methods in computational finance. Implicit methods are much harder to implement efficiently, but the task is made easier through the development of library software for the solution of multiple tridiagonal systems in parallel. The implementation strategies depend on the size and dimensionality of the problems being solved. 1D problems can be solved within one SMX unit of a GPU, 2D problems usually require more than one SMX, and 3D / 4D problems require the entire GPU for their solution. Computational performance results will be given for Kepler GPUs, and the talk will also discuss whether single precision arithmetic provides sufficient accuracy.
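
A minimal sketch of the "many tridiagonal systems in parallel" building block the implicit methods rest on, assuming one thread per system and a system-major layout chosen for clarity (a tuned library would interleave systems for coalesced access):

    // Thomas algorithm, one thread per tridiagonal system; a, b, c are the
    // sub-, main-, and super-diagonals, d is the right-hand side and is
    // overwritten with the solution. Illustrative sketch only.
    __global__ void thomas_batch(float* a, float* b, float* c, float* d,
                                 int n, int n_sys)
    {
        int s = blockIdx.x * blockDim.x + threadIdx.x;
        if (s >= n_sys) return;
        float* as = a + s * n; float* bs = b + s * n;
        float* cs = c + s * n; float* ds = d + s * n;
        for (int i = 1; i < n; ++i) {          // forward elimination
            float w = as[i] / bs[i - 1];
            bs[i] -= w * cs[i - 1];
            ds[i] -= w * ds[i - 1];
        }
        ds[n - 1] /= bs[n - 1];                // back substitution
        for (int i = n - 2; i >= 0; --i)
            ds[i] = (ds[i] - cs[i] * ds[i + 1]) / bs[i];
    }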

Session Level: All
Session Type: Talk
Tags: Finance

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room 210C

S4313 - PRNGCL: OpenCL Library of Pseudo-Random Number Generators for Monte Carlo Simulations

Vadim Demchik ( Researcher, Dnepropetrovsk National University )
Vadim Demchik
Vadim Demchik is a researcher at the Quantum Chromoplasma Laboratory of Dnepropetrovsk National University (Ukraine), where he works in the field of high-energy and particle physics under external conditions. For the last six years he has specialized in Monte Carlo lattice simulations on graphics processing units. He obtained his Ph.D. in Theoretical Physics in 2004 from Dnepropetrovsk National University, Ukraine.

Learn how to easily construct Monte Carlo procedures on GPUs with PRNGCL, a new open-source OpenCL library of pseudo-random number generators (PRNGs). We will introduce our OpenCL implementation of the most popular uniform PRNGs and briefly discuss general techniques of PRN generation on GPUs. A performance comparison of existing PRNG libraries with PRNGCL will be provided, and some examples of applying the PRNGCL library to high-energy physics lattice simulations will be given.
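
PRNGCL itself is OpenCL; as a language-consistent illustration of the kind of lightweight uniform generator such libraries implement, here is a CUDA sketch of Marsaglia's xorshift128 with one independent state per thread (all names are ours):

    // One xorshift128 state per thread; each call advances the state and
    // emits one uniform float in [0,1). Illustrative sketch only.
    __global__ void xorshift_uniform(uint4* state, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        uint4 s = state[i];
        unsigned int t = s.x ^ (s.x << 11);
        s.x = s.y; s.y = s.z; s.z = s.w;
        s.w = s.w ^ (s.w >> 19) ^ t ^ (t >> 8);   // xorshift128 update
        state[i] = s;
        out[i] = s.w * (1.0f / 4294967296.0f);    // map to [0,1)
    }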

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 212A

S4328 - Object Tracking Under Nonuniform Illumination Conditions

Kenia Picos ( Postgraduate Student, CITEDI-IPN )
Kenia Picos is currently a postgraduate fellow at the National Polytechnic Institute in Tijuana, México. Her research interests include image processing, mathematical modeling, and computer graphics.

The goal of this session is to demonstrate the performance of object tracking with correlation filtering on scenes with nonuniform illumination. In this work, there are two fundamental limiters of kernel performance: memory usage and processed frames per second. In this session we will describe how to structure source code for image processing and correlation techniques. Concepts will be illustrated with an example of object recognition and tracking using ArrayFire, a new generation of image processing library that maps sequential algorithms onto highly parallel GPU and multicore architectures.
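
To make the data-parallel structure visible, here is a hypothetical direct-form sketch of the core correlation operation, one output score per thread; production code (and ArrayFire) would typically work in the Fourier domain instead:

    // Slide a tw x th template over a sw x sh scene, scoring every position.
    __global__ void correlate(const float* scene, int sw, int sh,
                              const float* tmpl, int tw, int th, float* score)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x > sw - tw || y > sh - th) return;
        float acc = 0.0f;
        for (int j = 0; j < th; ++j)            // accumulate template overlap
            for (int i = 0; i < tw; ++i)
                acc += scene[(y + j) * sw + (x + i)] * tmpl[j * tw + i];
        score[y * sw + x] = acc;
    }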

Session Level: Beginner
Session Type: Talk
Tags: Video & Image Processing; Computer Vision

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room LL21A

S4342 - CUDA-Accelerated MATLAB without Parallel Computing Toolbox for 3D Medical Image Segmentation

Jung W. Suh ( Senior Research Scientist, KLA-Tencor )
Jung W. Suh
Jung W. Suh is a senior algorithm engineer and research scientist at KLA-Tencor. Dr. Suh received his Ph.D. from Virginia Tech in 2007 for his 3D medical image processing work. He was involved in the development of MPEG-4 and Digital Mobile Broadcasting (DMB) systems at Samsung Electronics, and was a senior scientist at HeartFlow, Inc., prior to joining KLA-Tencor. His research interests are in the fields of biomedical image processing, pattern recognition, machine learning and image/video compression. He has more than 30 journal and conference papers and 6 patents.

Learn how to accelerate your MATLAB codes using CUDA without the Parallel Computing Toolbox. Although the Parallel Computing Toolbox is useful for speedups, it may not be accessible to every MATLAB user and can have limitations in fully exploiting the power of both MATLAB and CUDA. For general speedups of MATLAB applications, GPU utilization through C-MEX provides more flexibility and power in many situations. This session will walk through the MATLAB implementation of atlas-based 3D hippocampus segmentation of MRI images as an example. Atlas-based segmentation is widely used in neuroimage analysis due to its reliable segmentation results, even for challenging target objects with ambiguous and complicated boundaries. However, it requires high computational power because 3D image registration is used during the segmentation process. This session will show each step of CUDA optimization of our atlas-based segmentation MATLAB codes, from profiling to CUDA conversion through C-MEX.
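
A minimal sketch of the C-MEX route, under our own illustrative names and a deliberately trivial kernel: MATLAB calls y = my_scale(x), and the gateway moves the data to the GPU, launches the kernel, and copies the result back (the .cu file is compiled against the MATLAB MEX headers):

    #include "mex.h"
    #include <cuda_runtime.h>

    // Trivial stand-in for real segmentation work: y = 2 * x.
    __global__ void scale(double* y, const double* x, int n, double a)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i];
    }

    void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[])
    {
        int n = (int)mxGetNumberOfElements(prhs[0]);
        plhs[0] = mxCreateDoubleMatrix(mxGetM(prhs[0]), mxGetN(prhs[0]), mxREAL);
        double *dx, *dy;
        cudaMalloc(&dx, n * sizeof(double));
        cudaMalloc(&dy, n * sizeof(double));
        cudaMemcpy(dx, mxGetPr(prhs[0]), n * sizeof(double),
                   cudaMemcpyHostToDevice);
        scale<<<(n + 255) / 256, 256>>>(dy, dx, n, 2.0);   // illustrative factor
        cudaMemcpy(mxGetPr(plhs[0]), dy, n * sizeof(double),
                   cudaMemcpyDeviceToHost);
        cudaFree(dx); cudaFree(dy);
    }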

Session Level: Intermediate
Session Type: Talk
Tags: Medical Imaging & Visualization; Video & Image Processing; Computer Vision

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room LL21B

S4359 - Real-Time Electromagnetic Wave Propagation Using OptiX for Simulation of Car-to-Car-Communication

Manuel Schiller ( Research Assistant / PhD-Student, Technische Universität München )
Manuel Schiller
Manuel has been a research assistant at the Chair of Robotics and Embedded Systems, Technische Universität München, since 2012. He received his Diploma in Mechatronics and Information Technology from Technische Universität München in 2012.

In this session we present a real-time simulation of electromagnetic wave propagation using OptiX GPU ray tracing. This simulation is used in virtual test drives to allow testing of Advanced Driver Assistance Systems that will be based on wireless Car-to-Car communication. Learn how ray tracing performance can be improved to achieve real-time simulations and how the ray tracing results are post-processed to perform the electromagnetic calculations on the GPU using the Thrust library.
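
As a flavor of the Thrust post-processing step, here is a hedged sketch that reduces per-ray path lengths into a total received power under a simple free-space model; the propagation model and every name are illustrative assumptions, not the talk's actual calculation:

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>

    // Friis-style free-space attenuation: (lambda / (4 * pi * d))^2.
    struct path_power
    {
        float wavelength;
        __host__ __device__ float operator()(float d) const
        {
            float a = wavelength / (12.566371f * d);   // 4*pi*d term
            return a * a;
        }
    };

    // Transform each OptiX-produced path length into a power contribution
    // and sum them, all on the device.
    float total_power(const thrust::device_vector<float>& path_len,
                      float lambda)
    {
        return thrust::transform_reduce(path_len.begin(), path_len.end(),
                                        path_power{lambda}, 0.0f,
                                        thrust::plus<float>());
    }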

Session Level: Beginner
Session Type: Talk
Tags: Automotive; Ray Tracing; Computational Physics; Recommended Press Session – Auto

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210A

S4386 - How to Combine OpenMP, Streams, and ArrayFire for Maximum Multi-GPU Throughput

Shehzan Mohammed ( Developer, ArrayFire )
Shehzan Mohammed is a developer at ArrayFire. Shehzan completed his Master's in Computer Graphics and Game Technology from the University of Pennsylvania. He started out programming rendering algorithms on GPUs and now contributes to ArrayFire, a general purpose software library accelerated by CUDA and OpenCL.

You've finished tuning your algorithm on a single GPU, and it's time to integrate it into your multi-threaded host code. What's next? This session will explore how to combine CUDA streams and contexts with OpenMP threads to maximize throughput. It will also cover how this mapping works for out-of-core problems to keep your GPUs fed with data. The session will cover these techniques in CUDA and the ArrayFire library for productive GPU computing, using examples from the analysis of large-scale financial data and a structure-from-motion algorithm from computer vision. Attendees will leave with an excellent understanding of how to handle out-of-core data; the ability to program using CUDA streams and to integrate them with ArrayFire; and knowledge of three techniques for mapping OpenMP threads to CUDA devices and when to use each technique.
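
A minimal sketch of one such mapping, assuming one OpenMP thread per GPU and round-robin chunk assignment (names are ours; for true copy/compute overlap the host buffers would need to be pinned with cudaHostAlloc):

    #include <omp.h>
    #include <cuda_runtime.h>

    // Each OpenMP thread binds to one GPU, creates its own stream, and
    // pushes its share of an out-of-core workload through async copies.
    void process_chunks(float** host_chunks, int n_chunks, size_t bytes)
    {
        int n_dev = 0;
        cudaGetDeviceCount(&n_dev);
        #pragma omp parallel num_threads(n_dev)
        {
            int dev = omp_get_thread_num();
            cudaSetDevice(dev);                    // thread-to-GPU binding
            cudaStream_t stream;
            cudaStreamCreate(&stream);
            float* d_buf;
            cudaMalloc(&d_buf, bytes);
            for (int c = dev; c < n_chunks; c += n_dev) {
                cudaMemcpyAsync(d_buf, host_chunks[c], bytes,
                                cudaMemcpyHostToDevice, stream);
                // ... launch the processing kernel on 'stream' here ...
                cudaMemcpyAsync(host_chunks[c], d_buf, bytes,
                                cudaMemcpyDeviceToHost, stream);
            }
            cudaStreamSynchronize(stream);
            cudaStreamDestroy(stream);
            cudaFree(d_buf);
        }
    }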

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room 212B

S4394 - Attacking HIV with Petascale Molecular Dynamics Simulations on Titan and Blue Waters

James Phillips ( Senior Research Programmer, University of Illinois )
James Phillips
James Phillips is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. He has a Ph.D. in Physics from the University of Illinois. Since 1999, James has been the lead developer of the highly scalable parallel molecular dynamics program NAMD, for which he received a Gordon Bell Award in 2002. His research interests include improving the performance and accuracy of biomolecular simulations through parallelization, optimization, hardware acceleration, better algorithms, and new methods.

The highly parallel molecular dynamics code NAMD was chosen in 2006 as a target application for the NSF petascale supercomputer now known as Blue Waters. NAMD was also one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007. When Blue Waters entered production in 2013, the first breakthrough it enabled was the complete atomic structure of the HIV capsid through calculations using NAMD, featured on the cover of Nature. How do the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines compare to CPU-based platforms for a 64-million-atom virus simulation? Come learn the opportunities and pitfalls of taking GPU computing to the petascale and the importance of CUDA 5.5 and Kepler features in combining multicore host processors and GPUs in a legacy message-driven application.

Session Level: All
Session Type: Talk
Tags: Supercomputing; Molecular Dynamics; Recommended Press Session – HPC-Science; Recommended for All Press

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL21C

S4465 - Optimizing CoMD: A Molecular Dynamics Proxy Application Study

Nikolay Sakharnykh ( Developer Technology Engineer, NVIDIA )
Nikolay Sakharnykh
Nikolay Sakharnykh is a developer technology engineer at NVIDIA. He joined NVIDIA in 2008 primarily to work on 3D graphics and games. During that time he came to enjoy GPU computing and started working on CUDA applications and GPGPU for graphics. In 2012 he joined the HPC developer technology group to focus on GPU computing applications. He works closely with software and architecture teams at NVIDIA to develop new CUDA libraries, ensure the best possible performance of key GPU applications, and study algorithm behavior on future hardware. His background is in computational fluid dynamics, although recently he has been working with various molecular dynamics applications. His main interests include linear algebra, tridiagonal solvers, graphs, and AMG methods.
Jamal Mohd-Yusof ( Research Scientist, LANL )
Jamal Mohd-Yusof
Jamal Mohd-Yusof is a member of the Collaborative Programming team in CCS-7 at Los Alamos National Laboratory. He was part of the team that worked on Open Science programming for the Roadrunner petaflop machine, where he developed a novel distributed tridiagonal solver needed to achieve significant speedup of the CFDNS-RR code. Although his training is in fluid mechanics, he has been working with advanced architectures for several years and teaches OpenCL courses at LANL. He is currently developing and profiling physics algorithms for a variety of advanced architectures, most recently focusing on molecular dynamics simulations.

Learn about the methods and trade-offs in the distributed GPU implementation of a molecular dynamics proxy application that achieves more than 90% weak scaling efficiency on 512 GPU nodes. CoMD is a reference implementation of classical molecular dynamics algorithms and workloads. It is created and maintained by The Exascale Co-Design Center for Materials in Extreme Environments (ExMatEx) and is part of the R&D100 Award-winning Mantevo 1.0 software suite. In this talk we will discuss the main techniques and methods involved in the GPU implementation of CoMD, including (1) cell-based and neighbor-list approaches for neighbor-particle search, and (2) different thread-mapping strategies and memory layouts. An efficient distributed implementation will be covered in detail: interior/boundary cell separation is used to allow efficient asynchronous processing and concurrent execution of kernels, memory copies, and MPI transfers.
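
A hedged sketch of that interior/boundary overlap pattern (all kernel and variable names hypothetical): the small boundary kernel and its halo download run on one stream so the MPI exchange can start while the large interior kernel is still executing on another:

    #include <cuda_runtime.h>

    // Stand-ins for the real force updates on boundary and interior cells.
    __global__ void forces_boundary(float* f, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) f[i] += 0.0f;   // placeholder for boundary-cell forces
    }
    __global__ void forces_interior(float* f, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) f[i] += 0.0f;   // placeholder for interior-cell forces
    }

    void md_step(float* d_f, int n_bnd, int n_int,
                 float* h_halo, float* d_halo, size_t halo_bytes,
                 cudaStream_t s_bnd, cudaStream_t s_int)
    {
        forces_boundary<<<(n_bnd + 127) / 128, 128, 0, s_bnd>>>(d_f, n_bnd);
        forces_interior<<<(n_int + 127) / 128, 128, 0, s_int>>>(d_f, n_int);
        cudaMemcpyAsync(h_halo, d_halo, halo_bytes,
                        cudaMemcpyDeviceToHost, s_bnd);
        cudaStreamSynchronize(s_bnd);  // halo ready: post MPI sends/recvs here
        cudaStreamSynchronize(s_int);  // interior finishes under the exchange
    }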

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Physics

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL21E

S4468 - Cross-Platform Performance Portability Using OpenACC

Michael Wolfe ( Compiler Engineer, The Portland Group )
Highly-Rated Speaker
Michael Wolfe
Michael Wolfe has been a compiler engineer at The Portland Group since joining in 1996, where his responsibilities and interests have included deep compiler analysis and optimizations ranging from improving power consumption for embedded microcores to improving the efficiency of Fortran on parallel clusters. He was an associate professor at the Oregon Graduate Institute from 1988 until 1996, and was a cofounder and lead compiler engineer at Kuck and Associates, Inc., prior to that. He earned a PhD in Computer Science from the University of Illinois, and has published one textbook, "High Performance Compilers for Parallel Computing", a monograph, "Optimizing Supercompilers for Supercomputers", and many technical papers.

OpenACC is designed to support performance portable parallel programming across a wide variety of heterogeneous and parallel node configurations. Learn what that means and how it affects the programs you write today and in the future. Examples will include NVIDIA Kepler and AMD Radeon targets.
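
For readers new to OpenACC, the portability promise looks like this in practice: one annotated loop, recompiled for different targets. A minimal, illustrative SAXPY sketch:

    // The same source compiles for an NVIDIA GPU, an AMD GPU, or a multicore
    // host by switching compiler targets; only the pragma marks parallelism.
    void saxpy(int n, float a, const float* x, float* y)
    {
        #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    }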

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room LL20C

S4507 - Evaluation of Parallel Hashing Techniques

Rajesh Bordawekar ( Research Staff Member, IBM T. J. Watson Research Center )
Rajesh Bordawekar
Dr. Rajesh Bordawekar is a research staff member at the IBM T. J. Watson Research Center in Yorktown Heights, NY. He received his M.S. and Ph.D. in Computer Engineering in 1993 and 1996, respectively.

This presentation will cover techniques for implementing hashing functions on the GPU. We will describe various parallel implementations of hashing techniques, e.g., cuckoo hashing, partitioned hashing, Bin-Hash, and Bloom filters, and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.
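
As one concrete example of the data structures involved, here is a minimal sketch of insertion into a GPU hash table with open addressing, where claiming a slot is a lock-free loop of atomicCAS probes; the hash function, EMPTY sentinel, and linear probing are illustrative choices, not the talk's specific schemes:

    #define EMPTY 0xffffffffu

    __device__ unsigned int mix(unsigned int k)
    {
        k ^= k >> 16; k *= 0x85ebca6bu; k ^= k >> 13;   // simple integer hash
        return k;
    }

    // One thread inserts one key; the table is assumed to have free slots.
    __global__ void insert_keys(unsigned int* table, unsigned int capacity,
                                const unsigned int* keys, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        unsigned int key = keys[i];
        unsigned int slot = mix(key) % capacity;
        while (true) {                                   // linear probing
            unsigned int prev = atomicCAS(&table[slot], EMPTY, key);
            if (prev == EMPTY || prev == key) return;    // claimed or duplicate
            slot = (slot + 1) % capacity;
        }
    }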

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210B

S4537 - Making Games Sound as Good as They Look: Real-time Geometric Acoustics on the GPU

Zuofu Cheng ( Research Assistant, University of Illinois at Urbana-Champaign )
Zuofu Cheng is a graduate student at the University of Illinois at Urbana-Champaign pursuing his doctorate in Electrical Engineering, working with Professors Lippold Haken and Eric Shaffer. His interests are in GPU computing, sound and acoustics in gaming, game design and development, and embedded systems. He has worked as a teaching assistant for several classes in the engineering curriculum: ECE 210 (Analog Signal Processing), ECE 343 (Circuit Laboratory), and ECE 402 (Electronic Music Synthesis). He is the primary instructor for ECE 395 (Advanced Digital Systems Lab), where he made significant changes to the curriculum to improve the quality of teaching and final projects in the lab. He was the primary engineer behind avidengine, a real-time geometric acoustic engine for video games, as well as a contributing engineer on work done in mesh optimization for simulation codes at the University of Illinois Computational Science and Engineering program.

Geometric acoustics (GA), which involves directly simulating in real time the acoustic transfer between sound sources and listeners in a virtual space, is considered the holy grail of game audio. We present a GA method and optimizations which, along with the massive parallelism of modern GPUs, allow for immersive sound rendering at interactive frame rates. This talk focuses on optimizations made for Fermi and Kepler GPUs on the two main components of our engine: the ray-acoustic engine and the per-path head-related transfer function (HRTF) renderer. Audio examples will be given using the open-source id Tech 3 engine, comparing original assets from the Quake 3 game rendered via traditional positional audio to the same assets processed through our engine.

Session Level: Beginner
Session Type: Talk
Tags: Signal & Audio Processing; Game Development; Virtual & Augmented Reality; Defense

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210D

S4599 - An Adventure in Porting: Adding GPU-Acceleration to Open-Source 3D Elastic Wave Modeling

Robin Weiss ( Research Programmer, The University of Chicago )
Robin Weiss
Robin has a strong background in a wide range of applied computer science topics including swarm intelligence and evolutionary algorithms, inverse problems, and scientific visualization. He currently works at the University of Chicago's Research Computing Center, providing technical consulting services to researchers. Robin has 5 years of experience with CUDA and heterogeneous high-performance computing.

In this session we will describe our experience porting finite-difference time-domain (FDTD) algorithms for solving 3D anisotropic elastic wave equations to GPU, and extending the implementation to support clusters of GPU-equipped compute nodes. These implementations have been integrated with the open-source Madagascar seismic processing software package to allow for accelerated computation of 3D anisotropic elastic wave models. In our work we adopt a straightforward porting strategy that leads to a transparent yet high-performance implementation suitable for mid-sized computational grids. The approach is based on a stress-stiffness formulation on a non-staggered grid and achieves significant speedup compared to a parallel CPU-based implementation allowing for computation of seismic data at lower hardware cost and in less time than was previously possible. We also report details of our implementation strategy as well as performance evaluations in varied heterogeneous compute environments with a number of different GPU architectures.

Session Level: Intermediate
Session Type: Talk
Tags: Energy Exploration; Computational Physics; Scientific Visualization; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room LL20B

S4679 - Accelerating the DNA Sequencing Variant Calling Pipeline

Mauricio Carneiro ( Group Lead, Computational Technology Development, Broad Institute of MIT and Harvard )
Mauricio Carneiro
Dr. Mauricio Carneiro is a computer engineer by training with a Ph.D. in Evolutionary Biology from Harvard University. He leads the computational technology development team in the Broad Institute which is responsible for some of the main computational tools used in genetics and genomics research today. Some of the main contributions from Dr. Carneiro's team are the Genome Analysis Toolkit (GATK) and the best practice guidelines for DNA data processing and analysis. The GATK is the de facto standard tool in research, industry and hospitals around the world.

Learn about the best-practice variant calling pipeline that drives every DNA sequencing project in the world, be it for research, industry, or diagnosing a patient in critical condition. Here we present the different approaches to optimizing and accelerating key parts of this pipeline. First, we will give you an overview of the process and how researchers around the world are using DNA sequencing data to understand complex and rare variants and their associations with disease. Second, we will show the work we have done to speed up this pipeline through the use of GPUs and other technologies. Third, we will discuss a new version of the pipeline that takes advantage of these optimizations to enable incremental analysis, that is, leveraging all historical data on every new sequencing project with minimal overhead. We close by discussing the many points that are still open for optimization and how the community can get involved.

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL21D

S4697 - Realtime Preview For VFX: Challenges and Rewards

Damien Fagnou ( Global Head Of VFX Operations, MPC )
Damien Fagnou
After finishing university with a Master's in Computer Science in France, Damien worked for an animated series, implementing the technology to speed up the motion capture pipeline and rendering. He later accepted a job to help set up the workflow at Attitude Studios and then, with his sights set on moving overseas, took on the role of Tools and Workflow Programmer at Climax in the UK. In 2003, Damien transferred his skills to the film industry and started at leading VFX post-production studio MPC to work on Troy, implementing preview tools and city rendering scripts. In 2005, Damien became R&D lead on Charlie and the Chocolate Factory, 10,000 BC and Narnia. Damien then moved closer to production and became MPC's stereographer, working on movies including Pirates of the Caribbean: On Stranger Tides, the Harry Potter films and, more recently, Prometheus. After a few years in production Damien returned to his software roots and became Global Head of Software, overseeing software development efforts across the company. Recently Damien moved up to a wider role as Global Head of VFX Operations, bringing together his expertise in both software and production to continue to evolve and refine the creation processes across all feature film VFX work at MPC.

In this session we will discuss the challenges and benefits of interactively visualizing large scenes in modern big-budget VFX-driven movies. We will share examples of the scale and complexity we experienced in our recent productions at MPC and the value of being able to visualize them without long offline render processes. We will show initial results of our work using NVIDIA's OptiX framework and Fabric Engine to assemble and render large scenes in an interactive environment, taking advantage of the power of high-end GPUs.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Visual Effects & Simulation; Rendering & Animation; Large Scale Data Visualization & In-Situ Graphics; Recommended Press Session – Media & Entertainment

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 211A

S4843 - Virtual Engine Assembly Training Utilizing zSpace 3D Stereo Displays

Jeff Fisher ( Research Associate, VR and CAD/CAM, National Institute for Aviation Research, Wichita State University )
Jeff Fisher graduated from Wichita State University with a degree in Aerospace Engineering in 2007. He is the VR lab manager at WSU's National Institute for Aviation Research (NIAR). Jeff specializes in CATIA V5 training and VR simulations.

This session will showcase the use of digital models of an aircraft engine developed in CATIA to create an interactive learning experience on the zSpace platform. This includes a customized environment developed using 3DVIA Studio Pro to give students the ability to gain the knowledge necessary to assemble an aircraft engine.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Combined Simulation & Real-Time Visualization; Real-Time Graphics Applications

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210H

S4886 - Working with the Latest Oculus Rift Hardware and Software

Michael Antonov ( Chief Software Architect, Oculus )
Michael Antonov
Michael was co-founder and CTO of Scaleform, the #1 user interface (UI) technology provider in the video game market, which was acquired by Autodesk in March 2011. At Scaleform, he was the lead architect of the Scaleform GFx hardware-accelerated Flash vector graphics engine. Michael is an expert in complex multi-threaded architecture, computer graphics, programming language design, and engineering management.

Oculus VR is revolutionizing the way people experience 3D worlds. The company's first product, Oculus Rift, is a virtual reality headset that allows users to step inside virtual environments. It provides an immersive, stereoscopic 3D experience with an ultra-wide field of view and super-low-latency head tracking. Since the debut of the Oculus Rift development kit at Game Developers' Conference 2013, Oculus has added a high-definition display, positional tracking and low-persistence support. Also, we've made critical improvements to the Oculus SDK, adding new features while making things simpler and reducing latency. In this talk, we'll discuss everything you need to get started integrating the latest Oculus Rift hardware with your 3D environment. The talk includes an overview of the latest hardware, a technical breakdown for engineers and a game design discussion. We'll also talk about our vision for future hardware development leading to the consumer Rift.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Virtual & Augmented Reality; Defense; Medical Imaging & Visualization

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 211B

S4948 - If You Build It, Will They Come? Better Question Is, Will They Stay?

Dane Young ( Solutions Architect, Entisys Solutions )
Dane Young
Dane Young is a Citrix Technology Professional (CTP) and Solutions Architect with Entisys Solutions in Northern California, specializing in the design and deployment of application, desktop and server virtualization technologies from Citrix, Microsoft and VMware. Dane organizes and maintains a blog for virtualization and cloud enthusiasts which can be found at blog.itvce.com. Dane holds Masters of Business Administration and Management Information Systems degrees along with several industry certifications from Citrix, Microsoft, and VMware.

Building on S4726 (Intro to Virtualization) and S4783 (Virtual is Better than Physical), this session will take the audience through the most crucial phases of the development lifecycle: Pilot, Production Build, and Roll-out. Regardless of your motivations and business drivers to virtualize, if users don't catch the organizational vision, adoption may fail and projects may stall. If not tended to properly, this could turn the organization's pricey CapEx investment into a rather large paperweight! In this session, attendees will learn from the trenches what to do, and what not to do, when it comes time to extend the virtualized solutions to end users. Staying actively engaged during this phase will ensure that the solution continues to move forward and achieve enterprise adoption.

Session Level: Advanced
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210F

S4206 - Hardware and Software Design for a 1000 FPS Real-Time Soft-Field Tomography System

Patrik Gebhardt ( Ph.D. Student, Ruhr-University Bochum )
Patrik Gebhardt
Patrik is a Ph.D. student at the Institute of Electronic Circuits, researching the combination of several tomographic imaging techniques for determining the volume fractions of the components of multi-phase flows. During his studies in Electrical Engineering he developed GPU-based PIC and FDTD software for plasma and metamaterial simulation.

See how to build a high-speed, low-latency real-time measurement system for industrial process tomography based on Electrical Impedance Tomography, capable of generating more than 1000 cross-sectional images of a pipe per second, using (A) FPGAs for parallel data acquisition from several ADCs and preprocessing, and (B) GPUs for solving the underlying PDE and reconstructing these images with a latency of approximately 50 ms. Examples of the signal processing algorithms as well as the methods used to accelerate the reconstruction process on GPUs will be given.

Session Level: Beginner
Session Type: Talk
Tags: Signal & Audio Processing; Medical Imaging & Visualization

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210D

S4364 - Enabling Real-Time Cancer Research Tools: Accelerating Analysis of Cell Responses to Chemotactic Gradients

Jimmy Pettersson ( GPU Computing Specialist, High Performance Consulting )
Jimmy has over 4 years of experience programming GPUs in areas including signal and image processing, finance, and medical applications.

Learn how we used CUDA to accelerate cancer research by building a complete real-time automated analysis tool for research scientists. By shortening an analysis process from days down to less than a minute, we're enabling scientists to spend more time focusing on their work: cancer research, molecular drug screening at the cellular level, and more. The talk will also cover some of the computational challenges and algorithm design opportunities that were seized upon.

Session Level: Intermediate
Session Type: Talk
Tags: Medical Imaging & Visualization; Bioinformatics & Genomics

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room LL21B

S4395 - Real-Time Quantification Filters for Multidimensional Databases

Peter Strohm ( Software Developer, Jedox AG )
Peter Strohm
Peter Strohm obtained his diploma in Computer Science from the University of Freiburg, Germany, in 2008. After that he joined the Inline Processing Team at Fraunhofer Institute for Physical Measurement Techniques IPM, Freiburg, as a software developer for parallel real-time applications. Since 2013, he has been with Jedox as a GPU developer.

Learn how GPUs can speed up real-time calculation of advanced multidimensional data filters required in data analytics and business intelligence applications. We present the design of a massively parallel "quantification" algorithm which, given a set of dimensional elements, returns all those elements for which ANY (or ALL) numeric cells in the respective slice of a user-defined subcube satisfy a given condition. Such filters are especially useful for the exploration of big data spaces, for zero-suppression in large views, or for top-k analyses. In addition to the main algorithmic aspects, attendees will see how our implementation solves challenges such as economic utilization of the CUDA memory hierarchy or minimization of threading conflicts in parallel hashing.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Finance

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210B

S4500 - Accelerating Particle-Mesh Interaction for Particle-in-Cell Simulation

Alberto Madonna ( Graduate Student, University of Padova )
Alberto Madonna
Alberto Madonna obtained his Bachelor's Degree in Aerospace Engineering at the University of Padova, with the thesis "Implementation of a Sparse Bundle Adjustment Algorithm on an NVIDIA CUDA GPU", under the supervision of Professor Stefano Debei. Currently, Alberto attends the laboratory course in Aerospace Propulsion at the University of Padova, where he focuses on numerical simulation of plasma propulsion. He is expected to obtain his Master's degree in Aerospace Engineering at the University of Padova in October 2013, with the thesis "Development of Software Modules for Simulating on GPU the Dynamics of Plasma for Electrical Spacecraft Propulsion", under the supervision of Professor Daniele Pavarin and Dr. Marco Manente. He is proficient in MATLAB, C/C++ and CUDA C, and has been programming GPUs since 2009.

We present a highly innovative GPU implementation of a Particle-in-Cell code for plasma dynamics simulation on 3-D unstructured grids. Starting from a proven codebase, we integrate solutions and ideas coming from a thorough study of the state of the art in parallel plasma simulation and other fields, adding original contributions in areas such as workload management, particle ordering, and domain decomposition. The result is a novel, flexible simulation pipeline, capable of performing more than an order of magnitude faster than the CPU implementation it originates from, while still presenting exciting opportunities for future developments. Moreover, all the concepts presented apply not only to Particle-in-Cell simulation, but in general to any simulation relying on the interaction between Lagrangian particles and a spatial grid.
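
To show the essential particle-mesh interaction in miniature, here is a hedged 1D nearest-grid-point charge deposit using atomic adds to tolerate collisions between threads; the talk's implementation targets 3-D unstructured grids and is considerably more sophisticated:

    // One thread per particle scatters its charge onto the mesh.
    __global__ void deposit_charge(const float* x, const float* q, int n_part,
                                   float* rho, float inv_dx, int n_cells)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= n_part) return;
        int c = (int)(x[p] * inv_dx);                 // nearest-grid-point cell
        c = min(max(c, 0), n_cells - 1);
        atomicAdd(&rho[c], q[p]);                     // concurrent-safe scatter
    }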

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 212A

S4510 - A Parallel GPU Solution to the Maximal Clique Enumeration Problem for CBIR

Christopher Henry ( Assistant Professor, University of Winnipeg )
Christopher Henry
Christopher received his Ph.D. from the Department of Electrical and Computer Engineering, University of Manitoba, in 2011. He currently holds a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant.

The focus of this talk is a parallel GPU solution to the Maximal Clique Enumeration (MCE) problem, a depth-first search method commonly referred to as the backtracking paradigm. The solution to this problem is an outgrowth of work investigating an efficient method for finding all tolerance classes on a set of objects; recently, the problem of finding tolerance classes has been shown to be the same as the MCE problem. Tolerance classes are sets where all pairs of objects within a set must satisfy the tolerance relation and the set is maximal with respect to inclusion. Finding such classes is a computationally complex problem with many application areas (e.g., genomics and social media). In particular, this talk will focus on content-based image retrieval (CBIR) involving sets of objects with similar features. In the proposed application to CBIR, classes in image covers determined by a tolerance relation provide the content used for CBIR.

Session Level: Intermediate
Session Type: Talk
Tags: Video & Image Processing; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room LL21A

S4595 - OpenACC vs. OpenMP4: The Strong, the Weak, the Missing to Develop Performance Portable Applications on GPU and Xeon Phi

James Lin ( Vice Director of Center for HPC, Shanghai Jiao Tong University (SJTU) )
James Lin
James Lin is the vice director of the Center for High Performance Computing, SJTU, and PI of the SJTU CUDA Center of Excellence (http://ccoe.sjtu.edu.cn). He received a JSPS RONPAKU Fellowship from Japan in February 2013 and has been a visiting associate of the Satoshi Matsuoka Laboratory, Tokyo Institute of Technology, since then. His research interests include directive-based programming approaches on massive heterogeneous systems. He teaches CS075 Introduction to HPC at SJTU, and serves as a TPC member for SC13 and HPC China 2012 & 2013.

Learn to develop a single code base that targets both NVIDIA GPUs and Intel Xeon Phi using directive-based programming approaches (OpenACC and OpenMP4). We carried out early experiments on π, the GPU-Phi supercomputer of the SJTU CCOE, with the CAPS OpenACC compiler and HOMP, an OpenMP4-to-CUDA compiler based on the ROSE compiler from LLNL. In this session we will show preliminary results of the evaluation with benchmarks and mini-apps, discuss the different optimization methods applied, and finally identify the strong, the weak, and the missing features of OpenACC and OpenMP4 for achieving good performance portability across very different architectures.

Session Level: Advanced
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room LL20C

S4621 - Beyond Pedestrian Detection: Deep Neural Networks Level-Up Automotive Safety

Ikuro Sato ( Senior Engineer, Denso IT Laboratory, Inc. )
Ikuro Sato
Ikuro Sato received his Ph.D. in Physics from the University of Maryland in 2005. After a postdoctoral fellowship at Lawrence Berkeley National Laboratory, he joined Denso IT Laboratory, Inc. in 2008. His primary research areas include computer vision and machine learning, mostly related to Advanced Driver Assistance Systems.
Hideki Niihara ( Engineer, Denso IT Laboratory, Inc. )
Hideki Niihara
Hideki Niihara has been implementing computer vision, signal processing, and machine learning algorithms on GPUs for more than 5 years at Denso IT Laboratory, Inc., Japan. He is the lead developer of the "Numerical Task Passing (NTP)" library, which allows easy switching between CPU and GPU and has a MATLAB interface.

People want cars that are not only affordable, trouble-free, and energy-efficient, but also safe. Today's technology provides Advanced Emergency Braking Systems that can detect pedestrians and automatically brake just before a collision becomes unavoidable. Our vision is that future Advanced Driver Assistance Systems will not just detect pedestrians but recognize their behavior and understand the level of danger, so that emergency situations can be avoided altogether. We claim deep Convolutional Neural Networks (CNNs) are the right tools for these highly non-trivial tasks, and Tegra is the best partner. We demonstrate a real-time deep CNN running on Tegra.

Session Level: All
Session Type: Talk
Tags: Automotive; Computer Vision; Machine Learning & AI; Recommended Press Session – Auto; Recommended for All Press

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210A

S4657 - Porting Fabric Engine to NVIDIA Unified Memory: A Case Study

Peter Zion ( Chief Architect, Fabric Engine Inc. )
Peter Zion
Peter Zion is co-founder and chief architect at Fabric Engine Inc. Peter is the principal architect and implementor of the core of Fabric Engine, a platform for the development of high-end 3D production tools. In this role Peter has designed and developed the KL programming language, a high-level, single-source language that supports efficient parallel computation on both CPUs and GPUs.

Fabric Engine is a platform for the development of high-end 3D production tools. Fabric Engine exposes high-performance computation through the KL programming language. KL is a high-level, single-source language that was initially designed to leverage multicore CPUs for parallelism, and was later adapted to additionally support modern GPUs. In this talk we present a case study of adding support for NVIDIA GPUs to KL through the use of the new unified memory feature, including key challenges faced and solutions architected to overcome these challenges.
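
For readers unfamiliar with the feature, here is a minimal standalone sketch of unified memory (names ours): a single cudaMallocManaged allocation is touched by both host and device, sparing a language runtime like KL's from tracking explicit copies:

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void doubler(float* v, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) v[i] *= 2.0f;
    }

    int main()
    {
        const int n = 1024;
        float* v = nullptr;
        cudaMallocManaged(&v, n * sizeof(float));      // one shared allocation
        for (int i = 0; i < n; ++i) v[i] = (float)i;   // CPU writes directly
        doubler<<<(n + 255) / 256, 256>>>(v, n);       // GPU uses the same ptr
        cudaDeviceSynchronize();                       // make GPU writes visible
        printf("v[3] = %f\n", v[3]);                   // CPU reads the result
        cudaFree(v);
        return 0;
    }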

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Programming Languages & Compilers; Visual Effects & Simulation

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 211A

S4666 - Next Technology Steps for Applied Materials Global Engineering Collaboration Using CAD in the Cloud

Oran Davis ( Managing Director, Engineering Tools, Applied Materials )
Oran Davis is the Managing Director, Engineering Tools and Collaboration at Applied Materials. Prior to joining Applied Materials, Oran held positions in systems engineering, CAD development, and operations. Oran graduated from the Technion in Haifa, Israel, with a B.Sc. in Computer Engineering and received his M.Sc. in Computer Engineering from Santa Clara University.

Applied Materials shares its experiences and impressions from migrating its 3D mechanical CAD cloud to next-generation technology. Applied Materials has 2000+ engineers spread across 93 locations in 22 countries. Each engineer has access to a private-cloud CAD blade-station accessing terabytes of engineering data within the server room. Applied Materials has begun deploying the next generation of CAD-in-the-cloud technologies, with the goal of enabling improved service and resource management for the private cloud. Audience members will learn about the details of operating a cloud infrastructure, the challenges involved with current technologies, and the benefits that Applied Materials is currently seeing. From this real-world example, audience members will gain key insights for determining if this type of solution is right for their company.

Session Level: Advanced
Session Type: Talk
Tags: Graphics Virtualization Summit; Digital Manufacturing Summit; Computer Aided Design; Remote Graphics & Cloud-Based Graphics; Recommended Press Session – Graphics Virtualization

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210F

S4678 - NVIDIA Driven Image Generation in Immersive Fast-Jet Simulators

William Paone ( Technical Lead Engineer: Image Generator Product Development, Boeing )
William Paone
Bill Paone is the Technical Lead Engineer for Fixed Wing Image Generation Product Development for Boeing Training Systems and Government Services (TS&GS). Bill has led image generation development and production for multiple worldwide programs. With an emphasis on visual simulation design, his long career also includes development and production engineering in multiple fields, including lead positions in hardware design, software design, and system integration engineering.

The realism of immersive flight simulation visual systems depends on the NVIDIA GPU roadmap. Image Generator (IG) designs with scalable architectures driving these systems are made successful by adopting the newest NVIDIA releases at market launch; with known use-case benchmarks and effective system scaling, development and production can succeed. Fast-jet simulation demands absolute determinism, due to the scale (number of rendering platforms) and number of video streams, and must also provide adequate scene quality for both low-level and high-altitude flight. To achieve this, every scene component must be benchmarked so that proper GPU margins can be allocated to each. This talk will present an example scalable image generator design built on NVIDIA solutions, illustrated with example images and flight scenes. We will review the challenges and issues with traditional visual-database-to-IG release paths and discuss newer technology that renders and tessellates directly from source data.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Combined Simulation & Real-Time Visualization; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210H

S4709 - Allosphere Research Facility: New Trends In Data Discovery Through Large-Scale Interactive Visualization

JoAnn Kuchera-Morin ( Director AlloSphere Research Facility, Professor Media Arts and Technology, University of California, Santa Barbara )
JoAnn Kuchera-Morin
Dr. JoAnn Kuchera-Morin is Director and Chief Scientist of the AlloSphere Research Facility (www.allosphere.ucsb.edu) and Professor of Media Arts and Technology and Music. Her research focuses on creative computational systems, multi-modal media systems content and facilities design. Her years of experience in digital media research led to the creation of a multi-million dollar sponsored research program for the University of California—the Digital Media Innovation Program. She was Chief Scientist of the Program from 1998 to 2003. The culmination of Professor Kuchera-Morin's research is the AlloSphere, a 30-foot diameter, 3-story high metal sphere inside an echo-free cube, designed for immersive, interactive scientific and artistic investigation of multi-dimensional data sets. Scientifically, the AlloSphere is an instrument for gaining insight and developing bodily intuition about environments into which the body cannot venture—abstract higher-dimensional information spaces, the worlds of the very small or very large, and the realms of the very fast or very slow. Artistically, it is an instrument for the creation and performance of avant-garde new works and the development of new modes and genres of expression and forms of immersion-based entertainment. Professor Kuchera-Morin serves as the Director of the AlloSphere Research Facility located within the California NanoSystems Institute, Elings Hall, at the University of California, Santa Barbara. JoAnn Kuchera-Morin earned a Ph.D. in composition from the Eastman School of Music, University of Rochester.

Unlike a movie theater, in which all viewers face the same direction and share the same "left" and "right", in the AlloSphere every user can face in any direction at any time, so 3D stereographic imagery must work correctly for all view directions simultaneously. We accomplish this for arbitrary 3D scenes with special GPU programming: panoramic cylindrical stereography through per-vertex displacement on the GPU proved to be an efficient and discontinuity-free solution.

Session Level: All
Session Type: Talk
Tags: Virtual & Augmented Reality; Collaborative & Large Resolution Displays; Combined Simulation & Real-Time Visualization

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room LL20B

S4809 - GPU Usage and the VFX Industry (Presented by Lenovo)

Allen Bolden ( Chief Architect & Executive Producer, Bit Theory, Inc. )
Allen's experience, including work as an undergrad, spans more than 10 programs in 3D and graphic media applications, including Photoshop (15 years), 3DS Max (10 years), Maya (9 years), and Terragen (5 years). His 3D skill set includes modeling, rigging, motion capture (MoCap), animation, texture mapping, texture painting, environment creation and animation, building creation, walkthroughs, product creation and animation demos. He earned a B.S. in Computer Science and Engineering (CSE) from the Computer Science Division at UC Berkeley in 3 years (1998-2001). With all of this under his belt, Allen quickly became a leading generalist in the field of VFX and animation. Allen has since developed a Prime Intelligence system named Athena, which automates key tasks in the animation/VFX pipeline and learns to apply what it ascertains to new assets and effects, making their development and production faster and more cost-efficient while maintaining expected quality. His methods are currently being used on major motion pictures in the CG, VFX, and stereoscopic conversion areas.

A case study on the top methods used in the VFX pipeline, and some of the more daring "out of the box" uses coming in the near future for VFX and beyond. The goal of this presentation is to show how multiple technologies tie into a pipeline that is accessible anywhere and, powered by a backbone of GPUs, puts production on set in real time.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing; Media & Entertainment; Recommended Press Session – Media & Entertainment

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 211B

S4857 - Challenges in Industrial Visualization on Mobile Devices

Jan Hermes ( Senior Researcher, RTT AG )
Jan Hermes started at Fraunhofer FIT in 2006, working on augmented reality. Since 2010 he has been a Senior Researcher for real-time visualization at RTT and also represents RTT at the Khronos Group.

Mobile devices are ideal for industrial product visualization that aims to put a product in front of a wide range of potential customers. But because this type of visualization targets a configurable, real-world preview of a product, rendering remains challenging on today's mobile devices. In this talk we explain how to deal with complex, CAD-based geometry and expensive shading requirements to build a highly realistic real-time product configurator on state-of-the-art mobile devices like the Tegra 5.

Session Level: All
Session Type: Talk
Tags: Mobile Summit; Rendering & Animation; Automotive; Digital Manufacturing Summit

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210E

S4229 - Generation, Simulation and Rendering of Large Varied Animated Crowds

Isaac Rudomin ( Senior Researcher, Barcelona Supercomputing Center )
Isaac Rudomin received his Ph.D. in Computer Science from the University of Pennsylvania in 1990. He worked at ITESM as a professor/researcher from December 1990 to January 2012, and has been a researcher at BSC since March 2012.
Benjamin Hernandez ( Researcher (Postdoc), Barcelona Supercomputing Center )
Benjamin Hernandez
Benjamin Hernandez received his Ph.D. in Computer Science from Tecnologico de Monterrey and worked at ITESM Campus Ciudad de Mexico. He is currently a postdoc at the Barcelona Supercomputing Center.

We discuss several steps in the process of simulating and visualizing large, varied crowds in real time on consumer-level computers and graphics cards (GPUs), covering methods for simulating, generating, animating and rendering crowds of varied appearance and a diversity of behaviors.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Combined Simulation & Real-Time Visualization; Rendering & Animation; Visual Effects & Simulation

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 211A

S4278 - How to Virtualize 3D Workstations? High-End Graphics for VDI in 10 Easy Steps

Mayunk Jain ( Technical Marketing Manager, Citrix Systems )
Mayunk Jain
As a Technical Marketing Manager with Citrix, Mayunk is responsible for evangelizing the company's leading desktop and app virtualization products, Citrix XenDesktop® and Citrix XenApp®. His responsibilities include competitive intelligence, sales enablement, and the development of technical collateral such as product demos, performance benchmarks and white papers. He is keenly involved with training and business development activities, in addition to his "day job". Prior to joining Citrix in 2008, he held sales engineering roles with start-ups in the enterprise networking and wireless mobility space. He holds a post-graduate diploma in business administration from Indira Gandhi University, New Delhi.
Praveen Prakash ( Senior Software Engineer, Citrix Systems )
Praveen Prakash
With over 8 years of experience in quality engineering and enterprise product testing, Praveen is one of the driving forces behind the virtualized graphics technology developed by Citrix. As Technical Lead for the HDX (High Definition user eXperience) team located in the India Innovation Center, he is responsible for providing a near-native user experience for users of virtual desktops and apps on any endpoint, such as thin clients, mobile devices, and Windows clients. Praveen is closely engaged in supporting customers, partners, and sales teams with complex trials and deployments of HDX 3D Pro around the world. He has a Master of Science from BITS Pilani, one of India's most highly reputed universities.

In this session, people with diverse technical backgrounds can learn what it takes to embark on a project with GPU-enabled virtual desktops and applications. We will talk about which components are needed, how they interact, and where to learn more, and pick up optimization best practices. You will gain the knowledge to plan your own roadmap for adopting this fantastic technology.

Session Level: Beginner
Session Type: Talk
Tags: Graphics Virtualization Summit; Desktop & Application Virtualization; Remote Graphics & Cloud-Based Graphics; Computer Aided Design

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room 210F

S4289 - Efficient Solution of Multiple Scalar and Block-Tridiagonal Equations

Endre László ( Ph.D. student, University of Oxford, Oxford e-Research Center )
Endre László is a visiting Ph.D. student at the University of Oxford, Oxford e-Research Centre, under the supervision of Prof. Michael B. Giles. He finished his M.Sc. in Electrical and Computer Engineering in 2010 at Pazmany Peter Catholic University (PPCU-FIT) in Budapest, Hungary, then worked for a financial consultancy company and for the Institute for Technical Physics and Materials Science, Hungarian Academy of Sciences. He started his Ph.D. in parallel computing at PPCU-FIT in 2011.

Many numerical methods require the solution of multiple independent tridiagonal systems. This talk will describe optimized methods for solving such systems, considering both the case where the tridiagonal elements are scalar, and the case where they are composed of square blocks of dimension D, typically 3-8. For the scalar case very good performance is achieved using a combination of the Thomas algorithm and parallel cyclic reduction. In the block case it is shown that good performance can be achieved by using D cooperating threads, all within the same warp.
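
To make the scalar case concrete, here is a minimal sketch (not the presenters' code; all names are illustrative) of a batched Thomas-algorithm solver in CUDA, with one thread per system. Production implementations like the one described in the session typically interleave the systems in memory for coalesced access and combine this with parallel cyclic reduction.

    // One thread solves one scalar tridiagonal system a*x[i-1] + b*x[i]
    // + c*x[i+1] = d[i]. `nsys` systems of size `n` are stored
    // system-major; `cp` is scratch space of the same size as `c`.
    __global__ void thomas_batch(const float *a, const float *b,
                                 const float *c, float *d,
                                 float *cp, float *x, int n, int nsys)
    {
        int s = blockIdx.x * blockDim.x + threadIdx.x;
        if (s >= nsys) return;
        int off = s * n;

        // Forward sweep: eliminate the sub-diagonal.
        cp[off] = c[off] / b[off];
        d[off]  = d[off] / b[off];
        for (int i = 1; i < n; ++i) {
            float m = 1.0f / (b[off+i] - a[off+i] * cp[off+i-1]);
            cp[off+i] = c[off+i] * m;
            d[off+i]  = (d[off+i] - a[off+i] * d[off+i-1]) * m;
        }
        // Back substitution.
        x[off+n-1] = d[off+n-1];
        for (int i = n - 2; i >= 0; --i)
            x[off+i] = d[off+i] - cp[off+i] * x[off+i+1];
    }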

Session Level: Advanced
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Finance; Computational Physics

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL20D

S4337 - Take Your Head Out of That Box: Sunglasses-Like Virtual Reality Eyewear Using Light Field Displays

Douglas Lanman ( Senior Research Scientist, NVIDIA )
Douglas Lanman
Douglas Lanman works in the Computer Graphics and New User Experiences groups. His research is focused on computational imaging and display systems, including head-mounted displays (HMDs), automultiscopic (glasses-free) 3D displays, light field cameras, and active illumination for 3D reconstruction. He received a B.S. in Applied Physics with Honors from Caltech in 2002 and M.S. and Ph.D. degrees in Electrical Engineering from Brown University in 2006 and 2010, respectively. Prior to joining NVIDIA, he was a Postdoctoral Associate at the MIT Media Lab from 2010 to 2012 and an Assistant Research Staff Member at MIT Lincoln Laboratory from 2002 to 2005. Douglas has worked as an intern at Intel, Los Alamos National Laboratory, INRIA Rhône-Alpes, Mitsubishi Electric Research Laboratories (MERL), and the MIT Media Lab. He presented the "Build Your Own 3D Scanner" course at SIGGRAPH 2009 and SIGGRAPH Asia 2009, the "Build Your Own 3D Display" course at SIGGRAPH 2010, SIGGRAPH 2011, and SIGGRAPH Asia 2010, and the "Computational Imaging" and "Computational Displays" courses at SIGGRAPH 2012.
David Luebke ( Sr. Director NVIDIA Research, NVIDIA )
Highly-Rated Speaker
David Luebke helped found NVIDIA Research in 2006 after eight years on the faculty of the University of Virginia. Luebke received his Ph.D. under Fred Brooks at the University of North Carolina in 1998. His principal research interests are GPU computing and real-time computer graphics. Luebke's honors include the NVIDIA Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and the ACM Symposium on Interactive 3D Graphics "Test of Time Award". Dr. Luebke has co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit visited by over 110,000 people, and dozens of papers, articles, chapters, and patents.

We will describe a new light-field-based approach to near-eye display that allows for dramatically thinner and lighter head-mounted displays capable of depicting accurate accommodation, convergence, and binocular-disparity depth cues. Such near-eye light field displays depict sharp images from out-of-focus display elements by synthesizing light fields that correspond to virtual scenes located within the viewer's natural accommodation range. Building on related integral imaging displays and microlens-based light-field cameras, we optimize performance in the context of near-eye viewing. Near-eye light field displays support continuous accommodation of the eye throughout a finite depth of field; as a result, binocular configurations provide a means to address the accommodation-convergence conflict that occurs with existing stereoscopic displays. We have built film-based static image prototypes, a binocular OLED-based prototype head-mounted display, and a GPU-accelerated stereoscopic light-field renderer.

Session Level: All
Session Type: Talk
Tags: Virtual & Augmented Reality

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL20B

S4340 - OpenACC Parallelization and Optimization of NAS Parallel Benchmarks

Rengan Xu ( Student, University of Houston )
Rengan Xu
Rengan Xu is a Ph.D. candidate in the Department of Computer Science at the University of Houston. His research focuses on heterogeneous computing, especially the OpenACC programming model. He developed the OpenACC validation suite and the OpenACC runtime library in the OpenUH compiler, and has explored different parallelization and optimization techniques for porting OpenACC applications.

The goal of this session is to show how to program large applications and apply efficient optimization techniques using OpenACC. We will present our experiences porting the NAS parallel benchmarks, which include five kernels (EP, FT, IS, CG, MG) and three pseudo-applications (LU, SP, BT), to OpenACC. These benchmarks exhibit different program characteristics, such as being compute bound or memory bound and having irregular memory access. We will present evaluation results comparing the OpenACC ports to the serial and OpenMP versions, and discuss the advantages and limitations of OpenACC.
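
For readers unfamiliar with the model, the sketch below shows the flavor of this kind of port. It is a hypothetical loop nest (not from the NAS sources; all names and coefficients are illustrative), offloading a simple 3D relaxation sweep of the kind found in MG with OpenACC directives.

    /* Offload a 3D stencil sweep; the data region keeps both arrays on
       the device while the parallel loop runs. */
    void smooth(double *u, const double *r, int n1, int n2, int n3,
                double c0, double c1)
    {
        #pragma acc data copyin(r[0:n1*n2*n3]) copy(u[0:n1*n2*n3])
        {
            #pragma acc parallel loop collapse(3) present(u, r)
            for (int k = 1; k < n3 - 1; ++k)
                for (int j = 1; j < n2 - 1; ++j)
                    for (int i = 1; i < n1 - 1; ++i) {
                        int x = (k * n2 + j) * n1 + i;
                        u[x] += c0 * r[x]
                              + c1 * (r[x-1] + r[x+1] + r[x-n1] + r[x+n1]
                                    + r[x-n1*n2] + r[x+n1*n2]);
                    }
        }
    }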

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL20C

S4347 - Conquering the Titan Supercomputer: A Star-by-Star Simulation of the Milky Way Galaxy

Evghenii Gaburov ( HPC Advisor, SURFsara )
Evghenii Gaburov
Evghenii Gaburov received an MPhys with Astrophysics from the University of Leicester. He continued with Ph.D. research at the University of Amsterdam, working on stellar dynamics, stellar collisions and parallel processing on GPUs. Afterwards he spent two years at Leiden Observatory investigating the impact of strong magnetic fields on accretion disks around supermassive black holes, and continued this research at Northwestern University on the prestigious CIERA and NASA Hubble postdoctoral fellowships. He then joined SURFsara, the Dutch national supercomputing centre, to help researchers take advantage of the compute power that modern processors offer.
Jeroen Bédorf ( Ph.D. Student, Leiden Observatory )
Jeroen Bedorf is a Ph.D. student at Leiden Observatory in the Netherlands. He obtained his Master of Science in Grid Computing at the University of Amsterdam; the topic of his thesis was high-performance N-body simulations on graphics processing units. After his master's he started a Ph.D. to continue the work on high-performance N-body codes and their application to problems in computational astrophysics. He has developed multiple GPU-accelerated N-body codes, using either direct N-body or hierarchical tree-code algorithms (for example the Bonsai tree-code). These tree-code methods are used to simulate the formation of galaxies and the effect galaxy mergers have on their evolution.

In this session we demonstrate how we leverage the massive parallelism of thousands of GPUs inside the Titan supercomputer to simulate the past and future of the Milky Way Galaxy on a star-by-star basis in less than 10 days. The audience will learn what it takes to parallelize an advanced hierarchical GPU tree-code to run efficiently on Titan. A gravitational N-body problem is by definition an all-to-all problem, and it is of the utmost importance for scalability to hide data communication behind computation. This turned out to be a major challenge on Titan because Bonsai's GPU kernels are ~3x faster on Kepler than on Fermi, which reduced compute time and as a result hampered scalability. We solved this by redesigning the communication strategy to take full advantage of all 16 CPU cores while the GPUs were busy computing gravitational forces. This allowed Bonsai to scale to more than 8192 GPUs.
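
The overlap pattern at the heart of this redesign can be sketched as follows. This is an assumed structure, not Bonsai's actual code: the local gravity kernel is launched asynchronously so the CPU cores can run the MPI particle exchange while the GPU computes.

    #include <mpi.h>
    #include <cuda_runtime.h>

    __global__ void local_gravity(float4 *bodies, int n);  // assumed kernel

    void overlap_step(float4 *d_bodies, int n_local,
                      float *h_send, float *h_recv, int n_halo, int peer,
                      cudaStream_t stream)
    {
        // GPU starts on the work that needs no remote data.
        local_gravity<<<(n_local + 255) / 256, 256, 0, stream>>>(
            d_bodies, n_local);

        // Meanwhile the host cores exchange boundary particles.
        MPI_Sendrecv(h_send, n_halo, MPI_FLOAT, peer, 0,
                     h_recv, n_halo, MPI_FLOAT, peer, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        cudaStreamSynchronize(stream);  // local forces are now done
        // ...then launch the kernel that folds in the received particles.
    }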

Session Level: Intermediate
Session Type: Talk
Tags: Astronomy & Astrophysics; Supercomputing; Computational Physics; Numerical Algorithms & Libraries; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room LL21F

S4480 - Next Gen Indie Filmmaking: The Technology, Workflow and Power at Our Fingertips

James Fox ( Owner, Dawnrunner )
James Fox is the Founder and CEO of Dawnrunner, and brings with him the creative power of 12 action-packed years in the film industry. As the principal director for Dawnrunner he has led the company through numerous award-winning projects, including feature films, short films, commercials, and music and industrial videos, all across the country. He has been recognized within the industry for his passion and determination to advance filmmaking through education and outreach; has spoken at events like NAB, the Seattle International Film Festival and the SXSW Film Festival; and dedicates his time and energy to several advisory boards and to partnerships with children's arts education organizations. His "breaking down barriers" approach to filmmaking, creativity and technical street-smarts have marked him as a visionary for the future of visual storytelling through new methods and technological innovations. He provides strong and charismatic leadership - and an inexhaustible well of excitement and energy.

Learn how to leverage breakthrough technologies within a nimble independent production environment to create Hollywood-challenging films. With (1) GPU acceleration providing an unprecedented economy of time-to-power, and (2) desktop virtualization enabling hassle-free expansion and contraction of the workforce, the barrier to entry for bringing your vision to life is substantially lowered. Concepts in technology and their integration into a developed workflow will be illustrated, along with a side-by-side comparison of a project that was transformed within this next-gen model.

Session Level: Beginner
Session Type: Talk
Tags: Media & Entertainment Summit; Desktop & Application Virtualization; Graphics Virtualization Summit; Recommended Press Session – Media & Entertainment

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 211B

S4492 - GPU Accelerated Fully Flexible Haptic Protein-Ligand Docking

Thanasis Anthopoulos ( Software Developer, Cardiff University )
Thanasis Anthopoulos
Thanasis has been designing CUDA-parallel algorithms for a drug design application at Cardiff University since 2010. He holds a degree in Imaging Sciences and a master's degree in Computer Science. His research interests include CUDA-parallel algorithms, meta-heuristics, bioinformatics, computer vision and image processing. Prior to that he worked on various research engineering and software development projects, such as Tripod, with the School of Geoinformatics in Cardiff; a psychology project coding an application for a long-term experiment on how humans interact with UIs and digE-Post; and a research project with EADS Innovation Works. Outside working hours, Thanasis enjoys spending time with his family. He is also a keen surfer and windsurfer and enjoys surfing the local breaks or windsurfing, depending on weather conditions.

This presentation covers a haptic protein-ligand docking (HPLD) application developed in the Molecular Modelling Lab of the Cardiff School of Pharmacy. The talk describes in detail how GPUs enabled the application to run with a fully flexible ligand and protein target. The first part describes the algorithm used to perform the MMFF94s force-field energy and force calculations, with performance benchmarks showing the speed-up gained from the presented CUDA algorithms. The second part covers an evolutionary algorithm designed to exploit Hyper-Q capabilities and evaluate energy kernels asynchronously, using the algorithm explained in the first part. Performance benchmarks show how this achieves an additional 2-3x speedup, depending on system size, when run on a GK110 GPU. The session closes with results generated by researchers using the CUDA version of the application.
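
The Hyper-Q usage pattern can be illustrated with a short sketch (hypothetical names, not the application's code): on GK110, kernels issued on independent streams can execute concurrently, so each candidate conformation in a generation gets its own stream.

    #include <cuda_runtime.h>

    // Stand-in for the per-candidate energy evaluation kernel.
    __global__ void energy_kernel(const float4 *pose, int natoms, float *e);

    void score_generation(float4 **d_pose, float *d_energy,
                          int natoms, int pop)
    {
        cudaStream_t *streams = new cudaStream_t[pop];
        for (int i = 0; i < pop; ++i) cudaStreamCreate(&streams[i]);

        for (int i = 0; i < pop; ++i)   // one asynchronous evaluation each
            energy_kernel<<<(natoms + 127) / 128, 128, 0, streams[i]>>>(
                d_pose[i], natoms, d_energy + i);

        cudaDeviceSynchronize();        // wait for the whole generation
        for (int i = 0; i < pop; ++i) cudaStreamDestroy(streams[i]);
        delete[] streams;
    }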

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Supercomputing; Computational Physics

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL21E

S4495 - Plasma Turbulence Simulations: Porting Gyrokinetic Tokamak Solver to GPU Using CUDA

Praveen Narayanan ( Applied Engineer, NVIDIA )
Praveen Narayanan
Praveen currently works in the Applied Engineering group at NVIDIA, where his roles include porting and benchmarking GPU solvers and creating demos for trade shows. After working on combustion and fire-related problems using perturbation theory and computation in graduate school, he transitioned to benchmarking and performance analysis of HPC fusion codes during his postdoc, before taking up GPU computing at NVIDIA.

The porting of a large-scale particle-in-cell solver (GTS) to the GPU using CUDA is described. We present weak-scaling results run at scale on Titan which show a speedup of 3-4x for the entire solver. Starting from a performance analysis of computational kernels, we systematically eliminate the most significant bottlenecks in the code - in this case the PUSH step, which constitutes the 'gather' portion of the gather-scatter algorithm that characterizes this PIC code. Points that should be instructive to developers include: (1) using the PGI CUDA Fortran infrastructure to interface between CUDA C and Fortran; (2) memory optimizations - creation of a device memory pool, and pinned memory; (3) a demonstration of how communication causes performance degradation at scale, with implications for shifter performance in general PIC solvers, and why we need algorithms that handle communication in particle shifters more effectively; and (4) use of textures and __ldg() for irregular memory accesses.
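
As a sketch of point (4), with illustrative names rather than GTS code: on compute capability 3.5 GPUs, __ldg() routes an irregular, read-only gather through the texture cache, which helps exactly the data-dependent loads that dominate the 'gather' side of a PIC code.

    __global__ void gather_field(const float *field, const int *cell_of,
                                 float *e_at_particle, int np)
    {
        int p = blockIdx.x * blockDim.x + threadIdx.x;
        if (p >= np) return;
        int c = cell_of[p];                    // irregular, data-dependent index
        e_at_particle[p] = __ldg(&field[c]);   // cached read-only load
    }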

Session Level: Beginner
Session Type: Talk
Tags: Computational Physics; Computational Fluid Dynamics; Supercomputing

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 212A

S4526 - GPU Accelerated Genomics Data Compression

BingQiang Wang ( Manager of Technical Computing, BGI )
BingQiang Wang
BingQiang Wang completed his doctorate in computational chemistry at East China University of Science and Technology (ECUST), China, in 2006. From March 2005 he was a research scientist at the Shanghai Supercomputing Center, dedicated to high-performance-computing-enabled research in computational chemistry and life science. In March 2010 he joined BGI as head of high performance computing, to develop solutions for emerging life science challenges. He also served as an adjunct assistant professor at the Chinese University of Hong Kong (CUHK) for the first half of 2012.

We review existing compression algorithms and the characteristics of common genomics data formats, then introduce a general GPU-accelerated compression framework featuring (1) adaptive tuning of the compression scheme, (2) optimized, GPU-accelerated compression algorithms, and (3) column-major storage. This approach fully exploits the similarity within individual columns of popular genomics data formats by choosing an appropriate compression scheme (a combination of algorithms) per column; the GPU is then employed to speed up compression and decompression, delivering several-fold higher bandwidth.
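
A conceptual sketch of the column-major idea (not BGI's implementation; the record layout and scheme choices are illustrative): FASTQ-like records are restructured into per-column streams so each stream can be given a compression scheme that suits its statistics, and each stream can then be compressed independently on the GPU.

    #include <string>
    #include <vector>

    struct FastqRecord { std::string id, seq, qual; };

    struct ColumnStreams {
        std::string ids;    // repetitive headers: delta + entropy coding
        std::string seqs;   // 4-letter alphabet: 2-bit packing + entropy coding
        std::string quals;  // locally correlated scores: run-length / range coding
    };

    ColumnStreams split_columns(const std::vector<FastqRecord> &recs)
    {
        ColumnStreams c;
        for (const FastqRecord &r : recs) {
            c.ids   += r.id   + '\n';
            c.seqs  += r.seq  + '\n';
            c.quals += r.qual + '\n';
        }
        return c;   // each member is then compressed independently
    }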

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL21D

S4585 - FastFlow: Combining Pattern-Level Abstraction and Efficiency in GPGPUs

Marco Aldinucci ( Researcher, Computer Science Department, University of Torino )
Marco Aldinucci has been an assistant professor at the Computer Science Department of the University of Torino since 2008. Previously, he was a researcher at the University of Pisa and the Italian National Research Agency. He is the author of over a hundred papers in international journals and conference proceedings (Google Scholar h-index 21), and has participated in over 20 national and international research projects concerning parallel and autonomic computing. He is the recipient of the HPC Advisory Council University Award 2011 and an NVIDIA Academic Research award in 2013. He has led the "Low-Level Virtualization and Platform-Specific Deployment" workpackage within the EU-STREP FP7 ParaPhrase (Parallel Patterns for Adaptive Heterogeneous Multicore Systems) project and the GPGPU workpackage within the IMPACT project (Innovative Methods for Particle Colliders at the Terascale), and he is the contact person for the University of Torino in the European Network of Excellence on High Performance and Embedded Architecture and Compilation. In the last year he delivered 5 invited talks at international workshops (March 2012 - March 2013). Together with Massimo Torquati, he co-designed the FastFlow programming framework and several other programming frameworks and libraries for parallel computing. His research is focused on parallel and distributed computing.

Learn how FastFlow's parallel patterns can be used to design parallel applications for execution on both CPUs and GPGPUs while avoiding most of the complex low-level detail needed to make them efficient, portable and rapid to prototype. As a use case, we will show the design and effectiveness of a novel universal image filtering template based on the variational approach.

Session Level: Beginner
Session Type: Talk
Tags: Video & Image Processing; Numerical Algorithms & Libraries; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room LL21A

S4587 - CUDA Profiling Tools

Sandarbh Jain ( Software Engineer, NVIDIA )
Sandarbh Jain
Sandarbh Jain is an Engineer in the CUDA Developer Tools group at NVIDIA. He is primarily responsible for CUDA performance analysis tools. Sandarbh received his Bachelor's degree in Computer Engineering from Jamia Millia Islamia, India.

The NVIDIA Visual Profiler, nvvp, and the command-line profiler, nvprof, are powerful profiling tools that you can use to maximize your CUDA application's performance. The NVIDIA Visual Profiler helps you understand your application's behavior with a detailed timeline and data from GPU performance counters. This session will provide an overview of the new GPU profiling features that can help you better tune your CUDA application.
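
The profilers measure without any code changes; a common complementary technique (sketched below in generic form, not part of the tools themselves) is to bracket a kernel with CUDA events so wall-clock numbers can be cross-checked against what nvvp/nvprof report. `my_kernel` is a placeholder for the kernel under test.

    #include <cuda_runtime.h>

    __global__ void my_kernel(float *data, int n);   // kernel under test

    float time_once(float *d_data, int n)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        my_kernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);       // wait until the kernel finishes

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;                        // kernel time in milliseconds
    }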

Session Level: All
Session Type: Talk
Tags: Performance Optimization

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room 212B

S4630 - Using GPU-Based RealityServer for an Online Laboratory Instrument Configurator

Mark Keenan ( CEO, Technicon )
Mark Keenan
Mark Keenan, Technicon's CEO, brings over 16 years of engineering and management experience to this role. Prior to joining Technicon, he was Director of Strategic Investments for Intel Capital, a position requiring a clear understanding of the interrelation of technology, marketing and financial issues. At Intel, Mark grew and managed an investment portfolio of over $150MM, closing over 20 venture capital deals while focusing on the manufacturing, telecom and electronics industries. His deep interest in the practical application of manufacturing technologies dates back to his studies at Georgia Tech where he earned a Master's degree in Electrical Engineering.

Learn how an online configurator integrated with GPU-based RealityServer lets lab managers interactively lay out equipment and platforms, view 3D images of layouts and share designs with colleagues. Deployed for Thermo Fisher Scientific, a leading innovator in laboratory automation systems, the system is particularly valuable for lab managers who may not have experience laying out the equipment. Configurator rules ensure items are properly located and oriented. Using RealityServer, lab managers can quickly generate 3D views of their layout from any angle and location to see exactly what they're ordering and how it's assembled. 3D images can be compiled into "photo-rolls" to fully document the layout. Lab managers can also view a budgetary estimate and equipment list for their layout.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Rendering & Animation; Manufacturing

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 210G

S4644 - Getting Big Data Done On a GPU-Based Database

Ori Netzer ( VP, Product Management , SQream Technologies )
As the VP of Product Management for SQream Technologies, Ori is responsible for mapping the company's product road map. Ori is a Big Data thought leader whose main goal is to share his views with other professionals within the industry.

We will provide an in-depth analysis of our in-production, GPU-based technology for Big Data analytics, highlighting how our database benefits telecom companies. We will explain the key features of our technology, including how our database provides close to real-time analytics and up to 100x faster insights, all in a very cost-effective manner. We will elaborate on these features and more in order to provide a clear understanding of how our technology works and why it is beneficial for telecom companies.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 210B

S4645 - Scalable Rendering Architecture: Challenges & Approaches

Ketan Mehta ( Principal Software Engineer, Vital Images Inc )
Ketan Mehta
Ketan has been working on and leading rendering at Vital Images for more than 7 years, during which the technology moved from stand-alone workstations to enterprise rendering.

Learn about the challenges of deploying advanced visualization for medical imaging in enterprise data centers and the impacts of virtualization. This talk will discuss and dispel some myths about virtualization with GPUs. It will also cover the key challenges of designing a scalable rendering architecture that can support tens to hundreds of users, focusing on system and architecture challenges along with software design concerns that need to be addressed. Active participation and sharing of experiences from the audience is welcome and encouraged.

Session Level: Intermediate
Session Type: Talk
Tags: Medical Imaging & Visualization; Remote Graphics & Cloud-Based Graphics; Clusters & GPU Management

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL21B

S4655 - Efficient Lifetime Portfolio Sensitivities: AAD Versus Early-Start Longstaff-Schwartz Compression

Chris Kenyon ( Director, Quantitative Research – CVA / FVA, Lloyds Banking Group )
Chris Kenyon
Chris Kenyon is a Director in the Quantitative Research - CVA / FVA team at Lloyds Bank. Previously he was head quant for counterparty credit risk globally at Credit Suisse, and at DEPFA Bank PLC he was Head of Structured Credit Valuation (post-crisis), working on pricing model development and validation. He has also held positions at IBM Research, and at Schlumberger, where he applied real-options pricing to everything from offshore rig lease extension options to variable-volume outsourcing contracts. Chris holds a PhD in Applied Mathematics from Cambridge University, where he was a Research Fellow (Computer Modeling), and an MSc in Operations Research from the University of Texas at Austin.
Andrew Green ( Head of Quantitative Research - CVA / FVA, Lloyds Banking Group )
Andrew Green (Head of Quantitative Research – CVA / FVA) has headed the Quantitative Research – CVA / FVA team at Lloyds Bank for the last five years and is responsible for the development of models for credit and funding valuation adjustments. Prior to joining Lloyds, he headed the DCRM Quant team at Barclays Capital with responsibility for CVA model development. During his career as a quantitative analyst, Andrew has worked on fixed income, equity, credit and hybrid derivative models. Andrew holds a DPhil and BA in Physics from Oxford University and the Certificate in Advanced Study in Mathematics (Part III) from Cambridge University.

Developments in financial regulation (Dodd-Frank, Basel III) emphasize capital adequacy. Efficient lifetime portfolio VaR-based capital calculations for trading decisions are highly computationally challenging. The major impediment to widespread GPU adoption has been the need for multiple code bases; the Xcelerit middleware solves this. We give a single-source CPU/GPU approach for highly efficient lifetime portfolio sensitivity calculations. This talk introduces Early-Start Longstaff-Schwartz Compression (ES-LSC), which replaces (Automatic) Algorithmic Differentiation (AAD), a method we demonstrate is technically unsuitable after t=0. Longstaff-Schwartz is a state-space method for pricing, which we also apply to non-American derivatives as a compression technique. Early-Start means simulations (GPU/CPU) start from the past, so the state space is available both at t=0 and at all later times for VaR calculations for capital pricing (or IM). State-space regressions provide sensitivities either analytically or by finite difference.
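
For orientation only, here is a minimal sketch of the regression step at the heart of any Longstaff-Schwartz scheme (illustrative, not ES-LSC itself): given simulated states x[i] and discounted continuation payoffs y[i], fit E[Y|X=x] with the basis {1, x, x^2} via the 3x3 normal equations.

    #include <cmath>
    #include <utility>
    #include <vector>

    void ls_regression(const std::vector<double> &x,
                       const std::vector<double> &y, double beta[3])
    {
        double A[3][3] = {{0}}, b[3] = {0};
        for (std::size_t i = 0; i < x.size(); ++i) {
            double phi[3] = {1.0, x[i], x[i] * x[i]};   // basis functions
            for (int r = 0; r < 3; ++r) {
                b[r] += phi[r] * y[i];
                for (int c = 0; c < 3; ++c) A[r][c] += phi[r] * phi[c];
            }
        }
        // Gaussian elimination with partial pivoting on the 3x3 system.
        for (int k = 0; k < 3; ++k) {
            int p = k;
            for (int r = k + 1; r < 3; ++r)
                if (std::fabs(A[r][k]) > std::fabs(A[p][k])) p = r;
            std::swap(b[k], b[p]);
            for (int c = 0; c < 3; ++c) std::swap(A[k][c], A[p][c]);
            for (int r = k + 1; r < 3; ++r) {
                double f = A[r][k] / A[k][k];
                for (int c = k; c < 3; ++c) A[r][c] -= f * A[k][c];
                b[r] -= f * b[k];
            }
        }
        for (int k = 2; k >= 0; --k) {  // back substitution
            beta[k] = b[k];
            for (int c = k + 1; c < 3; ++c) beta[k] -= A[k][c] * beta[c];
            beta[k] /= A[k][k];
        }
    }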

Session Level: Intermediate
Session Type: Talk
Tags: Finance; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room 210C

S4714 - NVIDIA Vision Toolkit for Advanced Driver Assistance Systems, Computational Photography and Beyond

Elif Albuz ( Manager of Mobile Vision Software, NVIDIA )
Elif Albuz is the manager of Mobile Vision Software at NVIDIA, leading computer vision projects in advanced driver assistance, computational photography and augmented reality on Tegra GPUs. Before the Computer Vision group, she led the CUDA FFT library; designed new algorithms for motion estimation, super-resolution and frame-rate up-conversion and accelerated them on NVIDIA GPUs; designed architecture for error concealment and adaptive quantization in video codec hardware; and implemented low-level code for H.264 and MPEG-2 codecs. Prior to joining NVIDIA, she worked at Sony Electronics, leading the DVD decoder firmware stack used in DVD players and the PS2, implementing a real-time OS for multi-processor systems and accelerating H.264 using SIMD in the Multimedia Research Labs. Elif Albuz holds a Master's degree in Electrical Engineering, where she studied multimedia processing and parallel architectures.
Frank Brill ( Computer Vision Software Manager, NVIDIA )
Frank Brill manages a computer vision team in NVIDIA’s Compute organization responsible for developing libraries of algorithms for Advanced Driver Assistance Systems (ADAS) and other mobile computer vision applications. Frank obtained his Ph.D. in Computer Science from the University of Virginia and started his career doing computer vision research and development for video security and surveillance applications at Texas Instruments, where he obtained 5 patents related to this work. He then managed a Network Camera software development group, and eventually moved into silicon device program management. As a silicon program manager, Frank was responsible for several digital still camera and multimedia devices, including the first device in TI’s DaVinci line of multimedia processors (the DM6446), and the catalog product roll-out of the OMAP3530 mobile device. Frank returned to computer vision to manage a silicon architecture group focused on accelerating computer vision algorithms, before joining NVIDIA in 2013. Frank also represents NVIDIA at the Khronos OpenVX working group, which is defining an API to enable efficient mobile implementations of computer vision applications.

In this session, we will present the contents of the Vision Toolkit, discuss its performance advantages and demonstrate real-time applications enabled by the library. The Vision Toolkit is an NVIDIA product designed to enable real-life computer vision applications. It leverages state-of-the-art computer vision research and offers a variety of functions to developers, initially targeting Advanced Driver Assistance Systems (ADAS) and Augmented Reality (AR) applications. The toolkit is highly GPU-accelerated on mobile platforms, offering significant speedups and reducing the engineering effort needed to design real-time vision applications. It includes open-source samples and offers a flexible framework that enables users to extend and contribute new functionality. It will be deployed on different operating systems, including Android and Linux on ARM, to registered developers and partners through NVIDIA's web site.

Session Level: All
Session Type: Talk
Tags: Automotive; Computer Vision; Computational Photography; Mobile Summit; Recommended Press Session – Auto

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room 210A

S4756 - GPU-Accelerated Ab-Initio Simulations of Low-Pressure Turbines

Vittorio Michelassi ( Aero-Thermal Chief Engineer, General Electric Global Research )
Vittorio Michelassi
Prof. Dr. V. Michelassi is Aero-Thermal Chief Engineer of the Aero-Thermal-Mechanical-Systems Organization at General Electric Global Research. He oversees improvements of aero-thermodynamic performance predictions for energy-conversion-related research. Vittorio has wide experience in turbomachinery flows and turbulence modelling, as well as Large Eddy and Direct Numerical Simulations.
Richard Sandberg ( Professor, Fluid Dynamics and Aeroacoustics, Aerodynamics and Flight Mechanics Group, University of Southampton )
Richard Sandberg
Prof. R D Sandberg is Professor of Fluid Dynamics and Aeroacoustics in the Aerodynamics and Flight Mechanics Research Group at the University of Southampton. He has developed an ab-initio simulation tool to conduct extensive large-scale flow and noise simulations. Richard heads the UK Turbulence Consortium.

Gas turbines (GT) are widely applied to aircraft propulsion and power generation. A 2 to 3% GT efficiency improvement is worth $3-6B per year of fuel, and a corresponding reduction of environmental impact, for the GE GT fleet alone. Although GT performance has improved considerably, it is now becoming increasingly difficult to make further advances with current design tools. High-fidelity Computational Fluid Dynamics (CFD) promises to accelerate this advancement by shifting from modelling to resolving flow phenomena at an unprecedented level of detail. Still, resolving all scales of turbulence present in a GT constitutes a formidable computational challenge that can only be met by algorithms that exploit the latest GPU-accelerated architectures. The talk will present the porting of a highly efficient hybrid OpenMP/MPI flow solver to Titan using OpenACC. Performance of the new code version and initial results of GPU-accelerated ab-initio simulations at realistic engine conditions will be shown. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
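
A common pattern in this kind of hybrid MPI + OpenACC port (illustrative only, not the actual solver) is to keep the flow arrays resident on the GPU across time steps and expose raw device pointers to a CUDA-aware MPI for the halo exchanges. `rhs_point()` and `exchange_halos()` below are assumed helper routines.

    double rhs_point(const double *q, int i);  /* assumed acc routine */
    void exchange_halos(double *dq);           /* assumed MPI wrapper */

    void run(double *q, double *rhs, int ntot, int nsteps)
    {
        #pragma acc data copy(q[0:ntot]) create(rhs[0:ntot])
        {
            for (int s = 0; s < nsteps; ++s) {
                #pragma acc parallel loop present(q, rhs)
                for (int i = 0; i < ntot; ++i)
                    rhs[i] = rhs_point(q, i);

                /* Hand MPI the device address of q for the exchange. */
                #pragma acc host_data use_device(q)
                {
                    exchange_halos(q);
                }
            }
        }
    }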

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL21C

S4778 - Interactive Processing and Visualization of Geospatial Imagery

Brian Smith ( Technical Fellow, The Boeing Company )
Dr. Brian Hendee Smith is a Technical Fellow in Boeing’s Advanced Network and Space Systems group. He has been using GPUs for sensor data processing since 2008, and led the development of the Boeing Agility SAR Lab. He holds a PhD in theoretical physics from the University of Minnesota.

Airborne and spaceborne sensors can collect massive quantities of geospatial imagery data. These sensors often operate outside the range of the human visual system, presenting a challenge for analysis and visualization. The Agility framework was developed by The Boeing Company to more easily leverage GPU technology for interactive visualization of large geospatial datasets. Come see how interactive processing can increase the utility of geospatial data, with examples from synthetic aperture radar and hyperspectral imaging systems.

Session Level: All
Session Type: Talk
Tags: Scientific Visualization; Defense

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 210D

S4814 - A Journey Into the Depths of a 4096 Bytes Production

Beausoleil Samson-Guillemette
Beausoleil  Samson-Guillemette
Having started making games at the age of 15, Beausoleil quickly learned any new skill set as required. A strong believer in DIY and self-teaching, he is now a game designer during the day and makes demos at night. Demomaking lets him play with any technical area he likes, from fitting drum'n'bass music into a couple of bytes, to 3D modeling, to coding a software renderer for ASCII output. He also runs the annual Text Mode Demo Competition and the Recursion demoparty in Montréal.
David Valentine
David  Valentine
Polaris is the founder, organizer and a coder for the Northern Dragons demo-group. Since 2001 the group has created demos, and intros and more for events all over the world. In 2005, Polaris co-founded the web portal “in4k” which remains one of the top go to technical resources for 4k intro development.

In an era when disk space is measured in terabytes, some people still try to optimize programs for size. 4KB intro developers combine digital artistry and technical ingenuity to bring to life productions full of graphics and music, all wrapped into a program measuring in at only 4 KB or less. In this dynamic presentation, BarZoule and Polaris will discuss the art and the science of creating 4KB intros. The audience will get a one-of-a-kind behind-the-scenes view as we journey through the creation of the 4KB intro W4K3D, winner of @Party 2012. Each speaker brings his own perspective to the creation process, to give you a true sense of the various aspects of intro conception, design, creation and realization. Not to be missed by people who enjoy unusual technical feats of beauty.

Session Level: Beginner
Session Type: Talk
Tags: Real-Time Graphics Applications; Performance Optimization; NVScene

Day: Wednesday, 03/26
Time: 15:00 - 15:50
Location: Room 230C

S4905 - The New Acura NSX, Making Dreams Reality

Pete Petersen ( Manager of Digital Modeling and Visualization , Honda R&D )
Pete Petersen
Pete has over 20 years of experience in digital modeling and visualization. He worked at General Motors and owned his own design firm prior to coming to Honda R&D in 2002. Pete received his industrial design education from the Art Center College of Design.
Joseph Catalano ( CG Designer, Honda R&D Americas, Inc. )
Joseph Catalano
Joseph is a CG Designer at Honda R&D Americas where he works on photo real rendering. His focus is on using CG rendering to match the quality of photographs. Joseph comes from a background of commercial photography and has transitioned his way into digital rendering.
Mari Takashima ( Principal System Engineer, Honda R&D )
n/a

Time is always a hurdle in automotive development. Honda R&D is constantly striving to implement the latest cutting-edge technologies to accelerate and improve our creative workflow. We use the latest software and hardware, such as RTT, Bunkspeed and Alias, running on multi-GPU workstations globally. As an international company with R&D studios around the world, we hold design competitions, and the power of digital allows this to happen in real time. Our teams are always looking for faster, easier-to-use and higher-performance tools to help us reduce development time and increase accuracy. We are always pushing our hardware to the limit, which requires the fastest, highest-performance GPUs possible. We can never get enough horsepower! Join us for a look into the design process behind the NSX show car and many other models. See how the Honda R&D Digital Modeling and Visualization team uses the power of digital to develop designs using technology from 3D scanning to 4K-and-higher photo-real multi-GPU rendering and augmented reality. We hope to see you at our session so we can show you how we turn dreams into reality.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Real-Time Graphics Applications; Combined Simulation & Real-Time Visualization; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 210I

S4170 - GPU Neutron Transport: Simulating Nuclear Reactions One Neutron at a Time

Tony Scudiero ( Compute Devtech, NVIDIA )
Tony Scudiero
Tony Scudiero is a Developer Technology software engineer, or devtech, at NVIDIA focusing on compute performance for HPC applications. He works very closely with developers to identify performance bottlenecks and optimization strategies to improve overall performance in their applications. Tony has a long history of using GPUs to accelerate science which predates the introduction of CUDA. Prior to working at NVIDIA, Tony worked at Cray as a high-performance compiler engineer and science library optimization engineer. He has also spent time in other industry roles including computational finance and algorithm design for medical and defense applications. Tony holds an M.S. in Computer Science, a B.S. in Computer Science, and a B.S. in Mathematics from the University of Minnesota.

Monte Carlo neutron transport simulates radiation transport and nuclear reaction physics by following the individual lifespans of many millions of unbound neutrons. OpenMC is a recently developed Monte Carlo neutron transport application intended to allow future reactor designers to leverage extremely low-level simulation of new reactors years before they are built. The presenter, Tony Scudiero, has adapted OpenMC from its original incarnation as 27k lines of single-threaded Fortran 90 to a parallel CUDA C/C++ implementation optimized for the GPU. This talk covers the computational considerations of Monte Carlo neutron transport, the design and process of porting OpenMC to CUDA, and the results and lessons learned along the way. Along with OpenMC, its mini-app benchmark XSBench will be discussed.
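
To give a feel for the method, here is a toy sketch (vastly simplified relative to OpenMC; geometry, cross sections and tallies are placeholders): each CUDA thread follows one neutron history through a 1D slab, sampling the distance to the next collision from the total cross section.

    #include <curand_kernel.h>

    __global__ void histories(unsigned long long seed, float sigma_t,
                              float p_absorb, int *absorbed, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        curandState rng;
        curand_init(seed, i, 0, &rng);

        float x = 0.0f;                 // position along a unit slab
        for (;;) {
            // Exponentially distributed flight length.
            x += -logf(curand_uniform(&rng)) / sigma_t;
            if (x > 1.0f) break;        // leaked out of the slab
            if (curand_uniform(&rng) < p_absorb) {
                atomicAdd(absorbed, 1); // tally terminal absorption
                break;
            }                           // otherwise scatter and continue
        }
    }                                   // *absorbed zeroed by the host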

Session Level: Advanced
Session Type: Talk
Tags: Computational Physics; Supercomputing; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 15:30 - 16:20
Location: Room 212A

S4190 - Finite Difference Simulations on GPU Clusters: How Far Can You Push 1D Domain Decomposition?

Pierre Wahl ( Ph.D .Student, Brussels Photonics Team/ Vrije Universiteit Brussel )
Pierre Wahl
Pierre Wahl received his B.S. degree in electrical engineering and the Erasmus Mundus M.S. degree in photonics from Vrije Universiteit Brussel, Brussels, Belgium, in 2007 and 2010, respectively. He wrote his Master's thesis at the Interuniversity Microelectronics Center, Leuven, Belgium, on high-frequency electrical voltage-controlled oscillators. Pierre joined the Miller Group at Stanford University as a Visiting Researcher from 2010 to July 2011. He is currently pursuing a Ph.D. in electrical engineering at Vrije Universiteit Brussel (Brussels Photonics Team) on low-energy optical interconnects. His current research interests include optical interconnects and advanced simulation and optimization methods in nanophotonics.

To fully utilize a GPU cluster, both the single-GPU code and the inter-GPU communication need to be efficient. In this session the FDTD code B-CALM is introduced and used as a case study to show by example how both targets can be met. We explain how the memory-bound kernels of B-CALM have been optimized for Fermi and Kepler, and how efficient inter-GPU communication was enabled by using CUDA-aware MPI. We describe in detail how this was done and present two performance models that we developed to estimate single-GPU performance as well as the scaling limits. To validate the models, performance results from different systems are presented, including an InfiniBand cluster with GPUDirect RDMA.
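
A minimal sketch of the 1D-decomposed halo exchange with a CUDA-aware MPI (illustrative names, not B-CALM's code): d_field is a device pointer passed straight to MPI, letting the library use GPUDirect RDMA where available. The layout assumed here is (nx_local + 2) planes of `plane` floats, with one ghost plane at each end.

    #include <mpi.h>

    void exchange_halos(float *d_field, int nx_local, int plane,
                        int up, int down)
    {
        // Send my top interior plane up; receive my bottom ghost from below.
        MPI_Sendrecv(d_field + nx_local * plane, plane, MPI_FLOAT, up,   0,
                     d_field,                    plane, MPI_FLOAT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Send my bottom interior plane down; receive my top ghost from above.
        MPI_Sendrecv(d_field + plane,                  plane, MPI_FLOAT, down, 1,
                     d_field + (nx_local + 1) * plane, plane, MPI_FLOAT, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }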

Session Level: Beginner
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Clusters & GPU Management; Supercomputing; Computational Physics

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL20D

S4261 - Massively-Parallel Stochastic Control and Automation: A New Paradigm in Robotics

Jonathan Rogers ( Assistant Professor, Georgia Institute of Technology )
Jonathan Rogers
Dr. Jonathan Rogers is an Assistant Professor at the Woodruff School of Mechanical Engineering at the Georgia Institute of Technology, where he is Director of the Intelligent Robotics and Emergent Automation Lab (iREAL). His research interests include stochastic control and dynamics, nonlinear estimation, and the intersection of robotic systems control and high performance computing. Dr. Rogers’ research has been funded by the US Army, US Navy, and various industry sponsors. In 2012, he was the recipient of the Army Research Office Young Investigator Program Award for his work in state estimation of complex dynamical systems.

Uncertainty in locomotion and sensing is one of the primary challenges in the robotics domain. GPUs are emerging as powerful new tools for uncertainty quantification through their ability to perform real-time Monte Carlo simulation as part of a closed-loop control system. By coupling GPU-based uncertainty propagation with optimal control laws, robotic vehicles can "hedge their bets" in unknown environments and protect themselves from unexpected disturbances. Examples of GPU-based stochastic controllers will be discussed for several robotic systems of interest, including simulated and experimental results demonstrating unique improvements in obstacle avoidance and accuracy. The theoretical concepts behind GPU-based control will be described, allowing application of these control laws to a wide array of robotic systems.
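
As a conceptual sketch of the idea (toy dynamics and names, not the speaker's controllers): candidate control inputs are scored by propagating many noisy rollouts in parallel, one thread per (candidate, sample) pair, and the host then picks the candidate with the lowest expected cost.

    __global__ void rollout_cost(const float *u_candidates, int n_u,
                                 const float *noise, int n_samples,
                                 int horizon, float *cost)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= n_u * n_samples) return;
        int cand = idx / n_samples, samp = idx % n_samples;

        float x = 0.0f, v = 0.0f, c = 0.0f, u = u_candidates[cand];
        for (int t = 0; t < horizon; ++t) {
            float w = noise[samp * horizon + t];   // process disturbance
            v += 0.1f * (u + w);                   // toy double integrator
            x += 0.1f * v;
            c += x * x + 0.01f * u * u;            // quadratic stage cost
        }
        atomicAdd(&cost[cand], c / n_samples);     // Monte Carlo average;
    }                                              // cost[] zeroed by host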

Session Level: All
Session Type: Talk
Tags: Machine Learning & AI; Recommended for All Press

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL20B

S4295 - Deep Optimization of the Parallel Algorithm for Molecular Dynamics Simulations

Witold Rudnicki ( Associate Professor, University of Warsaw, ICM )
Witold Rudnicki is an Associate Professor at the ICM, University of Warsaw. His background is theoretical biophysics, molecular dynamics simulations, bioinformatics and high performance computing. His current research interests concentrate on development of algorithms for hybrid computing architectures in the area of molecular dynamics, Monte Carlo, molecular modelling and machine learning.

An in-depth analysis of optimizations of molecular dynamics code for large-scale simulations will be presented. The optimizations were performed on the GPU port of the IMD code, used for MD simulations of large solid-state systems. Several optimization techniques were developed for the linked-cell protocol of MD simulations: (1) tiling of atom-atom interactions; (2) implementation of the action-reaction principle; (3) removal of redundant atoms and tiles; and (4) pipelining of the computations for subsequent layers of cells. These methods were compared with a brute-force approach and tested on the Fermi and Kepler architectures. The optimizations allowed up to a 5-fold performance improvement over the straightforward port on Kepler and up to a 3-fold improvement on Fermi. Up to 60-fold speedups of the force kernels were observed in comparison with a single CPU core, and a single workstation with a K20 card was equivalent to 64 MPI processes on a cluster.
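
A sketch of optimization (2), the action-reaction principle, in generic CUDA (illustrative, not the IMD port): each pair force is computed once and scattered as both +f and -f with atomics, halving the pair computations at the price of some atomic traffic.

    __device__ float force_law(float dx);   // assumed pairwise force law

    __global__ void pair_forces(const float4 *pos, float *fx,
                                const int2 *pairs, int npairs)
    {
        int k = blockIdx.x * blockDim.x + threadIdx.x;
        if (k >= npairs) return;
        int i = pairs[k].x, j = pairs[k].y;

        float dx = pos[j].x - pos[i].x;  // x component only, for brevity
        float f  = force_law(dx);
        atomicAdd(&fx[i],  f);           // action...
        atomicAdd(&fx[j], -f);           // ...and reaction
    }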

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Physics; Supercomputing

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL21E

S4304 - Batch QR Decomposition of Small Matrices on the GPU Using Givens Rotations

Pierre-Yves Taunay ( Research Programmer, The Pennsylvania State University )
Pierre-Yves Taunay
Pierre-Yves Taunay obtained his Master of Science in Aerospace Engineering from The Pennsylvania State University in 2012, along with a "General Engineer" degree from the Ecole Centrale de Nantes, France. He has since worked as a Research Programmer in the Research Computing and Cyberinfrastructure unit at The Pennsylvania State University. His current research focuses on high performance computing for large-scale engineering and scientific applications such as molecular dynamics, fluid dynamics and plasma physics, using graphics processing units (GPUs) and the Message Passing Interface (MPI) standard.

This work details several GPU implementations of the QR decomposition algorithm using Givens rotations, with a particular focus on large batches of small matrices, showing performance improvements over comparable CPU routines. Each approach consists of successive operations on the input matrix to transform it into the upper triangular matrix R, while accumulating the applied rotations in the matrix Q. Each GPU block operates on one or more matrices, with care taken to avoid thread divergence and large memory transfers.
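
For reference, a single Givens rotation can be sketched as the device routine below (illustrative; sign conventions vary, and a full batched kernel would map one small matrix to a block or warp as described above). It zeroes A(i,j) against the pivot row A(j,j), rotating both rows across columns j..n-1.

    __device__ void givens_zero(float *A, int lda, int i, int j, int n)
    {
        float a = A[j * lda + j];      // pivot element
        float b = A[i * lda + j];      // element to annihilate
        float r = hypotf(a, b);
        if (r == 0.0f) return;
        float c = a / r, s = b / r;

        for (int k = j; k < n; ++k) {  // rotate rows j and i (row-major)
            float tj = A[j * lda + k], ti = A[i * lda + k];
            A[j * lda + k] =  c * tj + s * ti;
            A[i * lda + k] = -s * tj + c * ti;   // A(i,j) becomes 0
        }
    }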

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Numerical Algorithms & Libraries; Supercomputing

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210D

S4471 - High Speed Analysis of Big Data Using NVIDIA GPUs and Hadoop

Partha Sen ( COO, Fuzzy Logix )
Partha Sen
Partha has a passion for solving complex business problems using quantitative methods, data mining and pattern recognition. For a period of about 12 years from 1995 to 2007, Partha pursued this passion as a hobby and developed about 100 algorithms and over 700 quantitative models. These algorithms and models are the basis for the solutions being implemented by Fuzzy Logix today. Before founding Fuzzy Logix, Partha worked at Bank of America where he held senior management positions in the commercial and investment bank and in the portfolio strategies group. In the commercial and investment bank, Partha led the initiative to build a quantitative model driven credit rating methodology for the entire commercial loan portfolio. The methodology is used by the bank for allocating reserves against potential losses from loans. In the portfolio strategies group, Partha led a team to devise various strategies for effectively hedging the credit risk for the bank's commercial loan portfolio and for minimizing the impact of mark-to-market volatility of the portfolio of hedging instruments (Credit Default Swaps, Credit Default Swaptions, and CDS Indexes). Partha was also responsible for managing the Quantitative Management Associate Program at Bank of America. This is a two-year associate development program which has groomed over 75 quantitative managers within the enterprise. Prior to working at Bank of America, Partha held managerial positions at Ernst and Young and Tata Consultancy Services. He has a Bachelor of Engineering, with a major in computer science and a minor in mathematics from the Indian Institute of Technology. He also has an MBA from Wake Forest University.

Performing analytics on data stored in Hadoop can be time-consuming. While Hadoop is great at ingesting and storing data, getting timely insight out of the data can be difficult, which reduces effectiveness and time-to-action. Using NVIDIA GPUs to accelerate analytics on Hadoop is an optimal solution that delivers high price-to-performance benefits. In this session, we'll demonstrate a solution using NVIDIA GPUs for the analysis of big data in Hadoop. The demo will show how you can leverage the Hadoop file system, its MapReduce architecture and GPUs to run computationally intense models, bringing together both data and computational parallelism. Methods demonstrated will include classification techniques such as decision trees, logistic regression and support vector machines, and clustering techniques like k-means, fuzzy k-means and hierarchical k-means, on marketing, social and digital media data.
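
To illustrate one of the named techniques, here is a generic sketch of the assignment step of k-means on the GPU (not the presenter's implementation): one thread per data point, D-dimensional points and centroids stored row-major.

    __global__ void assign_clusters(const float *pts, const float *centroids,
                                    int *label, int n, int k, int d)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        int best = 0;
        float best_d = 1e30f;
        for (int c = 0; c < k; ++c) {
            float dist = 0.0f;                     // squared distance
            for (int j = 0; j < d; ++j) {
                float diff = pts[i * d + j] - centroids[c * d + j];
                dist += diff * diff;
            }
            if (dist < best_d) { best_d = dist; best = c; }
        }
        label[i] = best;                           // nearest centroid
    }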

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Finance; Bioinformatics & Genomics; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210B

S4513 - GPU Acceleration of Processing and Visualization for Various Optical Coherence Tomography Methodologies

Kevin Wong ( Graduate Student Researcher, Simon Fraser University )
Kevin Wong
Kevin Wong is a Master’s Student in the Biomedical Optics Research Group at Simon Fraser University. He received his Bachelor of Applied Science degree in Biomedical Engineering at Simon Fraser University. He developed his interest in GPU computing during a research project on the acceleration of Fourier Domain Optical Coherence Tomography processing. His graduate research concentrates on further improving high performance computing by exploring the potential of multi-GPU solutions for the FDOCT processing pipelines.

The goal of this session is to explore the many GPU computing applications for accelerating the processing pipeline and visualization algorithms in Fourier Domain Optical Coherence Tomography (FDOCT) for medical applications, such as ophthalmic imaging. We will describe the GPU programming techniques that we used for accelerating and optimizing real-time FDOCT processing, which is currently the fastest GPU implementation for FDOCT to the best of our knowledge. We will demonstrate two additional novel variations of functional OCT imaging made possible by GPU acceleration: real time speckle variance OCT (svOCT) to visualize the vasculature network in the retina, and wavefront sensorless adaptive optics OCT for ultrahigh resolution volumetric imaging. We will present videos to illustrate the use of our GPU-based FDOCT processing and the imaging applications in a clinical environment.
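
For orientation, the computational heart of a typical FDOCT pipeline is a batched 1-D FFT that converts each spectral interferogram into an axial scan. The cuFFT sketch below is illustrative only and is not the group's implementation; the sizes (2048 samples per A-scan, 1000 A-scans per frame) are assumed numbers.

    #include <cufft.h>
    #include <cuda_runtime.h>

    int main()
    {
        const int ascan_len = 2048;   // hypothetical samples per A-scan
        const int batch     = 1000;   // hypothetical A-scans per frame

        cufftComplex *d_spectra;
        cudaMalloc(&d_spectra, sizeof(cufftComplex) * ascan_len * batch);
        // ... upload background-subtracted, wavenumber-resampled spectra ...

        cufftHandle plan;             // one plan transforms the whole frame
        cufftPlan1d(&plan, ascan_len, CUFFT_C2C, batch);
        cufftExecC2C(plan, d_spectra, d_spectra, CUFFT_FORWARD);  // in place

        cufftDestroy(plan);
        cudaFree(d_spectra);
        return 0;
    }

In a real pipeline, background subtraction and resampling kernels would run before this FFT, and log-scaling and rendering kernels after it.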

Session Level: All
Session Type: Talk
Tags: Medical Imaging & Visualization

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL21B

S4576 - Exploring New Optimizations for Hybrid Programming Using OpenSHMEM and OpenACC

Oscar Hernandez ( Researcher, Oak Ridge National Laboratory )
Oscar Hernandez
Oscar Hernandez is a member of the Computer Science and Mathematics Division at Oak Ridge National Laboratory. He works on code analysis and transformation tools to support the NCCS and OLCF project activities. His research focus has been on compilers, the integration of static analysis and performance tools, and optimization techniques for parallel languages, especially OpenMP and accelerator directives. He represents ORNL at the OpenACC, OpenMP ARB and OpenSHMEM organizations. His work also involves the research and development of tools for porting codes to future platforms, and innovative ways to increase programming productivity. He also works on static analysis tools for OpenSHMEM programs. Prior to this work, he was a developer of the OpenUH compiler and the Dragon Analysis Tool for analyzing OpenMP programs. Oscar graduated from the University of Houston with a Ph.D. and M.Sc. in the area of compilers and high performance computing.

With the new accelerator-based systems, heterogeneous hybrid programming models are the natural choice for exploiting the available hardware. On accelerator-based systems, previous efforts on hybrid models have primarily focused on using MPI (for inter-node programming on a cluster) in combination with OpenACC/CUDA/OpenCL/HMPP (for intra-node programming on the accelerator). As accelerators get added into the mix and hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this session we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library, for one-sided communication between nodes. We will also discuss how these two specifications interoperate and what new features are needed in the specifications to make this hybrid programming model work better.

Session Level: Intermediate
Session Type: Talk
Tags: Programming Languages & Compilers; Supercomputing

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL20C

S4628 - BWT Indexing: Big Data from Next Generation Sequencing and GPU

Jeanno Cheung ( Research Engineer, HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory )
Jeanno Cheung
Jeanno Cheung is a research engineer in HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory. He works on bioinformatics applications and other projects that utilize parallel computing platforms such as CUDA and MIC.

With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 Gigabases in a day). Compressed full-text indexing based on the BWT has been found to be very useful in speeding up the analysis of high-throughput sequencing data. In this talk we consider two major problems in this context: alignment of sequencing data onto a reference genome (for detecting genetic variations), and indexing of the sequencing data itself. These two problems have different applications and different technical challenges. We show how the GPU can be exploited to achieve tremendous improvement in each case. In particular, our alignment solution makes it feasible to conduct NGS analysis even in the time-critical clinical environment; for example, 30+ fold whole genome sequencing data of a human (~100 Gigabases) can be aligned and analyzed in a few hours, with sensitivity and accuracy even higher than before.
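
For readers unfamiliar with BWT indexing, the core operation behind such aligners is FM-index backward search, which maps naturally to one GPU thread per read. The sketch below is a simplification, not the speakers' implementation: it assumes a full occurrence table in memory (real indexes store sampled checkpoints to save space), and all names are hypothetical.

    // Hypothetical FM-index layout: occ[c * (n + 1) + i] = occurrences of base c
    // in BWT[0..i-1]; C[c] = count of symbols lexicographically smaller than c.
    // One thread performs exact-match backward search for one read.
    __global__ void backward_search(const int *C, const int *occ, int n,
                                    const char *reads, int read_len,
                                    int num_reads, int2 *ranges)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        if (r >= num_reads) return;

        const char *p = reads + r * read_len;
        int sp = 0, ep = n;                   // suffix-array interval [sp, ep)

        for (int i = read_len - 1; i >= 0 && sp < ep; --i) {
            int c = p[i];                     // base encoded as 0..3
            sp = C[c] + occ[c * (n + 1) + sp];
            ep = C[c] + occ[c * (n + 1) + ep];
        }
        ranges[r] = make_int2(sp, ep);        // empty interval => no exact match
    }

Each backward-search step narrows the suffix-array interval by one base; a non-empty final interval gives the positions of all exact matches of the read.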

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL21D

S4669 - Supercharging Engineering Simulations at Mercury Marine with NVIDIA GPUs

Arden Anderson ( Technical Specialist - Computational Analysis, Mercury Marine )
Arden Anderson
Arden Anderson is responsible for structural analysis, crashworthiness, and vessel performance simulations at Mercury Marine. He also determines computing hardware requirements and has helped Mercury Marine transition to High Performance Computing (HPC). Prior to joining Mercury Marine in 2005, Mr. Anderson spent three years as an Engineering Analyst at Lawrence Livermore National Laboratory simulating blast loading and hypervelocity impact. At LLNL he was exposed to world class HPC environments, including the fastest computer in the world at that time (BlueGene/L). Mr. Anderson holds a BS and MS in Engineering Mechanics from the University of Wisconsin – Madison.

Mercury Marine will discuss their recent evaluation of NVIDIA GPUs for accelerating Abaqus FEA performance. As part of the talk, Arden will highlight the critical metrics for the evaluation, and how they chose between placing the GPUs at the local desktop or in the back-room cluster. Arden will also discuss the business impact for the company of using a GPU-accelerated FEA implementation. Lastly, Arden will discuss what Mercury sees as the future potential for leveraging GPUs as part of their design workflow.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Computational Structural Mechanics; Clusters & GPU Management; Computational Fluid Dynamics

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210H

S4717 - Our Ride: Taking Design Offroad with the Carducci Dual Sport SC3 Adventure Motorcycle

Jim Carducci ( President & Co-founder, Carducci Dual Sport LLC )
Jim Carducci
Jim is the President and Co-founder of Carducci Dual Sport LLC, responsible for the design, development and testing of the SC3 Adventure dual sport motorcycle. He's been riding, racing and customizing motorcycles for over 40 years, and has a 32-year background in mechanical systems design engineering in the Silicon Valley semiconductor and aerospace industries. Jim started his career as a mechanical design engineer and worked his way up to Senior Director of Engineering, leading a team that develops new technologies. Jim's education includes a BS in Industrial & Systems Engineering from San Jose State University and graduate study in Electrical & Mechanical Engineering at Santa Clara University and SJSU.

Unusually for custom motorcycle builders, before any metal is cut, Carducci Dual Sport utilizes CAD and computer graphics tools to design, analyze, and render new SC3 Adventure dual sport motorcycles. The tools minimize time-consuming and costly redesigns and rework by enabling design architecture trade-offs, structural analysis of stressed components for safety, CFD heat transfer analysis for reliability, and 3D renders of the full model to visualize the motorcycle. This talk is an overview of the SC3 Adventure motorcycle, the CAD/CG tools, and the design and development process used to create an innovative, reproducible custom dual sport motorcycle.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210G

S4757 - GPU Acceleration of the Fine/Turbo CFD Solver at the HPC Scale

David Gutzwiller ( Software Engineer, Numeca-USA )
David Gutzwiller
David joined Numeca USA in 2009 after completing a graduate degree in Aerospace Engineering at the University of Cincinnati. At Numeca, David has worked on the development of the Fine/Turbo solver for massively parallel execution.

In response to long-term trends in power consumption and cost, the GPU has become a common component in many of the world's largest supercomputers. However, industrial CFD solvers are typically not well suited for quick or effective GPU porting. Successfully utilizing the computational power of the GPU in an HPC environment requires very careful planning. This talk explores the GPU acceleration of the Fine/Turbo multi-block structured CFD solver through targeted porting of a computationally expensive module. The session will begin with an overview of the "CPUBooster" convergence acceleration module, which was chosen as a promising candidate for acceleration. The restructuring of this module with a GPU-oriented programming model and the tuning of the new implementation for optimal performance will also be explored. Further discussion will highlight the positive impact of the GPU developments from an HPC user's point of view. Recent design optimization work by Ramgen Power Systems on the ORNL Titan supercomputer will showcase the performance gains. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL21C

S4169 - Accelerate Distributed Data Mining with Graphics Processing Units

Nam-Luc Tran ( R&D Engineer, EURA NOVA )
Nam-Luc Tran
A 2010 graduate of the ULB in the department of biomedical engineering, Nam-Luc is currently an R&D engineer at EURA NOVA, a private research company. Nam-Luc has led many research projects in Big Data fields such as storage, distributed processing and architecture, with multiple collaborations with the ULB and the UCL.

Numerous distributed processing models have emerged, driven by (1) the growth in volumes of available data and (2) the need for precise and rapid analytics. The most famous representative of this category is undoubtedly MapReduce; however, other more flexible models exist based on the DFG processing model. None of the existing frameworks, however, has considered the case in which the individual processing nodes are equipped with GPUs to accelerate parallel computations. In this talk, we discuss this challenge and the implications of the presence of GPUs on some of the processing nodes, both for the DFG representation of such heterogeneous jobs and for their scheduling, with big data mining as the principal use case.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210B

S4273 - GPU-Accelerated Primary Analysis Pipeline in Ion Proton DNA Sequencer

Mohit Gupta ( Staff Software Engineer, Life Technologies )
Highly-Rated Speaker
Mohit Gupta
Mohit Gupta is a Staff Software Engineer in the Ion Torrent division of Life Technologies. In this capacity, he is responsible for speeding up algorithms used in the data analysis of PGM and Proton DNA sequencers, with a particular focus on GPU computing. Prior to his stint at Life, he worked as a Senior Research and Development Engineer with Mirafra Technologies, Bangalore, India, in the area of Electronic Design Automation, working on compilers for hardware description languages like Verilog. He holds a B.Tech in electrical engineering from the Indian Institute of Technology, Bombay, India and an M.S. in Computer Engineering from the University of California, San Diego. He has published or presented in conferences and workshops such as ICCAD, GTC and DFMY.
Jakob Siegel ( Software Engineer, Life Technologies )
Jakob Siegel
Jakob Siegel is a Software Engineer at Life Technologies currently focusing on high performance computing tasks in the context of DNA sequencing. Jakob graduated as a Dipl.-Ing. in Software Engineering from the University of Applied Sciences in Esslingen, Germany, after which he also earned an M.S. and a Ph.D. in Electrical and Computer Engineering from the University of Delaware. He has been involved in software projects in a variety of fields, from pure computer science through the automotive sector, naval communication systems, and atmospheric research, until he joined the Ion Torrent team in January 2012 to work on the software side of DNA sequencing. Jakob has published or presented in multiple computer engineering conferences, workshops and journals, for example: Computing Frontiers, ICS, ICPPW, GTC, AACEC, JACT.

Learn how GPUs are being exploited to conquer the compute challenge of processing terabytes of experimental data generated by the Ion Proton DNA sequencer, by accelerating data compression and signal processing algorithms and fast-forwarding the journey towards personalized medicine. In this talk, we will discuss our DNA sequencing technology, the data compression and fitting algorithms we use and their GPU implementations, and a streaming execution model that overlaps data transfer and kernel execution for this high-throughput system.

Session Level: Beginner
Session Type: Talk
Tags: Bioinformatics & Genomics; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room LL21D

S4279 - Resolving False Dependence on Shared Memory

Patric Zhao ( GPU Architect, NVIDIA )
Patric Zhao
Patric is a senior GPU architect on NVIDIA's Shanghai architecture team. He works on HPC projects including molecular dynamics, fluid dynamics and big data analysis, and has extensive experience in parallel programming design and implementation. Before joining NVIDIA, he was a senior software engineer at Cadence, in charge of distributed processing for EDA tools.

The large shared memory provided by the GPU can hugely improve application performance, and the shared memory programming model is widely used for commercial and scientific purposes. However, many barriers arise when shared memory is employed immoderately, causing most of the running time to be wasted on synchronization. Furthermore, false dependences occur in some cases and may dramatically depress performance. In this session, we demonstrate how to identify false dependence issues, and we propose various strategies and solutions to deal with them at both the application-algorithm and GPU-kernel levels. A performance analysis of NAMD, a very popular molecular dynamics program, is presented along with example code. By applying our strategies, the effective occupancy is improved to 0.98 and the synchronization time is reduced by 70%, which finally brings about a 30% performance improvement. A generic sketch of one such strategy appears below.
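
The speaker's NAMD-specific code is not reproduced here; as one generic, hedged illustration of trading block-wide shared-memory barriers for cheaper warp-level communication, the sketch below replaces the classic shared-memory reduction (one __syncthreads per step) with register shuffles, leaving a single block-wide barrier.

    // Butterfly reduction within one warp: registers only, no barrier.
    // (__shfl_down_sync is the modern spelling; 2014-era code used __shfl_down.)
    __inline__ __device__ float warp_reduce_sum(float v)
    {
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffffu, v, offset);
        return v;
    }

    // Block-wide sum with a single __syncthreads (assumes blockDim.x % 32 == 0).
    __global__ void block_sum(const float *in, float *out, int n)
    {
        __shared__ float warp_partials[32];   // one slot per warp

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float v = (i < n) ? in[i] : 0.0f;     // no early return: keep warps full

        v = warp_reduce_sum(v);               // intra-warp, barrier-free

        int lane = threadIdx.x & 31, warp = threadIdx.x >> 5;
        if (lane == 0) warp_partials[warp] = v;
        __syncthreads();                      // the only block-wide barrier

        if (warp == 0) {                      // first warp combines the partials
            v = (lane < blockDim.x / 32) ? warp_partials[lane] : 0.0f;
            v = warp_reduce_sum(v);
            if (lane == 0) out[blockIdx.x] = v;
        }
    }

Because data moves through registers within each warp, no synchronization is needed between warps until the final merge, removing one common source of false dependence between unrelated warps.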

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization; Molecular Dynamics

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 212B

S4355 - High Performance Edge-Preserving Filter on GPU

Jonas Li ( GPU Architect, NVIDIA )
Jonas Li
Jonas joined NVIDIA in April 2013 and is a GPU architect on NVIDIA's Shanghai architecture team. He works on CUDA/OpenCL application profiling and optimization, and has extensive experience in parallel programming and performance tuning.

The goal of this session is to show the GPU implementation of a novel approach for performing high-quality edge-preserving filtering of images and videos in real time. A variety of effects can be achieved with this filter, including edge-preserving smoothing, depth-of-field effects, and stylization. We developed a CUDA-based high-performance GPU implementation of the edge-preserving filter. In this session, we will present our efforts to address some of the challenges of optimizing its performance on the GPU, touching on issues such as highly dependent workloads, warp synchronization, divergent memory access and transposed data storage. With these optimizations applied, the GPU implementation can filter 256 megapixels of color images per second on a Tesla K20c card.

Session Level: Intermediate
Session Type: Talk
Tags: Video & Image Processing; Performance Optimization; Mobile Applications

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL21A

S4363 - Accelerated X-Ray Imaging: Real-Time Multi-Plane Image Reconstruction with CUDA

Prashanth Bhat ( CTO, Manipal Dot Net Pvt. Ltd. )
Prashanth Bhat
Dr. Prashanth Bhat is Chief Technology Officer and Executive Director (Software) at Manipal Dot Net Pvt. Ltd., India, a technology outsourcing company that takes on software development and hardware design projects for worldwide clients. His areas of expertise include high-performance parallel computing, GPU acceleration using CUDA, search engine technology, and embedded systems. Prior to joining Manipal Dot Net in 2007, he worked in the search engine industry for over eight years, during his tenures at Yahoo! Inc (USA), Overture Services, and Alta Vista Search. In these roles, he contributed to the core search engine, the Contextual Match advertising infrastructure, and a distributed machine learning architecture. As a summer intern at HP Research Labs, Palo Alto, he developed new process scheduling techniques for HP's high-end parallel servers. Dr. Prashanth Bhat graduated with a PhD in Computer Engineering from the University of Southern California, Los Angeles. He holds 3 US patents in the fields of high performance computing and search engines, and has authored about 15 international publications.

Explore the realm of modern X-ray Fluoroscopy, where ever-increasing data rates and computational requirements are the norm. This session presents an efficient and scalable CUDA solution for multi-plane image reconstruction, an essential yet computationally challenging component of these systems. Our parallelization strategy incorporates several non-trivial techniques to improve performance: (a) reduce run-time computations by using pre-computed LUTs; (b) reduce memory bandwidth consumption by accumulating computations in registers before writing to memory; (c) exploit 2D data locality by using the GPU's texture memory and cache; (d) optimize occupancy by tuning the thread-block configuration. We present experimental results on three Kepler GPUs: GeForce GTX690, Tesla K10, and Tesla K20. On the GTX690, we show real-time rates of 15 fps for 32 1000x1000 image planes, with speed-ups of 6000x over a CPU implementation, and 10x over an alternative CUDA approach. On both Tesla GPUs, we show linear scaling, making a multi-GPU solution viable. A sketch of techniques (a) and (b) follows.
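
This hedged CUDA sketch shows how (a) and (b) might look in isolation: per-pixel LUT entries supply sample indices and weights, and the sum stays in a register until a single store. The kernel shape, LUT layout, and 16-sample count are assumptions, not the presenters' code.

    #define NSAMP 16  // hypothetical samples contributing to each output pixel

    // Each thread owns one pixel of one reconstructed plane.  Technique (a):
    // sample indices and weights come from precomputed LUTs.  Technique (b):
    // the sum stays in a register until a single global-memory store.
    __global__ void reconstruct_plane(const float *detector,
                                      const int *lut_idx, const float *lut_w,
                                      float *plane, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int pixel = y * width + x;
        float acc = 0.0f;                     // register accumulator
        for (int s = 0; s < NSAMP; ++s) {
            int e = pixel * NSAMP + s;
            acc += lut_w[e] * detector[lut_idx[e]];
        }
        plane[pixel] = acc;                   // one global-memory write per thread
    }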

Session Level: All
Session Type: Talk
Tags: Medical Imaging & Visualization; Video & Image Processing; Ray Tracing

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room LL21B

S4371 - AMR Based on a Space-Filling Curve for Stencil Applications

Takayuki Aoki ( Professor, Tokyo Institute of Technology )
Takayuki Aoki received a BSc in Applied Physics, an MSc in Energy Science and Dr.Sci (1989) from Tokyo Institute of Technology, has been a professor in Tokyo Institute of Technology since 2001 and the deputy director of the Global Scientific Information and Computing Center since 2009. He received the Minister award of the Ministry of Education, Culture, Sports, Science & Technology in Japan and many awards and honors in GPU computing, scientific visualization, and others. He was the leader of the team of the Gordon Bell Prize in 2011 and also recognized as a CUDA fellow by NVIDIA in 2012.

AMR (Adaptive Mesh Refinement) is an efficient method capable of assigning a mesh of appropriate resolution to any local area. It has great advantages in computational cost and memory usage for practical stencil applications such as computational fluid dynamics. With the octree data structure, the refinement process is recursive and the computation is carried out on the leaf meshes. By using larger leaves than on the CPU, we can assign a CUDA block to each leaf with a sufficient number of threads. We show a GPU implementation in which the leaves are connected by the Hilbert space-filling curve and discuss the overhead of the data management.
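
The Hilbert-curve arithmetic used in the talk is lengthy; as a simpler stand-in that conveys the same idea of linearizing octree leaves along a locality-preserving curve, here is a Morton (Z-order) key sketch. This is a deliberate substitution for illustration only; the Hilbert curve the speaker uses has better locality.

    // Spread the low 10 bits of v so each lands 3 positions apart.
    __host__ __device__ unsigned int expand_bits(unsigned int v)
    {
        v = (v | (v << 16)) & 0x030000FFu;
        v = (v | (v <<  8)) & 0x0300F00Fu;
        v = (v | (v <<  4)) & 0x030C30C3u;
        v = (v | (v <<  2)) & 0x09249249u;
        return v;
    }

    // 30-bit Morton key for leaf coordinates x, y, z (each < 1024).
    __host__ __device__ unsigned int morton3d(unsigned int x, unsigned int y,
                                              unsigned int z)
    {
        return (expand_bits(x) << 2) | (expand_bits(y) << 1) | expand_bits(z);
    }

Sorting leaves by such a key keeps spatially neighboring leaves close in memory, so consecutive CUDA blocks work on nearby data.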

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Fluid Dynamics; Supercomputing

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL20D

S4405 - Making It Fast and Reliable: Speech Recognition with GPUs by Sequential Utilization of Available Knowledge Sources

Alexei V. Ivanov ( Research Scientist, Pervoice SPA )
Alexei V. Ivanov
Alexei V. Ivanov has a background in engineering and computer science. He received his Ph.D. in Theoretical Foundations of Computer Science in 2004 from Belarussian State University of Informatics and Radioelectronics. He also holds a MSc degree in Electrical Engineering from Moscow Institute of Physics and Technology (State University). He has working experience both in academia (University of Trento, Moscow Institute of Physics and Technology) and industry (Pearson Knowledge Technologies, USA; Speech Technology Center, Russia; Lernout & Hauspie Speech Products NV, Belgium). Alexei has broad experience in speech processing and recognition systems. His current research interests include adaptive conversational machines; web-integration of individual multimedia experiences; speech characterization technology; integration of para-linguistic knowledge into the process of speech recognition and interpretation.
Fabio Brugnara ( Senior Research Scientist, Fondazione Bruno Kessler (FBK), Trento, Italy )
Fabio Brugnara
Fabio Brugnara has been a researcher at FBK/ITC-irst since 1990. His research has since then been focused on automatic speech recognition, with special interests in acoustic modelling, acoustic model training and adaptation, large vocabulary decoding and language modelling. He has been head of the CASPER research group (Comprehensive Automatic SPEech Recognition). He co-authored more than 40 scientific publications, and served as a reviewer for international conferences and journals. He teaches Speech Recognition courses at the Trento University and coordinates the Speech Processing course at the MHLTI Master at the same University.

Join application field experts for a discussion of speech recognition/understanding methods in which the original task is factorized into smaller sub-tasks that can individually be implemented efficiently on the GPU. Speech recognition is an instance of a multi-criteria optimization problem that allows sequential refinement of a pool of reasonably plausible hypotheses by pruning according to several optimality criteria originating from independent knowledge sources. Intermediate results are represented by graph-like objects: lattices and confusion networks of alternatives. Besides compactness, this representation allows for efficient quantification of the system's confidence in the output. Computation time analysis reveals improvements over the traditional implementation of a speech recognizer. A live demonstration will illustrate the advantages of the proposed approach.

Session Level: All
Session Type: Talk
Tags: Signal & Audio Processing; Defense; Recommended for All Press

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210D

S4423 - Black Holes on the GPU: Experiences with Accelerated Relativity

Adam Lewis ( Graduate Student, University of Toronto/ CITA )
Adam earned his bachelor's degree in physics from Hamilton's McMaster University and is now entering the second year of his PhD at the University of Toronto. As part of a Caltech-Cornell-Toronto collaboration, he has spent much of the past year working on the GPU acceleration of SpEC, a code which performs detailed simulations of black hole mergers.

New "telescopes" that directly observe the spacetime fluctuations from black holes will come online within the next few years, but the data they generate will be meaningless unless compared against banks of known signals. Creating these banks requires black hole mergers of many different masses, spins, and orbital eccentricities to be simulated. This is not yet feasible, since even a single simulation may take several months. GPU acceleration offers a theoretical speedup of 50X, but until now has been too laborious to attempt. This is no longer the case: using a combination of hand-coding in CUDA, calls to CUBLAS and cuSPARSE, and our own automatic porting routine "CodeWriter," we have successfully accelerated the C++-based "Spectral Einstein Code". I will discuss our porting strategy, the challenges we encountered, and the new science made possible by the GPU. This talk should be of particular interest to scientists working on GPU ports of their own codes.

Session Level: Advanced
Session Type: Talk
Tags: Astronomy & Astrophysics; Programming Languages & Compilers; Computational Physics; Supercomputing

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL21F

S4449 - Product Innovation Using Private & Public Cloud

Ravi Kunju ( VP of Strategy and Business, Altair )
Ravi Kunju has over 20 years of experience in applying advanced numerical methods and analytics, specifically in HPC, to solve complex problems in the areas of CAE and BI. Ravi's career has spanned working at Ford and Chrysler in the areas of crash-safety and advanced manufacturing (sheet metal forming), and at Altair in product design, software product management, and executive roles in global sales, regional management, and strategic marketing. Ravi has an M.S. in Mechanical Engineering from Wayne State University, and an MBA from the Ross School of Business, University of Michigan, Ann Arbor.

Simulation-driven product innovation leads to extensive design exploration that traditionally requires significant investment in computing infrastructure. Cloud-based solutions have promising potential to become a channel for such massive computations; however, the biggest challenge is visualizing the big data generated by these large computations. A software- and hardware-engineered appliance that provides a unified interface for the entire product simulation lifecycle will be demonstrated with examples, as the framework for Altair's private and public cloud offerings.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Computational Structural Mechanics; Digital Product Design & Styling; Digital Manufacturing Summit

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210F

S4536 - An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management

Yigal Jhirad ( Head of Quantitative Strategies and Risk Management , Cohen & Steers )
Yigal  Jhirad
Yigal D. Jhirad, Senior Vice President, is Director of Quantitative Strategies and a Portfolio Manager for Cohen & Steers’ options and real assets strategies. Mr. Jhirad heads the firm’s Investment Risk Committee. He has 26 years of experience. Prior to joining the firm in 2007, Mr. Jhirad was an executive director in the institutional equities division of Morgan Stanley, where he headed the company’s portfolio and derivatives strategies effort. He was responsible for developing, implementing and marketing quantitative and derivatives products to a broad array of institutional clients, including hedge funds, active and passive funds, pension funds and endowments. Mr. Jhirad holds a BS from the Wharton School. He is a Financial Risk Manager (FRM), as Certified by the Global Association of Risk Professionals. He is based in New York.
Blay Tarnoff ( Senior Software Engineer, Cohen & Steers )
Blay Tarnoff
Blay Tarnoff is a senior applications developer and database architect. He specializes in array programming and database design and development. He has developed equity and derivatives applications for program trading, proprietary trading, quantitative strategy, and risk management. He is currently a consultant at Cohen & Steers and was previously at Morgan Stanley.

This session discusses the convergence of parallel processing and big data in finance as the next step in the evolution of risk management and trading systems. We advocate that risk management in finance should evolve from traditional inter-day, top-down metrics to an intra-day, bottom-up approach using signal generation and pattern recognition. We have also found that parallel processing is a key tool for absorbing greater insight into market patterns, providing "trading DNA" and more effective tools to manage risk in real time.

Session Level: All
Session Type: Talk
Tags: Finance; Big Data Analytics & Data Algorithms; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room 210C

S4604 - BUDE: GPU-Accelerated Molecular Docking for Drug Discovery

Richard Sessions ( Senior Research Fellow, University of Bristol )
Richard Sessions
Richard Barry Sessions (RBS) started his research career as a physical-organic chemist with Professor RW Alder (FRS, Bristol) and held a Royal Society Fellowship with Professor JM Lehn (Nobel Laureate, Strasbourg). He migrated to molecular modelling in the mid 1980s, performing some of the first protein molecular dynamics simulations in the UK in David Osguthorpe's group (Bath). After a period at Cambridge University (UK), he became the principal molecular modeller in the School of Biochemistry at the University of Bristol, UK, where he continues to develop methods and collaborate with experimental groups. He has published over 160 peer-reviewed papers in the scientific literature.

The Bristol University Docking Engine (BUDE) is next-generation molecular docking software exploiting GPUs to deliver a step change in performance. Massive sampling of the search space, coupled with a novel method for estimating the free energy of binding between the receptor and ligand (the docking partners), enables novel science. BUDE and a medium-sized GPU-enabled supercomputer can be used to perform (1) virtual screening by docking of 10 million drug-like molecules against a protein for drug discovery in a few days; (2) scanning of the surface of a protein with hundreds of drug-like molecules to locate binding sites; (3) protein-protein docking in real space for predicting important protein interactions involved in cellular signaling. In recent optimization work with BUDE, we have achieved a sustained 46% of theoretical peak FLOPs on an NVIDIA GTX680.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Bioinformatics & Genomics; Supercomputing

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL21E

S4627 - Social, Mobile, Cloud, GPU: The Technology Stack For Untethered Product Development

Randall Newton ( Managing Director, Consilia Vektor )
Randall Newton
Randall S. Newton brings 25+ years of computer software industry experience to his current work in business intelligence, leadership training, and effective marketing in an era of digital disruption. Past employers include Autodesk, Bentley Systems, CADCAMnet, and Jon Peddie Research.

High-performance visual computing is the 'missing link' that allows for a new technology stack in product development, when combined with mobile, social, and cloud technologies. This session will explore new, untethered uses for existing product development tools, and new products and techniques only possible with the addition of GPU technology.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Digital Product Design & Styling; Mobile Applications

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210H

S4737 - On-Line and Batch Stitching of Gigapixel Images Using OpenGL and CUDA Frameworks

Daniel Marks ( Assistant Research Professor, Electrical and Computer Engineering, Duke University )
Daniel Marks
Daniel L. Marks is an Associate Research Professor of Electrical and Computer Engineering at Duke University. Dr. Marks is the lead optical and systems engineer on the DARPA AWARE program. He has made many contributions to computational imaging and real-time image processing, including novel methods in optical coherence tomography, compressive millimeter wave imaging and multiscale lens design.

We present GPU-based methods for generating gigapixel-scale image renderings from the AWARE multi-scale gigapixel cameras. We demonstrate a streaming zoomable gigapixel video interface, allowing viewers to digitally zoom by 30x over a 100 degree field of view. We also discuss adaptive batch gigapixel image stitching for online distribution. We compare the performance and utility of OpenGL-based image rendering through the traditional GPU video pipeline and CUDA-based image rendering via GPGPU methods.

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing; Media & Entertainment; Computational Photography

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 211B

S4758 - Accelerating Three-Body MD Potentials Using NVIDIA Tesla K20X GPUs

Masako Yamada ( Physicist, Advanced Computing Lab, GE Global Research )
Masako Yamada
Dr. Yamada is a Senior Scientist at GE Global Research. She completed her Ph.D. at Boston University, where she was a member of the Center for Computational Science. Dr. Yamada recently won the IDC HPC Innovation Excellence Award and HPCwire Editors’ Choice Award, and is an HPCwire 2014 Person to Watch.

We will give an overview of porting a three-body molecular dynamics potential from the CPU-only Jaguar supercomputer to the hybrid CPU/GPU Titan at Oak Ridge National Lab. We achieved >5x acceleration in a 1-million-molecule droplet simulation by moving the three-body potential and neighbor lists to the Tesla K20X GPU accelerator, while keeping the time integration, thermostat/barostat, bond/angle calculations and statistics on the AMD Opteron CPUs. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.

Session Level: Advanced
Session Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL21C

S4821 - Interactive Simulations (where nobody has to die)

Robert Hodgin
Robert  Hodgin
Robert is a co-creator of the Cinder C++ framework and was a founding partner of the Barbarian Group. He also worked with Bloom to create the Planetary app which was the first code acquisition by the Smithsonian Cooper-Hewitt National Design Museum. He works primarily in C++, OpenGL, and GLSL. His coding style has been described as "sloppy" and "tightly coupled" but he still manages to create beautiful work if he does say so himself. Which he does. Often. He lives and works in Brooklyn.

My work over the last few years has trended towards three main categories: Flocking Simulations, Terrain Engines, and Astronomical Visualizations. Additionally, I strive for real-time so most of my work can be experienced with peripherals like the Oculus Rift VR headset or the LEAP Motion Controller. I will showcase a few of these projects and explain some of the methodology and creative influences that helped shape the end result. I will show no code, but I hope to inspire.

Session Level: All
Session Type: Talk
Tags: Real-Time Graphics Applications; NVScene

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room 230C

S4836 - Merging ADAS and Infotainment to Create a Connected, Cloud-Enhanced, Vehicle Safety System

Roger Lanctot ( Associate director, Global Automotive Practice, Strategy Analytics )
Roger Lanctot has a powerful voice in the definition of future trends in automotive safety and infotainment systems. Roger draws on 25 years' experience in the technology industry as an analyst, journalist and consultant. Roger has conducted and participated in major industry studies, created new research products and services, and advised clients on strategy and competitive issues throughout his career. Roger is a prolific writer and blogger and is a frequent featured speaker at industry events on such topics as driver distraction, smartphone connectivity, customer relationship management and usage-based insurance. He has an AB in English from Dartmouth College.

The barriers between in-vehicle infotainment systems and safety systems are falling and off-board connections are proliferating. The combination of these two trends is enabling entirely new user experiences in connected vehicles while setting the stage for new revenue opportunities. The challenge for car makers and their suppliers is to build big data opportunities upon consumer usage, vehicle diagnostic and environmental data. Leveraging vehicle data will create new value propositions, mitigate driver distraction and help drivers avoid accidents and, eventually, deliver autonomous driving. But justifying and implementing connectivity is still in its earliest phases. This session will help define the combination of applications, services, content and enabling technologies that will speed enhanced safety to the market.

Session Level: All
Session Type: Talk
Tags: Automotive; Big Data Analytics & Data Algorithms; In-Vehicle Infotainment (IVI) & Safety

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210A

S4844 - Using GPUs in the Cloud for Scalable HPC in Engineering and Manufacturing

David Pellerin ( Business Development Principal, High Performance Computing , Amazon Web Services )
David Pellerin is a Business Development Principal focusing on commercial HPC at Amazon Web Services. Prior to joining Amazon, David’s career included a number of ventures related to accelerated and high performance embedded computing, serving application domains including financial computing, life sciences, image processing, and electronic design automation. David is the author of five Prentice Hall books, most recently including Practical FPGA Programming in C.

The use of HPC Cloud Computing environments continues to accelerate with the advent of higher performing infrastructures and capabilities. In this presentation, AWS and HGST will provide a view into specific use cases that highlight how HPC and GPU Cloud Computing is being used as a competitive advantage with CAD/CAM and electronic design automation (EDA), with the ability to spin up clusters running HPC applications. Real-world manufacturing use cases will be discussed.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Digital Manufacturing Summit; Remote Graphics & Cloud-Based Graphics

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room LL20B

S4845 - Leveraging a Super Computer to Achieve Real Time Interaction for a Digital Peugeot Car with Full Global Illumination

Benoit Deschamps ( Imaging Solutions, Team Leader, PSA Peugeot Citroen )
After receiving his Master's degree in Imaging & Multimedia from the University of Bordeaux in France, Benoit was a development engineer for an automotive company before joining PSA Peugeot Citroën's IT department in 2006 where he now serves as the Team Leader for Imaging Solutions.
Alain Gonzalez ( Expert Workstations Graphics Technologies & 3D Imagery , PSA Peugeot Citroen )
A graduate of the University of Paris Sud Orsay with a Master's Degree in Computer Science & Engineering, Alain has worked in PSA Peugeot's IT department since 2000 starting as a Workstations IT Architect. Since 2009, Alain has been involved with the Expert Workstations Graphics Technologies & 3D Imagery area.
Arnaud Renard ( CTO of the Champagne-Ardenne HPC Center, Reims Champagne Ardennes University )
Arnaud Renard
Arnaud Renard holds a Ph.D. in Computer Science from the University of Reims Champagne-Ardenne and has been working in the areas of parallel computing and combinatorial optimization. Since 2009, Arnaud has been the CTO of the Champagne-Ardenne HPC Center, a member of the French HPC network Equip@Meso. Since 2013, Arnaud has managed the fully hybrid K20X-based Romeo supercomputer, ranked 5th on the Green500 list.
Michael Krajecki ( Head of the ICT Research Center (CreSTIC) and the ROMEO High Performance Computing Center, Reims Champagne Ardennes University )
Michael Krajecki
Michaël Krajecki received his Master's degree in Computer Science from the University of Metz, France, in 1995, and defended his Ph.D. in Computer Science at the same university in 1998. He has been a full professor in Computer Science at the University of Reims since 2005. He currently heads the ICT Research Center (CReSTIC) and the ROMEO High Performance Computing Center, and is also an associate professor at the Royal Military College of Canada, Ontario. His research interests are mainly focused on parallel algorithms, GPU computing, high performance computing and combinatorial optimization.
Julien Berta ( VP Technology and Innovation, Mechdyne )
Julien Berta
Julien Berta serves as Vice President of Technology and Innovation within Mechdyne. His current responsibilities include guiding the company's technical vision and leading its technology development. Prior to his current role, Berta served as technical and product manager for Mechdyne's Software Division working on projects such as ISU, BP and Hess. Berta also worked for Fakespace Labs, a pioneer in virtual reality research and development. Before rejoining Mechdyne in 2010, he served as the head of software development for F4, an online game studio in Paris, France. Berta earned a post graduate degree in computer graphics from Université Louis Pasteur, Strasbourg, France; an MS in Software engineering, from Ecole Nationale Supérieure des Télécommunications, Paris, France; and an MS in physics from Ecole Nationale Supérieure de Physique, Strasbourg, France.

PSA Peugeot Citroën, in partnership with Reims University, RTT and Barco, will show a car model in real-time full GI using the Romeo supercomputer based in Reims, France, equipped with 260 Tesla K20s. The car model will be loaded into RTT DeltaGen, which will be connected to Reims through a remote display to leverage the horsepower of the GPU cluster. Tapping into this power on demand allows Peugeot to achieve stunning photorealistic results in real time, as if the real vehicle were right in front of them. With this, design changes for materials and colors can be visualized instantly and decisions can be made faster.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Digital Product Design & Styling; Rendering & Animation; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room 220C

S4878 - Android Differentiation Options on Tegra K1

Andrew Edelsten ( Manager, Tegra Developer Technologies, NVIDIA )
Andrew Edelsten
Andrew has 15 years' experience making computer games, managing web and data centers, and even a short stint as a commercial lawyer. He moved to NVIDIA four years ago and manages a team of Tegra and Android specialists who help developers enhance their games and apps for NVIDIA's Tegra processor.

Android continues its meteoric rise as the world's dominant mobile operating system. Every day, developers large and small discover new ways to delight users, but getting noticed is increasingly difficult. The latest NVIDIA Tegra K1 processors provide developers with a host of new features to differentiate their titles and get them flying above the rest of the crowd.

Session Level: Beginner
Session Type: Talk
Tags: Mobile Summit; Game Development; Mobile Applications

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210E

S4960 - Creating High-Dynamic-Range Content for Dolby Vision

Thad Beier ( Director of Image Platform Workflow, Dolby Laboratories )
Thad Beier
Thad Beier, Director of Image Platform Workflow, Dolby Laboratories, has spent most of his career in the movie visual effects business, starting with basic computer graphics research at NYIT’s Computer Graphics Lab in the late 70’s, culminating as Visual Effects Supervisor at Digital Domain working on films such as Transformers 3 and GI JOE: Retaliation. Thad is a member of the visual effects branch of the Academy of Motion Picture Arts and Sciences, and has been a long-term participant in AMPAS’s Main Sci-Tech Committee.

Thad Beier will present Dolby's high-dynamic range, wide color gamut system called "Dolby Vision", describing the motivation behind its development and the positive, visceral reaction that content producers and viewers alike have on first seeing content created and viewed in this radically wider image space. He will discuss how NVIDIA's GPU technology is integral to every step of the production process, from off-line computation to real-time image processing.

Session Level: All
Session Type: Talk
Tags: Media & Entertainment; Visual Effects & Simulation

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 211A

S4134 - Accelerated Visual Effects Made Accessible with Javascript

Sean Safreed ( Co-founder, Red Giant )
Sean Safreed is co-founder of Red Giant Software and a 16-year veteran of the computer graphics industry. The company started with 2 products and has since grown to offer more than 50, with a team that spans the United States and Canada. Before founding Red Giant in 2002, he worked on Apple's QuickTime team. At Silicon Graphics, he led efforts to add innovative video features to the company's hardware systems. At Puffin Designs, he worked as a product manager on Commotion, a ground-breaking video paint application that originated at Industrial Light and Magic.

Learn how to exploit the powerful new platform from Red Giant that leverages OpenGL and OpenCL on the latest GPUs, coupled with easy-to-use Javascript, to create visual effects tools that run on a variety of operating systems and host applications for video editing and compositing. The Red Giant platform lets artists create both simple image processing tools and complete user interfaces with just a few simple lines of code. This session will provide both an architectural overview and live examples of advanced tools that exploit the Red Giant framework. In addition, this session will show the power of connecting real-time gaming render techniques and visual effects.

Session Level: Beginner
Session Type: Talk
Tags: Media & Entertainment Summit; Video & Image Processing; Real-Time Graphics Applications

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 211A

S4147 - SIFT Descriptor Extraction on the GPU for Large-Scale Video Analysis

Hannes Fassold ( Senior Researcher, Joanneum Research )
Hannes Fassold
Hannes Fassold received an MSc degree in Applied Mathematics from Graz University of Technology in 2004. Since then he has worked at Joanneum Research, where he is currently a senior researcher in the Audiovisual Media Group of DIGITAL — the Institute for Information and Communication Technologies. His main research interests are algorithms for digital film restoration and content-based video quality analysis, as well as the efficient parallelization of these algorithms on the GPU. He has published several papers in these fields and is the principal investigator for the CUDA Research Center at DIGITAL, Joanneum Research.

Learn how the analysis of large-scale video data sets can be greatly accelerated by harnessing the power of GPUs. Due to their robustness, SIFT (Scale-Invariant Feature Transform) descriptors are very popular for all sorts of video analysis tasks. In this talk, we will first present an efficient GPU implementation of an interest point detector (e.g., using the DoG or LoG operator) and the extraction of SIFT descriptors around these interest points. We will compare the GPU implementation with the reference CPU implementation from the HessSIFT library in terms of runtime and quality. Furthermore, we will talk about the benefits of GPU-accelerated SIFT descriptors for applications such as near-duplicate video detection, which aims at detecting almost identical video segments in large video data sets, or linking video segments by shooting location or salient object.
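
As a rough, hedged sketch of the detector stage only (not the HessSIFT port discussed in the talk): the DoG response is the difference of two Gaussian-blurred copies of the image, and a pixel becomes an interest-point candidate when it is a local extremum of that response. The kernel below tests only the 8 in-plane neighbors; real SIFT tests a 3x3x3 neighborhood across adjacent scales, and the blur passes are not shown.

    // Simplified DoG interest-point test: blur_a and blur_b are the same image
    // blurred with two different Gaussian sigmas.
    __global__ void dog_extrema(const float *blur_a, const float *blur_b,
                                unsigned char *is_extremum, int w, int h,
                                float threshold)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < 1 || y < 1 || x >= w - 1 || y >= h - 1) return;

        float center = blur_a[y * w + x] - blur_b[y * w + x];
        bool maxi = fabsf(center) > threshold, mini = maxi;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;
                int idx = (y + dy) * w + (x + dx);
                float nb = blur_a[idx] - blur_b[idx];  // neighbor's DoG value
                maxi = maxi && (center > nb);
                mini = mini && (center < nb);
            }
        is_extremum[y * w + x] = (maxi || mini) ? 1 : 0;
    }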

Session Level: Intermediate
Session Type: Talk
Tags: Video & Image Processing; Computer Vision; Media & Entertainment

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL21A

S4249 - Histograms in CUDA: Privatized for Fast, Level Performance

Nicholas Wilt ( Author, The CUDA Handbook )
Nicholas Wilt
Nicholas Wilt has been programming professionally for more than twenty-five years in a variety of areas, including industrial machine vision, graphics, and low-level multimedia software. While at Microsoft, he served as the development lead for Direct3D 5.0 and 6.0, built the prototype for the Desktop Window Manager, and did early GPU computing work. At NVIDIA, he worked on CUDA from its inception, designing and often implementing most of CUDA’s low-level abstractions. Now at Amazon, Mr. Wilt is working on cloud computing technologies relating to GPUs.

Histograms are an important statistical tool with a wide variety of applications, especially in image processing. Naive CUDA implementations suffer from low performance on degenerate input data due to contention. This presentation will show how to use "privatized" (per-thread) histograms to balance performance of the average case against data-dependent performance of degenerate cases.
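
The speaker's per-thread scheme involves further tricks beyond this abstract; the hedged sketch below shows the simpler per-block form of privatization, which already removes most global-memory contention and conveys the underlying idea.

    #define NUM_BINS 256

    // Per-block privatized histogram: each block accumulates into its own
    // shared-memory copy, so contention moves from global to shared memory
    // and global atomics happen once per (block, bin), not once per element.
    __global__ void histogram_private(const unsigned char *data, int n,
                                      unsigned int *hist)
    {
        __shared__ unsigned int local[NUM_BINS];
        for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
            local[b] = 0;
        __syncthreads();

        for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
             i += gridDim.x * blockDim.x)     // grid-stride loop over input
            atomicAdd(&local[data[i]], 1u);
        __syncthreads();

        for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x)
            atomicAdd(&hist[b], local[b]);    // merge into the global histogram
    }

Per-thread privatization takes the same idea one level further: each thread gets its own counters, so even shared-memory atomics are avoided when degenerate input concentrates all updates in a few bins.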

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Video & Image Processing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210B

S4265 - RDMA GPU Direct for the Fusion-io ioDrive

Robert Wipfel ( Fellow, Fusion-io )
Robert started his career at a parallel processing startup, and then at INMOS, worked on a distributed operating system for the Transputer. Robert next helped Unisys and Intel jointly enter the commercial parallel processing market. He worked on single system image Unix and Oracle Parallel Server. At Novell Robert was an architect or engineering lead for various Data Center products that integrated clustering, virtualization, and network storage. His work on management products combined web-scale automation, process orchestration and a federated CMDB to create IT as a Service. Robert joined Fusion-io as an architect and helped the company deliver its second generation ioMemory product line. He is presently chief architect for the ION Data Accelerator all-Flash SCSI storage appliance.

Learn how to eliminate I/O bottlenecks by integrating Fusion-io's ioDrive Flash storage into your GPU applications. The first part of this session is a technical overview of Fusion-io's PCIe attached ioDrive. The second part presents developer best practices and tuning for GPU applications using ioDrive based storage. Topics will cover threading, pipe-lining, and data path acceleration via RDMA GPU Direct. Demos and example code showing integration between RDMA GPU Direct and Fusion-io's ioDrive will be given.

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization; Finance; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 16:30 - 17:20
Location: Room 212B

S4365 - RAMSES on the GPU: An OpenACC-Based Approach

Claudio Gheller ( Computational Scientist, ETH-CSCS )
Highly-Rated Speaker
Claudio Gheller took a Ph.D. in Astrophysics at SISSA-ISAS (Trieste). He is currently working as a computational scientist at the Swiss National Supercomputing Center (CSCS), which is part of ETH. Among his current duties, he is responsible for WP8 ("Scientific codes enabling on new HPC architectures") in the EU-funded PRACE project and for the Physics network in the Swiss-funded PASC project, and he is involved in a number of research projects on code development, HPC simulations and data analysis in astrophysics, and visualization of scientific data.

We present the work accomplished to enable the numerical code RAMSES on the GPU, in order to efficiently exploit hybrid accelerated HPC architectures. RAMSES is a code designed for the study of astrophysical problems on different scales (e.g., star formation, galaxy dynamics, the large scale structure of the universe), treating various components at the same time (dark energy, dark matter, baryonic matter, photons) and including a variety of physical processes (gravity, magneto-hydrodynamics, chemical reactions, star formation, supernova and AGN feedback, etc.). It is implemented in Fortran 90 and adopts the OpenACC paradigm to offload some of the most computationally demanding algorithms to the GPU. Two different strategies have been pursued for code refactoring, in order to explore complementary solutions and select the most effective approach. The resulting algorithms are presented together with the results of tests, benchmarks and scientific use cases.

Session Level: Advanced
Session Type: Talk
Tags: Astronomy & Astrophysics; Supercomputing; Numerical Algorithms & Libraries; Computational Physics

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL21F

S4369 - Move Your Showroom into the Cloud Using GRID VCA

Stefan Schoenefeld ( Developer Technology Engineer, NVIDIA )
Stefan Schoenefeld
Stefan joined NVIDIA in 2007 as a Systems Software Engineer working on the SceniX Scenegraph SDK. Since 2011 he has been working as a Developer Technology Engineer, specializing in GRID and remote graphics for workstations.

This talk will give an overview of GRID VCA and how it can be used outside a standard Virtual Desktop Infrastructure. Learn how GRID VCA can replace the workstation-based infrastructure in a showroom without major changes to the existing software solution. We will also cover modifications made to the GRID VCA architecture to enable custom application features, as well as an outlook on future development plans.

Session Level: All
Session Type: Talk
Tags: Graphics Virtualization Summit; Automotive; Remote Graphics & Cloud-Based Graphics; Digital Manufacturing Summit; Recommended Press Session – Auto

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210F

S4497 - Parallelizing a Real-Time 3D Finite Element Algorithm using CUDA: Limitations, Challenges and Opportunities

Vukasin Strbac ( PhD student, KULeuven University, Leuven )
Vukasin Strbac is a PhD student at KULeuven University, Leuven, Belgium. He is a member of the Biomechanics section, within the Department of Mechanical Engineering. He is also a member of the Robotics Assisted Surgery group specializing in the Finite Element Method and parallel computing for the intraoperative setting.

Learn about the challenges of parallelizing a Finite Element problem using the Total Lagrangian Explicit Dynamic formulation. We examine the algorithm and perform a detailed analysis of the performance limiting factors of parallelization using CUDA. Potential optimization benefits are elucidated in terms of register usage thresholds and other factors for better performance. Results of a larger usability study are presented on a simple problem examining single/double precision tradeoff on a wide range of GPUs and problem sizes. Discover the impact that real-time FE can bring to the intraoperative surgical setting with in-the-loop computation facilitating surgical robotics.

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Computational Structural Mechanics; Computational Physics

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL20D

S4518 - Accelerating Dissipative Particle Dynamics Simulation on Kepler: Algorithm, Numerics and Application

Yu-Hang Tang ( Ph.D. Student, Brown University )
Yu-Hang Tang
Yu-Hang Tang is a Ph.D. student in the Division of Applied Mathematics at Brown University. He received his bachelor's degree in Polymer Science at Zhejiang University, China. Following one year of study at the Center for Biophysics and Computational Biology at the University of Illinois at Urbana-Champaign, he started his Ph.D. research in applied mathematics at Brown University. His current interests are various particle-based simulation techniques, including molecular dynamics, dissipative particle dynamics and smoothed particle hydrodynamics. He is also devoted to the development of massively parallel algorithms.

The talk focuses on the implementation of a highly optimized dissipative particle dynamics (DPD) simulation code in CUDA, which achieves a 20-times speedup on a single Kepler GPU over 12 Ivy Bridge cores. We will introduce a new pair-searching algorithm that is parallel, deterministic, atomics-free, and capable of generating strictly ordered neighbor lists. Such a neighbor list leads to optimal memory efficiency when combined with proper particle-reordering schemes. We also propose an in-situ generation scheme for Gaussian random numbers that delivers better performance without losing quality. In addition, details will be given on how to design custom transcendental functions that fit specifically to our DPD functional form. The code is scalable and can run on over a thousand nodes of the Titan supercomputer. Demonstrations of large-scale DPD simulations of vesicle assembly and red blood cell suspension hydrodynamics using our code will be given.
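
One plausible shape for such an in-situ scheme is a counter-based generator: hash the ordered pair indices and the timestep to uniforms, then apply a Box-Muller transform. The sketch below is a generic illustration under those assumptions, not the speakers' actual generator:

```cuda
// Illustrative counter-based generator: hashing (i, j, timestep)
// deterministically means the thread handling pair (i,j) and the one
// handling (j,i) derive the same random force, so no stored random-number
// table and no atomics are needed.
__device__ unsigned int hash3(unsigned int i, unsigned int j, unsigned int s)
{
    unsigned int h = i * 73856093u ^ j * 19349663u ^ s * 83492791u;
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

// Box-Muller transform: two uniforms -> one standard normal deviate.
__device__ float pair_gaussian(unsigned int i, unsigned int j, unsigned int step)
{
    unsigned int lo = min(i, j), hi = max(i, j);   // symmetric in (i, j)
    float u1 = fmaxf(hash3(lo, hi, step) * (1.0f / 4294967296.0f), 1e-7f);
    float u2 = hash3(hi, lo, step ^ 0x9e3779b9u) * (1.0f / 4294967296.0f);
    return sqrtf(-2.0f * logf(u1)) * cospif(2.0f * u2);
}
```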

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Computational Fluid Dynamics; Supercomputing; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL21E

S4533 - GPU Accelerated Model Combination for Robust Speech Recognition and Keyword Search

Wonkyum Lee ( Ph.D. student, Carnegie Mellon University )
Wonkyum Lee
Wonkyum Lee is a Ph.D. student in the Department of Electrical and Computer Engineering at Carnegie Mellon University. He received his M.Sc. degree in Electrical Engineering from KAIST, Daejeon, in 2009 and his B.Sc. degree in Radio and Communication Engineering from Korea University, Seoul, Korea, in 2007. From 2009 to 2012 he was with the KAIST Institute in Korea, where he worked as a research engineer on next-generation wireless communication and user experience. His primary research interest is building deep neural network acoustic models and system combination for robust speech recognition and keyword search, as part of his work on the BABEL program.

Learn how to develop GPU-accelerated model combination for robust speech recognition and keyword search, built on (1) GPU-accelerated acoustic score computation for DNN and GMM models, (2) acoustic-score-level combination with different combination techniques, and (3) efficient rescoring of hypotheses over hybrid architectures of GPUs and multicore CPUs. Evaluation will be presented on the 2013 OpenKWS evaluation task, a challenging corpus, showing how combination helps both the speech recognition and keyword search tasks.
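
At its simplest, acoustic-score-level combination is a per-frame weighted sum of the models' log-likelihoods. A minimal sketch, assuming a flat (frame, state) score layout and fixed combination weights (both assumptions for illustration, not the speaker's system):

```cuda
// Each thread combines the DNN and GMM acoustic log-likelihoods for one
// (frame, state) pair using log-linear weights.
__global__ void combine_scores(const float* __restrict__ dnnLogLik,
                               const float* __restrict__ gmmLogLik,
                               float* __restrict__ combined,
                               float wDnn, float wGmm, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        combined[i] = wDnn * dnnLogLik[i] + wGmm * gmmLogLik[i];
}
```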

Session Level: Intermediate
Session Type: Talk
Tags: Signal & Audio Processing; Defense; Machine Learning & AI; Mobile Applications

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210D

S4554 - GPU Acceleration of a Variational Monte Carlo Method

Niladri Sengupta ( Graduate (Ph.D.) Student, Louisiana State University, Baton Rouge, USA )
Niladri Sengupta
I obtained my M.Sc. in Physics from the Indian Institute of Technology Bombay, India, in 2010. I am pursuing my Ph.D. in the Department of Physics and Astronomy at Louisiana State University, Baton Rouge. I am also an active member of a GPU team of multidisciplinary students and faculty from departments including Physics, Chemistry, Biology, Computer Science, Electrical Engineering, Mechanical Engineering and Mathematics at the Center for Computation and Technology, LSU (a CUDA Research Center). My research interest is developing and optimizing the codes necessary to study strongly correlated quantum systems. I have worked for more than two semesters on this GPU implementation of the variational Monte Carlo method, which is part of the interdisciplinary LA-SiGMA project at LSU. I attended several CUDA and OpenACC workshops and took GPU programming classes prior to working on this project.

The session will describe the CUDA implementation of a variational Monte Carlo method for the study of strongly correlated quantum systems, including high-temperature superconductors, magnetic semiconductors and metal oxide heterostructures. The presentation will cover the different tuning and optimization strategies implemented in the GPU code. To overcome bandwidth-limited performance we have used caching and a novel restructuring of the computation and data access patterns. We also perform two optimizations specific to Kepler. The code uses dynamic compilation to improve performance, especially in parts with limited parallelism. Using Kepler, our code achieves 22-times and 176-times speedups compared to 8-core and single-core CPU implementations, respectively. The GPU code allows us to obtain accurate results for large lattices, which are crucial for developing predictive capabilities for materials properties. The techniques we developed for matrix inverse and determinant updates can be reused in other quantum Monte Carlo methods.
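
In many variational Monte Carlo codes, the inverse and determinant updates mentioned above are rank-1 Sherman-Morrison updates applied when one row of the Slater matrix changes. A textbook sketch of such an update (not the presenters' optimized implementation), assuming the determinant ratio R and the vector c have been computed first:

```cuda
// Rank-1 Sherman-Morrison update of the inverse Slater matrix W after row
// k of the matrix changes. Assumes the determinant ratio
//   R = sum_m newRow[m] * W[m*N + k]
// and the vector c[i] = sum_m newRow[m] * W[m*N + i] were computed first
// (e.g. with cuBLAS dot/gemv calls).
__global__ void sm_update_columns(float* W, const float* __restrict__ c,
                                  float R, int k, int N)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // column index
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // row index
    if (i >= N || j >= N || i == k) return;
    W[j * N + i] -= (W[j * N + k] / R) * c[i];       // all columns except k
}

// Launched after sm_update_columns, since that kernel still reads column k.
__global__ void sm_update_column_k(float* W, float R, int k, int N)
{
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j < N) W[j * N + k] /= R;
}
```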

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Quantum Chemistry

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 212A

S4688 - Photo-Realistic Real-Time Digital Mock-Up Design Review in a Five-Sided 4Kx4K Immersive Room

Andras Kemeny ( General Manager, Center for Virtual Reality and Immersive Simulation, Renault )
Andras Kemeny
Dr. Andras Kemeny is the General Manager of the Center for Virtual Reality and Immersive Simulation, a virtual design center for vehicle engineering that he constituted and developed in 2002 with the support of Renault's Research and Development Executive Management. Its goal is to provide large simulation facilities for virtual vehicle engineering and to support its deployment in the various engineering departments, such as vehicle equipment and architecture, driver aid systems, and driver safety and ergonomics. Prior to founding the research center he was a research group manager at Thales Training and Simulation between 1984 and 1986, responsible for real-time visualization software development, and Marketing Manager at Thomson Digital Image from 1986 to 1988. Between 1989 and 1993 he was European Function Coordinator for the PROMETHEUS (EU45) Eureka Project, a 1.5-billion-euro project on inter-vehicle and road-to-vehicle communication systems. Dr. Kemeny initiated and directed the development of a large driving simulation software package, SCANeR©, which is installed at Renault, Nissan, Audi, Hyundai, PSA, Valeo and Volvo Trucks, and at several research facilities in Europe, North America, China, Japan and Korea, including TRL (UK), JARI (Japan) and Tongji University (China). Currently he is also Director of the Laboratory of Immersive Visualization (LIV), a joint Renault - Arts et Métiers ParisTech research laboratory, as well as Associate Professor at Arts et Métiers ParisTech. He is the President of the Driving Simulation Conference DSC Europe, which he initiated in 1995, a member of the Editorial Board of the International Journal of Heavy Vehicle Systems, a member of the ParisTech board, and a member of the ART Carnot Institute Steering Committee.

Renault has recently put into use a new CAVE™, a 5-sided virtual reality room with a combined resolution of 70 megapixels, distributed over sixteen 4K projectors and two 2K projectors, as well as an additional 3D HD collaborative power wall. Images of the studied vehicle are displayed in real time thanks to a cluster of 20 HP Z800 computers with 24 GB RAM and 40 NVIDIA Quadro 6000 graphics boards. Renault's CAVE™ aims to answer the needs of the various vehicle styling and engineering design steps. Starting from vehicle architecture and moving through the subsequent design steps, from ergonomics and perceived-quality control to production, Renault has built up a list of use cases and has already carried out a number of major Digital Mockup Design Review (DMDR) validations in this CAVE for ongoing vehicle projects since early 2013. The talk will discuss the use of the CAVE for digital manufacturing design review and its role in the automotive design process.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Collaborative & Large Resolution Displays; Digital Product Design & Styling; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210G

S4689 - Increase Traffic and Revenues: Lightworks Iray + Photorealistic Interactive 3D-Online, In-Store POS Digital Configurators

Dave Coldron ( Product Director, Lightwork Design Ltd. )
Dave Coldron
Mr. Coldron leads product management across all Lightwork Design solutions, with key responsibility for the high-end visualization solution Lightworks Iray+. He has over 20 years' experience in the computer graphics industry, specifically in providing integrated solutions for the mechanical CAD, lighting simulation, architectural and interior design markets. He has an art and design background combined with wide-ranging knowledge of 3D graphics technology, interaction design and usability. Using these skills he drives products forward using compelling digital content combined with a design process aimed at supporting end users' workflows.

Learn how to take your 3D online and in-store point-of-sale digital configuration experiences to new levels using Lightworks Iray+ interactive photorealistic visualization to drive your next digital product campaign. We demonstrate how to free yourself from the constraints of image-based configurators, allowing true customization of model data, camera, material look and lighting, leading to a great consumer experience, increased traffic and increased conversions. Drive your configurator directly from your 3D catalog, avoiding complex, costly and error-prone image management, enabling faster product updates, reducing in-store inventory and allowing a true connection between the consumer and the manufacturing and production process.

Session Level: All
Session Type: Talk
Tags: Digital Manufacturing Summit; Automotive; Ray Tracing; Rendering & Animation; Recommended Press Session – Digital Manufacturing

Day: Wednesday, 03/26
Time: 16:30 - 17:20
Location: Room 210H

S4759 - High Resolution Catastrophe Modeling Using CUDA

Dag Lohmann ( Co-Founder, KatRisk )
Dag Lohmann
Before co-founding the risk modeling company KatRisk LLC, Dag was Vice President of Model Development at Risk Management Solutions (RMS) and a scientist with NOAA/NCEP/EMC. His main interest is probabilistic catastrophe models. He published papers on modeling, forecasting, data assimilation and climate change. He received a Physics Diploma (Masters) from the Georg-August University in Goettingen (Germany) and a Ph.D. from Hamburg University (Germany).

Extreme weather and climate events are costly, dangerous, and disruptive. Our ability to estimate the current and future risk of such events is important for emergency response preparedness, climate change adaptation, relevant public policies, and insurance. After a short introduction to catastrophe modeling, we will talk about high-resolution global flood risk models. With CUDA-based fluid mechanics code running on the latest generation of NVIDIA Kepler GPUs (in-house, as well as on Oak Ridge National Laboratory's Titan supercomputer), it is now possible to create flood maps and probabilistic flood models at 10m to 90m resolution worldwide. We will talk about specific challenges (coding, atmospheric data, terrain models, data volume, etc.) encountered during this project and will show exciting results of our simulations. This talk is part of the "Extreme-Scale Supercomputing with Titan Supercomputer" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.
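
Flood models of this kind typically integrate the two-dimensional shallow-water equations explicitly over a raster grid. The sketch below shows the generic shape of one such per-cell update step (the variable names, flux layout and rainfall term are assumptions for illustration; the KatRisk solver itself is not public):

```cuda
// Each thread advances the water depth of one grid cell from precomputed
// face fluxes: dh/dt = rainfall - div(fluxes), clamped at zero depth.
__global__ void update_depth(float* __restrict__ h,
                             const float* __restrict__ fluxE, // east faces
                             const float* __restrict__ fluxN, // north faces
                             float rainfall, float dt, float dx,
                             int nx, int ny)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return; // skip border
    int c = j * nx + i;
    float div = (fluxE[c] - fluxE[c - 1] + fluxN[c] - fluxN[c - nx]) / dx;
    h[c] = fmaxf(0.0f, h[c] + dt * (rainfall - div));
}
```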

Session Level: Intermediate
Session Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL21C

S4787 - Accelerated Software as a Service

Michael Houston ( Principal Engineer, NVIDIA )
Michael Houston
Mike Houston is a Principal Engineer at NVIDIA concentrating on mobile and cloud computing. He received his Ph.D. in Computer Science from Stanford University in 2008, focusing on research in programming models, algorithms, and run-time systems for parallel architectures. He is currently the technical lead on NVIDIA's data-center programs.

In this session we will cover how to use GPUs to accelerate back-end data-center infrastructure, specifically image processing. We will present an implementation of a REST API for image processing, concentrating on an approach to managing large numbers of concurrent requests while efficiently scheduling the CPU and GPU resources in the system. We will show that we can provide higher throughput and lower latency than the CPU implementations we are replacing. We will also discuss the practical infrastructure implications, different deployment scenarios, and how to fit software acceleration into those scenarios.
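
A common pattern for this kind of service is to give each in-flight request its own CUDA stream, so that transfers and kernels from different requests overlap instead of serializing. A minimal sketch of that pattern (the request structure and the per-pixel kernel are placeholders, not NVIDIA's service code):

```cuda
#include <cuda_runtime.h>

// hostIn/hostOut must be pinned (cudaHostAlloc) for truly async copies.
struct Request { unsigned char *hostIn, *hostOut, *devIn, *devOut; size_t bytes; };

__global__ void process_image(const unsigned char* in, unsigned char* out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = 255 - in[i];   // placeholder per-pixel work
}

void submit(Request& r, cudaStream_t s)
{
    // Upload, process, and download entirely within this request's stream;
    // work queued on other streams proceeds concurrently.
    cudaMemcpyAsync(r.devIn, r.hostIn, r.bytes, cudaMemcpyHostToDevice, s);
    process_image<<<(unsigned)((r.bytes + 255) / 256), 256, 0, s>>>(r.devIn, r.devOut, r.bytes);
    cudaMemcpyAsync(r.hostOut, r.devOut, r.bytes, cudaMemcpyDeviceToHost, s);
    // Completion can be detected with an event or cudaStreamAddCallback,
    // letting the web front end answer the REST call without blocking.
}
```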

Session Level: Intermediate
Session Type: Talk
Tags: Video & Image Processing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room LL20B

S4867 - The Path to Fast Lines in Adobe Illustrator

Vineet Batra ( Senior Computer Scientist, Adobe )
Vineet Batra joined Adobe Systems in 1998 and has worked on many products, including Acrobat, Illustrator and InDesign. He is presently a Senior Computer Scientist, contributing to development efforts for graphics acceleration in Adobe Illustrator CC. In the past, he also implemented GPU-accelerated rendering of the PDF model in Adobe Reader Mobile using OpenGL ES.

This talk covers a real-world application of NVIDIA's path rendering technology (NVPR) for accelerating 2D vector graphics, based on the Adobe PDF model. We shall demonstrate the use of this technology for real-time, interactive rendering in Adobe Illustrator CC. The substantial performance improvement is primarily attributed to NVPR's ability to render complex cubic Bezier curves independently of device resolution. Further, we shall also discuss the use of NVIDIA's blend extension to support compositing of transparent artwork in conformance with the Porter-Duff model, using 8x multisampling and per-sample fragment shaders. Using these technologies, we achieve a performance of 30 FPS when rendering and scaling complex artwork consisting of a hundred thousand cubic Bezier curves with ten thousand blend operations per frame on a GTX 780 Ti graphics card.
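
For readers unfamiliar with NVPR, path rendering draws filled paths in two GPU passes, stencil then cover, from resolution-independent path commands, which is why curves can be re-rasterized sharply at any zoom. A minimal host-side OpenGL sketch of that pattern (context creation, extension loading and blend state omitted; coordinates are arbitrary):

```cpp
// The two-step "stencil, then cover" pattern at the core of
// NV_path_rendering, drawing one filled cubic Bezier segment.
void drawCubicPath()
{
    GLuint path = glGenPathsNV(1);
    static const GLubyte cmds[]   = { GL_MOVE_TO_NV, GL_CUBIC_CURVE_TO_NV };
    static const GLfloat coords[] = { 10, 10,                   // move-to point
                                      30, 90, 70, 90, 90, 10 }; // 2 control points + endpoint
    glPathCommandsNV(path, 2, cmds, 8, GL_FLOAT, coords);

    glStencilFillPathNV(path, GL_COUNT_UP_NV, 0xFF);   // pass 1: winding counts
    glCoverFillPathNV(path, GL_BOUNDING_BOX_NV);       // pass 2: shade covered pixels
}
```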

Session Level: Intermediate
Session Type: Talk
Tags: Media & Entertainment Summit; Recommended Press Session – Media & Entertainment

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 211B

S4901 - The Evolution and Future of Wearable Displays

Paul Travers ( President, Vuzix )
Paul Travers
Paul Travers, President of Vuzix Corporation, has over 23 years of experience in the wearable display markets. Over the years his companies have developed solutions for the consumer, defense and enterprise markets, in practically every category from passive viewing to VR, AR and, most recently, smart glasses. With hundreds of thousands of wearable displays shipped and over seven unique product lines, Paul and his team at Vuzix have been more involved in this industry than any other company in the world.

Throughout the history of wearable displays, an industry with such great promise, there have been very few winners outside niche markets. This talk will explore the history of wearable display technology, sharing some of the failures and a few of the successes, and leading to what is happening today, where the real opportunity for wearable displays is finally being realized. It closes with some thoughts on how the technology is evolving to create possibly one of the most exciting new technologies and market opportunities since the smartphone itself.

Session Level: All
Session Type: Talk
Tags: Mobile Summit

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210E

S4961 - Audi Piloted Parking on zFAS: Valet Parking for the 21st Century

Miklós Kiss ( Head of ADAS Predevelopment, Audi Electronics Venture GmbH )
Miklós Kiss
Prior to becoming Head of ADAS Predevelopment, Miklos was Head of the Audi Accident Research Unit. Prior to joining Audi, Miklos served as Head of HMI Research at Volkswagen Research, was Head of ADAS HMI Research at Volkswagen Research, and was team leader of HMI at the Munich University Generation Research Program (LMU). His Ph.D. project was in time and language perception in the human brain (neuropsychology).

What does it mean to bring supercomputing into the car? Examples of piloted parking systems show what that means for customers as well as for developers: Audi's way into piloted driving for the 21st century.

Session Level: All
Session Type: Talk
Tags: Automotive; Video & Image Processing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210A

S4151 - Full GPU Image Processing Pipeline for Camera Applications

Fyodor Serzhenko ( CEO, Fastvideo )
Fyodor Serzhenko
Fyodor Serzhenko is CEO of Fastvideo. His research interests include high-speed cameras and software for high-speed imaging, as well as high-performance computing. He graduated from the Moscow Institute of Physics and Technology in 1989 and received his Ph.D. in semiconductor physics in 1993.

This advanced session provides a technical and detailed analysis of how to combine fast performance and high quality in a full image processing pipeline on the GPU for real-time camera applications. We provide details on the GPU image processing pipeline for cameras and its constituent parts (dark frame subtraction, flat-field correction, PRNU, white balance, demosaicing, ICC profiling and color management, output via OpenGL, compression to JPEG), their suitability for the GPU architecture, an analysis of achieved results, a comparison with existing implementations, and applications to machine vision, broadcasting and high-speed imaging.
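
As a rough illustration of the first two pipeline stages named above, a single fused kernel can apply dark-frame subtraction followed by flat-field correction (the layout and names are illustrative assumptions, not Fastvideo's implementation):

```cuda
// Dark-frame subtraction removes the sensor's fixed-pattern offset; the
// flat-field step then divides out per-pixel gain non-uniformity using a
// precomputed gain map (e.g. mean(flat - dark) / (flat - dark)).
__global__ void dark_and_flat(const unsigned short* __restrict__ raw,
                              const unsigned short* __restrict__ dark,
                              const float* __restrict__ flatGain,
                              float* __restrict__ out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = (float)raw[i] - (float)dark[i];
        out[i] = fmaxf(v, 0.0f) * flatGain[i];
    }
}
```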

Session Level: Advanced
Session Type: Talk
Tags: Video & Image Processing; Computer Vision; Mobile Applications

Day: Wednesday, 03/26
Time: 17:00 - 17:50
Location: Room LL21A

S4199 - Effortless GPU Models for Finance

Ben Young ( Senior Software Engineer, SunGard )
Ben Young
Ben Young is a senior developer working across the Adaptiv product range. He has been at SunGard for over eight years and has been looking at Adaptiv Analytics performance as part of his work for the last six years or so.

Learn how SunGard provides support for GPUs such that both SunGard engineers and quantitative developers at our clients have to make only trivial code changes to exploit both the CPU and GPU to full effect.

Session Level: Intermediate
Session Type: Talk
Tags: Finance

Day: Wednesday, 03/26
Time: 17:00 - 17:25
Location: Room 210C

S4248 - Restricting the Seed-and-Extend Search Space in GPU-Based Short-Read Alignment

Richard Wilton ( Associate Research Scientist, Johns Hopkins University )
Highly-Rated Speaker
Richard Wilton works with compute-intensive applications involving terabyte- and petabyte-scale data.

Most research into the use of GPUs for biological sequence alignment has focused on the choice and implementation of appropriate parallel algorithms for sequence matching. This strategy has yielded a number of GPU-based implementations with speeds 5 to 10 times faster than CPU implementations with comparable sensitivity and mapping quality. We have taken a different approach to the use of GPUs by implementing a series of CUDA kernels that filter the set of reference locations at which to compute seed-and-extend alignments, thereby decreasing the amount of parallel sequence-matching computation and improving the overall throughput of the GPU/CPU pipeline. Even without extreme CUDA code optimization, we observe increased sensitivity (i.e., a larger number of reported valid mappings) with throughput as good as or better than existing GPU-based sequence aligners.
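
The filtering idea can be sketched generically as two small kernels: bin the reference positions suggested by seed hits, then keep only bins whose hit count reaches a threshold, so the costly seed-and-extend alignment runs on far fewer candidate locations. This is a simplified illustration under assumed data layouts, not the speaker's kernels:

```cuda
// Pass 1: histogram seed-hit reference positions into coarse bins.
__global__ void count_hits(const unsigned int* __restrict__ seedPos,
                           unsigned int* __restrict__ binCount,
                           int numHits, unsigned int binShift)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numHits)
        atomicAdd(&binCount[seedPos[i] >> binShift], 1u);
}

// Pass 2: flag only bins with enough supporting seed hits; subsequent
// seed-and-extend alignment is restricted to the flagged locations.
__global__ void keep_candidates(const unsigned int* __restrict__ binCount,
                                unsigned int* __restrict__ keepFlag,
                                int numBins, unsigned int minHits)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b < numBins)
        keepFlag[b] = (binCount[b] >= minHits) ? 1u : 0u;
}
```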

Session Level: Advanced
Session Type: Talk
Tags: Bioinformatics & Genomics

Day: Wednesday, 03/26
Time: 17:00 - 17:25
Location: Room LL21D

S4274 - High Resolution Astrophysical Fluid Dynamics Simulations on a GPU Cluster