GPU Technology Conference

April 4-7, 2016 | Silicon Valley

Schedule Planner

TALK

S6645 - Scientific Visualization in HPC

Peter Messmer Principal Software Engineer, NVIDIA
Highly-Rated Speaker
Peter Messmer is a senior software engineer in NVIDIA's Developer Technology organization, working with clients to accelerate their scientific discovery process with GPUs. One area of his current research is to investigate how to utilize the GPUs in high performance computing systems for data analysis and visualization. Prior to joining NVIDIA, Peter spent more than 15 years developing HPC- and GPU-accelerated applications for industry and government clients, ranging from simulating next-generation particle accelerators or electromagnetic problems to modeling the behavior of dust on the surface of the Moon. Peter holds an M.S. and Ph.D. in physics from ETH Zurich, Switzerland, with specialization in kinetic plasma physics and nonlinear optics.

Learn how to leverage the graphics power in your GPU-accelerated supercomputer to turn your simulation data into insight. Starting from simulation data distributed across the nodes of a remote supercomputer, we'll cover various techniques and tools to convert this data into insightful visualizations at your workstation, leading to an end-to-end GPU accelerated visualization pipeline.

Level: Beginner
Type: Talk
Tags: In-Situ and Scientific Visualization; Supercomputing & HPC

Day: Monday, 04/04
Time: 09:00 - 09:50
Location: Room 212A

S6590 - HPC Visualization Using NVIDIA IndeX™

Tom-Michael Thamm Director, Software Product Management, NVIDIA
Tom-Michael Thamm is director for software product management at the NVIDIA Advanced Rendering Center (ARC) in Berlin, Germany, and is responsible for all commercial software products, such as NVIDIA mental ray, NVIDIA Iray, and NVIDIA IndeX. With his team, he manages and coordinates customer support as well as general product definition and positioning. Tom-Michael has worked for NVIDIA ARC, and before that for mental images, for over 25 years. He has led several key software projects and products, such as the NVIDIA IndeX product for large volume visualization. He studied mathematics.
Christopher Lux Senior Graphics Software Engineer, NVIDIA
Christopher Lux is a senior graphics software engineer at the NVIDIA Advanced Rendering Center. He received his Ph.D. in computer science in 2013 from the Bauhaus-Universität Weimar, Germany. Through his interest in real-time computer graphics and scientific visualization, he focused early on the interactive visualization of large-scale datasets from the geo-scientific and medical domains.
Marc Nienhaus Product Technology Lead of the NVIDIA IndeX commercial software at NVIDIA, NVIDIA
Marc Nienhaus is the product technology lead of the NVIDIA IndeX commercial software at NVIDIA. He manages the NVIDIA IndeX software engineering team and is responsible for the product architecture and applications of NVIDIA IndeX in various application domains. Before joining mental images' R&D rendering department and NVIDIA ARC, Marc researched as a post-doc at Northwestern University in Illinois and led research projects at the University of Potsdam. His research interests include parallel and distributed rendering and computing, scientific visualization, GPU-based rendering, and photorealistic and non-photorealistic expressive depictions. Marc holds an M.S. in mathematics with a minor in computer science from the University of Muenster, and a Ph.D. in computer science from the Hasso Plattner Institute at the University of Potsdam. Marc has published various papers on GPU-based real-time and non-photorealistic rendering techniques.

We'll give a technical overview of the NVIDIA IndeX architecture, which enables instant visualization of simulation and compute data, along with details on its interface design and use. We'll then demonstrate NVIDIA IndeX's capabilities through real-world solutions, including real-time weather prediction and a seismic wave-propagation algorithm.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Large Scale and Multi-Display Visualization; Computer-Aided Engineering

Day: Monday, 04/04
Time: 10:00 - 10:50
Location: Room 212A

S6796 - Khronos API Ecosystem Update – Including Vulkan, OpenCL, OpenVX and SPIR-V

Neil Trevett Vice President Developer Ecosystem, NVIDIA
Neil Trevett has spent over thirty years in the 3D graphics industry - and by day drives the advanced apps ecosystem on NVIDIA Tegra mobile and embedded devices. By night, Neil is the elected President of the Khronos Group industry standards consortium where he initiated the OpenGL ES standard now used by billions worldwide every day, helped catalyze the WebGL project to bring interactive 3D graphics to the Web, chairs the OpenCL working group defining the open standard for heterogeneous parallel computation and has helped create and launch the new generation Vulkan API.

Discover how over 100 companies cooperate at the Khronos Group to create open, royalty free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the very latest updates, including the newly announced Vulkan, SPIR-V, OpenVX and OpenCL 2.1 specifications.

Level: Beginner
Type: Talk
Tags: Real-Time Graphics; Computer Vision & Machine Vision; Programming Languages

Day: Monday, 04/04
Time: 11:00 - 11:50
Location: Room LL20C

S6783 - VisionWorks™: A CUDA Accelerated Computer Vision Library

Elif Albuz Computer Vision Software Lead, NVIDIA
Elif Albuz is the technical lead for the VisionWorks Toolkit at NVIDIA, driving features and optimizations with CUDA acceleration on Tegra GPUs. Before joining the Computer Vision Group, she led the CUDA FFT Library; designed new algorithms for motion estimation, superresolution, and frame-rate up-conversion and accelerated them on NVIDIA GPUs; designed architecture for error concealment and adaptive quantization for video codec hardware; and implemented low-level code for H.264 and MPEG2 codecs. Prior to joining NVIDIA, she worked at Sony Electronics, leading the DVD decoder firmware stack that was used in DVD players and PlayStation 2, implementing a real-time OS for multi-processor systems, and accelerating H.264 using SIMD in the Multimedia Research Labs. Elif Albuz holds a dual degree in Electrical Engineering and Computer Science, where she focused on Artificial Intelligence and Robotics, and a Master's degree in Electrical Engineering, where she did research on content-based image retrieval, parallel architectures and algorithms.

In this talk, we will introduce the NVIDIA VisionWorks™ toolkit, a software development package for computer vision (CV) and image processing. VisionWorks™ implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SoCs, enabling computer vision applications on a scalable and flexible platform. VisionWorks implements a thread-safe API and a framework for seamlessly adding user-defined primitives. The talk will give an overview of the VisionWorks toolkit, OpenVX API and framework, VisionWorks-plus modules including VisionWorks Structure From Motion and Object Tracker modules, and computer vision pipeline samples showing integration of the library API into a computer vision pipeline on Tegra platforms.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Embedded; Self-Driving Cars & Automotive

Day: Monday, 04/04
Time: 13:00 - 13:50
Location: Room LL20A

S6815 - Advanced Rendering with DirectX®

Oleg Kuznetsov DevTech, NVIDIA
Oleg began his professional career at NVIDIA in 2005 at the tender age of 20. Starting as a game tester, Oleg worked his way up to Developer Technology Engineer, optimizing different games from well-known legends like S.T.A.L.K.E.R. and Witcher 2 to the current and upcoming AAA titles. When not analyzing shader code, Oleg enjoys riding his Honda VFR1200 and snowboarding.

This talk focuses on some of the new features that DX12 and DX11.3 introduce, and touches on how to drive DX12 efficiently. Among other things, the slides will shed light on the use of predication, ExecuteIndirect, and explicit MGPU in DX12.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Game Development

Day: Monday, 04/04
Time: 13:00 - 13:50
Location: Room LL20C

S6504 - A Data-Driven Methodology for NVIDIA GRID™ vGPU™ Sizing

Jeremy Main Senior Solution Architect, NVIDIA
Jeremy is the Senior Solution Architect for NVIDIA's GRID enterprise graphics virtualization in Japan. He works to architect solutions for organizations to deliver high-fidelity GPU-accelerated desktops and applications. Before joining NVIDIA, Jeremy led the development of several remote graphics products as well as 3D CAD software development. Jeremy received his Bachelor of Science from the University of Utah.
Milan Diebel Senior Product Manager, NVIDIA
Milan is the Senior Product Manager for the NVIDIA GRID product family. He has been working in the technology sector for 15 years in a variety of roles. Milan holds a PhD in Physics from the University of Washington as well as an MBA from Cornell University.

GRID vGPU sizing is often viewed as more of an art form than a science. One of the challenges is that synthetic performance benchmarks are not a good representation of actual user workloads in virtualized environments. Influenced by many customer interactions, we will be introducing a systematic way of producing sizing information utilizing both real user workloads and synthetic performance benchmarks.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization

Day: Monday, 04/04
Time: 14:30 - 15:20
Location: Room 210G

S6797 - Top 20 Posters Fast Forward

David Luebke Vice President Graphics Research, NVIDIA
Highly-Rated Speaker
David Luebke helped found NVIDIA Research in 2006 after eight years teaching computer science on the faculty of the University of Virginia. David is currently Vice President of Graphics Research at NVIDIA. His personal research interests include virtual and augmented reality, display technology, ray tracing, and graphics architecture. His honors include the NVIDIA Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and the ACM Symposium on Interactive 3D Graphics "Test of Time Award". David has co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit visited by over 110,000 people, an online course on parallel computing that has reached over 80,000 students, and dozens of papers, articles, chapters, and patents on computer graphics and GPU computing.

The GTC Fast Forward Poster program is an accelerated poster presentation program that serves as a catalyst for the advancement of an array of innovations that come from universities, research labs, and industry. The GTC Poster Review Committee selected the best 20 posters submitted to GTC 2016. This program gives authors a chance to present their GPU projects in front of top technology developers working in a vast array of industries.

Level: All
Type: Talk
Tags: General Interest; Press-Suggested Sessions: General Interest

Day: Monday, 04/04
Time: 15:00 - 15:50
Location: Room 212A

S6853 - MXNet: Flexible Deep Learning Framework from Distributed GPU Clusters to Embedded Systems

Mu Li Ph.D. Student, Carnegie Mellon University
Mu Li is currently a final year Ph.D. student at Carnegie Mellon University. His research interests lie in algorithms and systems for distributed machine learning and deep learning. In particular, he designs algorithms and systems scaling to petabyte datasets and running over thousands of machines. He has co-authored tens of top journal and conference papers ranging from learning theory to machine learning, from data mining to systems. He has served as principal architect at Baidu and co-founded several machine learning startups.
Tianqi Chen Ph.D. Student, University of Washington
Tianqi Chen is a third-year Ph.D. student at the University of Washington, working on large-scale machine learning. He has co-authored many important works in scalable learning systems, statistical sampling theory, and deep learning. He also designed several widely used scalable learning systems, including XGBoost and MXNet.

This talk will describe how to develop and deploy deep learning applications efficiently and easily using MXNet. MXNet is a new deep learning framework developed by collaborators from over 10 institutes. It is designed for both flexibility and optimized performance, with easy-to-use interfaces currently in 7 programming languages, including Python, Scala, and R. We will discuss the technologies used to scale out the framework to distributed clouds ranging from EC2, Azure, and GCE to Spark clusters, as well as memory optimizations to fit into embedded systems like mobile phones. Finally, we'll demonstrate deep learning applications in computer vision, natural language processing, and speech recognition.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Programming Languages; Embedded

Day: Monday, 04/04
Time: 15:00 - 15:50
Location: Grand Ballroom

S6392 - AEC Project Execution Using GRID vGPU Enhanced Virtualization

Bill Dale Technology Leader, Jacobs
Bill is a technology leader at Jacobs with more than 20 years of experience in the areas of science, technology, and engineering. He specializes in identifying opportunities for improving work process with the practical application of technology and is passionate about information security and intellectual property management. Currently, he advises internal teams, and external entities, on optimizing process execution. He enjoys skiing, fishing, and spending time with his family.
Randall Siggers Solution Architect, NVIDIA
Randall is a Solutions Architect for NVIDIA with more than 20 years of experience in IT. He specializes in researching, analyzing, and implementing emerging technology. His current projects involve SCCM, cloud technology, imaging systems, and VDI. He enjoys traveling and speaking at various tech conferences and, in his spare time, building custom gaming rigs, vintage BMX, and import tuning.

This session presents an overview of the challenges involving traditional BIM workflow processes and the benefits of moving to GRID vGPU-enabled Integrated Project Design.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization

Day: Monday, 04/04
Time: 16:00 - 16:25
Location: Room 210G

S6851 - GPU Server Portfolio Overview (Presented by Supermicro)

Neil Truong Product Manager, Supermicro
SUPERMICRO is the clear industry leader in GPU accelerated total solutions. As Vice President of Marketing & Worldwide Business Development at SUPERMICRO, Don Clegg leads teams focused on delivering High-Performance Server, Storage and Networking systems, leveraging GPU technology. Don brings 30+ years of direct experience in design, marketing, and business development to help Supermicro deploy its industry leading, first-to-market, scale-out/scale-up platforms. Don began his career as a design engineer, developing multi-node, multi-user, x86 servers and workstations. With an emphasis on first-to-market product introductions, Don subsequently held executive positions at several chipset and system companies where he helped them achieve #1 market share. The trend continues at Supermicro. He earned a bachelor's degree in Electrical Engineering from Brigham Young University, where he graduated with high honors.

Supermicro will be giving an overview on next generation technologies and GPU solutions.

Level: All
Type: Talk
Tags: Education & Training

Day: Monday, 04/04
Time: 16:00 - 16:50
Location: Room LL20A

S6868 - Give Life to your 3D Art with MDL and NVIDIA Iray® in Substance Painter

Manuel Kraemer Sr. Developer Technology Engineer, NVIDIA
Manuel Kraemer is a Senior Developer Technology Engineer at NVIDIA. Previously Manuel was a Graphics Software Engineer at Pixar Animation Studios. Prior to that, Manuel worked as a technical director at Disney Feature Animation, Double Negative and the BBC.
Jérémie Noguer Senior Product Manager, Substance Painter
With a game developer background and 7 years as a Technical Artist at Allegorithmic, Jérémie has been the Senior Product Manager for Substance Painter since 2013.

Allegorithmic and NVIDIA will show how combining Substance, the worldwide reference for procedural textures; MDL, the new standard for defining multi-layer materials; and NVIDIA Iray, a GPU-accelerated unbiased ray tracer, helps solve artists' and developers' PBR material challenges, from editing to final-frame rendering for artistic shots. After explaining MDL basics and the associated material workflow in Substance Designer, we will showcase the latest edition of Substance Painter, the market's most innovative real-time 3D painting software. Now embedding Iray as an alternate viewport, Substance Painter fully leverages the power of MDL and Substance and natively enhances your art with the most advanced rendering quality at minimal compute time, thanks to GPU acceleration.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; Game Development

Day: Monday, 04/04
Time: 16:00 - 16:50
Location: Room 210E

S6859 - Unveiling the Impact of Time Slicing with NVIDIA GRID™ vGPU for Realistic ROI/TCO Analysis

Erik Bohnhorst GRID Solution Architect, NVIDIA
Erik Bohnhorst is a Senior GRID Solution Architect at NVIDIA based in Stuttgart, Germany. After 7 years working at HP and focusing on Client Virtualization, Erik joined NVIDIA to support the largest Graphics Accelerated Client Virtualization opportunities in central Europe. Erik regularly shares his experience and technical understanding of Client Virtualization opportunities at technical events like BriForum, HP Discover, VMworld, E2EVC and other industry focused events.

A detailed look at why time slicing the various GPU engines allows scalability without compromising the graphics experience, and at its impact on benchmarking and on generating realistic users-per-host recommendations, which directly affect the TCO/ROI model.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization

Day: Monday, 04/04
Time: 16:30 - 16:55
Location: Room 210G

S6117 - Parallelization and Performance of the NIM Weather Model on CPU, GPU and MIC Architectures

Mark Govett Chief, Advanced Computing Section, NOAA Earth System Research Laboratory
Highly-Rated Speaker
Mark manages the High Performance Computing Section, a software group that both supports model development, parallelization, and porting to high performance computers, and explores advanced computing technologies for the National Oceanic and Atmospheric Administration (NOAA). Mark has worked in high performance computing, code parallelization and compiler development for over 20 years. He has developed two Fortran compilers, the Scalable Modeling System (SMS) for MPI based parallelization, and the F2C-ACC GPU compiler. He also parallelized two weather models using the F2C-ACC compiler and has been collaborating with Cray and PGI to improve the capabilities and performance of their commercial GPU compilers.

In an era defined by increasing diversity in computing architectures, performance portability is a key requirement for weather and climate applications that require massive computing resources. In this talk, you will learn how we developed and achieved performance on CPU, GPU, and MIC architectures using industry-standard OpenACC and OpenMP directives. Performance results from the NIM weather model will be shown for a number of device, node, multi-node, and system configurations. Further, communications optimizations will highlight a more than 40% improvement in runtime with scaling to thousands of GPUs.

Level: Intermediate
Type: Talk
Tags: Earth System Modelling; Supercomputing & HPC; Programming Languages; OpenACC; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 211A

S6144 - Introducing NVIDIA's Data Center GPU Manager

Brent Stolle Software Engineer, NVIDIA
Brent Stolle is a software engineer at NVIDIA.
Rajat Phull Software Engineer, NVIDIA
Rajat Phull is a software engineer at NVIDIA.

NVIDIA is launching a new tool for data center GPU management. This is a freely available, comprehensive GPU management framework that enables cluster management, resource scheduling, and monitoring products from NVIDIA partners and supports individual users and admins as well. Data Center GPU Manager 1.0, available for Tesla GPUs on Linux, helps to ensure GPU reliability and uptime, streamline common data center administrative tasks, and improve overall resource efficiencies while still providing complete control over GPUs and expanded visibility into their behavior. It includes active health monitoring, diagnostics, system alerts, and governance policies including power and clock management. The talk will provide an introduction to the key features of this software stack, as well as an overview.

Level: All
Type: Talk
Tags: Data Center & Cloud Computing; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room LL21C

S6164 - Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs

Bertil Schmidt Professor, JGU Mainz
Bertil Schmidt is a tenured full professor and chair for Parallel and Distributed Architectures at the University of Mainz, Germany. Prior to that he was a faculty member at Nanyang Technological University (Singapore) and at University of New South Wales. Bertil's research group has designed a variety of algorithms and tools for computational science and bioinformatics, mainly focusing on the analysis of large-scale sequence and short read datasets. For his research work, he has received a CUDA Research Centre award, a CUDA Academic Partnership award, a CUDA Professor Partnership award, and Best Paper Awards at IEEE ASAP 2009 and IEEE ASAP 2015. Bertil serves as the champion for bioinformatics and computational biology on gpucomputing.net.
Christian Hundt Professor, University Mainz
Christian Hundt received his diploma in theoretical physics, for the analysis of quantization maps on curved manifolds, and a Ph.D. in computer science, for the efficient subsequence alignment of time series on CUDA-enabled accelerators, at the University of Mainz, Germany, in 2010 and 2015, respectively. In his current position as a postdoctoral researcher in the Parallel and Distributed Architectures group, he investigates the design and parallelization of algorithms in the field of bioinformatics.

Learn how to efficiently parallelize gene set enrichment analysis (GSEA) using CUDA. GSEA is an important bioinformatics method that determines whether given sets of genes are statistically overrepresented between two phenotypes. The GSEA software from the Broad Institute is the most popular tool to perform such studies with several thousand users. NGS technologies are gradually replacing microarrays for high-throughput gene expression studies. Size and availability of input data sets are increasing, leading to high runtimes of the desktop GSEA application. We present an efficient CUDA parallelization of the core GSEA algorithm. By using a combination of parallelization techniques, we achieve speed-ups of around two orders of magnitude on a single GPU.
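As a rough illustration of the kind of computation at the core of GSEA, the sketch below computes a running-sum enrichment score for one gene set over a ranked gene list. It is a simplified, sequential Python version written for this listing, not the speakers' CUDA implementation; the function name `enrichment_score` and the weighting exponent `p` are illustrative choices. In practice the score is recomputed over many permutations to estimate significance, and that repetition is what makes the method a natural candidate for parallelization.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set (simplified sketch).

    ranked_genes: gene names sorted by decreasing correlation with the phenotype
    scores:       correlation value of each ranked gene, in the same order
    gene_set:     set of gene names to test for enrichment
    p:            exponent weighting each hit by |score| ** p
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** p for s, h in zip(scores, hits) if h)
    if hit_norm == 0 or n_miss == 0:
        return 0.0  # degenerate case: no hits or no misses

    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** p / hit_norm  # step up on a gene in the set
        else:
            running -= 1.0 / n_miss            # step down on any other gene
        if abs(running) > abs(best):
            best = running                     # keep the largest deviation from zero
    return best


genes = ["a", "b", "c", "d"]
corr = [4.0, 3.0, 2.0, 1.0]
top = enrichment_score(genes, corr, {"a", "b"})      # set concentrated at the top
bottom = enrichment_score(genes, corr, {"c", "d"})   # set concentrated at the bottom
```

A set whose genes cluster at the top of the ranking yields a positive score, and one clustered at the bottom yields a negative score.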

Level: Intermediate
Type: Talk
Tags: Computational Biology

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Marriott Salon 5

S6227 - Distributed Deep Learning at Scale

Soumith Chintala Research Engineer, Facebook AI Research
Soumith Chintala is a Research Engineer at Facebook AI Research. Prior to joining Facebook in August 2014, Soumith worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. In the past, Soumith worked on state-of-the-art deep learning models for pedestrian detection, natural image OCR, depth-images among others while driving his research heavily using CUDA and multiple GPUs.

This talk provides a brief overview of deep learning research and the challenges involved in scaling it up across multi-GPU and multi-machine clusters while providing software that is flexible enough for research settings. We discuss the clear trends that are emerging in deep learning from an HPC perspective and discuss several examples from our work at Facebook AI Research.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Hall 3

S6253 - VMD: Petascale Molecular Visualization and Analysis with Remote Video Streaming

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was named an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group Advisory Panel for the Vulkan graphics API. John also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

We'll showcase recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest GPU-accelerated petascale supercomputers by Cray and IBM. We'll highlight the use of in-situ ray tracing and rasterization combined with GPU-accelerated video streaming for high-interactivity remote visualization, CUDA just-in-time compilation to increase the performance of data-driven visualization and analysis algorithms, and we'll describe new, GPU-accelerated, MD trajectory clustering algorithms.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; Computational Chemistry; Rendering & Ray Tracing

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room LL21D

S6391 - Bootstrapping Labels for One Hundred Million Images

Jimmy Whitaker Software Engineer, Digital Reasoning
Jimmy Whitaker is a software engineer at Digital Reasoning, a cognitive computing company focused on enabling humans to leverage big data to make decisions, where he has been pioneering computer vision efforts. Prior to joining Digital Reasoning, Jimmy completed his M.S. in computer science at the University of Oxford, where he achieved a distinction for his research in the field of steganalysis -- detecting hidden information in images.

We'll describe how we created an iterative labeling process to perform data science on 100 million+ images using a GPU-powered workflow with convolutional neural networks. Recently, deep learning techniques such as deep convolutional neural networks (ConvNets) have achieved state-of-the-art results in many computer vision tasks. The data-driven nature of deep learning normally requires a large number of labeled examples to achieve high accuracies. Unfortunately, much of the publicly available data on the web is not labeled, thus requiring human labelers for large datasets or unsupervised machine learning techniques. Our labeling process allows weak labels and a small number of strong labels to be used to create classifiers for very large datasets.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210H

S6422 - Enhancing Visual Realism of Mixed Reality Applications with Stereo Vision

Edwin Azzam CTO, Stereolabs
Edwin Azzam co-founded STEREOLABS in 2010. As STEREOLABS's Chief Technical Officer, Edwin is responsible for leading the company's product development and technology strategy in stereo vision. Prior to founding STEREOLABS, Edwin was a project manager at Astrium Space Transportation, Paris. Edwin holds a Master's degree in Optics & Image Processing from Institut d'Optique, France, as well as a Master's degree in Management from ESSEC Business School. He is a PhD supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise for the assessment of national research projects in the field of computer vision and 3D image processing.

Discover how stereo vision and 3D depth sensing on GPU enable the development of mixed reality applications, which merge virtual information into a live 3D video stream of the real world. We will discuss the various stages of a real-time mixed reality processing pipeline, and how NVIDIA's GPU acceleration is integral to every step of the pipeline. We will also show demonstrations of how stereo depth sensing can be used to create 3D virtual playgrounds and real-time augmentation of the environment.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Virtual Reality & Augmented Reality; Video & Image Processing; Embedded

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210F

S6423 - Accelerating Approximate Weighted Matching on GPUs

Antonino Tumeo Research Scientist, Pacific Northwest National Laboratory
Highly-Rated Speaker
Dr. Antonino Tumeo has been a research scientist in PNNL's High Performance Computing group since February 2011. Antonino received an M.S. degree in informatic engineering in 2005, and a Ph.D. in computer engineering in 2009, from Politecnico di Milano in Italy. He joined PNNL in 2009 as a post-doctoral research associate. Previously, he was a post-doctoral researcher at Politecnico di Milano. His research interests are modeling and simulation of high-performance architectures, hardware-software codesign, FPGA prototyping, and GPGPU computing.

Matching is a fundamental graph problem with numerous applications in science and engineering. This talk discusses the efficient implementation of half-approximate weighted matching on GPUs. We start by describing the Suitor algorithm, currently considered the best algorithm for this problem, and identifying its key implementation challenges. In its basic formulation, the Suitor algorithm appears poorly suited to GPUs, due to the irregular memory accesses and the use of locks. We proceed by introducing four variants of the algorithm that progressively address these challenges by exploiting Kepler's hardware features. We demonstrate that the final implementation is several times faster than the previous best matching algorithms for GPUs and than the Suitor algorithm on CPUs.
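For readers unfamiliar with the Suitor algorithm named above, here is a minimal sequential sketch in Python. It is not the GPU variant presented in the talk: it assumes an undirected graph stored as a dictionary of neighbor-to-weight dictionaries with distinct positive edge weights, and the names `suitor_matching`, `suitor`, and `ws` are illustrative. Each vertex proposes to its heaviest neighbor that does not already hold a better offer, and a displaced proposer immediately tries again.

```python
def suitor_matching(graph):
    """Sequential sketch of Suitor-style half-approximate weighted matching.

    graph: dict mapping each vertex to a dict {neighbor: weight}; assumed
           undirected, with distinct positive weights (an assumption of this sketch).
    Returns a set of matched edges as sorted (u, v) tuples.
    """
    suitor = {v: None for v in graph}  # current best proposer for each vertex
    ws = {v: 0.0 for v in graph}       # weight of that best proposal

    for start in graph:
        current = start
        while current is not None:
            # Find the heaviest neighbor that would accept current's proposal.
            partner, heaviest = None, 0.0
            for v, w in graph[current].items():
                if w > heaviest and w > ws[v]:
                    partner, heaviest = v, w
            if partner is None:
                break                   # nobody accepts; current stays unmatched
            displaced = suitor[partner]
            suitor[partner] = current
            ws[partner] = heaviest
            current = displaced         # a displaced suitor must propose again

    # Keep only mutual proposals, which always form a valid matching.
    return {tuple(sorted((u, v))) for v, u in suitor.items()
            if u is not None and suitor[u] == v}


path = {
    "a": {"b": 2.0},
    "b": {"a": 2.0, "c": 3.0},
    "c": {"b": 3.0, "d": 2.0},
    "d": {"c": 2.0},
}
matching = suitor_matching(path)
```

On this path graph the sketch selects the single edge (b, c) of weight 3, while the optimal matching (a, b) plus (c, d) has weight 4, which is consistent with the half-approximation guarantee. The per-vertex proposal loop with shared `suitor` and `ws` state also shows where locks and irregular memory accesses arise in a parallel setting.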

Level: Intermediate
Type: Talk
Tags: Algorithms; Big Data Analytics; Aerospace & Defense

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Marriott Salon 3

S6452 - Run-Time Scene-Graph Construction from Geographic Source Data

Tim Woodard Chief Technology Officer, Diamond Visionics
Highly-Rated Speaker
Tim Woodard is the chief technology officer at Diamond Visionics, with over 18 years of experience specializing in the design and development of software architectures for real-time, PC-based image generation using Agile development processes, advanced C++, and modern OpenGL techniques. Tim has received patents for the real-time simulator database generation technology that forms the basis of Diamond Visionics' GenesisRTX worldwide database generation system. GenesisRTX provides high-fidelity generation, visualization, and manipulation of visual databases at run-time directly from source data on low-cost PC-based platforms, eliminating the need for traditionally labor-intensive off-line database production processes. He has served as the director of engineering, director of research and development, and principal investigator for a number of Phase I, II, and III U.S. Government Small Business Innovative Research Grants. Tim has also published and presented papers at I/ITSEC, IMAGE, NVIDIA's GPU Technology Conference, ASQ, and ITEC.

In modern computing hardware, the gaps in performance between GPUs, CPUs, RAM, and storage continue to widen. When visualizing large and dense geographic datasets (e.g., imagery, elevation, vectors, features), balancing the workload effectively between these resources (and considering the bottlenecks between them) is crucial. Conventional wisdom for optimal performance from just 10 years ago may not provide the same benefits it once did. In this talk, we demonstrate that by exploiting parallelism on the CPU and especially the GPU, much greater throughput can be achieved. Furthermore, by utilizing modern OpenGL techniques (e.g., NV_command_list), an order of magnitude increase in performance can be achieved when compared to previously available rendering methods.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Aerospace & Defense; Performance Optimization

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210E

S6616 - NCCL: Accelerated Collective Communications for GPUs

Nathan Luehr Senior Devtech Engineer, NVIDIA
Nathan Luehr is a senior developer technology engineer for compute applications at NVIDIA. He earned a Ph.D. in theoretical chemistry from Stanford University in June 2015.

We present NCCL, a library of multi-GPU communication collectives (e.g., broadcast, all-reduce, all-gather). NCCL enables applications to harness the computational throughput of multiple GPUs with minimal developer effort by providing optimized, topology-aware, asynchronous collectives with a familiar API.
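
As a rough illustration of what the named collectives compute, the sketch below models each rank's buffer as a Python list. It shows the semantics only and is not the NCCL API; the function names and the list-of-lists layout are assumptions made for this example.

```python
from functools import reduce
import operator


def broadcast(buffers, root):
    """Every rank ends up with a copy of the root rank's buffer."""
    return [list(buffers[root]) for _ in buffers]


def all_reduce(buffers, op=operator.add):
    """Every rank ends up with the element-wise reduction across all ranks."""
    reduced = [reduce(op, column) for column in zip(*buffers)]
    return [list(reduced) for _ in buffers]


def all_gather(buffers):
    """Every rank ends up with the concatenation of all buffers, in rank order."""
    gathered = [x for buf in buffers for x in buf]
    return [list(gathered) for _ in buffers]


ranks = [[1, 2], [3, 4], [5, 6]]   # one buffer per (simulated) GPU
print(broadcast(ranks, root=1))
print(all_reduce(ranks))
print(all_gather(ranks))
```

A real library performs these operations across device memory and chooses communication paths based on the interconnect topology; the sketch only pins down what each rank holds afterwards.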

Level: All
Type: Talk
Tags: Tools & Libraries; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 211B

S6650 - Optimizing In-Field Processing Using GPUs

Tarik Saidani Senior Software Engineer, PGS
Tarik is a Senior Software Engineer at PGS. He specializes in parallel programming and software optimization. He has worked in the oil and gas industry for the last five years, helping research geophysicists commercialize their applications using parallel programming and software optimization techniques. He holds a PhD in parallel computing from Paris-Sud University.

Learn how GPU accelerators help marine seismic acquisition to efficiently perform one of the fundamental steps in the on-board processing flow. GPUs not only allow unprecedented data processing throughput, but also reduce hardware footprint, power consumption and heat dissipation of the in-field compute system.

Level: Intermediate
Type: Talk
Tags: Energy Exploration; Performance Optimization; Signal & Audio Processing

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Marriott Salon 1

S6689 - Creating CONSTRUCT: A GPU-Rendered Short Film

Kevin Margo Director / VFX Supervisor, Blur Studio
Highly-Rated Speaker
Kevin is director of the hit sci-fi short film "Grounded". He joined Blur Studio in 2003 as a scene assembly, lighting, and compositing artist and has since moved into the studio's VFX/CG Supervisor role. Recent work includes the prologue for Thor 2: The Dark World and the David Fincher-produced Halo 4: Scanned cinematic trailer.

Come watch a special screening of the GPU-rendered independent short film "CONSTRUCT"! Afterwards, Kevin will describe how Chaos V-Ray RT and NVIDIA GPUs were used throughout production on the groundbreaking short film, rendered entirely on GPUs. Go here (http://constructfilm.com/) to see more of the project and here (https://www.youtube.com/watch?v=nnaz8q6FLCk) to see how interactive GPU rendering was used on a motion capture stage during production. As a bonus, Kevin will cover how GPU rendering was recently implemented in his day job at Blur Studio to help visualize the irreverent title sequence of the Fox/Marvel hit film DEADPOOL.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Real-Time Graphics; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room LL21B

S6782 - Securing the San Francisco 49ers and Levi's Stadium

Dan Cory VP of Security, San Francisco 49ers
Dan Cory is in his fifth year with the 49ers and first as the team's vice president of security. In his role, he oversees all elements of security for both the team and Levi's® Stadium. Prior to joining the organization, Cory served in the British Military as a Royal Marine Commando, followed by twelve years of service as a law enforcement officer for Scotland Yard's Special Operations Department. His law enforcement and military career has afforded him the opportunity to work with many different security units conducting operations around the world.

The scope and scale of securing a facility like Levi's Stadium and the San Francisco 49ers is monumental. Learn how video monitoring and analysis allow the 49ers security team to focus on ensuring a safe season for the team and fans alike.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA)

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room LL20D

S6825 - The OpenPOWER Foundation: Revolutionizing Data-Centric Transformation (Presented by IBM)

Sumit Gupta Vice President, High Performance Computing and Analytics, IBM Power Systems
Sumit is responsible for offering and product management of OpenPower-based solutions for high performance computing and high performance data analytics. In this role, Sumit is driving the offerings IBM is building for the technical computing markets and machine and deep learning markets. Sumit joined IBM in May 2015 from NVIDIA, where he was the general manager for the Tesla GPU accelerator business. He was central in building this startup business within NVIDIA from zero to a several hundred million dollar business. Sumit is a product management, marketing, and business leader for enterprise systems and software products. He has previously held positions in marketing, business strategy, and engineering at Tensilica, Tallwood Venture Capital, Intel, S3, and IBM. Sumit has a Ph.D. in Computer Science from the University of California, Irvine, and a bachelor of technology in Electrical Engineering from the Indian Institute of Technology, Delhi. He has authored one book, one patent, several book chapters and more than 20 technical publications.

The growth of the OpenPOWER Foundation has been phenomenal. Why, you might ask? In less than two years, OpenPOWER has grown from five members to over 180, with membership across all tiers of hardware, software, and end users themselves. The Foundation provides a compelling and rapidly growing open approach to infrastructure and software for rapidly changing workloads and evolving IT consumption models. This is a revolution that is making a profound difference in the price/performance criteria of end users, as well as accelerating compelling development for performance to drive business advantage. OpenPOWER members are co-creating their approach to technology—as innovators, producers, and consumers utilizing IBM's Power Architecture.

Level: All
Type: Talk
Tags: Big Data Analytics; Data Center & Cloud Computing; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Marriott Salon 6

S6829 - Drive Me: Volvo's Autonomous Car Program

Henrik Lind Technical Expert, Volvo Car Corporation
Henrik Lind holds a master's degree in Electrical Engineering from Chalmers University of Technology. Henrik worked on advanced driver assistance technologies and technology research at Volvo Technological Development from 1997, leading research on sensors and functions. In 2001, Henrik moved to Volvo Cars, where he became responsible for the introduction of radar and vision related functions at Volvo Car Corporation, with the aim of providing increased safety and comfort for drivers. He introduced forward collision warning with emergency brake and adaptive cruise control in 2006, followed by new innovations in safety. Since 2013, Henrik has been working on bringing highly automated driving technologies to Volvo Cars. He has been appointed technical specialist.

We'll present the Drive Me project involving 100 highly autonomous vehicles in the vicinity of Gothenburg, Sweden. Henrik will discuss different technologies related to sensors and sensor processing and the resulting requirement for high performance processing in autonomous vehicles.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room LL21E

S6838 - Create Full Set of Materials for Hyundai Genesis G380 with Substance Designer, Iray and MDL

David Nikel Digital Model Manager, Hyundai
After 10 years as a modeler for General Motors and 4 years running his own independent company, David has been the Digital Model Manager at Hyundai USA since 2002.
Jerôme Derel Chief Product Officer, Allegorithmic
Engineer and product designer Jerome Derel joined Allegorithmic in 2014 as chief product officer. Jerome worked for seven years at Dassault Systemes as a visualization expert in the Design Studio and the CATIA Design teams, leading projects producing high-quality virtual materials.
Pierre Maheut Product Manager & Senior Industrial Designer, Allegorithmic
With an industrial design background and after 8 years at Dassault Systemes as CATIA Creative Design expert & portfolio manager, Pierre joined Allegorithmic as product manager & senior industrial designer.

Discover how Substance Designer enables the creation of the extensive set of materials for the interior & exterior of the Hyundai Genesis G380. We will show how Allegorithmic's Substance procedural technology and NVIDIA's Material Definition Language (MDL) can be combined to bring materials creation to a level never reached before. Material review will be achieved on the actual fully detailed car model using Substance Designer and NVIDIA Iray integration. Finally, we will explain how Substance can help industrial designers in their creative iterations and exploration phases.

Level: All
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room LL21A

S6848 - Deep Learning Workloads on CRAY Cluster Systems with NVIDIA™ GPUs (Presented by Cray)

Ryan Olson Principal Performance Engineer, Cray
Ryan Olson has been a member of the Performance Engineering Team at Cray since 2007. Prior to this, Ryan was a Postdoctoral Research Associate at the University of Minnesota, and he completed graduate work at the Ames Laboratory. Ryan holds a PhD in Physical Chemistry from Iowa State University, and a BA in Chemistry & Mathematics from Saint John's University.
Mark Staveley Director of Product Management, Cray
Mark Staveley is a Director of Product Management with CRAY. Mark is part of the Analytics Products Team, and his main role is to lead CRAY's Machine Learning efforts. Prior to joining CRAY, Mark spent over 6 years working at Microsoft where he held various roles – including being the Technical Program Manager for Microsoft Azure's accelerated computing and visualization program, the Research Program Manager for Microsoft Research's large-scale data management and processing program, Senior Engineer on Xbox One and the Microsoft Windows HPC Server. Prior to Microsoft, Mark worked as a Computational Researcher at ACEnet and HPCVL – two of Canada's Largest High Performance Computing Centers.

Cray Cluster Systems have long been used to support supercomputing and scientific applications. In this talk, we'll demonstrate how these same systems can be easily configured to support Docker and, subsequently, various machine learning software packages, including NVIDIA's DIGITS software. Additionally, these systems can be set up so that their Docker containers pull data from Cray's Sonexion scale-out Lustre storage system. With this configuration, our systems offer maximum application flexibility through Docker while simultaneously supporting the high-performance storage requirements of many types of machine learning workloads through a connection with our Lustre ecosystem.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Tools & Libraries; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room 212B

S6902 - Virtual Reality: You Are Here

David Luebke Vice President Graphics Research, NVIDIA
Highly-Rated Speaker
David Luebke helped found NVIDIA Research in 2006 after eight years teaching computer science on the faculty of the University of Virginia. David is currently Vice President of Graphics Research at NVIDIA. His personal research interests include virtual and augmented reality, display technology, ray tracing, and graphics architecture. His honors include the NVIDIA Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and the ACM Symposium on Interactive 3D Graphics "Test of Time Award". David has co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit visited by over 110,000 people, an online course on parallel computing that has reached over 80,000 students, and dozens of papers, articles, chapters, and patents on computer graphics and GPU computing.

NVIDIA Research reviews the technology, the components, and the challenges of virtual reality. We describe how GPUs are addressing these challenges, and our vision for the future of VR.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Press-Suggested Sessions: Virtual Reality

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room LL20C

S6157 - Effective Evaluation of Betweenness Centrality on Multi-GPU Systems

Massimo Bernaschi Director of Technology, National Research Council of Italy
Highly-Rated Speaker
Massimo Bernaschi is with CNR, the National Research Council of Italy, as Chief Technology Officer of the Institute for Applied Computing. He is also an Adjunct Professor of Systems Programming at "Sapienza" University in Rome and a trainer in digital forensics at "Sapienza" and Modena Universities. Before joining CNR in 1998, Massimo worked for ten years at the IBM European Center for Scientific and Engineering Computing, where he developed the IBM PVMe product and received two Outstanding Technical Achievement Awards. His main scientific interests are parallel computing, modelling of complex systems (finance and biology), systems and network security, and high performance computing. Massimo is the author of over 150 papers in peer-reviewed journals and international conferences.

Learn how to use multiple GPUs and CUDA to speed up the process of ranking the importance of each node in a large-scale network. You will see how to solve an extraordinary challenge, the exact computation of Betweenness Centrality, by using relatively simple algorithms as building blocks, such as Breadth First Search, that have been highly tuned for the latest generation of GPU cards. Our approach is fully scalable and overcomes the limitation on the size of the graph that can be studied on a single GPU. We'll present results obtained on both synthetic and real-world graphs.
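
To make the role of breadth-first search concrete, here is a small single-threaded Python sketch of exact betweenness centrality using the standard Brandes formulation (one BFS per source followed by a backward accumulation). It is not the speakers' multi-GPU code, and it assumes an undirected, unweighted graph stored as an adjacency dictionary; the function name is illustrative.

```python
from collections import deque


def betweenness_centrality(adj):
    """adj maps each vertex to an iterable of neighbors (undirected, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: c / 2.0 for v, c in bc.items()}


star = {"c": ["x", "y", "z"], "x": ["c"], "y": ["c"], "z": ["c"]}
print(betweenness_centrality(star))
```

Because every source vertex needs its own BFS, the sources are independent of one another, which is one natural way to spread the work across several devices.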

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Marriott Salon 3

S6362 - CNN Based Object Detection in Large Video Images

Tao Wang Chief Scientist, iQIYI ltd. Corp.
Dr. Tao Wang is chief scientist of iQIYI ltd. Corp., the biggest video sharing platform in China, where he works on computer vision and multimedia software applications. He received his Ph.D. in computer science from Tsinghua University in 2003. Tao then worked as a senior researcher in Intel Labs China. He has published more than 60 papers in IJCV, CVPR, CIVR, ICME, and ACM multimedia.

Object detection in real video images is more challenging than in image datasets. We'll present CNN-based object detection research on iQIYI's large image and video collections. It is used for content-based ad recommendation.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 210H

S6420 - Parallel Silence Coding Algorithms for Seismic Data Compression on GPUs

John Cheng Research Scientist, BGP
John is a research scientist with extensive industry experience in high-performance computing. John has developed seismic imaging products with GPUs and many parallel applications on heterogeneous computing platforms. John is the author of many books, including Professional CUDA C Programming (Wiley, 2014). John has extensive experience in both academic research and industry development, and is gifted in making complex subjects accessible to readers with a concise and illustrative approach. John earned his doctoral degree in Computational Intelligence from Tokyo Institute of Technology.

Join industry experts for a discussion on novel parallel silence coding algorithms on GPUs. Silence coding, combined with Huffman coding to form a lossless scheme, is extensively used in seismic data compression. It is inherently a serial procedure and not easy to parallelize for GPUs. In this session, you will learn how to convert the sequential computation into a parallel computation through prefix-scan operations, a key primitive in many parallel algorithms, and how to quickly implement your kernels by utilizing NVIDIA CUB, a library of high-performance parallel primitives and reusable components for every layer of the CUDA programming model. Concepts and performance are illustrated through examples by adjusting alternative algorithmic strategies provided in CUB.
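
The abstract does not spell out the silence coding scheme, so the sketch below uses plain run-length encoding as a stand-in to show how a prefix scan removes the sequential dependency. It is written in Python for readability; the function name `rle_via_scan` is hypothetical, and each commented step corresponds to an operation that could run independently per element on a GPU.

```python
from itertools import accumulate


def rle_via_scan(samples):
    """Run-length encode `samples` using only map, scan, and scatter steps."""
    n = len(samples)
    if n == 0:
        return []
    # Map: flag the first element of every run (independent per element).
    heads = [1 if i == 0 or samples[i] != samples[i - 1] else 0 for i in range(n)]
    # Scan: an inclusive prefix sum gives each element its 1-based run number.
    run_id = list(accumulate(heads))
    num_runs = run_id[-1]
    # Scatter: each run head writes its start offset into its own output slot.
    starts = [0] * (num_runs + 1)
    for i in range(n):
        if heads[i]:
            starts[run_id[i] - 1] = i
    starts[num_runs] = n  # sentinel so the last run has an end
    # Map: a run's length is the distance between consecutive start offsets.
    return [(samples[starts[r]], starts[r + 1] - starts[r]) for r in range(num_runs)]


print(rle_via_scan([0, 0, 0, 5, 5, 0, 7]))
```

The scan is the only step with a cross-element dependency, and it is the kind of primitive that CUB supplies as a device-wide operation (for example through `cub::DeviceScan`).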

Level: Intermediate
Type: Talk
Tags: Energy Exploration; Performance Optimization

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Marriott Salon 1

S6563 - Where Tegra Meets Titan: Asymmetric Computer Vision for Smartphones and Robotics

Tom Drummond Professor, Monash University
Tom Drummond has been a principal investigator on several EU Framework projects and is a chief investigator in the ARC Centre of Excellence for Robotic Vision. Tom studied mathematics for his B.A. at the University of Cambridge. In 1989, he emigrated to Australia and worked for CSIRO in Melbourne for four years before moving to Perth for his Ph.D. in computer science at Curtin University. In 1998, he returned to Cambridge as a postdoctoral research associate and in 1991 was appointed as a university lecturer and was subsequently promoted to senior university lecturer. In 2010, he returned to Melbourne and took up a professorship at Monash University.

This presentation will argue that battery life and thermal limits will prevent small mobile devices from implementing the next generation of visual processing algorithms without external assistance from high performance computing. Several innovative methods of distributing these problems between lightweight and high-powered nodes will be explored for a number of visual processing applications relevant to smartphones and robotics. We'll illustrate how these problems can be mapped onto the thread model of GPUs and will present a couple of CUDA tricks used to maximize efficiency.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Virtual Reality & Augmented Reality

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 210F

S6577 - Fighting Infections and Antimicrobial Resistance Through GPU-Accelerated In Silico Models

Radu Marculescu Professor of Electrical & Computer Engineering, Carnegie Mellon University
Dr. Radu Marculescu is a professor in the Electrical and Computer Engineering Department at Carnegie Mellon University. He has received several Best Paper Awards in top conferences and journals covering design automation of integrated systems and embedded systems. Radu currently serves as the editor-in-chief of Foundations & Trends of Electronic Design Automation and as an associate editor of Elsevier Journal of Nano Communication Networks. Radu has been involved in organizing many symposia and conferences, and has been guest editor of special issues in archival journals and magazines. His current research focuses on cyber-physical, social, and biological systems. He is an IEEE Fellow.

Learn the core principles behind cell-cell communication and understand the use of in silico models and simulation algorithms needed to evaluate the dynamics of heterogeneous microbial populations. Explore the pathogens' inter-cellular network, its dynamics and contribution to biofilm formation. See how the newest GPU-based platforms can enable highly parallel simulations with performance gains of orders of magnitude over existing CPU-only solutions. Understand how the application of social and network sciences to understanding bacterial population dynamics can aid in developing new treatments and better drugs to control the many pathogenic bacteria that use social interactions to cause infections and antimicrobial resistance.

Level: Beginner
Type: Talk
Tags: Computational Biology; Supercomputing & HPC; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Marriott Salon 5

S6583 - WetBrush: GPU-Based 3D Painting Simulation at the Bristle Level

Zhili Chen 3D Graphics Researcher, Adobe Research
Zhili Chen is a 3D graphics researcher at Adobe. He received his Ph.D. in computer science from The Ohio State University in 2015. His research interests include physically based simulation, real-time graphics, 3D reconstruction, and virtual reality.
Byungmoon Kim Software Engineer, Adobe Research
Byungmoon Kim worked in Creative Technology, leading developments of MIDI software. He then moved to the US to study at the Georgia Institute of Technology, with broad interests that resulted in three master's degrees in Aerospace Engineering, Mathematics, and Computer Science, before he received his PhD in Computer Science. His research included robot control, spacecraft control and experiments, collision detection, geometry mesh filtering, some haptics, and fluid simulations. After his PhD, he worked at NVIDIA as a software engineer. During this time, he worked on general DirectX driver development, anti-aliasing, driver ambient occlusion, and some advanced stereo features. He later joined Adobe Research, working on the Flash scene graph and physics engine, implicit/explicit hybrid mesh repair for 3D printing, octree/quadtree simulation, interactive selection tools, painting simulation, a face-aware liquify warp tool, and a number of research projects.

We built a real-time oil painting system that simulates the physical interactions among brush, paint, and canvas at the bristle level entirely using CUDA. To simulate sub-pixel paint details given the limited computational resource, we propose to define paint liquid in a hybrid fashion: the liquid close to the brush is modeled by particles, and the liquid away from the brush is modeled by a density field. Based on this representation, we develop a variety of techniques to ensure the performance and robustness of our simulator under large time steps, including brush and particle simulations in non-inertial frames, a fixed-point method for accelerating Jacobi iterations, and a new Eulerian-Lagrangian approach for simulating detailed liquid effects.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Computational Fluid Dynamics

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 210E

S6628 - Co-Designing GPU-Based Systems and Tools for Numerical Weather Predictions

Thomas Schulthess Director, Swiss National Supercomputing Centre
Thomas Schulthess is director of the Swiss National Supercomputing Centre (CSCS) and a professor of computational physics at ETH Zurich. He received his Ph.D. in physics in 1994. Since 2010, he has taken an interest in refactoring climate codes to take advantage of novel, energy-efficient computing architectures.
Carlos Osuna Scientific Developer, MeteoSwiss, Zurich
Carlos Osuna is a scientific software developer at MeteoSwiss, Zurich. Since 2011, he has been involved in research projects at ETH Zurich and MeteoSwiss, refactoring dynamical cores of weather codes using DSLs to port legacy codes to GPUs and provide performance portable applications. He received his Ph.D. in experimental high energy physics in 2003.

We'll discuss the hardware-software co-design project behind the most cost and energy efficient system for numerical weather prediction -- an appliance based on the Cray CS-Storm system architecture that is loaded with NVIDIA K80 GPUs and operated on behalf of MeteoSwiss by CSCS since October 2015.

Level: Intermediate
Type: Talk
Tags: Earth System Modelling; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 211A

S6778 - Scaling Human Vision with GPUs

David Luan CEO, Founder, Dextro
David Luan is the co-founder of Dextro, a video analysis company based in NYC that uses machine learning and computer vision to help companies with large video collections easily understand, categorize, search, and visually transcribe their content—without depending on text metadata. David previously built computer vision systems at iRobot's military research group, and commercialized cutting-edge research as a Thiel Fellow.

Searching, filtering, and running aggregations on video at scale requires tools to enable humans to apply rich queries to the dataset, and get accurate answers quickly. Discover how users of Dextro's GPU-powered computer vision platform are able to analyze huge video datasets and extract meaningful answers in a matter of seconds, and how we apply this technique to train and create new categories on the fly. See live demos of Dextro's platform applied to user-generated video and security footage datasets.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA); Deep Learning & Artificial Intelligence; Media & Entertainment

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room LL20D

S6856 - Audi Autonomous Braking with a 3D Monovision Camera

Rudolph Matthias Director Architecture Driver Assistance Systems, Audi AG
Dr. Rudolph studied Electrical Engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics, with a minor in mathematics, from Iowa State in 1999. After holding various positions at Audi, he took over as head of the department "Architecture Driver Assistance Systems" in 2009. The zFAS project is one of the department's core developments. Dr. Rudolph is a member of the management at Audi.

To fulfill the EuroNCAP requirements, an autonomous braking system has to be developed. The emergency braking system is designed to brake for pedestrians as well as in car-to-car scenarios. We'll explain how the functional logic is developed and what has to be done to reach a zero-false-positive goal with excellent field performance. Audi was the first OEM to fulfill this goal with a single 3D Monovision camera, by developing the first ASIL B camera with our supplier Kostal. We'll also explain the architecture of the 3D camera.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room LL21E

S6210 - NVIDIA GRID™ and Dassault Catia from Proof of Concept to Production

Fred Devoir Sr. Architect & Manager of IT Infrastructure, Textron Inc.
Fred Devoir is a senior systems architect and manager of IT infrastructure at Textron Inc. Fred has a wide variety of specialized and business systems experience with particular interests in integration and virtualization projects specifically centered around virtual desktop infrastructure (VDI), graphics acceleration, and high performance computing clusters. His past experience includes 22 years of IT professional work experience as an IT manager, senior systems analyst, engineer, and architect for Fortune 500 companies in the aerospace/defense, engineering, medical, and pharmaceutical industries.
Chris Savage Infrastructure Operations Manager, Bell Helicopter
Chris Savage joined Bell Helicopter in 2011 as Disaster Recovery Program Manager following 15 years in IT and BC/DR management. He currently serves as Infrastructure Operations Manager. He earned his B.Sc. degree in Emergency Administration and Planning from the University of North Texas, and holds an MBCP certification from DRI.

Join us for a technical discussion on NVIDIA GRID-accelerated virtual desktop infrastructure to support Dassault Catia workloads and what it takes to deploy the infrastructure from proof of concept to production. Learn how to tune Catia for use on virtual desktops, including how to optimize the NVIDIA GRID graphics drivers in Windows to deliver Catia workloads. Learn about frame rate limiters and other performance optimization settings available in the infrastructure. Learn how persona management can assist with concurrent user deployments and what data must be saved with the user's personalization settings in order to retain the Catia settings.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Product & Building Design

Day: Tuesday, 04/05
Time: 14:00 - 14:50
Location: Marriott Salon 4

S6246 - Digital Actors at MPC: Bridging the Uncanny Valley with GPU Technology

Damien Fagnou Global Head of VFX Operations, MPC
Highly-Rated Speaker
Damien Fagnou is the global head of VFX Operations at MPC, where he brings together his expertise in software and production to evolve and refine the creation processes across all feature film VFX work. After finishing university with an M.S. in computer science in France, he worked for an animated series implementing the technology to speed up the motion capture pipeline and rendering. He later accepted a job to help set up the workflow at Attitude studios and then took on the role of Tools and Workflow Programmer at Climax in the U.K. In 2003, he transferred his skills to the film industry and started at leading VFX post-production studio MPC to work on Troy, implementing preview tools and city rendering scripts. In 2005, Damien became R&D lead on Charlie and the Chocolate Factory, 10,000 BC, and Narnia. He then moved closer to production and became MPC's stereographer working on movies, including Pirates of the Caribbean: On Stranger Tides, the Harry Potter films, and Prometheus. After a few years in production, he returned to his software roots and became global head of Software overseeing software development efforts across the company.

Discover the next generation of GPU-enabled facial rigs for digital actors at MPC. Through a mixed approach of linear deformers and non-linear analysis, MPC aims to improve the performance and appearance of its digital actors and improve upon the state of the art in the visual effects industry. You'll learn from industry experts how MPC is using the latest Fabric Engine technology to ease the transition to GPUs, enabling fast drawing of characters and fast parallel computation of deformers on CUDA.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Performance Optimization; Algorithms

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL21C

S6334 - 10X Faster Transparency from Low Level Shader Optimisation

Pyarelal Knowles Student, RMIT University
Pyarelal Knowles is a PhD student at RMIT University, Melbourne, with research interests in real-time computer graphics and physics simulations. He completed his Bachelor of IT (Games and Graphics Programming) in 2008, before a Comp. Sci. (Honours) year in 2009 at RMIT.

We discuss techniques for improving the sorting performance of many small lists to achieve much faster order-independent transparency, a problem with elements of both compute and graphics, two domains that are quickly converging. The focus is on technical issues encountered, such as occupancy and memory hierarchy performance, a comparison between GLSL shaders and CUDA, and some discussion of GPU evolution.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Algorithms; Performance Optimization

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 210E

S6337 - GPU-Accelerated Graph Query for Cyber Applications

Jim Carbonaro Senior Software Engineer, Blazegraph
Jim Carbonaro is a subject matter expert for integration and scaling of Blazegraph solutions with real-time analytic processing frameworks, including Spark, Scala, Storm, Kafka, GraphX, and Redis. He is a lead developer of DASL and DASL algorithms for large-scale graph analytics. He led recent work to compare performance of Apache Spark GraphX with Blazegraph-accelerated graph analytics.

Cyberspace is a critical domain for government and commercial organizations. It is about networks, devices, and how they interact. Graphs model nodes and links and how they are connected. Defending the critical networks in cyberspace requires processing and analyzing extremely large quantities of graph data in near-real time. Key cyber analytics and data sets, such as Topological Vulnerability Analysis, Traffic Flow Analysis, and Network Attack Graphs, are graphs. This session will discuss how Blazegraph GPU meets this challenge by delivering near-real-time performance at very large data scales, using a flexible and updatable graph representation to support complex analytics, and supporting existing graph frameworks (RDF, Tinkerpop) and query languages (SPARQL).

Level: Intermediate
Type: Talk
Tags: Aerospace & Defense; Big Data Analytics; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Marriott Salon 2

S6345 - Advances in V-Ray RT GPU

Vladimir Koylazov CTO, Chaos Software Ltd.
Highly-Rated Speaker
Vladimir Koylazov (Vlado) has more than 15 years of software development experience, the majority of which he spent developing and improving the render engine V-Ray. Passionate about 3D graphics and programming, Vlado is the driving force behind Chaos Group's software solutions. Vladimir is CTO of Chaos Software and one of the original creators of the V-Ray renderer.
Blagovest Taskov Lead Developer, Chaos Group
Blagovest Taskov is the lead of the V-Ray RT GPU developers team. He has been working on some of the latest advancements in V-Ray RT GPU, including improved OpenCL support, performance optimizations, and many rendering features.

The talk describes recent advances in the V-Ray RT GPU raytracer for production rendering. With V-Ray 3.0 we will be offering a host of new features, optimizations, and improvements that will simplify artists' workflow while offering advanced capabilities and great speed improvements. One of the key improvements will be the simplified workflow.

Level: Advanced
Type: Talk
Tags: Rendering & Ray Tracing; Product & Building Design

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL21B

S6357 - Towards Efficient Communication Methods and Models for Scalable GPU-Centric Computing Systems

Holger Froening Associate Professor, Ruprecht-Karls University of Heidelberg
Holger Froening is an associate professor at the Department of Mathematics and Computer Science at the Ruprecht-Karls University of Heidelberg (Germany), and leads the Computer Engineering Group at the Institute of Computer Engineering. His research interests include parallel computing, computer architecture, interconnection networks, and hardware design, with a recent focus on application-specific heterogeneous computing, data movement optimizations, and associated power and energy aspects.

GPUs are used pervasively in high performance computing for many reasons, including higher performance and improved energy efficiency, resulting in a strong need to optimize data movement between GPUs at the cluster level. Existing communication models and methods are designed for CPUs, however. We'll point out the limitations of applying traditional techniques to GPUs and show how to overcome them to support fully GPU-centric traffic sourcing and sinking. Our experiments show that such specialized communication models and methods provide substantial advantages in terms of energy and time. We observe that, besides specialization in computing, we also need specialization in communication for utmost performance and energy efficiency.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Performance Optimization

Day: Tuesday, 04/05
Time: 14:00 - 14:50
Location: Room 212A

S6384 - NVIDIA CUDA® for Mobile

Yogesh Kini Manager, CUDA System Software, NVIDIA
Yogesh Kini manages the Tegra CUDA driver team at NVIDIA. For the last four years he has been working towards enabling GPU compute software on different Tegra platforms. His team is responsible for the CUDA API and system software on various embedded, automotive, and mobile platforms based on the Tegra SoC. He holds a B.S. from Manipal Institute of Technology, India.

This session covers several important mobile use cases that can be accelerated with CUDA, including image processing and camera output post-processing. Attendees will learn about: [1] the Tegra unified memory architecture, which applications can use to reduce total memory usage and power consumption; [2] CUDA interoperability with EGLImage; [3] using EGLStreams to set up an image processing pipeline with CUDA; [4] Tegra-specific enhancements to CUDA-OpenGL(ES) interop.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Tools & Libraries

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 210F

S6401 - Towards Interactive Visual Exploration of Massively Parallel Programs Using a Domain-Specific Language

Tobias Klein Student, TU Vienna / KAUST
Tobias Klein is an M.S. student at TU Vienna working under the direction of Professor Eduard Groller. He is visiting the Visual Computing Center at KAUST for his M.S. thesis research work on the interactive visualization and analysis of massively parallel GPU programs in the context of the KAUST NVIDIA CUDA Research Center, in collaboration with Dr. Peter Rautek and Professor Markus Hadwiger.

Learn about the world of visual exploration of massively parallel programs. We describe our framework for interactive program visualization to help you understand program run-time behavior and find the causes of possible slowdowns and bugs. Our framework comprises a simple domain-specific language for annotating OpenCL kernel code, automatic just-in-time compilation of the necessary code instrumentation, and interactive visualization capabilities. Our problem-specific code annotation facilitates user-centered analysis. We describe a variety of interactive visualizations using the well-known D3 framework, providing insight into the program's structure, execution, and memory accesses. We describe several use cases that show the program visualization capabilities of our approach in action.

Level: Intermediate
Type: Talk
Tags: Tools & Libraries; Performance Optimization; Programming Languages

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 211B

S6414 - Structure-Preserving Smoothing for Seismic Amplitude Data by Anisotropic Diffusion Using GPGPU

Joner Duarte R&D Software Engineer, Tecgraf/PUC-Rio
Joner Duarte is a researcher in the computational geophysics group of Tecgraf/PUC-Rio. He received an MSc degree in computer graphics from the Pontifical Catholic University of Rio de Janeiro in 2012. His current research covers high performance computing and HCI for virtual reality applied to geophysics.

In this session we present a new method for attenuating noise in seismic data while preserving structural features. Our approach uses anisotropic diffusion to filter the seismic amplitude data, which involves solving a large linear system. We use seismic attributes that represent horizons and faults as constraints on the diffusion process, and an implicit method to solve the diffusion equation. Using GPGPU to accelerate the linear system solution allows the seismic filtering to run at interactive rates, so input parameters can be fine-tuned before processing the whole dataset.

Level: All
Type: Talk
Tags: Energy Exploration; Tools & Libraries

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Marriott Salon 1

S6424 - Exploring Scalable Implementations of Triangle Enumeration in Graphs of Diverse Densities: Apache Spark vs. GPUs

Michela Taufer Associate Professor, University of Delaware
Michela Taufer joined the University of Delaware in 2007, where she was promoted to associate professor with tenure in 2012. She earned her M.S. in computer engineering from the University of Padova and her Ph.D. in computer science from the Swiss Federal Institute of Technology (ETH). She was a post-doctoral researcher supported by the La Jolla Interfaces in Science Training Program (also called LJIS) at UC San Diego and the Scripps Research Institute. Before she joined the University of Delaware, Michela was faculty in computer science at the University of Texas at El Paso.
Travis Johnston Postdoctoral Researcher, University of Delaware
Travis Johnston is a post-doctoral researcher working with Dr. Michela Taufer in the Global Computing Laboratory at the University of Delaware. He is working on several projects that are centered on big data analytics for scientific computation.

We'll present graphs as powerful tools for analyzing complex relationships between entities. We'll share how many structures commonly found in computer science, like social networks, computer networks, and the world wide web, can be modeled as graphs. Since many real graphs are very large and complex, the associated analysis algorithms must be very efficient and highly parallel. We present implementations of a key graph-based analysis, triangle enumeration, for two different parallel paradigms: GPU programming and Apache Spark. We'll reveal how the performance of the two implementations varies as the characteristics of the graph change.
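For readers unfamiliar with the problem, triangle enumeration can be illustrated with a small serial reference. The sketch below is not either of the implementations discussed in the session; it is a minimal illustration that assumes an undirected graph given as a list of vertex-id pairs, and the function name `enumerate_triangles` is hypothetical.

```python
from itertools import combinations


def enumerate_triangles(edges):
    """Return every triangle of an undirected graph once, as a sorted tuple."""
    # Orient each edge from the smaller to the larger vertex id so that a
    # triangle (a < b < c) is discovered exactly once, starting from a.
    higher = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        a, b = (u, v) if u < v else (v, u)
        higher.setdefault(a, set()).add(b)

    triangles = []
    for a, nbrs in higher.items():
        # Any two higher neighbours b < c of a form a triangle if b-c is an edge.
        for b, c in combinations(sorted(nbrs), 2):
            if c in higher.get(b, ()):
                triangles.append((a, b, c))
    return sorted(triangles)


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3), (3, 4)]
print(enumerate_triangles(edges))  # → [(0, 1, 2), (1, 2, 3)]
```

The work per vertex grows with the number of neighbour pairs, which is one reason graph density matters when comparing parallel implementations.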

Level: Beginner
Type: Talk
Tags: Algorithms; Tools & Libraries; Big Data Analytics

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Marriott Salon 3

S6468 - Democratizing Sequencing with Ion Torrent S5 and S5XL DNA Sequencers Powered by GPUs

Mohit Gupta Staff Software Engineer, Thermo Fisher Scientific
Highly-Rated Speaker
Mohit Gupta is working as a staff software engineer in the Genetic, Medical and Applied Sciences division of Life Sciences Solutions, a part of Thermo Fisher Scientific Inc. In this capacity, he is responsible for speeding up algorithms used in data analysis of PGM, Proton, S5, and S5XL DNA sequencers, with a particular focus on GPU computing. Previously, Mohit worked as a senior research and development engineer with Mirafra Technologies, Bangalore, India, in the area of electronic design automation, working on compilers for hardware description languages like Verilog. He holds a B.Tech. in electrical engineering from the Indian Institute of Technology, Bombay, India, and an M.S. in computer engineering from the University of California, San Diego. He has published or presented in conferences and workshops like ICCAD, GTC, and DFMY.

Learn how GPUs have accelerated the pace of research in targeted DNA sequencing by providing quick turnaround time in data analysis of terabytes of raw data generated by Ion Torrent DNA sequencers, like S5XL, in a matter of hours. We'll showcase our complete signal processing pipeline running on GPUs and share our results with lessons learnt in developing CUDA code for Kepler and Maxwell architectures. We'll share our experiences with using CUDA Multi Process Service (MPS). We'll touch upon several examples in areas of clinical research, drug discovery, and human identification that have got a tremendous boost from the speed of our technology propelled by GPUs.

Level: All
Type: Talk
Tags: Computational Biology; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Marriott Salon 5

S6536 - Building Immersive NVIDIA SHIELD Android TV Games

Luc Beaulieu CTO, Frima Studio
Luc Beaulieu is chief technology officer at Frima, a 350+ employee company, with a passion for digital entertainment. He is currently leading the technical innovation and smart toy groups. Luc has over 20 years of experience in video games, online communities, and digital experiences.
Jean-Philippe Doiron Technology Director, Frima Studio
With Frima since 2009, Jean-Philippe has over 15 years of professional experience in the creation of games and Web applications. Before joining Frima, he co-founded an Internet multimedia development company where he acted as developer for the better part of six years. He also worked for seven years as a consultant, mostly in Canada, the US, and the UK, developing highly optimized and multithreaded database architecture and reporting systems for transit operators. His current responsibilities as Director of Technology include aligning the R&D efforts with Frima's technological needs, laying down quality standards, and providing the R&D team with support. Jean-Philippe is an expert in software and game development, profiling and optimization, DirectX, C++, and Flash 3D.

We'll show how game experiences on the SHIELD Android TV can be extended beyond the border of the monitor. Sound is an obvious example, but what about lights? Find out how Frima increased immersion in its Chariot game by adding connectivity between SHIELD and the Philips Hue lighting system. We'll show how the game interacts with the lights and what had to change to make it work. With Chariot being a console game first, the audience also learns about performance comparisons between the SHIELD TV, new and old generation consoles.

Level: All
Type: Talk
Tags: Game Development

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 212B

S6594 - Connecting the Dots of Emerging Technologies and Real-World Application

Anthony Cortez Senior Designer | Visualization | Lighting Consultant, Arup
Anthony Cortez is the Arup visualization leader in the Americas Region. A graduate of the Art Institute of Pittsburgh, Anthony has been working as a senior designer for the visualization industry for over 18 years. Anthony has worked on numerous architectural and engineering projects in visualization, lighting design, and motion graphics. Projects include the Fulton Center, the Tappan Zee Bridge, JFK JetBlue Terminal 5, YAS Marina Hotel, Singapore Stadium and the Academy of Arts & Science Theater at the Lighthouse International. In addition to traditional 3D visualization, he integrates with other Arup disciplines to produce validated lighting studies, acoustic and pedestrian/vehicle simulations, fire simulations, as well as real-time simulations using cutting-edge gaming technology.

New technologies are enabling us to bring designs to life in ways never before possible. Real-time graphics engines immerse viewers in a virtual environment, providing a perfect tool for showcasing projects and troubleshooting problems during the design phase. Our engineers and graphics experts collaborate to produce interactive walk-throughs showing options for different spaces and environments. As new synergies continue to emerge in the design field, visualization is key to providing a common basis of understanding among multiple design disciplines and project stakeholders. Arup is helping turn best practice into the next practice by using real-time graphics engines as a presentation tool, greatly improving communication and understanding for project teams and clients alike.

Level: All
Type: Talk
Tags: Product & Building Design; Virtual Reality & Augmented Reality

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL21A

S6621 - Unified CPU+GPU Programming for the ASUCA Production Weather Model

Michel Müller Research Assistant, Tokyo Institute of Technology
Michel Müller entered the Ph.D. graduate course at the Department of Energy Sciences, Tokyo Institute of Technology in 2015, supervised by Professor Aoki. He graduated with an M.S. in electrical engineering and information technology from ETH Zurich in 2012. From 2009 to 2014, he worked as a consultant and then a systems architect at ATEGRA AG, Switzerland.

Porting applications to GPUs still requires compromises between time-to-solution, GPU performance, and CPU performance. This often leads to major challenges for large, Fortran-based applications like weather and climate models. We'll focus on two of these challenges, whose significance is shown using real-world code examples and performance results: the differing requirements of the two architectures for parallel task granularity and for storage order. A proposed solution is a flexible preprocessor framework called "Hybrid Fortran," which has been used to port both dynamics and physics of ASUCA, one of the Japan Meteorological Agency's current operational weather models. Finally, an even more hands-off approach to GPU portability is proposed in the shape of a black box solution.

Level: Intermediate
Type: Talk
Tags: Earth System Modelling; Tools & Libraries; Computational Physics; Supercomputing & HPC; OpenACC

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 211A

S6695 - Generative Adversarial Networks

Ian Goodfellow Senior Research Scientist, OpenAI
Ian is a Senior Research Scientist. He is the lead author of the textbook Deep Learning (www.deeplearningbook.org). He studies new methods to improve deep learning. His interests include generative models, machine learning in the adversarial setting, and accelerating the training of neural networks. He has contributed to several open source machine learning libraries that leverage CUDA, including Theano, Pylearn2, and TensorFlow.

Generative adversarial networks (GANs) provide a way to generate realistic images with deep neural networks. Compared to other approaches to generative modeling, GANs are able to learn the cost function. This allows them to learn to capture important details that a fixed, manually designed cost function, such as mean squared error, would ignore. Compared to maximum likelihood estimation (MLE), GANs are specialized for the task of generating realistic samples. Both MLE and GANs are consistent statistical estimators, but have different priorities. MLE aims to assign high likelihood to all of the data, but may also assign high likelihood to other points and thus generate unrealistic samples. GANs aim to always generate realistic samples.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Hall 3

S6761 - Advanced System Power Management for Deep Learning and A.I. Machines (Presented by Linear Technology)

Dave Dwelley Office of the CTO, Linear Technology
Dave Dwelley works in the Office of the CTO at Linear Technology. Since joining the company over 25 years ago, Dave has served as an analog chip designer and design manager. He received his BSEE/CD degree from UC Berkeley in 1986. He was a member of the original IEEE P802.3af (PoE) task force starting in 2000 and a founding member of the P802.3at (PoE+) group starting in 2004. He now serves as chairman of the IEEE 802.3 Power over Data Lines study group and participates in the P802.3bp Reduced Twisted Pair Gigabit group. Dave holds 16 patents and spends his free time raising two teenagers and thinking about rebuilding the collection of old cars gathering dust in his garage.

Linear Technology's DC/DC regulator and power management solutions enable designers to increase performance in GPU- and CPU-based systems. Improved electrical, thermal and mechanical properties for core, I/O, and memory rails, combined with expertise and tools for PCB layout, simulation and design verification permit deployment of more efficient, lighter weight, cooler, and more compact systems. This presentation will also focus on methods of controlling, monitoring and debugging power circuits by digitally communicating with the device, reading temperature and load current data while setting voltages and start-up conditions. Future product advancements related to powering automotive electronics will also be discussed.

Level: All
Type: Talk
Tags: Performance Optimization; Deep Learning & Artificial Intelligence; Self-Driving Cars & Automotive

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Marriott Salon 6

S6794 - The Technology Powering the Immersive Cinema Experiences from Lucasfilm's ILMxLAB

Lutz Latta Principal Engineer, Lucasfilm
Lutz Latta is the Principal Engineer of ILMxLAB, where he leads the technology development for Lucasfilm's innovative VR, AR, and immersive experiences. Previously he worked extensively on video games, as the Lead Graphics Engineer of Star Wars: 1313 at LucasArts, and on The Lord of the Rings and Command & Conquer games at Electronic Arts Los Angeles.

Bringing Cinematic Virtual Reality to life requires the kind of tight collaboration between technical and creative forces that Lucasfilm has thrived on for over 40 years. We'll dive deep into the technology that powers the creative and technical experiments underway at the studio. We will discuss how multiple GPUs collaborate to achieve the highest level of photorealism for virtual reality today, how to repurpose offline-rendered, movie-quality assets for real-time rendering in under 11 milliseconds per frame, and some of the lessons learned along the way.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Media & Entertainment; Real-Time Graphics; Press-Suggested Sessions: Professional Graphics; Press-Suggested Sessions: Virtual Reality

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL20C

S6809 - Large-Scale Volume and Particle Visualization on GPU Clusters with vl3

Silvio Rizzi Postdoctoral Appointee, Argonne National Laboratory
Silvio Rizzi is a Postdoctoral Appointee at the Argonne Leadership Computing Facility, Argonne National Laboratory. His research interests include large-scale data analysis and visualization, in-situ visualization, GPU and many-core computing, display technologies, augmented and virtual reality, and surgical simulation. Silvio earned a B.Sc. in Electronics Engineering from Universidad Tecnologica Nacional, Mendoza, Argentina, and the degrees of M.S. in Electrical and Computer Engineering and Ph.D. in Industrial Engineering and Operations Research from the University of Illinois at Chicago.

We'll describe vl3, a GPU-accelerated parallel framework for large-scale scientific visualization and visual analytics. We'll explain its parallel architecture, based on a combination of the message passing interface (MPI) and GLSL shaders. In addition, we'll present applications to interactive visualization of large-scale volumetric and particle-based datasets generated on some of the most powerful supercomputers on the planet. We will also discuss strong and weak scalability experiments on up to 125 NVIDIA K80 GPUs. Finally, we'll cover vl3's capability for remote data visualization and streaming ultra-high-resolution images to remote displays, including a large tiled display driven by a workstation with multiple Quadro M6000 cards and Mosaic technology.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Large Scale and Multi-Display Visualization

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL21D

S6830 - WePod: Autonomous Shuttles on Public Roads

Floris Gaisser Researcher, Delft University of Technology
Floris Gaisser holds a master's degree in Mechanical Engineering from Delft University of Technology, with a specialization in Vision Based Robotics and Intelligent Mechanical Systems. He is currently working as a PhD researcher in the Intelligent Vehicles & Cognitive Robotics department on the WePods project. He is also a founding partner in Robot Care Systems. His expertise lies in the field of Product and Mechanical Engineering with a focus on Computer Vision Applications.

The WePod is the first self-driving vehicle on the public road without a steering wheel or pedals. To drive in such a complex environment and guarantee safety, multiple sensors covering 360 degrees around the vehicle are used. Sensor fusion, road-user detection, classification, and tracking have been implemented on NVIDIA's DrivePX platform. This session will give an overview of the system's architecture and implementation, and present preliminary test results from driving on the public road.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL21E

S6840 - Intelligent Video Analysis System Based on GPU and Distributed Architecture

Shiliang Pu Executive Vice President, Hikvision Research Institute
Shiliang Pu, Executive Vice President of Hikvision Research Institute, holds doctoral degrees from both Rouen University and Zhejiang University. He is regarded as the top expert in the field of image processing and intelligent identification both here and in Zhejiang Province. He is the technical leader of a key laboratory in the Ministry of Public Security and a technological innovation leader at CETC.

In this session, we'll show you the impact that GPUs and technology like deep learning will have on the surveillance industry. As a global leader in the security/surveillance market, we will share some of the work we do with governments and municipalities to help them manage massive amounts of video and data.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA); Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room LL20D

S6116 - Towards Building a GPU Cloud Service for Human-Level Quality Image Understanding

Xiaodong He Senior Researcher, Microsoft
Xiaodong He is a senior researcher in the Deep Learning Technology Center, Microsoft Research, Redmond, Wash. He is also an affiliate full professor in the Department of Electrical Engineering at the University of Washington, Seattle, serving in the Ph.D. reading committee. His research interests include deep learning, speech, natural language, vision, information retrieval, and knowledge representation and management. He has published in IEEE TASLP, IEEE SPM, Proc. IEEE, ICASSP, ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, ICLR, NIPS, and other venues. He has received several awards, including the Outstanding Paper Award of ACL 2015. He and colleagues developed the MSR-NRC-SRI entry and the MSR entry that won No. 1 in the 2008 NIST Machine Translation Evaluation and the 2011 IWSLT Evaluation (Chinese-to-English), respectively, and the MSR image captioning system that won first prize at the MS COCO Captioning Challenge 2015. He has held editorial positions on several IEEE Journals and has served on the organizing and program committees of major speech and language processing conferences. He is a senior member of IEEE and a member of ACL.
Kenneth Tran Senior Research Engineer, Microsoft
Kenneth Tran is a senior research engineer in the Deep Learning Technology Center, Microsoft Research. Previously, he was a machine learning scientist in the Cloud Machine Learning group at Microsoft, building a machine learning platform that now powers Azure ML. His research interests include machine learning, optimization, and distributed computing.

Learn the latest deep learning techniques for semantic modeling of images, text, and knowledge graphs, all empowered by GPU computing and cloud services. We'll demonstrate how to build deep semantic models across different modalities, and how to apply these models to reach the best results in information retrieval, question answering, and image captioning benchmarks. In particular, facilitated by the recently announced Microsoft Azure GPU compute instances, we'll show how to use GPU clusters to extend the MSR image captioning system, which won first prize in the COCO Captioning Challenge at CVPR 2015, and to build a publicly available, large-scale, deep image understanding service that achieves state-of-the-art performance in generating novel captions for images.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Data Center & Cloud Computing; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Hall 3

S6131 - Nvpro-Pipeline: Handling Massive Transform Updates in a SceneGraph

Markus Tavenrath Senior Developer Technology Engineer, NVIDIA
Markus finished his studies in computer science with a focus on computer graphics in 2008. He was one of the first to use ray tracing on CUDA, for his diploma thesis, which brought him straight to NVIDIA. There he primarily worked on GPU ray tracing for SceniX, NVIDIA's scene graph technology, which was showcased at SIGGRAPH 2008. Afterwards he applied his experience to implement parts of OptiX, improve SceniX, and develop several ray tracing demos. In close cooperation with external partners, he improved rendering performance and scene graph usability as a developer technology engineer. Now he is using this knowledge to experiment with future rendering technologies that bring high interactivity to complex scenes. This work includes both CPU and GPU strategies to solve typical scene graph operations related to rendering.

This session will walk through the new transform hierarchy module in nvpro-pipeline, which computes the world matrices for each node in a transform hierarchy in a massively parallel fashion instead of with the traditional serial computation. Running the algorithm on a high-end GPU like a Quadro M6000 gives a massive speedup over computing the hierarchy on the CPU. In addition, the data that has to be transferred between the CPU and GPU is minimized, which gives another performance boost.
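One common way to expose parallelism in a transform hierarchy is to process it level by level: every node at a given depth needs only its parent's finished world matrix, so all nodes of one level are independent of each other. The sketch below illustrates that idea serially in plain Python. It is an assumption-laden illustration, not the nvpro-pipeline implementation; the parent-index array layout and the names `world_matrices`, `matmul`, and `translation` are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def world_matrices(parents, local):
    """parents[i] is the parent index of node i, or -1 for a root."""
    # Depth of each node, found by walking up to the root.
    depth = []
    for i in range(len(parents)):
        d, p = 0, parents[i]
        while p != -1:
            d, p = d + 1, parents[p]
        depth.append(d)

    # Group node indices by depth.
    levels = {}
    for i, d in enumerate(depth):
        levels.setdefault(d, []).append(i)

    world = [None] * len(parents)
    for d in sorted(levels):
        # Nodes within one level do not depend on each other, so this inner
        # loop is the part that could run as one parallel batch.
        for i in levels[d]:
            p = parents[i]
            world[i] = local[i] if p == -1 else matmul(world[p], local[i])
    return world


parents = [-1, 0, 0, 1]
local = [translation(1, 0, 0), translation(0, 2, 0),
         translation(0, 0, 3), translation(1, 1, 1)]
world = world_matrices(parents, local)
print([row[3] for row in world[3][:3]])  # → [2, 3, 1]
```

With this layout the number of sequential steps equals the depth of the hierarchy rather than the number of nodes, which is what makes wide, shallow scene graphs a good fit for a GPU.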

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 210E

S6215 - MBE: A GPU-Based Fast, Robust and Precise Solver for Chemical ODEs

Fan Feng Ph.D. Student, Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Fan Feng is a Ph.D. student with the Supercomputer Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing.
Zifa Wang Professor, Institute of Atmospheric Physics, Chinese Academy of Sciences
Professor Zifa Wang is Director of the State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry (LAPC) of the Institute of Atmospheric Physics, Chinese Academy of Sciences, and editor of Chinese Journal of Atmospheric Sciences, Aerosol and Air Quality Research, The ScientificWorld JOURNAL, and SOLA Journal. He has developed a nested air quality prediction modeling system (NAQPMS), a tool for studying air pollution such as Asian dust storms at regional and urban scales, which is widely used in China as a real-time air quality forecasting model. This model was included in the multiple-model inter-comparison project MICS-Asia III. He received his PhD in Atmospheric Physics in 1997 and studied and worked in Japan from 1998 to 2002.

Explore an efficient GPU-based algorithm for chemical ODEs, which are the core and most costly part of the atmospheric chemistry model in the CAS-ESM project. Chemical ODEs are numerically difficult because of their stiffness, nonlinearity, and nonnegativity constraints. Traditional solvers, such as LSODE, are hard to parallelize because of their complicated control flow and coupling. In our experiments, we obtained a 3-5X speedup on the GPU when the same input is set on each node, which eliminates divergence in the kernel, while the performance with real input is even worse than the serial code. So we developed a new solver, Modified-Backward-Euler (MBE). In our numerical experiments, MBE is shown to be faster and more precise than LSODE, and it is easy to parallelize, so we can expect a significant speedup on the GPU.
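The abstract does not describe how MBE modifies backward Euler, so the sketch below shows only the standard, unmodified backward Euler method on a toy problem. It assumes a single stiff linear decay reaction y' = -k·y, for which the implicit update has a closed form, and contrasts it with explicit forward Euler at the same step size; the function names are hypothetical.

```python
def forward_euler(k, y0, h, steps):
    """Explicit update: y_next = y + h * (-k * y)."""
    y = [y0]
    for _ in range(steps):
        y.append(y[-1] + h * (-k * y[-1]))
    return y


def backward_euler(k, y0, h, steps):
    """Implicit update y_next = y + h * (-k * y_next), solved in closed form."""
    y = [y0]
    for _ in range(steps):
        y.append(y[-1] / (1.0 + k * h))
    return y


k, y0, h = 1000.0, 1.0, 0.01  # k * h = 10, well outside the explicit stability limit
explicit = forward_euler(k, y0, h, 5)
implicit = backward_euler(k, y0, h, 5)

print("forward Euler :", explicit)   # oscillates in sign and grows in magnitude
print("backward Euler:", implicit)   # stays positive and decays monotonically
```

For this problem the implicit scheme keeps the concentration non-negative at any step size, while the explicit scheme produces negative values once k·h exceeds 1 and diverges once it exceeds 2. Nonlinear, coupled chemical systems need an iterative solve per step instead of the closed form used here.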

Level: All
Type: Talk
Tags: Earth System Modelling; Computational Fluid Dynamics; Algorithms

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 211A

S6276 - Autonomous Robotic 3D Printing: Real-Time Path Planning with Computer Vision

Daghan Cam Architect, University College London
Daghan Cam is an architect and researcher based in London. He is the director of Daghan Cam Limited, which operates between architecture, technology, and research. He runs a post-graduate research cluster at UCL's Bartlett School of Architecture with Alisa Andrasek and Andy Lomas. He also leads research on GPU computing and he is a co-principal investigator of UCL as an NVIDIA GPU Research Center. Previously he worked with Zaha Hadid Architects. He taught workshops and gave lectures at AA Visiting Schools in Istanbul, Athens, London, and at Ecole d'architecture in Paris. His work on computational design and large-scale robotic fabrication has been widely exhibited, recently in San Francisco and in Milan Design Week 2015.

Teach your 3D printing robot how to adapt to unpredictable material behavior by using deep learning algorithms. We'll introduce a path planning strategy for iteratively correcting robot target positions in a 3D printing process by using an NVIDIA Jetson card attached to an industrial robotic arm. Initial path generation, visual tracking of material behavior in real-time, evaluation and recomputation of robot trajectories will be explained by code examples and video recordings from the fabrication process.

Level: Beginner
Type: Talk
Tags: Product & Building Design; Robotics & Autonomous Machines; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21A

S6288 - Automatically Fusing Hundreds of Stencil Kernels for Better Performance and Productivity

Mohamed Wahib Post-Doctoral Researcher, RIKEN Advanced Institute for Computational Science
Mohamed Wahib is currently a post-doctoral researcher in the HPC Programming Framework Research Team at RIKEN Advanced Institute for Computational Science (RIKEN AICS). He joined RIKEN AICS in 2012 after years at Hokkaido University, Japan, where he received a Ph.D. in computer science in 2012. Prior to his graduate studies, Mohamed worked as a researcher at Texas Instruments R&D for four years.

This talk proposes an end-to-end framework for automatically transforming stencil-based CUDA programs to exploit inter-kernel data locality. The CUDA-to-CUDA transformation collectively replaces the user-written kernels with auto-generated kernels optimized for data reuse. The transformation is based on two basic operations, kernel fusion and fission, and relies on a series of automated steps: gathering metadata, generating graphs expressing dependencies and precedence constraints, searching for optimal kernel fissions/fusions, and generating optimized code. We show how the automatic transformations were practical and effective in exploiting exposed data localities for a variety of real-world applications with large codebases that contain dozens of kernels and data arrays.
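As a rough illustration of what kernel fusion means for stencils, the sketch below fuses two dependent 3-point smoothing passes by hand in plain Python. It is not the framework's generated code: the periodic boundary, the function names, and the choice to recompute intermediate values instead of staging them in fast memory are all assumptions made for the example.

```python
def smooth(src):
    """One 'kernel': 3-point periodic average, reading src and writing a new array."""
    n = len(src)
    return [(src[(i - 1) % n] + src[i] + src[(i + 1) % n]) / 3.0 for i in range(n)]


def two_kernels(a):
    """Unfused version: the intermediate array b is fully written, then read back."""
    b = smooth(a)
    return smooth(b)


def fused_kernel(a):
    """Fused version: each output point recomputes the three b values it needs,
    so the intermediate array is never stored."""
    n = len(a)

    def b_at(j):
        j %= n
        return (a[(j - 1) % n] + a[j] + a[(j + 1) % n]) / 3.0

    return [(b_at(i - 1) + b_at(i) + b_at(i + 1)) / 3.0 for i in range(n)]


data = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
unfused = two_kernels(data)
fused = fused_kernel(data)
print(unfused)
print(fused)
```

The fused version trades extra arithmetic for one fewer pass over memory, which is the kind of trade-off a search over candidate fusions has to weigh.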

Level: Intermediate
Type: Talk
Tags: Tools & Libraries; Supercomputing & HPC; Performance Optimization

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 211B

S6389 - Embedded Deep Learning for Object Detection and Classification in Aerial Images

Jon Barker Solution Architect, NVIDIA
Jon Barker is a solution architect with NVIDIA, helping customers and partners develop applications of GPU-accelerated machine learning and data analytics to solve defense and national security problems. He is particularly focused on applications of the rapidly developing field of deep learning. Prior to joining NVIDIA, Jon spent almost a decade as a government research scientist within the U.K. Ministry of Defence and the U.S. Department of Defense R&D communities. While in government service, he led R&D projects in sensor data fusion, big data analytics, and machine learning for multi-modal sensor data to support military situational awareness and aid decision making. He has a Ph.D. and B.S. in pure mathematics from the University of Southampton, U.K.

Learn how deep learning can be applied to object detection, localization, and tracking problems in remote sensing. We'll present a technical case study showing how a convolutional neural network (CNN) trained in the data center using DIGITS can be deployed to an embedded GPU system to carry out low-latency object detection, classification, and tracking in high-resolution aerial imagery. We'll compare different approaches to detection and localization tasks. An example will be given of integrating the Caffe deep learning framework for GPU-accelerated CNN inference with an OpenCV-based image and video processing pipeline. We'll also show how transfer learning can be accomplished using DIGITS to train a CNN when only a small task-specific training dataset is available.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Robotics & Autonomous Machines; Aerospace & Defense

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 210H

S6397 - Real-Time Non-Rigid Image Registration Engine

Randall Miles Senior Research Scientist, Propulsion Science and Technology
Dr. Randall Miles is a physicist, algorithm developer, and senior research scientist at Propulsion Science and Technology. He is lead designer and developer for model database development activities, and key contributor on a variety of projects, including quantum chemistry calculations and radar cross section modeling of CFD fields.

Non-rigid image registration, i.e., morphing, allows a smaller footprint of seed images to be used to create a smooth and continuously changing series of images. We'll present a new high-speed toolkit for image morphing implemented using NVIDIA GPU technology. Time improvements of ~80% were seen through implementing a succession of CUDA optimizations guided by the Nsight profiler results. Tests were conducted using available simulated rocket plume images to calculate run times and create performance measures.

Level: All
Type: Talk
Tags: Aerospace & Defense; Video & Image Processing; Performance Optimization; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 2

S6421 - Using OpenACC to Parallelize Seismic One-Way-Based Migration

Maxime Hugues HPC Research Scientist, Total E&P Research & Technology USA, LLC
Maxime Hugues has been an HPC research scientist at TOTAL Houston since 2012. Maxime graduated from the French National Engineer School "ISEN-Toulon" in 2007. The same year, he received an M.S. from the University of Science and Technologies of Lille. He was a Ph.D. fellow at the oil and gas company TOTAL, and received his degree in computer science in 2011 at the University of Lille. While doing his Ph.D., he worked as a junior researcher in high performance computing at TOTAL. He continued to work on the multi-programming paradigm as a postdoctoral researcher at INRIA and as a visiting scientist at the University of Tsukuba. His research focuses on programming paradigms and innovative hardware for extreme computers.

We'll describe our experience in using OpenACC to parallelize One-Way Based Migration, a seismic application that uses Fourier Finite Differencing. We describe our approach to optimizing application kernels that involve FFT operations and solving systems of tridiagonal sparse matrices. We talk about expectations and challenges of using OpenACC along with potential pitfalls for application users. We highlight the advantages of using OpenACC for high-performance scientific applications and list shortcomings that affect performance.

Level: All
Type: Talk
Tags: Energy Exploration; Performance Optimization; Supercomputing & HPC; OpenACC

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 1

S6428 - Molecular Simulations of DNA Loop Extrusion Explain and Predict Human Genome Architecture

Adrian Sanborn Ph.D. Candidate, Department of Computer Science, Stanford University
Adrian Sanborn is a Ph.D. candidate in the department of Computer Science at Stanford University and a researcher at the Center for Genome Architecture in Houston. Previously, he graduated summa cum laude from Harvard University with a degree in mathematics and computer science.

We'll show how the human genome's 3D organization, which is closely linked to important cellular functions, can be explained and predicted by molecular simulations. Our recent high-resolution maps of DNA self-contacts revealed that the genome is organized into loops and domains demarcated by the DNA-binding protein CTCF. We present a model, developed using GPU-accelerated molecular simulations, in which loops and domains form through loop extrusion. Our simulations recapitulate DNA contact maps given only CTCF binding locations. When we alter CTCF binding locations using genome editing, our simulations generate accurate predictions for the edited DNA contact maps. These results significantly advance our understanding of genome folding and open a path towards targeted surgery of 3D genomes.

Level: All
Type: Talk
Tags: Computational Biology; Computational Physics; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 5

S6489 - Testing Chordal Graphs with CUDA®

Agnieszka Lupinska Ph.D. Student, Jagiellonian University
Agnieszka Lupinska is a Ph.D. student of computer science at Jagiellonian University in Cracow. She teaches programming in CUDA. She was a software engineer in Nokia Cracow, from June 2014 to June 2015. She developed embedded Linux systems in Small Cell, Nokia's product for LTE technology. Her interests include parallel computing, multi-threaded algorithms, low-level language programming, advanced algorithms, and computational complexity.

We'll present the CUDA implementation of an algorithm to test chordality of graphs, which uses parallel partition refinement with pivots. A graph G is chordal if each cycle of length greater than three in G has a chord, that is, an edge between two non-adjacent vertices on the cycle. In total, the algorithm takes O(N) time on an N-thread grid and performs O(N+M) work for graphs of N vertices and M edges. We'll compare the performance test results achieved by the CUDA implementation on NVIDIA GeForce GTX TITAN X and the sequential implementation on a CPU with four cores (eight threads). We'll present the test results for cliques, sparse graphs, dense graphs, and random chordal graphs.
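For readers new to chordality testing, the following is a small sequential sketch. It does not use the parallel partition refinement described in the talk; it swaps in the classical maximum cardinality search followed by a perfect-elimination-ordering check. The adjacency-dictionary input format and the function name are assumptions, and the simple vertex selection here is quadratic, not linear.

```python
def is_chordal(adj):
    """Return True if the undirected simple graph is chordal.

    adj maps each vertex to the set of its neighbours.
    """
    # Maximum cardinality search: always visit the unvisited vertex that has
    # the most already-visited neighbours.
    weight = {v: 0 for v in adj}
    unvisited = set(adj)
    order = []
    while unvisited:
        v = max(unvisited, key=lambda u: weight[u])
        unvisited.remove(v)
        order.append(v)
        for w in adj[v]:
            if w in unvisited:
                weight[w] += 1

    position = {v: i for i, v in enumerate(order)}

    # The graph is chordal exactly when the reverse visit order is a perfect
    # elimination ordering. It suffices to check, for every vertex v, that its
    # most recently visited earlier neighbour is adjacent to all of v's other
    # earlier neighbours.
    for v in order:
        earlier = [w for w in adj[v] if position[w] < position[v]]
        if not earlier:
            continue
        parent = max(earlier, key=lambda w: position[w])
        for w in earlier:
            if w != parent and w not in adj[parent]:
                return False
    return True


# A 4-cycle has no chord; adding the edge 0-2 makes it chordal.
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
cycle4_with_chord = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 3}, 3: {0, 2}}
print(is_chordal(cycle4), is_chordal(cycle4_with_chord))
```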

Level: Advanced
Type: Talk
Tags: Algorithms; Big Data Analytics

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 3

S6528 - Implementing Deep Learning for Video Analytics on Tegra X1

Carles Fernández Tena Director of Research, Herta Security
Carles Fernández Tena received his B.S. in Telecommunication Eng. and M.S. in Language and Speech from the Technical University of Catalonia (UPC) in 2005. He received an M.S. in Computer Vision and AI from the Autonomous University of Barcelona (UAB) in 2008, where he obtained his Ph.D. cum laude in 2010, receiving the 2010 Extraordinary Ph.D. Award. He has published more than 40 scientific articles in international journals and conferences. Currently he leads the research team at Herta Security. His research interests include Biometrics, Computer Vision, and Machine Learning, particularly unconstrained facial analysis in image and video.

The performance of Tegra X1 architecture opens the door to real-time evaluation and deployment of deep neural networks for video analytics applications. This session presents a highly optimized, low-latency pipeline to accelerate demographics estimation based on deep neural networks in videos. The proposed techniques leverage the on-die hardware video decoding engine and Maxwell GPU cores for conducting advanced video analytics such as gender or age estimation. Our results show that Tegra X1 is the right platform for developing embedded video analytics solutions.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA); Embedded; Aerospace & Defense; Deep Learning & Artificial Intelligence; IoT

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL20D

S6693 - Network, Storage, and Workflow Design for GPU Centric Film Productions

Jeff Brue CTO, Open Drives
Jeff is the CTO and founder of Open Drives, a data storage company focused on production IT. Jeff has an extensive background in filmmaking, 3D animation, and storage kernel design. He has worked on over 120 feature films in his time as CTO at several post-production facilities, and headed up the first commercial uncompressed data workflow system in Hollywood with the Viper camera. Since founding Open Drives in 2011, he has been the principal architect for the facilities for such productions as Gone Girl for David Fincher, and has designed the in-house technology architecture for the Coen Brothers and for Deadpool for 20th Century Fox. Open Drives studio data clients include Legendary Pictures, Warner Brothers, 20th Century Fox, Disney, and many others.

This talk provides an overall perspective on GPU-centric workflows for media and entertainment, updating last year's talk on Gone Girl with this year's highlighted film, Deadpool. It focuses particularly on designing storage systems for high-speed GPU workflows in VFX and editing.

Level: All
Type: Talk
Tags: Media & Entertainment; Performance Optimization; General Interest

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21C

S6750 - The Future of GPU Rendering in 2016 and Beyond

Jules Urbach CEO & Founder, OTOY, Inc.
Jules Urbach is a pioneer in computer graphics, streaming, and 3D rendering with over 25 years of industry experience. He made his first game, Hell Cab (Time Warner Interactive), at age 18, which was one of the first CD-ROM games ever created. Six years after Hell Cab, Jules founded Groove Alliance. Groove created the first 3D game ever available on Shockwave.com (Real Pool). Currently, Jules is busy working on his two latest ventures, OTOY and LightStage, which aim to revolutionize 3D content capture, creation, and delivery.

You'll learn about OTOY's GPU rendering research and new product releases in the coming year. OTOY's breakthroughs in compression and rendering on NVIDIA GPUs have dramatically reduced the barriers for light field rendering, making it a viable media format that gives content creators everywhere a simple, cost-effective way to bring high quality, interactive 3D content to multiple platforms for the world to enjoy.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; Real-Time Graphics; Virtual Reality & Augmented Reality; Media & Entertainment; Press-Suggested Sessions: Virtual Reality

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21B

S6789 - Quasar (GPU Programming Language) on GDaaS Accelerates Coding from Months to Days (Presented by Cloudalize)

Bart Goossens CTO, Gepura (spinoff in incubation at UGent iMinds)
Since 2006, Bart Goossens has been a presenter at more than 30 scientific conferences in the domain of Image Processing/Medical Image Processing, such as IEEE International Conference on Image Processing, SPIE/IS&T Electronic Imaging, SPIE Optics + Photonics and SPIE Medical Imaging. He was also invited for a lecture at Banff International Research Station (BIRS), Banff, Canada in 2010 and for two lectures at the Image Processing Seminar of the University of Houston (dept. Mathematics) in 2013 and 2014, respectively.

Learn how to reduce coding efforts from months to days by using Quasar on the GDaaS Platform. This way of working relieves programmers of most of the device- and platform-dependent issues they experience today. The Quasar programming language provides easy access to compute resources such as parallel computing, IDE tools, multi-core CPUs, and (multi-)GPUs. In addition, it provides a runtime that reconfigures the code automatically, depending on the context. GDaaS, on the other hand, is a GPU Desktop as a Service platform that enables the instant provisioning and deployment of applications. GDaaS provides a unified API (to sell/use from your website or app), and billing reporting to support pay-as-you-use licensing models with fraud protection. Develop better, faster, and smarter.

Level: All
Type: Talk
Tags: Programming Languages; Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 6

S6803 - Canvas: The Enterprise Media Server Solution For Game Engines

Kora Van den Bulcke Founder, Immersive Design Studios
Kora Van den Bulcke graduated from the University of Montreal's program in Architecture and is a recipient of the Gold Medal of the Royal Architecture Institute of Canada (1996). After practicing as an architect, she co-founded the new media art collective Workspace Unlimited in 2001 and in 2007 co-founded Immersive Design Studios where she currently acts as president with a focus on content development with game technology. Her vision of architecture is not limited to the design of future physical spaces, but also includes the creation of "hybrid space": mixed/augmented environments that coexist with digital environments. Though conceptually engaged with the materiality of architectural space and urban life, her creative output focuses on immersive technologies that activate a new field of relations between people, architecture and the broader environment.
Thomas Soetens Founder, Immersive Design Studios
Thomas Soetens graduated in 1992 with an MFA in Visual Arts from the St-Lucas School of Arts in Belgium. After practicing as a painter, he co-founded Workspace Unlimited in 2001 and founded Immersive Design Studios in 2007 where he currently acts as CEO with a focus on research and development. Immersive Design Studios is an interdisciplinary design and technology company based in Montreal utilizing the potential of 3D game technology in corporate events, architecture, cultural new-media installations, and real-time collaborative environments.

Canvas is a GPU software and hardware media server solution that enables enterprises to create immersive projects with game engine technology for corporate events, live shows, architectural visualizations, VR, and sports. Canvas is designed for large-scale projections and immersive experiences of real-time game engine content, video, and live capture.

Level: All
Type: Talk
Tags: Game Development; Virtual Reality & Augmented Reality; Media & Entertainment; Product & Building Design

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 212B

S6808 - Image Compositing on GPU-Accelerated Supercomputers

Pascal Grosset Ph.D. Student, University of Utah
Pascal Grosset is a Ph.D. student in graphics and visualization at the University of Utah. He is working with Dr. Charles Hansen on image compositing algorithms for high performance computing systems targeting both GPUs and many-core CPUs.
Gilda Bisceglia Market Strategy & Communication, pascalpolverini.com
Gilda Bisceglia has worked as a journalist and then in communication in different areas, from fashion houses in Milan to a health corporation, where she served as HR and project manager.

Image compositing on GPUs has traditionally been considered impractical because communication between GPUs on different nodes used to be very slow. However, the introduction of GPU Direct RDMA changes that. Inter-node GPU communication is now only a one-copy operation instead of five. Also, Kepler-class K20 GPU accelerators and above can run both OpenGL and CUDA at the same time. We'll present the workflow that allows us to use OpenGL 4.5 with GLSL for volume rendering, GPU Direct RDMA and CUDA kernels for compositing, and CUDA OpenGL interop for transferring data between OpenGL and CUDA.

Level: Advanced
Type: Talk
Tags: In-Situ and Scientific Visualization

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21D

S6835 - Enabling Autonomous Drones with Real Time Computer Vision Applications

Dor Abuhasira CEO, Percepto
Dor Abuhasira is the CEO and co-founder of Percepto, a leading provider of real-time computer vision technology and applications for drones (sUAVs). Dor has led embedded computing hardware design at Netoptics and before that was a technology leader at ECI-telecom. Dor was responsible for the company's next-generation GPON platforms developed for Tier 1 customers like British Telecom and Deutsche Telekom. Dor has a B.Sc. with honors in Electrical Engineering from Ben-Gurion University in Israel.

Real-time computer vision applications are the missing link to unlock a world of new applications for drones. From autonomous inspection missions for power lines or railroads, to advanced filming features such as follow-me and position correction - the need for embedded computer vision to create autonomy for drones is rapidly growing. Percepto has developed an embedded platform with a drone-oriented SDK that enables most drones today to use these types of capabilities. We'll go over the needs, challenges, and solutions that enable new types of disruptive autonomous drone solutions.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Embedded

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 210F

S6857 - VW's Approach to Piloted Driving with Deep Learning

Martin Hempel Project Leader, Volkswagen of America
Martin Hempel is a project manager at the Volkswagen Electronics Research Laboratory (ERL) in Belmont. He has a master's engineering degree in Mechatronics and about five years of experience in Driver Assistance Systems and Piloted Driving. He started his career at Audi in Germany working on several ADAS pre-development projects before he moved to California in 2014. Since 2015 he has been a member of the Deep Learning Initiative, evaluating the potential of deep learning algorithms for automotive applications.

The Electronics Research Laboratory (ERL) is a part of the global research and development network that supports the Volkswagen Group brands. These brands include Audi, Bentley, Bugatti, Lamborghini, Porsche, and Volkswagen. Located in Silicon Valley, we draw upon its spirit of innovation to build new concepts and technologies for our future vehicles. Deep learning is at the center of our work in the fast evolution of piloted driving. As part of our research into this technology, our mission is to research deep neural network architectures and bridge the gap between concept and series development application. In this talk, we'll present our current development in a variety of deep learning projects as well as insights into how this technology could affect the future of piloted driving.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21E

S6860 - Rendering Challenges in Quill: A VR Sketching Tool

Inigo Quilez VFX Supervisor, Oculus Story Studio
Highly-Rated Speaker
Inigo Quilez grew up as a teenager enjoying the mountains, snow and sea, but also programming fractals and graphics algorithms. At the age of 18, right upon the discovery of the underground community called "the demoscene" and the potential of using code and maths to build beauty in real-time, he decided to focus all of his work and time into the creative side of computer graphics. After having finished his Master's degree in Electrical Engineering, and having later worked professionally in virtual reality and real-time rendering of massive data sets in Belgium for six years, he moved to the US to work at Pixar Animation Studios. There he spent five years creating procedural vegetation and landscapes for the movies, from research and tools creation to doing the required shot work in production. Inigo joined the Oculus Story Studio end of 2014, where he now works on bridging the worlds of real-time rendering, filmmaking and virtual reality. In his spare time Inigo co-founded the website Shadertoy, to which he also contributes with content regularly.

At this session, you'll learn how the Oculus Story Studio created Quill, their VR illustration production tool for their upcoming short movie "Dear Angelica," on top of OpenGL 4.5. Quill stresses the GPU due to the high poly count required for accurate and smooth illustration vector work, the need to do live edits in the data sets to meet the artist's workflow, and the natural constraints of stereo high-resolution rendering at 90 fps. We'll review the choices of rendering algorithms that enabled us to hit the performance and quality required while allowing the artists to express the visual look of the movie. We'll go through the architecture of the software, and the algorithmic decisions and modern OpenGL features employed in the creation of Quill.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Real-Time Graphics; Media & Entertainment

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL20C

S6115 - Real-Time Free Viewpoint TV System Based on a New Panorama Stitching Framework

Pierre Boulanger Professor, University of Alberta
Pierre Boulanger has more than 30 years of experience in 3D computer vision, rapid product development, and the applications of virtual reality systems to medicine and industrial manufacturing. He worked for 18 years at the National Research Council of Canada as a senior research officer where his primary research interest was in 3D computer vision, rapid product development, and virtualized reality systems. He now has a double appointment as a professor at the University of Alberta Department of Computing Science and at the Department of Radiology and Diagnostic Imaging. He is currently the director of the Advanced Man Machine Interface Laboratory (AMMI) as well as the scientific director of the SERVIER Virtual Cardiac Centre. In 2013, Pierre was awarded the CISCO chair in healthcare solutions, a 10-year investment by CISCO Systems in the development of new IT technologies for healthcare in Canada. His main research topics are the development of new techniques for telemedicine, patient-specific modeling using sensor fusion, and the application of tele-presence technologies to medical training, simulation, and collaborative diagnostics.

With the advance of GPU and vision technologies, free viewpoint TV (FTV) will become a reality in the near future. Traditional videos such as those shown on TV or viewed on the Internet are passive and two-dimensional in nature. Viewers can only passively observe the events captured by a cameraman and have no ability to actively change their viewpoint once the video is recorded. In contrast, FTV will allow the viewer to select an arbitrary viewpoint and thus enjoy a feeling of immersion in events such as an Olympic competition or a popular theatre show. In this presentation, we will describe an FTV system that creates a real-time panorama from multiple pixel-synchronized cameras using the GPU, and show how to transmit this information using normal IPTV technologies.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 210F

S6193 - Visualization Toolkit: Improving Rendering and Compute on GPUs

Robert Maynard R&D Engineer, Kitware Inc
Robert Maynard joined Kitware in 2010 as a research and development engineer. He is one of the primary developers of VTK-m and also an active contributor to CMake, SMTK, CMB, ParaView, and VTK.

Learn how the latest changes to VTK, VTK-m, and Catalyst are allowing for better GPU-accelerated rendering and compute. We'll give an overview of the latest changes to VTK's rendering infrastructure, VTK-m compute capabilities, and Catalyst. Lastly, we'll demonstrate the results of this work with an in-situ visualization of a PyFR GPU simulation.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Room LL21D

S6194 - Delivering Graphics-Intensive Applications to Computing Labs and BYOD in Education

Michael Goay Executive IT Director, University of Southern California Viterbi School of Engineering
Michael Goay is responsible for the information technology and computer systems that support the enterprise goals of the Viterbi School of Engineering at the University of Southern California. He oversees the school's IT service support, service delivery, digital communication, information systems, and system infrastructure in support of academic, administrative, and research activities. He has a B.S. in electrical engineering from the University of Texas, Austin, and an M.S. in health informatics from the University of Minnesota.

We'll examine some of the current possibilities to deliver graphics-intensive applications in support of engineering, architecture, and design and show how NVIDIA GRID boards benefit the Horizon View virtualized environment in the University of Southern California Viterbi School of Engineering, empowering students to learn and study with graphics-intensive software on any device, where and when they feel most productive and inspired.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing; Computer-Aided Engineering

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Marriott Salon 4

S6240 - High-Level GPU Programming Using OpenMP 4.5 and Clang/LLVM

Arpith Jacob Research Staff Member, IBM T. J. Watson Research Center
Arpith Jacob is a research staff member in the Advanced Compiler Technologies Group at the IBM T. J. Watson Research Center. His interests include parallelizing compilers and special-purpose accelerators. His current research is on building an effective OpenMP compiler for the IBM POWER/NVIDIA GPU CORAL supercomputers.

Learn how to exploit GPUs using high-level directives and our open-source Clang/LLVM OpenMP compiler and runtime. In this talk we describe the OpenMP data and execution model for accelerators and the application directives to program them. We take the audience behind the covers to reveal how the higher-level abstractions map to CUDA and GPU constructs, which can be exploited for either flexibility or performance. Our runtime transparently manages code and data offloading to multiple GPUs on a node, uses a pool of pinned memory to reduce overhead, and supports asynchronous execution of dependent kernels on GPU SMs with CUDA streams by simply expressing kernel dependencies. We show that performance of programs written in OpenMP is close to that of native CUDA for relevant benchmarks.

Level: All
Type: Talk
Tags: Programming Languages; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 210E

S6283 - GPU-Centric Thinking: Use Case Acceleration of a DNA Sequencer Pipeline

Charles Seberino Principal Software Performance Engineer,
Chuck Seberino most recently worked for Complete Genomics, Inc. (CGI), where his primary responsibility was algorithm development and GPU acceleration of high throughput DNA sequencing systems. Prior to Chuck's time at CGI, he worked for government, defense, and robotics companies including Raytheon Missile Systems and Silicon Graphics. He has actively developed software for graphics, visual simulation, and GPGPU applications for over 20 years. He is refocusing his GPU and HPC expertise into the life sciences space, where he is pursuing an M.S. in Bioinformatics at Stanford. He holds a degree in Electrical Engineering from the University of Arizona.

So you can write a kernel, but can you take your app and get great performance out of GPUs without going to extremes? This talk focuses on three areas of CUDA programming - data movement, hardware architecture, and multi-level parallelism. A modern DNA sequencing pipeline is examined from a use case porting effort along with additional code examples. The code was taken from a multi-threaded CPU application and accelerated using CUDA. The end result was a hybrid application scaling to over 28 host threads spanning 4 GPUs. Comparisons will be shown for one or more GPUs on Tesla K40 and K80 cards as well as GeForce Titan X. Additional topics covered include the use of 3rd party libraries, custom memory allocators, and stream callbacks.

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Computational Biology

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Room 212A

S6310 - How Seymour Powell Accelerates Their Design Workflow by Using SOLIDWORKS Visualize, NVIDIA Iray® and Quadro® VCA

Brian Hillner SOLIDWORKS Visualize Product Manager, Dassault Systemes
Brian Hillner, product manager for SOLIDWORKS Visualize (formerly Bunkspeed), has played an integral role in managing the product and producing digital assets for multiple clients. With a degree in industrial design, Brian is able to understand and tailor cutting-edge software experiences for specific target audiences. His unique blend of design, photography, and sales has helped him promote SOLIDWORKS's ecosystem of software solutions as a leader in the design community. Brian holds a B.S. in design with honors from DAAP at the University of Cincinnati in Ohio.
David Randle Strategy and Business Development, Dassault Systemes SolidWorks
With an Industrial Design degree, a BFA from Academy of Art University in San Francisco, and over 10 years of experience in creative services, marketing, product management and business development with Bunkspeed, David Randle is now helping to drive SOLIDWORKS strategy to address designer needs in material and realistic rendering.
Kok Chian Leong Senior Designer, Seymour Powell
A graduate of the Royal College of Art in London, Kok Chian Leong has over a decade of experience as a senior design consultant for various companies around the globe, from Hewlett-Packard, Dell and HTC, to World Kitchen, Sun Microsystems and Motorola. He is now the design lead for large-scale industrial design products and one of the principal designers at Seymour Powell.

Using actual customer data, we'll present the many capabilities and direct benefits of SOLIDWORKS® Visualize (formerly called Bunkspeed), which provides a suite of standalone software tools combining industry-leading rendering capabilities with design-oriented features and workflows. Visualize enables easy and fast creation of visual content for designers, engineers, marketing, and other content creators. We'll showcase its flexibility and adaptability to the 3D workflow.

Level: All
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Room LL21A

S6343 - Task-Based Dynamic Scheduling Approach to Accelerating NASA's GEOS-5

Eric Kelmelis CEO, EM Photonics
Highly-Rated Speaker
Eric Kelmelis is the co-founder and CEO of EM Photonics, a company focused on the development and transition of innovative research and technology in the fields of advanced imaging, high-performance computing, and embedded systems. Eric received B.S. and M.S. degrees in electrical engineering from the University of Delaware, has more than 60 publications, and holds two patents. He has also served as conference chair at SPIE's Defense, Security, and Sensing symposium since 2010.

As with many complex scientific computing applications, NASA's GEOS-5 climate modeling tool is computationally intense and can benefit from modern accelerated co-processing hardware. However, the burden of utilizing these new devices and achieving optimal results is placed on the same scientists responsible for developing the core algorithms and applying them to applications of interest. We'll present a task-based programming approach coupled with a dynamic scheduler. This allows the science of the software to be divorced from its implementation, both reducing the burden on the programmer and allowing it to adapt to changes in hardware architecture. In collaboration with NASA's Goddard Space Flight Center, we show our results in applying this technique to GEOS-5.

Level: All
Type: Talk
Tags: Earth System Modelling; Tools & Libraries; Computational Physics

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 211A

S6482 - Live Video Classification and Deep Learning at Twitter

Nicolas Koumchatzky Engineer, Twitter
Nicolas Koumchatzky is an engineer at Twitter Cortex in NYC. He designs models (mostly images and videos) and contributes to the internal deep learning codebase. He completed his Advanced Msc in Probabilities at Ecole Polytechnique in Paris in 2007. After working in quantitative financial modeling, he was introduced to deep learning and joined Madbits, acquired by Twitter in July 2014.

Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Particularly exciting is the rise of live streaming video, and the constraints that come with it. In this talk we present the work done at Twitter Cortex to tackle live video classification, rich media representation, and the technological choices we made to bridge the gap between deep learning research and fast-paced product development.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Hall 3

S6484 - G3NA-V: GPU-Enabled Tool for Mining and Aligning Complex Gene Interaction Graphs

Karan Sapra PhD Student, Clemson University
Karan Sapra is a fifth-year Ph.D. candidate in electrical and computer engineering at Clemson University under Dr. Melissa Smith. He received a B.S. in computer engineering with a mathematical science minor from Clemson in 2011. He recently finished a six-month internship at Oak Ridge National Laboratory in the Technology Integration Group.

The rapid production of new crops in the face of population pressure and climate change is arguably as important to human health in the future as biomedical research is today. We'll demonstrate the utility of biological graph alignment using an NVIDIA GPU-enabled visualization G3NA wrapper and human interaction tool called G3NA-V. While our tool is applicable across the life sciences, our target end-user is the HPC-underserved plant breeder, for whom we open high-resolution windows into dynamic crop genetic systems to accelerate the crop development cycle.

Level: All
Type: Talk
Tags: Computational Biology; Big Data Analytics; In-Situ and Scientific Visualization; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Marriott Salon 5

S6490 - Collision Avoidance on NVIDIA Tegra®

Richard Membarth Senior Researcher, DFKI
Richard Membarth is a senior researcher at the German Research Center for Artificial Intelligence (DFKI). He holds a diploma degree in computer science from the University of Erlangen-Nuremberg and a postgraduate diploma in computer and information sciences from Auckland University of Technology. In 2013, he received his Ph.D. (Dr.-Ing.) at the University of Erlangen-Nuremberg on automatic code generation for GPU accelerators from a domain-specific language for medical imaging. After his Ph.D., Richard joined the Graphics Chair and Intel Visual Computing Institute at Saarland University as a postdoctoral researcher. His research interests include parallel computer architectures and programming models with a focus on automatic code generation.
Christoph Lauer Research Engineer, AUDI AG
Christoph Lauer is a research engineer at AUDI AG, Ingolstadt. He holds a diploma degree in computer science from the University of Erlangen-Nuremberg and a postgraduate diploma in computer and information sciences from Auckland University of Technology. In 2011, he received his Ph.D. (Dr.-Ing.) at the University of Erlangen-Nuremberg on the model-based design of embedded safety control units.

D(r)ive deep into crash prediction in future automotive systems that allow the tracking of dozens of objects in real time by utilizing the processing power of embedded GPUs. We'll describe (1) the new possibilities for crash prediction systems in embedded systems that are only possible by taking advantage of recent developments of embedded GPUs, and (2) the implementation and optimization of such a system on the Tegra K1 utilizing AnyDSL, a framework for rapid prototyping of domain-specific libraries that targets NVVM and CUDA.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Embedded; Performance Optimization; Robotics & Autonomous Machines

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL21E

S6570 - Deep Learning in Real-World Large-Scale Image Search and Recognition

Xian-Sheng Hua Senior Director/Researcher, Alibaba Group
Xian-Sheng Hua became a researcher and senior director of Alibaba Group in April 2015, leading the multimedia technology team in the Search Division. Before that, he had been a senior researcher at Microsoft Research Redmond since 2013, working on web-scale image and video understanding and search, as well as related applications. He was a principal research and development lead in Multimedia Search for the Microsoft search engine, Bing, until 2011, where he led a team that designed and delivered leading-edge media understanding, indexing, and searching features. He joined Microsoft Research Asia in 2001 as a researcher. Since then, his research interests have been in the areas of multimedia search, advertising, understanding, and mining, as well as pattern recognition and machine learning. He has authored or co-authored more than 250 research papers in these areas and has filed more than 90 patents. He received his B.S. in 1996 and Ph.D. in applied mathematics in 2001 from Peking University, Beijing. Dr. Hua is an IEEE Fellow and ACM Distinguished Scientist.

We'll introduce how deep learning helps realize a real-world visual search and recognition system. This topic has been studied for decades and became very hot again in recent years, mainly due to the rapid development of deep learning and large-scale search techniques. Many preliminary visual search and recognition products are available to the public. However, have we solved all the big technical and non-technical challenges? Has ImageNet solved the recognition problem? What are the key factors in realizing a real-world visual recognition/search system? Are semantic gaps still there? Which direction is visual search/recognition going toward? What is still missing? We'll discuss all these based on a real-world, deep learning-based visual search and recognition system.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 210H

S6617 - Using OpenACC to Accelerate Kirchhoff Depth Migration

Ken Hester Solution Architect, NVIDIA
Ken Hester is a solution architect with NVIDIA.

Learn how to use OpenACC directive-based methods to accelerate legacy seismic imaging apps using the public domain Seismic Unix package from the Colorado School of Mines Center for Wave Phenomena. Several case studies exist that explain how to accelerate reverse time migration algorithms using CUDA and GPUs. We'll focus on accelerating Kirchhoff Depth Migration (KDM), which is an important part of the seismic processing workflow. While significant performance gains will be reported using industry-standard KDM benchmarks and data, the emphasis will be on how the OpenACC tool-chain improves portability and maintainability, improves programmer productivity, and maximizes performance.

Level: All
Type: Talk
Tags: Energy Exploration; OpenACC

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Marriott Salon 1

S6659 - PerfWorks: A Library for GPU Performance Analysis

Avinash Baliga GPU Foundations Profiler Software Architect, NVIDIA
Avinash Baliga is the lead developer of the PerfWorks SDK, and has worked on GPU and game tools for the past 10 years. At prior GTCs, he has presented on the Nsight CUDA Debugger and Analysis tools.

Attendees will learn about PerfWorks, an updated successor to Perfkit, which can be used for GPU performance analysis. PerfWorks will support NVIDIA GPUs and SoCs going forward. Developers will get an overview of how PerfWorks will give them access to low-level performance counters in NVIDIA GPUs. We'll provide example use cases so developers can see how it can be added to their application to instrument specific sections of their code.

Level: Intermediate
Type: Talk
Tags: Tools & Libraries

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 211B

S6661 - Training Recurrent Neural Networks in FP16

Erich Elsen Research Scientist, Baidu USA, Inc.
Erich Elsen joined Baidu's new Silicon Valley AI Lab in the summer of 2014, excited to get involved with deep learning. He teaches a course on parallel algorithms, OpenMP, MPI, and CUDA at Stanford as a consulting associate professor each spring. Erich received his Ph.D. in mechanical engineering from Stanford in 2009. His thesis developed novel parallel algorithms for running fluid dynamics and molecular dynamics computations on two types of then newly released parallel processors: Sony's Cell and GPUs. After graduating, he joined an EDA startup, where he developed GPU-accelerated computational lithography solutions. From there he founded a consulting company, Royal Caliber, to help others take advantage of GPUs. Some projects include moving Shazam's recognition engine to run on GPUs and a high-performance graph algorithm framework, vertexAPI2.

Reducing training time allows us to learn from our experiments more quickly and make new innovations based on what we've learned. Using fewer than the standard 32 bits to represent a number can help reduce training times. We'll talk about how to use 16-bit floating point because it is starting to have wide hardware support with the release of Pascal. Unfortunately, naively converting all datatypes from 32 to 16 bits doesn't work, as training stability and accuracy are compromised. We'll discuss the reasons for the difficulties and solutions. Finally, we'll show performance and scalability improvements due to using reduced precision.

Level: Intermediate
Type: Talk
Tags: Algorithms; Deep Learning & Artificial Intelligence; Performance Optimization

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Marriott Salon 3

S6698 - How GPU Helps to Power Next Generation of ArcVideo Video Products and Service

Jin Huang CTO, ArcVideo, Inc
Jin Huang is CTO of ArcVideo, a software company spun off from ArcSoft, Inc., that focuses on providing video-related solutions and services to Chinese broadcasting and OTT customers. Having worked in multimedia for over ten years, across PC/mobile and server/cloud businesses, Jin is responsible for enabling broadcast-level video solutions with high performance, private/public cloud video SaaS services, and intelligent video content analytics products supporting millions of end users.

ArcVideo is a leading video solution company in China that provides video transcoding, video content analysis, and interactive video solutions running on physical or virtual servers and on private and public clouds for broadcasting and OTT customers. We take full advantage of Tesla and GRID GPU transcoding and generic CUDA capabilities to accelerate the video transcoding and post-processing pipeline, enable deep learning training for fast video content recognition, and deliver private and public cloud video-based services for content providers. The high performance of GPUs brings ArcVideo the next generation of video experiences, including VR and 4K HEVC broadcasting, and makes possible a real-time, video-based interactive platform that supports millions of users.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Data Center & Cloud Computing

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL21C

S6721 - Redshift: Production-quality, final-frame rendering on the GPU

Panagiotis Zompolas CTO, Co-founder, Redshift Rendering Technologies
Panagiotis Zompolas is a video game industry veteran driven by a passion for computer graphics and hardware. Panos has worked with GPUs since the days of 3dfx and has closely followed the GPU compute revolution since its inception in the mid-2000s. Panos' career in the video game industry includes leading companies like Sony Computer Entertainment Europe and Double Helix Games (now Amazon Games). He has led teams of graphics programmers in the creation of render engines, spanning several generations of hardware. This experience, tied with his passion for the industry, is one of the key pillars of Redshift's success.
Robert Slater VP Engineering, Redshift Rendering Technologies
Robert Slater is a seasoned GPU software engineer and video game industry veteran, with a vast amount of experience in and passion for the field of programming. As a programmer, Rob has worked for companies such as Electronic Arts, Acclaim and Double Helix Games (now Amazon Games). During this time, Rob was responsible for the core rendering technology at each studio, driving their creative and technical development. Rob’s graphics engine programming experience and know-how ensure that Redshift is always at the forefront of new trends and advances in the industry.

We'll discuss the latest features of Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. A few customer work examples will be demonstrated. This talk will be of interest both to industry professionals who want to learn more about GPU-accelerated production-quality rendering and to software developers interested in GPU-accelerated rendering.

Level: All
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing; Real-Time Graphics

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL21B

S6738 - Computational Displays for Virtual and Augmented Reality

David Luebke Vice President Graphics Research, NVIDIA
Highly-Rated Speaker
David Luebke helped found NVIDIA Research in 2006 after eight years teaching computer science on the faculty of the University of Virginia. David is currently Vice President of Graphics Research at NVIDIA. His personal research interests include virtual and augmented reality, display technology, ray tracing, and graphics architecture. His honors include the NVIDIA Distinguished Inventor award, the NSF CAREER and DOE Early Career PI awards, and the ACM Symposium on Interactive 3D Graphics "Test of Time Award". David has co-authored a book, a SIGGRAPH Electronic Theater piece, a major museum exhibit visited by over 110,000 people, an online course on parallel computing that has reached over 80,000 students, and dozens of papers, articles, chapters, and patents on computer graphics and GPU computing.

We'll describe work by NVIDIA Research and our partners on challenges common to all wearable VR and AR displays: (1) FOCUS: how to put a display as close to the eye as a pair of eyeglasses, where we cannot bring it into focus? (2) FIELD OF VIEW: how to fill the user's entire vision with displayed content? (3) RESOLUTION: how to fill that wide field of view with enough pixels? A "brute force" display would require 10,000x8,000 pixels per eye! (4) BULK: displays should be vanishingly unobtrusive, as light and forgettable as a pair of sunglasses, but the laws of optics dictate that most VR displays today are bulky boxes bigger than ski goggles. I will discuss several "computational display" prototypes which sidestep these challenges by co-designing the optics, display, and rendering algorithm.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Room LL20C

S6828 - Intelligent Video Analytics for Urban Management

Gabriele Randelli Founder & Chief Technology Officer, Smart-I
Gabriele Randelli is Founder and Chief Technology Officer at Smart-I S.r.l., where he deals with computer vision algorithms applied to embedded platforms for adaptive lighting control on street posts. Throughout his professional career, he has mainly operated as an R&D Engineer, dealing with human-computer interaction technologies (computer vision, 3D simulators, speech recognition, tangible interfaces) applied to defense, robotics, and space applications. He also completed a postdoc at Sapienza University of Rome (Italy), where he gained a Ph.D. in Computer Engineering. His research interests include human-robot interaction technologies, cognitive robotic systems and machine learning. He is the author of more than 20 publications released in major journals, conference proceedings, and books in robotics.

We'll show how it is possible to apply intelligent video analytics technologies to three different Smart Cities scenarios: traffic monitoring, adaptive lighting control of street poles, and security. Furthermore, when fast and reliable network infrastructure is not available, yet power consumption and energy savings are key factors, it is possible to achieve significant data analysis only by leveraging low-power computational architectures through GPU-based optimization.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA); Computer Vision & Machine Vision; Embedded

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL20D

S6850 - Evolution of GPU Technologies and Applications (Presented by Supermicro)

Don Clegg VP of Marketing, Supermicro
Highly-Rated Speaker
SUPERMICRO is the clear industry leader in GPU accelerated total solutions. As Vice President of Marketing & Worldwide Business Development at SUPERMICRO, Don Clegg leads teams focused on delivering High-Performance Server, Storage and Networking systems, leveraging GPU technology. Don brings 30+ years of direct experience in design, marketing, and business development to help Supermicro deploy its industry-leading, first-to-market, scale-out/scale-up platforms. Don began his career as a design engineer, developing multi-node, multi-user, x86 servers and workstations. With an emphasis on first-to-market product introductions, Don subsequently held executive positions at several chipset and system companies where he helped them achieve #1 market share. The trend continues at Supermicro. He earned a bachelor's degree in Electrical Engineering from Brigham Young University, where he graduated with high honors.

In the past decade, the need for greater GPU performance has risen steadily. From HPC to Deep Learning and Big Data Analytics, denser, more powerful GPU solutions have become a necessity in order to service the next generation of GPU-accelerated applications. Supermicro will demonstrate how these applications have progressed, and how its GPU solutions influenced this evolution.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Big Data Analytics; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 15:00 - 15:50
Location: Marriott Salon 6

S6335 - Accelerating Reverse Time Migration Application for Seismic Imaging with GPU Architecture

Sergio Orlandini Software Developer, CINECA
Sergio Orlandini is a software developer in the High Performance Computing department of CINECA. He obtained a Ph.D. in physical chemistry at the Sapienza University of Rome in 2010. Sergio's main interests are parallel computing, GPU computing, and code optimization. He is currently working on implementing seismic algorithms on GPUs. Sergio is also the author and teacher of a GPU programming course at CINECA.

We'll present an efficient GPU implementation of a reverse time migration (RTM) application. After an overview of the application, we'll discuss the use of GPUs to speed up the solution of the wave equation in a finite difference algorithm. We'll show how to exploit concurrency between GPU and CPU computation and how to efficiently overlap computation and communication between GPU and CPU on different nodes. To reduce memory allocation on the device, we exploit a 16-bit fixed-point representation for velocity fields. In the second part of our talk, we'll show the RTM application performance, and discuss and analyze the performance on different HPC clusters and different devices.

Level: Intermediate
Type: Talk
Tags: Energy Exploration; Performance Optimization; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Marriott Salon 1

S6354 - Unleashing the Performance Potential of GPU for Atmospheric Dynamic Solvers

Haohuan Fu Associate Professor, Tsinghua University
Haohuan Fu is an associate professor at the Center of Earth System Science at Tsinghua University. His research interests include HPC in earth and environmental sciences, computer architectures, performance optimizations, and programming tools in parallel computing. Haohuan has a Ph.D. in computing from Imperial College London. He's a member of IEEE.

We'll demonstrate our efforts to develop highly efficient solvers for atmospheric dynamics on GPU platforms. Besides general optimizations for GPU-based scientific computing applications, we apply optimization strategies that are specifically customized for atmospheric dynamic solvers. We'll show that by combining both algorithmic and architectural considerations, our optimization improves the computation efficiency from the original 2.24% to around 16% at the peak, with a sustained double-precision performance of 1.04 Tflops within one CPU-GPU node. We think this work demonstrates a huge potential for performing more efficient climate modeling work on GPU platforms.

Level: Advanced
Type: Talk
Tags: Earth System Modelling; Performance Optimization; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room 211A

S6371 - Deep Convolutional Neural Networks for Spoken Dialect Classification of Spectrogram Images Using DIGITS

Nigel Cannings Chief Technical Officer, Intelligent Voice Limited
Nigel Cannings founded Docusite as a research and development vehicle for advanced natural language processing and voice recognition technology, gaining prestige clients, including AXA Investment Managers, before merging with Chase ITS in 2009. He was educated in England at Brentwood School, and in the USA at Milton Academy in Boston. He qualified as a lawyer in 1993, and has worked for some of the world's largest law firms and software companies. He contributes regularly to a number of publications, including the Huffington Post and the Global Legal Post. He gained U.K. government recognition by way of a large grant for high-tech research exploring problems in speech research, such as ultra-high-speed GPU-accelerated speech recognition and emotional analysis of telephone calls.

Deep convolutional neural networks are designed for classification tasks involving static images. We'll outline the novel application of such networks to speech processing tasks such as the identification of a speaker's dialect. Representing speech as spectrogram images, we'll show our recent results from the NIST language recognition competition, and discuss how the network training results can be improved by manipulating the spectrogram images in a way appropriate to speech applications.

Level: All
Type: Talk
Tags: Aerospace & Defense; Deep Learning & Artificial Intelligence; Signal & Audio Processing

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Marriott Salon 2

S6383 - High Performance CTC Training for End-to-End Speech Recognition on GPU

Minmin Sun GPU Architecture Engineer, NVIDIA
Minmin Sun has worked at NVIDIA as a GPU architecture engineer since he graduated from Nanjing University in 2012. His interests are GPU architecture and speech recognition.

End-to-end speech recognition systems, which directly transcribe audio data into text without requiring an intermediate phonetic representation, are based on a recurrent neural network (RNN) combined with connectionist temporal classification (CTC). CTC automatically learns the alignments between speech frames and the label sequence of the transcript. In this work, we focus on optimizing CTC training, especially the forward-backward algorithm, on GPU. First, we quantitatively analyzed and exploited opportunities to save computation and memory accesses in the CTC forward-backward algorithm, for a speedup of ~1.28X. Second, by reusing data among frames and transferring data between frames through the register file and shared memory, we get a speedup of ~1.80X.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room 210H

S6413 - GPU Computing with Apache Spark and Python

Stanley Seibert Scientific Software Developer, Continuum Analytics
Dr. Stanley Seibert is a scientific software developer at Continuum Analytics. He received a Ph.D. in experimental high energy physics from the University of Texas at Austin and performed his postdoctoral research at Los Alamos National Laboratory and the University of Pennsylvania. Stan has been evangelizing the use of Python and GPU computing since 2007, and has worked on a number of applications using Python, C++ and CUDA, including maximum likelihood parameter estimation in large data sets, Monte Carlo optical simulations, and information-theoretic approaches to experiment design. Prior to joining Continuum Analytics, Stan was Chief Data Scientist at Mobi.
Siu Kwan Lam Software Engineer, Continuum Analytics
Siu Kwan Lam is a software developer at Continuum Analytics and the lead developer of the Numba open source compiler project. He holds B.S. and M.S. degrees in Computer Engineering from San Jose State University. He taught CUDA at San Jose State University during his senior year and has researched TCP covert channel detection for NSF, STC, and TRUST.

We'll demonstrate how Python and the Numba JIT compiler can be used for GPU programming that easily scales from your workstation to an Apache Spark cluster. Using an example application, we show how to write CUDA kernels in Python, compile and call them using the open source Numba JIT compiler, and execute them both locally and remotely with Spark. We also describe techniques for managing Python dependencies in a Spark cluster with the tools in the Anaconda Platform. Finally, we conclude with some tips and tricks for getting the best performance when doing GPU computing with Spark and Python.

Level: Intermediate
Type: Talk
Tags: Programming Languages; Tools & Libraries; Big Data Analytics

Day: Tuesday, 04/05
Time: 15:30 - 16:20
Location: Room 211B

S6439 - GPU-Oriented Sparse Multifrontal QR Method

Wissam Sid-Lakhdar Postdoctoral Research Associate, Texas A&M University
Wissam Sid-Lakhdar is a post-doc at Texas A&M University, working with Professor Tim Davis. He did his Ph.D. at ENS Lyon in France, under the supervision of Dr. Jean-Yves L'Excellent, on "Scaling the solution of large sparse linear systems using multifrontal methods on hybrid shared-distributed memory architectures." His current interests concern the resolution of sparse linear systems through direct methods on GPUs.

We'll present a sparse direct method, a multifrontal QR factorization, intended specifically for GPU accelerators. Our approach relies on a bucket scheduler that exploits irregular parallelism both at a coarse grain, among a set of fronts with different characteristics, and at a fine grain, through the exploitation of the staircase shape of these fronts. The scheduler then relies on dense GPU kernels whose design and implementation target recent GPU architectures.

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; Tools & Libraries

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Marriott Salon 3

S6467 - Training My Car to See: Using Virtual Worlds

Antonio M. López Principal Investigator & Associate Professor, Computer Vision Center & Universitat Autònoma de Barcelona
Antonio Lopez is the head of the Advanced Driver Assistance Systems (ADAS) Group of the Computer Vision Center, and associate professor of the Computer Science Department, both at the Universitat Autonoma de Barcelona (UAB). In 1996, Antonio participated in the foundation of the Computer Vision Center at the UAB, where he has held different institutional responsibilities. Antonio has been principal investigator of numerous public and industrial projects, and is a co-author of a large number of top journal and conference papers. His research interests are vision-based object detection, semantic segmentation, domain adaptation, and computer graphics for training visual models. These topics are seen as key technologies to be applied in ADAS and autonomous driving.

Learn how realistic virtual worlds can be used to train vision-based classifiers that operate in the real world, avoiding the cumbersome task of collecting ground truth by manual annotation. Many vision-based applications rely on classifiers trained with annotated data. We avoid manual annotation by using realistic computer graphics (e.g., video games). However, the accuracy of the classifiers drops because the virtual (training) and real (operation) worlds are different. We overcome this problem using domain adaptation (DA) techniques. In the context of vision-based driver assistance and autonomous driving, we present our DA experiences using classifiers based on both handcrafted features and CNNs. We show how GPUs are used in all stages of our training and operation paradigm.

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room LL21E

S6502 - GPU Accelerated Virtual Cell Biology and SIMD Enhanced High Throughput Computational Biology

Narayan Ganesan Assistant Professor, Stevens Institute of Technology
Narayan Ganesan is an assistant professor of electrical and computer engineering at the Stevens Institute of Technology, Hoboken, New Jersey. He received his Ph.D. from Washington University in St. Louis in 2006. He later worked on designing novel compute architectures using massively parallel processors and reconfigurable hardware for scientific computing. His research interests include designing efficient algorithms and computing architectures for handling big-data and big-computation problems in molecular dynamics, computational biology, bioinformatics and healthcare notification systems.
Hanyu Jiang Research Assistant, Stevens Institute of Technology
Hanyu Jiang is a Ph.D. student in computer engineering at the Stevens Institute of Technology, Hoboken, New Jersey. He received his B.S. in control science and engineering from Harbin Institute of Technology, Harbin, China, in 2012, and an M.E. in computer engineering from Stevens Institute of Technology in 2014. His current research interests include heterogeneous and parallel computing, multi-core processor architecture, bioinformatics, and big data analytics.

We'll present GPU-enabled virtual cell biology (VCB) and explore the advantages of SIMD along with SIMT execution to boost the performance of computational biology applications. We'll first present a GPU-based whole cell simulation framework to study complex biological pathways via a scalable model with millions of agents. This simulates a multitude of intracellular reactions from which the overall cellular function emerges, thus serving as a virtual computational microscope. We'll then explore embedding SIMD instructions as the inner-most tier in a multi-tiered parallel framework to obtain a performance boost. This method was employed to accelerate all stages of the HMMER3 pipeline to gain an order of magnitude increase in performance over highly optimized CPU and non-SIMD-based GPU implementations.

Level: Intermediate
Type: Talk
Tags: Computational Biology; Algorithms

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Marriott Salon 5

S6534 - Exciting Practical Applications of Scalable Deep Learning and Image Recognition in the Cloud

Georgi Kadrev CEO, Imagga Technologies
Georgi Kadrev is co-founder and CEO of Imagga Technologies (http://imagga.com), one of the companies pioneering the image-recognition-as-a-service model, offering a highly scalable cloud API to businesses and developers. Georgi graduated with an M.S. in technology entrepreneurship from Sofia University in 2009 and is currently an assistant professor and Ph.D. student in the Software Engineering department, specializing in practical deep learning for image recognition. While leading Imagga, Georgi has won multiple technology, innovation, and entrepreneurship awards, most recently the best company award in the "Technology For The Big Players" track at South Summit, Madrid, October 2015.

We'll demonstrate how scalable image recognition based on deep learning can greatly contribute to business cases ranging from advertising and user profiling to content management and cloud services. We'll also discuss the technical challenges of providing scalable image recognition capable of handling huge loads of images, instant feedback loops, and customer-specific recognition tasks, and how we've addressed them using GPUs in the cloud. Ultimately, you'll benefit from our experience handling 80+ different practical cases and dive deep into the most exciting ones.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room 210F

S6647 - Maxwell Render Meets the GPU

Juan Cañada Head of Maxwell Render Technology, Next Limit Technologies
Juan Cañada joined Next Limit to work on research projects, later moving to the newly formed Maxwell Render research team. Since then, Juan has held several positions in the team, leading it since 2007. He holds a B.S. in mechanical engineering and a degree in environmental sciences. Outside the office, Juan used to describe himself as an acceptable guitar player, although his skills have deteriorated since the birth of his beautiful daughter. To try to stop himself thinking about rendering all the time, he is an avid scuba diver and underwater photographer, although sometimes, when he looks at how light behaves under the sea, he realizes how much work we have left to do!

We'll take you through the challenges the Next Limit team have overcome to create a GPU version of their highly acclaimed Maxwell Render engine. Maxwell Render was the first unbiased, spectral, physically based render engine on the market (2004) and, due to its impeccable quality, serves as a ground-truth reference for many. We'll reveal some of the technology behind our new GPU-based renderer, taking a look at both its advantages and limitations -- and you will of course be one of the first to see it running live! We'll show you how the GPU version has maintained the core qualities of the current CPU engine -- enabling users to create images from 3D scenes in a reliable and predictable way, but with one key difference: faster than ever.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; Real-Time Graphics; Algorithms

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room LL21B

S6655 - Towards a High Performance Analytics and Computing Platform for Brain Research

Dirk Pleiter Group leader, Forschungszentrum Juelich
Dirk Pleiter is a research group leader at the Jülich Supercomputing Centre (JSC) and professor of theoretical physics at the University of Regensburg. At JSC he leads the work on application-oriented technology development. Currently, he is principal investigator of the POWER Acceleration and Design Center, a center that is jointly run by IBM, JSC and NVIDIA. He has played a leading role in several projects for developing massively parallel special-purpose computers, including several generations of QPACE. Dirk is author or co-author of more than 170 scientific papers, conference contributions and book chapters in the areas of theoretical high-energy physics and computer science. Forschungszentrum Jülich was one of the first academic institutions to join the OpenPOWER Foundation.

Understanding and modeling the human brain continues to be one of the biggest challenges of research. The Human Brain Project is a European flagship project that is in the process of creating a research infrastructure to facilitate this research. Many research topics in this field require scalable compute resources or the ability to process extreme-scale data volumes (in some cases both). Examples are approaches to simulate the network of a human brain in its full complexity and efforts to create high-resolution brain atlases. GPUs already play an important role in realizing the necessary computational capabilities. We'll give an overview of the efforts to build a high-performance analytics and computing platform for brain research.

Level: All
Type: Talk
Tags: Big Data Analytics; Supercomputing & HPC; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room 210E

S6717 - Discovering the New Frontier of Shadertoy

Pol Jeremias Co-Founder, Beautypi
Highly-Rated Speaker
Pol Jeremias is passionate about technology and art. He grew up near Barcelona and moved to California in 2006. Since then, Pol has researched computer graphics and worked on multiple games for companies such as LucasArts and SoMa Play. Today, he helps create movies at Pixar Animation Studios. In his spare time, he has co-founded Shadertoy.com and Beautypi. When he is not programming, you will find him running, reading or watching movies.
Inigo Quilez Co-Founder, Beautypi
Highly-Rated Speaker
Inigo Quilez grew up enjoying the mountains, snow and sea, but also programming fractals and graphics algorithms. After finishing his Master's degree in Electrical Engineering and working professionally for six years in Belgium on virtual reality and real-time rendering of massive data sets, he moved to the US to work at Pixar Animation Studios. There he spent five years creating procedural vegetation and landscapes for the movies, from research and tools creation to doing the required shot work in production. Inigo joined Oculus Story Studio at the end of 2014, where he now works on bridging the worlds of real-time rendering, filmmaking and virtual reality. In his spare time Inigo co-founded the website Shadertoy, to which he also regularly contributes content.

In this session, the Shadertoy.com creators will change the way you think about fragment shaders. During the last three years, the Shadertoy community has been trying to answer one question: What can a fragment shader do? The results have been mind-blowing. People from all over the world collaborated to break the limits of what was possible: procedural content, incredible raymarching tricks, VR or even GPU-generated music. This year there is a new challenge: What can multiple fragment shaders do? Imagine fragment shaders that could talk to each other and build complex algorithms running in the browser using your GPU. Join the Shadertoy creators to discover new ideas and techniques to create fragment shaders that are games, progressive path tracers, sorting algorithms, demos and more.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics; Virtual Reality & Augmented Reality

Day: Tuesday, 04/05
Time: 15:30 - 16:20
Location: Room LL21C

S6775 - The Recent Advances in GPU-Based Intelligent Video Analysis

Hai Tao CEO and Founder, Beijing Vion Technology
Hai Tao is a founder and the CEO of Beijing Vion Technology, Inc., a company focusing on developing world-leading computer vision and artificial intelligence algorithms and products, with various applications in intelligent transportation systems (ITS), public safety, and business intelligence. Hai holds more than 10 US patents and has published more than 130 papers in the field of image processing and computer vision. He received his B.S. and M.S. degrees in Automation from Tsinghua University in 1991 and 1993, respectively. He received his Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1999.

We'll demonstrate our recent progress in applying GPUs in several key computer vision sub-fields including video-based face recognition, vehicle attribute analysis, urban management event detection, and high density crowd counting. These algorithms combine the traditional feature-plus-classifier approach with the recent advances in deep learning to make high performance computer vision systems practical and enable products in several vertical markets including intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. In addition, we'll demonstrate a single-GPU video analytic box that can process up to 8 channels of analog or 2 channels of 1080p HD video inputs. A prototype 40-GPU server system capable of processing up to 80 channels of 1080p video inputs will also be introduced during this presentation.

Level: All
Type: Talk
Tags: Intelligent Video Analytics (IVA); Algorithms; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room LL20D

S6216 - The Future of Unified Memory

Nikolay Sakharnykh Developer Technology Engineer, NVIDIA
Nikolay Sakharnykh is a senior developer technology engineer at NVIDIA, where he works on accelerating applications on GPUs. He has experience in scientific research and software development focusing on computational techniques related to physics, chemistry, and biology.

Learn about the new Unified Memory programming model for heterogeneous architectures. We'll dive deep into architecture and software changes in Unified Memory, what they mean for developers, and how they enable new features for GPU applications, including on-demand paging and memory oversubscription. Use cases in HPC and other domains will be presented along with initial performance projections. Unified Memory performance optimizations, such as data prefetching and location hints, will be covered along with real-world application examples.

Level: Intermediate
Type: Talk
Tags: Programming Languages; Performance Optimization; Tools & Libraries

Day: Tuesday, 04/05
Time: 16:00 - 16:50
Location: Room 212A

S6230 - Hierarchical Computations on Manycore Architectures

Hatem Ltaief Senior Research Scientist, Extreme Computing Research Center, KAUST
Highly-Rated Speaker
Hatem Ltaief is a senior research scientist in the Extreme Computing Research Center at KAUST, where he advises several KAUST students in their M.S. and Ph.D. research. Hatem received his engineering degree from Polytech Lyon at the University of Claude Bernard Lyon I, France, an M.S. in applied mathematics at the University of Houston, and a Ph.D. degree in computer science from the University of Houston. From 2008 to 2010, he was a research scientist in the Innovative Computing Laboratory in the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. He is part of the European Exascale Software Initiative (EESI) to build a European vision and roadmap to address the challenges of the new generation of massively parallel systems. He has various strategic partnerships with industries (Saudi Aramco, Intel, NVIDIA) as well as Universities and HPC Centers (University of Tennessee, INRIA Bordeaux, L'Observatoire de Paris, Barcelona Supercomputing Center). He is the author or co-author of 40 journal/conference papers and book chapters. His research interests include parallel numerical algorithms, parallel programming models, and performance optimizations for multicore architectures and hardware accelerators.

Learn about a new hierarchical matrix structure for fast linear algebra computations on GPUs! Recursion, tree traversal, hierarchical data layout, and batched kernel executions are some of the ingredients of a new HPC recipe for computing challenging linear algebra operations and solving large scientific problems (e.g., spatial statistics) on GPUs. By exploiting low-rank matrix representations, the original dense matrix of the problem can be approximated, which reduces both the memory footprint and the algorithmic complexity while still maintaining adequate solution accuracy. In addition, the talk showcases a new high-performance hierarchical symmetric eigensolver and SVD, juicing the horsepower out of multiple GPUs to the fullest.

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; Tools & Libraries

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 3

S6279 - Visual Feature Learning from Web Images and Click Log

Chen Fang Research Scientist, Adobe Systems
Chen Fang is a research scientist at Adobe Research. His interests include image recognition, image retrieval, deep learning, and large-scale machine learning. He obtained his Ph.D. from the computer science department at Dartmouth College.

Visual feature learning is a fundamental problem in computer vision. Existing solutions rely on deep learning and large-scale labeled datasets. However, collecting such datasets is often labor intensive and time consuming. On the other hand, the internet offers raw visual data, i.e., images and videos, at massive scale, along with the associated user behavior data, e.g., click logs. We'll present a novel framework to learn visual features from such data, which completely forgoes the need for labeled datasets. We apply the proposed framework and its variants to two kinds of web data: images on a social website and their view history, and the search log of a commercial image search engine. High-quality visual features are learned in both cases.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room 210F

S6281 - Accelerating Science Platforms for Machine Learning, Big Data, and Earth System Science

John Taylor Leader, Computational and Simulation Science, CSIRO
Dr. John Taylor currently leads CSIRO Data61 Science Platforms. John has written more than 140 articles and books on computational and simulation science, climate change, global biogeochemical cycles, air quality and environmental policy, from the local to the global scale, spanning science, impacts and environmental policy. His research has been widely cited and has attracted significant media attention. John has worked as a computational scientist and group leader both at the Mathematics and Computer Science Division, Argonne National Laboratory and at the Atmospheric Science Division at Lawrence Livermore National Laboratory. John was a senior fellow in the Computation Institute at the University of Chicago. He has served on the Advisory Panel of the Scientific Computing Division of the U.S. National Center for Atmospheric Research (NCAR) and the U.S. National Energy Research Scientific Computing Center NUGEX Advisory Committee. John currently serves on the board of the National eResearch Collaboration Tools and Resources (NeCTAR), a federal government super science initiative. He is a fellow of the Clean Air Society of Australia and New Zealand.
John Zic Principal Research Scientist, CSIRO
John Zic is a principal research scientist at CSIRO and leads the Dependable Systems research team. He is the Standards Australia Chair of Committee IT-038 "Cloud Computing", which has actively contributed to the new ISO standards 17788 — Cloud Computing Overview and Vocabulary and 17789 Cloud Computing — Reference Architecture. Previously, John has participated in the NSW Government Information Security Working Group and was an expert evaluator for the EU Framework 7 Call 8 Object 10.4 "Trustworthy ICT" in February 2012. He has given many invited presentations and keynotes: the inaugural MIT Kerberos and Internet of Trust (MIT-KIT) conference in 2014; the EU FP7 Program "Building International Co-operation" in Brussels (2011); INCO-TRUST workshop in New York (2010); Kerberos Consortium Conferences at MIT (2010 and 2011); the Vanguard/TTI CyberINsecurity Conference in 2010. He was a member of the 11th Joint EU and Australian Science and Technology Cooperation Committee in 2010. Academically, John has published in the area of trustworthy and dependable systems since 1990.
Oliver Obst Data Mining Team Leader, CSIRO Data61
Oliver Obst is a senior research scientist, leader of the CSIRO Data61 Big Data Platform project, and leader of the Data Mining Research Team at the Commonwealth Scientific and Industrial Research Organisation (CSIRO) in Sydney, Australia. In his work, he solves practical problems around making sense of data in industrial projects—for example, finding patterns, or detecting anomalies, but also selecting informative features or placing sensors for classification, prediction, or to make the best decisions based on past experience.

At CSIRO Data61, we're building the next generation of science platforms that exploit GPU computing to dramatically accelerate the time to discovery and the pace of innovation in science and industry. Scientific applications routinely generate huge amounts of data. In response to these trends, we've developed and deployed a new breed of GPU-accelerated big data technologies, earth system modeling tools, and machine learning capabilities. We'll present examples of our work in big data analytics, earth system modeling, and deep learning that clearly demonstrate the value that GPU computing can deliver to research organisations and industry. CSIRO has been at the forefront of GPU computing since 2009 and was one of the first NVIDIA CUDA Research Centers.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Earth System Modelling; Big Data Analytics; Astronomy & Astrophysics; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room 210H

S6314 - Java Image Processing: How Runtime Compilation Transforms Memory-Bound into Compute-Bound

Florent Duguet Founder, ALTIMESH
Florent Duguet founded Altimesh in 2008, in an effort to reduce the learning curve of GPU computing for high-level language developers. The outcome is the Hybridizer, which enables many-core computing in high-level programming environments such as .NET and Java. Florent graduated with a Ph.D. in computer graphics in 2005. He has implemented solutions for financial services for oil and gas industries with a focus on GPGPU since 2007, starting from the proof of concept and leading up to production.

A wide variety of image processing algorithms are naturally parallel. However, depending on filter size or neighborhood search pattern, memory access is critical for performance. We'll show how loop reordering and memory locality fine-tuning help achieve the best performance. Using Hybridizer to automate the transformation of Java bytecode into CUDA source code, and using the new CUDA runtime compilation feature, we transformed execution from memory-bound to compute-bound. Applying this technique to oil and gas image processing algorithms results in interactive response times on production-size datasets.

Level: All
Type: Talk
Tags: Performance Optimization; Energy Exploration; Video & Image Processing

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 1

S6331 - Running Multiple Workloads on a GPU: A UX Oriented Approach

Yuval Sarna Graphics Software Expert, GameFly Streaming
Yuval Sarna is a graphics software expert at GameFly Streaming, where he focuses on solving game performance issues and developing new algorithms to drive the game streaming engine. Yuval is a graduate of Tel-Aviv University and holds a B.S. in computer science. Before his studies, he developed a real-time 3D engine that was later used during his military service, where he worked on various other 3D projects.

With the increased usage of cloud computing for GPU-demanding workloads, the need to share the GPU between applications increases. We'll present an approach that improves the user experience when running GPU-demanding workloads concurrently. We'll take you through an overview of the existing GPU scheduling schemes used in Windows OS and present a new approach to better utilize the compute resources and achieve maximal cost efficiency.

Level: Intermediate
Type: Talk
Tags: Game Development; Performance Optimization

Day: Tuesday, 04/05
Time: 16:00 - 16:50
Location: Room 212B

S6446 - A Fully Automated, High Performance Geolocation Improvement Workflow for Problematic Imaging Systems

Devin White Senior Research Scientist, Oak Ridge National Laboratory
Dr. Devin White is a senior research scientist at Oak Ridge National Laboratory and is a subject matter expert in the areas of quantitative social science, modeling complex adaptive systems, social network analysis, high performance computing, tactical airborne and spaceborne geopositioning, uncertainty propagation and analysis, image science, computer vision, multimodal image registration, data fusion, data visualization, imaging spectroscopy, lidar, SAR, and Earth observing systems. Devin is also a joint faculty professor of anthropology at the University of Tennessee, Knoxville. He previously served as a lead scientist for Exelis Visual Information Solutions and a scientist at Integrity Applications Incorporated, supporting large commercial and government customers.
Sophie Voisin Geospatial Software Engineer, Oak Ridge National Laboratory
Dr. Sophie Voisin is an engineer at Oak Ridge National Laboratory developing high performance computing methods for geospatial data analysis for the GIST group. She received her Ph.D. in computer science and image processing from the Universite de Bourgogne (France) in 2008 and joined ORNL in 2010 to work on numerous image processing related projects, successively performing quantitative analysis of neutron 2D and 3D image data; developing new techniques for eye-gaze data analysis, for which she is a co-recipient of an R&D 100 award (2014); and now implementing multidimensional image processing algorithms on GPU platforms for high performance computing analysis of satellite imagery.

Learn how hybrid CPU-GPU parallelization is being used to support rapid improvement of the geolocation accuracy of imagery collected by multiple airborne and spaceborne platforms. A sensor-agnostic, plugin-based framework with CUDA-enabled workflows was built to support photogrammetric and computer-vision processing tasks like image registration and orthorectification. Leveraging the complementary strengths of multicore CPUs and multiple Tesla K80 GPUs on each compute node required significant custom development to achieve optimal performance. We dramatically reduced per-image processing time and can handle multiple data streams simultaneously. The science behind two workflows will be presented, along with their performance metrics while executing on both bare-metal and virtual machines.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Supercomputing & HPC; Aerospace & Defense

Day: Tuesday, 04/05
Time: 16:00 - 16:50
Location: Room 210E

S6458 - A GPU-Based Cloud Speech Recognition Server for Dialog Applications

Alexei V. Ivanov CTO, Verbumware Inc.
Alexei Ivanov has a background in engineering and computer science. He received his Ph.D. in theoretical foundations of computer science in 2004 from Belarusian State University of Informatics and Radioelectronics. He holds an M.S. in electrical engineering from Moscow Institute of Physics and Technology (State University). He has worked in both academia (University of Trento, Moscow Institute of Physics and Technology) and industry (Pearson Knowledge Technologies, USA; Speech Technology Center, Russia; Lernout & Hauspie Speech Products NV, Belgium). Alexei has broad experience in speech processing and recognition systems. His current research interests include adaptive conversational machines; web-integration of individual multimedia experiences; speech characterization technology; and integration of para-linguistic knowledge into the process of speech recognition and interpretation.

We'll show that GPUs enable a successful solution for difficult applications such as speech recognition in dialog interactions. Dialog interactions are problematic because of the high variability of spontaneous speech and the processing time constraints. Remote interactions impose telecommunication constraints, i.e., a narrowband, compressed signal representation and limited spoken context available for adaptation. A mass service requires acoustic model adaptation to regional accents. We conduct our experiments with speech from non-native speakers of English as an extreme case of accented speech. Our GPU-based system exhibits high accuracy with processing speeds faster than the natural speaking pace. The latency of the speech recognizer is below that required for user satisfaction.

Level: All
Type: Talk
Tags: Signal & Audio Processing; Algorithms; Data Center & Cloud Computing; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 2

S6473 - Lift: Primitives for Hybrid CPU/GPU Parallel Programming

Nuno Subtil Principal Software Engineer, Genia
Nuno Subtil is a principal software engineer at Genia and the lead developer for the company's GPU-accelerated pipeline for primary analysis in DNA sequencing. Before joining Genia, Nuno was part of the NVBIO team at NVIDIA, focused on high-performance parallel algorithms for bioinformatics. Prior to that, he worked on low-level graphics system software for mobile and desktop platforms at NVIDIA. He also did research on physically based image synthesis at TU Wien and on accelerating computer vision algorithms at Deutsche Telekom Laboratories during the early days of GPUs. Nuno holds a computer science degree from the University of Coimbra, Portugal.

Lift is a thin abstraction layer that hides some of the complexity of parallel programming. It provides a set of primitives analogous to (and entirely compatible with) NVIDIA's Thrust library, but designed around drastically simpler code, suitable for inclusion in large, complex projects which target NVIDIA GPUs or Intel CPUs. Lift is an open-source project under active development at Genia. It is the foundation for our primary analysis pipeline for DNA sequencing, as well as the foundation for Firepony, an open-source base quality score recalibrator for DNA sequencing data. We'll cover the motivation for Lift and the applications we're developing it for, and then explain how it works, what problems it solves, and what lessons we learned from prior experience with similar libraries.

Level: Intermediate
Type: Talk
Tags: Computational Biology; Tools & Libraries

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 5

S6494 - Implementing NVIDIA GRID™ Solutions in Unmanned Vehicle Ground Control Systems

Nathan Wincey Systems Architect, Lockheed Martin, Mission Systems and Training
Nathan Wincey is a systems architect at the Lockheed Martin Unmanned Integrated Systems group within the Mission Systems and Training Division. Over the course of his 16 years with Lockheed Martin, he has worked as a systems engineer and architect on a variety of unmanned aerial vehicle (UAV) ground control systems.

This talk will present the journey Lockheed Martin Unmanned Integrated Systems has taken over the past several years in its effort to successfully virtualize the GPU for integration into its Unmanned Vehicle Ground Control Systems. During this effort, Lockheed Martin UIS has utilized direct GPU pass-through, Microsoft's RemoteFX, VMware's vSGA, and now NVIDIA GRID vGPU technologies to bridge this once impassable technology gap in successfully virtualizing the GPU. This talk will provide an overview of the challenges faced during this effort, solutions to those challenges, benefits gained by staying open-architecture oriented, and SWaP improvements realized.

Level: All
Type: Talk
Tags: Graphics Virtualization; Aerospace & Defense

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 4

S6670 - Toward Bridging the Gap Between High Quality and High Performance for HPC Visualization

Robert Sisneros Technical Program Manager: Data Analysis and Visualization, National Center for Supercomputing Applications
Robert Sisneros manages NCSA's Data Analysis and Visualization Group. This group is tasked with supporting science teams utilizing NSF HPC resources as well as furthering the state of scientific visualization through cutting edge research. As a senior member of the Blue Waters Project, Robert's research interests in I/O and visualization are primarily aligned with issues of particular importance to high performance computing. These include: in situ visualization, data models and representations, parallel analysis algorithms, I/O parameter optimization, and "big data" analytics. Robert earned the degrees of Bachelor of Science in Mathematics and Computer Science from Austin Peay State University and the degrees of Master of Science and Doctor of Philosophy in Computer Science from the University of Tennessee in Knoxville.

I will discuss how the current standard for large-scale science is becoming obsolete, and how this is creating a gap between high-quality graphics and high performance visualization. I'll introduce the recent work of integrating NVIDIA IndeX™ with ParaView, work that I see as directly addressing the aforementioned gap.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Supercomputing & HPC; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room LL21D

S6672 - Training and Deploying Deep Neural Networks for Speech Recognition

Bryan Catanzaro Senior Researcher, Baidu Research
Highly-Rated Speaker
Bryan Catanzaro is a research scientist at Baidu's Silicon Valley AI Lab, where he leads the systems team. His research is focused on efficient tools and methodologies for training and deploying large deep neural networks. Before joining Baidu, Bryan was involved in popularizing GPUs for machine learning while working at NVIDIA, including the creation of cuDNN. Bryan received his Ph.D. from the University of California at Berkeley, where he wrote the first Support Vector Machine training library to run on graphics processors, and created Copperhead, a Python-based DSL for parallel programming.

Training and deploying deep neural networks for speech recognition is very computationally intensive. I will discuss how we have made our training process scale efficiently to many GPUs, as well as how we use GPUs to take our deep neural networks to users at scale through Batch Dispatch.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Hall 3

S6724 - Acceleration of Weather Forecasting and Meteorological Satellite Data Assimilation, Processing and Applications

Allen Huang CTO, Tempo Quest Inc.
Allen Huang received his Ph.D. in the area of satellite remote sensing from the University of Wisconsin-Madison in 1989. In the same year he joined the Cooperative Institute for Meteorological Satellite Studies, Space Science and Engineering Center, University of Wisconsin-Madison, and is currently a distinguished scientist of UW-Madison, a Fellow of the International Society for Optical Engineering (SPIE), PI of many NOAA- and NASA-sponsored projects, an Adjunct Professor at several universities, CEO of Hyper Sensing, LLC, and CTO of Tempo Quest Inc.

In partnership with scientists from the Space Science and Engineering Center (SSEC), Tempo Quest Inc. is embarking on a quest to complete a proprietary version of the Weather Research and Forecasting Model (WRF) - AceCAST, a mesoscale and global model designed for both operational forecasters and atmospheric researchers and widely used by commercial, government, and institutional users. The state-of-the-art acceleration of low throughput, low energy consumption, and error resilient satellite remote sensing data compression suitable for data, image, and video transmission and archive will also be discussed.

Level: All
Type: Talk
Tags: Earth System Modelling

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room 211A

S6800 - VR: Not Just for Games

Amir Ebrahimi Principal Engineer, Unity Technologies Labs
Amir Ebrahimi was both the 13th and 600th employee at Unity. He currently focuses on VR development tools, including building out robust APIs, support and guidance, and is lead engineer on Unity's new in-VR scene editing tools for the editor.
Dioselin Gonzalez Lead VR Engineer, Unity Technologies Labs
Currently a lead virtual reality engineer at Unity Labs, Dioselin Gonzalez's recent experience includes working as character animation and lead behavior engineer at DreamWorks Animation. In 2005 she received a Master’s degree from Purdue, where she was involved in collaborative virtual reality research at the Envision Center for Data Perceptualization.

With several companies poised to release high-quality VR headsets in 2016, it's not just game developers who are interested in creating worlds, experiences, and applications for VR. We'll talk about some of the experimentation we have done at Unity Labs to find novel ways of interacting in VR, and look at noteworthy experiences and interactions created by others. We'll conclude with a discussion that extrapolates these current experiments to possible future immersive technologies applied to other industries.

Level: Beginner
Type: Talk
Tags: Virtual Reality & Augmented Reality; Game Development; Press-Suggested Sessions: Virtual Reality

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room LL20C

S6802 - Real Time Global Illumination on the Move at 35M Polys

Dennis Malone Virtual Prototype Engineer, Nissan North America
Dennis Malone has worked at Nissan since 2000. He worked in cockpit module layout and design until 2006 and in virtual ergonomics design and testing until 2009. He now runs quality and craftsmanship virtual testing.

Nissan has achieved never-before-seen performance with ray tracing and global illumination in real time by leveraging emerging technology advancements. Utilizing new developments in hardware and software, we have been able to achieve a production-ready digital testing environment that allows a time-efficient design decision-making process that previously had to be completed using render output functions. Photorealistic digital craftsmanship testing has been in use at Nissan N.A. since 2007. Since that time, we have worked to increase the capabilities of the available tools, remaining on the bleeding edge in hopes of reaching full GI on the fly during review sessions. We feel we have finally reached that goal in full 4K.

Level: All
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing; Real-Time Graphics; Self-Driving Cars & Automotive

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room LL21A

S6833 - Delivering GPU-Accelerated Applications from your Private Cloud (Presented by Cisco)

Shawn Kaiser Technical Solutions Architect, Cisco
Shawn Kaiser is a member of the Data Center Solutions Architecture team focusing on Virtualization Competency. Shawn's primary background is in Virtual Infrastructure Architecture and Design where he designed and ran one of the first production Virtualization environments with VMware ESX 1.5. He has acted as a consultant for a number of years to help customers realize the benefits of Virtualization, and has done many assessments, designs and implementations. Shawn's primary role in Cisco is to support other Sales Engineers and customers around opportunities for Virtualization and VDI with the Unified Computing System (UCS).

Have you ever wondered if you could run your graphics-intensive applications like AutoCAD, ArcGIS, or PTC Creo in your enterprise private cloud? Would they perform the same as on the powerful workstation on your desk? We will discuss how the graphics workstation is evolving to meet the demands of a global, anytime, anywhere workforce that's no longer tethered to location or hardware. Learn how Cisco offers a high-performance virtualized solution for delivering immersive 3D graphics for designers, clinicians, and researchers. We will also discuss best practices by taking a look at successful customer use cases and review recent additions to our portfolio of solutions. We will explore how you can leverage the Cisco Unified Computing System platform coupled with the latest NVIDIA GRID technology.

Level: Beginner
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 6

S6842 - NVIDIA Advanced Rendering Products for End Users

Phillip Miller Senior Director, NVIDIA Advanced Rendering Products, NVIDIA
Highly-Rated Speaker
Phillip Miller is senior director of NVIDIA's commercial advanced rendering offerings, ranging from the Iray and mental ray shipping within leading products in Design and Entertainment to the IndeX technology used in large data visualization. Phil has been with NVIDIA for 7 years and has directed leading software products for over 20 years, including the Entertainment offerings at Autodesk and the Web Design product line at Adobe. He holds a Master of Architecture from the University of Illinois and is a registered architect.

Come learn of the products NVIDIA is producing that you can use within popular 3D tools like 3ds Max, Maya, Rhino and Cinema4D or to scale the rendering from other products across render farms. NVIDIA's new range of Iray plug-in products will be discussed, along with recent advances in mental ray. We'll also include how the Iray SDK is employed and Iray is exposed in each of the different products to better support native workflows.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Product & Building Design; Media & Entertainment

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room LL21B

S6866 - ROBORACE: The Global Driverless Championship of Intelligence and Technology

Denis Sverdlov CEO, Roborace
Denis Sverdlov is the Founder and CEO of ROBORACE (http://www.roborace.com) and UK investment company KINETIK (http://www.kinetik.vc). With more than 10 years of leadership in IT and telecommunications, Denis is a venture investor and successful entrepreneur passionate about creating cool products that are radically better. Denis is the former chief executive of YOTA, a wireless broadband 4G carrier. He also created YotaPhone, a mobile phone with a second "always on" screen. KINETIK founded a new automotive business, Charge, which started operations in January 2015. Charge is developing a modular powertrain for commercial electric vehicles (EV) of 2 to 26 tonnes. The solution makes it possible to build a commercial EV at the price of a conventional one with significantly lower operational costs. KINETIK is also funding and developing several global innovative ventures in the Internet of Things, Biotech and other emerging industries.

ROBORACE is a global race series for full-size driverless electric cars. The championship will provide a showcase platform for the autonomous driving solutions that are now being developed by many large industrial automotive and technology players as well as top tech universities. As a competition of intelligence and technology, ROBORACE is fusing AI with automotive engineering in extreme conditions. Bringing together motorsports and gaming in a battle of algorithms, the teams will compete on racing tracks in major cities across the world. During the talk we will share the technical vision of our competition and explain the selection criteria for the racing teams. Join us to discuss and be the first to hear some exciting news about ROBORACE!

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room LL21E

S6129 - Parallel Low Rank LU and Cholesky Refactorization

Lung-Sheng Chien Software Engineer, NVIDIA
Highly-Rated Speaker
Lung-Sheng Chien is a software engineer at NVIDIA, working on the cuSOLVER and cuSPARSE libraries. Prior to NVIDIA, he was a Ph.D. student in the Department of Mathematics at National Tsing Hua University. He received his B.S. and M.S. from the Department of Computer Science at National Tsing Hua University in 2003 and 2005, respectively.

Attendees can learn how to use a low-rank update in a linear solver during a nonlinear process, for example in linear programming, structural mechanics, and circuit simulation. A GPU-friendly version is proposed, which is mainly based on BLAS2 operations. Compared to traditional approaches, with BLAS2 operations we can hide instruction latency well and achieve the full bandwidth of a many-core processor. In this talk, we describe the basic idea of the low-rank update and show up to 5x speedup from complexity analysis.

Level: Intermediate
Type: Talk
Tags: Algorithms; Computer-Aided Engineering

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Marriott Salon 3

S6209 - A Look at Real World Performance Capabilities of NVIDIA GRID™ 2.0

Fred Devoir Senior Architect & Manager of IT Infrastructure, Textron Inc.
Fred is a Sr. Systems Architect and Manager of IT Infrastructure at Textron Inc. Fred has a wide variety of specialized and business systems experience with particular interests in integration and virtualization projects specifically centered around Virtual Desktop Infrastructure (VDI), graphics acceleration, and high performance computing clusters. His past experience includes 22 years of IT professional work experience as an IT Manager, Sr. Systems Analyst, Engineer, and Architect for Fortune 500 companies in the Aerospace/Defense, Engineering, Medical, and Pharmaceutical industries.
Luke Wignall Manager, GRID Performance Engineering, NVIDIA
Highly-Rated Speaker
Luke Wignall came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke saw the ability to bring GPU to the end user experience as the missing "special sauce" that brings virtual desktops to the next level. Now managing the NVIDIA GRID Performance Engineering Lab, his focus is on performance and scalability to deliver the best value with the highest end user experience across all virtual workloads.
Jason K. Lee Performance Engineer - GRID, NVIDIA
Jason K. Lee is part of the NVIDIA GRID Performance Engineering team and is responsible for testing and evaluating the NVIDIA GRID platform, performance investigation and benchmarking, and developing automation and example code for the NVIDIA GRID platform. He is a former solution architect, software engineer, and developer.

Join us for a technical dive into benchmarks and real workloads. How do real metrics stack up against what a benchmark tells you? Learn about the various performance characteristics, application tuning options, as well as hardware and platform design considerations.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Performance Optimization; Product & Building Design

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Marriott Salon 4

S6273 - Deep Learning at the Edge of the Network

Hugo Latapie Principal Engineer, Cisco
30 years of HW/SW development experience ranging from defense industry projects to television technologies and networking. Primary focus over the last five years has been applying machine learning and deep learning developments to a wide range of products such as multi-modal user interfaces, networking applications, and recently visual analytics. Most of my work over the past five years has leveraged NVIDIA GPUs and CUDA.

This talk will focus on the pragmatic use of high performance computing using NVIDIA GPUs and deep learning algorithms in visual, crowd, and behavioral analytics projects at Cisco. We will highlight use cases in IoT, SmartCities, retail, event analytics, and transportation. We'll also highlight our approach, architecture, and deployment models leveraging NVIDIA-docker, swarm, etc. Computing on end point devices, at the edge, and the cloud in a distributed heterogeneous model in support of the applications above will also be discussed. This talk is targeted primarily towards practitioners either actively engaged in product development or seriously contemplating it. However, we will also discuss advanced use cases that may be of interest to researchers. We will not be covering core deep learning algorithms (generative/discriminative/Boltzmann/Autoencoders/RNN/LSTM/…) although we will highlight our use of these algorithms.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; IoT; Big Data Analytics

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Marriott Salon 6

S6378 - Simplifying Multi-GPU Communication with NVSHMEM

Nathan Luehr Developer Technology Engineer, NVIDIA
Nathan Luehr is a senior developer technology engineer for compute applications at NVIDIA. He earned a Ph.D. in theoretical chemistry from Stanford University in June 2015.
Sreeram Potluri Senior CUDA Software Engineer, NVIDIA
Sreeram Potluri received his Ph.D. in computer science and engineering from Ohio State University and is a senior software engineer at NVIDIA Corp. His research interests include high-performance interconnects, heterogeneous architectures, parallel programming models, and high-end computing applications.

We'll present an overview of the NVSHMEM multi-GPU programming model. NVSHMEM is an implementation of the OpenSHMEM standard for GPUs. By providing fine-grained communication primitives between GPU threads, NVSHMEM improves communication latencies and can greatly reduce the complexity usually associated with multi-GPU programming. Two application studies are presented to illustrate the utility of NVSHMEM: CoMD, a molecular dynamics mini-application, and HPGMG, a geometric multi-grid solver.

Level: Intermediate
Type: Talk
Tags: Programming Languages; Tools & Libraries; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room 211B

S6388 - GPGPU Applications for Hydrological and Atmospheric Simulations and Visualizations on the Web

Ibrahim Demir Assistant Research Professor, University of Iowa
Ibrahim Demir develops web-based visualization and communication tools to make it easy to see information from complex and large-scale geo-spatial environmental datasets. His work ranges from crowdfunding stream sensors, citizen science projects for collecting environmental data, adoption of environmental sensors by the public, crowdsourcing flood predictions, SIRI-like knowledge engine for flooding, displaying flood warnings using augmented reality, creating a virtual flood simulation on a tabletop, and experimenting with novel devices like Google Glass, Leap Motion, and Microsoft Kinect for innovative projects on scientific visualization and interaction.

Learn about general-purpose computing on GPU applications using web technologies to improve speed of web-based scientific computing and visualizations. GPGPU is the use of a GPU to perform computation for operations other than graphics processing. WebCL defines a JavaScript binding to the OpenCL standard for parallel computing on the web. The presentation will include background information on scientific computing techniques (e.g. WebGL, WebCL, Web Workers, ASM.js, SIMD.js, etc.) on the web, and sample applications from hydrological and atmospheric sciences.

Level: Beginner
Type: Talk
Tags: In-Situ and Scientific Visualization; Big Data Analytics; Algorithms

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL21D

S6399 - Accelerating Performance and Scalability with NVIDIA GPUs on HPC Applications

Pak Lui Application Performance Manager, HPC Advisory Council
Pak Lui is the Application Performance Manager for the HPC Advisory Council. He has been involved in demonstrating application performance on various open source and commercial applications. His main responsibilities involve characterizing HPC workloads, analyzing MPI profiles to optimize HPC applications, and exploring new technologies and solutions and their effectiveness on real HPC workloads. Pak works at Mellanox Technologies, where his main focus is to optimize HPC applications on products and to explore new technologies and solutions and their effect on real workloads. Pak has been working in the HPC industry for over 15 years. Prior to joining Mellanox Technologies, Pak worked as a Cluster Engineer at Penguin Computing, responsible for building and testing HPC cluster configurations from different OEMs and ISVs. Pak holds a B.Sc. in Computer Systems Engineering and an M.Sc. in Computer Science from Boston University.

GPU-based clusters are being adopted at a rapid pace in HPC to perform compute-intensive tasks at a large scale. One of the main performance challenges in deploying these GPU clusters is the performance and latency of communications between GPUs across the interconnect fabric. The goal of this session is to highlight interconnect optimizations, found through MPI communication profiling, that provide higher performance and better utilization and allow GPU clusters to scale. We will also demonstrate a selection of HPC applications on an InfiniBand cluster with technology such as GPUDirect RDMA, showing how to use this feature to communicate directly in a peer-to-peer fashion, completely bypassing the CPU subsystem, so that applications can perform and scale.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Performance Optimization; Computer-Aided Engineering

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL21A

S6515 - Listen, Attend and Spell

William Chan Ph.D. Candidate, Carnegie Mellon University
William Chan is a Ph.D. candidate at Carnegie Mellon University in the Department of Electrical and Computer Engineering. William graduated with an M.S. in electrical and computer engineering from Carnegie Mellon University in 2013, and a B.S. in computer engineering in 2011 from the University of Waterloo. His past industry experience includes internships at Google, Amazon, Intel, NVIDIA, AMD, and TD Securities. His current research crosses the fields of machine learning, deep learning, and speech recognition.

Most recently, Listen, Attend and Spell (LAS) was presented to directly transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. The LAS model has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits characters as outputs. The network produces character sequences without making any independence assumptions between the characters. We'll describe a distributed asynchronous training platform for training such a model on an array of GPUs.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Signal & Audio Processing

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room 210H

S6516 - Automatic Grading of Eye Diseases through Deep Learning

Apaar Sadhwani Ph.D. Candidate, Stanford University
Apaar Sadhwani is a Ph.D. candidate in Operations Research, Department of Management Science & Engineering at Stanford University. Prior to this, he obtained a B.Tech. in production engineering from Indian Institute of Technology, Delhi, and an M.S. in operations research from Stanford University. He has an extensive background in applied math, with research experience in probability theory. In his doctoral thesis, he pursues applications of mathematical models and deep learning to biometrics, healthcare, and finance. For example, his algorithms for biometrics are now used to authenticate more than 550 million users of the world's largest biometric program in India. He has significant interests in computer vision and has been a teaching assistant for machine learning, optimization, artificial intelligence, and algorithms at Stanford.

We'll outline the development of a state-of-the-art medical imaging system using novel deep architectures that harness GPUs for accelerated training. Trained using data from Stanford Byers Eye Institute and Palo Alto VA Hospital, our model grades the severity of eye diseases and localizes lesions to help screen eye patients at primary care. At the heart of this system lies our hybrid approach to deep learning for high-resolution images -- a large convnet with millions of parameters trained with downsized images, fused with a net trained on selected tiles of the high-resolution image. This innovative approach involves the use of transfer learning, data augmentation, and multi-GPU systems to identify small-scale features that are critical to detecting eye diseases.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Medical Imaging; Supercomputing & HPC; Press-Suggested Sessions: AI & Deep Learning

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Hall 3

S6566 - Heterogeneous Compute for Real-Time Image Processing Applications

Alan Purvis HPC Engineer, The Foundry
Alan Purvis is originally from Dublin, Ireland where he studied Computer Graphics at Trinity College's GV2 Research Group. In previous employment he helped develop OpenGL drivers for Raspberry Pi and smartphones. Alan currently works at The Foundry's London based High Performance Computing team, where he specialises in inventing things.

We'll discuss work carried out at the Foundry on a heterogeneous image processing framework, utilizing all available CPU and GPU compute devices within a system. Complex graphs of processing effects can be authored in BLINK, a domain-specific language created in-house. By harnessing data parallelism, knowledge of transfer speeds, and device compute capabilities, we have developed a scheduling system for efficiently deploying workloads across all devices. The talk will give a brief overview of BLINK, how graphs of effects are authored, and the innovative use of our scheduling framework within a hybrid 3D rendering system for virtual production.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL21C

S6636 - GEM3: CPU-GPU Heterogeneous DNA Sequence Alignment for Scalable Read Sizes

Alejandro Chacon Ph.D. Student, Autonomous University of Barcelona
Alejandro Chacon is a fourth year Ph.D. student researching HPC applied to bioinformatics in the department of Computer Architectures and Operating Systems (CAOS) at Universitat Autonoma de Barcelona, Spain. He received a B.E. in computer science and M.S. in high performance computing and information theory in 2011 and 2012, respectively. He has been working with GPU architectures since his final degree project and in the CUDA library team and research group as an NVIDIA summer intern in 2015. His research interests include bioinformatics, high performance computing, and parallel heterogeneous systems.

Sequence alignment is one of the most computationally intensive steps in current bioinformatics analysis pipelines. Previous attempts to implement it on GPUs have failed to efficiently manage the inherent massive parallelism of the problem. The obvious data-parallel strategy, in which each read sequence is processed independently by a different thread, is very irregular and leads to a very large memory footprint. We'll introduce a CPU-GPU heterogeneous algorithm designed for GEM, a high-quality and already adopted aligner. It selects and packs regular work generated by the pipeline to be offloaded to multiple GPU devices; meanwhile, CPU cores cope with the filtered divergent cases, allowing the read size to scale and improving the quality of results on the new sequencing technologies.

Level: Intermediate
Type: Talk
Tags: Computational Biology; Performance Optimization; Supercomputing & HPC

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Marriott Salon 5

S6730 - Flexible Cluster Rendering with Quadro® VCA

Ankit Patel Senior Product Manager, NVIDIA
Ankit Patel is a Senior Product Manager at NVIDIA. Prior to joining NVIDIA in 2011, Ankit worked in the media and entertainment industry for over 10 years. He has held product management positions at Matrox Video and Echolab, which was acquired by Blackmagic Design in 2010. Ankit is passionate about building products that allow creative individuals to realize their dreams, whether that's through creative storytelling or building amazing products. Ankit holds an MBA from Cornell University and a bachelor's degree in computer science from Concordia University in Montreal, Canada.

Learn how to deliver photograph-quality images faster than ever before with NVIDIA® Quadro® VCA. Accelerate design and VFX workflows with NVIDIA Quadro VCA, the fastest way to interact with photorealistic digital 3D models and scenes. This is a powerful network-attached appliance that harnesses the power of the highest-performing NVIDIA GPUs. It's accessible to anyone on the network, easily integrated into design workflows and effortlessly scales to multiple VCAs to minimize the time to noiseless physically-based global illumination.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Large Scale and Multi-Display Visualization; Product & Building Design

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL21B

S6745 - VQA: Visual Question Answering

Aishwarya Agrawal Ph.D. Student, Virginia Tech
Aishwarya Agrawal is a second year Ph.D. student at the Bradley Department of Electrical and Computer Engineering at Virginia Tech. She is a member of the Virginia Tech Machine Learning and Perception Lab and is advised by Dhruv Batra. Her research interests lie at the intersection of machine learning, computer vision and natural language processing with a focus on multi-modal Artificial Intelligence, e.g. Visual Question Answering (VQA).

We'll describe the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce an accurate natural language answer ("bakery", "5", "Yes"). Answering any possible question about an image is one of the 'holy grails' of AI, requiring integration of vision, language, and reasoning. We have collected and recently released a dataset containing >250,000 images, >750,000 questions, and ~10 million answers (www.visualqa.org). We are also running the VQA challenge (www.visualqa.org/challenge.html), which includes both an open-ended answering task and a multiple-choice task.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room 210F

S6813 - Real-Time 3D Imaging and Guidance of Intraocular Surgery Using NVIDIA GPUs

Joseph Izatt Michael J. Fitzpatrick Professor of Engineering, Duke University
Joseph A. Izatt is the Michael J. Fitzpatrick Professor of Engineering in the Edmund T. Pratt School of Engineering, Professor of Ophthalmology, and Program Director for Biophotonics at the Fitzpatrick Institute for Photonics at Duke University in Durham, North Carolina, USA. Joseph's research interests include biomedical optics and spectroscopy, coherence-based optical imaging in scattering media, and novel instrumentation for minimally invasive medical diagnostics. He is a Fellow of the American Institute for Medical and Biological Engineering (AIMBE), Society of Photo-Instrumentation Engineers (SPIE), and Optical Society of America (OSA).

We'll describe use of NVIDIA GPUs for real-time volumetric imaging and guidance of intraocular microsurgery. Surgery to correct blinding conditions in the cornea, lens, and retina of the eye is performed through a surgical microscope suspended over the patient. During these intricate procedures, surgeons have only indirect cues to gauge the depth of their surgical tools within the eye. Optical coherence tomography (OCT) is a novel biomedical imaging technique which can produce three-dimensional images with micrometer resolution. Using NVIDIA GPUs to compute and render image data in a custom stereoscopic heads-up display, our system incorporates OCT imaging into surgery to allow surgeons to view and interact with live volumetric visualizations of tissue structures during surgery.

Level: All
Type: Talk
Tags: Medical Imaging; Rendering & Ray Tracing; In-Situ and Scientific Visualization; Press-Suggested Sessions: Professional Graphics; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Marriott Salon 1

S6832 - Ford's Autonomous Vehicles Using GPUs

Mark Crawford Technical Expert Autonomous Vehicles, Ford Motor Company
Mark Crawford is a Technical Expert in autonomous vehicles at Ford Motor Company, where he is working in an advanced engineering team to develop Ford’s autonomous vehicle. He holds two diploma degrees, a B.S. and M.S., in Mechanical Engineering from the Missouri University of Science and Technology and is currently pursuing his Ph.D. in Information Systems Engineering at the University of Michigan – Dearborn.

In this presentation, we discuss Ford's autonomous vehicle technology, including an overview of the tasks of sensing, sensor fusion, localization and mapping, object detection, and object classification. We examine how GPU hardware significantly improves the computational efficiency of our parallelized algorithms for vehicle localization based on a combination of a synthetic aperture camera (derived from lidar data) and a Gaussian mixture 3D map approach. We provide an overview of some preliminary results of our deep learning research in the novel area of lidar-based methods for vehicle localization and object classification.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL21E

S6855 - Exascale Challenges for Numerical Weather Prediction: The ESCAPE Project

Olivier Marsden Research scientist, ECMWF
Olivier Marsden obtained his Ph.D. in Computational AeroAcoustics in 2005 from the Ecole Centrale de Lyon under the direction of Professor Charles Bailly. He has held a position of associate professor of acoustics at ECL since 2007, and since 2015 has been working as a research scientist in the field of data assimilation at ECMWF, the European Centre for Medium-Range Weather Forecasts.

The European Centre for Medium-Range Weather Forecasts has been at the cutting edge of Numerical Weather Prediction for the past 40 years, and is making sure it will remain so as HPC heads for the exascale. To this end, ECMWF is leading the EU H2020 ESCAPE project, which promises to address the many requirements necessary for achieving exascale NWP. After talking about the general strategy that ECMWF currently envisages for accelerator usage, we'll look at GPGPU work being carried out for the ESCAPE project, focusing on two important components of the ECMWF weather model, the cloud physics routine and spectral transforms.

Level: Intermediate
Type: Talk
Tags: Earth System Modelling; Supercomputing & HPC; OpenACC

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room 211A

S6872 - Hybrid Reality at NASA Powered by NVIDIA

Matthew Noyes Aerospace Technologist, National Aeronautics and Space Administration
Mr. Noyes is the Lead Software Engineer of the NASA Johnson Space Center Hybrid Reality and Advanced Operational Concepts Lab in Houston, TX. Topics of research include exploring, expanding and implementing technologies such as consumer virtual reality and game engines into next-generation systems for astronaut crew training, engineering analysis and scientific visualization.
Francisco Delgado Aerospace Technologist / Project Manager, National Aeronautics and Space Administration
Francisco Delgado is the Program Manager of the NASA Johnson Space Center Hybrid Reality and Advanced Operational Concepts Lab in Houston, TX. Topics of research include exploring, expanding and implementing technologies such as consumer virtual reality and game engines into next-generation systems for astronaut crew training, engineering analysis and scientific visualization.

This session demonstrates how NASA is using consumer VR headsets, game engine technology and NVIDIA's VRWorks to create highly immersive astronaut crew training augmented with extremely realistic haptic feedback, and to improve engineering workflow. Examples explored include a simulation of the International Space Station where users can interact with handlebars and tracked physical objects in the real world while inside VR, and an instance of how NASA is using one type of virtual reality technology (a consumer headset) to completely simulate and develop another type of virtual environment (a reconfigurable CAVE). Attendees will learn about how the best elements of real and virtual worlds can be combined into a hybrid reality with tangible scientific applications.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Graphics Virtualization; Aerospace & Defense

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room LL20C

S6107 - Robust Model-Based 3D Head Pose Estimation

Shalini Gupta Senior Research Scientist, NVIDIA
Shalini Gupta has been a senior research scientist in the Mobile Visual Computing group of NVIDIA Research since April 2013. From 2011 to 2013, she worked as a senior mobile computer vision engineer at NVIDIA, where she designed and productized computer vision and computational photography solutions for mobile platforms and GPUs. She worked as an imaging and architecture scientist at Texas Instruments, from 2008 to 2010, where she designed algorithms for the image signal processing pipeline of mobile phones, at AT&T Laboratories on their IPTV project, and at Advanced Digital Imaging Research, LLC, where she designed algorithms for 3D human face recognition. Shalini received her M.S. and Ph.D. in electrical and computer engineering from the University of Texas at Austin in 2004 and 2008, respectively. She received a B.S. in electronics and electrical communication engineering from Punjab Engineering College, India, in 2002. She is a recipient of the Summer Research Fellowship 2001, awarded by the Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India. Her primary research interests are image/signal processing, computer vision, and machine learning, and their application to scene understanding and interpretation.

Depth cameras have become cheap and ubiquitous. We introduce a computer vision algorithm for accurate, three-dimensional (3D) head pose (rotation and translation) estimation, which runs in near real time in CUDA. It works with different commodity depth sensors with minimal adaptation, handles large head rotations and occlusions gracefully, and does not require cumbersome subject initialization. Our algorithm results in an angular error of 2 degrees and a translational error of 6 mm. It outperforms all seven competing methods on a benchmark data set. Accurate head pose estimation is an important fundamental problem in computer vision. It is a prerequisite for gaze estimation, facial animation capture, face recognition, driver monitoring, and head-coupled, 3D perspective displays.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Intelligent Video Analytics (IVA)

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 210F

S6127 - NVIDIA Iray®: Changing the Face of Architecture and Design

Scott DeWoody Firmwide Creative Media Manager, Gensler
Scott DeWoody has always had an affinity for art and technology. After seeing the animation being done through computers, he knew he could combine the two. In 2007, he graduated from The Art Institute of Houston with a B.A. in media arts and animation. There, he focused on lighting and rendering techniques using 3ds Max software. Image quality and workflow are the top priorities in his work. He is constantly studying color theory, composition, and new ways to produce the best possible results. He has worked at Gensler for the past eight years as a visualization artist and manager. He has worked for numerous clients, including NVIDIA Corporation, ExxonMobil, Shell Oil Company, BP, City Center Las Vegas, and many more. He is exploring the new possibilities of architecture in the interactive space with gaming platforms, augmented reality, and virtual reality.
Hao Ko Principal, Gensler
Grounded by the belief that the fundamental role of an architect is to elevate the human spirit, Hao Ko strives to design beautiful places -- ones that inspire people and impact the way they live, work, and play. Always pursuing a high level of conceptual thinking, pushing performance boundaries, and detailed in execution and craft, Hao has just recently completed the Tower at PNC Plaza in Pittsburgh, Pa. Coupling a progressive workplace design to a unique and innovative passive, natural ventilation strategy driven by a breathable double-skin facade and solar chimney, this transformational project is designed to be the world's greenest high-rise building.

NVIDIA's Iray technology was a game changer in the design process of its new corporate campus. Gensler teamed up with developers at NVIDIA to help integrate this technology into the process to accurately simulate how the design of the campus would look in the real world. This process ended up helping everyone understand how light and materials were going to act in the 500,000-square-foot space. Being able to accurately compute how the massive amount of daylight coming into the space would react to changes in the design was incredible feedback for the designers. The data that Iray visualized helped with almost every design decision from start to finish.

Level: All
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room LL21A

S6151 - XMP: An NVIDIA CUDA®-Accelerated Big Integer Library

Justin Luitjens Developer Technologies Engineer, NVIDIA
Highly-Rated Speaker
Justin Luitjens has been a developer technology engineer at NVIDIA for five years. He received his Ph.D. in scientific computing from the University of Utah.

We'll introduce the XMP library, which provides CUDA-accelerated implementations for many large integer arithmetic operations. These operations are generally used to implement encryption and decryption routines, including RSA, ECC, and Diffie-Hellman key exchange. We'll focus on the library's capabilities and how to use it efficiently.

Level: All
Type: Talk
Tags: Aerospace & Defense; Tools & Libraries

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Marriott Salon 2

S6218 - TeamRGE.com - From the Fire Hose Series: Benchmarking and Scalability in Virtual Desktop Infrastructure (VDI) and Virtual Workstation Environments

Ruben Spruijt CTO, Atlantis Computing
Ruben Spruijt is CTO at Atlantis Computing, responsible for driving vision, technology evangelism, and thought leadership with Atlantis customers, partners, and communities. Ruben is a well-regarded author, speaker, geek, market analyst, and all-around technologist. An established industry leader and luminary, he is one of only a few individuals in the world to hold three prestigious virtualization awards: Microsoft Most Valuable Professional (MVP), Citrix Technology Professional (CTP), and VMware vExpert. Ruben has presented more than 150 sessions at national and international events such as BriForum, Citrix iForum Japan, Citrix Synergy, Gartner Catalyst, Microsoft Ignite, Microsoft TechEd, NVIDIA GTC, and VMworld. Ruben co-founded several independent industry analysis bodies, including ProjectVRC.team, Team Remote Graphics Experts (TeamRGE), AppVirtGURU, and WhatMatrix. He has created and co-authored multiple disruptive 'Smackdown' research whitepapers.
Benny Tritsch CTO, wtstek
Benny Tritsch is a business developer, principal consultant, market analyst, author, and all-around geek specializing in enterprise Windows remoting and virtualization solutions. He is technical director Central Europe at Lakeside Software and speaks around the world at several conferences each year, including Microsoft TechEd/Ignite, Citrix Synergy, VMware VMworld, BriForum and E2EVC. He has received the Microsoft Most Valuable Professional (MVP) award for RDS since 2004, the Citrix Technology Professional (CTP) since 2006 and the VMware vExpert in 2015.

When planning to deploy high-end graphics apps in remote Windows sessions, scalability and UX are important success factors. Unfortunately, traditional benchmarking parameters such as frame rates and system performance counters do not entirely represent the perceived user experience on a remote client. Join TeamRGE in a session about benchmarking the performance of graphics-intensive remote user sessions and virtual desktops. We'll walk you through a working set of acceptance criteria and test methodology best practices used in real customer projects and when evaluating reference environments in our test labs. This session digs into the scalability of GRID in different scenarios, and you'll get expert guidance on how to build your own remote UX test lab and which criteria you should consider.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing; Performance Optimization

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Marriott Salon 4

S6258 - VMD: Interactive Molecular Ray Tracing with NVIDIA OptiX™

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was named an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group Advisory Panel for the Vulkan graphics API. John also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

We'll describe the adaptation of the popular molecular graphics program VMD for interactive ray tracing using NVIDIA OptiX, on computers ranging from laptops all the way up to large NVIDIA VCA GPU clusters and petascale supercomputers such as Blue Waters and Titan. We'll highlight the new OptiX 3.8 progressive rendering and remote device APIs, and show how VMD uses them for both local and remote VCA rendering. We'll highlight the use of OptiX GPU ray tracing for interactive panoramic and omnidirectional projections suited to planetariums, fulldome theaters, and VR headsets (HMDs) such as the Oculus Rift. The session will present the latest VMD+OptiX ray tracing performance data for workstations, VCA GPU clusters, and supercomputers.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; In-Situ and Scientific Visualization

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL21B

S6266 - Automated Geophysical Feature Detection with Deep Learning

Chiyuan Zhang PhD Student, MIT
Chiyuan Zhang received his B.S. and M.S. in computer science from Zhejiang University, China, in 2009 and 2012, respectively. He is currently a Ph.D. candidate at the Computer Science and Artificial Intelligence Laboratory at MIT. His research interests include machine learning and computational neuroscience, as well as application to processing/analysis of speech, vision, and other kinds of real-world signals.

We introduce a novel approach to fault localization in oil and gas exploration based on automated feature detection with deep learning algorithms running on GPUs. Faults are key geological structures that can serve as boundaries for hydrocarbon reservoirs. Most current techniques that tackle this problem rely on seismic images, which are the outcome of expensive computing with substantial human intervention. We'll present the latest results from a joint project by MIT and Shell International E&P Inc. on using deep learning to bypass the expensive processing mentioned above and perform fault detection on the raw seismic traces. We build a system in Julia/Mocha.jl and cuDNN to solve the challenging structured output prediction problem, and show promising preliminary results.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Energy Exploration

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 210H

S6295 - Counterparty Credit Risk and IM Computation for CCP on Multicore Systems

Prasad Pawar Developer, Tata Consultancy Services
Prasad Pawar is currently working as a developer at TCS in the Parallelization and Optimization group of HPC. Prasad received an M.S. in computer science and engineering from Kolhapur University, Maharashtra, in 2008 and a B.S. in computer science and engineering from Aurangabad University in 2005. He has seven years of experience in the HPC domain. Prasad is an inventor of one patent and has published his research work in various national and international conferences. His research interests include high performance computing, parallelization and optimization, multicore programming, GPGPUs, and algorithms.
Nishant Kumar Consultant, Tata Consultancy Services Ltd
I have worked with Tata Consultancy Services Ltd for the last 9.5 years and have 10.5 years of overall software development experience in the HPC, enterprise messaging, wireless telecom, and networking domains. I currently work at the Performance Engineering Research Centre (PERC) R&D lab as a Research Area Lead, focusing primarily on system performance and optimization in the HPC and enterprise domains. I am a technical expert in coding, developing, and optimizing applications using parallel computing languages such as CUDA, MPI, and OpenMP.

We'll present how GPUs and performance optimization using the latest features available on Kepler GPUs enabled a critical risk estimation application in a trading system to achieve near real-time performance, compared to the 25 minutes on legacy systems. The counterparty credit risk is defined as the risk that the counterparty to a transaction could default before the final settlement of the transaction's cash flows. A CCP calculates the mark-to-market margin requirement for each member and blocks it from a member's collateral if the margin is not sufficient. Moreover, the GPU-based approach minimizes the risk for the CCP.

Level: All
Type: Talk
Tags: Performance Optimization; Finance

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room 212A

S6353 - Accelerating a Spectral Algorithm for Plasma Physics with Python/Numba on GPU

Manuel Kirchen PhD Student, Laser-Plasma Driven Light Sources Center for Free-Electron Laser Science, University of Hamburg
I am a physics PhD student at the University of Hamburg working in the field of plasma accelerators. Currently, my work focuses on the development of novel electromagnetic Particle-In-Cell algorithms, leveraging today's parallel computing architectures. In collaboration with Rémi Lehe (Lawrence Berkeley National Lab), I work on FBPIC, a spectral, quasi-3D and GPU-accelerated Particle-In-Cell code written in Python using Numba.
Remi Lehe Postdoctoral Researcher, Lawrence Berkeley National Laboratory
Remi Lehe graduated in physics from Ecole normale superieure, Paris, and obtained a Ph.D. from Ecole Polytechnique, France, where he studied plasma-based particle accelerators. His work on these accelerators is largely based on particle-in-cell (PIC) simulations, and in particular he developed an alternative finite-difference Maxwell solver, which is now implemented in several PIC codes used by research teams throughout the world (Osiris, PIConGPU, Warp, Calder). Remi is now a postdoctoral researcher at Lawrence Berkeley Laboratory, where he works on large-scale plasma simulations and advanced spectral algorithms.

Learn how a complex spectral algorithm can be rapidly ported to GPU, while writing only Python code. Overall, our spectral Particle-In-Cell simulation code runs ~40 times faster on one GPU than on one CPU. This is partly due to the extensive use of FFTs and matrix multiplications in our spectral solver. Those operations are very efficient on a single GPU, while they are relatively slow on a single CPU and difficult to parallelize over several CPUs. The entire code is written in Python, which allowed for fast development and debugging, while the Numba just-in-time compiler enabled high performance on GPU. In particular, we made use of cuFFT and cuBLAS, of well-controlled memory transfer from GPU to CPU, and of a parallel Radix Sort to avoid race conditions in critical sections of the code.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Tools & Libraries

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Marriott Salon 6

S6444 - White Matter Tractography and Human Brain Connections Using GPUs

Moises Hernandez Fernandez Ph.D. Candidate, Oxford Centre for Functional MRI of the Brain (FMRIB), University of Oxford
Moises Hernandez Fernandez is a Ph.D. candidate at University of Oxford in clinical neurosciences under Professor Stephen Smith, Professor Mike Giles, Dr. Stamatios Sotiropoulos, and Dr. Istvan Reguly.

We'll present a novel analysis tool for diffusion MRI (dMRI) data using NVIDIA GPUs for mapping connections in the human brain. We'll describe the potential of dMRI and how it allows the study of brain microstructure and long-range brain connections, non-invasively and in-vivo (tractography). Due to the multidimensional nature of the data, modelling can be computationally demanding. We present a parallel framework for analysis of dMRI data that allows accelerations of up to two orders of magnitude when comparing GPU with CPU implementations. We'll highlight the tremendous benefit of these accelerations in very large recent studies such as the Human Connectome Project, where comprehensive maps of brain anatomical connectivity of unprecedented quality are being generated.

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 212B

S6454 - Real-Time Graphics for Film Production at Pixar

Pol Jeremias Graphics Software Engineer, Pixar Animation Studios
Highly-Rated Speaker
Pol Jeremias is passionate about technology and art. He grew up near Barcelona and moved to California in 2006. Since then, Pol has researched computer graphics and worked on multiple games for companies such as LucasArts and SoMa Play. Today, he helps create movies at Pixar Animation Studios. In his spare time, he has co-founded Shadertoy.com and Beautypi. When he is not programming, you will find him running, reading, or watching movies.
Jeremy Cowles Lead Software Engineer, Pixar Animation Studios
Jeremy Cowles is the GPU team lead at Pixar, where he has contributed to Universal Scene Description, OpenSubdiv, and Pixar's animation system for Presto. He is also the co-architect of Hydra, Pixar's real-time render engine for film assets. Jeremy and his team will present Presto's next generation hybrid GL/path traced viewport architecture and the future of Presto real-time workflows, where these technologies come together.
Dirk Van Gelder Software Engineer, Pixar Animation Studios
Dirk Van Gelder joined Pixar Animation Studios in 1997 as a software engineer for the Academy Award®-nominated film “A Bug’s Life” and the Academy Award®-winning short film “Geri’s Game,” working on animation software and the studio’s first use of subdivision surfaces. Dirk has worked on software for every Pixar movie since, including the ground-up rewrite of the studio's proprietary animation system, Presto. Currently, Dirk leads the Presto Character team within the Pixar Studio Tools Department.

Join the Pixar GPU team for a session that explores how real-time graphics are used at Pixar. We'll cover the unique needs for film production, including loading and run-time management of massive movie sets and complex characters, real-time subdivision surfaces, real-time effects particularly useful for technical directors, and how these assets are rendered using the latest hardware features. Don't miss this great opportunity to learn about graphics, algorithms, and movies!

Level: All
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing; Real-Time Graphics; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room LL21C

S6481 - Distributed Graph-Based Density Matrix Calculation for Quantum Molecular Dynamics Using GPUs

Susan Mniszewski Scientist, Los Alamos National Laboratory
Susan Mniszewski is a scientist in the Computer, Computational and Statistical Sciences Division at Los Alamos National Laboratory (LANL). Her work in computational co-design includes development of molecular dynamics proxy applications used to explore new programming models and algorithms for emerging hardware and software capabilities and exploration of sparse matrix and graph-based linear scaling approaches for quantum molecular dynamics on multicore, GPU-accelerated, and distributed architectures. This work was performed with C. Negre, M. Cawkwell, and A. Niklasson of LANL.

Quantum molecular dynamics (QMD) simulations are a highly accurate tool to predict material properties, with potential applications in targeted pharmaceuticals, fuel cells, and biomolecular systems. A graph-based second order spectral projection (SP2) approach is presented for calculation of the electronic density matrix from a Hamiltonian matrix. Large systems run distributed using OpenMP/MPI parallelism for the data decomposition, graph partitioning, submatrix extraction, and density matrix assembly. Compute-intensive SP2 calculations take advantage of GPU acceleration using the cuBLAS matrix algebra library. This hybrid parallel methodology is demonstrated for poly(ethylene) and protein structures solvated in water.

Level: All
Type: Talk
Tags: Computational Chemistry

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Marriott Salon 5

S6491 - Containerizing GPU Applications with Docker for Scaling to the Cloud: Future of Packaging Applications

Maciej Bajkowski COO, Bitfusion.io, Inc
Maciej Bajkowski is a proven innovator with extensive engineering experience in the design of high-speed components, memory systems and storage solutions for leading companies in the computing field, including Intel, Samsung, Freescale and Dell.
Subbu Rama CEO, Bitfusion.io, Inc
Subbu Rama has held engineering and leadership roles in hardware and software divisions, while building CPUs, micro-servers, SoCs, and cloud infrastructures, at companies like Intel and Dell. As a founding member, he built Dell's first cloud infrastructure marketplace.

We'll share different ways of packaging GPU applications as containers versus traditional options and shed light on performance versus portability. Docker containers wrap up a piece of software in a complete filesystem that contains everything it needs to run: code, runtime, system tools, system libraries -- anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in. We'll provide some background on Linux containers, their applicability to heterogeneous platforms (GPUs in particular), and the challenges in adoption, and conclude with a demo of the whole process of containerizing, deploying, and managing GPU applications for the cloud.

Level: Beginner
Type: Talk
Tags: Data Center & Cloud Computing; Graphics Virtualization; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room 210E

S6531 - CUDA® Debugging Tools in CUDA 8

Vyas Venkataraman Engineering Manager, NVIDIA
Highly-Rated Speaker
Vyas is a Software Engineering Manager in the Developer Tools group at NVIDIA. His team is responsible for the CUDA Debug API and cuda-memcheck. Vyas has been at NVIDIA since 2010. He received his PhD in Computer Engineering from Boston University.
Kudbudeen Jalaludeen Software Engineer, NVIDIA

This talk will describe new features in debugging tools in the CUDA 8.0 toolkit.

Level: Intermediate
Type: Talk
Tags: Tools & Libraries

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room 211B

S6572 - A Universal Trajectory Generator for Automated Vehicles

Christoph Klas Research Assistant and Function Developer, fka mbH, Aachen and Institute for Automotive Engineering, RWTH Aachen University
Christoph Klas is a research assistant at the Institute for Automotive Engineering of RWTH Aachen University (ika) and a function developer at Forschungsgesellschaft Kraftfahrwesen Aachen mbH (fka). Since finishing his diploma in Electrical Engineering at RWTH Aachen University, he has been working in the field of driver assistance systems and automated driving functions.

A universal, real-time capable NMPC (nonlinear model predictive controller) based implementation of a trajectory generator for highly automated vehicles is presented. Its main target is to serve as the central instance for all high-level ADAS or automated vehicle functions, therefore abstracting vehicle-dependent kinematics and dynamics. The trajectory planner is capable of the combined optimization of lateral and longitudinal dynamics in urban, rural, and highway scenarios. One of the major challenges besides stable system layout is the fast solution of the embedded optimal control problem. For this, a bespoke GPU-optimized implementation was developed; apart from the planner itself, details about this implementation will be presented.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL21E

S6633 - Navigating the In-Situ Visualization Landscape

Tom Fogal Senior Software Engineer, NVIDIA
Thomas Fogal is an NVIDIA engineer specializing in HPC visualization. As a doctoral student, he worked on parallel volume rendering techniques as well as novel approaches to in situ visualization. At the Scientific Computing & Imaging Institute, ORNL, and LLNL, he worked on parallel rendering for large scientific data. Thomas holds a B.S. and an M.S. from the University of New Hampshire, and will soon have a doctorate from the University of Duisburg-Essen in Germany.

You'll learn how to navigate the complex landscape of in-situ visualization. There are a number of technologies and a variety of design challenges to overcome when adding in-situ visualization into simulation software. Should your coupling be tight or loose? Does 'in transit' visualization make sense in your environment? What value do you gain from coupling with VisIt's libsim or ParaView's Catalyst? How can you adjust to low-memory environments? Is high-performance analysis via VTK-m applicable in your workflows? How much temporal resolution does one need on the visualization side? What's the best way to approach CUDA-OpenGL interop to get zero-copy visualization? How should you organize data for visualization?

Level: Beginner
Type: Talk
Tags: In-Situ and Scientific Visualization; Programming Languages

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL21D

S6683 - "Piz Daint" and "Piz Kesch": From General Purpose GPU-Accelerated Supercomputing to an Appliance for Weather Forecasting

Thomas Schulthess Director, Swiss National Supercomputing Centre
Thomas is Director of the Swiss National Supercomputing Centre (CSCS) and a professor of computational physics at ETH Zurich. He received his PhD in physics in 1994. Since 2010, he has taken an interest in refactoring climate codes to take advantage of novel, energy-efficient computing architectures.

One of today's biggest challenges for scientific computing is the rapidly developing architectural diversity and heterogeneity in computing systems. Application developers no longer face just concurrency as the major obstacle when scaling simulation codes, but have to adapt software to diverging architecture-specific programming models and heterogeneous memory subsystems, requiring significant efforts in refactoring of software and development of new algorithms. In this talk, we will show how CSCS has turned these challenges into opportunities, which led to software development collaborations with HPC centers in Europe, the USA, and Japan, and to the deployment of "Piz Daint", a GPU-accelerated supercomputer that is among the top 10 systems worldwide that enabled development.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Earth System Modelling; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 211A

S6772 - A Platform for Accelerating Machine Learning Applications (Presented by Hewlett Packard Enterprise)

Ben Chandler Senior Research Scientist, Hewlett Packard Enterprise
Ben is a Senior Research Scientist in the Software and Analytics Lab at Hewlett Packard Labs. He holds a PhD from Boston University in Cognitive and Neural Systems and a BS in Cognitive Science from Carnegie Mellon.

Deep convolutional neural networks and other machine learning algorithms are both performance-critical and difficult to optimize. Our project, Cog ex Machina, provides a domain-specific embedded language (DSL) tuned for machine learning and data analysis applications, a compiler, and a runtime. The DSL, hosted on the Scala language, simplifies the process of writing accelerated applications, while preserving the information the compiler needs to emit efficient accelerator code. Our compiler performs kernel fusion and common subexpression elimination among other optimizations. Our runtime provides a simple control interface and reuses buffers in order to reduce the application's GPU global memory footprint. This session by Hewlett Packard Enterprise will outline the design and implementat

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics; Algorithms

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Room 210G

S6786 - The Audi VR Experience - A Look into the Future of Digital Retail

Marcus Kuehne Project Lead Audi VR experience, Audi AG
Marcus Kuehne is a progressive mind who has brought several innovations to the car industry during his career. He studied Interface Design and started his career in Audi product marketing. After that, Marcus moved to electronics development and was responsible for the development and market introduction of the MMI touch – the first fully integrated touchpad-based car HMI. In 2013, Marcus returned to the marketing & sales department and took over the project lead for the Audi VR experience. For a real VR enthusiast like him, it's a dream come true to realize one of the most ambitious and complex VR industry applications.
Thomas Zuchtriegel Team Lead Digital Retail Solutions, Audi AG
Thomas Zuchtriegel specializes in leading cross-functional teams to create amazing experiences using disruptive technology, e.g. the world's first digital showroom, Audi City, a groundbreaking digital retail experience. He studied Digital Film Making at Middlesex University in London and worked as a creative, technical, and managing consultant for many automotive OEMs. Since 2012 Thomas has led the Audi City project team within Audi Digital Retail, and he is now working with Marcus on the Audi VR experience.
Darren Jobling CEO, ZeroLight
Darren Jobling is the CEO and visionary behind ZeroLight, the interactive visualisation specialist for the automotive industry. Darren’s 25 years as a games industry executive provided the foundation for the company’s innovative use of games technology in commercial sectors, allowing ZeroLight to have rapidly become a leader in automotive visualisation.

We'll give an insight into the philosophy behind the "Audi VR Experience" and share that experience with you. We'll share the challenges as well as the lessons learned from creating this VR experience, with a special focus on visual performance, and explain why Audi is an attractive industry partner for all VR technology and content companies. Darren Jobling, CEO of project partner ZeroLight, will join us. He'll explain how ZeroLight managed to achieve the VR visual performance defined by Audi.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Self-Driving Cars & Automotive ; Product & Building Design; Press-Suggested Sessions: Self-Driving Cars & Auto; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL20C

S6837 - Hardware and Software Platform for Next Generation Industrial Drones

Chetak Kandaswamy Research Engineer, NII Prendinger Lab
Chetak Kandaswamy is in the fifth year of a joint doctoral program at the University of Minho, Aveiro and Porto, Portugal, with Jaime Cardoso and Luis Silva. Chetak's research revolves around deep learning and transfer learning, their applications and interactions. Over the course of his Ph.D., Chetak squeezed in three internships. At INEB, he worked on drug discovery for breast cancer using high-content images with stacked auto-encoder architectures. At INESC TEC, he worked on identifying individuals based on a particular region of the face with stacked de-noising auto-encoders and convolutional neural networks. He is presently in an internship in a joint collaboration between NII, Tokyo and enRoute, Japan, on situational awareness for ground-bots using real-time images captured by drones.
Kai Yan R&D Director, enRoute Co., Ltd.
Kai Yan is the research and development director at enRoute Co., Ltd. Kai is a former researcher in quantum computing at the National Institute of Informatics and a former visiting scholar at Stanford University. His expertise is in solving NP-complete problems using quantum algorithms. In 2014 he co-founded LabRomance Inc., which developed controllers for robotics. In 2015 he joined enRoute, where he leads R&D for advanced drones. His research topic is the use of quantum computing in flight route optimization for drone traffic control. Kai received his Ph.D. in engineering from the University of Tokyo.
Helmut Prendinger Professor, National Institute of Informatics, Tokyo
Helmut Prendinger received his Masters and Doctoral degrees in Logic and Artificial Intelligence from the University of Salzburg in 1994 and 1998, respectively. He has been a full professor at the National Institute of Informatics (NII), Tokyo, since 2012, after joining NII in 2004 as Associate Professor. Previously, he held positions as research associate (2000 – 2004) and JSPS postdoctoral fellow (1998 – 2000) at the University of Tokyo, Dept. of Information and Communication Engineering, Faculty of Engineering. In 1996-1997, he was a junior specialist at the University of California, Irvine. His research interests include artificial intelligence (including machine learning), intelligent user interfaces, cyber-physical systems, and the melding of real and virtual worlds, areas in which he has published more than 220 peer-reviewed journal and conference papers. His vision is to apply his research to establishing the IT infrastructure for Unmanned Aerial Vehicles, or "drones". He is a member of IEEE and ACM.

We'll show the latest material and battery technologies for next-generation industrial drones, as well as a CUDA-based software platform for industrial applications. Three major industrial solutions are introduced: 1) deep learning-enabled vision navigation, 2) a vision-based autonomous drone interceptor, and 3) vision-assisted search and rescue missions.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence; Aerospace & Defense

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL20D

S6846 - Blue Waters Experiences, Observations and Projections for GPU use in Open, Scientific, Extreme Scale Research Systems

William Kramer Director of Blue Waters and Research Professor of Computer Science , National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign
Professor William T.C. Kramer is Director and Principal Investigator of the Blue Waters Project, the Director of the UIUC/NCSA @Scale Program office and a full Research Professor in the UIUC Computer Science Department. Bill is responsible for leading all aspects of the Blue Waters project, a National Science Foundation-funded project at NCSA. Blue Waters is the first sustained Petascale computing system. It is the most powerful general-purpose computational and data analytics system available to open science, and by far the largest system Cray has ever built. It is one of the most powerful resources for the nation’s researchers and is the only public Top-5 system in the world that chose not to list on the Top-500 list. Every year, Blue Waters delivers over 6 billion core*hour equivalents of computational time to the nation’s leading science and engineering projects. Previously, Bill was the General Manager of NERSC at Lawrence Berkeley National Laboratory (LBNL) and was responsible for all aspects of operations and customer service for NASA's Numerical Aerodynamic Simulator (NAS) supercomputer center. Blue Waters is the 20th supercomputer Kramer has deployed and/or managed; he has also deployed and managed large clusters of workstations, five extremely large data repositories, and some of the world’s most intense networks. He has been involved with the design, creation and commissioning of six “best of class” HPC facilities. He holds a BS and MS in computer science from Purdue University, an ME in electrical engineering from the University of Delaware, and a PhD in computer science from UC Berkeley. Kramer’s research interests include large-scale system performance evaluation, systems and resource management, fault detection and resiliency, and cyber protection. Kramer has received awards from NASA, Berkeley Laboratory, and the Association for Computing Machinery (ACM), and was named one of HPCWire’s “People to Watch” in 2005 and 2013 and Inside HPC's first “Rockstar of HPC”.
He is the founder of several organizations, including the ACM/IEEE George Michael Memorial HPC Fellowship, the Joint Laboratory for Extreme Scale Computing, the Open Science Grid Executive Committee and the DECUS Seminar Program. Along with other certifications, he is a certified GSA “Trail Boss”, OMB/DOE Project Manager II, NASA Management Excellent Program Graduate, Open Water Scuba Instructor, Illinois Certified Firefighter, Emergency Medical Technician and FAA private pilot.

The "sustained Petascale" Blue Waters Supercomputer is the most powerful and productive supercomputer serving the entire academic and open science communities and the largest system Cray has ever created. Blue Waters is enabling "grand challenge" solutions for problems ranging from HIV and Ebola virus, to earthquake analysis, to severe weather, to the search for gravitational waves, to the economics of climate change policy. The talk will discuss the architectural decisions for Blue Waters, cover the experiences advanced research teams have had using GPUs, highlight some of the efforts underway to expand the use of GPUs, and then draw observations for future-generation systems that potentially would include GPUs.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Performance Optimization; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Marriott Salon 3

S6869 - Caffe: An Open Framework for Deep Learning

Evan Shelhamer PhD Student, Caffe Lead Developer, UC Berkeley
Evan Shelhamer is a PhD student at UC Berkeley advised by Trevor Darrell as a member of the Berkeley Vision and Learning Center. His research is on deep learning and end-to-end optimization for vision. He is the lead developer of the Caffe deep learning framework and takes his coffee black.

Caffe is an open framework for deep learning that equips researchers and engineers with state-of-the-art tools and models. Caffe and its community provide an open source library, reference models, and do-it-yourself examples. We'll highlight scientific and industrial usage of Caffe, talk about recent changes in the latest roast, and discuss future directions. At present the framework has 150+ contributors, 1,000+ citations, and 5,000+ forks.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 09:00 - 09:50
Location: Grand Ballroom

S6130 - 3D Deep Learning

Jianxiong Xiao Assistant Professor, Princeton University
Jianxiong Xiao is an assistant professor in the Department of Computer Science at Princeton University and the director of the Princeton Vision Group. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT). Jianxiong's research interests are in computer vision. He has been motivated by the goal of building computer systems that automatically understand visual scenes, both inferring the semantics and extracting 3D structure. Jianxiong focuses on 3D deep learning, RGB-D recognition and reconstruction, place-centric 3D context modeling, graphics for vision (synthesis for analysis), deep learning for autonomous driving, large-scale crowd-sourcing, and petascale big data. His work has received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and Google Research Best Papers Award for 2012. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, MIT CSW Best Research Award in 2011, and two Google Research Awards in 2014 and in 2015.

We'll discuss some of our research projects on 3D deep learning in computer vision, including our projects that use 3D convolutional neural networks on GPUs to learn 3D descriptors for point features, to model 3D shapes, and to parse 3D scenes. Finally, we'll talk about Marvin, a deep learning software framework for N-dimensional data that we developed for NVIDIA GPUs, which could impact other fields, such as neural sciences, biology, medical images, and healthcare.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room 210F

S6220 - Not Just a Universal Crutch: Other Useful Things to Do with atomicCAS

Elmar Westphal Scientific Programmer, Forschungszentrum Jülich GmbH
Highly-Rated Speaker
Elmar Westphal has been working as a programmer and cluster architect at Forschungszentrum Juelich for more than 15 years. In recent years, he has ported simulation programs from different fields of computational physics to single- and multi-GPU systems and developed CUDA-based building blocks, libraries, and applications, mostly for molecular dynamics and micromagnetism simulations.

There is more to atomicCAS than the double-precision atomicAdd loop from the programming guide: it can do something different from the universal atomic operation loop that example represents. We'll show how to build shared-memory-based hash function loops to solve different counting and grouping problems at warp and block level. Variations of this loop can be used to count unique elements in a block, find threads sharing common data elements, or speed up histogram building for large numbers of bins. With atomic operations on shared memory now natively implemented on Maxwell, these functions can be significantly faster than algorithms optimised for other architectures.
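
As a rough illustration of the kind of compare-and-swap hash loop this abstract describes, here is a minimal CPU-side sketch in Python. It is not the speaker's implementation and not CUDA: the names `SharedTable`, `atomic_cas`, `claim_slot`, and `count_unique` are hypothetical, a lock stands in for hardware atomicity, a plain list stands in for shared memory, and the hash function (`key % size`) is chosen only for simplicity. Each worker tries to claim a slot for its key. A worker that finds the slot empty has inserted a new key, a worker that finds its own key has met another worker sharing that element, and any other value triggers linear probing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

EMPTY = -1  # sentinel for an unused slot (assumes keys are non-negative integers)


class SharedTable:
    """Hypothetical stand-in for a block's shared-memory array with a CAS operation."""

    def __init__(self, size):
        self.slots = [EMPTY] * size
        self._lock = threading.Lock()  # emulates the atomicity the hardware would provide

    def atomic_cas(self, index, expected, new):
        """Store `new` only if the slot holds `expected`; always return the old value."""
        with self._lock:
            old = self.slots[index]
            if old == expected:
                self.slots[index] = new
            return old


def claim_slot(table, key):
    """Linear-probing insert; returns (slot index, True if this caller inserted the key)."""
    size = len(table.slots)
    index = key % size  # trivial hash function, for illustration only
    for _ in range(size):
        old = table.atomic_cas(index, EMPTY, key)
        if old == EMPTY:
            return index, True   # this caller claimed an empty slot
        if old == key:
            return index, False  # another caller already holds the same key
        index = (index + 1) % size  # slot taken by a different key: probe the next one
    raise RuntimeError("table full")


def count_unique(keys, table_size=64, workers=8):
    """Count distinct keys as the number of inserts that claimed a fresh slot."""
    table = SharedTable(table_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: claim_slot(table, k), keys))
    return sum(1 for _, inserted in results if inserted)
```

Because a slot never changes once it is written, every worker holding the same key ends up at the same slot index regardless of scheduling. That shared index is what would let a grouping variant identify callers with common data elements. The table must have at least as many slots as there are distinct keys.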

Level: Advanced
Type: Talk
Tags: Algorithms; Performance Optimization

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Marriott Salon 3

S6270 - How to Create Photoreal Configurators Using Lightworks Iray+®

Dave Hutchinson Chief Technology and Operating Officer, Lightwork Design Ltd.
Dave Hutchinson converts leading lightwork design into new business opportunities through Lightworks Iray+, Iray+ Configurator, Iray+ for 3DSMax, NVIDIA Iray, and associated consultancy and digital services. Dave leads product development, engineering, support, and customer interactions. He is also responsible for driving commercial strategy and directing the marketing and sales teams to achieve new sales and existing customer satisfaction. Dave has an extensive background in visualization technology and the 3D market.
Dave Coldron Product Director, Lightwork Design Ltd.
As Lightworks Product Director, Dave Coldron has responsibility for the development of the Iray+ ecosystem, including Iray+ for 3DSMax and the new Iray+ Configurator. With over 20 years of experience in developing integrated systems for the computer graphics industry, Dave knows how to create applications that support the design workflow; focusing on the use of compelling digital content, interactive design, and the user experience.

Photoreal configurators powered by GPUs have come of age. We present an industry view of where we are seeing demand emerge for this technology, from product and architectural design review and client presentation through to dealership and online consumer product customisation. We'll use real-world projects to illustrate the demand we are seeing, outline what we and our clients had to do to create a great experience, and describe why the GPU is key to its success. With NVIDIA Iray technology being delivered within key products such as Iray+ for 3DSMax and Iray+ for Siemens NX, the industry is demanding that the power of physically based rendering be extended through configuration in complementary workflows. Join us to find out more about this exciting and growing area.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Product & Building Design

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room LL21B

S6294 - Live, Interactive, In-Situ Visualization of Large-Scale Plasma Simulations

Axel Huebl PhD Student, Helmholtz-Zentrum Dresden - Rossendorf
Axel Huebl is one of the main developers of the PIConGPU laser plasma simulation and one of the inventors of the OpenPMD metadata format for particle mesh data. He was part of the team that brought PIConGPU into the finals of the 2013 Gordon Bell award. Axel currently works on his master thesis on laser-driven ion acceleration for cancer therapy and the interaction of X-ray lasers with solid-density plasmas.

In large-scale scientific simulations, I/O has become a bottleneck that can slow down the exploration of unknown physical scenarios. We show that it is vital to judge an HPC system not only by its ability to simulate the system but also by its ability to visualize the simulated data. By keeping the data of the simulation in the GPU memory, remote analysis via a Wi-Fi connection can work at frame rates well above 10 fps while latencies are not of importance, even when spanning continents. This presentation includes a live demo.

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Supercomputing & HPC; Computational Physics

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room LL21D

S6304 - Efficient Parallelization of Molecular Dynamics Simulations on Hybrid CPU/GPU Supercomputers

Jaewoon Jung Research Scientist, RIKEN AICS
Jaewoon Jung works as a technical scientist for RIKEN, Japan. He joined RIKEN as a research scientist in 2010. Jaewoon earned his Ph.D. in physics at the Institute of Science and Technology, Korea.
Yuji Sugita Chief Scientist, RIKEN
Yuji Sugita received his Ph.D. in chemistry from Kyoto University in 1998. He joined RIKEN as a Postdoctoral Fellow in 1998 and since 2012 has worked as the Chief Scientist at RIKEN's Theoretical Molecular Science Laboratory.

We present an efficient parallelization scheme for molecular dynamics (MD) simulations on hybrid CPU/GPU supercomputer systems. In this scheme, the most time-consuming calculations, the real-space nonbonded interactions and the setup of the pairlist for the nonbonded interactions, are performed on GPUs, while the rest of the calculations are done on CPUs. In our program, GENESIS (Generalized-ensemble simulation system), we introduced a novel domain decomposition scheme, which we call the midpoint cell method, for its good weak scaling on massively parallel (CPU-based) supercomputers. This method is also applicable to hybrid CPU/GPU supercomputer systems for simulating large-scale biological systems. We show the performance of GENESIS on the TSUBAME supercomputer.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Marriott Salon 5

S6376 - Development of a Track Trigger Based on GPUs for the CMS Experiment at CERN

Felice Pantaleo Physicist & PhD Student, CERN
Felice Pantaleo is a high-energy physicist (M.S. at University of Pisa), working at CERN for the CMS experiment. He has worked with GPUs since 2008, for astrophysical simulations and for likelihood maximization for fast fitting in the ROOT framework. In the last 4 years his work has focused on real-time triggering for the NA62 and CMS experiments. He is a Ph.D. student at CERN and the University of Hamburg.

We'll discuss the CMS experiment at CERN, which is planning a major upgrade to cope with an expected average number of overlapping collisions per bunch crossing of 140. A key element of this upgrade will be the introduction of tracker information at the very first stages of the trigger system for which several possible hardware implementations are under study. In particular the adoption of GPUs in the first level of the trigger system is currently being investigated.

Level: All
Type: Talk
Tags: Computational Physics; Algorithms; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Marriott Salon 6

S6377 - Building the Fully Digital Audi Virtual Cockpit

Horst Hadler Manager Cluster Instruments and Graphics Framework, e.solutions
Horst Hadler joined e.solutions in 2009 as one of the initial members in the infotainment group managing the framework team. With the definition of the virtual cluster Horst is responsible for the cluster instrument development and graphics framework. He has a degree from University Erlangen and specialized in computer graphics. Before his infotainment time, he did visual effects for a motion picture studio and worked in a project for simulation of heat distribution in high temperature furnaces at the Erlangen University's computer graphics chair.

Get an overview of the techniques used for Audi's Tegra 3 powered virtual cockpit, focusing on the topics (1) reduction of start-up time, (2) instrument display with 60 fps, and (3) synchronization with the infotainment main unit. Additionally, get to know the overall software structure and see how graphical effects were implemented. The virtual cockpit is available in single-display and dual-display configurations. The single-display configuration is used for sport models, like the TT and R8, where the output of the infotainment main unit is integrated into the instrument cluster. In contrast, the dual-display configuration additionally features a "standard" main unit display.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars & Automotive ; Embedded; Real-Time Graphics; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room LL21E

S6460 - Coupling GPUDirect RDMA and InfiniBand Hardware Multicast Technologies for Streaming Applications

Dhabaleswar K. (DK) Panda Professor and University Distinguished Professor, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a professor and university distinguished scholar of computer science and engineering at the Ohio State University. He has published over 350 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,450 organizations in 76 countries around the world. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 293,000 downloads of this software have taken place from the project's website alone. He is an IEEE fellow and a member of ACM.

Learn about recent developments in middleware design to boost the performance of GPU-based streaming applications. Several runtimes already support and optimize GPU communication using various CUDA features. Similarly, some runtimes use InfiniBand hardware multicast to boost broadcast performance for host-based communications. We'll focus on challenges in combining and fully utilizing GPUDirect RDMA and hardware multicast technologies in tandem to design support for high-performance broadcast operations for streaming applications. Further, we'll present associated challenges and designs for clusters with multi-HCA and multi-GPU configurations. A performance evaluation of the proposed designs for MPI_Bcast operations will be presented and analyzed.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Tools & Libraries; Performance Optimization

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room 211A

S6533 - Accelerating Neural Engineering: Closing the Loop on Brain Stimulation

Adam Lichtl Founder, Delta Brain Inc.
Adam Lichtl recently founded Delta Brain Inc. with the goal of using cutting-edge engineering and computing to facilitate healthy brain function. Before that, he was Director of Research at SpaceX, where he built up world-class teams in the areas of combustion simulation, machine learning, and analysis. He received his BS in Physics from Caltech at the age of 19, followed by an MBA and PhD in Computational Physics from Carnegie Mellon University. In addition to his time at Delta Brain Inc. and SpaceX, Adam has held positions as a postdoctoral fellow at Brookhaven National Lab, and as a quant at Morgan Stanley overseeing Global Base and Precious Metals Strategies.

Brain stimulation has been FDA approved for the modulation of a variety of drug-resistant mental disorders, ranging from extreme depression to Parkinson's, and we are just beginning to understand the neural circuitry involved. Most of the interesting circuitry is deep within the brain, making it difficult not only to stimulate, but also to monitor for improvement. We'll present the latest work in this area, as well as Delta Brain Inc.'s novel system-level approach to the brain: combining brain imaging, GPU-accelerated scientific computing, and targeted stimulation to create an end-to-end treatment protocol to generate healthy brain function in the people who need it most.

Level: All
Type: Talk
Tags: Medical Imaging; In-Situ and Scientific Visualization; Computational Physics; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room 212B

S6552 - Subgridded FDTD on GPU Allows Rapid Design of Implantable and Wearable Technology

Chris Mason Product Manager, Acceleware
Highly-Rated Speaker
Chris Mason is the product manager in charge of Acceleware's accelerated electromagnetic product line. He is responsible for the development and launch of Acceleware products used by companies worldwide. Chris has 10 years of experience in developing commercial applications for the GPU and multi-core CPUs. His previous experience also includes parallelization of algorithms on digital signal processors for cellular phones and base stations. His specialty is in electromagnetic simulations, medical imaging, signal processing, and linear algebra. Chris has an M.S. in electrical engineering from Stanford University.

Join Acceleware and SPEAG/Zurich MedTech to learn how GPU-enabled subgridding for the finite difference time domain (FDTD) algorithm can substantially reduce runtimes for electromagnetic simulations of human interface technology. We'll focus on real-life examples, including an RF-powered contact lens, a wireless capsule endoscopy, and a smart watch. We'll also outline the basics of the subgridding algorithm along with the GPU implementation and the development challenges. Performance results will illustrate the significant reduction in computation times when using a localized subgridded mesh running on an NVIDIA Tesla GPU.

Level: Intermediate
Type: Talk
Tags: Signal & Audio Processing; Computational Biology; Computer-Aided Engineering

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Marriott Salon 2

S6589 - Algorithmic Trading Strategy Performance Improvement Using Deep Learning

Masahiko Todoriki Assistant Vice President, Mizuho Securities Co., Ltd.
Masahiko Todoriki is the project manager and lead developer of the AI platform project at Mizuho Securities Co., Ltd., in the Wholesale IT Strategy Department. He works as a quantitative analyst in the Sales Trading Department, providing feedback from research and analysis using the latest technologies. From 2009 to 2014, he worked on the development, team management, and execution performance analysis of algorithmic trading strategies at Mizuho. Prior to Mizuho, he ran a firm that consulted on and developed algorithmic trading strategies for FX and commodity futures. Masahiko majored in pure physics at Waseda University.

Learn how we improved stock price prediction accuracy in 30 minutes compared to two hours for instruments listed on the Tokyo Stock Exchange. We used GPGPUs to speed up data preprocessing and deep learning. As part of the training dataset, we used ticks (every single trade) and quotes (every single change in the order book) to detect micro changes in the market. As a result, we get consistently better accuracy than the historical probability.

Level: Intermediate
Type: Talk
Tags: Finance; Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Marriott Salon 1

S6791 - Black Hornet: The World's First Operational Nano UAV and the Way Ahead

Jon Lund Research and Development Software Engineer, Prox Dynamics AS
Jon Lund has been a research software engineer at Prox Dynamics AS since 2015. He has a B.S. in physics and an M.S. in neural systems and computation.

The Black Hornet Nano UAV from Prox Dynamics AS is about to become much more capable thanks to NVIDIA's Tegra K1. We'll describe the unique challenges related to Nano UAVs for professional and military use. The Black Hornet helicopter weighs less than 20 grams, yet it can fly for 25 minutes, one mile out, in wind and rain, while transmitting live video to the operator. To satisfy future demands, the next-generation Black Hornet will be even more capable. It will fly without GPS, outdoors and indoors, and it will have collision avoidance systems, mapping features, and recognition capabilities. The Tegra K1 will ensure that it lives up to expectations. While details of the next-generation system are highly confidential, some challenges and results so far will be discussed.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Embedded; Aerospace & Defense; Video & Image Processing

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room LL20D

S6795 - Camera-Based HMD for VR/AR Featuring Inside-Out Positional, Hand and Environment Tracking

Bertrand Nepveu CEO & Founder, Vrvana
Hardcore gamer since Donkey Kong on the Coleco Vision. Tried the PowerGlove on the NES and has been obsessed with VR ever since. An early adopter, he started Vrvana in 2005 to improve his gaming experience so that he can now be inside the games. Engineer and geek at heart, he gathered the best and brightest in Montreal to realize his longtime dream of creating the ultimate Mixed Reality headset.

We present a Head-Mounted Display (HMD) system to achieve both virtual and augmented reality. Positional, hand and environment tracking are achieved by inside-out approaches, meaning that all required tracking components are part of the HMD, without the need to set up and use external input components. The system uses computer vision methods and data fusion from multiple camera sensors to automatically perform the different tracking tasks. Low latency is achieved by implementing the computer vision analysis on a GPU in an external computer, and sharing GPU data between the tracking threads. Part of the processing is also done on an FPGA embedded in the HMD. Applications include immersive gaming, augmented presentations and environment scanning from active stereo.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room LL20C

S6812 - Deep Reinforcement Learning

Pieter Abbeel Professor, UC Berkeley
Pieter Abbeel (Associate Professor, UC Berkeley EECS) works in machine learning and robotics. In particular, his research focuses on making robots learn from people (apprenticeship learning) and through their own trial and error (reinforcement learning). His robots have learned advanced helicopter aerobatics, knot-tying, basic assembly, and organizing laundry. He has won various awards, including best paper awards at ICML and ICRA, the Sloan Fellowship, the Air Force Office of Scientific Research Young Investigator Program (AFOSR-YIP) award, the Office of Naval Research Young Investigator Program (ONR-YIP) award, the DARPA Young Faculty Award (DARPA-YFA), the National Science Foundation Faculty Early Career Development Program Award (NSF-CAREER), the Presidential Early Career Award for Scientists and Engineers (PECASE), the CRA-E Undergraduate Research Faculty Mentoring Award, the MIT TR35, the IEEE Robotics and Automation Society (RAS) Early Career Award, and the Dick Volz Best U.S. Ph.D. Thesis in Robotics and Automation Award.

Deep learning has enabled significant advances in supervised learning problems such as speech recognition and visual recognition. Reinforcement learning provides only a weaker supervisory signal, posing additional challenges in the form of temporal credit assignment and exploration. Nevertheless, deep reinforcement learning has already enabled learning to play Atari games from raw pixels (without access to the underlying game state) and learning certain types of visuomotor manipulation primitives. I will discuss major challenges for, as well as some preliminary promising results towards, making deep reinforcement learning applicable to real robotic problems.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Robotics & Autonomous Machines

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room 210H

S6123 - Effects of GPU, AAD and XVA on the Future Computing Architecture of Banks

Pierre Spatz Head of Quantitative Research, Murex
Pierre Spatz heads the quantitative analysis team of Murex, a world leader in trading and risk management software. He holds an M.S. in computer engineering and applied mathematics from ENSIMAG in Grenoble, France.

The 2008 crisis has tremendously changed the way we approach financial computing in banks. While the complexity and diversity of traded products have been reduced, volumes and regulatory computation needs have exploded while budgets have become tight, and we do not see any relief in the future. Several solutions, including GPUs (powerful parallel coprocessors), AAD (an algorithm), or both, have been implemented to cope with today's workload. All these methods imply at least a partial rewrite of the code. We will look back on our experience, see how well each solution fits different test cases with current or future hardware, and extrapolate what the future calculation servers of banks will look like.

Level: All
Type: Talk
Tags: Finance

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Marriott Salon 1

S6198 - The Latest in High Performance Desktops with VMware Horizon and NVIDIA GRID™ vGPU

Pat Lee Sr. Director, Remote Experience, VMware
Pat Lee is the Senior Director, Mobile Experience for VMware Desktop and Application products. The Mobile Experience team is responsible for 3D graphics, remote display protocols, remote device access, desktop clients, thin clients, web clients, and mobile clients. Since joining VMware in 2007, Pat has held multiple roles in product management and product marketing. Prior to VMware, Pat held multiple product management and marketing roles at Dantz Development and EMC. Pat earned a BA in Physics from the University of California, Berkeley.
Luke Wignall Manager, GRID Performance Engineering, NVIDIA
Highly-Rated Speaker
Luke came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke saw the ability to bring GPU to the end user experience as the missing "special sauce" that brings virtual desktops to the next level. Now managing the NVIDIA GRID Performance Engineering Lab, his focus is on performance and scalability to deliver the best value with the highest end user experience across all virtual workloads.

Hear about the latest advances in 3D desktops with VMware Horizon and NVIDIA GRID. Until now, delivering high performance graphics workstations remotely was cost prohibitive and complicated to set up and deliver. With VMware Horizon and NVIDIA GRID vGPU, there has never been a better time to deliver high performance 3D desktops in a cost effective manner that is simple to set up and deploy, bringing your customers the security, performance, reliability, and collaboration needed to transform their business.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Product & Building Design

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Marriott Salon 4

S6212 - Complex Application Proxy Implementation on the GPU Through Use of Kokkos and Legion

Geoff Womeldorff Scientist, Los Alamos National Laboratory
Geoff Womeldorff is a computational scientist with a background in mathematics and centroidal Voronoi tessellations. He has experience in numerical methods and parallel frameworks for multi-scale models for ocean, and algorithms for communication aggregation thereof. His interests also include the coupling between proxy applications, their hosts, and programming models, codesign interactions, and parallel frameworks and algorithms, in general.

We'll present research on the implementation, performance, and optimization of a complex application kernel, dim3_sweep of SNAP, a neutral particle transport proxy, in CUDA through the use of the Kokkos programming model. Examples will be given of kernel performance measurements and optimization techniques enabled through the use of Kokkos. In addition, we'll discuss efforts to couple the coarse-grained parallelism of SNAP, as implemented in Legion, a task-based programming model, and the fine-grained aspects, as implemented in Kokkos and CUDA, and how that coupling compares and contrasts to the native MPI+OpenMP of SNAP.

Level: Advanced
Type: Talk
Tags: Tools & Libraries; Computational Physics; Performance Optimization

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 211B

S6249 - How to Deal with Radiation: Evaluation and Mitigation of GPUs Soft-Errors

Paolo Rech Associate Professor, UFRGS
Paolo Rech is an associate professor at the Federal University of Rio Grande do Sul, Porto Alegre, RS, Brazil. Paolo received his M.S. and Ph.D. from Padova University, Padova, Italy, in 2006 and 2009, respectively. His studies included radiation tests and the effect of neutrons, protons, and alpha particles on programmable devices like FPGAs and systems on chip. He was a postdoc at LIRMM, Montpellier, France from 2010 to 2012, working on radiation effects on electronic devices at high altitudes. Recently, he started collaborations with NVIDIA, AMD, Northeastern University, and Los Alamos National Lab to evaluate and mitigate the radiation-induced effects in devices designed for large-scale HPC centers and in heterogeneous systems for automotive and aerospace markets.

We will disclose the basics of radiation-induced effects on GPUs and propose effective solutions to mitigate them. The session will start with an exhaustive description of the physical mechanisms by which ionizing particles generate failures. Then, taking advantage of data gathered in four years of GPU neutron beam tests, we evaluate GPUs' error rates in realistic applications and identify GPUs' weaker resources. Observed errors are also compared with Titan field data and automotive market reliability constraints. Additionally, mitigation strategies like ECC and software-based hardening solutions are analyzed and experimentally evaluated. Finally, we will advise on how to implement parallel algorithms and distribute threads in the most efficient and reliable way.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 211A

S6252 - Developing Software Architectures for Autonomous Driving Vehicles

Sebastian Ohl Senior Expert, Driver Assistance, Elektrobit
Sebastian Ohl is senior expert for driver assistance at Elektrobit (EB), where he is responsible for the development of software for autonomous driving. An instrumental member of EB’s engineering team, Ohl previously led the development of electronic control unit software for EB customer Volkswagen Group. Sebastian holds a Ph.D. in electrical engineering and a computer science degree from the Technical University at Braunschweig.

Modern vehicle functions like advanced driver assistance systems (ADAS) or even fully autonomous driving functions have a rapidly growing demand for high performance computing power. To fulfill fail-operational requirements of autonomous driving functions, the next generation of a vehicle infrastructure platform has to ensure the execution of safety critical functions with high reliability. In addition, the "always connected" feature needed for autonomous driving should be protected by powerful security mechanisms. We'll show how the requirements of ADAS can be fulfilled in an efficient way, on both system and software architecture levels, using the example of automated valet parking from Elektrobit.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room LL21E

S6272 - Deep Learning Algorithms for Recognizing the Features of Facial Ageing

Konstantin Kiselev Data Scientist, Youth Laboratories
Konstantin Kiselev conducts research in computer vision and deep learning at the scientific startup Youth Laboratories (co-founded by Dr. Alex Zhavoronkov) and holds the position of lead data scientist in the big data project for TechnoServ, a top 5 Russian IT company, and Beeline, a top 3 Russian mobile operator. Konstantin holds an M.S. in theoretical physics from Lomonosov Moscow State University. He has broad experience in software development of high-load systems and extensive knowledge of machine learning and big data. From 2014 to 2015, he was development lead for large IT systems for the Russian government at LANIT, a leading Russian software company. He received additional education in big data and machine learning, took first place in the Microsoft Machine Learning Hackathon (June 2015), and participated in the deep learning team competition organized by MIPT (deephack.me, mipt.ru/en/, July 2015).

We'll discuss DNN applications for determining the main facial skin biomarkers from a face photo. While many other factors make it possible to determine human age with high accuracy, the most obvious factor is how your face looks. Tracking face wrinkles enables us to track not only the skin ageing process as such, but also the results and efficiency of the treatment used. By following the dynamics of wrinkle appearance, it is possible to find out which treatment is more suitable for a particular face or skin type and hence provide recommendations.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 210H

S6363 - Algorithms for Auto-Tuning OpenACC Accelerated Kernels

Saber Feki Computational Scientist, King Abdullah University of Science and Technology
Saber Feki is a computational scientist at the KAUST Supercomputing Laboratory, where he contributed to the procurement of the Shaheen XC40 supercomputer. He received a Ph.D. in computer science from the University of Houston in 2010. He then joined the oil and gas company TOTAL as an HPC research scientist. His research interests include seismic imaging and computational electromagnetic applications using different programming models and automatic performance tuning of MPI communications and OpenACC accelerated applications on GPUs.

We'll present optimization techniques using different machine learning and derivative-free search algorithms, individually and in hybrid combinations, for auto-tuning parameters in OpenACC clauses for a stencil evaluation kernel executed on GPUs. We compare execution time performance of several auto-tuning techniques. These optimization algorithms will be evaluated over a large two-dimensional parameter space not satisfactorily addressed to date by OpenACC compilers, consisting of gang size and vector length. A hybrid of historic learning and Nelder-Mead delivers the best balance of high performance and low tuning effort.

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Programming Languages; Algorithms; OpenACC

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 212A

S6368 - Hardware Architecture Considerations for Building Efficient GPU Cluster for Accelerated CNN Training

Jianxiong YIN Research & Development Engineer, Nanyang Technological University
Jianxiong Yin is a data communication system researcher at Nanyang Technological University (NTU), Singapore, where he researches and optimizes system architecture for improved performance and efficiency. Jianxiong's work in the system architecture domain has been recognized by top-tier conferences and reputable supercomputing competitions. His work won the industry award for the Cloud3DView Project and Jianxiong's team was awarded the 2015 Data Center Dynamics Asia Pacific Award. Jianxiong is now responsible for the development of deep learning infrastructure in the ROSE lab at NTU, Singapore, and works jointly with the NVIDIA Technology Centre in Singapore on deep learning and HPC application development. Jianxiong received his M.S. from Yonsei University, South Korea, in 2012, and a B.S. from South China University of Technology in 2009.
Pradeep Gupta Senior Solutions Architect, NVIDIA
Pradeep Gupta is a lead deep learning solutions architect at NVIDIA, where he supports customers and developers across the Asia Pacific, Japan, and India regions for deep learning and HPC application development. Pradeep also works to enable the GPU computing ecosystem in universities and research labs across the region. Pradeep is responsible for running and managing R&D projects at the NVIDIA Technology Centre in Singapore. He is working on smart cities enablement with the GPU computing initiative at NVIDIA. Before joining NVIDIA, he worked with various technologies in high performance computing domains. Pradeep received an M.S. in research from the Indian Institute of Science (IISc), Bangalore. His research focused on developing compute-efficient algorithms. He has numerous publications in IEEE, SPIE, and other reputed conferences.

Learn how differences in the hardware architecture of the training infrastructure affect the CNN training process, what the design principles are for building an efficient CNN training cluster, which key metrics you should be watching, and what the reference architecture is and how it has evolved from traditional IT server architecture to HPC architecture.

Level: Intermediate
Type: Talk
Tags: Data Center & Cloud Computing; Deep Learning & Artificial Intelligence; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Room 210E

S6451 - Local Statistical Filtering through Concurrent Domain Dissection for Medical Imaging

Nikos Pitsianis Assistant Professor, Aristotle University of Thessaloniki, Greece
Nikos Pitsianis is an assistant professor at the Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Greece, and an adjunct professor with the Department of Computer Science, Duke University, Durham, North Carolina. His research interests include high-performance algorithms and architectures for signal and image processing. He holds a Ph.D. in Computer Science from Cornell University.

We'll present a new parallel scheme for local statistical filtering (LSF), which is indispensable to high-fidelity medical image analysis but remains difficult to implement efficiently due to range-value dependencies and irregular data accesses. The new scheme maintains a high degree of concurrency and makes efficient use of advanced GPU/CUDA features. Experimental results are presented with 4D-CT images and associated deformation fields.

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Algorithms

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 212B

S6461 - Advancements in a GPU Monte Carlo Simulator for Radiotherapy

Nick Henderson Research Associate, Stanford University
Highly-Rated Speaker
Nick Henderson is a Research Associate and Instructor at Stanford University. His primary affiliation is with Stanford's Institute for Computational and Mathematical Engineering.

We'll describe several advancements in our efforts to build a high-performance GPU Monte Carlo simulator for radiotherapy. The central idea is an algorithm that reduces thread divergence and run-time memory requirements compared to previous methods. The method presented also enables extensions to other applications, such as the nanoscale interaction of DNA and ionizing radiation, which require a larger number of physics models and particle types. Details of the performance analysis may also be applied to other Monte Carlo methods that rely on process selection as a part of the simulation.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Performance Optimization; Medical Imaging

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Marriott Salon 6

S6618 - Trinity: A Novel Visualization and Data Distribution System

Jens Krüger Professor, CoViDAG, University Duisburg-Essen
Since 2013, Jens Krüger has been chair of the high performance computing group at the University of Duisburg-Essen. He also holds an adjunct professorship from the University of Utah and is a principal investigator of multiple projects in the Intel Visual Computing Institute at Saarland University. Jens studied computer science at the Rheinisch-Westfälische Technische Hochschule Aachen, where he received his diploma in 2002. In 2006, he finished his Ph.D. at the Technische Universität München and, after postdoc positions in Munich and at the Scientific Computing and Imaging Institute, he became a research assistant professor at the University of Utah. In 2009, he joined the Cluster of Excellence Multimodal Computing and Interaction at Saarland University to head the Interactive Visualization and Data Analysis group.
Andrey Krekhov Head of HCI Department, CoViDAG, University Duisburg-Essen
Andrey Krekhov is employed at the HPC Group in Duisburg-Essen and heads the Human-Computer Interaction department of the "Center of Visual Data Analysis and Computer Graphics - CoViDAG". Andrey received his B.S. and M.S., with honors, in computer science from the Saarland University in 2011 and 2012, respectively.

Scalability matters, today more than ever, if we consider all the available computation resources, GPU farms, and cloud solutions. Designing a highly adaptive and still user- and developer-friendly visualization system requires us to rethink existing visualization pipelines and to develop with scalability in mind. This session gives you insight into our novel "Trinity" system that separates frontends, processing and data nodes, interconnected by a simple, easy-to-use API. Browse data sets in the cloud, render them on the NVIDIA GPU cluster, and display the result on your phone or on a display wall -- the plethora of application scenarios takes visualization to a whole new level.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Room LL21D

S6623 - Advances in NAMD GPU Performance

Antti-Pekka Hynninen Computational Scientist, Oak Ridge National Laboratory
Antti-Pekka is a computational scientist in biophysics at Oak Ridge National Laboratory (ORNL), where he focuses on the software development and INCITE user support of the NAMD biomolecular modeling application. In particular, Antti-Pekka is interested in using GPUs to their fullest potential to enable fast and scalable molecular dynamics. Prior to joining ORNL in 2014, Antti-Pekka worked at the National Renewable Energy Laboratory, where he rewrote much of the CHARMM molecular dynamics engine to be faster, more parallel, and to support GPU acceleration. Antti-Pekka holds a Ph.D. in physics from Utrecht University and he did his postdoctoral research at Princeton University on Monte Carlo simulations of charged colloids.

Learn about recent performance improvements in the GPU acceleration of the NAMD biomolecular modeling application. These improvements include performance gains in the non-bonded CUDA kernels and a new GPU-only implementation of the Particle Mesh Ewald (PME) reciprocal computation. We will describe in detail the changes made in the non-bonded CUDA kernels that give 1.4 to 1.7 times better performance compared to the previous version. We will also describe the new PME reciprocal code, which enables computation on multiple GPUs and is 1.4 to 1.8 times faster than the previous code.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry; Performance Optimization; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Marriott Salon 5

S6635 - Portable Performance for Monte Carlo Simulations of Photon Migration in 3D Turbid Media for Single and Multiple GPUs

Leiming Yu PhDc Computer Eng., Northeastern University
Leiming Yu is a Ph.D. candidate in computer engineering at Northeastern University. He belongs to the Northeastern University Research Group (NUCAR) under supervision of Dr. David Kaeli. He has been involved in general purpose computing on GPUs, performance optimization and modeling, and high performance computing. His work has been published in different venues, such as International Workshop on OpenCL 2015, ALLDATA 2015, Proceedings of Workshop on General Purpose Processing Using GPUs, ACM 2015, Boston Area Architecture Workshop 2015, and ICPE 2015.
Fanny Nina Paravecino Ph.D. Computer Engineer, Northeastern University
Fanny Nina Paravecino is a Ph.D. candidate in computer engineering at Northeastern University. She belongs to the Northeastern University Research Group (NUCAR) under supervision of Dr. David Kaeli. She received her B.S. summa cum laude in computer engineering from University of San Antonio Abad of Cusco in Peru in 2005. She received an M.S. in computer engineering from University of Puerto Rico at Mayaguez in 2011. She achieved the best grade for her undergrad thesis titled "Virtual framework to simulate Industrial Robot" using OpenGL 3D graphics with C#. Her research interests focus on high-performance optimization of image processing algorithms on parallel architecture. She has been published in different venues such as: IWOCL, ICPE, BARC, ICCVG, SPIE, and Gordon-CENSSIS, among others. She has also been highlighted in Woman & CUDA on the NVIDIA website.

We present a parallel Monte Carlo (MCX) algorithm accelerated by GPUs for modeling time-resolved photon migration in 3-D turbid media. We'll present optimizations that benefit execution on a single GPU as well as multiple GPUs. By leveraging persistent threads, our single-GPU implementation provides a high-performance parallel simulation of MCX when run on an NVIDIA GPU. Our implementation is automatically tuned to leverage persistent threads for different GPU architectures. We achieved improvements of over 25% for the Kepler architecture and 12% for the Maxwell architecture as compared to using a heuristic approach. In addition, we propose a linear programming approach based on predictive modeling to optimize MCX execution on multiple devices.

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; Rendering & Ray Tracing

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Marriott Salon 3

S6641 - HD GP-GPU Systems for HPC Applications

Sergio Tafur Physicist, Naval Research Laboratory
Sergio Tafur is a physicist working in the Computational Science Division of the Naval Research Laboratory, Code 5594. Sergio holds a Ph.D. in physics from the University of Central Florida and was inducted into his alma mater's Order of Pegasus in 2010. He served as an XSEDE Campus Champion from 2011 to 2013, and since 2004 has supported optimizing scientific and engineering computing workflows, implementing and administering supercomputing HPC/HTC-related systems, establishing research networks, as well as the parallelization of scientific computing algorithms and visualization efforts. Sergio helps NRL's computational science community resolve challenges in traditional and non-traditional supercomputing high performance and high throughput computing workflows by leveraging existing and emerging technologies such as MPI, CUDA, and Intel MIC computing environments.
Christopher Kung Computational Scientist, Engility
Christopher Kung is a computational scientist with Engility under the DoD’s User Productivity Enhancement, Technology Transfer, and Training (PETTT) initiative. He supports scientists and engineers from the Army, Navy, and Air Force with algorithm development, code porting, and specialized training to further their research and execute their mission with the use of HPC from the DoD Supercomputer Resource Centers. His background is in computational electromagnetics but he has interest in novel architectures for HPC applications and deep learning.

We'll present how we fielded a High Density (HD) GP-GPU system, currently ranked 227 on the Top 500, evaluated its performance, and overcame challenges that arose during testing phases. In addition, we will touch on using Python to code for and "glue" CPUs and GP-GPUs together in such HD GP-GPU systems.

Level: All
Type: Talk
Tags: Aerospace & Defense; Supercomputing & HPC; Algorithms

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Marriott Salon 2

S6732 - RealityCapture: A New Software for VR Content Creation

Michal Jancosek Managing Partner, Capturing Reality
Michal Jancosek is a managing partner and co-founder of Capturing Reality. He was a part of several EU research projects like COSPAL, DIRAC, ProVisG, and PRoViScout. His main research is in 3D reconstruction from images. Michal is the author of the non-commercial 3D reconstruction software CMPMVS. He has a Ph.D. from the Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague.
Stanislava Jancosek Consultant, Capturing Reality
Stanislava received her M.D. degree from the First Faculty of Medicine, Charles University, Prague in 2011.

Creating realistic 3D content for virtual reality worlds is time consuming and difficult using traditional methods. Our software allows our customers to drastically reduce the time and difficulty. GPU processing is one of the tools that allows our software to push the limits. We'll show results from our customers and describe the basic parts of our pipeline. We'll provide statistics with the main focus on GPUs.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Media & Entertainment

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room LL20C

S6759 - CAVE 2.0: The World's Largest Virtual Reality Cluster at PSA Peugeot Citroën

Matthieu Mika Virtual Reality Engineer, PSA Peugeot Citroën
Matthieu Mika has worked in PSA Peugeot's R&D department since 2009, starting as a Virtual Reality Engineer. Since 2013, he has been involved in the areas of Real Time Rendering for Immersive systems and the new CAVE project. He is a graduate of the Polytech University of Paris Sud Orsay with a master's degree in computer science and engineering.
Alain Gonzalez Expert Workstations Graphics Systems & 3D Imagery, PSA Peugeot Citroën
Alain Gonzalez has worked in PSA Peugeot's IT department since 2000, starting as a workstations IT architect. Since 2009, he has been involved in the areas of expert workstations graphics technologies and 3-D imagery. He is a graduate of the University of Paris Sud Orsay with a master's degree in computer science and engineering.
Benoit Bastien Workstation Sales Lead - France, Dell
Benoit Bastien has worked as a workstation expert for several OEMs since 2006. Currently leading the professional workstation business for Dell in France, he’s involved in multiple 3D related projects. He is a graduate of Clermont-Ferrand Business School with a master’s degree in IT management.

4K, stereoscopy, 53 million pixels, 400 TFlops, 56 Gbits high speed network, 10 tons of steel, 5 tons of glass, 2 miles of optical fiber, 3 miles of network cables and ...70 NVIDIA Quadro M6000 pro GPUs. This presentation will recap how PSA Peugeot Citroën managed a CAVE 2.0 implementation. The session will cover all aspects from end-user requirements to the final setup in order to meet all stakeholder needs: vehicle architecture, process, HMI, car interior and exterior styling, perceived quality.

Level: All
Type: Talk
Tags: Product & Building Design; Virtual Reality & Augmented Reality; Large Scale and Multi-Display Visualization; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Room LL21A

S6770 - GPU Image Processing on Giant Surfaces

Thomas Soetens Founder, Immersive Design Studios
Thomas Soetens graduated in 1992 with an MFA in Visual Arts from the St-Lucas School of Arts in Belgium. After practicing as a painter, he co-founded Workspace Unlimited in 2001 and founded Immersive Design Studios in 2007, where he currently acts as research and development director. Immersive Design Studios is an interdisciplinary design and technology company based in Montreal utilizing the potential of 3D game technology in entertainment, corporate events, architecture, cultural installations, and real-time collaborative environments. His work has been highlighted in numerous publications, such as the New York Times, La Presse, and Space Time Play (Birkhäuser 2007).

We'll discuss how we are bridging the transition from FPGA to GPU-based image processing with our proprietary software, CANVAS: a GPU image-processing platform designed for large-scale projections and immersive experiences of real-time game engine content and video.

Level: All
Type: Talk
Tags: Media & Entertainment; Large Scale and Multi-Display Visualization; Virtual Reality & Augmented Reality; Video & Image Processing

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room LL21C

S6773 - Sense Making in an IOT World: Sensor Data Analysis with Deep Learning (Presented by Hewlett Packard Enterprise)

Natalia Vassilieva Research Group Manager, Hewlett Packard Labs, Hewlett Packard Enterprise
Natalia Vassilieva is a Senior Research Manager in the Software and Analytics Lab at Hewlett Packard Labs. She leads research teams developing algorithms and applications for The Machine, a new type of computer architecture currently being developed by Hewlett Packard Enterprise. She joined HP Labs in 2007 as a research engineer, and later served as the head of HP Labs Russia from 2011 until 2015. From 2012 to 2015, Natalia also served as a part-time Associate Professor at St. Petersburg State University and a part-time lecturer at the Computer Science Center, St. Petersburg, Russia. Before joining HP, Natalia worked as a Software Engineer for different IT companies in Russia from 1999 until 2007. Natalia holds a Ph.D. in computer science from St. Petersburg State University.

Applications of deep learning in sensor data analysis have not been studied as extensively as in speech and vision. However, sensor data have properties similar to those of images and audio: multidimensional, with intrinsic dependencies and correlations in the data, and hard to analyze with conventional approaches. Our results prove that deep learning has better generalization capabilities compared to conventional methods on sensor data and has high potential in sensor data analytics. We also address scalability issues of the training process for models best suited for sensor data. The training of these models does not scale out beyond a certain number of nodes.

Level: All
Type: Talk
Tags: Big Data Analytics; Deep Learning & Artificial Intelligence; IoT

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Room 210G

S6793 - Designing a Wearable Personal Assistant for the Blind: The Power of Embedded GPUs

Saverio Murgia CEO, Horus Technology
Founder and CEO of Horus Technology, Saverio Murgia is passionate about machine learning, computer vision, and robotics. Both engineer and entrepreneur, in 2015 he obtained a double MSc/MEng in Advanced Robotics from the Ecole Centrale de Nantes and the University of Genoa. He also holds a degree in management from ISICT and a BSc in Biomedical Engineering from the University of Genoa. Before founding Horus Technology, Saverio was a visiting researcher at EPFL and the Italian Institute of Technology.

With the introduction of embedded platforms featuring GPUs with advanced GPGPU capability, it is now possible to design systems and products that extract and process, in real time, an amount of information not imaginable in the past. One example of what mobile GPGPU computing makes possible is a wearable device that uses deep learning and other computationally heavy techniques from computer vision and machine learning to describe the world to blind and visually impaired people. Horus is a wearable personal assistant for blind and visually impaired people that, thanks to its stereo camera and sensor suite, can detect obstacles, describe pictures and scenes, identify objects and people, and read text. All the processing is done locally in real time.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Embedded; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 210F

S6804 - Affordable Persuasive Autonomous Light Electric Vehicles for Moving People and Goods

Sertac Karaman Assistant Professor of Aeronautics and Astronautics, MIT
Sertac Karaman is the Charles Stark Draper Assistant Professor of Aeronautics and Astronautics at the Massachusetts Institute of Technology (since Fall 2012). He obtained B.S. degrees in mechanical engineering and in computer engineering from the Istanbul Technical University, Turkey, in 2007; an S.M. degree in mechanical engineering from MIT in 2009; and a Ph.D. degree in electrical engineering and computer science, also from MIT, in 2012. His research interests lie in the broad areas of robotics and control theory. In particular, he studies the applications of probability theory, stochastic processes, stochastic geometry, formal methods, and optimization for the design and analysis of high-performance cyber-physical systems. The application areas of his research include driverless cars, unmanned aerial vehicles, distributed aerial surveillance systems, air traffic control, certification and verification of control systems software, and many others. He is the recipient of an Army Research Office Young Investigator Award in 2015, a National Science Foundation Faculty Career Development (CAREER) Award in 2014, an AIAA Wright Brothers Graduate Award in 2012, and an NVIDIA Fellowship in 2011.

Autonomous vehicles hold the potential to disrupt urban mobility and logistics. In particular, autonomy-enabled ride sharing systems can reduce transportation delays and emissions, while enhancing safety. We'll outline a number of projects focused on developing autonomy-capable vehicles, including the development of an autonomous persuasive tricycle for dense urban centers (in partnership with Taiwan and Andorra), the development of fully autonomous electric cars for ride sharing systems (in partnership with Singapore), and the development of cars with advanced safety features (as a part of MIT's collaboration with Toyota). We will focus on the first project, in which the MIT Media Lab, the Laboratory for Information and Decision Systems, and the Center for Logistics and Transportation partnered to evaluate current technology capabilities on a low-cost, multifunctional, lightweight electric autonomous vehicle. This vehicle is intended to give the user a glimpse into future shared-use autonomous vehicles in urban environments that serve both people and goods. We will also outline the use of GPU-based computing technologies that are among the key enablers for this system.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room LL20D

S6805 - TensorFlow: Scaling Up Machine Learning

Rajat Monga Technical Lead Manager, Google
Rajat Monga works on the Google Brain Team, where he is the Technical Lead and Manager for TensorFlow, an open source machine learning library and the center of Google's efforts at scaling up deep learning. Prior to Google, as the Chief Architect and Director of Engineering at Attributor, Rajat led labs and operations. He also hired the founding engineering team at Attributor to build it out. As a veteran developer, Rajat has worked at eBay, Infosys, and a number of startups.

Deep learning is driving significant advances in what computers can achieve; this talk describes Google's efforts at scaling it up. The scaling is happening in two directions: better software that can leverage the power of many fast processors to make advances in machine learning, and making machine learning part of every product we use to make it smarter. We'll talk about TensorFlow, the platform behind our efforts at Google, and how as an open source project it brings this same power to everyone.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Data Center & Cloud Computing; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 10:00 - 10:50
Location: Grand Ballroom

S6844 - Designing Surface Materials with GPU Ray Tracing

Jean-Daniel Nahmias Technical Director, Pixar
Jean-Daniel Nahmias received his B.Sc., M.Sc., and Ph.D. from University College London, specializing in virtual reality, augmented reality, and computer vision. Before joining Pixar he spent most of his time optimizing algorithms to run quickly on GPUs. This included limited angle tomography reconstruction for breast cancer screening and stereo vision reconstruction. He joined Pixar as a global tech TD to work on productions and is currently developing real-time lighting technologies.

We demonstrate how our film artists can create the look and visual style of complex materials interactively using GPU ray tracing. In order to produce a rich and compelling surface appearance, our artists use mathematical functions or images encapsulated in nodes of a shader network. Using NVIDIA's OptiX toolkit we are experimenting with a GPU accelerated interactive physically-based path tracer that enables our artists to create and edit these shader networks, while emulating the subtleties that would traditionally only be visible in a final frame render.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Real-Time Graphics; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room LL21B

S6175 - Scientific Simulations on Thousands of GPUs with Performance Portability

Alan Gray Research Architect, EPCC, The University of Edinburgh
Alan Gray was awarded the status of NVIDIA CUDA Fellow in 2014. His research career began in the area of theoretical physics: his Ph.D. thesis was awarded the UK-wide Ogden Prize in 2004 for the best thesis in particle physics phenomenology. He continued this work under a university fellowship at The Ohio State University, before moving to EPCC in 2005. His current research focuses on the exploitation of GPUs to the benefit of real scientific and industrial applications: he has a particular interest in the programming of large-scale, GPU-accelerated supercomputers. Alan leads EPCC's GPU-related activities and is involved in management, teaching, and supervision for the EPCC M.S. in high performance computing. Since 2003, he has authored more than 40 publications, many of them in refereed journals, which have received over 1,500 citations.

"Developing your application for GPUs destroys portability to other platforms." We'll debunk this and other myths as we describe how we have solved the performance-portability challenge, allowing two separate scientific applications (which simulate complex fluids and fundamental particle physics, respectively) to effectively utilize machines such as the world's largest GPU-accelerated supercomputer, Titan at Oak Ridge, while remaining completely portable to multi-core or many-core CPU-based systems when GPUs are unavailable. The key ingredient is a new, simple abstraction layer called targetDP, which targets data parallel hardware in a platform-agnostic but performance-portable manner.

Level: Beginner
Type: Talk
Tags: Supercomputing & HPC; Computational Physics

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 211A

S6261 - VMD+NVIDIA OptiX™: Streaming Interactive Ray Tracing from Remote GPU Clusters to Your VR Headset

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone is a senior research programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and associate director of the NVIDIA CUDA Center of Excellence at the University of Illinois. John is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. John was named an NVIDIA CUDA Fellow in 2010. In 2015, he joined the Khronos Group Advisory Panel for the Vulkan graphics API. John also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

Commodity head-mounted displays (HMDs) offer a tremendous opportunity to make immersive molecular visualization techniques broadly available. HMDs offer the promise of intuitive exploration of large molecular complexes and their dynamics, but their requirement for low-latency, high-frame-rate display presents a formidable challenge for high-quality remote ray tracing at distant HPC centers. We'll present a new, interactive ray-tracing system for remote visualization with HMDs, implemented within the popular molecular visualization tool VMD using a combination of interactive OptiX ray tracing, omnidirectional stereoscopic projection, H.264 video streaming, and high performance OpenGL rasterization.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Rendering & Ray Tracing; In-Situ and Scientific Visualization; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL20C

S6305 - 3D Point Cloud Registration Using GPU-Accelerated Expectation Maximization

Benjamin Eckart Ph.D. Student, Carnegie Mellon University
Benjamin Eckart is a Ph.D. candidate with the Robotics Institute at Carnegie Mellon University and an NVIDIA Graduate Fellow. Ben's research focuses on the creation of parallel algorithms for 3D robotic perception. He is currently exploring ways to use many-core architectures such as the GPU to rapidly create compact models to facilitate and unify common low-level perceptive tasks like segmentation, registration, and classification. Ben holds an M.S. in robotics from Carnegie Mellon University, an M.S. in electrical engineering from Tennessee Tech University, as well as a B.S. in computer science and a B.S. in computer engineering.

We'll discuss how to use GPUs to accelerate a common 3D spatial processing application, point cloud registration. Registration, or finding the relative rigid transform between two point clouds, forms a core component of many 3D vision algorithms such as object matching and environment reconstruction. We use the GPU to accelerate this process using a parallelized form of the Expectation Maximization (EM) algorithm. This novel EM construction can both accelerate registration and provide a natural geometric segmentation of the data, two processes that we show to be highly interrelated at the kernel level when deployed on a GPU. Finally, we discuss how GPU-accelerated registration can be used in the larger context of real-time 3D perception.

Level: Advanced
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; IoT

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL20D

S6320 - Opticks: Optical Photon Simulation for High Energy Physics with NVIDIA OptiX™

Simon Blyth Postdoctoral Fellow, National Taiwan University
Simon is a high energy physicist and software developer based at National Taiwan University, Taipei, and a member of the Daya Bay and JUNO Collaborations. Simon has a D.Phil. in particle physics from Oxford University. His interests focus on applying techniques from computer science within the high energy physics community. He is currently working on using GPU ray tracing to accelerate optical photon simulation within photomultiplier-based experiments.

Opticks is an open source project that brings NVIDIA OptiX ray tracing to existing Geant4 toolkit-based simulations. Advantages of separate optical photon simulation and the approaches developed to integrate it with the general Geant4 particle simulation are presented. Approaches to minimize overheads arising from the split are shown. Challenges included bringing complex CSG geometries with wavelength-dependent material and surface properties to the GPU. Techniques for visualisation of photon propagations with interactive time scrubbing and history selection using OpenGL/OptiX/Thrust interop and geometry shaders are described. Results and demonstrations are shown for the photomultiplier-based Daya Bay and JUNO neutrino detectors.

Level: All
Type: Talk
Tags: Computational Physics; In-Situ and Scientific Visualization; Rendering & Ray Tracing

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Marriott Salon 6

S6350 - State of the Art Real-Time Graphics for Events, Broadcast and Interactive Content

Erik Beaumont COO, Ventuz Technology AG
Highly-Rated Speaker
Former senior product specialist at Softimage and technical director at Animoto, Erik Beaumont joined Ventuz Technology in 2013 and became its chief operating officer in 2014.

While game engines are advancing at an incredible rate in terms of capabilities and quality, adoption of these advancements in non-games or visualization markets has been slow. We'll look at some of the ways we seek to change that and bring cutting-edge graphics techniques and capabilities to these markets. We'll talk about some of the barriers (such as making these technologies approachable for non-expert designers and artists), some of the advantages, and some of the disadvantages. We'll also explore what direction we think these technologies will go.

Level: Beginner
Type: Talk
Tags: Media & Entertainment; Large Scale and Multi-Display Visualization; Real-Time Graphics; Product & Building Design

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL21C

S6380 - Simulating a Quantum Annealer with GPU-Based Monte Carlo Algorithms

James King Senior Algorithms Researcher, D-Wave Systems
James King is a senior algorithms researcher at D-Wave Systems, where he works as part of the Benchmarking team evaluating D-Wave's quantum annealing processors. James was born and raised in Vancouver, Canada, and studied computer science at Waterloo (B.S.), UBC (M.S.), and McGill University (Ph.D.). His dissertation focused on computational geometry and probabilistic analysis of random data structures. During his postdoctoral research at the University of Oxford, he studied energy landscapes of discrete optimization problems.

Learn how the world's most powerful quantum computers are simulated and benchmarked using GPU-based Monte Carlo algorithms. We'll introduce D-Wave's quantum annealing platform, describe several Monte Carlo algorithms for their simulation, and compare CPU- and GPU-based implementations of these algorithms. In particular, we'll focus on considerations of memory layout and fast mathematical functions to maximize speed. Finally, we'll present benchmarking results, including CPU-based algorithms, GPU-based algorithms, and D-Wave's latest-generation quantum annealers.

Level: Beginner
Type: Talk
Tags: Algorithms; Computational Physics; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Marriott Salon 3

S6431 - Advanced Thrust Programming with Policies

Steven Dalton Research Scientist, NVIDIA
Steven Dalton joined NVIDIA Research in July 2014. He completed his Ph.D. in computer science at UIUC, where his research focused on mapping irregular operations on sparse matrices related to algebraic multigrid (AMG) methods to GPU architectures. He is the primary contributor to the Cusp sparse linear algebra research library. He holds two Bachelor of Science degrees from Georgia Institute of Technology in the areas of Physics and Computer Science.

We'll discuss advanced Thrust design patterns that facilitate the construction of complex high-performance libraries. The focus of our presentation will be the definition and use of execution policies as a means of influencing the performance and execution of Thrust routines. As part of the technical specification on parallelism in the C++17 proposal, execution policies will be an important feature for effectively designing high-performance parallel applications in the future. This means it's imperative that developers start the process of understanding and experimenting with the execution-policy design pattern today.

Level: Advanced
Type: Talk
Tags: Tools & Libraries; Algorithms; Performance Optimization

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 211B

S6434 - Missile Defense Radar through Real-Time Electromagnetic Simulation Injection

Ted Selig Director and COO, FishEye Software, Inc.
Ted Selig is focused on technology exploiting the flow and analysis of real-time system data. Ted is director and COO of FishEye Software, a supplier of real-time software and real-time software development, integration, and test for some of the world's most complex civil and national defense systems. His commercial and government experience includes real-time systems, railway information, computer networks, air traffic control, and phased-array radar systems. He holds a B.S. in electrical engineering from the University of Massachusetts and an M.S. in computer systems from Northeastern University. Ted is an inventor on two U.S. patents.

Radars are electromagnetic sensors that encode transmit signals, focus beams, extract targets from noise, and perceive targets and environments. These real-time systems are expensive and risky to build and operate because they are complex, real-time, and difficult to test. The evolution of the GPU has the potential to disrupt this sensor industry by dramatically reducing the cost of radars, accelerating innovation, and reducing sensor maintenance. The presentation will discuss the processing techniques and data flow architecture required by these sensors. The discussion explores how GPU adoption can not only reduce the development costs and risks of sensor development for missile defense but also enable low-cost applications like the self-driving car, weather sensing, and air traffic management.

Level: All
Type: Talk
Tags: Aerospace & Defense; Embedded; Signal & Audio Processing

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Marriott Salon 2

S6449 - Sustainability and Performance through Kokkos: A Case Study with LAMMPS

Christian Trott Senior Member of Technical Staff, Sandia National Laboratories
Christian Trott is a high performance computing expert with experience in designing and implementing software for GPU and MIC compute clusters. He earned a Ph.D. in theoretical physics from the University of Technology Ilmenau. Christian's prior scientific work focused on computational material research using ab initio calculations, molecular dynamics simulations, and Monte Carlo methods. As of 2015, Christian is a senior member of technical staff at the Sandia National Laboratories. He is a core developer of the Kokkos programming model with a large role in advising applications on adopting Kokkos to achieve performance portability for next-generation supercomputers.

Learn about strategies to keep codes maintainable and performant in a diverse high performance computing environment. Using the example of LAMMPS, we will demonstrate how the use of Kokkos can reduce code redundancy compared to reimplementing capabilities in hardware-specific variants, while delivering similar performance. We will show how new features supported by Kokkos are closing some of the remaining gaps to the native models, with a particular focus on overlapping hybrid execution on GPU and CPU. You will also learn how the Kokkos model provides built-in instrumentation for an application, which supports kernel-based analysis of applications across diverse architectures. Performance data will be shown for Intel Haswell, ARM, and OpenPower-based systems, with and without GPUs.

Level: Beginner
Type: Talk
Tags: Performance Optimization; Tools & Libraries; Programming Languages

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 212A

S6505 - Electron Dynamics on Graphics Processing Units

Xavier Andrade Postdoctoral Researcher, Lawrence Livermore National Laboratory
Xavier Andrade is a postdoctoral researcher with the Quantum Simulation Group at Lawrence Livermore National Laboratory. Xavier obtained a Ph.D. in physics from the University of the Basque Country, and worked as a postdoc at the Department of Chemistry and Chemical Biology at Harvard University. His research is focused on developing new theoretical models and algorithms for the computational simulation of electrons in materials. Xavier is one of the main developers of Octopus, a scientific code used by hundreds of researchers around the world to simulate materials. Currently, his research efforts are dedicated to the application of real-time electron dynamics to model conductivity and other properties of matter under extreme conditions.

Learn how scientists simulate the movement of electrons inside materials, and how GPUs can be used to accelerate these simulations. The dynamics of electrons give rise to important phenomena in materials; for example, they determine how materials interact with light and how they conduct heat or electricity. The simulation of electrons, however, is a challenging task, as their behavior is governed by quantum mechanics. So, we need to represent electrons as "clouds" and model how these clouds evolve in time. Fortunately, those simulations have great potential for parallelization and are ideal for GPUs. Our current efforts are focused on using GPU clusters for large-scale electron dynamics that will allow us to perform simulations of unprecedented predictive power.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry; Computational Physics; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Marriott Salon 5

S6723 - Which Whale Is It, Anyway? Face Recognition for Right Whales Using Deep Learning

Robert Bogucki Chief Science Officer, deepsense.io
Robert Bogucki is Chief Science Officer at deepsense.io, where he currently manages the R&D team and focuses on deep learning. He is also a successful Kaggle competitor. When tackling real-life problems, he particularly enjoys leveraging algorithms and computational power instead of, or in addition to, domain knowledge. His motivation to work in the IT industry is to take theoretical ideas and concepts and put them to good use.

With fewer than 500 North Atlantic right whales left in the world's oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction. To interest the data science community, NOAA Fisheries organized a competition hosted on Kaggle.com. The challenge was to automate the right whale recognition process, currently a painstaking, lengthy, manual process, using a dataset of aerial photographs of individual whales. In this session, I will outline the winning solution. It is based on deep learning and convolutional neural networks.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 210H

S6742 - Real-Time Person Tracking on Jetson with OpenPTrack

Jeff Burke Assistant Dean, Technology and Innovation, UCLA School of Theater, Film and Television
Jeff Burke is Assistant Dean for Technology and Innovation at the UCLA School of Theater, Film and Television (UCLA TFT). He has produced, managed, programmed and designed experimental performances, short films, new genre art installations and new facility construction internationally for more than 15 years. Jeff has been a faculty member since 2001 and today, in addition to his role developing technology and innovation strategy at TFT, is Co-PI and application team lead for the Named Data Networking project, a multi-campus effort supported by the National Science Foundation (NSF) and an international 25-member consortium to develop a future Internet architecture. In 2004, Burke co-founded UCLA TFT's Center for Research in Engineering, Media and Performance (REMAP), a collaboration with the Henry Samueli School of Engineering and Applied Science, which combines research, artistic production and community engagement. At REMAP, Burke's research has been supported by the NSF and NEA, Intel, Cisco, Trust for Mutual Understanding and the MacArthur Foundation, among others. From 2006-2012, he was area lead for participatory sensing at the NSF Center for Embedded Networked Sensing, helping to define a new application arena for mobile devices. In 2014, Jeff received a three-year Google Focused Award on the "Future of Storytelling," for work that will explore the intersection of storytelling and coding through research and production of original, interdisciplinary digital media works at UCLA TFT.
Matteo Munaro Ph.D. Researcher, University of Padova
Matteo Munaro is a post-doc researcher at the IAS-Lab of the University of Padova and a Scientist with Open Perception. His research interests are in the field of people detection, tracking and re-identification with color cameras and RGB-D sensors, multi-camera calibration, sensor fusion and robotic vision.

We'll provide an overview of OpenPTrack, a GPU-enabled, open-source project that enables real-time position tracking of many people using networked 3D imagers, which is now available for the Jetson TK1/TX1 embedded platform. OpenPTrack specifically targets innovative applications in education, arts, and culture, where it aims to meet a need for real-time person tracking that is reliably scalable over large areas, realistically deployable, and low cost. We'll cover the basic technical approach, UCLA REMAP's experience from real-world multi-imager deployments, and the technology roadmap, using Jetson, that aims to bring occlusion-resistant, real-time person tracking into the mainstream of interactive design and experimentation. You'll see this demo live at the GTC 2016 Party on Wednesday, April 6 on the lower level of the Tech Museum.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Embedded

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 210F

S6806 - GPU-Based Deep Learning in Cloud and Embedded Systems

Frederick Soo Chief Technology Officer, Nauto, Inc.
As chief technology officer and co-founder of Nauto, Inc., Frederick Soo has assembled a team of world-class computer vision and machine learning researchers and engineers and set them to build the core algorithms and hardware for Nauto's commercial products. Prior to joining Nauto, Fred studied the computational neurophysiology of the retina, receiving his Ph.D. in biophysics from Stanford University and completing post-doctoral fellowships at the University of Washington and Princeton University. His work experiences include working at McKinsey and Co., where he collaborated with Nauto co-founder Prof. Stefan Heck, and at Soo Embedded Systems, where he built products from the ground up.

We'll present how Nauto uses deep learning in its distributed, vehicle-based compute and sensor network, and our learnings to date. Topics will include the performance of deep learning algorithms for computer vision in embedded systems, strategies for distributing compute across networks of embedded systems and in the cloud, and collecting and labeling data to maximize the performance of the system. Nauto's system is a dual-camera, windshield-mounted dashcam with GPS, IMU, wireless/cellular connection, and a SoC capable of running small CNNs in real time.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Embedded; Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL21E

S6870 - Ray Tracing: Today, Tomorrow and Beyond

Jon Peddie President, Jon Peddie Research
Dr. Jon Peddie is one of the pioneers of the graphics industry, and formed Jon Peddie Research (JPR) to provide customer-intimate consulting and market forecasting services. Peddie lectures at numerous conferences on topics pertaining to graphics technology and the emerging trends in digital media technology. Recently named one of the most influential analysts, he regularly advises investors in the GLG network, is frequently quoted in trade and business publications, is a former president of Siggraph Pioneers, and is the author of several books, most recently The History of Visual Magic in Computers. Dr. Jon Peddie was recently honored by the CAD Society with a lifetime achievement award.

Ray tracing is to manufacturing what a storyboard is to film — the ability to visualize the product before it's built. Movies couldn't be made today with the quality they have without ray tracing. The market for ray tracing is entering into a new phase which will be discussed in this talk. Jon will review market size, market composition and ways the industry is reducing rendering time and computational loads.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL21B

ECS6101 - China In Focus - Opening

Kitty Fok Managing Director, IDC China
Kitty Fok is managing director for IDC China, leading a team of over 50 specialized analysts and overseeing all research processes for the Greater China region. Fluent in English and Mandarin, Fok is a frequently featured speaker at various industry events and she has been invited to appear on BBC, Bloomberg, CNN, and CNBC. Her opinions have also been published in publications such as the Asian Wall Street Journal, South China Morning Post, ComputerWorld, and China Daily. Fok has a Higher Diploma in mathematics, statistics, and computing from the Hong Kong Polytechnic University, and a Master's of Science in management science (operational research) from Lancaster University, UK.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 10:40 - 11:10
Location: Room 220B

ECS6102 - China In Focus: Show & Tell - ANTVR

Qin Zheng CEO & Founder, ANTVR
Zheng Qin founded ANTVR in early 2014, when he left his doctoral program at the China Academy of Space Technology, known as the NASA of China. An avid science fiction fan, Qin has produced several sci-fi stories and movies. He holds more than 50 patents. Fast Company magazine named him one of the 100 “Most Creative People in Business in China” in 2014.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 11:15 - 11:30
Location: Room 220B

ECS6213 - China In Focus: Show & Tell - SenseTime

Li Xu CEO, SenseTime
Li Xu has more than 10 years of R&D experience in computer vision and pattern recognition, and has published more than 40 papers at top conferences and in journals. He has served as a reviewer for various prestigious journals in the computer vision field, and received a best reviewer award at the International Conference on Computer Vision in 2015. Three of his papers have been implemented in the latest version of OpenCV. He holds a B.S. and Ph.D. in computer science from Shanghai Jiaotong University and Chinese University of Hong Kong, respectively.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 11:30 - 11:45
Location: Room 220B

ECS6214 - China In Focus: Show & Tell - Yuanqu Tech

Wu YiJian CEO, Yuanqu Tech
YiJian Wu founded Yuanqu Tech in 2013, following his work at iFlytek, Nagoya Institute of Technology, Microsoft Research Asia and Shanda Innovation of Speech. He has more than 15 years of experience in speech research and development, and has published more than 40 papers. He holds several domestic and international patents. He graduated from the Special Class of Gifted Young of the University of Science and Technology of China in 2001, and received his Ph.D. in 2006.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 11:45 - 12:00
Location: Room 220B

S6374 - Gunrock: A Fast and Programmable Multi-GPU Graph Processing Library

Yangzihao Wang Graduate Student Researcher, University of California Davis
Yangzihao Wang is a Ph.D. candidate at UC Davis, supervised by Professor John D. Owens, researching GPU graph processing. He felt the pain of coding and optimizing individual graph algorithms on the GPU and wanted a unified framework with both high performance and easy programmability, which became the Gunrock library.
Yuechao Pan Graduate Student Researcher, University of California Davis
Yuechao Pan is a Ph.D. student at UC Davis, also from Professor John D. Owens' group, focusing on multi-GPU graph processing. He designed and implemented the multi-GPU framework of Gunrock, which brought the performance and the flexibility of the library to a new level.

We present Gunrock, a multi-GPU graph processing library that enables easy graph algorithm implementation and extension onto multiple GPUs for scalable performance on large graphs with billions of edges. Attendees can learn how to 1) solve large-scale graph problems with high-performance GPU computing primitives and optimization strategies, using our high-level data-centric abstraction that focuses on vertex or edge frontier operations, and 2) utilize multi-GPU computing power by writing just a few algorithm-dependent blocks, using our multi-GPU framework that handles most multi-GPU implementation details and memory allocation. We will also share experience on the library's design and implementation that helps it achieve the best performance among programmable GPU graph libraries.
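
As a rough illustration of what "frontier operations" means, the sketch below runs a breadth-first search on the CPU by repeatedly expanding a vertex frontier. It is not Gunrock's API: the function name `bfs_frontier`, the adjacency-dictionary input, and the "advance"/"filter" labels in the comments are assumptions made for this example only.

```python
def bfs_frontier(adj, source):
    """Level-synchronous BFS written as repeated frontier operations.

    adj maps each vertex to a list of its out-neighbors (hypothetical input format).
    Returns a dict mapping every reachable vertex to its depth from source.
    """
    depth = {source: 0}
    frontier = [source]
    level = 0
    while frontier:
        level += 1
        # "Advance" step: follow every edge that leaves the current frontier.
        candidates = [v for u in frontier for v in adj.get(u, [])]
        # "Filter" step: keep only vertices not seen before, without duplicates.
        next_frontier = []
        for v in candidates:
            if v not in depth:
                depth[v] = level
                next_frontier.append(v)
        frontier = next_frontier
    return depth
```

On a GPU, the work inside each step would be spread across threads; here the two list passes simply mark where that parallelism would sit.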

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Tools & Libraries; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 13:00 - 13:25
Location: Room 210F

S6360 - Graph Analytics: Using GPU-Accelerated Sparse Linear Algebra Routines

Paul Fox Engineer, EM Photonics, Inc.
Highly-Rated Speaker
Paul Fox has seven years of experience in GPU and heterogeneous computing. Working at EM Photonics, he has contributed to the CULA GPU linear algebra library and the ATCOM image enhancement suite. His work has recently focused on programming methods for high-performance heterogeneous computing, particularly in scientific computing and signal processing application areas. He has an M.S. in electrical engineering from University of Delaware.

Large-scale graph analytics frameworks provide a convenient and highly scalable platform for developing algorithms to analyze large datasets. Although conceptually scalable, these techniques exhibit poor performance on modern computational hardware. We're developing an implementation of the high-level functions supported by these APIs in terms of linear algebra operations, which will be parallel on each pair of vertices connected by an edge. This technology can reduce the number of nodes required and map well to computational accelerators such as GPUs, thus enabling users to perform more complex analysis with less hardware at lower cost. We'll detail our latest work on this project, including challenges, specifics of our approach, and preliminary results.
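
To make the idea of expressing a graph operation as linear algebra concrete, here is a minimal sketch of a sparse matrix-vector product over an edge list, where each edge contributes one independent multiply-add. The function name `spmv_edges` and the `(src, dst, weight)` edge format are assumptions for illustration, not the speaker's implementation.

```python
def spmv_edges(edges, x, n):
    """Compute y = A^T x for a graph stored as (src, dst, weight) edges.

    Each edge does one multiply-add, so the loop body is the unit of
    per-edge parallelism (a GPU version would need atomic adds on y).
    """
    y = [0.0] * n
    for src, dst, w in edges:
        y[dst] += w * x[src]
    return y
```

Starting from an indicator vector for one vertex, each call pushes values one hop along the edges, which is the building block behind traversal-style and ranking-style analytics.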

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Algorithms

Day: Wednesday, 04/06
Time: 13:30 - 13:55
Location: Room 210F

ECS6103 - Early Stage Challenge Opening

Jeff Herbst Vice President, Business Development, NVIDIA
Jeff Herbst is vice president of business development at NVIDIA, with responsibility for mergers and acquisitions strategy, investments, partnerships, and other strategic business relationships and transactions. Prior to joining NVIDIA in 2001, Herbst was worldwide head of corporate and business development at AltaVista, and also served as general manager for a startup focused on content delivery infrastructure for wireless networks. Earlier in his career, he was a partner with the law firm Wilson Sonsini Goodrich and Rosati, where he specialized in corporate finance, joint ventures, mergers and acquisitions, and other strategic business and intellectual property-related transactions. Herbst holds a bachelor of science degree in computer science, with an emphasis in computer graphics, from Brown University and a J.D. from Stanford University.

Level: All
Type: Talk
Tags: Emerging Company Summit; Press-Suggested Sessions: General Interest

Day: Wednesday, 04/06
Time: 14:00 - 14:15
Location: Room 220B

S6166 - CAD Benchmarking of NVIDIA® Graphics on HP Laptops, Blades, Workstations and VMs

Brian Walrath Senior Multi-Disciplined Engineer, Raytheon Missile Systems
Brian Walrath is a senior multi-disciplined engineer at Raytheon Missile Systems in Tucson, AZ. Brian has a B.S. in Technology from Southwest Texas State.

Raytheon Missiles has done its CAD design work almost exclusively on NVIDIA hardware for the better part of the last decade, on Desktop, Laptop, Blade Workstation and now Virtual Machines. How do all these platforms stack up against each other under 3D CAD and Heavy Analysis and Simulation loads? Focusing on the last five years or so, Raytheon will share its benchmarking methodology, and data, and compare and contrast our 3D CAD experience with NVIDIA GPUs on HP hardware and VMs.

Level: All
Type: Talk
Tags: Graphics Virtualization; Aerospace & Defense; Product & Building Design; Computer-Aided Engineering; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Marriott Salon 4

S6192 - Brain-in-a-Box: A Unified Perception and Navigation Framework for Mobile Robots, Drones and Cars

Massimiliano Versace CEO, Neurala Inc.
Massimiliano Versace is the CEO of Neurala Inc. and founding director of the Neuromorphics Lab at Boston University. He is a pioneer in researching and bringing to market large-scale, deep learning neural models that allow robots to interact and learn in real time in complex environments. Max has authored approximately 40 journal articles, book chapters, and conference papers, holds several patents, and has been an invited speaker at dozens of academic and business meetings, research and national labs, and companies, including NASA, Los Alamos National Labs, Air Force Research Labs, HP, iRobot, Samsung, LG, Qualcomm, Ericsson, BAE Systems, Mitsubishi, ABB, and Accenture, among others. He is a Fulbright scholar and holds two Ph.D.s: experimental psychology, University of Trieste, Italy, and cognitive and neural systems, Boston University, USA. He obtained his B.S. from the University of Trieste, Italy.

Mobile robots, drones, and self-driving cars need advanced and coordinated capabilities in perception and mobility to co-exist with humans in complex environments. To date, the most effective "machines" built for these tasks come from biology. Max Versace, CEO of Neurala and director of the Boston University Neuromorphics Lab, will explain how mobile robots, drones, and cars can use GPUs coupled with relatively inexpensive sensors, available today in the sensor pack of a common smartphone, to sense and navigate their environment intelligently. The talk will illustrate the working "mini-brain" that can drive a ground robot to learn, map, and understand the layout of the environment and the objects in it, while avoiding collisions.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence; Self-Driving Cars & Automotive; IoT; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL20D

S6201 - A New Parallel Prefix Scan Algorithm for GPUs

Sepideh Maleki Graduate Research Assistant, Texas State
Sepideh Maleki is a Graduate Research Assistant in the Efficient Computing Laboratory at Texas State University, where she is currently pursuing her Master's degree in computer science. Her research interests include performance optimization, GPGPUs, and parallel programming. She is a student member of IEEE, SWE, and the ACM. She received a Graduate Excellence Award from the Department of Computer Science in 2015.

We present and evaluate a new technique for implementing parallel prefix scans. A number of GPU libraries include this important parallel primitive, but most of those implementations are based on a hierarchical algorithm. In contrast, our algorithm only requires a single stage. We implemented it in portable CUDA code using just one relatively short templated kernel. Our code outperforms prefix scans from popular libraries like CUDPP, MGPU, and THRUST on both Kepler and Maxwell devices. In many cases, it even outperforms the CUB library, which employs architecture-specific assembly code.
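
For context, the conventional hierarchical approach that the abstract contrasts against can be sketched as three phases: scan each chunk, scan the chunk totals, then add the resulting offsets back. The sketch below is a sequential Python illustration of that baseline only, not the single-stage algorithm presented in the talk; the function name and `chunk_size` parameter are assumptions.

```python
from itertools import accumulate

def hierarchical_inclusive_scan(values, chunk_size=4):
    """Inclusive prefix sum computed chunk by chunk (the hierarchical baseline)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Phase 1: scan every chunk independently (one thread block each on a GPU).
    local = [list(accumulate(chunk)) for chunk in chunks]
    # Phase 2: an exclusive scan of the chunk totals yields each chunk's offset.
    totals = [scanned[-1] for scanned in local]
    offsets = [0] + list(accumulate(totals))[:-1]
    # Phase 3: add each chunk's offset to all of its local results.
    return [x + off for scanned, off in zip(local, offsets) for x in scanned]
```

The three phases are what make this formulation multi-stage: results from phase 1 must be complete before phase 2 can start, and likewise for phase 3.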

Level: Advanced
Type: Talk
Tags: Performance Optimization

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 212A

S6202 - GPUCC: An Open-Source GPGPU Compiler

Jingyue Wu Software Engineer, Google Inc.
Jingyue Wu is a software engineer at Google and an active contributor to the LLVM compiler. He is one of the main contributors of gpucc, Google's open-source CUDA compiler. He completed his Ph.D. in computer science at Columbia University, where he worked with Professor Junfeng Yang on several projects related to software reliability and programming languages. This work has led to over 10 publications in top journals and conferences such as OSDI, SOSP, and PLDI.

We'll present gpucc, an LLVM-based, fully open-source, CUDA-compatible compiler for high performance computing. Its Clang-based front-end supports modern language features such as those in C++11 and C++14. It compiles faster than nvcc. It generates better code than nvcc on key end-to-end internal benchmarks and is on par with nvcc on a variety of open-source benchmarks.

Level: Advanced
Type: Talk
Tags: Tools & Libraries; Performance Optimization; Programming Languages

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 211B

S6225 - Efficient Utilization of Large-Scale Heterogeneous Systems Using the Uintah Computational Framework

Alan Humphrey Software Developer and Ph.D. Student, Scientific Computing and Imaging Institute, University of Utah
Alan Humphrey is a software developer at the Scientific Computing and Imaging Institute and also a Ph.D. student at the University of Utah, where he works with Dr. Martin Berzins on improving the performance and scalability of the Uintah Computational Framework. Alan has been primarily involved in extending Uintah to run on hybrid CPU/GPU systems with the development of Uintah's prototype CPU-GPU task scheduler and, most recently, Uintah's Unified multi-threaded heterogeneous task scheduler and runtime system that allows Uintah to dynamically dispatch computational tasks to both CPU cores and available GPUs on-node. Much of Alan's past research has been focused on formal verification of concurrent systems, specifically the Message Passing Interface (MPI) and dynamic verification tools like In-situ Partial Order (University of Utah) and its integration within the Eclipse Parallel Tools Platform (PTP). Alan was also involved with the Eclipse PTP project from 2009 to 2015.

We'll discuss how directed acyclic graph (DAG) approaches provide a powerful abstraction for solving challenging engineering problems and how, using this abstraction, computational frameworks such as Uintah can be extended with relative ease to efficiently leverage GPUs, even at scale. Attendees will learn how frameworks like Uintah are able to shield the application developer from the complexities of the deep memory hierarchies and multiple levels of parallelism found in heterogeneous supercomputers. Attendees will be shown how Uintah applications can be made to utilize thousands of GPUs within a single simulation, as shown by recent results for a GPU-based radiation model that achieves excellent strong scaling to 16,384 GPUs on DOE Titan.
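
The DAG abstraction can be illustrated with a small sketch that groups tasks into "waves" whose members have no unmet dependencies and could therefore be dispatched concurrently. This uses Python's standard `graphlib` module and is not Uintah's scheduler; the function name `schedule_in_waves` and the task names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter

def schedule_in_waves(dependencies):
    """Group DAG tasks into waves of mutually independent, ready-to-run tasks.

    dependencies maps each task name to the set of tasks it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every task whose inputs are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking successors
    return waves
```

For example, with `{"advect": {"load"}, "radiate": {"load"}, "output": {"advect", "radiate"}}`, the two middle tasks land in the same wave, which is where a runtime could choose to send one to a CPU core and the other to a GPU.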

Level: All
Type: Talk
Tags: Supercomputing & HPC; Computational Fluid Dynamics

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 211A

S6251 - Leveraging GPU Technology to Visualize Next-Generation Products and Ideas

Michael Wilken Director of 3D, Saatchi & Saatchi
Michael Wilken leads Saatchi and Saatchi LA's growing 3D capabilities. He built the agency's 3D capability from a single role into a team of more than 30 serving some of the world's largest brands. His team has successfully realized an industry-leading integration of 3D production ability with creative collaboration within an advertising agency.

While CAD real-time visualization solutions and 3D content creation software have been available for decades, practical workflow barriers have inhibited efficient integration into an agency's creative and production process. Using the latest in GPU technology from NVIDIA, Saatchi and Saatchi LA is pioneering the breaking of these barriers. 3D artists work with creative directors and clients to rapidly visualize ideas and products. Real-time visualization is integrated into the production workflow seamlessly, making rapid visualization both inspiring and cost-saving. We'll provide a top-level overview of how Saatchi is leveraging NVIDIA GPU technologies, including the NVIDIA VCA, to create powerful virtual creative collaborations.

Level: Intermediate
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing; Self-Driving Cars & Automotive; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21A

S6255 - High-Performance CUDA™ Clustering at Cloud Scale: GPUDirect RDMA over 40GBE IWARP

Tom Reu Consulting Application Engineer, Chelsio Communications, Inc.
Tom has had a long career in the computer industry. He started his career at a NJ-based mini-computer company, Concurrent Computer Corporation, and ended his tenure in product development with HP working in the High Performance Computing Division. After leaving HP, Tom transitioned to his current Field Application role with Chelsio Communications. Tom has a master's degree in Computer Science from Monmouth University and a bachelor's degree in Electrical Engineering from Villanova University.

Learn how to deploy GPU clustering at scale by integrating Chelsio's 40GbE iWARP (RDMA/TCP) into your GPU applications. GPUs have demonstrated paradigm-shifting performance in a wide variety of applications. But there remain network infrastructure challenges to the adoption of GPUs operating at scale, especially in large-scale cloud environments. We present 40GbE iWARP, which leverages existing Ethernet infrastructure and requires no new protocols, interoperability, or long maturity period, as the no-risk path for Ethernet-based, large-scale GPU clustering. The first part of the session is a technical overview of 40GbE iWARP, including best practices and tuning for GPU applications. The second part summarizes benchmark results showing the benefits of GPUDirect RDMA using 40GbE iWARP.

Level: Intermediate
Type: Talk
Tags: Data Center & Cloud Computing; Performance Optimization; Tools & Libraries

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Room 210E

S6267 - Data Analytics and Machine Learning at Your Finger Tips - No CUDA Required

Bryan Thompson Chief Scientist, Co-Founder, Blazegraph
Bryan Thompson is chief scientist at SYSTAP. He has 30+ years of experience as a technologist, inventor, and researcher in cloud computing and big data. He is the lead architect for BlazeGraph, an open source, distributed graph database used by Fortune 500 companies, including EMC, Autodesk, and Yahoo!. He leads SYSTAP's research team investigating GPU-accelerated distributed architectures for graph processing, which, together with the SCI Institute, in 2014 published results for executing Breadth-First Search on a cluster of 64 GPUs at up to 32 billion traversed edges per second.
James Lewis CUDA Researcher, Blazegraph
James Lewis is a CUDA researcher with SYSTAP. He is the lead developer for Blazegraph GPU. He wrote the initial version of the software that uses SpMV techniques to implement SPARQL query evaluation on the GPU. He was the lead CUDA developer for integrating Mapgraph technology with the Merlin Application to accelerate Electronic Warfare using GPU graph capabilities. In this role, James exposed the graph capabilities on the GPU via a Java Native Interface (JNI) to enable the integration without the application developer writing any CUDA, C++, or non-Java code. He studied at the University of Utah Scientific Computing Institute (SCI) where he received B.S. degrees in both Computer Science and Applied Mathematics as well as an M.S. in Computing. In his research work, James developed graph topological metrics to evaluate the performance of aggregation methods in the context of multigrid coarsening. He implemented parallel aggregation techniques for multigrid coarsening in C++ and CUDA.

Writing fast, efficient data analytics for graph and machine learning on GPUs can be hard due to the complexities of CUDA and achieving effective parallelism. DASL and SPARQL are high-level languages for graph and machine learning algorithms (DASL) and graph pattern matching (SPARQL) that provide speedups of up to 1,000x over Spark native and up to 300x over leading graph databases when executed on the BlazeGraph platform. These high-level languages are translated into task graphs that expose the available parallelism. The mapgraph runtime evaluates the task graphs and provides a scalable architecture on GPUs and GPU clusters. This presentation discusses the concepts for graph algorithms and queries, the mapgraph architecture, and how algorithms are evaluated on a GPU cluster.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Deep Learning & Artificial Intelligence; Aerospace & Defense

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 210F

S6300 - Big Lasers, Small Particles & GPUs: Our Weapons of Choice to Fight Cancer

Axel Huebl PhD Student, Helmholtz-Zentrum Dresden - Rossendorf
Axel Huebl is one of the main developers of the PIConGPU laser plasma simulation and one of the inventors of the OpenPMD metadata format for particle mesh data. He was part of the team that brought PIConGPU into the finals of the Gordon Bell award in 2013. Axel currently works on his master's thesis on laser-driven ion acceleration for cancer therapy and the interaction of X-Ray lasers with solid density plasmas.

We'll present results on our INCITE project "Targeting Cancer with High Power Lasers," which aims to deliver beams of ions for cancer therapy accelerated by high power lasers. With a novel target design in which the target is levitated in a trap to isolate it from its environment, we study the properties of the generated ion beams and their potential for radiation therapy of cancer. In the discussion, we'll also present performance results of our own plasma simulation code PIConGPU on the Titan system, which has been used to study the laser plasma interaction in 3D.

Level: All
Type: Talk
Tags: Computational Physics; Computational Chemistry; Supercomputing & HPC; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Marriott Salon 6

S6301 - Driver Face Analytics & Emotion Recognition Using Deep Learning

Modar Alaoui CEO, Eyeris
Modar Alaoui is a tech entrepreneur and expert in artificially intelligent vision technologies, deep learning, and ambient intelligence. Modar is founder and CEO at Eyeris, the worldwide leading deep learning-based emotion recognition software. The company's flagship product, EmoVu, reads facial micro-expressions in real time and uses convolutional neural networks as a deep learning architecture to train and deploy its algorithm into a myriad of today's commercial applications. Modar combines a decade of experience between human machine interaction and audience behavioral measurement. He is a frequent speaker and keynoter on "ambient intelligence" as the next frontier in AI, a winner of several technology and innovation awards, and has been featured in many major publications for his work.

We'll introduce you to ultra-lightweight vision software that reads facial micro-expressions in real time for use in driver monitoring systems in the next generation of vehicles. Using deep learning-based convolutional neural networks (CNNs) powered by GPUs, vision algorithms for embedded systems can now allow vehicles to constantly monitor drivers' inattention, cognitive awareness, and emotional distraction through a number of face analytics and emotion recognition technologies in a 30th of a second. We'll also reveal the five most common applications of such technology in the automotive space, ranging from invisible reactive support systems to semi-autonomous driving. We'll also present on stage a brief, highly rated live demo toward the end of the session.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Intelligent Video Analytics (IVA); Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21E

S6524 - Enabling the Electronic Structure Program Gaussian on GPGPUs Using OpenACC

Roberto Gomperts Principal Engineer, NVIDIA
In 2011, Roberto Gomperts became a principal engineer at NVIDIA working to enable computational chemistry applications on GPGPUs. After graduating from the University of Nijmegen (The Netherlands), Roberto became professor of physical chemistry at the University of Zulia in his native country, Venezuela. He developed his expertise in applying parallelism to computational chemistry programs first in 1985 at IBM, where he held a post-doctoral position for two years, and later at Alliant Computer Systems. In 1991, he joined Silicon Graphics, Inc., where he served as a computational chemistry specialist, principal scientist, and senior principal scientist, among other roles. Roberto is co-author of KGNMOL, Gaussian 92, Gaussian 92/DFT, Gaussian 94, Gaussian 98, Gaussian 03, and Gaussian 09 and was an active co-developer of early versions of Amber. He has published many technical reports and is co-author of over 20 peer-reviewed scientific publications and one patent.

In 2011, Gaussian, Inc., PGI, and NVIDIA embarked on a long-term project to enable Gaussian on GPGPUs using a directives-based approach. OpenACC has emerged as the de facto standard to port complex programs to GPU accelerators. We'll discuss how we attacked some of the challenges involved in working with a large-scale, feature-rich application like Gaussian. This includes a number of PGI extensions to the OpenACC 2.0 standard that we believe will have a positive impact on other programs. To conclude, we'll present a sample of GPU-based performance improvements on a variety of theories and methods.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry; Supercomputing & HPC; Tools & Libraries; OpenACC

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Marriott Salon 5

S6526 - Beyond Standards: A New GPU-Aware Image Coding System

Pablo Enfedaque Ph.D. Student, Universitat Autònoma de Barcelona
Pablo Enfedaque is a third year Ph.D. student with the departments of Information and Communications Engineering (dEIC) and Computer Architectures and Operating Systems (CAOS), Universitat Autonoma de Barcelona, Spain. He received a B.E. in computer science and an M.S. in high performance computing and information theory in 2012 and 2013, respectively. Pablo has been working with GPU architectures since his final degree project. His research interests include image coding, high performance computing, and parallel architectures.

Discover a new image coding system devised to exploit massive parallelism in a GPU. Current standards for the compression of images lack the kind of parallelism required for efficient implementation in GPUs. Although much effort is made to implement such standards in CUDA, most implementations obtain poor results. This session describes the main insights behind the proposed image coding system. Our starting point was JPEG2000. The core mechanisms of the standard were redefined to allow the type of parallelism required in SIMT computing. All the advanced features of the system are preserved, but it is no longer compatible with the standard. Performance results will be given, comparing state-of-the-art CPU and GPU implementations of JPEG2000 with the proposed system.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Performance Optimization; Signal & Audio Processing

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21B

S6665 - Using Butterfly-Patterned Partial Sums to Draw from Discrete Distributions

Guy Steele Software Architect, Oracle Labs
Guy Steele is a software architect at Oracle Labs, responsible for research in language design and implementation strategies, and architectural and software support for programming languages. He received his B.A. in applied mathematics from Harvard College in 1975, and his S.M. and Ph.D. in computer science and artificial intelligence from M.I.T. in 1977 and 1980, respectively. He was previously an assistant professor of computer science at Carnegie Mellon University; a member of technical staff at Tartan Laboratories in Pittsburgh, Pa.; and a senior scientist at Thinking Machines Corporation in Cambridge, Mass. He joined Sun Microsystems, acquired by Oracle in 2010, as a distinguished engineer in 1994 and was named a Sun Fellow in 2003.

We describe a SIMD technique for drawing values from multiple discrete distributions, such as sampling from the random variables of a mixture model for machine learning, that avoids computing a complete table of partial sums of the relative probabilities. A table of alternate ("butterfly-patterned") form is faster to compute, making better use of coalesced memory accesses. From this table, complete partial sums are computed on the fly during a binary search. Measurements using an NVIDIA TITAN Black GPU show that for a sufficiently large number of clusters or topics (K > 200), this technique alone more than doubles the speed of a latent Dirichlet allocation (LDA) application already highly tuned for GPU execution.
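
For reference, the conventional approach that the abstract improves on can be sketched as follows: build the complete table of partial sums of the relative probabilities, then binary-search it for a scaled uniform random number. This is the baseline only, not the butterfly-patterned technique from the talk; the function name `draw_index` and its arguments are assumptions for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def draw_index(weights, u):
    """Pick an index with probability proportional to weights[i].

    weights: non-negative relative probabilities (need not sum to 1).
    u: a uniform random number in [0, 1).
    """
    partial = list(accumulate(weights))  # the complete table of partial sums
    target = u * partial[-1]             # scale u to the total weight
    # Binary search for the first partial sum strictly greater than target.
    index = bisect_right(partial, target)
    # Guard against floating-point rounding pushing target up to the total.
    return min(index, len(weights) - 1)
```

In practice `u` would come from a random number generator such as `random.random()`; passing it in explicitly keeps the sketch deterministic.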

Level: Advanced
Type: Talk
Tags: Algorithms; Performance Optimization; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Marriott Salon 3

S6702 - Automated Creation of Tests from CUDA Kernels

Oleg Rasskazov Executive Director, Quantitative Research, JP Morgan Chase
For the last eight years, Oleg has worked in Quantitative Research at JP Morgan, focusing on High Performance Compute for Equities, Commodities and FX. He has a PhD in Applied Mathematics, focused on computer-assisted proofs.

JP Morgan has been using GPUs extensively to speed up risk calculations and reduce computational costs since 2011. The computational library runs a large number of kernels, both hand-written and auto-generated, with a complex data flow. As we upgraded the CUDA drivers, runtimes, and hardware, we occasionally saw regressions in performance and numerical values, and recognized the need for a test suite that would simplify the submission of issue reproducers without sharing the whole proprietary library. This talk will present an automated approach to converting individual kernel launches into standalone test cases, subject to some restrictions on the GPU code structure.

Level: Intermediate
Type: Talk
Tags: Finance; Tools & Libraries

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Marriott Salon 1

S6720 - Get into VR with 360 Video

Nicolas Burtey CEO, VideoStitch
Nicolas Burtey is the CEO of VideoStitch.
Nicolas Lopez Lead Programmer, VideoStitch
Nicolas is lead programmer at VideoStitch, where he develops the computer vision algorithms and architecture running the stitching process. After a Master's degree in Computer Science from Mines Paristech, he spent 5 years working on Exalead's search engine technology in Paris, before switching from Information Retrieval to the Virtual Reality industry.

Both Facebook and Hollywood view VR as a new medium, not only for computer-generated images but also for video. VideoStitch has developed 360-degree video stitching software that combines multiple HD video streams in real time using CUDA and NVIDIA GPUs. Camera manufacturers, the defense industry and movie production companies are among its initial customers. This talk gives an overview of the state of the art for creating 360-degree video, including the challenges of making multi-sensor cameras and combining 6-12 HD video streams for up to 8K video in real time with multiple GPUs.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Video & Image Processing; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL20C

S6788 - Mars 2030

Julian Reyes Lead VR Producer, Fusion Media Network
Julian Reyes is a VR Producer for Fusion Media Network, an ABC-Disney joint venture. His primary focus is on creating interactive VR experiences using Unreal Engine 4. Among his works, he teamed up with Canadian company General Fusion to produce a VR simulation of a nuclear fusion reactor concept. He also produced an interactive project on illegal gold mining in Colombia called Blood Gold. He graduated from SAE Institute with a focus on Sound Engineering and Music Production and picked up game development by watching YouTube and other online tutorials. In his free time, he produces interactive VR music experiences for music festivals and has performed at III Points, Miami's Interactive Festival. He's also scheduled to be a panel member at this year's SXSW VR Track. He is currently working on an upcoming VR project based on a future mission to Mars with help from Disney Interactive, NASA, and MIT's Space Systems Laboratory.
Justin Sonnekalb Independent Technical Designer (previously @ Irrational Games)
Justin Sonnekalb is a Technical Designer with four years' experience creating prototypes for major, back-of-the-box gameplay systems, a producer with three years' experience working with world-class story and art teams, and the voiceover editor for Bioshock and Bioshock Infinite. His specialty is tackling technical challenges and pulling off impossible scripting or shader hacks, working to establish the "voice" of a game, and just generally getting into a flow where all the many disparate disciplines that make up a game start coming together. Because gameplay prototyping rarely overlaps voiceover editing in a typical production cycle, Justin has the opportunity to fully develop both sets of skills and enjoy using each as a respite from the other.
Dave Flamburis Senior Lead Artist - Creative Consultant, Self-Employed
Dave has worked in the industry as a hands-on Art Director, Lead Artist, and Senior Artist. His goal is to continue to develop and craft unique experiences, share insight, knowledge, build momentum both within and across teams, and have an amazing time doing so.

Mars 2030 is an interactive virtual reality project that offers a breathtaking look into the life of an astronaut hard at work studying and exploring the Martian landscape. Produced in conjunction with NASA and Fusion Media Network (a joint venture between ABC and Disney), Mars 2030 aims to be the most photorealistic and scientifically accurate depiction of the Red Planet to date. We'll expound on the project's scope and technical capacities, in addition to showcasing a full VR demo of the game itself. Those in attendance will be among the first to glimpse the results of this exciting and wholly unprecedented multimedia collaboration.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Aerospace & Defense; Game Development; Press-Suggested Sessions: Professional Graphics; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Room LL20A

S6798 - Torch: A Flexible Platform for Deep Learning Research

Soumith Chintala Research Engineer, Facebook
Soumith Chintala is a Researcher at Facebook AI Research, where he works on deep learning, reinforcement learning, generative image models, agents for video games and large-scale high-performance deep learning. Prior to joining Facebook in August 2014, he worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. He holds a Masters in CS from NYU, and spent time in Yann LeCun's NYU lab building deep learning models for pedestrian detection, natural image OCR, depth-images among others.

We'll discuss Torch from a high-level perspective, covering how it is used across the industry by deep learning groups such as Google DeepMind, Facebook AI Research, and Twitter Cortex. We'll present the current state of Torch as a research and production framework for deep learning models, and finally our long-term vision.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Hall 3

S6819 - The Value of 3D in Moments of Visual Dominance (Presented by Dimension Technologies)

Tom Curtin Director Business Development, Dimension Technologies Inc.
Tom Curtin is responsible for planning and implementing DTI go-to-market strategies to successfully commercialize and license DTI's IP portfolio and know-how in autostereoscopic (glasses-free) 3D/2D displays. He has more than 35 years' experience in strategic marketing and communications working with some of the most well-known brands in the world – FedEx, Kodak, Xerox, Bausch and Lomb, IBM, Dunlop Sports, and Mattel Fisher-Price.

Why are NASA and world leaders in aerospace and automotive bringing glasses-free 3D to the cockpit and the dashboard? Heightened situational awareness. 18% performance improvement. Less trial and error. Lower heart rates. Studies at NASA and Wright-Patterson AFB have shown that pilots, drivers, and remote operators perform better when they can see certain types of information in 3D – depth of field, slope and terrain, relative position, vehicle status, emergency alerts, and response protocols. In moments of crisis, when visual dominance is at its peak, 3D provides additional insight into situation dynamics and enables better decision making. The next generation in autostereoscopic displays will help make skies and roads safer.

Level: All
Type: Talk
Tags: Large Scale and Multi-Display Visualization; Aerospace & Defense; Virtual Reality & Augmented Reality

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21D

S6826 - Deep Neural Networks for High Performance Computer-Aided Detection, Segmentation in Radiology

Le Lu Staff Scientist, National Institutes of Health
Le Lu has been a Title-42.g staff scientist in the Department of Radiology and Imaging Science of the National Institutes of Health (NIH) Clinical Center, Bethesda, Maryland, since 2013. His research is focused on modern medical image understanding and semantic parsing to fit into revolutionary clinical workflow practices, especially in the area of preventive cancer early detection and diagnosis via large scale imaging protocols and statistical (Deep) learning principles. He received his Ph.D. in Computer Science from Johns Hopkins University in May 2007.

This talk focuses on employing deep learning (DL), especially deep neural networks powered by HPC and GPUs, for high performance radiological or medical image computing. We'll present the motivation, technical details and quantitative results of our recent work at NIH for three core problems: 1) Improving Computer-aided Detection (CAD) using Convolutional Neural Networks and Decompositional Image Representations; 2) Robust Bottom-up Multi-level Deep Convolutional Networks for Automated Organ Segmentation; 3) Text/Image Deep Mining on a Large-Scale Radiology Image Database for Automated Image Interpretation. We validate some very promising observations of using DL both to significantly improve upon traditional CAD tasks in (1) and to enable exciting new research directions in (2) and (3).

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 14:00 - 14:50
Location: Room 212B

S6861 - Dynamic Memory Networks for Visual and Textual Question Answering

Stephen Merity Senior Software Engineer, MetaMind
Stephen Merity is a senior software engineer at MetaMind, where he works on researching and implementing deep learning models for vision and text, with a focus on memory networks and neural attention mechanisms for computer vision and natural language processing tasks. Prior to joining MetaMind, Stephen worked on big data at Common Crawl, data analytics at Freelancer.com, and online education at Grok Learning. Stephen holds a master's degree in computational science and engineering from Harvard University and a bachelor of information technology from the University of Sydney.

You will learn how neural networks with memory and attention mechanisms allow for state of the art question answering. Neural network architectures with memory and attention mechanisms exhibit certain reasoning capabilities required for question answering. We describe the dynamic memory network (DMN), which uses both of these mechanisms to achieve state of the art performance on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset. We demonstrate how attention mechanisms allow for improved inspection of deep learning models, helping to understand the evidence behind specific decisions. The techniques discussed are applicable to a wide range of tasks, helping to improve both the accuracy and interpretability of the resulting models.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 210H

ECS6104 - Early Stage Challenge Panelists & Contestants Introductions

Scott McGrew Business & Technology Reporter, NBC
Anchor and reporter Scott McGrew thrives at the intersection of entrepreneurship and venture capital. He is host of Press:Here, a weekly roundtable featuring world class technology reporters in conversation with Silicon Valley CEOs. Press:Here airs on NBC Bay Area Sunday mornings at 9am right after Meet the Press. Joining Scott each week is a team of contributors including reporters from the BBC, Forbes, Fortune, Wired, the New York Times and the Wall Street Journal. Scott was the first journalist to reveal the crimes behind the death of a South Bay Marine in Afghanistan and the only reporter in the world allowed to fly F-16 combat air patrol as part of Operation Enduring Freedom following 9/11. He's been an eyewitness to a firing squad and chased greased pigs at the Iowa State Fair. Scott is an active speaker on technology and television at Stanford, Brigham Young and other universities as well as numerous conferences. He is also a regular emcee and host for Ernst and Young's Northern California Entrepreneur of the Year, the Global CleanTech Forum, United Way and the Boy Scouts. He also helps foster excellent business journalism as a judge for UCLA's Gerald Loeb Awards. Scott is a part-time Cessna 172 pilot, MIG and TIG welder, CSS coder and a lector in the Episcopal Church. None of which he does very well, he claims, but he says he's trying.
Rob Enderle President & Principal, Enderle Group
Rob Enderle is president and principal analyst of the Enderle Group, a forward-looking emerging technology advisory firm. With over 25 years of experience with emerging technologies, he has provided regional and global companies with guidance on how to be successful in this changing world. Before founding the Enderle Group, Rob was the senior research fellow for Forrester Research and the Giga Information Group. While there, he worked for and with companies like Microsoft, TI, HP, IBM, Dell, Toshiba, Gateway, Sony, USAA, Texas Instruments, AMD, Intel, Credit Suisse First Boston, GM, Ford, ROLM, and Siemens. Prior to that, he worked for IBM and held positions in internal audit, competitive analysis, marketing, finance, and security. Currently, Rob writes on emerging personal technology, security, and Linux for a wide variety of publications, including TechNewsWorld, CIO, Forbes, TGDaily, TMCNET, Datamation, and IT Business Edge, and international news organizations such as CNBC, CNN, Bloomberg, and NPR. Rob also does a semi-weekly spot for Wall Street Journal radio on consumer technology. Rob sits on the advisory councils for a variety of technology companies.
Saeed Amidi Founder & CEO, Plug and Play Tech Center
Saeed Amidi is the founder and CEO of Plug and Play Tech Center, the premier technology startup accelerator, whose 300-plus companies have collectively raised in excess of $750 million. Additionally, Saeed is a general partner in Amidzad. The fund has been investing in technology companies for over 15 years and holds successful investments in over 70 technology companies, including PayPal, Powerset, Danger, Bix, DropBox, Lending Club, and Zoosk. Saeed is a serial entrepreneur and a seasoned executive with over 28 years of experience in founding, operating, and growing successful companies. He has successfully started and grown businesses both nationally and internationally, in countries like Spain, Italy, France, and Austria. Saeed’s current passion is inspiring and helping entrepreneurs and startups out of universities. Some of the universities he is working with include MIT, Cornell, Carnegie Mellon, Harvard, Stanford, Berkeley, Santa Clara, Wharton, and Dartmouth. His objective is to identify great entrepreneurs with a passion to execute their ideas. Saeed is an active member of the technology community and a frequent contributor to numerous charitable foundations. He is also an active member of the Young Presidents Organization, a world-class network of Fortune 500 CEOs, accomplished serial entrepreneurs, and veteran financial executives.
Jeff Herbst Vice President, Business Development, NVIDIA
Jeff Herbst is vice president of business development at NVIDIA, with responsibility for mergers and acquisitions strategy, investments, partnerships, and other strategic business relationships and transactions. Prior to joining NVIDIA in 2001, Herbst was worldwide head of corporate and business development at AltaVista, and also served as general manager for a startup focused on content delivery infrastructure for wireless networks. Earlier in his career, he was a partner with the law firm Wilson Sonsini Goodrich and Rosati, where he specialized in corporate finance, joint ventures, mergers and acquisitions, and other strategic business and intellectual property-related transactions. Herbst holds a bachelor of science degree in computer science, with an emphasis in computer graphics, from Brown University and a J.D. from Stanford University.
George Hoyem Partner, In-Q-Tel
Mr. George Hoyem serves as a Partner and Investments Partner at In-Q-Tel, Inc. Mr. Hoyem served as the Managing Director and General Partner at Blueprint Ventures. His investment focus included software, wireless, security, other IT and communications infrastructure companies building disruptive technologies, and early-stage corporate IP spinouts. Mr. Hoyem has more than 25 years of entrepreneurial, operations, and venture experience in high technology companies. He co-founded Redleaf Group, Inc. and also served as its Managing Partner and Managing Director. Mr. Hoyem headed up their Silicon Valley investment office and led several key investment and successful mergers and acquisitions transactions. While at Redleaf, he led the sourcing and financing efforts for many promising investments including LumiCyte, Amperion, Atlantes, and ecMarkets. As a hands-on investor, he led the successful effort to sell Atlantes to Vistant and orchestrated several other mergers and acquisitions transactions. He was a Co-Founder and Managing Director of Redleaf Ventures II, L.P. Prior to Redleaf, he served as a Venture Partner at El Dorado Ventures, where Mr. Hoyem spearheaded their software-infrastructure investments including FusionOne and eCast. Prior to becoming a Venture Capitalist, he built an 18-year operating career with several prominent technology companies. He was employed at Hewlett-Packard Company. As General Manager of Hewlett Packard's Internet Commerce Division, Mr. Hoyem was responsible for a $40 million P&L and served as a Vice President of Software Marketing. He held several Executive positions in marketing and operations at VeriFone Systems, Inc. Mr. Hoyem was a key contributor in the sale of VeriFone to Hewlett Packard for $1.3 billion. Prior to VeriFone, he Co-founded Visix, Inc. and also served as its Vice President of Worldwide Sales. 
He has also held other management and sales positions at American Management Systems (AMS), Federal Data Corporation, and Tektronix, Inc. He served as Chairman of Vidient Systems, Inc. from February 1, 2007 to April 9, 2007. He serves as a Director of ecIndustries, and Atlantes Services, Inc. He serves as a Director of SpectraSensors, Inc. He has been a Director of Solera Networks, Inc. since October 2008. He serves on the Boards of Astoria Software, Inc., TEGSCO, Atreus Systems, and Platform Solutions, Inc. He serves as a Board Observer of MemSQL Inc., Visto and IntelliPath. Mr. Hoyem is a Member of Advisory Board of Greenhouse Capital Partners. He served as Member of the Board of Advisors at SeeControl, Inc. since July 31, 2006. He served as a Director of Lutris Technologies, Inc., Lightspeed Interactive, Inc., Telicor, Vidient Systems, Inc. and Kalepa Networks. Mr. Hoyem holds a Bachelor Degree in Business Administration with a minor in Computer Science from Western Michigan University.

Level: All
Type: Talk
Tags: Emerging Company Summit; Press-Suggested Sessions: General Interest

Day: Wednesday, 04/06
Time: 14:15 - 14:30
Location: Room 220B

ECS6157 - Early Stage Challenge Finalist: AerialGuard

Itai Orr CEO, AerialGuard
Itai Orr is a co-founder and CEO of AerialGuard, a startup that develops advanced autonomous capabilities. He has extensive experience with unmanned systems and holds an M.Sc. in Physics and a B.Sc. in Aerospace Engineering.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Emerging Company Summit

Day: Wednesday, 04/06
Time: 14:30 - 14:40
Location: Room 220B

S6140 - Optimizing Instruction-Bound Kernels in Dissipative Particle Dynamics

Yu-Hang Tang PhD Candidate, Division of Applied Mathematics, Brown University
Yu-Hang is a PhD candidate with the Division of Applied Mathematics at Brown University. His primary research interests focus on High Performance Computing and concurrent multiscale coupling with applications in modelling soft matter systems and physiological fluids. He is the author of several open-source software packages, including the LAMMPS USERMESO GPU-accelerated package for Dissipative Particle Dynamics (DPD) and Smoothed Particle Hydrodynamics (SPH) simulations, as well as the Multiscale Universal Interface library for coupling standalone solvers to perform multiscale simulations.

In this talk, we report algorithmic and instruction-level optimizations used in uDeviceX, a CUDA particle simulator for biomedical microfluidic devices. First, we propose an FMA-intense random number generator (RNG) that exploits the chaotic logistic map. This RNG can take advantage of the higher FP-to-integer instruction throughput ratio of CUDA GPUs to generate a large number of high quality random streams in situ. Second, warp votes and shared memory are used to consolidate workload from diverging warps. Last, inline PTX is used to emulate 24-bit integer arithmetic with its floating point counterpart in order to increase throughput. An implementation using C++ templates ensures that no type-casting overhead is triggered and also guards the technique from unintentional usage.
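As a rough illustration of the first idea only, the sketch below iterates the logistic map at r = 4 and applies the standard arcsine change of variables to turn the iterates into values on [0, 1]. It is not the uDeviceX generator: the function name `logistic_uniform_stream`, the seeding convention, and the use of Python instead of CUDA are all assumptions made for readability, and a production generator would need extra steps to decorrelate consecutive values.

```python
import math


def logistic_uniform_stream(seed, count):
    """Yield `count` floats in [0, 1] derived from the logistic map.

    Hypothetical helper for illustration; `seed` must lie strictly
    between 0 and 1.
    """
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must be in the open interval (0, 1)")
    x = seed
    half_pi = math.pi / 2.0
    for _ in range(count):
        # One logistic-map step at r = 4: only multiplies and a subtract,
        # which is the kind of work that maps onto FMA units.
        x = 4.0 * x * (1.0 - x)
        # Raw iterates follow an arcsine density on [0, 1]; this change
        # of variables maps that density to the uniform one.
        yield math.asin(math.sqrt(x)) / half_pi


# Each seed gives its own reproducible stream.
stream_a = list(logistic_uniform_stream(0.1, 5))
stream_b = list(logistic_uniform_stream(0.2, 5))
```

Because the update uses only floating point multiply and subtract, each thread could in principle carry its own state `x` and advance it without integer arithmetic, which is the property the abstract highlights.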

Level: Intermediate
Type: Talk
Tags: Algorithms; Computational Chemistry; Performance Optimization

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Marriott Salon 3

S6153 - Fast Non-Rigid Registration for Mobile High-Dynamic Range Photography

Orazio Gallo Senior Research Scientist, NVIDIA
Orazio earned an M.S. degree in Biomedical Engineering from "Politecnico di Milano" (Italy). He then joined the Smith-Kettlewell Eye Research Institute, where he developed a novel bio-imaging technique capable of recording micrometric deformations of soft tissues. Subsequently, he joined the University of California at Santa Cruz, where he received a Ph.D. in Computer Engineering in 2011. During his studies in Santa Cruz, Orazio also interned at Canesta, Inc. (now acquired by Microsoft), and at the Nokia Research Center in Palo Alto. In September 2011, Orazio joined NVIDIA Research, where he currently works in the Mobile Visual Computing team. His interests span several areas of the fields of computer vision and computational photography. Orazio regularly serves on the program committees of the top computer vision and computational photography conferences (CVPR, ICCV, ICCP) and is an associate editor of the journal Signal Processing: Image Communication.

We present a method that leverages the computational power of GPUs to create a high-dynamic-range (HDR) photograph in the presence of camera motion and scene changes. Our approach is extremely fast and prevents the artifacts that arise from insufficient registration quality. Previous methods to address this problem are either accurate, but too slow for mobile devices, or fast, but prone to failing. As a comparison, our method runs in under 700ms on an NVIDIA-powered tablet for a pair of 5MP images, whereas previous state-of-the-art methods performing non-rigid registration take over a minute on desktops for a pair of 1MP images.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21B

S6179 - Delivering Personalized Cloud Services to the Car

Albert Jordan VP Products, CloudCar
Albert Jordan is a co-founder and vice president of products at CloudCar. Previously, he served as vice president of products at Core Mobility, Inc. (acquired by Smith Micro). He led the launch of Core Mobility cloud-based services on Tier 1 carrier networks. Prior to Core Mobility, Albert was the CEO of Adaptive Telecom, a company that developed software-based intelligent antenna solutions that dramatically increased cellular network capacity. Adaptive Telecom was purchased by Metawave Communications, where he became president of the CDMA business unit. Albert began his career as an ASIC designer at Tandem Computers. He holds five technology patents.

Most of today's IVI solutions are trying to replicate the smartphone interaction model in the car. Adopting an approach that is similar to smartphones will not result in differentiated solutions with a sustainable competitive advantage. More importantly, the immersive experiences that are typical of smartphone interaction are not suitable in a driving environment. CloudCar is proposing a new approach to delivering connected services to the car, which brings about a new interaction model suited for the car.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21E

S6275 - Restore, Customize and Revamp an Iconic Motorbike with NVIDIA Iray® and Substance Painter

Matt Gueller Sr. Surface Designer, Harley-Davidson
With a B.S. in industrial design from Milwaukee Institute of Art and Design, Matt Gueller has been working over 18 years in the design field, as Class A surfacer, digital model maker, and visualization expert. Much of his career has been focused on the use of digital tools in the design process and how non-traditional techniques can impact the manufacturing workflow.
Jérôme Derel Chief Product Officer, Allegorithmic
Engineer and product designer Jerome Derel joined Allegorithmic in 2014 as chief product officer. Jerome previously worked for seven years at Dassault Systemes as a visualization expert in the Design Studio and the CATIA Design teams, leading projects that produce high-quality virtual materials.
Pierre Maheut Product Manager, Allegorithmic
A graduate in Mechanical Engineering, Industrial Product Design and Innovation Management, Pierre managed the User Experience of CATIA Creative Design for 7 years and is now Product Manager at Allegorithmic.

Leveraging Substance Painter and Iray, a Harley-Davidson Knucklehead, the iconic chopper of the late 1960s, with rust and dust, can be restored to its original glory or even turned into a custom race bike, bringing design iterations and study to the next level.

Level: All
Type: Talk
Tags: Product & Building Design; Rendering & Ray Tracing; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 14:30 - 15:20
Location: Room LL21A

S6292 - Gradually Porting an In-Use Sparse Matrix Library to Use CUDA

Mark Hoemmen Senior Member, R&D Staff, Sandia National Laboratories
Mark Hoemmen is a member of the R&D staff at the Center for Computing Research at Sandia National Laboratories. His research interests include numerical linear algebra, fault tolerance, and parallel programming models. He contributes to the Trilinos open-source software library, leads Trilinos' Scalable Linear Algebra capability area, and gives Trilinos tutorials regularly. He has a B.S. in mathematics and computer science from the University of Illinois Urbana-Champaign and a Ph.D. in computer science from the University of California, Berkeley.

Learn how to port an existing parallel library to use CUDA, even while the library is under constant production use by applications. We did this for the Tpetra parallel sparse linear algebra library. Tpetra provides data structures, computational kernels, and MPI data redistribution for Trilinos' sparse linear solvers. We used Kokkos, an abstraction over different shared-memory parallel programming models, to rewrite Tpetra for CUDA. This, along with careful attention to backwards compatibility, unit testing, and frequent application feedback, let us undertake this rewrite gradually. It also gave both applications and Trilinos' sparse linear solver packages that depend on Tpetra a gradual path to embrace MPI + thread parallelism.

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room 212A

S6309 - Capitalico - Chart Pattern Matching in Financial Trading Using RNN

Hitoshi Harada CTO, Alpaca
Hitoshi Harada is CTO at Alpaca, a company enabling AI technology to automate professional human tasks. Before Alpaca, Hitoshi worked in the database industry and community for 10 years as a major feature contributor to PostgreSQL, a kernel architect of the MPP database Greenplum, and a contributor to the open source in-database machine learning library MADlib. He has extensive experience in distributed systems, data science, and machine learning for industrial applications.

Discretionary trading by technical analysis and momentum strategy in the financial market has been difficult to automate by quant-style rigid conditional programming as it involves a lot of fuzziness and subtleties of human perception. Our application, Capitalico, analyzes the financial time-series data and trader's behavior to solve this problem using the RNN/LSTM. In this talk, we'll introduce the problem and our approach, and detail pitfalls and practices, such as how we choose networks and parameters to achieve the best accuracy and performance with deep learning using GPUs. As we borrowed great ideas from past deep learning applications, we'll help you understand how we converted those ideas to our solution and how to apply deep learning to your problem.

Level: Intermediate
Type: Talk
Tags: Finance; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Marriott Salon 1

S6405 - DeepSpark: Asynchronous Deep Learning over Spark

Uri Verner Researcher, Huawei Technologies
Uri is a member of the GPGPU research team at Huawei Research. He has recently completed a PhD in Computer Science at the Technion Institute in Israel. His research interests include heterogeneous computing, GPGPU, scheduling, and real-time data processing. Uri interned at NVIDIA in 2014.

DeepSpark is a scalable deep learning framework for Spark-based distributed environments. It employs multiple independent Caffe-based workers that asynchronously generate model updates and a distributed parameter server that maintains a global model. The framework is designed to overcome network delays, and uses Spark's RDDs to provide fast and fault-tolerant access to the parameter server.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Data Center & Cloud Computing; Big Data Analytics

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room 210H

S6411 - MVAPICH2-GDR: Pushing the Frontier of Designing MPI Libraries Enabling GPUDirect Technologies

Dhabaleswar K. (DK) Panda Professor and University Distinguished Scholar, The Ohio State University
Highly-Rated Speaker
Dhabaleswar Panda is a professor and university distinguished scholar of computer science and engineering at The Ohio State University. He has published over 350 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is used by more than 2,450 organizations in 76 countries. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 293,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.
Khaled Hamidouche Senior Research Associate, The Ohio State University
Khaled Hamidouche is a senior research associate in the Department of Computer Science and Engineering at The Ohio State University. He is a member of the Network-Based Computing Laboratory led by Dr. D. K. Panda. His research interests include high-performance interconnects, parallel programming models, accelerator computing, and high-end computing applications. His current focus is on designing high-performance, unified MPI, PGAS, and hybrid MPI+PGAS runtimes for InfiniBand clusters and their support for accelerators. Khaled is involved in the design and development of the popular MVAPICH2 library and its derivatives MVAPICH2-MIC, MVAPICH2-GDR, and MVAPICH2-X. He has published over 40 papers in international journals and conferences related to these research areas. He has been actively involved in various professional activities in academic journals and conferences. He is a member of ACM.

Learn how the MVAPICH2-GDR library is enabling support for different GPUDirect technologies to simplify the task of porting message passing interface (MPI) applications to supercomputing clusters with NVIDIA GPUs. MVAPICH2-GDR supports MPI communication directly from GPU device memory and optimizes it using various features offered by the CUDA toolkit. Various optimizations are integrated transparently under the standard MPI API. Recent advances in MVAPICH2 include support of GDR_Async, MPI-3 RMA using GPUDirect RDMA, usage of fast GDRCOPY, Non-Blocking Collectives using GDR and Core-Direct, and much more. Performance results with micro-benchmarks and applications will be presented using MPI and CUDA/OpenACC. Performance impact of application co-design using MVAPICH2-GDR will also be presented.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Tools & Libraries; Performance Optimization; OpenACC

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room 211A

S6438 - How GPU Can Help High Energy Physics Experiments

Gianluca Lamanna Researcher, INFN
Gianluca Lamanna is a researcher at the National Institute for Nuclear Research (INFN) in Italy. He received his Ph.D. in particle physics in 2006 at Pisa University. From 2007 to 2010, he was a postdoctoral student at Scuola Normale Superiore, working mainly in data analysis and detector design. From 2010 to 2013, he was appointed as research fellow at CERN to work on the NA62 experiment's data acquisition and trigger. He received the Marie-Curie Fellowship. Since 2013 he has been the principal investigator of the GAP project, funded by the Italian Ministry of Research, to study the application of GPUs for real-time data acquisition.

We aim to show how online data selection in high-energy physics experiments could benefit from real-time GPU processing. The computing power of GPUs fits the requirements to increase the ability of the trigger systems to reduce the data bandwidth. We designed a system for online processing exploiting commercial GPUs for the NA62 experiment at CERN. In particular, we will show different techniques to reduce and control the latency due to data transfer in order to have a synchronous response from the system. We will show recent results obtained in a physics run at CERN with a high data rate. Attendees will learn how a high-energy physics trigger system works and how GPUs can increase the discovery potential of high-precision experiments.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Marriott Salon 6

S6471 - Accelerating Influence Spread Estimation on Social Networks in the Continuous-Time Domain

Zissis Poulos PhD Candidate, Mitacs Accelerate Intern, University of Toronto, Sysomos Inc.
Zissis received his B.E. Degree in Electrical and Computer Engineering (ECE) from the National Technical University of Athens in 2011, and his M.A.Sc degree in ECE from the University of Toronto in 2013. Currently, he is pursuing a PhD in the Department of ECE at the University of Toronto. His research is on formal verification and automated design debugging of digital systems, and extends to applications of data-mining for statistical design debugging. He recently joined Sysomos Inc. as a Data Science intern, where his research focus is on influence maximization, network diffusion models and GPU acceleration for inference algorithms.

This session showcases how to leverage GPUs to accelerate influence spread estimation in large social networks. Estimating the spread of an opinion or product across members of a graph-modelled social network is a hard problem requiring compute-intensive approximation algorithms. The complexity of the problem further rises in the continuous-time domain, where influence transmission rates on network edges are derived from stochastic distributions. Spread estimation algorithms that operate on stochastic transmission rates, such as naive sampling and neighbourhood size estimation, require a plethora of samples to achieve convergence. By exploiting the inherent independence across multiple sampling iterations of these algorithms we achieve up to 11x improvement in run-time using GPUs.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Algorithms

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room 210F

S6691 - Multi-Source Fusion Using Deep Learning

Andrew Jenkins Senior Data Scientist, DigitalGlobe
Andrew Jenkins works for DigitalGlobe Inc. as a Senior Data Scientist focused on applying deep learning to multi-source spatial data such as satellite imagery, geo-tagged photos, and videos. Andrew is currently a PhD candidate in the Department of Geography and Geographic Information Science at George Mason University. He holds an MS degree in Geoinformatics and a BS in Computer and Information Science. Andrew previously worked as a government researcher at the US Army Engineer Research and Development Center, and prior to that spent eight years in the military.
Shay Har-Noy Vice President and General Manager, DigitalGlobe
Shay Har-Noy serves as Vice President and General Manager of DigitalGlobe’s Platform business -- a fast-paced, high-growth effort to get DigitalGlobe’s 15-year satellite image library in the cloud and available for processing. Machine learning against 100 petabytes of high-resolution imagery at a truly global scale!

DigitalGlobe's satellite constellation collects millions of square kilometers of Earth imagery daily, yielding high-resolution data of our planet. By employing DL algorithms & NVIDIA GPUs, DigitalGlobe processes imagery & detects objects at speeds orders of magnitude faster than ever before. Emergency responders require a multitude of information sources to support their mission. DigitalGlobe utilizes several methodologies for fusing disparate data sets together. Social media, weather, other sensor types (e.g., RADAR/LIDAR) & satellite imagery can be fused together to help decision makers answer questions. By combining the data sets based on their location and common categories from the DL algorithms, emergency responders & analysts are able to automatically verify objects on the ground.

Level: Intermediate
Type: Talk
Tags: Aerospace & Defense; Deep Learning & Artificial Intelligence; Video & Image Processing; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Marriott Salon 2

S6749 - Light Field Rendering and Streaming for VR and AR

Jules Urbach CEO & Founder, OTOY, Inc.
Jules Urbach is a pioneer in computer graphics, streaming, and 3D rendering with over 25 years of industry experience. He made his first game, Hell Cab (Time Warner Interactive), at age 18, which was one of the first CD-ROM games ever created. Six years after Hell Cab, Jules founded Groove Alliance. Groove created the first 3D game ever available on Shockwave.com (Real Pool). Currently, Jules is busy working on his two latest ventures, OTOY and LightStage, which aim to revolutionize 3D content capture, creation, and delivery.

Jules Urbach, Founder & CEO of OTOY, will discuss OTOY's cutting-edge light field rendering toolset and platform. OTOY's light field rendering technology allows for immersive experiences on mobile HMDs and next-gen displays, ideal for VR and AR. OTOY is actively developing a groundbreaking light field rendering pipeline, including the world's first portable 360 LightStage capture system and a cloud-based graphics platform for creating and streaming light field media for virtual reality and emerging holographic displays.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Rendering & Ray Tracing; Media & Entertainment; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL20C

ECS6204 - Early Stage Challenge Finalist: CogniCor

Sindhu Joseph CEO, CogniCor Technologies S.L
https://www.linkedin.com/in/sindhujoseph PhD in Artificial Intelligence with strong industrial research background, founder of CogniCor Technologies. Inventor of 6 US patents. Selected as “One of the 3 outstanding Entrepreneurs” by the UK government in 2012. Specialties: research expertise in artificial intelligence (software agent design, cognitive architecture, agent reasoning, multiagent systems, automated negotiation, text & data classification and matching, machine learning algorithms); entrepreneurial experience and cross-functional team leadership.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Emerging Company Summit

Day: Wednesday, 04/06
Time: 14:40 - 14:50
Location: Room 220B

ECS6209 - Early Stage Challenge Finalist: Lucid VR

Han Jin Cofounder/CEO, Lucid VR Inc
Han is a world-traveling serial entrepreneur who has worked and lived in 8 different countries. Born in China and raised in Germany, he finally settled in the United States because of his profound appreciation for the Bay Area startup culture after graduating from UC Berkeley. The problem-solving ethos embodied by the entrepreneurs here inspired him to develop innovative approaches to founding, running, and scaling startups. This has led him from his first web-based startup directly out of Berkeley to his first IoT hardware startup on Kickstarter to his first non-profit startup in Y Combinator's Summer 2014 batch, and finally to his current endeavor in the Virtual Reality space - Lucidcam.com - a company in the Berkeley Skydeck Accelerator. He and his team believe that immersive content captured by LucidCam will ultimately disrupt pictures and videos forever. He was a speaker at the NAB Show in Las Vegas and a panelist at several VR events. Since moving to the Bay Area, he has also enjoyed being an advisor to YC companies, a judge at Stanford and Berkeley Startup Competitions, and a mentor at the SanDisk accelerator focused on bringing entrepreneurship into a corporate environment.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Graphics Virtualization; Media & Entertainment; Emerging Company Summit

Day: Wednesday, 04/06
Time: 14:50 - 15:00
Location: Room 220B

ECS6128 - Early Stage Challenge Finalist: Linkface

Tingjiao Ma Co-founder, Linkface
Tingjiao Ma is one of Linkface's co-founders, mainly in charge of marketing and communication affairs. Tingjiao Ma is a graduate of the Chinese University of Hong Kong and Xi'an Jiaotong University (one of China's top ten universities).

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Wednesday, 04/06
Time: 15:00 - 15:10
Location: Room 220B

S6114 - Attack Graphs: Visualizing 200M Alerts a Day with GPU Clouds and JavaScript

Leo Meyerovich CEO, Graphistry, Inc.
Leo Meyerovich co-founded Graphistry, Inc., in 2014 to scale visual graph analytics. Graphistry builds upon the founding team's work at UC Berkeley on the first parallel web browser and Superconductor, a declarative GPU-accelerated data visualization language. Leo's most referenced work is in language-based security: language design and automatic verification for web apps and access control. However, his broader work from the past 10 years has been in designing programming languages, receiving awards for his research on the first reactive web language (OOPSLA), automatic parallelization (PLDI), and sociological foundations (OOPSLA, SIGPLAN).
Michael Wendt Data Engineer Principal, Accenture
Michael Wendt is a researcher at Accenture Technology Labs, where he focuses on developing new techniques and architectures to power next-generation data visualizations. As a developer Michael has created many responsive data visualizations for clients (like www.disasterviz.com and www.riskanalyticsviz.com) and believes that there is a fundamental connection between the user's experience and the backend performance. His previous research work on Hadoop, Cassandra, Storm and other technologies has allowed him to build and design big data solutions for clients. Michael has a BS in Computer Engineering from University of Maryland: College Park.
Joshua Patterson Data Science Principal, Accenture
Joshua Patterson is a Principal Data Scientist at Accenture Technology Labs, and a Presidential Innovation Fellow. At Accenture he leads data science research on Cyber Security and Risk, focusing on big data architecture, analytics, and visualization techniques to accelerate fraud and anomaly detection at scale. For the government, he supports the Data Services Initiative at the Department of Commerce. Prior to Accenture, Joshua led advanced analytics projects across several sectors including financial services, state and federal government, and commercial real estate. His current passions are graph analytics, GPUs, and advanced visualization. Joshua also loves storytelling with data, and some of his work can be seen at www.hotshotcharts.com, www.disasterviz.com, & www.riskanalyticsviz.com. Joshua holds a B.A. in Economics from the University of North Carolina Chapel Hill and an M.A. in Economics from the University of South Carolina Moore School of Business.

Enterprises "assume breach": someone, somewhere, already compromised them. Analysts sift through a GB/min (or more!) of attack logs from hundreds of thousands of systems. For every identified incident, they then map out the entire breach by backtracking through months of alerts. This talk shares how Graphistry and Accenture tackled the visual analytics problem: how do we explore big graphs? We'll drill into two of our GPU technologies for visualizing graphs: [1] StreamGL, our distributed real-time renderer for delivering buttery interactions, smart designs, and responsive analytics to standard web devices; [2] Node-OpenCL and our CLJS client: open source JavaScript libraries for server-side GPU scripting.

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Aerospace & Defense; Large Scale and Multi-Display Visualization; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Room 210F

S6121 - GPU-Accelerated Computer Vision for Multimedia, Post Production and Surveillance

Hannes Fassold Senior Researcher, JOANNEUM RESEARCH
Hannes Fassold works at Joanneum Research, where he is a senior researcher at the Audiovisual Media Group of DIGITAL -- the Institute for Information and Communication Technologies. His main research interests are algorithms for digital film restoration, content-based video quality analysis, and the efficient parallelization of these algorithms on the GPU. He received an M.S. in applied mathematics from Graz University of Technology in 2004. He has authored several publications in these fields and is the principal investigator for the CUDA Research Center at DIGITAL - Institute for Information and Communication Technologies, Joanneum Research.

Computer vision is at the core of many tools used in multimedia, post-production, and surveillance. We'll present some key computer vision algorithms for motion compensation, feature point extraction and tracking, SIFT descriptor extraction, and wavelet transform. We'll provide information about the significant speed-up we gained from porting these algorithms to the GPU and lessons learned from the process of porting. We'll give insight into how these algorithms are used in several applications like real-time video quality analysis (detection of dropouts and noise level), brand visibility monitoring in broadcast content, film and video restoration (dust and dirt removal, noise reduction, etc.), and traffic monitoring for wrong-way driver detection.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21C

S6190 - Tools and Approaches for Increasing Developer Productivity on GPU-Enabled Systems

Eric Kelmelis CEO, EM Photonics
Highly-Rated Speaker
Eric Kelmelis is the co-founder and CEO of EM Photonics, a company focused on the development and transition of innovative research and technology in the fields of advanced imaging, high-performance computing, and embedded systems. Mr. Kelmelis received B.S. and M.S. degrees in electrical engineering from the University of Delaware, has more than 60 publications, and holds two patents. He has also served as conference chair at SPIE's Defense, Security, and Sensing symposium since 2010.

We are building technologies to allow developers to take advantage of GPUs without requiring deep knowledge of the underlying platform and programming paradigms. We'll provide an overview of our initiatives in the space. Specifically, we'll discuss libraries that encapsulate common and domain-specific functionality and abstract it to a level that allows the use of GPU-accelerated routines by users with no knowledge of the underlying hardware; static and dynamic task scheduling approaches to optimize workflows on mixed device systems; and tools to optimize computational software for either performance or power. We'll provide our perspective on easing the burden on developers and how our work can be applied to new applications or in refactoring existing code.

Level: All
Type: Talk
Tags: Tools & Libraries; Supercomputing & HPC; Programming Languages

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 211B

S6197 - Improving High Performance Image Resizing and Rotation: A Case Study of Texture Options

Ismayil Guracar Senior Key Expert, Siemens Medical Solutions, Ultrasound Business Unit
Highly-Rated Speaker
Ismayil Guracar has been working in the ultrasound imaging field for over 29 years. He is a senior key expert with the Innovations Applications Group at Siemens Healthcare, Ultrasound Business Unit in Mountain View, Calif. His interests include ultrasound image formation and high-performance, real-time signal processing, especially using GPUs. He holds 68 U.S. patents, has pioneered new ultrasound technologies in the areas of parametric and molecular imaging, and has contributed to the development of many successful diagnostic medical imaging products.

We present a case study on the use of textures for image resizing and rotation with conventional bilinear and high-quality cubic interpolation filtering, across various texturing options and data widths. Choices including CUDA arrays, pitched 2D arrays, or linear memory each offer benefits and drawbacks that depend on the particular demands and details of the application. We provide performance measurements from the latest Maxwell GPU architecture, which has a number of performance-improving advances over previous generations. We hope to provide information and insight to CUDA developers and demonstrate some benchmarking and measurement techniques with Nsight so that informed choices can be made about how best to match texture image processing options to application requirements.

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Video & Image Processing; Real-Time Graphics

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21B

S6233 - Intelligent Mobile System for Improving Spatial Design Support and Security Inside Buildings

Janusz Bedkowski Senior Researcher, Institute of Mathematical Machines
Janusz Bedkowski works for the Institute of Mathematical Machines. He received his Ph.D. in 2010 in automation and robotics. His main research is related to autonomous mobile mapping systems. He is head of the Laboratory of Image Processing and NVIDIA GPU Research Centre at the Institute of Mathematical Machines. Janusz is involved in a research project funded by EU FP7 and NCBiR (Polish National Centre for Research and Development) on a mobile spatial assistance system as well as a methodology project with the NCN (Polish National Centre of Science) on building semantic models from mobile robot observations. His main expertise is mobile robotics, image processing, parallel computing, mobile robotic system design, and artificial intelligence. He is working on training and support technologies for operators of multi-robotic systems. He is actively publishing in international journals such as Elsevier Automation in Construction, Industrial Robot: An International Journal, and Springer Optoelectronics Review. He is also a reviewer for these journals.

This talk concerns an intelligent mobile application for spatial design support and the security domain. Mobility has two aspects in our research: The first one is the usage of mobile robots for 3D mapping of urban areas and for performing some specific tasks. The second is related to a novel software-as-a-service system that allows access to robotic functionalities and data over the Ethernet. Thus, we demonstrate the use of the novel NVIDIA GRID technology, which virtualizes the GPU. We introduce Complex Shape Histogram, a core component of our artificial intelligence engine, used for classifying 3D point clouds with a Support Vector Machine. We use NVIDIA CUDA for accelerating computations.

Level: Intermediate
Type: Talk
Tags: Aerospace & Defense; Data Center & Cloud Computing; Robotics & Autonomous Machines; Graphics Virtualization

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Marriott Salon 2

S6239 - From CVA to the Resolution of a Large Number of Small Random Systems

Lokman Abbas Turki Lecturer, LPMA, Paris 6 University
Lokman Abbas Turki is a lecturer at the Laboratoire de Probabilites et Modeles Aleatoires. Prior to this position, he spent two years as a postdoc at TU Berlin working on probability problems related to market impact and liquidity. Before that, Lokman earned his Ph.D. in probability and worked for a few months at INRIA as an expert in GPU parallelization of financial algorithms. During his Ph.D., he built a strong relationship with financial institutions like Credit Agricole and Pricing Partners.

The credit valuation adjustment (CVA) simulation represents a typical example of a problem that can be successfully overcome using GPUs. It also shows how challenges from a real-world application require advanced computing optimizations. This presentation covers both the algorithmic aspect of using GPUs for the CVA and the implementation optimization that should be performed when resolving a large number of small systems. The algorithmic part involves a nested Monte Carlo method for which we establish a judicious choice that relates the number of the inner and the number of the outer simulated trajectories. The implementation part presents and compares on a large number of small systems: the LDLt factorization, the Householder reduction, and the divide and conquer diagonalization.

Level: Intermediate
Type: Talk
Tags: Finance; Algorithms; Performance Optimization

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Marriott Salon 1

S6352 - Adapting the Visualization Toolkit for Many-Core Processors with the VTK-m Library

Christopher Sewell Staff Scientist, Los Alamos National Laboratory
Christopher Sewell is a staff scientist in the Computer, Computational, and Statistical Sciences Division at Los Alamos National Laboratory. His research interests include large-scale and in-situ visualization and analysis, data-parallelism, and multi-core and many-core technologies. Before joining Los Alamos, Christopher worked in the fields of haptics and medical robotics. He received a B.S. in Computer Science from Texas A&M University and an M.S. and Ph.D. in Computer Science from Stanford University.

To address the need for HPC scientific visualization software to effectively exploit GPUs and other many-core processors, U.S. DOE researchers are building a new library, named VTK-m. VTK-m provides a framework for simplifying the design of visualization algorithms on emerging architectures. It provides a flexible data model that can adapt to many scientific data types and operate well on multithreaded devices. Finally, VTK-m serves as a container for algorithms designed in the framework and gives the visualization community a common point to collaborate, contribute, and leverage massively threaded algorithms. In this talk, we will describe the design of VTK-m, and present results related to the functionality and performance of VTK-m for a variety of visualization applications.

Level: Intermediate
Type: Talk
Tags: In-Situ and Scientific Visualization; Supercomputing & HPC; Algorithms

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Room LL21D

S6408 - HPC Application Porting to CUDA® at BSC

Pau Farré HPC Software Developer, Barcelona Supercomputing Center
Pau Farré received his B.S. in computer science in February 2014 from the Universitat Politecnica de Catalunya. Since then, he has been working at the Barcelona Supercomputing Center in the Accelerators for HPC group. He is also a member of the UPC/BSC CUDA Center of Excellence.
Marc Jorda Resident Student, Barcelona Supercomputing Center
Marc Jorda received his MSc in computer science in February 2012 from the Universitat Politecnica de Catalunya. Since then, he has been working at the Barcelona Supercomputing Center in the Accelerators for HPC group, pursuing his PhD. He is also a member of the UPC/BSC CUDA Center of Excellence.

In this session you will learn the main challenges that we have overcome at BSC to successfully accelerate two large applications by using CUDA and NVIDIA GPUs: WARIS (a Volcanic Ash Transportation Model) and PELE (a Drug Molecule Interaction Simulator). We show that leveraging asynchronous execution is key to achieving a high utilization of the GPU resources (even for very small problem sizes) and to overlapping CPU and GPU execution. We also explain some techniques to introduce Unified Virtual Memory in your data structures for seamless CPU/GPU data sharing. Our results show an execution time improvement in WARIS of 8.6x for a 4-GPU node compared to a 16-core CPU node (using by-hand AVX vectorization and MPI). Preliminary experiments in PELE already show a 2x speedup.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Computational Chemistry; Earth System Modelling

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 211A

S6480 - GPUAM: Graphics Processing Units for Atoms and Molecules

Jorge Garza Professor, Universidad Autonoma Metropolitana
Jorge Garza is a full-time professor at the Universidad Autonoma Metropolitana-Iztapalapa in Mexico City, and has published around 70 scientific reports related to quantum chemistry supported by parallel computing. He obtained his Ph.D. at UAMI by studying confinement effects on the electron structure of atoms, within the context of density functional theory. After his Ph.D., he gained experience in parallel programming techniques at the Pacific Northwest National Laboratory, working with the quantum chemistry suite NWChem. In 2008, he was responsible for the installation of the fastest supercomputer in Latin America. Now, he applies parallel programming techniques to heterogeneous computing.

We'll present the GPUAM code, which uses GPUs to analyze electron density or molecular orbitals for atoms and molecules. Several quantum chemistry vector and scalar fields are evaluated on 3D grids, which are mapped on 2D GPU grids. Among the quantum chemistry fields considered by GPUAM are: 1) Molecular orbitals (MO) and electron density, 2) Gradient or Laplacian of MO and electron density, 3) Non-covalent interactions index, 4) Electrostatic potential, 5) Critical points search within the atoms in molecules approach. We'll present several applications, in particular, proteins or parts of proteins where hydrogen bonds are relevant. The performance of GPUAM is contrasted with a CPU counterpart, showing the importance of the GPUs in quantum chemistry analysis.

Level: All
Type: Talk
Tags: Computational Chemistry; Computational Physics

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Marriott Salon 5

S6509 - High-Performance Batched Computations for GPUs: Approaches and Applications

Stanimire Tomov Research Director, UTK
Stan Tomov received a Ph.D. in mathematics from Texas A&M University in 2002. He is a research director at the Innovative Computing Laboratory and adjunct assistant professor in EECS at the University of Tennessee, Knoxville (UTK). His research interests are in parallel algorithms, numerical analysis, and HPC. Currently, his work is concentrated on the development of numerical linear algebra software for emerging architectures. Stan is a PI of the GPU Center of Excellence at UTK.
Azzam Haidar Research Scientist, UTK
Azzam Haidar is a research scientist at the Innovative Computing Laboratory at the University of Tennessee, Knoxville. He received a Ph.D. in 2008 from CERFACS, France. His research interests focus on the development and implementation of parallel linear algebra routines for scalable multi-core and GPU architectures, for large-scale dense and sparse problems, and on approaches that combine direct and iterative algorithms to solve large linear systems and eigenvalue problems.

Learn techniques for efficient batched computations on GPUs, where small and independent computations must be grouped and executed together to obtain high performance. These problems occur very frequently in scientific applications like machine learning, data mining, dense and sparse solvers, high-order FEM, astrophysics, and more. We will consider the development of batched computations for these applications, stressing innovative GPU techniques and algorithms for uniform as well as variable-size batches, tensor contractions, batched BLAS, and more. Batched computations can fill up the GPU with work and remove scheduling overheads and costly CPU-GPU communication, often accelerating the computation by an order of magnitude compared to non-batched approaches.

Level: Intermediate
Type: Talk
Tags: Algorithms; Tools & Libraries; Performance Optimization; Computer-Aided Engineering

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Marriott Salon 3

S6522 - A Novel Neural Network Architecture for Representing Scene Structure

Eric Weiss Graduate Student, UC Berkeley
Eric Weiss is a Ph.D. student at UC Berkeley, working under Professor Bruno Olshausen at the Redwood Center for Theoretical Neuroscience. His work focuses on computational modeling of cognitive and neural processes using methods from statistics and machine learning.

Early work on deep image processing using recurrent neural networks with selective attention has yielded promising results. However, it is unclear whether standard recurrent network architectures are well-suited to representing scene structure. We present a novel memory system that can efficiently store a high-level model of a scene. The proposed approach features several advantages: it is differentiable, easy to analyze, and has constant memory requirements. Additionally, we show that it is relatively straightforward to incorporate it into a selective attention mechanism based on information theoretic principles, enabling highly efficient image processing. We present results on a toy dataset.

Level: Intermediate
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Algorithms; IoT

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL20D

S6523 - Chainer: A Powerful, Flexible, and Intuitive Deep Learning Framework

Shohei Hido Chief Research Officer, Preferred Networks America, Inc.
Shohei Hido is Chief Research Officer of Preferred Networks America, Inc. He received an M.S. in Informatics from Kyoto University, Japan, in 2006. He then worked at IBM Research in Tokyo for six years as a staff researcher in machine learning and its applications to many industries. After joining Preferred Infrastructure, Inc. in 2012, he worked as the leader of the Jubatus project, an open source software framework for real-time, streaming machine learning. Currently, he is the product manager of Deep Intelligence in Motion, software for using deep learning in IoT applications. Preferred Networks was established as a spinout company from Preferred Infrastructure in 2014.

A CUDA-based framework is key to applying deep learning technologies. We introduce Chainer, a Python-based standalone open-source framework. Attendees will learn how Chainer works and how it enables new kinds of applications of deep learning. Due to the success of Caffe, Torch, and Theano, the power of deep learning continues to expand beyond traditional pattern recognition tasks such as image recognition. However, the gap is rapidly increasing between the complexities of newly proposed neural network models and the capabilities of existing frameworks, which have been mainly used for convolutional neural networks. Chainer enables users to intuitively implement many other kinds of models, including recurrent neural networks, with a lot of flexibility and with comparable performance on GPGPU.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Hall 3

S6547 - Segmentation of Medical Volumes Using Convolutional Neural Networks

Fausto Milletari Research Scientist, PhD Candidate, Technical University of Munich
Fausto Milletari has been a Ph.D. candidate at the Technical University of Munich (TUM) since October 2013. After earning his M.S. in informatics, passed with high distinction, he joined the chair for Computer Aided Medical Procedures, directed by Professor Nassir Navab. Fausto's major research topic is segmentation of ultrasound images of the brain. In addition, he works on a variety of other computer vision problems, such as object tracking and detection. His work focuses on pattern recognition and machine learning, and in particular on voting-based approaches using state-of-the-art learning techniques. Several of his contributions have been presented in recent editions of MICCAI, IPCAI, and BMVC. Outside of the lab, Fausto strives to spread scientific knowledge about machine vision to a wider audience. He recently founded the computer vision and medical image analysis meetup group of Munich, which hosts monthly events that bring together academics and industry representatives interested in the field.

Can convolutional neural networks be used effectively for medical tasks? How does the choice of network architecture influence outcomes? How can we cope with the limited amount of annotated training data that is usually available in the medical domain? Is it advantageous to process volumetric data instead of 2D images? We'll seek answers to these questions by showing our recent results on segmentation. A Hough-voting strategy is used in conjunction with CNNs to localise and segment deep brain regions in MRI and ultrasound. We benchmark six different CNN architectures by training them with different amounts of training data and different input dimensionality. Results suggest that the complexity of the task, from a human standpoint, correlates to the required network complexity.

Level: All
Type: Talk
Tags: Medical Imaging; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 212B

S6556 - Hollywood Under the Hood: The Mercedes Concept IAA

Vilya Harvey Senior Software Engineer, The Foundry
Vil is a Senior Software Engineer at The Foundry, where he's worn many hats and grown a beard. He's been part of the Nuke team, the Research team and, nowadays, the Future Technologies team where he's designing and building a real-time rendering engine. Before joining The Foundry, Vil was the Head of Systems Development at industry-leading VFX house Framestore - a role he came to after a 5 year stint writing pipeline management software for Oil & Gas. He still wonders whether Framestore understood what type of pipeline his previous experience involved. A native of Perth, Western Australia, Vil moved to England straight after graduating from the computer science programme at Curtin University and has been there ever since. He now lives in Leeds with his wife and despite working with Mercedes has recently bought a VW.
Alexander Hilliger von Thile Senior Manager, Advanced Graphics & Rendering, Mercedes-Benz Research & Development
Alexander Hilliger von Thile joined MBRDNA in 2010, working in the field of high-performance real-time graphics for head units and instrument clusters, both for show cars and products. He was responsible for the UI/UX implementation of the Concept S-Class Coupe, the F 015 Luxury in Motion show car, and the Concept Intelligent Aerodynamic Automobile. He manages the Advanced Graphics and Rendering team in Sunnyvale, CA. His team works closely with designers and engineers to bring real-time rendering technologies into the vehicle.

Mercedes-Benz's history is defined by moments of innovation that have dramatically shaped and impacted the automotive industry. When the company sought a solution to support the efficient development and delivery of next-generation digital user experiences (UX), it engaged The Foundry to help create that solution. The solution, code-named Project Dash, leverages proven 3D content and digital visualization technology from The Foundry, existing Mercedes solutions, and custom software development. Working closely with Mercedes, The Foundry created a fully bespoke solution for real-time UI/UX design. With this solution, Mercedes UX designers can explore, create, and iterate faster, with high-quality content.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Media & Entertainment; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21E

S6587 - Lessons Learned from VR Navigation in Neurosurgery at UCLA

Moty Avisar Co-Founder and CEO, Surgical Theater
Moty Avisar is president and co-founder of Surgical Theater. He has led his company from startup venture to inception to maturity -- from the development of the idea and the business case throughout sales and revenue. He has led IP, regulatory, and financial strategies with successful execution -- achieving patent approval and FDA / 510k clearance in record time frames.
Neil Martin Chairman-Department of Neurosurgery-UCLA, UCLA Department of Neurosurgery
Dr. Neil Martin is Chair of Neurosurgery at Ronald Reagan UCLA Medical Center, and serves as Head of the Neurovascular Surgery Section. He is the Director of the UCLA Cerebral Blood Flow Laboratory, Medical Director of the UCLA Neurosurgery Intensive Care Unit and a co-Director of the UCLA Stroke Center. He is internationally known for his research regarding cerebral blood flow and brain metabolism following brain injury, and is recognized as one of the top specialists in the area of surgical treatment of cerebrovascular disease.
Alon Geri EVP Engineering & Co-Founder, Surgical Theater
Born & Raised in Israel, currently living in Cleveland, OH. Ex-Israeli Air Force Pilot and R&D Officer. Senior software engineer and Chief engineer of large scale Flight Simulation Programs for the Israeli Air Force. BSC, Computer Science & Mathematics. Recognized as creative, "out of the box" problem solver.

Learn about the real-world experiences of using virtual reality in the operating room from a leading brain surgeon. Dr. Neil Martin, chairman of Neurosurgery at UCLA, will share his insights and vision of how using virtual reality and enhanced 3D imaging is transforming complex surgery. Joining Dr. Martin will be Moty Avisar, co-founder and CEO of Surgical Theater, who will share how his company has transferred the science behind advanced flight simulation into a first-of-its-kind healthcare solution.

Level: Beginner
Type: Talk
Tags: Virtual Reality & Augmented Reality; Medical Imaging; Press-Suggested Sessions: Virtual Reality; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL20C

S6644 - How an Architectural Design Firm Leverages Virtual GPU Technology for Global Collaboration

Jimmy Rotella Design Application Specialist, CannonDesign
Jimmy Rotella received his B.A. in architecture from the Illinois Institute of Technology, and is now a design application specialist at CannonDesign in Chicago. In his 10 years of experience, he has worked for multiple large design firms implementing Revit, developing project standards, managing software and infrastructure, providing technical support for design applications and computers, and also teaching in both corporate and educational environments. His background in both IT and architecture puts him at the forefront of design technology and positions him to share his knowledge of new tools with others to help them build and realize their digital designs.

Learn the benefits that virtualization provides for an architecture and engineering design firm, along with the journey through the advancements in virtualization technology it took to finally meet the graphics-intensive needs of our design software. We'll share our actual experiences in how virtualization allows a large company, with over 15 offices and 1,000 people worldwide, to collaborate and work as a single firm. We'll show some cost comparisons with virtualization, along with their management benefits and requirements. We'll also look at the methods we used to set and test metrics specific to our requirements, and follow the results of those metrics through the changes in graphics virtualization technology.

Level: All
Type: Talk
Tags: Graphics Virtualization; Product & Building Design; Data Center & Cloud Computing; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Marriott Salon 4

S6667 - Revolutionizing Lattice QCD Physics with Heterogeneous Multigrid

Kate Clark HPC Engineer, NVIDIA
Kate Clark has worked at NVIDIA since 2011, where she works at the interface between applications, algorithms, and parallel computation. Kate's background is in high energy physics, having completed doctoral research in Monte Carlo algorithms for lattice quantum chromodynamics in 2005 and graduating from the University of Edinburgh. Upon subsequently moving to Boston University, Kate focused on adaptive multi-grid algorithms and symplectic integrators. It was during this time that research was initiated into harnessing GPUs for lattice QCD computation: this research has since evolved into the QUDA library. Kate spent 2009-2011 at Harvard University, continuing to work on algorithms for GPUs and many-core processors, with focus on signal processing.
Alexei Strelchenko Staff Scientist, Fermilab National Laboratory
Alexei Strelchenko joined the Scientific Computing Division's Lattice Quantum Chromodynamics (LQCD) group at Fermilab in 2013, coming from a postdoc in computational physics in Cyprus. Since 2010, he has been working on linear solver algorithms in the QUDA library, which enables high-efficiency LQCD computations to be performed on GPU-based HPC clusters. More recently, he has been applying similar techniques for Xeon Phi coprocessors, in preparation for the forthcoming Cray 'Cori' system to be based at NERSC.

Learn how combining GPUs with advanced multi-grid solvers is revolutionizing the study of lattice quantum chromodynamics (LQCD). LQCD is a computational tool for probing nuclear and particle physics; however, it can require thousands of GPUs working in tandem for months due to the computationally prohibitive linear solver. Using the QUDA framework, we describe how the solver can be accelerated using an adaptive multi-grid method. The optimization techniques employed are: fine-grained parallelization, mixed precision, communication-reducing solvers, and reformulation of the algorithm to allow the CPU and GPU to work in parallel. Using this multitude of algorithmic innovations, we demonstrate that a 5X speedup can be realized over present state-of-the-art methods using GPUs.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms; Performance Optimization

Day: Wednesday, 04/06
Time: 15:00 - 15:50
Location: Marriott Salon 6

S6681 - Benefits of Remote GPU Virtualization: The rCUDA® Perspective

Federico Silla Associate Professor, Technical University of Valencia
Federico received the MSc and PhD degrees in Computer Engineering from Technical University of Valencia, Spain, in 1995 and 1999, respectively. He is currently an associate professor at the Department of Computer Engineering (DISCA) at that university, where he teaches Computer Networks as well as High Performance Interconnects courses at the Computer Engineering School. He is also an external contributor of the Advanced Computer Architecture research group at the Department of Computer Engineering at the University of Heidelberg. Furthermore, he worked for two years at Intel Corporation, developing on-chip networks. His research addresses high performance on-chip and off-chip interconnection networks as well as distributed memory systems and remote GPU virtualization mechanisms. In this regard, he has been the coordinator of the rCUDA remote GPU virtualization project since it began in 2008.

Many applications use GPUs to accelerate their execution. However, using GPUs presents several side effects, such as increased acquisition and maintenance costs and space requirements. Moreover, these increased costs may not be easily amortized because GPUs usually present very low utilization rates. In a similar way to virtual machines, the use of virtual GPUs may overcome the concerns associated with the use of real GPU devices. The remote GPU virtualization technique allows an application running on a computer without a GPU to transparently make use of a GPU installed in another node of the cluster. Although the use of remote GPUs may seem to be a senseless idea, it provides several benefits, as described in this talk using the rCUDA (remote CUDA) middleware as a case study.

Level: All
Type: Talk
Tags: Data Center & Cloud Computing; Supercomputing & HPC; Tools & Libraries

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 210E

S6781 - Deep Neural Networks for Conversational Language Understanding

Kaheer Suleman CTO, Maluuba Inc.
As CTO and co-founder, Kaheer Suleman led the creation of Maluuba's deep-learning-based Natural Language Understanding platform and is the technology visionary behind the algorithms that power voice search across millions of devices. Kaheer's background is in information retrieval and artificial intelligence, and he previously worked in the artificial intelligence and information retrieval labs at the University of Waterloo under information retrieval experts Professor Pascal Poupart and Olga Vechtomova. Kaheer's expertise in building language understanding and conversational systems has served as a guide for Maluuba's research and development team through some of the toughest challenges in machine comprehension and spoken dialogue.

Recent advances in deep learning have all but solved speech recognition and image processing. The next frontier is natural language. Maluuba's vision is intelligent machines that can think, reason, and communicate, and we believe that language and intelligence are inextricable. We'll describe steps taken toward next-generation dialogue systems. Such systems should include the capacity to retain memory across dialogue turns and from past conversations, to clarify user intents through dynamic, back-and-forth speech, and to acquire new knowledge through interaction with humans. By taking a deep-learning approach, we have achieved state-of-the-art performance in natural language understanding and dialogue state tracking, tasks vital for any goal-driven dialogue system.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 210H

ECS6158 - Early Stage Challenge Finalist: Intelligent Voice

Nigel Cannings CTO, Intelligent Voice
Nigel became a lawyer because his father told him to go out and get a proper job, for which he is now eternally grateful. He qualified as a solicitor in 1993, and has worked for some of the world’s largest law firms and software companies. Nigel has been the CTO of Intelligent Voice, Speech Recognition Expert, for the last 7 years. As a keen technologist, Nigel is always on the lookout for new challenges, and so is often seen finding new ways of stretching existing techniques and technology. This has led to interesting commissions such as recovering data from an “unbreakable” portable recording device. He has also gained UK government recognition for high tech research in the form of a large grant to explore leading edge problems in speech research, such as ultra-high speed GPU accelerated speech recognition, and emotional analysis of telephone calls. Nigel contributes regularly to a number of publications, including the Huffington Post and the Global Legal Post, as well as blogging on the Intelligent Voice website. He has featured in a number of newspaper articles, including the front page of the Wall Street Journal. The WSJ also made a video looking at the advanced techniques used by Intelligent Voice to track trader wrongdoing.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Finance

Day: Wednesday, 04/06
Time: 15:10 - 15:20
Location: Room 220B

ECS6177 - Early Stage Challenge Finalist: Horus Technology

Saverio Murgia CEO, Horus Technology
When I was a child I used to dream of becoming an inventor whose inventions could improve the life of millions of people. Now, I am CEO and co-founder at Horus Technology, a startup whose goal is exactly that: improve the life of nearly 300 million people who are blind or visually impaired. I hold an MSc in Advanced Robotics and a BSc in Biomedical Engineering and I spent time in some of the top institutions in my field: EPFL, IIT (Italy) and Ecole Centrale de Nantes. Proud alumnus of ISICT, the excellence school at the University of Genova, where I studied subjects like management, economics and law.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 15:20 - 15:30
Location: Room 220B

ECS6167 - Early Stage Challenge Finalist: Hypercubes Inc

Yen Yang Lim CEO, Hypercubes
A published researcher in the medical Internet of Things, a self-taught space engineer, and five-year veteran of the Australian startup scene, Brian Lim believes that we must ask the hardest questions in the hardest places to get the best answers! Brian Lim understands the nature of innovation in the space industry, and the new wave of space engineering and opportunities that are presenting themselves. He remains on the cutting edge of technology, innovation, and disruption. He remains a firm believer that the space industry holds new opportunities that the globe needs for economic recovery.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Earth System Modelling; Aerospace & Defense; Emerging Company Summit

Day: Wednesday, 04/06
Time: 15:30 - 15:40
Location: Room 220B

S6167 - GPU-Accelerated Radiological Knowledge Extraction System at MGH

Synho Do Assistant Medical Director, MGH and Harvard Medical School
Synho Do, Ph.D., is an assistant in physics at Massachusetts General Hospital, where he is a technical committee member of Webster Center for Advanced Research and Education in Radiation, and instructor at Harvard Medical School. Synho received a Ph.D. degree in Biomedical Engineering from University of Southern California. He is currently a member of IEEE Signal Processing Society, Bio-Imaging and Signal Processing (BISP). He is an MGH site PI for NVIDIA CUDA Research Center (CRC). Synho's current research interests include statistical signal and image processing, estimation, detection, and medical signal and image processing, such as computed tomography. He has been a co-investigator for multiple medical imaging projects, and co-PI/PI on medical (e.g., GE, Siemens, and Philips) and security (e.g., DHS, DARPA) image reconstruction projects.

We'll present a novel GPU-accelerated knowledge extraction system as decision support for radiologists to help reduce human error and improve workflow efficiency. The Massachusetts General Hospital (MGH) Picture Archiving and Communication System (PACS) boasts a database of 20 billion radiology images across 13 million studies that remains limited by a system that is not "intelligent" and fully optimized to extract value for patient care. We are developing a powerful GPU computing system using NVIDIA DIGITS DevBox and GPU clusters to process the vast amounts of data sets and extract clinically relevant radiological knowledge that can enhance image interpretation. We'll introduce the architecture of our radiological knowledge extraction system and the results of the training.

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Big Data Analytics; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 212B

S6204 - Energy Consumption Evaluation for Krylov Methods on Cluster of GPU Accelerators

Serge Petiton Professor, University of Lille 1, Sciences and Technologies
Serge G. Petiton is a professor at the Scientific and Technical University of Lille. He received his Ph.D. in computer science in 1988 and the "Habilitation a diriger des recherches" in 1993 from Pierre and Marie Curie University, in Paris, France. Serge was a post-doc student, registered at the graduate school, and junior research scientist at Yale University from 1989-1990. He was a researcher at the "Site Experimental en Hyperparallelisme" (supported by CNRS, CEA, and French DoD) from 1991 to 1994. Serge also was an affiliate research scientist at Yale and visiting research fellow in several U.S. laboratories, especially in NASA-ICASE and the AHPCRC during the period 1991-1994. Serge leads the "Methodology and Algorithmic Programming" group of the CNRS "Laboratoire d'Informatique Fondamentale de Lille," and he participated in the INRIA Saclay "Grand Large" project. Serge has been scientific director of more than 22 Ph.D.s and has authored more than 100 articles in international journals and conferences. His current research interests are in "Parallel and Distributed Computing," "Post-Petascale Auto/smart-tuned Dense and Sparse Linear Algebra," and "Language and Programming Paradigm for Extreme Modern Scientific Computing," targeting especially geoscience and big data applications.

We'll evaluate the energy consumption of several orthogonalization methods computed during Krylov methods for large linear algebra problems on a supercomputer using dozens of GPU accelerators. We analyze that performance with respect to several methods optimizing the communications between nodes, from incomplete orthogonalization to "communication-avoiding" techniques using a multi-GPU TSQR method we implemented. We'll compare the impact of different algorithms to compute sparse-matrix vector multiplications, using hypergraph techniques in particular, with respect to different sparse matrices. After experimenting on a supercomputer using several dozen GPUs, we conclude that sometimes we have to find a tradeoff between energy consumption and the number of iterations using smart-tuning.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Algorithms; Performance Optimization

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 211A

S6308 - Image Super-Resolution: From Sparse Coding to Deep Network

Zhaowen Wang Research Scientist, Adobe Systems Inc.
Zhaowen Wang is a research scientist at Adobe Systems, Inc. His research areas include image understanding and enhancement through machine learning algorithms, with a special interest in deep learning. Before joining Adobe, Zhaowen obtained a Ph.D. from the University of Illinois at Urbana-Champaign in 2014.

Learn how to combine a conventional signal processing model with a deep neural network to achieve state-of-the-art performance for single-image super-resolution. Representing an image signal with its sparse coefficients has been proven as an effective prior for many image restoration problems including super-resolution. We design a deep convolutional network that mimics the sparse coding model and at the same time has the same advantage of end-to-end optimization as other deep learning models. By unifying the strengths of good image prior and large learning capacity, our method generates much better upscaling results than vanilla sparse coding and neural network in both visual and numerical quality. The learned network has a very compact size and can be implemented efficiently on a GPU.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL21C

S6321 - How Deep Learning Works for Automated Customer Service

Chenghua (Kevin) Li Chief Scientist of DNN Lab, JD.COM
Dr. Chenghua Li is the chief scientist in the deep neural network (deep learning) laboratory of JD, in charge of promoting the application of deep learning technologies in JD products. He was a data mining expert in the National Key Laboratories of Hisense, in charge of intelligent hardware innovation and the development of data mining. Chenghua has been researching and working in machine learning, especially in neural networks and data mining, for decades. He has published more than 30 papers in world-leading academic journals such as Expert Systems with Applications, Information Processing and Management, Knowledge-Based Systems, and Neurocomputing, and holds more than 10 patents. He received his Ph.D. in data mining and machine learning at Chonbuk National University and finished his post-doctoral research at St. Francis Xavier University and York University in Canada. He was also a visiting scientist at MIT Media Lab.

Deep learning research and applications have seen numerous successes in the fields of image processing and speech recognition. However, in the field of natural language processing, deep learning is still underutilized. This session will share the relevant technology and the development process of the intelligent customer service robot, as well as related machine learning, deep learning, and natural language processing technology. We'll also discuss the application of deep learning to natural language processing and automatic question answering systems, the role it plays in business, and how it enhances the ability to answer customer questions and boost customer satisfaction.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Robotics & Autonomous Machines; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 210H

S6342 - Putting Tegra into Drive: Safe, Secure, and Seamless Vehicle Integration

Ulrich Meis Senior Software Engineer, OpenSynergy
Ulrich Meis is currently a software engineer at OpenSynergy GmbH dealing with virtualization and especially graphics virtualization for automotive systems. Before joining OpenSynergy he has worked as a researcher in the area of wireless multi-hop networks at institutes in Japan and Germany. He holds a diploma in computer science from RWTH Aachen University, Germany. Software development is his passion.

A solution for vehicle integration targeting the NVIDIA Tegra Jetson Pro and DriveCX platforms will be presented. Communication with the vehicle via the automotive CAN bus is managed by a system that runs separately from other functions in its own execution environment and backed by its own real-time operating system -- all based on the industry's standard Automotive Open System Architecture (AUTOSAR). Learn about the various benefits this design often has versus handling CAN directly in systems like Linux, Android, or QNX.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL21E

S6349 - XMP Library Internals: Modular Multiplication on Kepler and Maxwell

Niall Emmart Ph.D. Student, University of Massachusetts
Niall Emmart is working towards his Ph.D. in computer science at the University of Massachusetts, Amherst, where his research focuses on high-performance, multiple-precision arithmetic across recent NVIDIA GPU architectures. Niall received a B.S. in pure mathematics from University of Massachusetts, Amherst, in 1992. From 1992 to 2012 he ran Yrrid Software, a small firm focusing on legacy system integration with the web.

We'll present an overview of the internals of the XMP multiple precision library and take a detailed look at the low-level algorithms used for modular squaring and modular multiplication on Kepler and present novel algorithms for Maxwell. Modular multiplication is a performance-critical primitive and widely used in cryptographic algorithms from prime testing and factorization to public key/private key algorithms such as RSA, Diffie-Hellman, and digital signatures.

Level: Intermediate
Type: Talk
Tags: Algorithms; Tools & Libraries

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Marriott Salon 3

S6396 - Experience Running a HPC System with up to 16 GPUs per Node

Maxime Boissonneault HPC Specialist, Calcul Québec - Compute Canada
Maxime Boissonneault earned his Ph.D. in quantum physics at Université de Sherbrooke and has been working as a high performance computing specialist for Calcul Québec at Université Laval since 2013. While he has no academic background in computing, he has learned numerous programming languages starting at the age of 12, going from Java to C++ through Python and C#. He has been giving CUDA training classes since 2014, and has been managing the Calcul Québec Helios GPU cluster located at Université Laval.

We'll describe our experience running an HPC cluster composed of high-density GPU nodes of either 8 Tesla K20 or 8 K80 GPU accelerators (16 logical GPUs). This system is being used by a wide variety of researchers who run jobs ranging from 1 to 16 GPUs per node. As this heterogeneous workload requires the sharing of nodes, we'll detail how the system was tuned to achieve the best balance between shareability, stability, and overall performance of the cluster. We'll communicate our experience on the following topics: [1] Restricting access to GPU devices, [2] Benchmarks to identify ideal workloads per node type, [3] NUMA nodes and the impact of sharing resources on memory management.

Level: Advanced
Type: Talk
Tags: Data Center & Cloud Computing; Supercomputing & HPC

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 210E

S6478 - Real-Time Visualization of CUDA® Data Using ArrayFire Forge

Brian Kloppenborg Research Scientist, ArrayFire
Brian Kloppenborg is an astrophysicist turned high performance computing engineer and currently a research scientist at ArrayFire. As a research scientist, he writes massively parallel, high performance software with applications in physics, astrophysics, and computer vision. Brian is particularly involved with DARPA's MEMEX project, a program which seeks to identify individuals who might be subject to human trafficking. Additionally, Brian is an adjunct professor in the Department of Physics and Astronomy at Georgia State University. As a professor, he conducts astrophysical research using spatially resolved optical interferometric imaging of stellar surfaces, eclipsing binary stars, and novae.

We will debut ArrayFire Forge, our new general-purpose data visualization library for GPUs. ArrayFire Forge is a data visualization library that is written specifically for use with GPU-accelerated applications. By using interoperability with OpenGL, Forge enables developers to create real-time, responsive, and stunning visualizations in 2D and 3D. Forge is an open-source project and distributed on GitHub.

Level: Intermediate
Type: Talk
Tags: Tools & Libraries; Real-Time Graphics; Graphics Virtualization

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 211B

S6492 - If It's Not Reproducible, It's Not Worth It! Deterministic Machine Learning and Molecular Dynamics

Scott LeGrand CTO, AMBER Inc.
Highly-Rated Speaker
Scott LeGrand is a principal engineer at Amazon Inc., working on neural network-based recommendation systems. In college, he developed the first molecular modeling system for home computers, Genesis; Folderol, the distributed computing project targeted at the protein folding problem in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5X speedup over previous efforts and accounting for 2.6 petaFLOPs of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. Scott earned a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. He is currently obsessed with deep neural network performance optimization.

Parallel algorithms are hard. Data-parallelizing such algorithms is even harder. Extending such parallelization to multiple GPUs can tempt one to relax the reproducibility of computations in order to simplify reductions and allow for dynamic load-balancing as the distribution of the underlying data shifts throughout a computation. But once done, it's impossible to detect any sort of race condition in your codebase. Race condition effects can range from introducing mostly harmless noise in an approximate algorithm to catastrophic corruption that destroys the validity of computations. This talk will show how to maintain high performance on GPU clusters without losing reproducibility with examples in machine learning, deep neural networks, and molecular dynamics.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry; Deep Learning & Artificial Intelligence; Algorithms

Day: Wednesday, 04/06
Time: 15:30 - 16:20
Location: Marriott Salon 5

S6541 - Augmented Reality for In-Vehicle Head-Up Displays

Christian Reinhard Senior Director, HMI, Elektrobit
Christian Reinhard is Senior Director, HMI, at Elektrobit (EB). Since 2012, he's led the department that is responsible for developing software solutions for industry-leading graphical, touch and speech user interfaces for infotainment and instrument clusters as well as industrial and medical applications. Christian studied computer sciences at Friedrich-Alexander-Universität in Erlangen and joined EB in 2001. During his career, he held different positions at EB and was responsible for the development of navigation systems for the automotive and consumer markets before joining the HMI department.

Find out why technology like the augmented reality head-up display will be a critical component in autonomous vehicles, providing software-enabled features that will make safe autonomous driving possible. Infotainment systems and graphical interfaces are key differentiators for carmakers in the age of smartphones. Consumer electronics-inspired technologies and multimodal HMIs enriched with app-like content are becoming commonplace in vehicles, and HMI will play an increasingly important role in automated/autonomous driving. Its critical functionality will help transition control from the driver to the car and vice versa while taking into account the driver's status and distraction level. This will require HMI and driver-assistance software to work together more closely, across all screens in the car.

Level: Intermediate
Type: Talk
Tags: Self-Driving Cars & Automotive; Embedded; Virtual Reality & Augmented Reality; Product & Building Design; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Marriott Salon 2

S6596 - IFM Technologies: Intelligent Flying Machines for Indoor Applications

Marc Gyongyosi Founder, IFM Technologies
Marc Gyongyosi is a junior in computer science at Northwestern University in the McCormick School of Engineering. For the past two years, he has been working closely together with BMW's robotics research department to develop novel robotic systems assisting workers at BMW factories. At BMW, Marc's primary research focus is the implementation and development of cooperative lightweight robots. At Northwestern's The Garage, Marc is involved in two startups: at MDAR Technologies, he works on a novel 3D vision system for self-driving cars and other autonomous vehicles. As the founder of IFM Technologies, he develops novel "Intelligent Flying Machines," i.e., Drones for Decisions. IFM Technologies aims to increase productivity and improve efficiency in everyday manufacturing and logistics processes.

We'll present recent advancements in leveraging the GPU on board IFM Technologies' "Intelligent Flying Machines." IFM provides industrial, indoor flying platforms for data-driven decisions in the manufacturing and logistics industry, along with a complete framework to collect, visualize, and leverage three-dimensional data analysis in indoor environments. Using the onboard GPU, IFM Technologies takes innovative production and logistics technology to a -- quite literally -- new dimension.

Level: Intermediate
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; IoT

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL20D

S6743 - Large Scale Video Processing for Virtual Reality

Arthur van Hoff CTO, Jaunt VR
Arthur van Hoff is a serial entrepreneur and was most recently CTO at Flipboard. He started his career in Silicon Valley at Sun Microsystems where he was an early developer of the Java programming language. Since then he has started several successful companies including Marimba (IPO 1999), Strangeberry (acquired by TiVo), ZING (acquired by Dell), and Ellerdale (acquired by Flipboard). Arthur has expertise in machine learning, big data, mobile applications, 3D printing, and computational photography. He is originally from the Netherlands and has a master's degree in Computer Science from Strathclyde University in Glasgow.

Jaunt VR has developed a GPU-based, large-scale video processing platform that combines multiple HD camera streams in a radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud-based GPU instances.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL20C

S6751 - Rendering Lost Historical Buildings with NVIDIA Technology

Andrew Rink Marketing Strategy, NVIDIA
At NVIDIA, Andrew Rink is responsible for global marketing strategy for Manufacturing and AEC Industries. With 25 years’ international experience in various industries including CAD and animation software, lasers and photonic power, Andrew has extensive understanding of business challenges faced by companies around the world and expertise in bringing innovative technology to market. Based at NVIDIA’s Silicon Valley headquarters, he has travelled to over 80 countries and is fluent in three languages.

Powerful new tools are now available to accelerate and improve architectural visualization. We'll present an overview of how NVIDIA Iray plugins, Iray Server distributed rendering software, Quadro pro GPUs, and the Quadro Visual Computing Appliance are harnessed to drive interactive photorealistic rendering for building design visualization. When the Bank of England made renovations in the 1920s, almost all the neoclassical design contributed by Sir John Soane was lost. Project Soane was launched in 2015 to produce a crowdsourced digital model of Soane's work. Now that model is being used to create photorealistic renders of the building from the early 1800s. Join this session to hear how history is being rendered with leading-edge technology.

Level: All
Type: Talk
Tags: Product & Building Design

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL21A

S6762 - Reimagining Cartography for Navigation

Eric Gundersen CEO, Mapbox
As CEO of Mapbox, Eric Gundersen coordinates product and business development. Eric has been with the team since the start and helped grow Mapbox out of the need for a better mapping platform. The platform is now powering maps with NVIDIA, Foursquare, Pinterest, MapQuest, CNN, and thousands more. Mapbox has over 130 people worldwide, with main offices in Washington DC and San Francisco.

Attendees will walk away with an appreciation for how modern computing power and GPUs are enabling a whole new world of map design potential for the car. Vector-based maps can render data on the fly at 60 fps, taking in-car map design to a more video game-like state. The driving experience can be seamless across devices and tailored to exactly what a user needs for any specific use case.

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Embedded; Real-Time Graphics; Press-Suggested Sessions: Self-Driving Cars & Auto

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL21B

ECS6174 - Early Stage Challenge Finalist: Beijing BriSky Technology Development Corp. Ltd.

Bin Zhou CEO, Beijing BriSky Technology Limited Corp.
Dr. Bin Zhou is the co-founder of Beijing BriSky Technology Development Corp. Ltd (BriSky). He received his B.S., M.S., and Ph.D. in electronic engineering and communication from Tsinghua University in Beijing. He also holds an M.S. degree in computer engineering from George Mason University. He worked for NVIDIA Corp. as a senior HPC developer engineer for a year. After that, he worked as the director and chief scientist of a research lab at the Shandong Academy of Sciences. He also works as an adjunct research professor at the University of Science and Technology of China. In 2013, he was selected as an NVIDIA CUDA Fellow for his work in CUDA/GPU research and evangelism. His research interests include GPU-based DNN algorithms, UAS systems, computer vision, and embedded supercomputing. He founded BriSky, which focuses on providing innovative UAS services and solutions for industrial customers.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; Emerging Company Summit

Day: Wednesday, 04/06
Time: 15:40 - 15:50
Location: Room 220B

ECS6207 - Early Stage Challenge Finalist: Tempo Quest, Inc.

Allen Huang CTO, Tempo Quest, Inc.
Allen Huang received his Ph.D. in the area of satellite remote sensing from the University of Wisconsin-Madison in 1989. In the same year he joined the Cooperative Institute for Meteorological Satellite Studies, Space Science and Engineering Center, University of Wisconsin-Madison. He is currently a Distinguished Scientist of UW-Madison, a Fellow of the International Society for Optical Engineering (SPIE), PI of many NOAA- and NASA-sponsored projects, an Adjunct Professor at several universities, CEO of Hyper Sensing, LLC, and CTO of Tempo Quest, Inc.

Level: All
Type: Talk
Tags: Earth System Modelling; Emerging Company Summit

Day: Wednesday, 04/06
Time: 15:50 - 16:00
Location: Room 220B

ECS6160 - Early Stage Challenge Finalist: SADAKO TECHNOLOGIES, S.L.

Eugenio Garnica González-Bárcena Cofounder and CEO, SADAKO TECHNOLOGIES, S.L.
Industrial Engineer. Associate teacher of Project Management at UPC University. Experience in international nuclear services management and project management at Iberdrola, having managed international projects worth several million euros with teams of a hundred people. He left the company to found his own venture with the aim of developing "Technology for a better World". High technical profile and management skills.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Robotics & Autonomous Machines; Emerging Company Summit

Day: Wednesday, 04/06
Time: 16:00 - 16:10
Location: Room 220B

S6199 - Raytracing Scientific Data in NVIDIA OptiX™ with GVDB Sparse Volumes

Rama Hoetzlein Graphics Research Engineer, NVIDIA
Rama Hoetzlein's current research with NVIDIA explores data structures for large-scale simulation and volume rendering. Rama completed a dual-degree in computer science and fine arts from Cornell in 2001, with research in robotics and imaging. In 2010, his dissertation at the University of California, Santa Barbara, focused on tools for creative interaction in procedural modeling for media artists. In 2010, Rama was co-director and lead scientist of the Transliteracies project in the Digital Humanities, and professor of media studies at the Medialogy program in Copenhagen with a focus on visual effects and animation.
Tom Fogal Senior Software Engineer, NVIDIA
Thomas Fogal is an NVIDIA engineer specializing in HPC visualization. As a doctoral student, he worked on parallel volume rendering techniques as well as novel approaches to in situ visualization. At the Scientific Computing & Imaging Institute, ORNL, and LLNL, he worked on parallel rendering for large scientific data. Thomas holds a B.S. and M.S. from the University of New Hampshire, and will soon have a doctorate from the University of Duisburg-Essen in Germany.

We present a novel technique for visualization of scientific data with compute operators and multi-scatter ray tracing entirely on GPU. Our source data consists of a high-resolution simulation using point-based wavelets, a representation not supported by existing tools. To visualize this data, and consider dynamic time-based rendering, our approach is inspired by OpenVDB from motion pictures, which uses a hierarchy of grids similar to AMR. We develop GVDB, a ground-up implementation with tree traversal, compute, and ray tracing via OptiX all on the GPU. GVDB enables multi-scatter rendering at 200 million rays/sec, and full-volume compute operations in a few milliseconds on datasets up to 4,200^3 entirely in GPU memory.
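To put the 4,200^3 dataset size in perspective, a quick back-of-the-envelope calculation suggests why a sparse hierarchy of grids is attractive compared with a dense array. This is only a sketch: it assumes one 4-byte float per voxel, which the abstract does not state, and the function name is hypothetical.

```python
def dense_volume_bytes(side, bytes_per_voxel=4):
    """Bytes needed to store a side^3 volume as a dense array.

    bytes_per_voxel=4 (one float32 per voxel) is an assumption for
    illustration; the abstract does not give the voxel format.
    """
    return side ** 3 * bytes_per_voxel


side = 4200
voxels = side ** 3
dense_bytes = dense_volume_bytes(side)

print(f"{voxels:,} voxels")
print(f"{dense_bytes / 1e9:.1f} GB if stored densely at 4 bytes per voxel")
```

Under that assumption a dense copy would be far larger than the memory of a single 2016-era GPU, so fitting such a dataset in GPU memory depends on storing only the occupied regions.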

Level: All
Type: Talk
Tags: In-Situ and Scientific Visualization; Rendering & Ray Tracing; Computational Fluid Dynamics

Day: Wednesday, 04/06
Time: 16:00 - 16:50
Location: Room LL21D

S6242 - Parallel CAFFE Framework Based on GPU Cluster: IB + GPU Cluster + Lustre + MPI

Qing Zhang HPC Application R&D Manager, Inspur Electronic Information Industry Co., Ltd
Qing is currently the HPC Application R&D Manager at Inspur Group. He is chief architect of the Inspur-Intel China Parallel Computing Joint Lab and the Inspur-NVIDIA GPU Joint Lab, and is in charge of both labs. His research directions include HPC, multi-core CPU parallel computing, and GPU/MIC/FPGA heterogeneous computing. His research focuses on HPC applications, deep learning applications, and Internet Data Center (IDC) applications, in fields including oil and gas, CFD/CAE, life science, finance, and the Internet. He is an expert in heterogeneous computing and leads an HPC optimization team of more than 10 people. He is the author of the book "High Performance Computing on the Intel Xeon Phi", which is the first MIC technology guide in the world.

The goal of this session is to explain how to design a parallel CAFFE framework on a GPU cluster platform to handle big data. We'll cover three different kinds of MPI parallel mechanisms and show how to optimize data reading, network communication, and multi-GPU parallel efficiency.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 210E

S6262 - A Comparison of Accelerator Architectures for Signal-Processing Algorithms

John Romein Senior Researcher HPC, ASTRON (Netherlands Institute for Radio Astronomy)
John Romein is senior researcher at ASTRON, where he leads several projects on HPC research for radio-astronomical applications. His primary focus is the use of accelerator hardware. He implemented the Blue Gene/P correlator for the LOFAR radio telescope. John received his Ph.D. in computer science on distributed game-tree search at the Vrije Universiteit, Amsterdam, in 2001. As a postdoctoral researcher, he solved the game of Awari using a large computer cluster and did research on parallel algorithms for bioinformatics. His research interests include high-performance computing, parallel algorithms, networks, programming languages, and compiler construction.

We'll compare different accelerator platforms (GPUs from NVIDIA and AMD, the Xeon Phi, a DSP from Texas Instruments, and a regular Xeon CPU as reference platform) for signal-processing algorithms that are used in radio astronomy (e.g., a filter, correlator). We'll show why the architectures are (in)efficient, discuss which architecture-(in)dependent optimizations are necessary, report on energy efficiency, and assess programmability.

Level: Advanced
Type: Talk
Tags: Performance Optimization; Supercomputing & HPC; Signal & Audio Processing

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 212A

S6395 - Graph Database and Analytics in a GPU-Accelerated Cloud Offering

Brad Bebee CEO, Blazegraph
Brad Bebee is the CEO of SYSTAP, leading efforts to deliver graphs at scale with Blazegraph products. An expert in graphs and large-scale analytics, he has a diverse background in software development, telecommunications, and information retrieval.
Dave Driggers Chief Executive and Technical Officer, Cirrascale Corporation
David Driggers is the chief executive officer and original founder of Cirrascale, responsible for the company's strategic direction. David establishes the technology roadmap of the company and provides guidance for hardware technology development. He is directly responsible for the patents surrounding the company's vertical cooling technology and blade-based products.

Blazegraph GPU provides 300X acceleration for SPARQL graph query and graph database management with acceleration for existing RDF/SPARQL and Property Graph (Tinkerpop) applications. Multi-GPU configurations can effectively manage billion+ edge graphs on single-node machines with 4 or 8 K80 GPU accelerators. This is a cost-effective way to deliver high performance for graphs, but many end-users and applications do not have existing multi-GPU systems; current cloud offerings at this scale are not generally available. Cirrascale has developed a cloud-based solution for provisioning multi-GPU Tesla systems using its switch riser technology. This session details the Blazegraph GPU cloud offering on Cirrascale, demonstrates how to quickly deploy it in the cloud, and shows graph benchmarks on cloud systems.

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Data Center & Cloud Computing; Aerospace & Defense

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 210F

S6400 - Quants Coding CUDA® in .NET: Pitfalls and Solutions

Benjamin Eimer Quantitative Developer, Chatham Financial
Benjamin Eimer has been a quantitative developer at Chatham Financial for the past four years, where he focuses on model development and performance. Before working in finance, Ben worked for the National Institute for Occupational Safety and Health as an aerosol scientist. Ben received his Ph.D. in physics with an emphasis in computational modeling and material science from New Mexico State University in 2006.

We'll cover some of the lessons we have learned in developing a hybrid GPU/CPU linear algebra library in .NET to accelerate the financial risk and derivative pricing models developed by our quant team. The purpose of this library is to allow our team to transition to GPU computing incrementally within our extensive .NET codebase. We'll present some of the difficulties encountered when .NET automated garbage collection interacts with low-level memory management and how we addressed them. Solving these problems is essential for running CUDA code as part of a highly available, web service architecture.

Level: All
Type: Talk
Tags: Finance; Tools & Libraries; Programming Languages

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Marriott Salon 1

S6442 - Recognizing Cancerous Cells in Histology Imagery Using Deep Learning

Ted Hromadka Senior Software Engineer, Integrity Applications Incorporated
Theodore Hromadka is a senior software engineer at Integrity Applications Incorporated, working on the HPCMP Portal project. His research areas include deep learning, high-performance computing, and mobile app development. He is a graduate student in computer science at University of California, San Diego.
Ken Abeloe Co-Founder, Integrity Applications Inc.
Ken Abeloe is a co-founder of Integrity Applications Inc. Ken leads the development and delivery of cutting edge software products. Ken supports IAI's research and development efforts emphasizing web services integration, geospatial data fusion, and precision strike concepts. Ken is a Physics graduate from the University of California, San Diego.
Niels Olson Pathology Resident, Naval Medical Center San Diego
Niels Olson is a pathology resident at Naval Medical Center San Diego. His research includes prostate cancer biomarkers and image analysis. Niels earned his MD at Tulane. He majored in Physics at the Naval Academy and served as a surface warfare officer. His hobbies are scientific python and surfing.

We'll present the results of applying deep learning techniques and GPUs towards classification of histology imagery. At Naval Medical Center, San Diego, pathologists manually inspect biopsy samples to identify cancerous cells amid healthy tissue. This process is time-intensive and susceptible to errors caused by fatigue. Using DIGITS, Caffe and GPUs, researchers are automating this process.

Level: Beginner
Type: Talk
Tags: Medical Imaging; Deep Learning & Artificial Intelligence; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 212B

S6497 - Neural Attention for Object Tracking

Brian Cheung Ph.D. Student, UC Berkeley
Brian Cheung is a Ph.D. student at UC Berkeley working with Professor Bruno Olshausen at the Redwood Center for Theoretical Neuroscience. His research interests lie at the intersection between machine learning and neuroscience.

With differentiable forms of attention being integrated into neural networks, end-to-end training with backpropagation is possible. We adopt the recently proposed attention mechanism in spatial transformer networks (STNs) into a recurrent architecture to perform object tracking. We show that this attention mechanism has significant overlap with the mechanism in deep recurrent attentive writer (DRAW) networks, which have been successfully used to create generative models of images. We present an end-to-end trainable recurrent attention model for tracking a variety of objects in video recorded by cameras mounted on an automobile.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Robotics & Autonomous Machines; Computer Vision & Machine Vision; Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 210H

S6565 - Creating Unique Customer Relationships with Deep Learning in the Cloud and in the Car

Nick Black Chief Product Officer, CloudMade
Nick Black is a product-focused founder and entrepreneur with domain experience in location-based services, connected cars, and automotive infotainment. His experience includes building and leading product planning and product design organizations through full product life cycles from concepting, research, co-creation, business model development, prototyping, user testing, development, and release cycles. At CloudMade, Nick leads product planning and design, where he is focussed on designing, specifying, and delivering CloudMade's self-learning car solutions.

The car presents a particular challenge for creators of learning systems -- it is incredibly rich in data and context, its hardware and software environments are heterogeneous and fragmented, and drivers expect incredible precision from its interactions. CloudMade has pioneered an approach to machine learning in the automotive context that leverages the richness of car data, the emerging computational power of the car, and the existing computational power of the cloud to deliver an automotive-grade machine learning toolset. With CloudMade's solutions, automotive OEMs can deliver personalized experiences to customers that together create a self-learning car that anticipates the needs and desires of the user.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence; Embedded

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL21E

S6666 - Fast Splittable Pseudorandom Number Generators

Guy Steele Software Architect, Oracle Labs
Guy Steele is a software architect at Oracle Labs, where he is responsible for research in language design and implementation strategies, and architectural and software support for programming languages. He has also been an assistant professor of computer science at Carnegie-Mellon University; a member of technical staff at Tartan Laboratories in Pittsburgh, Pa.; and a senior scientist at Thinking Machines Corporation in Cambridge, Mass. Guy joined Sun Microsystems (later acquired by Oracle) in 1994 as a distinguished engineer and was named a Sun Fellow in 2003. He received his B.A. in applied mathematics from Harvard College (1975), and his M.S. and Ph.D. in computer science and artificial intelligence from M.I.T. (1977 and 1980).

We describe two new classes of algorithm for a "splittable" pseudorandom number generator (PRNG) that is quite fast: either 9 or 11 64-bit arithmetic/logical operations per 64 bits generated. A splittable PRNG provides a "split" operation that creates a new PRNG that is computationally and statistically independent of its creator and therefore may be used in parallel. Splittable PRNG objects make it easy to organize the use of pseudorandom numbers in multithreaded programs where the number of threads may vary dynamically, but also have sufficient speed and quality to be useful when the number of threads is fixed. It is faster than MRG32k3a and of higher quality than XORWOW. No locking or synchronization is required, and the algorithm is quite suitable for SIMD or GPU implementation.
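As a rough illustration of the "split" operation described above, the following Python sketch follows the general shape of a SplitMix-style generator: a 64-bit state is advanced by an odd increment (a "gamma") and passed through a bit-mixing function, and splitting derives a fresh state and gamma for the child. The class name, method names, and the particular mixing constants are assumptions chosen for illustration (they are widely published 64-bit mixer constants); the abstract does not specify the algorithms presented in the talk, and this sketch makes no claim about speed or statistical quality.

```python
MASK64 = (1 << 64) - 1
GOLDEN_GAMMA = 0x9E3779B97F4A7C15  # odd constant derived from the golden ratio


def mix64(z):
    """Scramble a 64-bit value (multiply-xorshift mixer; constants assumed)."""
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


def mix_gamma(z):
    """Derive an odd increment for a child generator."""
    z = ((z ^ (z >> 33)) * 0xFF51AFD7ED558CCD) & MASK64
    z = ((z ^ (z >> 33)) * 0xC4CEB9FE1A85EC53) & MASK64
    z = (z ^ (z >> 33)) | 1  # force the increment to be odd
    # Avoid gammas with too few 01/10 bit transitions.
    if bin(z ^ (z >> 1)).count("1") < 24:
        z ^= 0xAAAAAAAAAAAAAAAA
    return z


class SplittableRng:
    """Hypothetical splittable PRNG: no locks, no shared state."""

    def __init__(self, seed, gamma=GOLDEN_GAMMA):
        self.seed = seed & MASK64
        self.gamma = (gamma | 1) & MASK64

    def _next_seed(self):
        self.seed = (self.seed + self.gamma) & MASK64
        return self.seed

    def next64(self):
        """Return the next 64-bit pseudorandom integer."""
        return mix64(self._next_seed())

    def split(self):
        """Return a new generator that can be handed to another thread."""
        new_seed = mix64(self._next_seed())
        new_gamma = mix_gamma(self._next_seed())
        return SplittableRng(new_seed, new_gamma)


parent = SplittableRng(2016)
child = parent.split()  # e.g. one child per newly spawned task
parent_values = [parent.next64() for _ in range(4)]
child_values = [child.next64() for _ in range(4)]
```

Because each generator owns its state, a task can call `split()` whenever it spawns a subtask and neither side needs to synchronize with the other afterwards.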

Level: All
Type: Talk
Tags: Algorithms; Tools & Libraries; Performance Optimization

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Marriott Salon 3

S6677 - Photogrammetry and Virtual Reality: Transporting Real Sites into VR

David Finsterwalder CTO, realities.io
Working in archeology, David gained profound knowledge of 3D reconstruction through LiDAR scanning, UAV, and ground photogrammetry. Driven by the question of how to unlock the full potential of the gathered data for research and public relations alike, he started to look into the use of game engines and VR HMDs for visualization. Amazed by the possibilities of scene reconstruction and VR HMDs, not only for archeology but also for virtual tourism and gaming, he decided to dedicate all his efforts to immersive technology and founded realities.io, which, among other projects, visualized 3D scans of two archeological excavations and a medieval castle.
Daniel Sproll CXO, realities.io
With a background in cognitive science and virtual reality psychology research, Daniel used the latest VR renaissance to dive into the field of VR user experience design. His design process combines perceptual psychology with rapid prototyping to explore the huge uncharted territory that is VR interaction and UX design. His special interest lies in applications beyond gaming, from interactive storytelling to data visualization. Daniel is a co-founder and CXO at realities.io.

In this session, we will give a detailed overview of the applications for photogrammetry in the new medium of virtual reality and offer an insight into the workflow involved. Photogrammetry allows the 1:1 visualization of real-world sites in a virtual environment, but poses unique challenges both for content creators and for GPU hardware. Photogrammetric reconstruction and texturing are computationally very intensive and thus profit heavily from the parallel processing provided by software layers like CUDA. Attend this talk to learn about specialized software, workflow optimizations, and the unique applications that photogrammetry has in virtual reality and beyond.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL20C

S6678 - Accelerating Real Applications: Best Practices for Profiling and Debugging Complex Code

Beau Paisley Support Engineer, Allinea
Beau is a computer science and mathematics graduate from the College of William and Mary and performed graduate studies in Electrical Engineering at Purdue University. He has over twenty-five years of experience in development, marketing and sales roles with research, academic, and startup organizations. He has previously held positions with NCAR, Applied Physics Lab, and several startup and early growth technical computing companies.

Real-world codes are often made up of 100,000-line files written by generations of contributors none of whom fully understood the previous work. As the complexity of a code grows it goes through a phase change and the simple techniques learned on simple examples fail. In this talk we share some best practices for understanding and accelerating complex, real-world codes based on over a decade working with scientists on some of the world's largest HPC systems and applications. Topics covered include: (1) Understanding and navigating complex codes from a performance perspective, (2) Practical advice on identifying areas to accelerate, (3) A scientific approach to preventing, identifying and tracking down errors and (4) Profiling and debugging tools and when NOT to use them.

Level: Beginner
Type: Talk
Tags: Tools & Libraries; Performance Optimization

Day: Wednesday, 04/06
Time: 16:00 - 16:50
Location: Room 211B

S6704 - The Microscope for 21st Century Discovery

Wojtek James Goscinski Manager, External Collaborations, eResearch Centre, Monash University
Dr Wojtek James Goscinski is the coordinator of the Multimodal Australian ScienceS Imaging and Visualisation Environment (MASSIVE), a specialist Australian high performance computing facility for imaging and visualization, and the External Collaborations Manager at the Monash e-Research Centre, a role in which he leads teams to develop effective and creative applications of computing in research. He holds a Ph.D. in Computer Science, a Bachelor of Design (Architecture), and a Bachelor of Computer Science.
Paul McIntosh Technical Lead, MASSIVE, Monash University
Dr Paul McIntosh is the technical lead for the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE). MASSIVE is a specialised GPU high performance computer operated under a partnership between Monash University, Australian Synchrotron and CSIRO.

World-class environments for research require the orchestration of specialised instruments, data storage and processing facilities, and advanced data visualisation environments. The Clayton Innovation Precinct is now home to a world-unique trifecta to support this vision: (1) World-class scientific instruments located at Monash University, CSIRO, Australian Synchrotron and affiliated medical research institutes; (2) Unique data processing capabilities of the MASSIVE HPC facility; and (3) A world-class immersive visualisation environment for data analysis and collaboration (the CAVE2). The way in which scientists apply these three capabilities in concert will be an archetype of the way research will be performed in the 21st Century.

Level: All
Type: Talk
Tags: Supercomputing & HPC; Data Center & Cloud Computing

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room 211A

S6705 - Solving the Mysteries of Particle Physics with GPUs

Waseem Kamleh Senior Research Associate, University of Adelaide
Dr Waseem Kamleh is a leading researcher in computational physics based at the University of Adelaide. Dr Kamleh graduated in 1999 with Honours in Mathematical Physics, and was awarded the University Medal. He received his doctorate in 2004 and performed post-doctoral studies at Trinity College, Dublin. Dr Kamleh returned to the University of Adelaide in 2008 to join the Centre for the Subatomic Structure of Matter (CSSM), where he leads the high performance computational physics component of the CSSM lattice research program. Dr Kamleh's research interests cover a wide variety of topics within lattice quantum chromodynamics (QCD), ranging from hadron structure to dynamical fermion algorithms. Dr Kamleh has been an avid programmer all his life, and has over 15 years experience in high performance computing. Dr Kamleh has transformed the way in which the CSSM research group's calculations are performed by exploiting GPU technology in the software he develops, and is a leading expert in the use of GPUs for studying lattice QCD.

A 50-year-old mystery in particle physics is solved with the discovery that an exotic subatomic particle is actually a molecule. Lattice quantum chromodynamics (QCD) uses high performance computing to solve the fundamental equations that describe the interactions of subatomic particles and reveal their internal structure. The anomalously low mass of the Lambda(1405) resonance has puzzled physicists since it was first observed in the 1960s. We show how a recent Lattice QCD calculation conducted by the University of Adelaide demonstrated that the Lambda(1405) is an exotic meson-baryon molecular state. The highly parallel nature of Lattice QCD calculations makes GPUs an ideal hardware platform, and we discuss the critical importance of large-scale GPU clusters to this field of research.

Level: All
Type: Talk
Tags: Computational Physics; Supercomputing & HPC; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Marriott Salon 6

S6746 - Will SLI Improve your Virtual Reality Experience and Development?

Tim Bates Senior Manager, IT Visualization Global Sales, Marketing & Customer Experience, General Motors
Tim Bates is an IT Visualization & Advance Graphics architect with 25 years as an Information security professional designing, planning and implementing programs and systems aligned to help organizations meet their business goals. Tim works for General Motors.

Does SLI improve the virtual reality experience? What effect does SLI have on the top industry HMDs on the market today? What tools are used in an IT visualization pipeline, and will working in SLI improve your work experience while using game development tools?

Level: All
Type: Talk
Tags: Product & Building Design; Virtual Reality & Augmented Reality; Game Development; Graphics Virtualization

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL21A

S6801 - Beyond Mad Men: Moving the Kitchen Into the Age of the Smart Home

Matt Van Horn CEO, co-founder, June
Matt Van Horn is the co-founder and CEO at June. Previously, Matt was the VP of business for Path, where he ran business operations and started the Path platform. Prior to Path, Matt ran business development for Digg, where he led partnerships and started the publisher Digg button program. He also co-founded Zimride (well-known now as Lyft), the San Francisco-based transportation network company. Matt is a University of Arizona alum, where he was Apple's higher education marketer for the school. Today he lives in San Francisco with his wife Lauren and their dog Lady, and loves to cook baked lemon garlic salmon.
Nikhil Bhogal Co-founder, CTO, June
Nikhil Bhogal is the co-founder and CTO at June, where he oversees the engineering and design teams behind the product. He was previously the lead iOS developer for Path. Before that Nikhil was an engineer at Apple, where he worked on the first iPhone through iPhone 5, as well as the first iPad through iPad 3. He's an inventor on many of Apple's camera technology patents, including tap to focus, panorama processing, zero shutter lag and lockscreen camera. Nikhil started his career as an engineer at Motorola, focusing on embedded software development for the Razr family of phones. He lives in San Francisco with his wife and little one on the way. In his spare time, Nikhil enjoys cooking, photography, making espresso and building stereos.

The founders of the June Intelligent Oven will share how they are leveraging the latest technology available to create a compact, connected oven that offers the convenience of a microwave with more power and intelligence than a traditional high-end stovetop. They will also share their perspectives on the kitchen being the last room of the home to be brought into the IoT era, and their view of the future of cooking, where classic techniques intersect with new world technology.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Press-Suggested Sessions: AI & Deep Learning

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL20D

S6824 - Making Business Intelligence and Deep Learning Accessible with Cloud, OCP (Presented by Quanta Cloud Technology)

Alan Chang Product Marketing Manager, QCT
Alan Chang has been the Product Marketing Manager at QCT (Quanta Cloud Technology) for the past 5 years, helping to define QCT's own brand server series (QuantaGrid/QuantaPlex) and rack solutions (Open Rack, OCS Rack). Graduating from the University of California Irvine with a Master's Degree in computer science and over 10 years of experience in the IT industry, he continues to contribute to the industry at QCT with an extended five year roadmap for tier 2 and enterprise customers.
Rob Heath VP of solution business, QCT US, QCT
Rob Heath is the VP of solution business at QCT USA, responsible for QCT solution business development in the USA. He has more than 15 years of experience in the IT industry on both the solution technical engineering and business sides.

QCT's introduction to the world of GPU/HPC: QCT's value proposition as a newcomer to the GPU acceleration market, and how it benefits the end customer with its seamless design concept, from hyperscale datacenter to HPC/Academic/Research and even Dell's Private Enterprise Cloud. From Facebook's Open Compute Project Big Sur GPU acceleration system to the world's first x86-based Pascal system, QCT is aiming to launch a GPU acceleration product line to enable the high-performance computing market. More and more customers see the benefits of server-hosted desktop virtualization solutions. QCT will introduce 3D applications in a virtualization environment with QCT servers, which deliver better graphics, improve productivity, and grant access to business-critical applications anywhere.

Level: All
Type: Talk
Tags: Data Center & Cloud Computing; Graphics Virtualization; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:00 - 16:50
Location: Room 210G

S6839 - Leveraging Microsoft Azure's GPU N-Series for Compute Workloads and Visualization

Karan Batta Program Manager, Big Compute/HPC, Azure, Microsoft
Karan is a Program Manager in the Big Compute/HPC team in Microsoft's Azure, where he leads the vision and deployment of the new Azure GPU N-Series as part of broader Azure Compute IaaS capabilities. Additionally, he leads the media and entertainment vertical solutions as part of the Azure Batch HPC service.
Alexey Kamenev Engineer, Advanced Technology Group, Microsoft Research
Alexey Kamenev is an engineer in Microsoft Research Advanced Technology Group where he primarily works on CNTK (open-source, distributed deep learning toolkit). Prior to MSR he worked in Azure Machine Learning on deep learning and other machine learning algorithms.

This talk will cover the recently announced state-of-the-art GPU visualization and compute infrastructure in Microsoft's Azure cloud, including how the GPUs are exposed using new capabilities in Hyper-V as part of Windows Server 2016. More importantly, the session will cover how you can leverage deep learning libraries such as CNTK on the N-Series GPUs. CNTK is a highly flexible, open source computational network toolkit designed for single-GPU and multi-GPU scenarios. This session is aimed at folks who would like to learn more about how to leverage Azure for deep learning, simulation, HPC, and visualizing OpenGL and DirectX applications.

Level: Beginner
Type: Talk
Tags: Data Center & Cloud Computing; Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL21C

S6845 - Theano at a Glance: A Framework for Machine Learning

Frédéric Bastien Team Lead - Software Infrastructure at MILA, Université de Montréal
Frédéric Bastien is team lead - software infrastructure at the Montreal Institute for Learning Algorithms (MILA), Canada, and lead developer for the Theano library. In 2007, he finished an M.S. in computer architectures at the University of Montreal and has since been working at MILA (formerly LISA lab).

Theano is an extremely popular framework for machine learning, providing a generalized toolset for machine learning tasks. In this session, we'll discuss the Why and the What of Theano, leaving the How for the Theano Hands-on Lab. We'll cover the high-level motivations and general philosophy behind Theano, including future direction and goals. Please join the Theano developers to learn where this tool fits into your machine learning efforts.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Hall 3

ECS6189 - Early Stage Challenge Finalist: Entropix, Inc.

Michael Korkin CEO, Entropix, Inc.
2005-2014: Vice President of Engineering at Arecont Vision, LLC. Developed the first full line of H.264 IP/HD video surveillance cameras (1MP-40MP), the first ever panoramic multi-sensor IP cameras, and cameras currently used aboard the SpaceX Dragon spacecraft. The company is highly profitable, currently shipping to over 100 countries. The company was listed in 2011 by Inc. among the fastest growing companies in all industries in the US. 1997-2004: Founder and CTO of Genobyte, Inc., an engineering services company specializing in FPGA-based High Performance Computing. The company set a Guinness World Record in Science and Technology in 2001 for developing a Neural Network Supercomputer for Japan's Key Technology Center / Advanced Telecommunications Research in Kyoto, Japan, with the computational performance of 10,000 Pentiums. The supercomputer was developed for training and real-time simulation of large-scale neural networks, a predecessor of modern-day deep learning and GPU-based supercomputing. 1991-1997: Sr. Hardware Engineer at Fischer Imaging Corp., designing high-speed hardware image processors for the medical industry: digital radiology, cardiac catheterization labs, lithotripsy, digital stereotactic mammography. 1982-1990: Sr. Research Scientist at National Graphic Arts Technology Research Institute. Designed digital imaging systems for the graphic arts industry: high-resolution digital color scanners, digital color correction systems. Ph.D. in Digital Image Processing (MGUP, Russia, 1988); M.S. in Computer Systems (MIIT, Russia, 1982).

Level: All
Type: Talk
Tags: Video & Image Processing; Supercomputing & HPC; Deep Learning & Artificial Intelligence; Emerging Company Summit

Day: Wednesday, 04/06
Time: 16:10 - 16:20
Location: Room 220B

ECS6111 - Early Stage Challenge Finalist: Analytical Flavor Systems

Jason Cohen Founder & CEO, Analytical Flavor Systems
Before starting Analytical Flavor Systems, Jason was the Founder and Executive Director of The Tea Institute at Penn State, which oversees 20+ researchers in 5 fields of study in traditional Chinese, Japanese, and Korean Tea. Jason did his research in Sensory Science and Data Mining, eventually developing the Gastrograph system after 3 1/2 years of research. Jason is a professional coffee, tea, and beer taster, and when he is not trying new products, he enjoys rock climbing, ice climbing, and fencing.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics; Emerging Company Summit

Day: Wednesday, 04/06
Time: 16:20 - 16:30
Location: Room 220B

ECS6212 - Early Stage Challenge Voting & Award Ceremony

Level: All
Type: Talk
Tags: Emerging Company Summit; Press-Suggested Sessions: General Interest

Day: Wednesday, 04/06
Time: 16:30 - 17:00
Location: Room 220B

S6143 - Enhanced Blueprint Rendering in OpenGL

Christoph Kubisch Senior Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Christoph Kubisch is a senior developer technology engineer for NVIDIA, where he focuses on advanced OpenGL and Vulkan real-time rendering techniques suitable for CAD/DCC and scientific applications. He collaborates with external partners and NVIDIA's internal teams to optimize rendering algorithms. Prior to joining NVIDIA, he was a researcher on hardware-accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. He has also worked as a technical artist creating game art, technology, and tools.

We'll present rendering technology for stylized lines, often found in blueprint drawings for CAD software. The techniques allow higher quality depiction of lines than classic OpenGL, by adding flexibility to stippling patterns or joints, and improve the quality of lines with arbitrary widths. We'll present several optimization techniques that the system makes use of.

Level: All
Type: Talk
Tags: Product & Building Design; Real-Time Graphics

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room LL21A

S6205 - Towards Efficient Option Pricing in Incomplete Markets

Shih-Hau Tan Ph.D. Student, University of Greenwich
Shih-Hau Tan graduated from National Tsing Hua University in Taiwan, finished his M.S. at the University of Nice Sophia Antipolis with an internship at INRIA in France, and is now working in London on computational finance for the European Union's ITN-STRIKE Marie Curie project. His research interests include nonlinear option pricing for incomplete markets, applications in commodity markets, and high performance computing with implementation on GPUs.

Nonlinear option pricing is a new approach for traders, hedge funds, or banks to obtain more accurate option prices and to do fast model calibration using huge volumes of market data. Numerically, the main problem is to solve fully nonlinear PDEs, and strategies like Newton's method and the ADI scheme are employed. Batch operations are used as well, to solve different option pricing problems together at the same time. We'll introduce how to use OpenACC and CUDA libraries to accelerate the whole computation. The complexity analysis will be shown first. We can obtain around 2X speedup by using OpenACC, and around 5X speedup by using cuSPARSE for solving tridiagonal systems and cuBLAS for computing level-2 functions.
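
As a point of reference for the tridiagonal systems mentioned above, here is a minimal CPU sketch of the classic Thomas algorithm in plain Python. It is an illustration only, not the speaker's code and not how cuSPARSE solves such systems internally; the function name `solve_tridiagonal` and its argument layout are assumptions made for this example.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal linear system with the Thomas algorithm.

    lower[i] is the sub-diagonal entry of row i (lower[0] is unused),
    diag[i] is the main-diagonal entry of row i,
    upper[i] is the super-diagonal entry of row i (upper[-1] is unused).
    Assumes no pivoting is needed (e.g. a diagonally dominant matrix).
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side

    # Forward sweep: eliminate the sub-diagonal.
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Hypothetical 3x3 example: rows (2 1 0), (1 2 1), (0 1 2).
solution = solve_tridiagonal([0.0, 1.0, 1.0], [2.0, 2.0, 2.0],
                             [1.0, 1.0, 0.0], [4.0, 8.0, 8.0])
```

Each ADI sweep of a finite-difference PDE solver typically produces many independent systems of this shape, which is what makes batched GPU solvers attractive.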

Level: All
Type: Talk
Tags: Finance; OpenACC

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Marriott Salon 1

S6260 - Big Geospatial Data + Deep Learning + High Performance Computing = Geospatial Intelligence

Bingcai Zhang Tech Fellow, BAE Systems
Dr. Bingcai Zhang is a technical fellow at BAE Systems, the premier global defense and aerospace company. He joined BAE Systems in September 1995 right out of the University of Wisconsin-Madison, where he earned his Ph.D. in engineering and M.S. in computer science. Bingcai's research interests are geospatial information technology and 3D mapping; robot vision and unmanned systems; and 3D geoweb search. He has held positions as chief architect, chief photogrammetrist, R&D manager, and technical fellow with BAE Systems. Bingcai has three inventions: Embedded Photogrammetry, Next Generation Automatic Terrain Extraction (NGATE), and Automatic 3D Object Extraction.

We present two algorithms that are specifically designed to accurately detect geospatial objects in geospatial images. Combining these two algorithms with deep learning algorithms, we have achieved detection accuracy over 99% for vehicles, positional accuracy of within 6 pixels, orientation accuracy of less than 10 degrees, and a false positive error rate of 0.001% with 7.5 cm GSD aerial images. In essence, our algorithms induce learning capability from deep learning into template image matching in geospatial intelligence. Our algorithms reduce the false positive error rate by an order of magnitude over a softmax classifier. With over 99% accuracy, we believe this may be the game changer in the geospatial intelligence domain.

Level: All
Type: Talk
Tags: Aerospace & Defense; Big Data Analytics; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Marriott Salon 2

S6280 - Accelerating Spark Workloads Using GPUs

Rajesh Bordawekar Research Staff Member, IBM Research
Rajesh Bordawekar is a Research Staff Member at the IBM T. J. Watson Research Center. His current interest is exploring software-hardware co-design of analytics workloads. He works at the intersection of high-performance computing, analytics, and data management domains. He has been investigating how GPUs could be used for accelerating key analytics kernels in text analytics, data management, graph analytics, and deep learning. As part of this work, he collaborates closely with IBM Power Systems and various analytics and database product teams. He is currently leading a team that is exploring applications of GPUs for accelerating key Spark workloads.

The Apache Spark engine is being increasingly used for implementing large-scale distributed analytics workloads. These workloads cover a wide array of analytics models, including predictive analytics, optimizations, and graph analytics. We'll discuss opportunities for exploiting GPUs for accelerating different Spark components such as MLlib. The talk will first give an overview of the Spark programming and execution model and then describe key issues in integrating GPUs into the Spark infrastructure. We then describe our approach for enabling Spark to use multiple GPUs in a distributed manner and provide details of accelerating key MLlib kernels without changing the source Spark program.

Level: All
Type: Talk
Tags: Big Data Analytics; Deep Learning & Artificial Intelligence; Algorithms

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 210F

S6287 - PerfMon Redux: Analyzing a CUDA® Application With the Windows Performance Monitor

Richard Wilton Research Scientist, Johns Hopkins University
Highly-Rated Speaker
Richard works on petabyte-scale databases in the Institute for Data Intensive Engineering and Science (IDIES) in the Department of Physics and Astronomy at Johns Hopkins University. He designed and implemented data-transformation workflows for the Pan-STARRS astronomical survey database. He is the lead developer of Arioc, a GPU-based short-read DNA sequence aligner that is a key component in the preparation of data for the NIH-funded Terabase Search Engine project.

Learn how to use the Performance Monitor tool ("PerfMon") in Microsoft Windows to do non-invasive real-time visualization of the performance of a CUDA application. This approach lets you aggregate performance data from the host operating system and hardware along with GPU performance metrics, and makes it possible to examine the interactions between GPU components (CUDA compute and memory activity) and non-GPU components (CPU activity, disk I/O, and host memory) throughout the execution lifetime of a complex CUDA application. Examples will be provided from the performance analysis of a pipelined CUDA application that runs kernels on multiple GPUs and that makes intensive concurrent use of CPU threads and host memory.

Level: Intermediate
Type: Talk
Tags: Performance Optimization

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 212A

S6387 - GPU Acceleration of Cholesky's Factorization in CHOLMOD: Batching, Hybrid and Multi-GPU

Steven Rennich Sr. HPC Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Steven Rennich has been developing massively parallel algorithms and supporting the use of GPUs in the fields of linear algebra and structural mechanics as part of NVIDIA HPC DevTech for five years. His current research interests include the GPU acceleration of direct sparse solvers and sparse matrix-vector multiplication and the continued optimization of GPU implementations of BLAS and LAPACK library functions. Prior to NVIDIA, Steven spent 10 years performing development and performance optimization for several commercial structural mechanics and rigid body dynamics codes. He obtained a Ph.D. in aeronautics and astronautics from Stanford University, where he developed novel computational fluid mechanics codes and studied vortex dynamics and plant morphogenesis.

Sparse matrix factorization is a fundamental tool in scientific computing and has been shown to be well accelerated using GPUs. Yet applying the full capability of the GPU to the factorization operation remains a challenge. This talk covers the latest GPU optimizations that have been applied to the Cholesky factorization algorithm within the well-known SuiteSparse/CHOLMOD linear solver. These optimizations include new NVIDIA CUDA versions of BLAS and LAPACK routines to accelerate operations on batches of small, non-uniformly sized matrices, hybrid computing enhancements, support for multi-GPU acceleration, and further avoidance of PCIe communication through refinements to the sub-tree algorithm.

Level: Intermediate
Type: Talk
Tags: Algorithms; Performance Optimization; Tools & Libraries; Computer-Aided Engineering

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Marriott Salon 3

S6417 - FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters

Forrest Iandola CEO, DeepScale
Forrest Iandola will complete his Ph.D. in EECS from UC Berkeley in spring 2016. Forrest has published more than 10 papers on computer vision and has applied computer vision research experience at companies such as NVIDIA and Microsoft.

One of the largest barriers to industrial adoption of deep learning is the time required to train models; it can take a week or more to train a high-quality deep neural network on a GPU workstation. We present FireCaffe, which trains state-of-the-art deep neural networks on a cluster of 32 GPUs with a 23x speedup over a single GPU.
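
To put the quoted figures in context, the short Python sketch below computes the parallel efficiency implied by a 23x speedup on 32 GPUs. It is simple arithmetic on the numbers in the abstract; the helper name `parallel_efficiency` is made up for this illustration.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of ideal linear scaling achieved: speedup divided by worker count."""
    return speedup / workers


# Figures taken from the abstract: 23x speedup on a 32-GPU cluster.
efficiency = parallel_efficiency(23, 32)
print(f"Parallel efficiency: {efficiency:.1%}")
```

An efficiency of roughly 72% is what the title's "near-linear" refers to; perfect linear scaling would give 100%.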

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Big Data Analytics

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 210E

S6418 - Bringing NVIDIA GPUs to the PGAS/OpenSHMEM World: Challenges and Solutions

Dhabaleswar K. (DK) Panda Professor and University Distinguished Scholar, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda is a professor and university distinguished scholar of computer science and engineering at the Ohio State University. He has published over 350 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group, is currently being used by more than 2,450 organizations in 76 countries around the world. This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 293,000 downloads of this software have taken place from the project's website alone. He is an IEEE fellow and a member of ACM.

Learn about techniques and solutions that bring GPU computing to the world of Partitioned Global Address Space (PGAS) models, especially with the emerging OpenSHMEM paradigm. PGAS models are gaining attention for providing shared memory abstractions that make it easy to develop parallel applications with dynamic and irregular communication patterns. However, the existing OpenSHMEM standards do not support direct communication on GPU memory. This talk discusses simple extensions to the OpenSHMEM model to address this issue. Challenges and solutions in designing CUDA-aware runtimes that support these extensions and optimize data movement using CUDA IPC and GPUDirect RDMA features are presented. The impact of these concepts on application performance is demonstrated.

Level: Intermediate
Type: Talk
Tags: Supercomputing & HPC; Programming Languages; Tools & Libraries

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 211A

S6432 - Hand Gesture Recognition with 3D Convolutional Neural Networks

Pavlo Molchanov Research Scientist, NVIDIA
Pavlo Molchanov has been with NVIDIA Research since May 2015. He received B.Sc. and M.Sc. degrees, with distinction, in radio technical systems, devices, and complexes from National Aerospace University, Kharkov, Ukraine, in 2008 and 2010, respectively. He received a Ph.D. degree from Tampere University of Technology, Tampere, Finland, in the area of signal processing.

This presentation will describe the design of a multi-resolution 3D convolutional neural network for drivers' hand gesture recognition. The talk will include task-specific data augmentation strategies that help to achieve state-of-the-art performance on a publicly available dataset. Several aspects of multi-sensor fusion with deep neural networks will be discussed in detail.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 210H

S6506 - Windowed All-kNN Search over Multidimensional Array Data from Medical Imaging

Dimitris Floros Ph.D. Student, EECS Aristotle University of Thessaloniki
Dimitris Floros is a Ph.D. candidate in Electrical and Computer Engineering at Aristotle University of Thessaloniki, Greece. Dimitris received his diploma from the same institution in 2015, with a thesis on the analysis and design of 3D projection mapping systems. His primary research interests lie in the field of high performance numerical computing. He is also involved with an award-winning team that designs and implements interactive web apps and tools that enable a casual user to solve practical everyday problems.

We'll present a systematic approach for automatic optimization in searching for the k-nearest neighbors among all elements in a space/space-time domain, by their affinity in a feature space as well as their proximity in the domain. In comparison to its window-free counterpart, windowed kNN search, respecting local properties in images, scales linearly with the data domain size in terms of arithmetic complexity. But the search remains time consuming, due in part to obscure difficulties in keeping data locality high and computation redundancy low. We resolve the problem by means of orchestrating dimension permutations and reshapes, depending on the data size, window size, and GPU memory hierarchy, and utilizing efficient matrix operations, all in high-level CUDA expressions.
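
For readers unfamiliar with the problem, the brute-force Python sketch below shows what a windowed all-kNN search computes on a 1D array: for every element, the k elements inside a local window whose feature values are closest. It is a naive reference under assumed names (`windowed_knn`, `radius`) and uses a scalar feature with absolute difference as the distance; it does not reflect the permutation and reshaping strategy or the CUDA implementation described in the talk.

```python
def windowed_knn(values, k, radius):
    """For each index i, return the indices of the k elements within
    `radius` positions of i whose values are closest to values[i].

    The element itself is excluded. Ties are broken by spatial distance,
    then by index, so the result is deterministic.
    """
    n = len(values)
    result = []
    for i in range(n):
        lo = max(0, i - radius)
        hi = min(n, i + radius + 1)
        # Only elements inside the window are candidates, so the work per
        # element depends on the window size, not on the domain size.
        candidates = [j for j in range(lo, hi) if j != i]
        candidates.sort(key=lambda j: (abs(values[j] - values[i]), abs(j - i), j))
        result.append(candidates[:k])
    return result


# Hypothetical 1D "image" with two intensity groups.
neighbors = windowed_knn([0.0, 10.0, 0.5, 9.0, 0.4], k=1, radius=2)
```

Because each element only inspects its window, the total work grows linearly with the number of elements for a fixed window size, which matches the scaling claim in the abstract.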

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Big Data Analytics; Performance Optimization

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 212B

S6540 - Need for Speed: Accelerating High-Accuracy Quantum Chemistry Using OpenACC Directives

Janus Eriksen Postdoctoral Researcher, Aarhus University
Janus Eriksen, Ph.D., is a postdoctoral researcher employed at the qLEAP Center for Theoretical Chemistry, Aarhus University, Denmark, where his work is concerned in part with new theoretical developments in the area of accurate wave function-based quantum chemistry, and in part with the HPC adaptation of the resulting models. He has authored or co-authored more than 10 publications in international peer-reviewed journals and is one of the key developers of the Divide-Expand-Consolidate (DEC) local correlation module of the massively parallel and linear-scaling LSDalton electronic structure program.

Quantum chemistry (QC), that is, the application of quantum mechanics to molecular systems, has become an integral tool in most, if not all, of the chemical, biological, and general material sciences. In this session, we describe how we have achieved speed-ups of more than 10x by accelerating existing CPU-based implementations of two of the most prominent models of modern wave function-based QC, the RI-MP2 and CCSD(T) models, as well as their local correlation Divide-Expand-Consolidate (DEC) formulations, DEC-RI-MP2 and DEC-CCSD(T). The codes in question have been accelerated in the massively parallel and linear-scaling LSDalton program using the compiler directives of the OpenACC 2.0 standard. Examples illustrating the efficiency of the resulting (portable) OpenACC GPU port will be provided.

Level: Intermediate
Type: Talk
Tags: Computational Chemistry; Supercomputing & HPC; OpenACC

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Marriott Salon 5

S6726 - Solid State LiDAR for Ubiquitous 3D Sensing

Louay Eldada CEO, Quanergy Systems, Inc.
Dr. Louay Eldada is Founder and Chief Executive Officer of Quanergy Systems, Inc. Prior to founding Quanergy, he founded and sold three photonic IC businesses to Fortune 100 companies. He chaired and organized 160 conferences; delivered 200 keynotes, invited talks and courses; published 270 technical papers, books and book chapters; received 50 technical awards; and holds 65 patents. Dr. Eldada studied business administration at Harvard, MIT and Stanford, and holds a Ph.D. in optoelectronics from Columbia University.

This tutorial covers for the first time the technology, operation and application of Quanergy's solid state LiDAR that is making 3D sensing ubiquitous, with its low price point, no moving parts, small form factor, light weight, low power consumption, long range, high resolution, high accuracy, long lifetime, and ability to operate in various environmental conditions. GPUs are used for performing in real time (1) LiDAR/Video data fusion for modeling and recognizing the environment around a vehicle, (2) object detection, classification, identification, and tracking, (3) scenario analysis and path planning based on deep learning, and (4) actuation of vehicle controls.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room LL21E

S6752 - Sports Training and Virtual Reality: Challenges in Making the Physical, Virtual

Brendan Reilly CEO, EON Sports VR
Brendan Reilly is the CEO and Co-Founder of EON Sports VR. A former student assistant for Bill Self at the University of Kansas, he went on to work as an administrative assistant for Tim Jankovich with Illinois State's men's basketball team. While on staff at ISU, he became focused on virtual reality. Working out of the proverbial garage for about a year, he refined what the ideal virtual reality training program would look like and, with no formal business or computer science background, was able to convince the executives at one of the world's leading providers of virtual and augmented reality, EON Reality, to join forces with him. Brendan has worked in the virtual and augmented reality market since joining EON Reality in 2011 and is now recognized as one of the leaders not only in innovative sports training, but in the virtual reality and augmented reality industry.
Nils Andersson CTO, EON Reality, Inc.
As CTO, Nils Andersson is responsible for overseeing the direction of EON Reality's innovative virtual reality technologies globally. Nils started his career in 1991 at SAAB SPACE, working as an electronics hardware engineer for the European Space Station. Shortly after, he began his 20+ year career as a software engineer at Enera, where he worked on GPS-based tracking of vehicles from 1992 to 1995. Nils became CTO at EON Reality in 1999, and currently leads the company's technology into the future. Using agile development methods, Nils leads the core development team in developing future products for the interactive 3D market. Nils and his team have developed strong, collaborative relations with technology partners Texas Instruments, NVIDIA, and Vicon to ensure the smooth integration and delivery of EON Reality's newest technology. Nils received his master's degree in Electrical Engineering, specializing in Computer Engineering and Science, at Chalmers University of Technology in Gothenburg, Sweden.

Using Virtual Reality, EON Sports VR has developed applications to bring specialized sports training to athletes, whether professionals or amateurs. The goal is to leverage Virtual Reality technology to provide realistic repetitions at game speed to improve on-field decision making. In doing this, the EON Reality and EON Sports VR development teams encountered specific challenges related to translating the training for American Football, Baseball, and Soccer to a Virtual Environment.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Education & Training; General Interest; Press-Suggested Sessions: Virtual Reality

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room LL20C

S6843 - Deep Learning in Microsoft with CNTK

Alexey Kamenev Senior Software Engineer, Microsoft
Alexey Kamenev is an engineer in Microsoft Research Advanced Technology Group where he primarily works on CNTK (open-source, distributed deep learning toolkit). Prior to MSR he worked in Azure Machine Learning on deep learning and other machine learning algorithms.

This talk provides an overview of how Microsoft uses its open-source, distributed deep learning toolkit, CNTK, to make its products and services better. We'll show how you can use CNTK to train deep learning models of almost any topology and scale out to many GPUs. We'll review some of the challenges that arise in scaling out deep learning workloads and CNTK's way of solving them.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Hall 3

ECS6210 - MapD Interview

Todd Mostak Founder & CEO, MapD
Todd Mostak is the founder and CEO of MapD. Before MapD, Todd was a researcher focusing on GPU databases at MIT CSAIL. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic, and eventually working as a translator for an Egyptian newspaper. He then completed his M.A. in Middle East studies at Harvard University, afterwards taking a position as a research fellow at Harvard's Kennedy School of Government, focusing on the analysis of Islamism using forum and social media datasets. His frustration with the inability of existing tools to allow for the interactive exploration of large Twitter datasets motivated him to create MapD.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 16:35 - 16:40
Location: Room 220B

ECS6211 - Artomatix Interview

Eric Risser CTO, Artomatix
Neal is a serial entrepreneur, with Artomatix being the third company he has founded – the first was acquired by Agilent Technologies in 2006. His second project was a social games company that had an acquisition offer from one of the top social games companies in the world. Neal has a technical background with three patents to his name but in the last decade has been more commercially focused, fulfilling many roles such as CEO, Product Manager, Portfolio Manager, Sales & Business Development.

Level: All
Type: Talk
Tags: Emerging Company Summit

Day: Wednesday, 04/06
Time: 16:40 - 16:45
Location: Room 220B

S6146 - SculptPrint: Subtractive 3D Printing through GPUs

Tommy Tucker CEO, Tucker Innovations
Tommy Tucker is the CEO and owner of Tucker Innovations. He has a secret clearance and a Ph.D., M.S., and B.S. in mechanical engineering. He has over 15 years of experience writing computationally intensive software applications for engineering, medical, and defense applications. After spending the early part of his career at high-tech startup companies, Tommy founded Tucker Innovations to facilitate his software consulting activities. Through Tucker Innovations, he has aided various organizations in producing software applications from concept to product launch and continuing through multiple release cycles. The Tucker Innovations team includes a blend of U.S.-based employees and offshore contractors. Clients range from small, high-tech startup companies to large organizations such as 3M, 3D Systems, the U.S. Navy, and U.S. Air Force.

We'll describe a new software package that moves GPUs from rendering virtual 3D objects seen with your eyes to making physical 3D objects held in your hands. SculptPrint is a computer-aided manufacturing (CAM) application for producing computer numerical controlled (CNC) machine tool cutting tool paths with a high level of automation and rich 3D feedback to the machinist and manufacturing engineer. The underlying technology uses a fundamentally different geometry representation from traditional CAD/CAM systems and leverages a suite of different GPU parallel processing algorithms to accomplish its workflow. SculptPrint technology is referred to as "Subtractive 3D Printing" to distinguish it from traditional NC CAM.

Level: Intermediate
Type: Talk
Tags: Product & Building Design

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Room LL21A

S6148 - Enabling Smart Cities with GPU-Accelerated Infrastructure

Pradeep Gupta Senior Solutions Architect - APJ, NVIDIA
Pradeep Gupta is a lead deep learning solutions architect at NVIDIA, where he supports customers and developers across the Asia Pacific, Japan, and India regions for deep learning and HPC application development. He also works to enable the GPU computing ecosystem in universities and research labs across these regions. He is responsible for running and managing R&D projects at the NVIDIA Technology Centre at Singapore. He is working on smart cities enablement with GPU computing initiative at NVIDIA. Before joining NVIDIA, Pradeep worked with various technologies in high performance computing domains. He received an M.S. in research from the Indian Institute of Science (IISc), Bangalore. His research focused on developing compute-efficient algorithms. He has numerous publications in IEEE, SPIE, and other reputed conferences.

Smart cities are getting a lot of attention, and both academia and industry are focusing on and investing in next-generation technologies to make them a reality. We'll present a case study on how GPU-based IT infrastructure can enable different components and use cases of a smart city platform. Smart city IT infrastructure will need massive computational power and visualization of extremely rich visual content within a given energy budget. GPU-accelerated data centers can provide a unified IT infrastructure and software platform to achieve that. This case study takes Singapore's smart nation initiative as a reference and will also present different initiatives and projects using the GPU platform.

Level: All
Type: Talk
Tags: IoT; Intelligent Video Analytics (IVA); Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room LL20D

S6195 - Burning on the GPU: Fast and Accurate Chemical Kinetics

Russell Whitesides Member of Technical Staff, Lawrence Livermore National Laboratory
Dr. Russell Whitesides has applied his theoretical and applied knowledge of chemical kinetics and scientific computing platforms towards internal combustion engine simulations with the goal of highly efficient, clean-combustion engines for transportation. Russell has pursued a variety of topics in mechanical engineering R&D in the course of his academic and research career. Since joining Lawrence Livermore National Laboratory, he has worked alongside the Methods Development Group at LLNL to enhance the capabilities and interoperability of scalable structural mechanics codes. His doctoral thesis focused on the atomistic chemical mechanisms of soot particle growth in combustion environments.

Come learn about our latest developments in accelerating combustion kinetics for computational fluid dynamics (CFD). We have extended our previously presented CUDA implementation to improve performance and will present multiple examples of improved solver performance. We will discuss the merits of our approach in comparison to related approaches and provide insight into the lessons we've learned along the way.

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Performance Optimization; Computational Physics; Aerospace & Defense; Computer-Aided Engineering

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Marriott Salon 1

S6196 - Graphics Virtualization: Leveraging Microsoft's New GPU Virtualization Technologies

Chris Huybregts Senior Program Manager, Microsoft
Chris Huybregts drives the strategy and vision for Microsoft's Virtual GPU technology stack. By working with teams like Hyper-V and Azure, he's helping ensure the capabilities of GPUs in different virtualized environments meet the expectations of Microsoft's technology partners. Additionally, Chris owns the story around Microsoft's virtualized, GPU-accelerated visualization technology stack found in Remote Desktop.
Jeroen van Eesteren Senior Program Manager, Microsoft
Jeroen van Eesteren is a Senior Program Manager with the Microsoft Remote Desktop Services product team responsible for Remote Desktop Protocol Compression and RemoteFX vGPU technologies shipping as part of Windows and Azure RemoteApp.

Azure's new GPU-enabled N series VMs use new technology being developed across Microsoft's stack. In this talk we'll go over the roadmap of GPU virtualization advancements, how you can use Hyper-V to configure an in-house deployment for testing or production, and what changes are being made to support visualization on this platform. The talk will provide the details solution providers need to enable GPU pass-through on Microsoft's hypervisor, using discrete device assignment, to a variety of VMs. We'll also go over the additional visualization scenarios this new technology stack enables. This talk is appropriate for those looking to use Azure as well as their own deployments.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing; Computer-Aided Engineering

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Marriott Salon 4

S6256 - Particle Simulations with HOOMD-Blue

Joshua Anderson Research Area Specialist Lead, University of Michigan
Joshua Anderson is a Research Area Specialist Lead in the Glotzer Group at the University of Michigan. Dr. Anderson received his Ph.D. in Condensed Matter Physics from Iowa State University and is the creator and lead developer of HOOMD-blue. He is the 2015 winner of the CoMEF Young Investigator Award for Modeling and Simulation.

Come and see how to use HOOMD-blue, a flexible particle simulation tool. HOOMD-blue runs hard particle Monte Carlo, Molecular Dynamics, DPD, and other types of particle simulations, all on GPUs. It runs on anything from single-GPU workstations up to thousands of GPUs on supercomputers. Use Python scripts to configure jobs with custom initialization, complex flow control, and in-situ analysis of data. This talk introduces HOOMD-blue features and describes how to use them, focusing on the newest capabilities. It demonstrates job scripts for common usage patterns and shows examples of efficient workflows.

Level: Beginner
Type: Talk
Tags: Computational Chemistry; Computational Physics

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Marriott Salon 5

S6356 - Visual Sensemaking with GPU-Driven Machine Learning

Stef van den Elzen Visualization Architect, SynerScope BV
Stef van den Elzen has worked at SynerScope B.V. as a visualization architect since July 2015. Stef obtained his M.S. with honors in 2011. In July 2011, he started as a developer with SynerScope B.V. on a Ph.D. project at the Eindhoven University of Technology. For his research, he developed tools and techniques for the exploration of dynamic multivariate networks. The results of his Ph.D. research, completed in 2015, were published in a number of articles in international conference proceedings and journals. He received three best paper awards: at IEEE PacificVis 2013 for his work on extended massive sequence views, at IEEE InfoVis 2014 for his work on multivariate network exploration and presentation, and at IEEE VAST 2015 for work on dynamic network exploration. Furthermore, he received two Best Visualization awards for his work on the exploration and analysis of massive mobile data at the Data for Development Challenges 2013 and 2015. The work on reordering massive sequence views led to a patent application.

We show how our interactive, integrated analytics solution allows a new class of users to perform machine-assisted visual sensemaking. Until now, machine learning techniques such as predictive analytics and deep learning have mostly been used as part of a complex tool-chain that serves as an endpoint in the decision making process. We combine the strengths of human decision making and GPU-driven machine learning in a multi-coordinated visual analytics solution. This enables the discovery of actionable insights by bridging the gap between data scientist and business user.

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Deep Learning & Artificial Intelligence; Self-Driving Cars & Automotive

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room 210F

S6410 - Comparing OpenACC 2.5 and OpenMP 4.5

Jeff Larkin DevTech Engineer, NVIDIA
Highly-Rated Speaker
Jeff Larkin is a software engineer in NVIDIA's Developer Technology (DevTech) group where he works on porting and optimizing HPC applications. He is also closely involved with the development of both the OpenACC and OpenMP specifications. Prior to joining NVIDIA Jeff worked in Cray's Supercomputing Center of Excellence at Oak Ridge National Laboratory.
James Beyer Senior Runtime Engineer, NVIDIA
James Beyer recently moved to NVIDIA after a 15-year tenure at Cray Inc. He is a longtime member of the OpenMP language committee, and co-chair of the OpenMP subcommittee on accelerator directives. He was one of the founding members of the OpenACC specification and remains an active member of the OpenACC technical committee.

We'll compare the current state of two competing accelerator directive sets: OpenACC 2.5 and OpenMP 4.5. As members of both the OpenACC technical committee and the OpenMP language committee, we'll provide an inside take on the current state of the directives and insight into how to transition between the directive sets.

Level: All
Type: Talk
Tags: Programming Languages; Performance Optimization; OpenACC

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Room 212B

S6443 - Mining Audio Information on Web Videos and Recordings

Benjamin Elizalde PhD Student, Carnegie Mellon University
Benjamin Elizalde is a Ph.D. student at Carnegie Mellon University under the direction of Professor Ian Lane. He was a staff researcher in the Audio and Multimedia group at the International Computer Science Institute affiliated to UC Berkeley from 2012 to 2015. He worked under IARPA's ALADDIN project for video event detection on web videos as a participant of the TRECVID MED evaluations. He also worked on a Livermore Labs project on multimedia content analysis with high performance computing. His research has resulted in over 15 peer-reviewed publications. He received his B.S. and M.S. from Tecnologico de Monterrey in Mexico. During his M.S., he also did research at Carnegie Mellon University, where he began his work on audio-based content analysis.

We are surrounded by sounds and acoustics that describe the world we live in. This audio information is reflected in the content of videos, providing unique characteristics as well as complementing cues such as images and text. We'll present our ongoing research on deriving information from environmental sound recordings and audio from city-location web videos using GPU-based recurrent neural networks. These methods and findings can be applied to multimedia content analysis, robotics, and IoT.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics; Press-Suggested Sessions: AI & Deep Learning

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room 210H

S6466 - High-Speed GPU Parallel Algorithm for the 2-D HOPS Formula and Application to Bio-Sensor Design

Takahiro Sasaki PhD student, University of Minnesota, Twin Cities
Takahiro Sasaki is a Ph.D. candidate in scientific computation at the University of Minnesota, Twin Cities. Takahiro received his B.S. and M.S. in electronic engineering, focusing on optics, from Osaka University in Japan, and worked on optical-system design and imaging-performance analysis for optical lithography tools at an optical instruments company in Japan. His research includes a combination of optics, numerical methods, and parallel computing.

We'll describe high-speed GPU parallel algorithms for computing rigorous diffracted optical fields from 2-D periodic structures by a combination of the HOPS (high-order perturbation of surface) formula and high-performance GPUs. This work contributes not only to the acceleration of device development, such as bio-sensors using surface plasmon resonance, but also to a wide range of applications such as computer graphics and inverse problems. After analyzing the parallelizability of the HOPS formula, we'll explore two different algorithms that extract full performance from GPUs for different problem sizes, and investigate an FFT-accelerated algorithm. Experiments compare the execution of the three algorithms on CPUs and GPUs.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Marriott Salon 6

S6501 - Deep Correspondence Restricted Boltzmann Machine for Cross-Modal Retrieval

Ruifan Li Assistant Professor, Beijing University of Posts and Telecommunications
Ruifan Li is an assistant professor of computer science at Beijing University of Posts and Telecommunications, and affiliated with the Engineering Research Center of Information Networks, Ministry of Education. Ruifan received B.S. and M.S. degrees in control systems and in circuits and systems from Huazhong University of Science and Technology, China, in 1998 and 2001, respectively. In 2006, he received a Ph.D. in signal and information processing from BUPT and joined the School of Computer Science there. In 2011, he spent one year as a visiting scholar at the Information Sciences Institute, University of Southern California. Ruifan's current research activities include neural information processing, multimedia information processing, and statistical machine learning.

Learn how correspondence restricted Boltzmann machine (Corr-RBM), a kind of classical model in deep learning, is used for the task of large-scale cross-modal retrieval, such as using text query for images. We'll first illustrate the RBM model as one of the building blocks in deep learning. We'll describe the architecture of the Corr-RBM model and its learning algorithm. We construct two deep neural structures using Corr-RBM for the task of cross-modal retrieval. A number of comparison experiments with their hardware and software settings are reported on three public real-world datasets. We report the computational time using the NVIDIA Tesla K20c GPU card for the largest dataset used in our experiments.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Algorithms; Video & Image Processing

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Grand Ballroom

S6622 - Advances in Remoting Protocol Technology for 3D Graphics

Derek Thorslund Director of Product Management, Citrix
Highly-Rated Speaker
Derek Thorslund is director of product management, driving Citrix's product strategy and roadmap for HDX (high definition experience) multimedia virtualization and remoting protocol technologies across XenApp, XenDesktop, and the Citrix Receiver. He joined Citrix in 2003, where he played a key role in introducing the Citrix Access Suite, forerunner to XenApp/XenDesktop Platinum Edition. Derek's previous roles in the high-tech industry include director of Product Management at Avotus and manager of New Business Applications at Bell-Northern Research (Canada).
Stephen Vilke Sr. Director of Engineering, Citrix
Stephen Vilke is a technologist, entrepreneur, and enterprise executive with 24 years of experience innovating, inventing, and leveraging technology to solve real-world business problems across financial services, technology and government sectors. He led the development of the ground-breaking Framehawk HDX technology introduced to Citrix XenApp/XenDesktop in 2015. Previously, Stephen held leadership positions in technology strategy, architecture, and IT operations at Barclays Global Investors, CIBC, Alteon WebSystems, and Clarify/Nortel. Stephen started his career as a NASA programmer/analyst at the Space Sciences Laboratory at U.C. Berkeley.

Learn about Human UX Protocol design concepts in Citrix's next-gen HDX display remoting technology and hear from XenApp customers pushing the limits with graphics virtualization over high-latency connections halfway around the world. User experience is fundamental to a successful implementation and the remoting protocol used to transmit the virtualized 3D app or desktop from the data center or cloud to the worker is critical. Delivering 3D graphics to demanding users over a long-haul corporate WAN connection without excessive bandwidth consumption requires innovative solutions.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Marriott Salon 2

S6649 - One Size Doesn't Fit All: The Importance of Aligning VR Environments to Workflows

Matt Szymanski Technical Consultant, Mechdyne Corporation
Matt earned his BS in Bioengineering and MS in Electrical Engineering and Computer Science from the University of Illinois-Chicago. After working at the Argonne National Laboratory as a Scientific Research Software Developer, Matt co-founded and served as Chief Technology Officer (CTO) at VRCO, a software company acquired by Mechdyne in 2006. Prior to his current role, Matt served as the VP of Products at Mechdyne.

Despite the resurgence of interest in Virtual Reality due to HMDs like Oculus Rift and HTC Vive, VR has been available for several decades in a variety of forms, from wearable, to desktop, to fully immersive rooms. With all of the recent fanfare, you may be wondering, what is the best VR solution for me? The answer: it depends on your goals. This presentation covers VR's complexities, NVIDIA's role in a solution, and what qualifies as VR, as opposed to only 3D. A visual tour of VR technologies will demonstrate the pros and cons of different solutions. Use cases from various industries highlight how VR technologies improve decision making, problem solving, and learning. We conclude with a checklist of questions you should ask when determining the best VR solution to meet your needs.

Level: All
Type: Talk
Tags: Large Scale and Multi-Display Visualization; Virtual Reality & Augmented Reality

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Room 210E

S6652 - SQLite Running Entirely on the GPU

Sky Morey Chief Architect, DEG
Sky Morey embodies creative, adaptive thinking as made manifest through technology. He operates as a technical architect at the macro level, drawing on decades of experience across multiple projects and environments. Since joining DEG as its first associate, Sky has demonstrated his software development expertise through direct engagement with DEG's engineering staff, niche troubleshooting, and ongoing research and development.

SQLite is a lightweight relational database and the most widely deployed database engine in the world, usually embedded in other applications. We have ported SQLite entirely to the GPU. You'll see a working version running on the GPU with native host file system access, followed by lessons learned and a deeper dive into the platform. The project is open source under the MIT license and usage examples will be shown. This project is still in early alpha.

Level: All
Type: Talk
Tags: Tools & Libraries; Embedded; Performance Optimization

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Room 211B

S6679 - GPU in Surgery and Anatomy Simulation, Image Processing and Augmented Reality Projects

Boris Yaremin Director, High Performance Computing And Big Data Lab, Samara State Medical University
Boris Yaremin was born in 1977 in western Ukraine and graduated from Samara State Medical University in 2000. He works as an associate professor in the Operative Surgery and Clinical Anatomy Chair and as a transplant surgeon at the Samara Center of Organ and Tissue Transplantation. As a computational scientist, he is team leader of the Scientific Educational Center "Virtual Technology in Medicine" and product manager of the virtual anatomy atlas "Deep Anatomy" and of an endovascular surgery simulator. He is currently director of the HPC unit of Samara State Medical University and director of the Regional Transplant Service of the Samara Region Health Ministry.

This talk covers technical aspects of surgery and anatomy simulation at different levels. We will talk about GPU use in a virtual clinic environment and about surgery simulation on an endovascular trainer and a laparoscopic training system. We will also discuss the creation of a virtual dissection table. The other part of the talk is about augmented reality in surgical practice.

Level: All
Type: Talk
Tags: Medical Imaging; Virtual Reality & Augmented Reality; Education & Training

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room LL21B

S6692 - Embedded Supercomputing: Radio Astronomy at the Limit

Simon Ratcliffe Technical Lead: Scientific Computing, SKA South Africa
Simon is the lead for scientific computing at SKA South Africa, where he handles the overall design and architecture of the high performance computing systems on the MeerKAT radio telescope. He is a core member of the international SKA Science Data Processor consortium that is currently designing the compute backend for the Square Kilometer Array, which will be one of the largest scientific facilities in the world.

This talk will present designs and performance results for a highly parallel Tegra X1 based compute platform being developed as part of a next generation radio telescope. The MeerKAT radio telescope is currently under construction in the semi-desert Karoo region of Southern Africa. We'll describe the ongoing work on developing novel computing technologies to deliver a large scale computational platform within the strict limits on power, space, and emissions that are in force at this remote site. Using the Tegra X1 as a building block, a rugged, oil-cooled platform has been developed that will power the imager that lies at the heart of the compute challenge. This is a follow-on talk from an initial exploration presented in 2015.

Level: All
Type: Talk
Tags: Astronomy & Astrophysics; Embedded; Press-Suggested Sessions: HPC & Science

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room 212A

S6744 - Solving One of the Hardest Challenges in Virtual Reality: Human Eyes

Jeroen Snepvangers Co-Founder, Transfolio, LLC
Jeroen Snepvangers has been a leader in enterprise innovation using Virtual Reality, 3D Visualization and Digital Media for the past 13 years, beginning with founding RTT USA (acquired by Dassault 3DExcite in 2014). He served as North American CEO for 9 years, helping RTT grow from $10M to $100M globally. While with RTT, Jeroen serviced a long list of major automotive clients (GM, Toyota, Lexus, Nissan, Infiniti, VW, Audi, Porsche) as well as established consumer brands (The North Face, Coach, Adidas, Vans, Under Armour, HP, Beats by Dr. Dre). He led award-winning and forward-thinking virtual reality, augmented reality, interactive and mobile campaigns for these brands. Since 2012, Jeroen has been an independent consultant, advising executives and investors at technology companies, consumer product companies and digital agencies. Jeroen also serves as a non-executive board member at Mackevision GmbH, an Emmy award winning 3D VFX and automotive visualization company.

We address the numerous challenges of visualizing human eyes, and we explain why this is so vitally important to the VR community. The potential is massive: if you can capture and visualize anyone's eyes quickly and simply, it opens VR to all types of human interaction applications, which is usually the largest market for any consumer technology. However, digitizing eyes is hard. We will consider several current attempts to solve this problem and their differing approaches. We also take a detailed look at the science and technology behind one of the most promising solutions, a double projection Moiré profilometer with an accuracy of 300,000 measured points per inch. It raises the question: How long until your next selfie is a fully animated VR avatar?

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Media & Entertainment

Day: Thursday, 04/07
Time: 09:00 - 09:25
Location: Room LL20C

S6867 - Learn Why Scyld Cloud Workstation (3D Accelerated Remote Desktop in a Web Browser) (Sponsored by Penguin Computing)

Gary Yee Senior Software Engineer, Penguin Computing
Learn why Scyld Cloud Workstation, a browser-based, high quality, low-bandwidth, 3D accelerated desktop can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need for downloading large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since there is no browser plug-in or application installation needed, the difficulty of security changes is eliminated.

Level: All
Type: Talk
Tags: Data Center & Cloud Computing; Graphics Virtualization; Supercomputing & HPC

Day: Thursday, 04/07
Time: 09:00 - 09:50
Location: Room LL21D

S6173 - Tuning Performance on Kepler GPUs: An Introduction to Kepler Assembler and Its Usage in CNN optimization

Zhe Jia Development Engineer, Alibaba
Zhe is a development engineer in the domain-specific computing team at Alibaba and has participated in several projects on deep learning code optimization. He specializes in improving kernel performance at every level, from algorithm design to low-level optimization. He is one of the developers of the Kepler Assembler used at Alibaba. He graduated from Peking University, China. Before his graduation, he developed a computer model to simulate Jupiter's great red spot and tsunami, and accelerated it with CUDA.

Learn advanced skills for performance optimization on Kepler GPUs. NVIDIA has provided many powerful tools to analyze and improve the efficiency of CUDA kernels. However, in many specific cases, developers need to make more detailed adjustments to get the expected performance. In this session, a native assembler for the Kepler architecture used at Alibaba will be introduced. Tuning experiences from CNN and GEMM implementations with this assembler will also be shown as examples. If you are interested in assembly-level optimization and want to use such a tool on the Kepler architecture, you shouldn't miss this session!

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Performance Optimization

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Grand Ballroom

S6347 - Multi GPU, Interactive 3D Simulator for the Lattice Boltzmann Immersed Boundary Method

Bob Zigon Senior Staff Research Engineer, Beckman Coulter
Highly-Rated Speaker
Bob Zigon is a senior staff research engineer and has worked at Beckman Coulter for 13 years. He has degrees in computer science and mathematics from Purdue University. He was the architect of Kaluza, an NVIDIA Tesla-powered analysis application for flow cytometry. He's now researching how machine learning techniques can be applied to laboratory automation. His interests include high performance computing, numerical analysis, machine learning, and software development for life science.

The Lattice Boltzmann Immersed Boundary method is a technique in computational fluid dynamics used to model and simulate fluid-structure-interaction problems. The goal of this session is to demonstrate practical strategies for partitioning the computations across Tesla K40 cards while exploiting the programmable pipeline inside of a Quadro K5000 to visualize the 3D flow fields at interactive rates. GPU results will be compared with an OpenMP implementation. Full source code will be provided.

Level: All
Type: Talk
Tags: Computational Fluid Dynamics; Computational Physics; Computer-Aided Engineering

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Marriott Salon 1

S6348 - Computational Drug Discovery Using Deep Learning

Olexandr Isayev Research Professor, University of North Carolina at Chapel Hill
Olexandr Isayev is a scientist at UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill. His research interests focus on making sense of chemical data with molecular modeling and machine learning. Before joining UNC in 2013, he was a post-doctoral research fellow at the Case Western Reserve University and scientist at a government research lab. In 2008, he received his Ph.D. in computational chemistry. He received the "Emerging Technology Award" from the American Chemical Society (ACS) and the GPU computing award from NVIDIA in 2014.

Learn how deep learning can address some of the most critical problems of computational drug discovery. Historically, the field has been strongly focused on the development of drugs intended to act against one specific target with high potency and selectivity. It is now recognized that these concepts are too simplistic. At the same time, there has been unprecedented growth of chemical databases incorporating hundreds of billions of useful chemical records. Deep learning is well suited to address both of these challenges. GPU computing is the central hardware technology that allows deep learning to scale.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computational Chemistry; Big Data Analytics

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room 210H

S6402 - Fast Detection of Neighboring Vectors

Krzysztof Kaczmarski Assistant Professor, Warsaw University of Technology, Faculty of Mathematics and Information Science
Since 2008, Krzysztof Kaczmarski has worked at the Faculty of Mathematics and Information Science at Warsaw University of Technology in the Department of Computer Science and Numerical Methods as a coordinator of computer science studies. Before this, he took part in several projects at the Polish-Japanese Institute of Information Technology. In 2007, he received a Ph.D. from the Institute of Computer Science, Polish Academy of Sciences, in the field of modeling distributed object-oriented databases.

We'll present several methods for detecting pairs of vectors that are at Hamming distance 1. This problem is an important part of cell graph construction in motion planning in a space with obstacles. We'll begin with a naive quadratic-time solution, which simply compares pairs of vectors, proceed through building dedicated search trees, and move towards an optimal linear algorithm. Sequential linear-time algorithms for the problem were already known, but due to high constants hidden in the complexity function, they turned out to be inefficient for real-life data. Our GPU-based massively parallel solution promises acceptable execution times, opening dynamic cell graph construction to real-time applications like robotics and optimal path searching.
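
For readers unfamiliar with the problem, the following is a minimal sequential sketch in Python, not the speaker's GPU implementation. It contrasts the naive pairwise comparison named in the abstract with a simple hashing variant that buckets vectors by their content with one position removed. The function names (`pairs_naive`, `pairs_masked`) and the tuple representation are assumptions made for illustration, and the hashing variant is a stand-in for, not a reproduction of, the search-tree and linear-time algorithms covered in the talk.

```python
from collections import defaultdict
from itertools import combinations


def hamming(u, v):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def pairs_naive(vectors):
    """Compare every pair of vectors: O(n^2) comparisons."""
    return {
        (i, j)
        for i, j in combinations(range(len(vectors)), 2)
        if hamming(vectors[i], vectors[j]) == 1
    }


def pairs_masked(vectors):
    """Bucket vectors by (position, vector with that position removed).

    Two vectors in the same bucket agree everywhere except possibly at
    the removed position, so they are at Hamming distance 1 exactly when
    they differ there.
    """
    buckets = defaultdict(list)
    for idx, v in enumerate(vectors):
        for pos in range(len(v)):
            buckets[(pos, v[:pos] + v[pos + 1:])].append(idx)

    pairs = set()
    for (pos, _rest), members in buckets.items():
        # members is in increasing index order, so i < j below.
        for i, j in combinations(members, 2):
            if vectors[i][pos] != vectors[j][pos]:
                pairs.add((i, j))
    return pairs


if __name__ == "__main__":
    sample = [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]
    print(sorted(pairs_naive(sample)))
    print(sorted(pairs_masked(sample)))
```

Both functions return the same set of index pairs. The bucketed version avoids comparing vectors that differ in many positions, at the cost of building one key per vector per position.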

Level: Intermediate
Type: Talk
Tags: Algorithms; Robotics & Autonomous Machines; Tools & Libraries

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Marriott Salon 3

S6404 - Accelerating Transport System Micro-Simulations Using an Omnidirectional Treadmill

Peter Heywood Ph.D. Student, The University of Sheffield
Peter Heywood is a Ph.D. student in the University of Sheffield's Department of Computer Science (an NVIDIA GPU Research Center), working on GPU-accelerated micro-simulation of transport systems, and is named as lead researcher on an ongoing collaboration funded by Highways UK. Under the supervision of Dr. Paul Richmond and working closely with the Transport Systems Catapult (the UK's innovation centre for intelligent mobility), Peter is currently developing techniques specific to transport systems simulation that extend FLAME GPU (an open-source agent-based modelling environment) and will enable simulations of significantly greater scale and complexity than currently possible.

Discover how GPUs are being used to accelerate predictive simulations used in transport system planning and management to help alleviate the global increase in transport demand. We'll discuss the role of predictive, high-performance micro-simulations in transport management and provide insight into the development process and benchmark performance of agent-based transport models developed using FLAME GPU. We'll also describe the lessons learned in creating a virtual reality experience of real-time crowd simulation using an omnidirectional treadmill for urban planning.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Real-Time Graphics

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room LL20C

S6412 - Fourier Domain Pulsar Acceleration Searches on GPUs for the Square Kilometre Array

Sofia Dimoudi Research Associate, University of Oxford
Sofia Dimoudi studied for her Ph.D. at Durham University, where she worked on GPU acceleration of atmospheric tomography computational algorithms for real-time control on adaptive optics systems for extremely large telescopes. Currently, she works as a research associate at the OeRC, looking at hardware acceleration of real-time pulsar signal processing algorithms for next-generation radio telescopes.

We'll describe how we can accelerate one of the most demanding computational tasks of the real-time pulsar signal processing pipeline of the world's largest next generation radio telescope, the Square Kilometre Array (SKA). We'll explain the scientific goals and importance of pulsar searches, along with the technical challenges facing pulsar signal processing on the SKA. Pulsar acceleration searches will be introduced, and an overview of a Fourier Domain method for recovering signal power from binary accelerated pulsars will be given. We'll then present our GPU implementation of this method, discuss techniques used for optimisation, show comparative computational performance results, and consider performance projections with future GPU technology.

Level: All
Type: Talk
Tags: Astronomy & Astrophysics; Algorithms

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room 212A

S6472 - The Promise of GPU Analytics or Why GPU is the New CPU

Todd Mostak CEO, MapD
Todd Mostak is the founder and CEO of MapD. Before MapD, Todd was a researcher focusing on GPU databases at MIT CSAIL. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic, and eventually working as a translator for an Egyptian newspaper. He then completed his M.A. in Middle East studies at Harvard University, afterwards taking a position as a research fellow at Harvard's Kennedy School of Government, focusing on the analysis of Islamism using forum and social media datasets. His frustration with the inability of existing tools to allow for the interactive exploration of large Twitter datasets motivated him to create MapD.

We'll explain why GPU-powered in-memory databases and analytics platforms are the logical successor to CPU in-memory systems, largely due to recent increases in onboard memory available on GPUs. With sufficient memory, GPUs possess numerous advantages over CPUs, including much greater compute and memory bandwidth and a native graphics pipeline. We'll demo how MapD is able to leverage multiple GPUs per server to extract orders-of-magnitude performance increases over CPU-based systems, bringing interactive querying and visualization to multi-billion row datasets.

Level: All
Type: Talk
Tags: Big Data Analytics; Performance Optimization

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room 210F

S6543 - IBM Cloud Services (SoftLayer) Enables End-to-End HPC, Machine Learning, and Graphics Infrastructure in the Cloud

Jerry Gutierrez Global HPC Leader, SoftLayer, an IBM Company
Jerry Gutierrez, Global HPC Sales Leader at SoftLayer, has worked in the IT industry for over 20 years. His career has included positions at Hewlett Packard and Terremark (formerly Data Return), and he founded Data Direct Business Solutions, an IT managed services and cloud solution provider. In 2012, he joined SoftLayer to help businesses from various industries across the globe successfully use SoftLayer Cloud, with a focus on HPC and GPU-accelerated solutions.

Expand virtualization to any user on the network. Deliver better graphics, improve productivity, and grant access to business-critical applications anywhere, all from SoftLayer and the IBM Cloud.

Level: All
Type: Talk
Tags: Graphics Virtualization; Media & Entertainment; Product & Building Design

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Marriott Salon 4

S6651 - Using Today's Fastest Chips to Design the Chips of Tomorrow

Mauro Calderara Ph.D. Student, ETH Zurich
Mauro Calderara is a Ph.D. student working on a density functional theory code simulating nanostructures under the supervision of Professor Dr. Mathieu Luisier at the Integrated Systems Laboratory at ETH Zurich. His background is in theoretical physics, which he studied at ETH and UC Berkeley until 2009, receiving an M.S. from ETH. While working for an IT company, he received a Certificate of Advanced Studies in computer science (CAS ETH in computational science and distributed systems) before he started his Ph.D. His interests focus on the algorithmic challenges in ab-initio quantum transport on modern hybrid supercomputers. He has developed SplitSolve, a linear solver targeting the sparse linear systems encountered in quantum transport simulations. It runs on accelerators like NVIDIA GPUs or Intel's MICs and typically outperforms traditional sparse solvers such as MUMPS, Pardiso, or SuperLU.

We'll show how one can effectively leverage GPUs to perform ab-initio quantum transport simulations in realistic nanostructures and investigate the behavior of tomorrow's transistors down to the flow of electrons through atomic geometries. Because transistors keep getting smaller, simulations have to account for quantum mechanical effects from first principles. The incurred computational cost has so far limited such investigations to small domains that are of little practical interest from an experimental point of view. We'll present algorithms specifically geared towards GPUs that are more memory efficient and faster than traditional CPU-based ones. They speed up the calculations by an order of magnitude, allowing for the simulation of realistic devices.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Algorithms; Supercomputing & HPC

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Marriott Salon 6

S6811 - GPU-Based Real Time Reconstruction and Visualization of Cardiovascular 3D Ultrasound Images

Erik N. Steen Principal Engineer, GE Healthcare
Erik N. Steen is a principal engineer with GE Vingmed Ultrasound, responsible for technology strategy. He received his M.S. in computer science from the Norwegian Technical University in Trondheim in 1992. He received his Ph.D. in the field of 3D medical image processing in 1996. He has worked at GE Vingmed Ultrasound in Norway since 1996. During the last few years, he has been actively involved in the development of a new GPU-based architecture for real-time ultrasound image reconstruction as well as real-time visualization of 3D cardiac images.

We'll cover the clinical and technical benefits of using GPUs for real-time reconstruction and visualization of 3-dimensional and 2-dimensional cardiovascular ultrasound images. The session has three main parts. First, we'll introduce some of the clinical challenges in cardiovascular ultrasound imaging. Second, we'll give an overview of a new image reconstruction architecture called cSound as well as some of the algorithms that have been implemented with this new architecture. The technical and clinical benefits of this architecture also will be discussed. Finally, we'll cover GPU-based real-time visualization of the reconstructed 3D images. Several examples of 2D and 3D ultrasound images will be shown.

Level: All
Type: Talk
Tags: Medical Imaging; Video & Image Processing; Press-Suggested Sessions: HPC & Science

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room LL21B

S6849 - Maps for Autonomous Cars

Willem Strijbosch Head of Product Office, TomTom
Willem Strijbosch is Head of the Product Office at TomTom. He is responsible for autonomous driving within TomTom. Before TomTom, Willem spent 10 years at McKinsey & Company, the consultancy firm, serving clients globally on technology topics. Willem is a physicist with an MBA from INSEAD and has throughout his career focused on opportunities that combine cutting-edge technology with strong business potential.
Blazej Kubiak Expert Software Engineer, TomTom
Blazej Kubiak is an enthusiast of all aspects of big data processing and of the technologies that turn this enthusiasm from dream into reality, such as CUDA. Blazej has been working at Tele Atlas and TomTom for eight years and has been involved in many challenging projects related to image and laser data processing. He is one of the authors of automated traffic sign detection systems and of the bird's-eye image mosaic creation process. Currently he works as an Expert Software Engineer in the area of deep neural networks for object detection and recognition. Blazej is a co-author of the RoadDNA positioning patent application.

Hear the latest thinking on the maps that autonomous cars will use for highly accurate positioning. Autonomous cars need maps to function. The most critical use of maps is centimeter-level positioning. TomTom solves this with highly accurate lane information and lateral depth maps, which we call RoadDNA. Autonomous driving and map creation have incredible synergy. Mobile mapping cars go through the exact same process as autonomous cars: sensor perception, sensor data processing, and comparison of the result with a stored version of reality. We process the sensor data with GPUs for fast creation of deep neural networks (DNNs) that can recognize traffic signs and other road attributes, both in-car and in the cloud. Together, these DNNs, RoadDNA, and the sensors in the car enable autonomous cars.

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room LL21E

S6871 - Fun! Free! Awesome! Advanced Robotics in the Era of Open Source Software

Brian Gerkey CEO, Founder, Open Source Robotics Foundation
Brian Gerkey is CEO of OSRF. Prior to joining OSRF, Brian was Director of Open Source Development at Willow Garage. Previously, Brian was a Computer Scientist in the Artificial Intelligence Center at SRI, and before that, a postdoctoral research fellow in the Artificial Intelligence Lab at Stanford University. Brian received his Ph.D. in Computer Science from the University of Southern California (USC) in 2003, his M.S. in Computer Science from USC in 2000, and his B.S.E. in Computer Engineering, with a secondary major in Mathematics and a minor in Robotics and Automation, from Tulane University in 1998. Brian is a strong believer in, frequent contributor to, and constant beneficiary of open source software. Since 2008, Brian has worked on the ROS Project, which develops and releases one of the most widely used robot software platforms in robotics research and education (and soon industry). He is the founding and former lead developer of the open source Player Project, which continues to maintain widely used robot simulation and development tools. For his work on Player and ROS, Brian was recognized by MIT Technology Review with the TR35 award in 2011.

After many years of its being "just around the corner," we are now witnessing the beginning of a robot revolution. We hear about robots daily, from awe-inspiring technical achievements to breath-taking investments and acquisitions. Why? And, why now? Learn how open source software, embedded and parallel computing, and new sensors have come together to change the landscape for robotics developers (and users). In particular, GPUs have become increasingly useful to build better robots. Given large data volumes and correspondingly complex decisions, highly parallel processing is becoming useful not only for machine learning but also other key aspects of robotics applications. Pose estimation, path-planning, and simulation may be the next domains to increasingly depend on a multi-core approach.

Level: All
Type: Talk
Tags: Robotics & Autonomous Machines; Embedded

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room LL20D

S6134 - High Performance and Productivity with Unified Memory and OpenACC: A LBM Case Study

Jiri Kraus Compute Devtech Software Engineer, NVIDIA
Highly-Rated Speaker
Jiri Kraus is a senior developer in NVIDIA's European Developer Technology team. As a consultant for GPU HPC applications at the NVIDIA Julich Applications Lab, Jiri collaborates with local developers and scientists at the Julich Supercomputing Centre and the Forschungszentrum Julich. Before joining NVIDIA, he worked on the parallelization and optimization of scientific and technical applications for clusters of multicore CPUs and GPUs at Fraunhofer SCAI in St. Augustin. He holds a diploma in mathematics from the University of Cologne, Germany.

Learn how to use unified memory to improve your productivity in accelerating applications with OpenACC. Using a Lattice Boltzmann CFD solver as an example, we'll explain how a profile-driven approach allows one to incrementally accelerate an application with OpenACC and unified memory. Besides the productivity gain, a primary advantage of this approach is that it is also very accessible to developers who are new to a project and therefore not familiar with the whole code base.

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Supercomputing & HPC; Tools & Libraries; OpenACC; Aerospace & Defense

Day: Thursday, 04/07
Time: 10:00 - 10:25
Location: Marriott Salon 1

S6171 - How Elixir Aircraft Designs the First Aircraft with Catia in the Cloud with NVIDIA GRID™

Christophe Delattre Software Architecture Director, Dassault Systèmes
A graduate in Computer Graphics and Information Science, Christophe Delattre has spent his entire career at Dassault Systèmes. After leading the 3D Visualization team for over 10 years, making sure CATIA and other Dassault Systèmes solutions provide the best performance and reliability to their customers, he has also been managing the technical partnership between NVIDIA and Dassault Systèmes. Over the past 3 years, he has been responsible for leading the entire remote graphics architecture and strategy of the Dassault Systèmes platform (the 3DEXPERIENCE platform). Supervising both internal developments and the entire deployment over the Cloud infrastructure, he makes sure the best integration possible is implemented into the Dassault Systèmes dashboard, with minimal lag and the best user experience, leveraging NVIDIA GRID technology.
Stefan Schoenefeld Devtech Engineer, NVIDIA
Highly-Rated Speaker
Stefan Schoenefeld is a Senior Developer Technology Engineer at NVIDIA. After working on the Scenix Scene Graph SDK and the Workstation Performance Drivers, he now works on implementing NVIDIA GRID™, improving video encoding and new ideas for remote graphics.
Cyril Champenois COO and Co-Founder, Elixir Aircraft
With a bachelor of engineering in material science, a bachelor in aerospace engineering and an International Degree of Technology from both the University of Nantes and Kingston (UK), Cyril spent 5 years as an aerospace consultant at Dassault Systèmes before co-founding and becoming COO of Elixir Aircraft in 2014.

Elixir Aircraft is using CATIA on the Cloud and leveraging the Dassault Systèmes 3D Remote service, which relies on NVIDIA GRID technology. Showcased last year as a proof of concept, 3D Remote (the Dassault Systèmes cloud offering for running CATIA on a remote VM) is now officially available. Elixir Aircraft will explain how they benefit from a workstation in the Cloud, leveraging the 3DEXPERIENCE platform with minimal setup and no hardware constraints to design their aircraft with the latest CATIA version. Dassault Systèmes and NVIDIA will also share some of the upcoming developments.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Data Center & Cloud Computing; Product & Building Design

Day: Thursday, 04/07
Time: 10:00 - 10:50
Location: Marriott Salon 4

S6177 - Efficient Imaging in Radio Astronomy Using GPUs

Bram Veenboer Ph.D. Researcher, Astron
Bram Veenboer is a Ph.D. researcher at ASTRON, the Netherlands Institute for Radio Astronomy. Bram works on the Dome project, where he conducts research towards the biggest radio telescope in the world, the Square Kilometre Array. Bram's research focuses on accelerator platforms and how they can be used to speed up the algorithms that are used to transform observation data into sky images. Before joining ASTRON, he was a student in computer science at the Vrije Universiteit in Amsterdam. His M.S. program focused on high performance and distributed computing, and for his final thesis he worked on spiking neural networks on GPUs at the Centrum voor Wiskunde en Informatica in Amsterdam.

We'll present an optimized GPU implementation of a new radio astronomical imaging algorithm that outperforms the current state of the art. In contrast to traditional imaging algorithms, it offers correction for direction-dependent effects at negligible additional computational cost. We'll explain why this algorithm works so well on GPUs and show the optimization techniques that were applied to get there.