GPU Technology Conference

April 4-7, 2016 | Silicon Valley

HANDS-ON LAB


L6118 - Simple Steps to Speed Up Your C Code Dramatically with GPU

Beatriz Pazmino Guest Researcher, Wesleyan - NIST
Beatriz Pazmino is a postdoctoral researcher at the Materials Science and Engineering Division, National Institute of Standards and Technology. Her research interests include implementing and developing computational methods to aid the understanding of soft materials, in particular: 1) calculating electromagnetic and hydrodynamic properties of objects having arbitrary shape, including polymers and polymer assemblies in solution; 2) characterizing biometric data (i.e., cell morphology and human activities) based on shape metrologies, using path-integration methods to find "the useful metric" that relates to its functionality; 3) unifying theoretical perspectives in glass dynamics and its application in understanding the effects of nanoparticles and confinement in polymer-glass forming materials, and the quantification of the interaction strength between polymer matrix and nanoparticle. Beatriz has a Ph.D. in physics from Wesleyan University and a B.S. in electrical engineering.
Fernando Vargas-Lara Guest Researcher, NIST-Wesleyan
Fernando Vargas-Lara is a guest researcher at the Materials Science and Engineering Division. His research interests include transport properties, computer modeling, statistical mechanics, soft matter, polymer physics, DNA, DNA-based materials, and carbon-based nanocomposites. He received his Ph.D. in physics from Wesleyan University, Middletown, CT.

Many scientists have old, trusted codes in C and want to gain GPU speed-ups without much trouble. To reach this community, we have selected the most common algorithms where the benefits of the parallel architecture are greatest, and we describe the small details encountered in the transition to CUDA programming. We'll cover simple concepts, such as identifying device and host routines, dynamic memory allocation, calling external functions, and how to implement averages and histograms.

Level: Beginner
Type: Hands-on Lab
Tags: Programming Languages; Algorithms

Day: Monday, 04/04
Time: 09:00 - 10:30
Location: Room 210A

L6128 - OpenACC Bootcamp

Jeff Larkin DevTech Software Engineer, NVIDIA
Highly-Rated Speaker
Jeff is a software engineer in NVIDIA's Developer Technology (DevTech) group, where he works on porting and optimizing HPC applications. He is also closely involved with the development of both the OpenACC and OpenMP specifications. Prior to joining NVIDIA, Jeff worked in Cray's Supercomputing Center of Excellence at Oak Ridge National Laboratory.

In this session, participants will learn OpenACC programming by example. Participants must be comfortable with C/C++ or Fortran programming, but no prior OpenACC or GPU programming experience is required. This lab will demonstrate a four-step process for applying OpenACC to an existing application (identify parallelism, express parallelism, express data movement, optimize loops) and discuss OpenACC best practices. Upon completion, participants will be able to begin accelerating their own applications using OpenACC.

Level: Beginner
Type: Hands-on Lab
Tags: Programming Languages; OpenACC

Day: Monday, 04/04
Time: 09:00 - 12:00
Location: Room 210C

L6105 - Developing, Debugging and Optimizing GPU Codes for High Performance Computing with Allinea Forge

Beau Paisley Support Engineer, Allinea Software
Beau Paisley is a support engineer with Allinea Software. He has over 25 years of experience in development, marketing, and sales roles with research, academic, and startup organizations. Beau has previously held positions with NCAR, Applied Physics Lab, and several startup and early growth technical computing companies. He is a computer science and mathematics graduate from the College of William and Mary and performed M.S. studies in electrical engineering at Purdue University.

We'll bring CUDA into a compute-intensive application by learning how to use CUDA-enabled development tools in the process of profiling, optimization, editing, building, and debugging. Using the Allinea Forge development toolkit, we'll cover how to profile an existing application and identify the most compute-intensive code regions. We'll then replace these regions with CUDA implementations and review the results before turning to the task of debugging the GPU-enabled code to fix an error introduced during the exercise. We'll learn debugging techniques for CUDA and debug using Allinea Forge to produce the correct, working, high-performance GPU-accelerated code. We'll be using GPUs hosted in the cloud, so simply bring a laptop with a modern browser.

Level: Beginner
Type: Hands-on Lab
Tags: Supercomputing & HPC; Tools & Libraries

Day: Monday, 04/04
Time: 10:30 - 12:00
Location: Room 210A

L6127 - Simplified CUDA® Development with C#

Daniel Egloff Managing Director QuantAlea / Partner InCube Group, QuantAlea / InCube Group
Highly-Rated Speaker
Daniel Egloff is a partner of InCube Group and managing director of QuantAlea, a Swiss software engineering company specializing in GPU software development. He studied mathematics, theoretical physics, and computer science and worked for almost 20 years as a quant and software architect.

With GPUs available on Azure and in the new Surface, developing GPU-accelerated applications in C# is becoming more important than ever. Scaling out to GPUs hosted in a public cloud is now simpler and very cost-effective, and on Azure, C# is the language of choice for many enterprises and software developers. In this session, we use the Alea GPU V3 development stack to program GPU algorithms directly in C# and show how to perform native debugging, profiling, and performance tuning.

Level: Beginner
Type: Hands-on Lab
Tags: Programming Languages; Tools & Libraries

Day: Monday, 04/04
Time: 10:30 - 12:00
Location: Room 210B

L6104 - In-Depth Performance Analysis for OpenACC/CUDA®/OpenCL Applications with Score-P and Vampir

Robert Henschel Manager, Scientific Applications and Performance Tuning, Research Technologies, Pervasive Technology Institute, Indiana University
Robert Henschel runs the Scientific Applications and Performance Tuning group at Indiana University, focused on optimizing scientific applications. He received an M.S. in computer science from Technische Universitat Dresden, Germany.
Guido Juckeland IT Architect and Leader Hardware Accelerator Group, TU Dresden - ZIH
Guido Juckeland coordinates the work of the CUDA Center of Excellence at Technische Universitat Dresden and also represents TU Dresden at the SPEC High Performance Group and OpenACC committee. He received his Ph.D. for his work on performance analysis for hardware accelerators.

Participants will work with Score-P/Vampir to learn how to dive into the execution properties of CUDA and OpenACC applications. We'll show how to use Score-P to generate a trace file and how to study it with Vampir. Additionally, we'll use the newly established OpenACC tools interface to also present how OpenACC applications can be studied for performance bottlenecks.

Level: Advanced
Type: Hands-on Lab
Tags: Performance Optimization; Tools & Libraries

Day: Monday, 04/04
Time: 13:00 - 14:30
Location: Room 210C

L6123 - Advanced OpenACC Programming

Jeff Larkin DevTech Software Engineer, NVIDIA
Highly-Rated Speaker
Jeff Larkin is a software engineer in NVIDIA's Developer Technology (DevTech) group, where he works on porting and optimizing HPC applications. He is also closely involved with the development of both the OpenACC and OpenMP specifications. Prior to joining NVIDIA, Jeff worked in Cray's Supercomputing Center of Excellence at Oak Ridge National Laboratory.

This tutorial will teach experienced OpenACC programmers several techniques that will take their codes to the next level. Participants will learn, via hands-on examples, how to pipeline computations to overlap data transfers, how to program multiple GPUs, OpenACC interoperability, and more. Participants must be comfortable with C/C++ or Fortran programming and already have experience with OpenACC programming.

Level: Advanced
Type: Hands-on Lab
Tags: Programming Languages; OpenACC

Day: Monday, 04/04
Time: 15:00 - 16:30
Location: Room 210C

L6113 - Teach GPU Accelerated Computing: Hands-on with NVIDIA Teaching Kit for University Educators

Wen-Mei Hwu Professor of Electrical and Computer Engineering, University of Illinois
Professor Wen-mei Hwu holds the Sanders-AMD Endowed Chair in the Department of Electrical and Computer Engineering, at the University of Illinois at Urbana-Champaign. His research interests are in the area of architecture, implementation, compilation, and algorithms for parallel computing. He is the chief scientist of the Parallel Computing Institute and director of the IMPACT research group. He is a co-founder and CTO of MulticoreWare. He is the instructor for the Coursera Heterogeneous Parallel Programming course, which more than 60,000 students have taken. For his contributions in research and teaching, he received the ACM SigArch Maurice Wilkes Award, the ACM Grace Murray Hopper Award, the EKN Holmes MacDonald Outstanding Teaching Award, the ISCA Influential Paper Award, the IEEE Computer Society B. R. Rau Award, and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley. He is a fellow of IEEE and ACM. He directs the UIUC CUDA Center of Excellence and serves as one of the principal investigators of the NSF Blue Waters Petascale computer project. Wen-mei received his Ph.D. in computer science from the University of California, Berkeley.
Joe Bungo GPU Educators Program Manager, NVIDIA
Joe Bungo is the GPU Educators Program Manager at NVIDIA, where he enables the use of NVIDIA and GPU technologies in universities in a variety of ways, including curriculum and teaching material development, facilitation of academic ecosystems, and hands-on instructor workshops. Previously, he managed the university program at ARM Inc. and worked as an applications engineer there.

As performance and functionality requirements of interdisciplinary computing applications rise, industry demand for new graduates familiar with accelerated computing with GPUs grows. In the future, many mass-market applications will be what are considered "supercomputing applications" by today's standards. This hands-on tutorial introduces a comprehensive set of academic labs and university teaching material for use in introductory and advanced parallel programming courses. The teaching materials start with the basics and focus on programming GPUs, and include advanced topics such as optimization, advanced architectural enhancements, and integration of a variety of programming languages.

Level: Intermediate
Type: Hands-on Lab
Tags: Tools & Libraries; Education & Training

Day: Tuesday, 04/05
Time: 13:00 - 14:30
Location: Room 210B

L6130 - Deep Learning on GPUs: From Large Scale Training to Embedded Deployment on Maxwell

Julie Bernauer Senior Solutions Architect, NVIDIA
Julie Bernauer has been a Senior Solutions Architect for Deep Learning at NVIDIA since 2015. She attended ENS Cachan from 2001 to 2004, where she received a degree in physical chemistry. She obtained her Ph.D. from Université Paris-Sud in 2006 while performing research in the Yeast Structural Genomics group; her thesis focused on the use of Voronoi models for modelling protein complexes. After a postdoctoral position at Stanford University with Prof. Michael Levitt (Nobel Prize in Chemistry, 2013), she joined Inria, the French national institute for computer science. As a senior research scientist at Inria, adjunct associate professor of computer science at École Polytechnique, and visiting research scientist at SLAC, her work focused on computational methods and machine learning for structural bioinformatics, specifically scoring functions for macromolecule docking and statistical potentials for molecular simulations. She was the first to successfully introduce machine learning for coarse-grained models in the CAPRI challenge.
Allison Gray Solutions Architect, NVIDIA
TBA

The tutorial will show how to set up a deep learning environment on Jetson TX1 to perform deep learning tasks, in particular inference using pretrained models from a DIGITS server. Other demo applications, including live image classification and image captioning, will be covered.

Level: Intermediate
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 13:00 - 16:00
Location: Room 210A

L6108 - Kokkos, Manycore Performance Portability Made Easy for C++ HPC Applications

H. Carter Edwards Principal Member of Technical Staff, Sandia National Laboratories
Highly-Rated Speaker
H. Carter Edwards has over three decades of experience developing software for simulations of a variety of engineering domains. He has been researching and developing software for HPC algorithms and data structures for the past 16 years at Sandia National Laboratories. An expert in high performance computing, he's currently focusing on thread-scalable algorithms and data structures for heterogeneous many-core architectures, such as NVIDIA GPU, AMD Fusion, and Intel Xeon Phi. He has a B.S. and M.S. in aerospace engineering from the University of Texas at Austin, and worked for 10 years at the Johnson Space Center in the domain of spacecraft guidance, navigation, and control. He has a Ph.D. in computational and applied mathematics, also from the University of Texas at Austin.
Christian Trott Senior Member Technical Staff, Sandia National Laboratories
Christian Trott is a high performance computing expert with experience in designing and implementing software for GPU and MIC compute clusters. He earned a Dr. rer. nat. in theoretical physics from the University of Technology Ilmenau. His prior scientific work focused on computational materials research using ab initio calculations, molecular dynamics simulations, and Monte Carlo methods. As of 2015, Christian is a senior member of technical staff at Sandia National Laboratories. He is a core developer of the Kokkos programming model, with a large role in advising applications on adopting Kokkos to achieve performance portability for next-generation supercomputers.
Jeff Amelang Visiting Professor, Harvey Mudd College
Jeff Amelang focuses on teaching high performance computing technologies and techniques in a variety of contexts. He has taught courses, tutorials, and workshops at several national labs as well as US and international universities. He obtained his MS and PhD in Mechanical Engineering from the California Institute of Technology, with a focus on Computational Science and Engineering. Currently serving as a Visiting Professor at Harvey Mudd College, his favorite courses to teach are on distributed and GPU programming.

The Kokkos C++ library enables development of HPC scientific applications that are performance portable across disparate manycore architectures such as NVIDIA Kepler, AMD Fusion, and Intel Xeon Phi. Kokkos leverages the CUDA 7.5 device lambda capability to provide a highly intuitive and easy-to-use parallel programming model. Kokkos simplifies data management for heterogeneous memory (CPU, GPU, UVM, etc.) through a unique polymorphic multidimensional array view interface. View polymorphism includes mutable multidimensional layout, transparent overloads for atomic operations, and simplified access to GPU texture hardware. Kokkos' advanced features culminate in portable team parallelism that maps efficiently onto CUDA grids, blocks, and shared memory.

Level: Beginner
Type: Hands-on Lab
Tags: Supercomputing & HPC; Tools & Libraries

Day: Tuesday, 04/05
Time: 14:00 - 17:00
Location: Room 210A

L6131 - Deep Learning on GPUs: From Large Scale Training to Embedded Deployment on Maxwell

Julie Bernauer Senior Solutions Architect, NVIDIA
Julie Bernauer has been a Senior Solutions Architect for Deep Learning at NVIDIA since 2015. She attended ENS Cachan from 2001 to 2004, where she received a degree in physical chemistry. She obtained her Ph.D. from Université Paris-Sud in 2006 while performing research in the Yeast Structural Genomics group; her thesis focused on the use of Voronoi models for modelling protein complexes. After a postdoctoral position at Stanford University with Prof. Michael Levitt (Nobel Prize in Chemistry, 2013), she joined Inria, the French national institute for computer science. As a senior research scientist at Inria, adjunct associate professor of computer science at École Polytechnique, and visiting research scientist at SLAC, her work focused on computational methods and machine learning for structural bioinformatics, specifically scoring functions for macromolecule docking and statistical potentials for molecular simulations. She was the first to successfully introduce machine learning for coarse-grained models in the CAPRI challenge.
Allison Gray Solutions Architect, NVIDIA
TBA

The tutorial will show how to set up a deep learning environment on Jetson TX1 to perform deep learning tasks, in particular inference using pretrained models from a DIGITS server. Other demo applications, including live image classification and image captioning, will be covered.

Level: Intermediate
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:00 - 17:00
Location: Room 210C

L6121 - Applied Deep Learning for Vision and Natural Language with Torch7

Nicholas Leonard Research Engineer, Element Inc.
Nicholas Leonard applies deep learning to biometric authentication using smartphones. He graduated from the Royal Military College of Canada in 2008 with a B.S. in computer science. Nicholas retired from the Canadian Army Officer Corp in 2012 to complete an M.S. in deep learning at the University of Montreal.

This hands-on tutorial targets machine learning enthusiasts and researchers and covers applying deep learning techniques to classifying images and natural language data. The session is driven in Torch, a scientific computing platform with great toolboxes for deep learning and optimization, among others, and fast CUDA backends with multi-GPU support. Torch is supported by Facebook, Google, Twitter, and a strong community who actively open-source their code and packages.

Level: Beginner
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Tools & Libraries

Day: Tuesday, 04/05
Time: 15:00 - 16:30
Location: Room 210B

L6114 - Advanced Tools for GPU Clusters

Jean-Matthieu Etancelin Research Engineer, ROMEO HPC Center, University of Reims Champagne-Ardenne
Jean-Matthieu Etancelin has been a research engineer in the GPU Application Lab from the University of Reims Champagne-Ardenne in the ROMEO High Performance Computing Center since 2015. Jean-Matthieu defended his Ph.D. in applied mathematics at the University of Grenoble in 2014. His research interests are mainly focused on hybrid computing and numerical analysis of PDE problems applied to fluid dynamics.

In this lab, attendees will experiment with some leading-edge distributed GPU technologies that can strongly enhance HPC efficiency and productivity. We'll provide exercises on the ROMEO GPU-powered cluster for achieving faster GPU data communication with GPUDirect RDMA, for using many virtualized GPUs through rCUDA, and for improving visualization using NVIDIA IndeX. GPUDirect RDMA enhances the efficiency of data exchange between a GPU and PCI Express devices, such as other GPUs or InfiniBand networks. rCUDA is a virtualization framework that enables transparent local use of remote CUDA-enabled GPU devices. NVIDIA IndeX is software for real-time, scalable visualization and computing of volumetric data together with embedded geometry data.

Level: Advanced
Type: Hands-on Lab
Tags: Supercomputing & HPC; In-Situ and Scientific Visualization

Day: Wednesday, 04/06
Time: 09:30 - 11:00
Location: Room 210A

L6129 - VisionWorks Toolkit Hands-on (Computer Vision)

Thierry Lepley Senior Software Engineer, NVIDIA
[To Be Written]
Colin Tracey Senior System Software Engineer, NVIDIA
Colin Tracey has been with NVIDIA as a Senior System Software Engineer since 2011. He has worked on camera features for mobile devices including panorama, HDR, video stabilization, and object tracking. More recent work has been in ADAS and autonomous driving systems including surround view, obstacle detection, and sensor fusion.

In this hands-on session, we'll explore the VisionWorks™ toolkit, an NVIDIA SDK for computer vision (CV) that implements and extends the new OpenVX standard. The first step will be to install the VisionWorks toolkit, discover its structure and documentation, and run samples. Next, we'll experiment with different ways of debugging and profiling an application developed with VisionWorks. Finally, we'll do some programming to gain practical experience with the API.

Level: Intermediate
Type: Hands-on Lab
Tags: Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 09:30 - 11:00
Location: Room 210C

L6133 - BIDMach Machine Learning Toolkit

John Canny Professor, UC Berkeley
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. He works on human-computer interaction and large-scale machine learning. Since 2002, he has been developing and deploying behavioral modeling systems in industry. He designed and prototyped production systems for Overstock.com, Yahoo, eBay, and Quantcast, and is currently a visiting scientist at Yahoo Inc. His goals are to help machine learning better address real-world needs, and to integrate it more closely with the data interpretation process through interactive machine learning systems.

We will give a hands-on introduction to the BIDMach machine learning toolkit. BIDMach is a new "rooflined" toolkit that features the fastest implementations of many machine learning algorithms (regression, clustering, topic models, random forests...) on CPU or GPU hardware, and it has a growing suite of deep learning layers. The tutorial will be divided into three parts: (1) rapid prototyping of single-machine algorithms with BIDMach; (2) building and running deep learning models in BIDMach; and (3) scaling up to clusters: running on Apache Spark.

Level: Intermediate
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Wednesday, 04/06
Time: 09:30 - 11:00
Location: Room 210B

L6117 - NVIDIA GRID™ 2.0 on Horizon View Hands-on Lab

Jeff Weiss GRID Solution Architect Manager, NVIDIA
Jeff Weiss is the GRID solutions architect manager for North America working with the Solution Architecture & Engineering team at NVIDIA. Prior to joining NVIDIA, Jeff worked for 7 years at VMware as an EUC staff engineer, as well as at Symantec and Sun Microsystems. Along with his current focus on NVIDIA GRID vGPU-enabled end-user computing, his experience includes data center business continuity/disaster recovery solutions, software infrastructure identity management, and email security/archiving tools.
Matt Coppinger Director, Technical Marketing & Enablement, End User Computing, VMware
Matt Coppinger is director of technical marketing and enablement for End User Computing at VMware. He has worked on desktop virtualization since its inception at VMware in 2007, first in engineering, then as a field consultant, and finally in his current role. He has authored a number of reference architectures for VMware, including virtualizing 3D applications, and has spoken on the technical aspects of desktop virtualization at VMworld and other major conferences since 2010.

We'll cover installation of NVIDIA vGPU components into a Horizon View environment, building out and optimizing vGPU gold-master images, deployment options for vGPU-enabled desktop pools, monitoring the workloads, performance tools, and debugging issues. The session will combine lecture and labs, with guides supplied to participants.

Level: Intermediate
Type: Hands-on Lab
Tags: Graphics Virtualization; Performance Optimization

Day: Wednesday, 04/06
Time: 13:00 - 16:00
Location: Room 210B

L6126 - Tips and Tricks for Unified Memory on NVIDIA Kepler and Maxwell Architectures

Nikolay Sakharnykh Developer Technology Engineer, NVIDIA
Nikolay Sakharnykh is a Senior Developer Technology Engineer at NVIDIA where he works on accelerating applications on GPUs. He has experience in scientific research and software development focusing on computational techniques related to physics, chemistry, and biology.
Jiri Kraus Compute Devtech Software Engineer, NVIDIA
Highly-Rated Speaker
Jiri Kraus is a senior developer in NVIDIA's European Developer Technology team. As a consultant for GPU HPC applications at the NVIDIA Julich Applications Lab, Jiri collaborates with local developers and scientists at the Julich Supercomputing Centre and the Forschungszentrum Julich. Before joining NVIDIA, he worked on the parallelization and optimization of scientific and technical applications for clusters of multicore CPUs and GPUs at Fraunhofer SCAI in St. Augustin. He holds a diploma in mathematics from the University of Cologne, Germany.

Excited about Unified Memory and how it can improve your productivity when programming GPUs? Want to learn how to improve performance and productivity when using Unified Memory on Kepler? Then this session is right for you. The session will explain by example how Unified Memory can be used, introduce advanced features like stream attachments, and show how the performance of applications using Unified Memory can be tuned. Tools support for Unified Memory is also covered.

Level: Intermediate
Type: Hands-on Lab
Tags: Tools & Libraries

Day: Wednesday, 04/06
Time: 14:00 - 15:30
Location: Room 210A

L6136 - Jetson TX1 Hands-on Lab for FIRST Robotics

Phil Lawrence Program Manager for Jetson Embedded Platform, NVIDIA
Phil Lawrence is a program manager for NVIDIA's Jetson embedded platform.

Designed for high school students in the FIRST Robotics program, this hands-on lab uses the Jetson TX1 Developer Kit. Learn how to use the developer tools in the context of live video streaming, with object recognition, CUDA processing, and visualization.

Level: Beginner
Type: Hands-on Lab
Tags: Robotics & Autonomous Vehicles; Embedded

Day: Thursday, 04/07
Time: 09:00 - 10:30
Location: Room 210C

L6116 - Deep Learning With the Theano Python Library

Frédéric Bastien Team Lead Developer, MILA, Université de Montréal
Frédéric Bastien is team lead - software infrastructure at the Montreal Institute of Learning Algorithms, Canada (MILA) and lead developer for the Theano library. In 2007, he finished an M.S. in computer architectures at the University of Montreal and has since been working at MILA (formerly the LISA lab).

This hands-on lab will introduce the Theano framework, a software compiler/library based on Python. Learn how to get started with the software, as well as work through a few useful machine learning examples accelerated on an NVIDIA GPU. We'll have Theano exercises as well as exercises on the LeNet model, an older model that allows quick experimentation. We'll be using GPUs hosted in the cloud, so simply bring a laptop with a modern browser.

Level: Beginner
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence; Tools & Libraries

Day: Thursday, 04/07
Time: 09:30 - 11:00
Location: Room 210B

L6124 - Teach GPU Accelerated Robotics: Hands-on with Jetson™ Robotics Teaching Kit

Joe Bungo GPU Educators Program Manager, NVIDIA
Joe Bungo is the GPU Educators Program Manager at NVIDIA, where he enables the use of NVIDIA and GPU technologies in universities in a variety of ways, including curriculum and teaching material development, facilitation of academic ecosystems, and hands-on instructor workshops. Previously, he managed the university program at ARM Inc. and worked as an applications engineer there.

As performance and functionality requirements of interdisciplinary robotics applications rise, industry demand for new graduates familiar with GPU-accelerated computer vision, machine learning and other robotics concepts grows. This hands-on tutorial introduces a comprehensive set of academic labs and university teaching material targeted at the NVIDIA Tegra-based Jetson embedded computing platform for use in introductory and advanced interdisciplinary robotics courses. The teaching materials start with the basics and focus on programming the Jetson platform, and include advanced topics such as computer vision, machine learning, robot localization and controls.

Level: Intermediate
Type: Hands-on Lab
Tags: Robotics & Autonomous Vehicles; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

L6132 - Software Reuse Inside the Kernel: Modular Design with Group-Wide Abstractions

Duane Merrill Senior Research Scientist, NVIDIA Corporation
Duane Merrill is a Senior Research Scientist at NVIDIA. His principal research interests are programming model and algorithm design for parallel computing. His work focuses on problems involving sparse, irregular, and cooperative computation. He is the author of CUB, a library of "collective" software primitives to simplify CUDA kernel construction, performance tuning, and maintenance. He received his B.S., M.S., and Ph.D. from the University of Virginia.

In the CUDA ecosystem, the GPU kernel is where the complexities of fine-grained cooperative parallelism exist (e.g., synchronization, race conditions, shared memory layout, plurality of state, memory conflicts, special instructions, etc.). In this lab, we'll show you how to encapsulate these complexities within fast, efficient, reusable group-wide software abstractions. With modular design practice, you can significantly reduce development time, easily tune for performance, and simplify maintenance effort. To illustrate this design pattern, we begin by creating our own reusable warp-wide (and then block-wide) reduction primitives. We follow that by using the prefix scan primitives from the CUB library to demonstrate block-wide selection/compaction and reduce-value-by-key.

Level: Intermediate
Type: Hands-on Lab
Tags: Tools & Libraries; Programming Languages

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

L6134 - Creating Accelerated Microservices with the GPU Rest Engine

Chris Gottbrath Accelerated Computing Product Manager, NVIDIA
Chris Gottbrath is an Accelerated Computing Product Manager working to ensure that the CUDA math libraries and other software products that NVIDIA provides deliver exceptional value to users. He has more than 15 years of experience in the high performance, scientific, technical, and enterprise computing business, with a strong focus on user productivity, application performance, and correctness. He started exploring CUDA about 10 years ago.

We'll teach you how to use GPUs to create amazing data-driven capabilities in your hyper-scale software-as-a-service offering hosted in the cloud. You'll create GPU-accelerated micro-services that deliver high throughput and low latency using RESTful APIs, ready to be plugged right into web-scale applications. In the space of 90 minutes, participants will use the GPU REST Engine (GRE) to create two different micro-services exposing GPU kernels that work on very different kinds of datasets. The first example uses the GPU's ability to run convolutional neural networks to create a high-speed image classification service. The second example leverages the new NVIDIA Graph Analytics Library to provide a PageRank micro-service.

Level: Beginner
Type: Hands-on Lab
Tags: Data Center & Cloud Computing; Big Data Analytics

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

L6135 - Jetson Developer Tools Lab

Sebastien Domine Senior Director Software Engineering - Developer Tools, NVIDIA
Highly-Rated Speaker
Sébastien Domine is the Senior Director of Developer Technology Tools at NVIDIA. He runs various software engineering teams and oversees the development of software products dedicated to easing the developer's life and fostering the creation of applications that can take full advantage of the GPU and the SoC. Prior to NVIDIA, he worked on PC games at GameFX/THQ and 3D digital content creation tools at Katrix and Nichimen Graphics. He holds a Diplôme d'Ingénieur in Computer Science from EPITA, Paris, France.

This hands-on developer tools lab runs on Jetson TX1. Learn how to use the developer tools in the context of live video streaming, with object recognition, CUDA processing, and visualization. You'll learn about NVTX, Tegra System Profiler, Visual Profiler, and Nsight Eclipse Edition.

Level: Intermediate
Type: Hands-on Lab
Tags: Tools & Libraries; Embedded; Performance Optimization

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

L6137 - Chainer Hands-on: Introduction To Train Deep Learning Model in Python

Shohei Hido Chief Research Officer, Preferred Networks
Shohei Hido is Chief Research Officer of Preferred Networks America, Inc. He received his M.S. in informatics from Kyoto University, Japan, in 2006. Since then, he worked at IBM Research in Tokyo for six years as a staff researcher in machine learning and its applications to many industries. After joining Preferred Infrastructure, Inc. in 2012, he worked as the leader of the Jubatus project, an open-source software framework for real-time, streaming machine learning. Currently, he is the product manager of Deep Intelligence in Motion, software for using deep learning in IoT applications. Preferred Networks was established as a spinout company from Preferred Infrastructure in 2014.

This hands-on session is aimed at Chainer beginners. Chainer is a Python-based open-source software framework for deep learning. Independent of Theano, the popular deep learning backend in Python, Chainer is flexible, intuitive, and powerful thanks to its unique paradigm for building the computational graph. Programming with Chainer is fairly straightforward, and the code is easy to understand, so users can efficiently implement complicated neural network models. In this session, attendees will learn the basics of Chainer, how to train image recognition models, and some extensions, using an AWS GPU instance and IPython notebook. Attendees are expected to have basic knowledge of Python, machine learning, and deep neural networks.

Level: Intermediate
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence

Day: TBD, TBD
Time: TBD - TBD
Location: TBD
