GPU Technology Conference

March 17-20, 2015 | San Jose, California

S5706 - Power and Speed: Maximizing Application Performance on IBM Power Systems with XL C/C++ Compiler

Yaoqing Gao Senior Technical Staff Member, IBM Canada
Yaoqing Gao is a Senior Technical Staff Member at the IBM Canada Lab in the compiler development area. His major interests are compilation technology, optimization and performance tuning tools, parallel programming models and languages, and computer architecture. He has done research and development for the IBM XL C/C++ and Fortran compiler products on IBM POWER, System z, Cell processors, and Blue Gene. He has authored over 30 papers in journals and conferences, has been an IBM Master Inventor since 2006, and holds over 30 issued and pending patents.

This presentation will provide the latest news on IBM's compilers on Power. It will present the major portability features, such as improved standards compliance and GCC source-code and option compatibility, and will also cover performance tuning and compiler optimization tips to maximize workload performance on IBM Power Systems, including exploitation of the POWER8 processor and architecture.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 11:00 - 11:25
Location: OpenPOWER Booth
View Recording

S5707 - XL C/C++ and GPU Programming on Power Systems

Kelvin Li IBM

The OpenPOWER Foundation is an organization with a mandate to enable member companies to customize POWER processors and system platforms for optimization and innovation for their business needs. One such customization is the integration of graphics processing unit (GPU) technology with the POWER processor. IBM recently announced the IBM Power S824L system, a data processing powerhouse that integrates the NVIDIA Tesla GPU with IBM's POWER8 processor. This joint presentation by NVIDIA and IBM will cover details of the S824L system, including an overview of the Tesla GPU and how it interoperates with the POWER8 processor. It will also describe the NVIDIA software stack and how it works with the POWER8 compilers.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Tuesday, 03/17
Time: 11:30 - 11:55
Location: OpenPOWER Booth
View Recording

S5835 - DB2 BLU w/GPU Demo: Concurrent Execution of an Analytical Workload on a POWER8 Server with K40 GPUs

Sina Meraji, PhD Hardware Acceleration Laboratory, SWG, IBM
Berni Schiefer Technical Executive, Information Management Performance and Benchmarks, IBM

In this technology-preview demonstration, we will show the concurrent execution of an analytical workload on a POWER8 server with K40 GPUs. DB2 will detect both the presence of GPU cards in the server and the opportunity in queries to shift the processing of certain core operations to the GPU. The required data is copied into GPU memory, the operation is performed, and the results are sent back to the POWER8 processor for any remaining processing. The objectives are to (1) reduce the elapsed time for the operation and (2) free CPU capacity for other SQL processing, increasing overall system throughput by moving CPU-intensive tasks to the GPU.

Level: All
Type: Talk
Tags: OpenPOWER

Day: Tuesday, 03/17
Time: 12:00 - 12:15
Location: OpenPOWER Booth
View Recording
View PDF

S5696 - POWER8: The First OpenPOWER Processor

Michael Gschwind STSM & Manager, System Architecture, IBM Systems & Technology Group
Michael is an STSM and Manager for System Architecture in the IBM Systems & Technology Group. He is an IEEE Fellow, a member of the IBM Academy of Technology, and an IBM Master Inventor.

The POWER8 processor is the latest RISC (Reduced Instruction Set Computer) microprocessor from IBM and the first processor supporting the new OpenPOWER software environment. POWER8 was designed to deliver unprecedented performance for emerging workloads, such as business analytics and big data applications, cloud computing, and scale-out datacenter workloads. It is fabricated in IBM's 22-nm silicon-on-insulator (SOI) technology with multiple layers of metal, and it has been designed to significantly improve both single-thread performance and single-core throughput over its predecessor, the POWER7+ processor.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Tuesday, 03/17
Time: 12:30 - 12:55
Location: OpenPOWER Booth
View Recording

S5407 - Early Evaluation of the Jetson TK1 Development Board for Power and Performance

Jee Choi Graduate Research Assistant, Georgia Tech
I am a graduate student at Georgia Tech conducting research in high performance computing. I was among the first to model, implement, and autotune an optimized GPU kernel for sparse matrix-vector multiply (SpMV). More recently, I have authored the energy roofline model, which attempts to model performance, energy, and power from the perspective of algorithm designers and performance tuners.

In this session, we describe our experience evaluating the Jetson TK1 development board for performance, energy, and power. We first describe the benchmarks used in our evaluation and present performance and power results for various throughput benchmarks, including single- and double-precision compute, memory bandwidth, and more. We also present our model for predicting the energy costs of various operations under different frequency and voltage settings, and show how different settings map to different arithmetic-intensity regimes in terms of performance and energy efficiency. Finally, we present preliminary results using the Jetson TK1 to compute the fast multipole method (FMM) kernel and compare its performance and energy efficiency against that of high-end Tesla GPUs.
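The energy-roofline model referenced in this abstract can be sketched as a simple cost function. The following Python sketch is illustrative only (the parameter names are ours, not the speaker's actual model constants): runtime is bounded by the slower of compute and memory traffic, and energy combines per-flop, per-byte, and constant-power terms.

```python
def energy_roofline(flops, bytes_moved, peak_flops, peak_bw,
                    joules_per_flop, joules_per_byte, constant_power):
    """Estimate (time, energy) for a kernel under a roofline-style model.

    Time is the larger of compute time and memory-transfer time, assuming
    perfect overlap.  Energy adds a per-flop term, a per-byte term, and a
    constant-power term that accrues over the whole runtime.
    """
    time = max(flops / peak_flops, bytes_moved / peak_bw)
    energy = (flops * joules_per_flop
              + bytes_moved * joules_per_byte
              + constant_power * time)
    return time, energy
```

The arithmetic intensity (flops per byte) relative to the machine balance `peak_flops / peak_bw` decides whether the time term is compute- or memory-bound, which is how different frequency and voltage settings map to different regimes.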

Level: All
Type: Talk
Tags: Embedded Systems

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room 210G
View Recording
View PDF

S5489 - Large-Scale Spatial Query Processing on GPU-Accelerated Big Data Systems by Extending Cloudera Impala

Jianting Zhang Assistant Professor, The City College of New York
Jianting Zhang is currently an Assistant Professor in the Department of Computer Science at the City College of New York (CCNY). He received his bachelor's degree in Water and Environments (1993) and a master's degree in Physical Geography (1996) from Nanjing University, China, and his master's degree (2001) and Ph.D. (2004) in Computer Science from the University of Oklahoma, USA. He also received post-doctoral training in Ecological Informatics (2004-2007) at the University of New Mexico, USA. Dr. Zhang has nearly 20 years of expertise in geospatial computing technologies, including: 1) high-performance geospatial computing on commodity parallel hardware and supercomputers; 2) efficient data structures and algorithms in spatial databases and GIS applications; 3) desktop and Web-based GIS prototypes for interactive visual analytics of large-scale geospatial data; and 4) a track record of interdisciplinary research with multidisciplinary collaborators in hydrological modeling, satellite remote sensing, radar meteorology, climate/weather modeling, ecological/biodiversity studies, urban/transportation research, and advanced computing infrastructure development. Dr. Zhang is currently working with collaborators to develop the next generation of high-performance GIS on commodity massively data-parallel Graphics Processing Units (GPUs), which can potentially speed up current GIS performance by 10-50X on main-memory systems and by 3-4 orders of magnitude on disk-resident systems. The research and development effort is supported by a 4-year National Science Foundation (NSF) Intelligent Information Systems (IIS) Medium Collaborative Research award (2013-2017).

Geo-referenced spatial (or geospatial) data volumes are increasing. Traditional data management techniques, such as Geographical Information Systems (GIS) and spatial databases, do not work well for big spatial data, while existing big data systems do not support geospatial data. In addition to our work on managing spatial data on single-node GPUs, we have integrated our parallel designs with Impala, an open-source big data system, to support both efficient and scalable distributed spatial query processing in an interactive SQL environment. We present the system architecture, data-parallel designs for spatial indexing and query processing, and performance on real datasets for spatial joins based on point-in-polygon tests.

Level: All
Type: Talk
Tags: Big Data Analytics; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room 210D
View Recording

S5552 - Transparent Parallelization of Neural Network Training

Cyprien Noel Software Engineer, Flickr / Yahoo Inc.
Cyprien has worked on high-performance distributed software in a variety of settings: a financial-software startup, gaming, the Internet of Things, and an air traffic control simulator at NASA. He has lived in France and New York, and loves his new home, San Francisco. A couple of years ago he returned to his beginnings in machine learning, combining expertise in high-performance computing and neural networks.
Simon Osindero Senior Manager / Research Engineer, Flickr / Yahoo Inc.
Simon Osindero is currently a senior principal researcher and manager at Flickr, Yahoo Inc., where he leads efforts on applied machine learning. Prior to joining Yahoo, he was CTO and co-founder of LookFlow, a startup that combined state-of-the-art machine learning, computer vision, and information visualization methods to build a revolutionary search-and-discovery experience. (LookFlow was acquired by Yahoo at the end of 2013.) Before starting LookFlow, he developed machine learning algorithms for natural language processing and semantic analysis as a researcher at Idilia Inc. He is perhaps best known for his contribution to the field of machine learning through his post-doctoral work on Deep Belief Networks at the University of Toronto, in collaboration with Geoff Hinton and Yee Whye Teh; his 2006 paper is widely credited with reigniting the current wave of interest in "deep learning". He holds a PhD in Computational Neuroscience from the Gatsby Unit at UCL; an MSci in Experimental & Theoretical Physics along with BA/MA degrees in Physics, Molecular Biology, and Mathematics from the University of Cambridge; and diplomas in Photography and Design from Concordia University.

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library that allow training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe's complexity.
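As a loose illustration of asynchronous SGD (a Hogwild-style sketch under our own simplifying assumptions, not the actual Caffe extensions, which span multiple GPUs and machines), several workers can update a shared parameter without synchronization:

```python
import random
import threading

def async_sgd(data, n_workers=4, steps=200, lr=0.05):
    """Fit y = w * x with asynchronous SGD: each worker draws random
    samples and updates the shared weight without any locking."""
    params = {"w": 0.0}  # shared state, updated concurrently by all workers

    def worker():
        for _ in range(steps):
            x, y = random.choice(data)
            grad = 2 * (params["w"] * x - y) * x  # gradient of (w*x - y)^2
            params["w"] -= lr * grad              # unsynchronized write

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return params["w"]
```

Occasional lost updates from racing writers are simply tolerated; SGD's stochasticity absorbs them, which is the observation that keeps such an architecture simple.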

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room 210A
View Recording
View PDF

S5576 - Applying MapGraph: GPU-Accelerated Merlin Decision Support and COA Generation for Electronic Warfare Operations

Brad Bebee Director of Mission Analytics, SYSTAP, LLC
Brad's passion is helping customers navigate complex technology and business challenges and delivering products and solutions that solve them quickly and effectively. He has focused on participating in and running businesses that apply novel and advanced technology solutions to new mission and business problems. Over the course of his career, he has performed advanced technology development for commercial and government customers. His technology experience ranges from early work in modeling methodologies and knowledge representation, dating back to precursors of DARPA's DAML program, to more recent work with large-scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. In his current role with SYSTAP, LLC, he is focused on bringing products for high-performance graph databases and analytics into business and mission areas.
Matthew Goldsbury Lead Engineer, Chesapeake Technology International Corp.
Matthew Goldsbury is the technical lead of a distributed agile team responsible for the development and deployment of the Merlin analytic. He is a co-inventor of the patent-pending technologies powering Merlin's pathfinding capability. Mr. Goldsbury has 10 years of software development experience, ranging from bioinformatics to DoD-related topics.

This session presents work by Chesapeake Technology International Corp (CTI) and SYSTAP to accelerate automated decision support and course of action (COA) generation using MapGraph. COA generation is an enabling analytic for tactical Airborne Electronic Attack (AEA) and for combined cyber and electronic warfare operations. CTI developed the Merlin capability to construct an automated decision space within dense, operationally relevant environments, enabling problems to be solved within tactically relevant timelines (seconds or minutes) rather than hours. Many of these analytics may be represented as data-parallel graph analytics. GPU acceleration enables new capabilities that provide the operator multiple COAs in near real time, with dynamic updates in seconds.

Level: Intermediate
Type: Talk
Tags: Defense; Signal & Audio Processing; Big Data Analytics; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 210C
View Recording
View PDF

S5579 - GPU DNA Sequencing Base Quality Recalibration

Mauricio Carneiro Group Lead, Computational Technology Development, Broad Institute of MIT and Harvard
Dr. Mauricio Carneiro leads computational technology development at the Broad Institute of MIT and Harvard. Dr. Carneiro has contributed to major advances in DNA sequencing analysis with compression algorithms, statistical methods, heterogeneous compute optimizations, and a systematic approach to the institute's computing methods development, distribution, and support. His team is also responsible for the evaluation of new sequencing technologies and has provided many methods and tools to handle new data types in the world of next-generation sequencing. Dr. Carneiro joined the Broad Institute in December 2010 after completing a Ph.D. in computational biology at Harvard University. He won championships in the Association for Computing Machinery's International Collegiate Programming Contest in 2002 and 2003, and received a Programming Excellence Award from the association's Upsilon Pi Epsilon Society in 2003. In a previous life, he was a video game developer and led the development of the world's first massive mobile multiplayer game, Alien Revolt.
Nuno Subtil Senior System Software Engineer, CUDA Libraries & Algorithms, NVIDIA
Nuno Subtil is a system software engineer in the NVBIO team at NVIDIA, focused on high performance parallel algorithms for bioinformatics. Prior to that, he worked on low-level graphics system software for mobile and desktop platforms at NVIDIA. Before joining the company, he did research on physically-based image synthesis and on accelerating computer vision algorithms during the early days of GPUs. Nuno Subtil holds a computer science degree from the University of Coimbra, Portugal.

Base quality recalibration is a crucial step in data processing for DNA and RNA sequencing. Established in 2010 by our group in conjunction with the 1000 Genomes Project, recalibrating the probability of error for each base in a genome, by counting observations and re-modeling the empirical error, has proven to correctly estimate the systematic errors made by the sequencing instrument, allowing Bayesian variant calling algorithms to make the most accurate choice. The task of counting observations across the entire genome is daunting and slow. In this talk we will show how we adapted the algorithm for GPU processing to improve the very long runtimes of this process, and how the use of GPUs puts us one step closer to enabling fast diagnostics for critical patients in need of a fast answer.
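The counting-and-re-modeling step described above can be sketched in a few lines (a toy model; the production recalibration uses many more covariates and a richer statistical treatment): tally errors per reported quality score against a trusted reference, then convert the smoothed empirical error rate back to the Phred scale.

```python
import math
from collections import defaultdict

def recalibrate(observations, prior_smoothing=1):
    """Map each reported Phred quality to an empirically recalibrated one.

    `observations` is an iterable of (reported_q, is_error) pairs, where
    is_error says whether the base mismatched a trusted reference (known
    variant sites are assumed to be excluded upstream).
    """
    counts = defaultdict(lambda: [0, 0])       # reported_q -> [errors, total]
    for q, is_error in observations:
        counts[q][0] += int(is_error)
        counts[q][1] += 1
    table = {}
    for q, (errors, total) in counts.items():
        # Smoothed empirical error rate, converted to Phred: -10 * log10(p)
        p = (errors + prior_smoothing) / (total + 2 * prior_smoothing)
        table[q] = round(-10 * math.log10(p), 1)
    return table
```

On a GPU, the tallying loop becomes a massively parallel histogram over the genome's observations, which is where the speedup discussed in this talk comes from.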

Level: All
Type: Talk
Tags: Life & Material Science; Big Data Analytics; Developer - Algorithms

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 212A
View Recording

S5607 - Creating CONSTRUCT: A GPU-Rendered Short Film

Kevin Margo Director, VFX Supervisor, blur studio
Highly-Rated Speaker
Kevin is an independent director and VFX supervisor at Blur Studio.

Kevin will describe how Chaos Group's V-Ray RT and NVIDIA GPUs were used throughout production on his groundbreaking short film CONSTRUCT, rendered entirely on GPUs. See http://constructfilm.com/ for more of the project, and https://www.youtube.com/watch?v=nnaz8q6FLCk for how interactive GPU rendering was used on a motion capture stage during production.

Level: All
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing; Real-Time Graphics; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room LL21D

S5637 - ZFAS - The Brain of Piloted Driving at Audi

Matthias Rudolph Head of Architecture Driver Assistance Systems, Audi AG
Dr. Rudolph studied electrical engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics, with a minor in mathematics, from Iowa State in 1999. After holding various positions at Audi, he took over the lead of the department "Architecture Driver Assistance Systems" in 2009; the zFAS project is one of the department's core developments. Dr. Rudolph is a member of management at Audi.

During the last several years, Audi and its partners have developed a platform that enables piloted driving and piloted parking. At CES 2015, Audi showed that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room LL21F
View Recording

S5670 - NVIDIA VXGI: Dynamic Global Illumination for Games

Alexey Panteleev Developer Technology Engineer, NVIDIA
Alexey Panteleev is the lead engineer of VXGI and has been working on its algorithms since 2012. He joined NVIDIA in 2010 as a GPU application performance engineer. In 2013, Alexey received a Ph.D. in computer architecture from the Moscow Engineering and Physics Institute.

VXGI is the new real-time dynamic global illumination technology from NVIDIA that can completely change the way that games look. We'll demonstrate the possibilities it provides, describe what is required to use VXGI in a rendering engine, and talk about the basics of the algorithm that is applied to compute indirect illumination, along with the limitations of this algorithm. We'll also show some techniques that you can use with VXGI's custom voxelization and cone tracing shaders.

Level: All
Type: Talk
Tags: Real-Time Graphics; Game Development

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room LL21B
View Recording
View PDF

S5824 - Delivering Workstation Class Graphics Anywhere with HP Remote Graphics Software (Presented by HP)

Annika Muehlbradt SW Engineer, HP
Annika is a Software Engineer at HP. She specializes in evaluating, deploying, and troubleshooting workstation graphics technologies. Annika received her B.S. in Computer Science and Applied Computing Technology from Colorado State University.

HP Remote Graphics Software enables instant, secure access to graphics-rich applications from anywhere. With native Windows and Linux support, built-in collaboration functionality, rock-solid performance, and ease and simplicity of deployment, HP RGS is the go-to remote protocol for workstation-class users. Join us for an interactive discussion highlighting the benefits of remote workstation access and a live demonstration of the latest software innovations in 3D graphics remoting.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Media & Entertainment; Graphics Virtualization

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 210H
View Recording

S5862 - Server-Based GPUs: Expanding the Use Cases for Application and Desktop Delivery

Nathan Hill Research Director - Desktop Virtualization, Gartner, Inc.
Nathan Hill is a Research Director in Gartner Research, where he is part of the IT Systems, Security and Risk group. His research covers end-user computing, with a primary focus on desktop virtualization, including, but not limited to, hosted virtual desktops and server-based computing. In addition, Mr. Hill assists organizations in the justification and planning of technology deployments. Prior to joining Gartner, Mr. Hill was the global product manager for HP, covering client virtualization solutions in the Enterprise Services division. He has extensive outsourcing and managed service experience in both server and desktop infrastructure services across all industry verticals. This gives him a deep insight into a diverse set of business issues and how they can be addressed via technology solutions.

New server-based GPU technologies are enabling richer end-user content delivery, whilst leveraging the benefits of the centralized delivery architectures of Virtual Desktop Infrastructure (VDI) and Server Based Computing (SBC). This Gartner presentation explores the current market landscape, the business value of evolving use cases and what developments we can expect in the future.

Level: All
Type: Talk
Tags: Graphics Virtualization

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room LL20C

S5866 - Project Tango: Mobile 3D Tracking and Perception

Johnny Lee Lead, Project Tango, Google
Highly-Rated Speaker
Johnny Lee is the lead of Project Tango at Google, a focused effort to bring computer vision and advanced sensor fusion to mobile platforms. Previously, he helped Google X explore new projects as a Rapid Evaluator and was a core algorithms contributor to the original Xbox Kinect. His YouTube videos demonstrating Wii remote hacks have surpassed 15 million views, and his TED talk on them became one of the most popular TED talk videos. In 2008, he received his PhD in Human-Computer Interaction from Carnegie Mellon University, and he has been recognized in MIT Technology Review's TR35.
James Fung Platform Software Lead, Project Tango, Google
James Fung has been applying GPUs to accelerate general-purpose parallel computing, with a focus on image processing and computer vision. He received his Ph.D. in Electrical and Computer Engineering from the University of Toronto and worked in Developer Technology at NVIDIA, helping adoption of GPU computer vision. He is currently the Platform Software Lead on Project Tango at Google.

Project Tango is a focused effort to accelerate the progress and adoption of 3D tracking and sensing technologies on mobile devices. It is a platform for developing advanced computer vision and sensor fusion algorithms that estimate the position and orientation of the device in real time while simultaneously generating a 3D map of the environment. In this talk we will discuss some of the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also demonstrate the current state of development and discuss the role of 3D sensing in mobile gaming, indoor navigation, virtual reality, augmented reality, and autonomous drones. We hope you will join us on this journey. We believe it will be one worth taking.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Augmented Reality & Virtual Reality; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 212B

S5868 - Huawei and NVIDIA Collaborations in Optimized GPU Computing Solutions (Presented by Huawei)

Gary Xia Principal Architect, Huawei FusionAccess, Huawei
Gary is an industry veteran who has worked on virtual desktop technologies for the last ten years. He has been with Huawei for the last four years and leads technical innovation for Huawei FusionAccess.
Francis Lam Huawei IT Hardware Architect, Huawei
Francis has more than 15 years of experience in server system design, product planning, and management with various world-leading IT system companies. He joined Huawei four years ago and is responsible for server hardware architecture and HPC solution design.

This presentation outlines the strong collaboration between Huawei and NVIDIA to offer state-of-the-art optimized GPU computing solutions. An overview of Huawei HPC computing systems and use cases will be presented in addition to a deeper dive into the innovative Huawei FusionAccess remote desktop solution.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room LL21E
View Recording
View PDF

S5906 - NVScene Opening and Shadertoy Hackathon Kickoff

Level: All
Type: Talk
Tags: NVScene

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room LL20A
View Recording

S5108 - Vision-Based Driver Assistance: Seeing the Way Forward

Ian Riches Director, Global Automotive Practice, Strategy Analytics
Ian Riches is a Director in the Global Automotive Practice at Strategy Analytics. He heads a research team that covers all aspects of embedded automotive electronic systems, semiconductors and sensors on a worldwide basis. His areas of research include powertrain, chassis, safety, security and body applications – including high-growth areas such as hybrid and electric vehicles and advanced driver assistance systems. Before joining Strategy Analytics, Ian spent two years as assistant editor of Automotive Engineer, the UK magazine published by the IMechE. He has also held the position of Press Officer/Technical Author for MTL, a safety-related electronic equipment manufacturing company. With over eighteen years of experience, he is one of the foremost industry analysts in the automotive electronics sector. Ian holds an MA in engineering from Cambridge University, UK, where he specialized in fluid dynamics, turbo-machinery and internal combustion engines.

This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications, and vehicle sectors that are driving growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21F
View Recording
View PDF

S5131 - Mobile Visual Search

Martin Peniak Parallel Computing Software Engineer, Cortexica
Martin works as a parallel computing software engineer at Cortexica, where he develops algorithms for discrete as well as mobile GPUs. Martin received his Ph.D. in GPU computing applied to cognitive robotics and previously collaborated with the international EU FP7 ITALK and Poeticon++ consortia, which aimed at developing biologically inspired artificial systems capable of progressively developing their cognitive capabilities through interaction with their environments. He also collaborated with the European Space Agency (ESA) on a project evolving neural network controllers for simulated Mars rover robots. In summer 2012, Martin worked at NVIDIA Research in Santa Clara, where he evaluated several machine learning algorithms on the next-generation GPU architecture and developed a novel bio-inspired system for 3D object recognition. More recently, Martin gave a TEDx talk, the first covering GPU computing and its implications for robotics.

Attendees will learn about Cortexica's FindSimilar™ technology. Its algorithms are based on the way the human visual cortex recognizes images and objects, meaning that poor lighting conditions, rotated or skewed images, and other 'imperfect' objects can all be recognized accurately. In this presentation, you will learn about the challenges in the field of visual search and how our company addresses them by leveraging the processing power of GPUs, including the latest NVIDIA K1 processor. The session will include several demonstrations of our technology and of the latest mobile applications using NVIDIA K1 processors to speed up visual search performance.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Embedded Systems

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210B
View Recording

S5182 - The Future of Human Vision: Preferential Augmentation Using GPUs

Muhammad Shamim Bioinformatics Programmer, Baylor College of Medicine
Muhammad Shamim is a bioinformatics programmer in Dr. Erez Lieberman Aiden's Lab at the Baylor College of Medicine, working on a variety of projects ranging from big data and genomics to augmented reality. Muhammad is a graduate of Rice University with a BS in Computer Science and a BA in Computational & Applied Mathematics and Cognitive Sciences.

Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Video & Image Processing; Medical Imaging; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21C
View Recording
View PDF

S5436 - Deploying Low-Power Embedded Devices with Tegra K1 (Presented by GE)

Dustin Franklin GPGPU Applications Engineer, GE Intelligent Platforms
Highly-Rated Speaker
Dustin is an embedded GPGPU developer and system architect for GE Intelligent Platforms. With a background in robotics and computational imaging, he works with integrators to deploy CUDA-accelerated embedded systems. Visit www.ge-ip.com/gpgpu for more info.

Tegra's low power and computational efficiency are driving the development of new and exciting embedded devices. Explore CUDA-accelerated applications in sensor processing, security & surveillance, robotics, networking, medical imaging, industrial machine vision, and energy & agriculture that tap the TK1 to provide next-generation features and capabilities to the user, all while consuming minimal power. Miniaturized Tegra modules can be quickly integrated into end-user products, with a variety of packaging options available. Leverage TK1's friendly software ecosystem and code compatibility with NVIDIA's discrete GPUs to architect scalable embedded systems with reduced risk and shortened development cycles.

Level: All
Type: Talk
Tags: Embedded Systems; Video & Image Processing

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210G
View Recording
View PDF

S5481 - Multi-Dimensional, In-GPU-Memory Databases: Streaming Conditional Calculations in Big Data Sets

Peter Strohm Project Manager Research, Jedox AG
Peter Strohm
Peter Strohm obtained his diploma in Computer Science from the University of Freiburg, Germany, in 2008. After that he joined the Inline Processing Team at Fraunhofer Institute for Physical Measurement Techniques IPM, Freiburg, as a software developer for parallel real-time applications. Since 2013, he has been with Jedox as a GPU developer and manager for research projects.

Learn how in-GPU-memory databases can change the way real-world big data sets such as social media entries, webpage hits, or business data are analyzed. Analytical queries in databases often involve calculating extremely large areas of aggregated values as input for further processing, such as conditional calculation (if-then-else) or top-k evaluation, and therefore often run into memory problems. We present the design of optimized condition-based processors for large data sets, combined with a floating-frame approach to stream through these data areas. Conditional calculations are especially useful for splitting large value sets into clusters for further analysis or aggregation, and we will provide examples on real-world social media data, including localized Twitter trends and Wikipedia page hits.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Developer - Performance Optimization; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210D
View Recording
View PDF

S5581 - Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus Research Scientist , Facebook
Highly-Rated Speaker
Rob Fergus
Rob Fergus is an Associate Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University. He is also a Research Scientist at Facebook, working in their AI Research Group. He received a Masters in Electrical Engineering with Prof. Pietro Perona at Caltech, before completing a PhD with Prof. Andrew Zisserman at the University of Oxford in 2005. Before coming to NYU, he spent two years as a post-doc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William Freeman. He has received several awards, including a CVPR best paper prize, a Sloan Fellowship, an NSF CAREER award, and the IEEE Longuet-Higgins prize.

This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g., Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component in training these models. The talk will conclude with a live demo.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210A
View Recording

S5608 - Advancements in V-Ray RT GPU

Vladimir Koylazov CTO, Chaos Software
Highly-Rated Speaker
Vladimir Koylazov
Vladimir is CTO of Chaos Software and one of the original developers of the popular V-Ray raytracing engine.
Blagovest Taskov Software Developer, Chaos Software Ltd.
Blagovest is one of the developers of V-Ray RT GPU and implemented many of the features described in the talk.

This talk discusses recent advancements in V-Ray RT GPU toward a fully featured production renderer. Covered topics include GPU implementations of hair raytracing, sub-surface scattering, out-of-core texture paging, displacement, and more.

Level: Advanced
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21D
View Recording
View PDF

S5659 - Accelerating Mountain Bike Development with Optimized Design Visualization

Geoff Casey Product Design Manager, Santa Cruz Bicycles
Geoff Casey
An Industrial Designer for 14 years, Geoff traded the wet pavement of Seattle for the Redwood trails of California's Santa Cruz Mountains 4 years ago. He's currently the Design Manager at Santa Cruz Bicycles, where he and his team define the aesthetic direction of all things Santa Cruz.

Santa Cruz Bicycles is an industry-leading manufacturer of high-end, high-performance mountain bikes. Join Product Design Manager Geoff Casey as he demonstrates his team's approach to creating bikes that are at the forefront of engineering. With color and graphic design being such critical aspects of bike design, the company leverages visual computing tools to gain an advantage in a highly competitive industry. Harnessing the power of the GPU in conjunction with Bunkspeed's 3D visualization software, Santa Cruz's design team rapidly realizes its vision in real time, making on-the-fly design decisions that cut both time and cost out of the product development lifecycle.

Level: All
Type: Talk
Tags: Manufacturing; Product Design & Styling; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21A
View Recording
View PDF

S5119 - Making Virtual Crash Tests a Reality

Eric DeHoff Principal Engineer - Vehicle Structures Research, Automotive Safety, Honda R&D Americas, Inc.
Eric DeHoff
Eric started his career as a rocket scientist at NASA in Houston in 1988. He joined Honda R&D Americas in Ohio in 1996 in the first CAE group responsible for developing all CAE methods for the company. He is now Technical Leader in the Vehicle Research - Automotive Safety Group responsible for developing new methods to analyze simulations of crash tests.

To save money, most auto manufacturers utilize virtual models instead of expensive prototype vehicles throughout the design and development process. The challenge is to ensure that the virtual world accurately reflects what happens in physical crash tests. With the latest photo-realistic rendering technology, crash simulation results can be made to look like the actual physical test results. This leads to better understanding, communication and faster decision making from both experts and non-experts. This presentation will demonstrate how photo-realistic rendering brings real and virtual worlds closer together.

Level: All
Type: Talk
Tags: Manufacturing; Automotive; Press-Suggested Sessions: Professional Graphics; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room LL21A

S5148 - Nvpro-Pipeline: A Research Rendering Pipeline

Markus Tavenrath Senior Developer Technology Engineer, NVIDIA
Markus Tavenrath
Markus Tavenrath finished his studies in computer science with a focus on computer graphics in 2008. He was one of the first to use raytracing on CUDA for his diploma thesis, which brought him straight to NVIDIA. There he primarily worked on GPU raytracing for SceniX, NVIDIA's scenegraph technology, which was showcased at SIGGRAPH 2008. Afterwards he applied his experience to implement parts of OptiX, improve SceniX, and develop several raytracing demos. In close cooperation with external partners, he improved rendering performance and scenegraph usability as a developer technology engineer. Now he is using this knowledge to experiment with future rendering technologies that bring high interactivity to complex scenes. This work includes both CPU and GPU strategies for typical scenegraph operations related to rendering.

Nvpro-pipeline is a research rendering pipeline based on SceniX. It features a scene graph; an effect system, including support for OIT algorithms; an xbar that generates a flat list of objects to render; a frustum culling system; and RiX as the rendering backend, which supports several OpenGL techniques to keep the CPU cost of rendering as low as possible. This talk will present the different modules of the pipeline and some of the implementation details.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Developer - Performance Optimization; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL21B
View Recording
View PDF

S5164 - Using a CUDA-Accelerated PGAS Model on a GPU Cluster for Bioinformatics

Jorge González-Domínguez Post-Doc, JGU Mainz
Jorge González-Domínguez
Jorge González-Domínguez received the B.Sc., M.Sc. and PhD degrees in Computer Science from the University of A Coruña, Spain, in 2008, 2010 and 2013, respectively. He is currently a postdoctoral researcher in the Parallel and Distributed Architectures Group at the Johannes Gutenberg University Mainz, Germany. His main research interests are in the areas of high performance computing for bioinformatics, PGAS programming languages and GPU parallelization with CUDA.

In this session, you will learn how to develop a bioinformatics tool for GPU clusters using CUDA and UPC++, a PGAS language. In particular, the analyzed tool detects epistasis between SNP pairs of a GWAS dataset. I will describe: (1) how to distribute the workload among different GPUs using UPC++; and (2) how to exploit the GPU's characteristics to speed up epistasis detection. Results on two clusters with different GPUs and characteristics will be presented.

Level: Intermediate
Type: Talk
Tags: Life & Material Science

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 212A
View Recording
View PDF

S5168 - So You Want to Create the Holodeck? A Closer Look at OTOY's Lightfield Technology

Jules Urbach CEO & Founder, OTOY Inc.
Jules Urbach
Jules Urbach is a pioneer in computer graphics, streaming and 3D rendering with over 25 years of industry experience. He attended Harvard-Westlake high school in LA before being accepted to Harvard University. He decided to defer his acceptance to Harvard (indefinitely as it turned out) to make revolutionary video games. To that end, he made his first game, Hell Cab (Time Warner Interactive) at age 18, which was one of the first CD-ROM games ever created. Six years after Hell Cab, Jules founded Groove Alliance. Groove created the first 3D game ever available on Shockwave.com (Real Pool). Currently, Jules is busy working on his two latest ventures, OTOY and LightStage which aim to revolutionize 3D content capture, creation and delivery.

Attendees will learn about OTOY's light field rendering technology which allows for immersive experiences on mobile HMDs and next gen displays. OTOY is actively developing a groundbreaking light field rendering pipeline, including the world's first portable 360 LightStage capture system and a cloud-based graphics platform for creating and streaming light field media for virtual reality and emerging holographic displays. OTOY's breakthroughs in compression and rendering on NVIDIA GPUs have dramatically reduced the barriers for light field video streaming, making it a viable media format that gives content creators everywhere a simple, cost-effective way to bring high quality, interactive 3D content to multiple platforms for the world to enjoy.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Augmented Reality & Virtual Reality; Real-Time Graphics; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room LL21D
View Recording

S5201 - SMTool: A GPU based Satellite Image Analysis Tool

Dilip Patlolla R & D Staff, Oak Ridge National Laboratory
Highly-Rated Speaker
Dilip Patlolla
Dilip Patlolla is an R&D staff member in the Geographic Information Science and Technology (GIST) Group at the Oak Ridge National Laboratory, which has been a pioneer in the development, implementation, and application of systems, science, and technology for geographic information. His primary responsibilities include opening up new domains of application for HPC, FPGAs, and GPUs by researching and developing computing algorithms, and ensuring the best possible performance on current and next-generation architectures. He leads the development of mapping and characterizing global-scale human settlements using advanced computing methods and received ORNL's 2013 Significant Event Achievement Award for the effort.

This session will demonstrate our advanced satellite image analysis tool, referred to as SMTool, built on the CUDA platform to process city-scale, sub-meter-resolution satellite imagery to detect and discriminate man-made structures. Automated analysis of large-scale, high-resolution satellite imagery requires computationally efficient image representation techniques that characterize the distribution of structures in the scene. The structures of interest range from simple edges and lines to complex shapes of objects on the ground. Different representation techniques, and their careful implementation exploiting the GPU architecture, will be reviewed. We present results of SMTool from our ongoing work supporting global-scale population mapping and polio eradication and immunization efforts.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Big Data Analytics; Supercomputing; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210B
View Recording
View PDF

S5276 - PG-Strom: Query Acceleration Engine of PostgreSQL Powered by GPGPU

Kohei KaiGai Lead, PG-Strom Project, NEC
Kohei KaiGai
Kohei has about ten years of experience in OSS/Linux development and has contributed to both the SELinux and PostgreSQL projects. He launched the SE-PostgreSQL project in 2006, which was partially revised and upstreamed in the v9.1 release. Kohei is also interested in parallel query execution and the PG-Strom extension, which runs queries using GPU devices. In the v9.3 development cycle, he mainly worked on the Row-Security and Writable FDW features; both can serve as the basis of SE-PostgreSQL and PG-Strom.

This session will introduce how we integrated GPU acceleration into the PostgreSQL database while keeping 100% compatibility with the application landscape. RDBMS is a long-standing and widely used technology that remains at the core of business activities; however, growing data sizes raise performance concerns. PG-Strom is an extension of the PostgreSQL database designed to off-load several CPU-intensive query workloads (currently scan, join, and aggregation) to the GPU, running up to 10x faster than the existing SQL implementation. Its characteristics fit the usual workloads of BI (business intelligence) tools in a cost-effective way, though not all workloads. The PG-Strom extension is released under the GPLv2 terms and is expected to be supported by PostgreSQL v9.5.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210D
View Recording
View PDF

S5434 - High-Performance Molecular Simulation With GROMACS: Heterogeneous Acceleration on x86, ARM & Power

Erik Lindahl Professor, KTH Royal Institute of Technology
Erik Lindahl holds dual appointments as professor of biophysics at KTH Royal Institute of Technology & Stockholm University, and is the director of the bioinformatics platform of the Swedish Science for Life Laboratory national infrastructure. He has previously been a postdoctoral scholar with Michael Levitt at Stanford University, and worked at Groningen University and the Pasteur Institute. The research in the Lindahl lab is focused on advancing the state of the art of methodology and implementations in biomolecular simulations and applications to membrane proteins, in particular understanding and altering properties of the voltage and ligand gated ion channels responsible for nerve impulse transmission. Erik Lindahl is the principal investigator of the international multi-site team developing the GROMACS molecular simulation toolkit, which is one of the most widely used scientific codes in the world. His team is responsible for a state-of-the-art highly parallel GPU accelerated molecular dynamics implementation that relies on heterogeneous acceleration for a number of architectures.

This session will showcase how the latest CUDA devices have expanded beyond x86 in high performance computing (HPC), enabling new combinations with power-efficient ARM or extreme-performance Power processors. In particular, we will describe the challenges in accelerating our molecular simulation code GROMACS, combined with general HPC conclusions. We will cover challenges and advantages compared to x86 and discuss strategies for scheduling and partitioning work over wide ranges of GPU & CPU hardware, in particular for heterogeneous acceleration, large-scale parallelization, and achieving outstanding scientific code performance. Registrants should ideally have some experience with scientific computing and/or biomolecular simulation.

Level: Intermediate
Type: Talk
Tags: Supercomputing; Life & Material Science

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room 212B
View Recording
View PDF

S5484 - GPUdb: GPU-Accelerated Distributed Database

Eli Glaser Senior Software Engineer, GIS Federal
Eli Glaser
Eli Glaser is a Senior Software Engineer at GIS Federal with 15 years of experience developing high performance software for signal, image, video, and data processing applications. He has been writing software targeting GPUs for over 5 years on projects ranging from synthetic aperture radar to video processing and is now working on developing the core GPU infrastructure and algorithms for GPUdb.

GPUdb is a high performance GPU-accelerated distributed database. Users can ingest arbitrary data and then run queries against the data via an SQL-like syntax. Queries are handled by our highly optimized GPU-accelerated distributed back-end. Complex filters and server-side visualizations typically complete in under one second, even for billions of objects. Find out how GPUdb can help you solve your big data challenges.

Level: All
Type: Talk
Tags: Defense; Big Data Analytics; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210C
View Recording
View PDF

S5588 - Self-Driving Vehicles: Changing the Mission of Human-Machine Interface

Walter Sullivan Head of Silicon Valley Innovation Lab, Elektrobit
Walter Sullivan
Walter Sullivan is head of Elektrobit (EB) Automotive's newly established Silicon Valley Innovation Lab, responsible for developing and leading the company's presence in Silicon Valley, as well as building and fostering strategic partnerships around the globe. Sullivan joined Elektrobit Automotive from Microsoft, where he had held multiple leadership positions since 1995, most recently as a senior manager of product planning and strategy for the Automotive Business Unit. Sullivan previously was a lead program manager and architect at Microsoft, working on the development of cutting-edge embedded systems for leading automakers such as Fiat, Ford, and Kia, among others. He began his career at Microsoft in the Visual C++ division. Sullivan holds a Bachelor of Arts degree in Computer Engineering from Seattle Pacific University.

Highly connected vehicles have clear implications on various aspects of the driver-vehicle interaction. The HMI design will be influenced by the high load of information that will be put on the driver. How can information best be presented? How can it be selected? Is the idea of a work load manager still relevant? On the other hand, autonomous driving brings new challenges for the vigilance and distraction of the driver. How can the driver be pulled back into the loop when required? When is it required? How can drivers be informed about the limits of the machine? We will also discuss methods on how to "measure" HMI and driving performance in automation, such as steering wheel reversal rate, standard deviation lane position, speed keeping and more.

Level: Advanced
Type: Talk
Tags: Automotive; Augmented Reality & Virtual Reality; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room LL21F
View Recording
View PDF

S5594 - Storage: Critical to the Success of vGPU Workloads (Presented by Pure Storage)

Kyle Grossmiller VDI Solutions Architect, Pure Storage
Kyle Grossmiller
Kyle is a VDI Solutions Architect at Pure Storage, where he focuses on helping customers bring their VDI projects to the next level of success using Pure’s All-Flash Arrays. Prior to joining Pure, Kyle was at Lockheed Martin Space Systems Company for over 12 years where he worked in dual IT roles supporting their CAD, CAM and FAE engineering user base as well as serving as the technical lead for their internal private-cloud VDI. Recently coming from a larger enterprise, Kyle is intimately aware of the challenges faced by those organizations and is extremely optimistic about the transformative positive change that emerging technologies like vGPU and Pure All-Flash Arrays will be able to rapidly bring about.
Ravi Venkat Sr. Solution Architect, Pure Storage
Ravi has been a Virtualization Solutions Architect at Pure Storage for the last three years, where he is the expert at the intersection of flash, virtualization, and remote graphics. Prior to Pure, he held a similar role at Cisco, where he helped drive the virtualization benefits of Cisco's new servers, the Unified Computing System (UCS), and built reference architectures that are still being used today. He held various engineering positions at VMware (for three-plus years) and Veritas, working on storage virtualization, volume management, and file system technologies for about eight years. Ravi has been very active in the VMware community and has given several talks at vBrownBag, VMworld, and Cisco LIVE. For all his contributions to the virtualization community, Ravi was awarded vExpert status for 2013. He has a Master's degree in Computer Science and Engineering from Santa Clara University and also holds the VCAP5-DCA certification.

Storage is a critical part of any Virtual Desktop Infrastructure (VDI) deployment. Successfully deploying large-scale, graphics-rich VDI with vGPU requires determining the required Input/Output Operations per Second (IOPS). The type of application, data set size, redundancy, and compressibility, combined with other parts of the VDI architecture, drive the need for IOPS. Fortunately, the advent of affordable all-flash arrays with enterprise services like HA, replication, and snapshots can meet or exceed the needs of these demanding systems. This session will cover the factors that determine the IOPS and how all-flash arrays are a new class of storage ideally suited to power this demanding architecture.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Developer - Performance Optimization; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL20D
View Recording

S5626 - Accelerating Computer Vision and Augmented Reality via GPGPU Computing

Jack Dashwood Senior PR & Marketing Manager, Metaio
Jack Dashwood
Jack Dashwood is the PR & Marketing manager at Metaio Inc. He received his Bachelor of Science in Psychology, and Bachelor of Commerce in Finance from the University of Calgary, Canada, and later obtained a Master's in Global Business from the University of Victoria. Having recently relocated from Metaio's Munich HQ, Jack now oversees Metaio's PR and marketing activities in North America.

It is no secret that augmented reality is a computationally intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPUs of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL21C
View Recording
View PDF

S5638 - Delivering Best-in-Class Graphic Intensive Virtual Workspaces with Cisco UCS (Presented by Cisco)

Aniket Patankar UCS Product Manager, Cisco Systems Inc.
Aniket Patankar is the UCS Product Manager responsible for developing and executing strategy for Cisco Unified Computing System (UCS) solutions for desktop virtualization, VDI, scale-out storage, software-defined storage, converged infrastructure, private/hybrid cloud-based application delivery, and Application Centric Infrastructure. He has more than 8 years of product management, technical marketing, and software engineering experience in servers, management systems, networking, datacenter virtualization, and cloud architectures.
Timothy Ma Product Manager, Cisco

Desktop virtualization implementers are reaping the benefits of Cisco Unified Computing to provide end users an uncompromised experience. This session will detail key design strategies and best practices for deploying graphics-intensive VDI on Cisco UCS with NVIDIA GPUs. Topics will include: an overview of the enhanced graphics capabilities supported by the newer generation of UCS servers, UCS 2.0; how Cisco UCS Director can help unify and automate the management of your desktop virtualization infrastructure from end to end; how to simplify the manageability of GPU-enabled VDI solutions with Cisco UCS C-Series Rack Servers with Single Connect technology; and how to accelerate your path to ROI with VDI using the latest Cisco Validated Designs for VDI with FlexPod and VSPEX.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL20C
View Recording

S5713 - Collaborative Feature Learning from Social Media

Hailin Jin Principal Scientist, Adobe
Hailin Jin
Hailin received his Ph.D. degree in Electrical Engineering from Washington University in Saint Louis. His research interests include deep learning, 3D reconstruction, structure from motion, optical flow, and stereo. His work can be found in several Adobe products, including Photoshop, After Effects, Premiere Pro, and Behance.

Image feature representation plays an essential role in image recognition. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its applicability to domains where labels are hard to obtain. I will present a new data-driven feature learning paradigm which does not rely on category labels. Instead, we learn from user behavior data collected on social media. We use the image relationship discovered in the latent space from the user behavior data to guide the image feature learning. Also presented is a new large-scale image and user behavior dataset collected on Behance.net. The dataset consists of 1.9 million images and over 300 million view records from 1.9 million users.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210A
View Recording
View PDF

S5782 - Reinventing the Wheel - One Last Time

Ricardo Cabello Freelance
Ricardo Cabello
Ricardo Cabello (@mrdoob) is a self-taught computer-graphics programmer. Originally from Barcelona, Cabello began his professional career alternating between roles as a designer and developer. In his spare time, his involvement in the demoscene set him on the path to learning graphics programming. Combining his background as a designer and expertise in development, his work ranges from simple interactive digital toys — Google Gravity, Ball Pool and Harmony — to full featured experiences — The Johnny Cash Project, The Wilderness Downtown and ROME. Nowadays, Cabello spends most of his time developing open source libraries and tools — three.js, frame.js and stats.js — with the aim of making design and development simpler for everyone.

- Hey! Assembly is in 4 months! Do you want to do a demo for it? - Let's do it! Do you have new effects? - Hmm... Yeah. But I think we should do a new demosystem. The code for our last demo ended up a bit too messy. - Oh! Ok. 8 years later...

Level: All
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL20A
View Recording

S5820 - CUDA 7 and Beyond

Mark Harris Chief Technologist, GPU Computing, NVIDIA
Highly-Rated Speaker
Mark Harris is Chief Technologist for GPU Computing at NVIDIA, where he works as a developer advocate and helps drive NVIDIA's GPU computing software strategy. His research interests include parallel computing, general-purpose computation on GPUs, physically based simulation, and real-time rendering. Mark founded www.GPGPU.org while he was earning his PhD in computer science from the University of North Carolina at Chapel Hill. Mark brews his own beer and cures his own bacon in Brisbane, Australia, where he lives with his wife and daughter.

CUDA is NVIDIA's parallel computing platform and programming model. In this talk you'll learn how new support for C++11 in CUDA 7, along with new features and performance improvements in the Thrust C++ parallel algorithms library, and support for runtime compilation, makes parallel C++ more productive than ever. CUDA 7 also includes cuSOLVER, a new direct linear solver library, as well as new features and improved performance in other CUDA libraries. In this talk you'll hear about these features and get insight into the philosophy driving the development of CUDA and how it will take advantage of current and future GPUs. You will learn about NVIDIA's vision for CUDA and the challenges for the future of parallel software development.
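The Thrust usage mentioned above can be sketched as follows. This is a minimal illustration, not code from the talk: a SAXPY functor dispatched through thrust::transform, compiled with nvcc --std=c++11 so that C++11 host-code features such as range-based for loops are available.

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <iostream>

// SAXPY functor callable on both host and device.
struct saxpy {
    float a;
    explicit saxpy(float a) : a(a) {}
    __host__ __device__ float operator()(float x, float y) const {
        return a * x + y;
    }
};

int main() {
    thrust::device_vector<float> x(4, 1.0f);
    thrust::device_vector<float> y(4, 2.0f);

    // y = 3*x + y, computed on the GPU by Thrust.
    thrust::transform(x.begin(), x.end(), y.begin(), y.begin(), saxpy(3.0f));

    // C++11 range-based for over the device vector (each read copies to host).
    for (float v : y) std::cout << v << " ";  // each element is 3*1 + 2 = 5
    std::cout << std::endl;
    return 0;
}
```

Compile with, e.g., `nvcc --std=c++11 saxpy.cu`; cuSOLVER and runtime compilation (NVRTC) ship as separate libraries in the same toolkit.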

Level: Intermediate
Type: Talk
Tags: Developer - Programming Languages

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room 210H
View Recording
View PDF

S5826 - Workstation Physical to Virtual: A Road Fraught with Perils? Here is your Guide! (Presented by Dell)

Gary Radburn Head, Workstation Virtualization, Dell
Gary Radburn
Gary currently heads up Workstation Virtualization initiatives globally at Dell Inc. With over 25 years' experience in the technology industry, ranging from engineering to sales and marketing, Gary has been involved in all aspects of designing successful products and solutions and bringing them to market. He has worked for companies such as Digital Equipment, 3Com and, for the past 13 years, Dell. Originating from the UK, where he managed the OptiPlex client business for EMEA, he went on to lead the Workstation Solutions team in the US and then championed graphics virtualization for engineering applications. This has become one of the most talked-about topics among workstation customers today and will be for the foreseeable future. He still retains his English accent, thankfully assisted by the presence of BBC America on TV.

In this session we will share tips and tricks for navigating the implementation of a virtualized workstation environment and ensuring it is a good fit for your company: how to assess your infrastructure, performance requirements and end users, and how to develop a plan that accounts for all needs and wants and leaves all parties satisfied. We will also cover tools to help with performance analysis, and even how to set up a virtualization proof of concept with no money down. Virtualization does not have to be daunting – now is the right time to see if this exciting technology is right for your organization.

Level: All
Type: Talk
Tags: Graphics Virtualization

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL21E
View Recording

S5896 - A Performance, Energy and Accuracy Aware Benchmarking Methodology for Robot Vision

Luigi Nardi Research Associate, Imperial College London
Dr Luigi Nardi is a post-doctoral research associate in the Software Performance Optimisation group at Imperial College London. Luigi's primary role is the co-design of high-performance computer vision systems, where performance, power and accuracy are part of the same optimisation space. Luigi earned his Ph.D. in computer science creating a new performance domain-specific language (DSL) in the context of automatic code generation for applied mathematics. He has almost 10 years of experience in parallel computing and more than 4 years of experience developing GPU-enabled codes using CUDA and OpenCL, from desktop to embedded. Prior to his current position, Luigi was a permanent researcher at Murex S.A.S., working on the acceleration of production-level computational finance codes for pricing evaluation and risk management on clusters of GPUs.

We introduce SLAMBench, a publicly available software framework for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption for real-time 3D scene understanding. 3D scene understanding offers great potential for a new level of scene modelling, localisation and real environmental interaction for many types of robot, but its high computational requirements mean that use on mass-market embedded platforms is challenging. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and a powerful mechanism for reliable accuracy comparison of different implementations and algorithms. We experimentally investigate SLAMBench execution time, energy and accuracy on a variety of multicore and GPU-accelerated platforms.
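One standard way such a benchmark can quantify accuracy is absolute trajectory error against ground truth; a minimal Python sketch (the function name and data are illustrative, not SLAMBench's API):

```python
import math

def ate_rmse(estimated, ground_truth):
    """Root-mean-square absolute trajectory error over paired 3-D positions."""
    assert len(estimated) == len(ground_truth)
    se = 0.0
    for (ex, ey, ez), (gx, gy, gz) in zip(estimated, ground_truth):
        se += (ex - gx) ** 2 + (ey - gy) ** 2 + (ez - gz) ** 2
    return math.sqrt(se / len(estimated))

# Two poses; the second estimate is off by 1 m in y.
print(ate_rmse([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 1, 0)]))
```

Reporting a single scalar like this is what makes implementations in C++, OpenMP, OpenCL and CUDA directly comparable.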

Level: All
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210G
View Recording
View PDF

S5184 - GPU-Accelerated Algorithm for the Whole Genome Assembly Problem

Michal Kierzynka Researcher, Poznan University of Technology
Michal Kierzynka
I am a PhD student at Poznan University of Technology, Poland. Between 2008 and 2010 I was employed at the Laboratory of Algorithm Design and Programming Systems at the Institute of Computing Science, Poznan, where I worked on pairwise as well as multiple biological sequence alignment on GPU. Currently I'm employed at Poznan Supercomputing and Networking Center. My interests include bioinformatics, high performance computing, new computing architectures and visualization systems.

Learn how to assemble genomes more efficiently and more accurately with a new GPU-accelerated de novo assembler. As Next Generation Sequencing generates immense amounts of genomic data, scientists around the world are constantly looking for faster, cheaper and better software tools for DNA assembly. This session will describe the first ever GPU-based algorithm for DNA assembly and explain how GPUs are used to effectively tackle this complex problem. Participants will also learn some key optimizations that have helped us achieve peak performance in sequence alignment on GPUs. Moreover, examples will show how the software performs on real data from an Illumina sequencer. You cannot miss this session if you want to stay up to date!
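The sequence-alignment kernel at the heart of assemblers is typically a dynamic-programming recurrence; a serial Python sketch of Smith-Waterman local alignment scoring (a textbook baseline for illustration, not the presented algorithm):

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b.

    Serial reference: GPU implementations compute the same DP matrix in
    parallel, typically sweeping anti-diagonals whose cells are independent.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * cols
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best

print(smith_waterman_score("ACGT", "ACGT"))  # 8: four matches at +2 each
```

The per-cell arithmetic is trivial; the performance challenge, and the optimization target on GPUs, is keeping thousands of such cells in flight at once.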

Level: All
Type: Talk
Tags: Life & Material Science; Developer - Algorithms

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 212A
View Recording
View PDF

S5229 - Towards Fast SQL Query Processing in DB2-BLU Using GPUs

Sina Meraji R&D Engineer, IBM
Sina  Meraji
Sina Meraji received the B.S. degree from the Department of Computer Engineering, Amirkabir University, Tehran, Iran, and the M.S. degree from the Department of Computer Engineering, Sharif University, Tehran, Iran. He also holds a Ph.D. from the School of Computer Science of McGill University, Montreal, Canada, in the area of parallel gate-level simulation, and completed postdoctoral studies in the Electrical and Computer Engineering Department of the University of Toronto. He is currently a team lead in the Hardware Acceleration Lab of IBM Toronto, researching GPU exploitation for database operations. His current research interests include parallel processing, GPUs, columnar databases and dynamic load balancing.

Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines; IBM DB2-BLU is an example of such a database. To improve query processing performance further, GPUs can be used as fast, high-bandwidth co-processors. As part of our work, we integrate NVIDIA GPUs into DB2-BLU by changing the infrastructure of DB2-BLU and developing GPU kernels. We have a hybrid design in which we combine DB2-BLU features on IBM's POWER8 processor with NVIDIA's GPU accelerator technology for fast query processing. This work was done in collaboration with Peter Kokosielis.
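The core of a columnar scan is per-row predicate evaluation with no cross-row dependencies, which is what makes it a natural fit for thousands of GPU threads; a minimal Python sketch (names are illustrative, not DB2-BLU internals):

```python
def column_scan(column, predicate):
    """Return the indices of qualifying rows (a selection vector).

    Each row is tested independently, so on a GPU every element can be
    evaluated by its own thread, followed by a parallel compaction.
    """
    return [i for i, v in enumerate(column) if predicate(v)]

# SELECT ... WHERE price > 40, over a single column of a column store.
print(column_scan([10, 55, 30, 70], lambda v: v > 40))  # [1, 3]
```

The resulting selection vector is then used to fetch only the qualifying rows from the other columns the query touches.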

Level: All
Type: Talk
Tags: Big Data Analytics; Data Center, Cloud Computing & HPC; Developer - Algorithms

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210D
View Recording
View PDF

S5251 - Accelerating Automated Image Processing Pipelines for Cameras with Novel CFAs on GPUs

Qiyuan Tian Ph.D. Candidate, Stanford University
Qiyuan  Tian
Qiyuan Tian is a Ph.D. Candidate in the Department of Electrical Engineering at Stanford University. He received a B.Eng. (2011) in Communication Science and Engineering at Fudan University, China, and an M.S. (2013) in Electrical Engineering at Stanford University. He studied as an undergraduate exchange student (2009) in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology. He is working on digital imaging, magnetic resonance imaging and neuroimaging.
Haomiao Jiang Ph.D. Candidate, Stanford University
Haomiao Jiang
Haomiao Jiang is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. He received B.A. (2011) in Information Security at Shanghai Jiao Tong University, China, and M.S. (2013) in Electrical Engineering at Stanford University. He is working with Professor Brian Wandell on color vision, display modeling and computational photography.

L3 (Local, Linear, Learned) is a new technology to automate and customize the design of image processing pipelines for cameras with novel architectures, such as unconventional color filter arrays. L3 classifies sensor image pixels into categories that are local in space and response, and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization. We accelerated the L3 pipeline on NVIDIA® Shield™ Tablets, using GPUs for real-time rendering of video captured by a multispectral camera prototype. The combination of L3 and GPUs delivers high performance with low power for image processing on mobile devices.
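The classify-then-apply-a-learned-linear-operator structure can be sketched in a few lines of Python (the classifier and filters below are toy stand-ins, not the trained L3 operators):

```python
def l3_render(patches, classify, filters):
    """Per-pixel L3-style processing.

    Each pixel's local patch is classified, then the learned linear filter
    for that class is applied as a dot product. Every pixel is independent,
    which is why the pipeline parallelizes so well on a GPU.
    """
    out = []
    for patch in patches:
        c = classify(patch)          # pick a category from local context
        w = filters[c]               # that category's learned linear operator
        out.append(sum(wi * pi for wi, pi in zip(w, patch)))
    return out

# Toy example: dark patches use filter 0, bright patches use filter 1.
patches = [[1, 1, 1], [9, 9, 9]]
classify = lambda p: 0 if sum(p) / len(p) < 5 else 1
filters = {0: [1, 0, 0], 1: [0, 0, 1]}
print(l3_render(patches, classify, filters))  # [1, 9]
```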

Level: All
Type: Talk
Tags: Defense; Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210C
View Recording
View PDF

S5272 - Evolutionary Artificial Potential Field for Path Planning: A GPU Implementation

Ulises Orozco-Rosas Ph.D. Candidate, Instituto Politécnico Nacional
Ulises Orozco-Rosas
Ulises Orozco-Rosas received the B.S. degree in electronics engineering from the Autonomous University of Baja California, Mexico, in 2006, and the M.Sc. in Digital Systems from the Instituto Politécnico Nacional, Mexico, in 2014. He is currently pursuing a Ph.D. in Digital Systems at the Instituto Politécnico Nacional. He works in the Department of Intelligent Systems at the Center of Research and Development in Digital Technology (IPN – CITEDI); his research activities include the design of intelligent mobile robots, motion planning, collision avoidance, and software engineering. His current research interests include machine learning, evolutionary computation, parallel and distributed computing systems, and mobile robotics.

Autonomous path planning plays an important role in mobile robots, with many methods developed for off-line, on-line and combined approaches. Important points that designers consider in the development of new methods include computational complexity, which is closely related to the time required to find the optimal path, reliability in real-life applications, and computer resource requirements, among other factors. In this session a GPU implementation of the Evolutionary Artificial Potential Field (EAPF) is presented as an innovative method for path planning in mobile robot navigation. The results demonstrate that the parallel Evolutionary Artificial Potential Field outperforms both the sequential implementation and the original Artificial Potential Field proposal.
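The artificial potential field underlying EAPF combines an attractive pull toward the goal with repulsive pushes away from nearby obstacles; a minimal Python sketch of the force computation (the gains and the particular potential shapes are illustrative assumptions, not the talk's parameters):

```python
import math

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, d0=5.0):
    """Net 2-D force at `pos`: attraction toward `goal` plus repulsion
    from each obstacle within the influence radius d0."""
    fx = k_att * (goal[0] - pos[0])           # attractive component
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0 < d < d0:                        # repulsion only near obstacles
            mag = k_rep * (1.0 / d - 1.0 / d0) / (d ** 2)
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy

print(apf_force((0, 0), (3, 4), []))  # (3.0, 4.0): pure attraction
```

The robot follows this force downhill; the evolutionary layer of EAPF searches for gain values that avoid the local minima the plain potential field is prone to, and evaluating many candidate gain sets is exactly the part that parallelizes on the GPU.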

Level: Beginner
Type: Talk
Tags: Embedded Systems; Automotive; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210G
View Recording
View PDF

S5333 - SceneNet: 3D Reconstruction of Videos Taken by the Crowd on GPU

Chen Sagiv CEO, SagivTech Ltd.
Chen  Sagiv
Dr. Sagiv brings to SagivTech over 15 years of experience in the image processing industry both in Israel and the Netherlands. In addition to her activities with the company, she also collaborates with academic beneficiaries in Israel and Europe. Chen Sagiv holds a PhD from Tel Aviv University in Applied Mathematics, with specializations in texture analysis, filter banks and optimization problems.

If you have visited a rock concert recently, you probably noticed how many people were taking videos of the scene using their mobile phone cameras. The aim of SceneNet is to use these multiple video sources to create a high-quality 3D video scene that can be shared via social networks. The SceneNet pipeline starts at the mobile device, where the video streams are acquired, pre-processed and transmitted to the server, where the various video streams are registered and submitted to 3D reconstruction. We will share the compute challenges of SceneNet and the GPU-based acceleration on mobile devices and the server, from pre-processing on the mobile device to extremely computationally demanding algorithms such as bundle adjustment and 3D reconstruction. SceneNet is an FP7 European-funded project.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210B
View Recording
View PDF

S5469 - Enabling Next-Gen Vehicle Architectures with Embedded Supercomputing

Uday Pitambare Software Engineer, Delphi
Uday Pitambare
Uday Pitambare works as a Software Engineer with Delphi Labs @ Silicon Valley. He has been involved in the development and integration of innovative concepts for infotainment systems and automated driving that Delphi has demonstrated at CES and other auto shows. Uday has prior experience with GPU technology and video processing. He earned a Master's in Computer Science from Indiana University, Bloomington.

The evolution of GPU-accelerated computing is enabling us to rethink vehicle architecture in ways previously believed infeasible. We will see how Delphi's signature Integrated Cockpit and Multi-domain controller projects now leverage parallel computing to up-integrate traditionally disparate vehicle systems. We will also discuss the advantages and challenges involved in this process.

Level: Beginner
Type: Talk
Tags: Automotive; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room LL21F
View Recording
View PDF

S5571 - A High-Density GPU Solution for DNN Training

Franco Mana Research Engineer, NUANCE
Franco Mana
Franco Mana graduated in Computer Science from the University of Turin, Italy. He has been with CSELT (the Italian telecom research lab) since 1986, where he developed his thesis in the artificial intelligence field; his first scientific interest was automatic learning systems. He then moved to the field of neural networks, mainly applied to speech recognition. He is currently in the R&D division at Nuance, engaged in algorithmic research for speech recognition, where he developed his know-how in using GPUs to speed up hybrid HMM-NN system training.

We at Nuance use DNNs in ASR systems across multiple languages and tasks, and we train them with large amounts of data. DNNs are trained with gradient-descent methods that are slow and difficult to parallelize across machines. To speed up the training process, we have developed training algorithms and recipes that can train a DNN in parallel on multiple GPU devices, significantly reducing DNN training time. We will present benchmark results covering the basic computational operations involved in DNN training (SGEMM, memory copy throughput, etc.) as well as end-to-end training time on different GPU-based hardware configurations. In particular, we will benchmark systems based on the K10 versus systems based on the K80, with the number of GPUs varying from 1 to 16.
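One common multi-GPU recipe is synchronous data-parallel SGD: split each batch across devices, average the per-device gradients, and apply a single update. A serial Python sketch of that structure (illustrative only; Nuance's actual recipes are not described at this level in the abstract):

```python
def data_parallel_step(weights, shards, grad_fn, lr=0.1):
    """One synchronous data-parallel SGD step.

    Each 'device' computes a gradient on its data shard (this loop would run
    concurrently on the GPUs), the gradients are averaged, and one update
    is applied to the shared weights.
    """
    grads = [grad_fn(weights, shard) for shard in shards]
    avg = [sum(g[i] for g in grads) / len(grads) for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]

# Toy objective per shard: 0.5 * (w - mean(shard))^2, gradient w - mean(shard).
grad_fn = lambda w, shard: [w[0] - sum(shard) / len(shard)]
print(data_parallel_step([0.0], [[1.0, 1.0], [3.0, 3.0]], grad_fn))
```

The all-reduce that averages the gradients is the communication step whose cost, alongside per-device SGEMM throughput, dominates how well training scales from 1 to 16 GPUs.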

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Supercomputing; Signal & Audio Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210A
View Recording
View PDF

S5625 - NVIDIA GRID™ at PSA Peugeot Citroën: The Year in Review

Alain Gonzalez Expert Graphic Systems & 3 Imagery, PSA PEUGEOT CITROËN
Alain Gonzalez
A graduate of the University of Paris Sud Orsay with a Master's Degree in Computer Science & Engineering, Alain has worked in PSA Peugeot's IT department since 2000 starting as a Workstations IT Architect. Since 2009, Alain has been involved with the Expert Workstations Graphics Technologies & 3D Imagery area.

After a year of 500 users working with NVIDIA GRID in a virtualized CAD environment at PSA Peugeot Citroën, we will present who, what, where, why, and how the PSA IT department enables CAD workstation end users to work almost anywhere. Learn how virtualization helps us handle our business challenges, and the benefits and improvements virtualization brought to our business.

Level: All
Type: Talk
Tags: Manufacturing; Automotive; Graphics Virtualization; Product Design & Styling

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room LL21A
View Recording

S5716 - Redshift: Production-quality, final-frame rendering on the GPU

Panagiotis Zompolas CTO, co-founder, Redshift Rendering Technologies
Panagiotis Zompolas
Panos is the CTO and co-founder of Redshift Rendering Technologies, the makers of "Redshift": the GPU-accelerated production-quality final-frame renderer. Panos is a videogame industry veteran and has worked at several companies such as Sony Computer Entertainment Europe and Double Helix Games (now Amazon Games). He has been following and participating in the GPU revolution since the mid-nineties.
Robert Slater VP Engineering, co-founder, Redshift Rendering Technologies
Robert Slater
Robert is a co-founder of Redshift Rendering Technologies and video game industry veteran. In the last fifteen years he has focused on core rendering technology, squeezing the most out of GPUs of all shapes and sizes.

This talk introduces Redshift, the GPU-accelerated renderer running on NVIDIA GPUs that is redefining the industry's perception of GPU final-frame rendering. The talk covers features that make Redshift unique among commercial GPU renderers, such as out-of-core data access, memory efficiency, multiple GI modes and comprehensive shading capabilities, among others. It also focuses on the technical challenges the Redshift development team faced while implementing final-frame, production-quality rendering on the GPU. A few examples of customer work will also be demonstrated. This talk will be of interest both to industry professionals who want to learn more about GPU-accelerated production-quality rendering and to software developers interested in GPU-accelerated rendering.

Level: All
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room LL21D
View Recording
View PDF

S5123 - Through the Eyes of a Car: Visualizing a Car's Camera System

Gernot Ziegler Senior Developer Technology Engineer (Computer Vision), NVIDIA
Highly-Rated Speaker
Gernot Ziegler
Dr Gernot Ziegler is an Austrian engineer with a PhD in Computer Science from the University of Saarbrücken, Germany, and an MSc degree in Computer Science and Engineering from Linköping University, Sweden. He pursued his PhD studies at the Max-Planck-Institute for Computer Science, where he specialized in GPU algorithms for computer vision and data-parallel algorithms for spatial data structures. After six years in NVIDIA's Developer Technology (Compute) team, working on high-performance computing, Gernot has now returned to his original research domain and consults on the use of GPU algorithms for computer vision in the automotive industry as a senior member of NVIDIA's Developer Technology team for Computer Vision.

Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or navigational street situations.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room LL21F
View Recording
View PDF

S5188 - FurryBall RT: New OptiX Core and 30x Speed Up

Jan Tománek Owner/CEO, AAA studio
Jan Tománek
Jan Tománek is a film producer, director and owner of AAA Studio. AAA Studio was founded in 1990 as one of the first private studios in the Czech Republic. After producing the first Eastern European CGI feature movie in 2008, Jan decided to develop his own in-house GPU renderer, FurryBall, based on DirectX 11. In 2012, AAA Studio finished the sequel "Goat Story 2", completely rendered on GPUs, with much better quality than the previous movie and 10 times faster. Jan's philosophy is that renderers unconstrained by trying to be totally realistic allow more artistic freedom.

Jan will present a completely rewritten FurryBall: a real-time, production-quality, GPU-accelerated renderer using CUDA and OptiX. Now called FurryBall RT, its performance and viewport interactivity have improved 10-30X compared to the earlier DirectX-based version. FurryBall's power was proven by rendering a complete, full-length animated stereo 3D movie for cinemas on NVIDIA GPUs.

Level: All
Type: Talk
Tags: Media & Entertainment; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room LL21D
View Recording
View PDF

S5216 - Scaling Ion Torrent Semiconductor Sequencing Analysis with GPU's

Mohit Gupta Staff Software Engineer, Thermo Fisher Scientific
Mohit  Gupta
Mohit Gupta is a Staff Software Engineer in the Genetic, Medical and Applied Sciences division of Life Sciences Solutions, part of Thermo Fisher Scientific Inc. In this capacity, he is responsible for speeding up the algorithms used in data analysis for the PGM and Proton DNA sequencers, with a particular focus on GPU computing. Prior to this, he worked as a Senior Research and Development Engineer with Mirafra Technologies, Bangalore, India, in the area of electronic design automation, working on a compiler for hardware description languages such as Verilog. He holds a B.Tech in electrical engineering from the Indian Institute of Technology, Bombay, India, and an M.S. in Computer Engineering from the University of California, San Diego. He has published and presented at conferences and workshops including ICCAD, GTC and DFMY.
Jakob Siegel Staff Software Engineer, Thermo Fisher Scientific
Highly-Rated Speaker
Jakob Siegel
Jakob Siegel is a Staff Software Engineer in the Genetic, Medical and Applied Sciences division of Life Sciences Solutions, part of Thermo Fisher Scientific Inc. His work focuses on high-performance computing tasks in the context of DNA sequencing. Jakob graduated as a Dipl.-Ing. in Software Engineering from the University of Applied Sciences in Esslingen, Germany, and went on to earn an M.S. and a Ph.D. in Electrical and Computer Engineering from the University of Delaware. He has been involved in software projects in fields ranging from pure computer science through the automotive sector, naval communication systems and atmospheric research, until he joined the Ion Torrent team in January 2012 to work on the software side of DNA sequencing. Jakob has published and presented at multiple computer engineering conferences, workshops and journals, including Computing Frontiers, ICS, ICPPW, GTC, AACEC and JACT.

Learn how GPUs are playing a central role in conquering the compute challenges posed by current and next-generation Ion Torrent DNA sequencing chips in the Ion Proton DNA sequencer. We will showcase our complete signal-processing pipeline running on the GPU and our journey in developing CUDA code for data-fitting algorithms targeted at different GPU architectures such as Fermi, Kepler and Maxwell. We will also share our evaluation of NVIDIA's aligner nvBowtie and how it stands in terms of speed and accuracy of alignments. We will touch upon several examples from the life sciences field, including cutting-edge research in clinical diagnostics, drug discovery and human identification, where work is rapidly accelerated by the turnaround time of our GPU-powered technology.

Level: Beginner
Type: Talk
Tags: Life & Material Science; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room 212A
View Recording
View PDF

S5383 - Mobile 3D Mapping With Tegra K1

Karol Majek Researcher, Institute of Mathematical Machines
Karol Majek
Karol is a PhD Student and Researcher at CUDA Research Center in Institute of Mathematical Machines. He is doing research in robotics. In the last two years he has focused on using CUDA technology in 3D mapping on robotic platforms. Currently he is working on embedding CUDA enabled algorithms to run on Tegra K1.

This work presents a 3D mapping algorithm implemented on the Tegra K1. The data-processing pipeline is implemented in parallel using CUDA. The performance and accuracy of the final model are compared to mobile and desktop GPU results. This work shows how to replace traditional CUDA-enabled laptops with an embedded Tegra K1. Attendees will learn about the problems and challenges of embedding a parallel 3D mapping algorithm and how to improve its speed.

Level: Intermediate
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210G
View Recording
View PDF

S5398 - GPUs to Mars: Full-Scale Simulation of SpaceX's Mars Rocket Engine

Stephen Jones Lead Software Engineer, SpaceX
Stephen Jones
Stephen is lead engineer of the Simulation and Analytics group at SpaceX, where he works on large-scale simulation of combustion processes in rocket engines. Prior to being at SpaceX he worked at NVIDIA, where he was the architect for the CUDA language and worked closely with NVIDIA's hardware designers to develop new GPU features in support of parallel programming. His background is in computational fluid mechanics and plasma physics, but he has worked in diverse industries including networking, CAD/CAM and scientific computing.
Adam Lichtl Director of Research, SpaceX
Adam Lichtl
Adam is the Director of Research at SpaceX, where he helps identify and develop technologies to advance the company mission: transporting and sustaining a human colony on Mars. His background is in Lattice QCD, but he has worked on particle accelerator design, quantitative financial systems, simulation of combustion devices for chemical rocket propulsion, and high density power generation systems.

SpaceX is designing a new, methane-fueled engine powerful enough to lift the equipment and personnel needed to colonize Mars. A vital aspect of this effort involves the creation of a multi-physics code to accurately model a running rocket engine. The scale and complexity of turbulent non-premixed combustion has so far made it impractical to simulate, even on today's largest supercomputers. We present a novel approach using wavelets on GPUs, capable of capturing physics down to the finest turbulent scales.
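A wavelet-based solver exploits the fact that detail coefficients are near zero in smooth regions, so resolution can be concentrated on the finest turbulent scales. One level of the Haar transform, the simplest wavelet, in Python (purely illustrative; the talk's wavelet scheme is not specified at this level):

```python
def haar_step(signal):
    """One level of the Haar wavelet transform on an even-length signal.

    Returns pairwise averages (the coarse approximation) and pairwise
    half-differences (the detail coefficients). An adaptive solver keeps
    refining only where the detail coefficients are large.
    """
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg, det

# The smooth pair (5, 5) yields a zero detail coefficient: no refinement needed.
print(haar_step([4, 2, 5, 5]))  # ([3.0, 5.0], [1.0, 0.0])
```

Thresholding small detail coefficients is what lets the method represent a turbulent field with far fewer degrees of freedom than a uniform grid, and the per-pair independence of the transform maps naturally onto GPU threads.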

Level: All
Type: Talk
Tags: Manufacturing; Computational Physics; Supercomputing; Developer - Algorithms

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room LL21A
View Recording
View PDF

S5429 - Creating Dense Mixed GPU and FPGA Systems With Tegra K1 Using OpenCL

Lance Brown Director - Radar, EW and HPC, Colorado Engineering Inc
Lance Brown
Lance Brown has been in the COTS hardware world since 1999, after spending 5 years as a software engineer at Nortel and Motorola. Lance has been a field application engineer for Curtiss-Wright and GE, supporting GPU, CPU and FPGA products for telecom, networking and defense. He is now the Director of Radar, EW and HPC at Colorado Engineering Inc., focusing on high-TFLOP, low-CSWAP systems. CEI's product line is based on 3D computing architectures co-developed with the Missile Defense Agency and the Naval Research Laboratory. Lance is a graduate of the University of Texas at Arlington with a BS in Computer Science Engineering.

With the introduction of comprehensive OpenCL support and IEEE 754 hard floating-point units for Altera FPGAs, and the availability of NVIDIA® Tegra® K1 GPUs, compact solutions that used to require many discrete boards can now be designed in small form factors for Distributed Aperture Systems (DAS), Situational Awareness 360 (SA360), digital signal processing (DSP) and hundreds of other high-performance embedded computing (HPEC) applications, from mil-aero to commercial, industrial, medical and consumer. In this session on work funded by the Missile Defense Agency, Lance Brown will discuss the challenges and benefits of using multiple Altera Arria 10 FPGAs and multiple NVIDIA® Tegra® K1 GPUs on a single card to speed up six-degrees-of-freedom simulations.

Level: Intermediate
Type: Talk
Tags: Defense; Embedded Systems; Signal & Audio Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210C
View Recording
View PDF

S5431 - Desktop Virtualization 101: The Technical and Business Drivers That Make It Happen

Jeff Weiss NVIDIA GRID SA Manager, NVIDIA
Jeff Weiss is the GRID SA Manager for North America, working with the Solution Architecture & Engineering team at NVIDIA. Prior to joining NVIDIA, Jeff spent 7 years at VMware as an EUC staff engineer, as well as time at Symantec and Sun Microsystems. Along with his current focus on vGPU-enabled end-user computing, his experience includes datacenter business continuity/disaster recovery solutions, software infrastructure identity management and email security/archiving tools. During his tenure, he has architected, sold and deployed complex solutions in a wide array of both public and private accounts, from commercial to government, healthcare to education. Prior to working in sales, he spent 14 years as a networking and datacenter manager at Hughes Aircraft. Jeff is based in Los Angeles, CA.
Luke Wignall GRID Performance Engineering Manager, NVIDIA
Highly-Rated Speaker
Luke Wignall
Luke came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke now sees the ability to bring GPU to the end user experience as the missing "special sauce" that brings virtual desktops to the next level.

This session provides an overview of server and desktop virtualization: a brief history of virtualization, the business drivers, and the technologies that help enable virtual desktops and application remoting.

Level: Beginner
Type: Talk
Tags: Graphics Virtualization

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room LL20C
View Recording

S5450 - Rendering Rich User Experiences in Virtualized Environments

John Meza Performance Engineering Team Lead, Esri
Highly-Rated Speaker
John Meza
John Meza is the team lead of the Esri Performance Engineering Team. The Performance Engineering team is responsible for performance and scalability testing of the Esri ArcGIS platform. This testing effort is integrated into the daily operations of a large ongoing development environment at Esri headquarters in Redlands, CA. The team is responsible for determining and advising on performance in many environments and with new technology. John has also actively participated in the configuration, problem investigation and isolation in the virtualization testing effort. As well as coordinating with virtualization and hardware vendors that are actively participating and supporting the testing.

This presentation will show how NVIDIA® GRID™ K1 and K2 boards benefit Esri ArcGIS Pro, a newly developed 3D GIS platform using DirectX and OpenGL, in Hyper-V, XenDesktop, XenApp and Horizon View virtualized environments. We will show the FPS, elapsed time, virtual GPU configurations and other usability metrics for each virtualized environment configured with NVIDIA® GRID™ K1 and K2 boards. Attendees will see how each environment scaled while providing an acceptable user experience, and what density per GPU was achieved. The capabilities of cloud-based VDI environments configured with NVIDIA® GRID™ boards to support graphics applications will also be discussed and demonstrated, and attendees will see live demonstrations of the user experience in these virtualized environments.

Level: All
Type: Talk
Tags: Graphics Virtualization; Real-Time Graphics

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room LL20D
View Recording

S5459 - Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing

Rajesh Bordawekar Research Staff Member, IBM T. J. Watson Research Center
Rajesh Bordawekar is a research staff member in the Programming Technologies department at the IBM T. J. Watson Research Center. Rajesh studies interactions between applications, programming languages/runtime systems, and computer architectures. His current interest is exploring software-hardware co-design of analytics workloads. Specifically, he has been investigating how GPUs could be used for accelerating key analytics kernels in text analytics, data management, graph analytics, and deep learning.
Ruchir Puri IBM Fellow, IBM T. J. Watson Research Center, Yorktown Heights, NY
Ruchir Puri is an IBM Fellow at the IBM Thomas J. Watson Research Center at Yorktown Heights, NY, where he leads high-performance design methodology and software/hardware acceleration research for all of IBM's enterprise server and system chip designs. Ruchir is a Fellow of the IEEE, an ACM Distinguished Speaker, and has been an IEEE Distinguished Lecturer. He received the Asian American Engineer of the Year award in 2014. Ruchir was also honored with a John von Neumann Chair at Bonn University, Germany, and has been an adjunct professor in the Department of Electrical Engineering at Columbia University, NY.

In this session you will learn about how IBM is exploiting GPUs in its new IBM OpenPOWER platform for acceleration of Big Data Analytics and Cognitive Computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the Enterprise as accelerators for Big Data Analytics and Cognitive Computing workloads. This session will focus on Industrial case studies and exploitation of GPUs. Some early results will also be shared.

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210D
View Recording
View PDF

S5474 - CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service

Dhruv Batra Assistant Professor, Virginia Tech
Dhruv Batra is an Assistant Professor at the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he leads the VT Machine Learning & Perception group. He is a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytic Center (DAC). Prior to joining VT, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the campus of the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010, respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU and at MIT CSAIL. His research interests lie at the intersection of machine learning, computer vision and AI, with a focus on developing scalable algorithms for learning and inference in probabilistic models for holistic scene understanding. He has also worked on other topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation and distributed optimization for inference and learning in probabilistic graphical models. He was a recipient of the Carnegie Mellon Dean's Fellowship in 2007, the Google Faculty Research Award in 2013, the Virginia Tech Teacher of the Week in 2013, the Army Research Office (ARO) Young Investigator Program (YIP) award in 2014, and the National Science Foundation (NSF) CAREER award in 2014. His research is supported by NSF, ARO, ONR, Amazon, Google, Microsoft, and NVIDIA.

In this talk, attendees can expect to learn about CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms as a cloud service. Our goal is to democratize computer vision; one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms. As the first step, CloudCV is focused on object detection and localization in images. CloudCV provides APIs for detecting whether any of 200 different object categories, such as entities (person, dog, cat, horse, etc.), indoor objects (chair, table, sofa, etc.) and outdoor objects (car, bicycle, etc.), are present in the image.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210B
View Recording
View PDF

S5631 - Speech: The Next Generation

Bryan Catanzaro Senior Researcher, Baidu
Highly-Rated Speaker
Bryan Catanzaro is a research scientist at Baidu's new Silicon Valley Artificial Intelligence Laboratory, working with Adam Coates and Andrew Ng to create next generation systems for deep learning. He came to Baidu from NVIDIA, where he researched tools and libraries for making machine learning more efficient and easier to implement on parallel processors. He earned his Ph.D. from Berkeley where he built the Copperhead language and compiler, which allows Python programmers to use nested data parallel abstractions and gain high efficiency on contemporary parallel platforms.

Speech is the user interface of the future, but today's implementations often fail when we need them the most, such as in noisy environments or when the microphone isn't close at hand. At Baidu, an increasing fraction of our users employ speech interfaces to find what they are looking for. In this talk, I will show how next generation deep learning models can provide state-of-the-art speech recognition performance. We train these models using clusters of GPUs using CUDA, MPI and Infiniband.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210A
View Recording

S5649 - Heterogeneous HPC, Architectural Optimization, and NVLink

Steve Oberlin CTO, Accelerated Computing, NVIDIA
Steve Oberlin is the CTO for Accelerated Computing at NVIDIA. His 30+ years in HPC include leadership, architecture, and design roles on the Cray-2 and Cray-3 vector supercomputers, and as chief architect of the massively parallel T3D and T3E. He joined NVIDIA in 2013.

The emergence of heterogeneous computing has demonstrated that the highest performance and efficiency can be achieved in a general way by tightly coupling compute engines optimized for latency-sensitive and throughput-oriented operations. This talk will explore heterogeneous node design and architecture and how NVLink, a new scalable node integration channel, enables uncompromising performance on the most demanding applications, using the next-generation DoE CORAL Summit and Sierra supercomputer systems as a case in point.

Level: Beginner
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room 212B
View Recording
View PDF

S5726 - Tips and Techniques For Efficient and Impressive Animations

Alexander Lehmann Freelance Filmmaker
Alexander Lehmann entered the world of graphic design as a level designer and 3D modeler for games. After working as a professional freelancer on a series of well-known games (Unreal Tournament 2004, Crysis) and finishing his apprenticeship as a Design Assistant, his interest shifted to short films and music videos. In 2005 he began studying at the University of Applied Sciences in Kaiserslautern, which he finished with a bachelor's degree in 'virtual design'. In 2009 he worked as a digital artist at ImageEngine on "District 9". Since 2010 he has worked as a freelance filmmaker, directing and animating music videos (Noisia, Hybris), short films and commercials (Deutsche Telekom).

Creating animated short films, music videos and demos is an extremely complex process. Between a concept and the final release, many fields of filmmaking and design need to be mastered and applied. In this session Alexander will show and explain the workflow he developed to create impressive animations on a tight budget of both time and funds. We will look at fun and time-efficient processes and techniques that allow you to become a self-employed "one man 3D army". Alexander will also cover how he started his animation studio and how the demoscene has always played a role in it.

Level: All
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room LL20A
View Recording

S5737 - VR Everywhere: Consumer Virtual Reality for Desktop, Mobile and Web

Tony Parisi Founder, Third Eye
Highly-Rated Speaker
Tony Parisi is an entrepreneur and career CTO/software architect. He has developed international standards and protocols, created noteworthy software products, and started and sold technology companies. Tony's passion for innovating is exceeded only by his desire to build great products. Tony is a pioneer in virtual reality, the co-creator of the VRML and X3D ISO standards for networked 3D graphics, and continues to innovate in 3D technology. Tony is the co-organizer of the San Francisco WebGL Meetup (http://www.meetup.com/WebGL-Developers-Meetup), and the San Francisco WebVR Meetup (http://www.meetup.com/Web-VR/), and a member of the Khronos COLLADA working group creating glTF (http://www.gltf.gl/), the new file format standard for 3D web and mobile applications. Tony is also the author of O'Reilly Media's books on WebGL: WebGL Up and Running (2012), and Programming 3D Applications in HTML5 and WebGL (2014). Tony is the founder of Third Eye, a San Francisco-based startup developing publishing software for the web, mobile and the new generation of virtual reality systems.

Virtual Reality has taken the computer industry by storm. Developers, artists, end users, educators, advertisers and retailers are flocking by the thousands to realize the decades-long dream of virtual reality for the masses. The combination of GPU acceleration and cheap sensors has enabled low-cost consumer-grade VR, and the rapid adoption of software development kits is paving the way for creating virtual reality apps on platforms from desktops to smartphones, and even running in your web browser using WebGL. Join VR pioneer and WebGL developer Tony Parisi as he explores this exciting frontier. This session will take a look at the latest VR hardware devices, supported operating systems and software development kits, and the wide range of applications already being deployed.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Real-Time Graphics; Developer - Tools & Libraries

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room LL21C
View Recording
View PDF

S5752 - New GPU Features of NVIDIA's Maxwell Architecture

Alexey Panteleev Developer Technology Engineer, NVIDIA

NVIDIA's GeForce® GTX 900-series GPUs, powered by the NVIDIA Maxwell architecture, are the most power-efficient graphics cards on the planet. But Maxwell is also a trove of new and exciting graphics features that can be used to implement effects and techniques not previously possible. In this talk, we'll discuss new functionality enabled by the Maxwell architecture and examine practical ways to use those features.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics

Day: Tuesday, 03/17
Time: 15:00 - 15:50
Location: Room LL21B
View Recording
View PDF

S5157 - Synthetic Aperture Radar on Jetson TK1

Massimiliano Fatica Senior Manager, Tesla HPC Performance Group, NVIDIA
Massimiliano Fatica is a Senior Manager at NVIDIA in the Tesla HPC Performance and Benchmark Group, where he works in the area of GPU computing (high-performance computing and clusters). Prior to joining NVIDIA, he was a research staff member at Stanford University where he worked on applications for the Stanford Streaming Supercomputer. He holds a laurea in Aeronautical Engineering and a PhD in Theoretical and Applied Mechanics from the University of Rome “La Sapienza”.

This talk will present the details of Synthetic Aperture Radar (SAR) imaging on the smallest CUDA-capable platform available, the Jetson TK1. The full processing pipeline, starting from the raw radar data, has been implemented both in Octave with CUDA acceleration and directly in CUDA. The results indicate that GPU-accelerated embedded platforms have considerable potential for this type of workload and, in conjunction with low power consumption, light weight and standard programming tools, could open new horizons in the embedded space.

Level: Intermediate
Type: Talk
Tags: Embedded Systems; Video & Image Processing

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210G
View Recording
View PDF

S5339 - GPU Powered VDI Regenerates the Creative Capability of Dr. Who VFX Studio

Barry Daniels Strategic Alliances Manager, Exponential-e
Barry holds a degree in Computer Science and has over 20 years of experience in IT. His technology career started with IBM and he has since gone on to design and implement a number of innovations. One of his most significant achievements is the development of zoom technology source code, which is used in smartphones including Apple iPhones. Barry joined Exponential-e in September 2013 as Strategic Alliances Manager and has been instrumental in delivering the company's go-to-market strategy for cloud virtualization. Barry is a keen gamer, which conveniently links to his latest area of expertise, DaaS GPU.

Discover how to set creativity free from infrastructure and location restrictions, with applications such as Maya and NUKE running on virtual machines in the cloud, powered by NVIDIA GRID™ technology. The live demonstration will reveal the power of a low-latency network over the public internet, with access to files anywhere in the world. We'll seamlessly access power-hungry graphics files stored in a London VFX studio, demonstrating a solution that has the flexibility and performance of a local desktop. At the end of the session, you will understand the value of no longer being restricted to desk-bound workstations and be confident that the security and privacy of your creative files will remain in your hands.

Level: All
Type: Talk
Tags: Media & Entertainment; Graphics Virtualization; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room LL21D
View Recording
View PDF

S5362 - A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

John Long Post-doctoral Researcher, New York University Langone Medical Center
John is a postdoctoral researcher in the laboratory of Dr. György Buzsáki at the New York University Langone Medical Center. He received his PhD in neuroscience from the UC Berkeley Helen Wills Neuroscience Institute in 2011, in the Brain-Machine Interface laboratory of Dr. Jose Carmena. His current work in neuroscience leverages multiple camera photogrammetry and the power of GPUs to build 3D models of his neurophysiological subjects to study the relationships between memory formation in the brain, navigation, and action planning. He is also working within the clinical domain to develop a computer vision system for behaviorally diagnosing Parkinson's disease.

Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (≥ 100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Life & Material Science; Developer - Algorithms; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210B
View Recording
View PDF

S5381 - OmpSuperscalar: Task-Parallel Simulation and Visualization of Crowds with Several CPUs and GPUs

Hugo Pérez PhD Student, Barcelona Supercomputing Center - CUDA Center of Excellence
Hugo is a PhD student at the Universitat Politècnica de Catalunya. His research interests are real-time crowd simulation, high-performance computing, parallel programming models and computer graphics.
Benjamin Hernandez Researcher, Barcelona Supercomputing Center - CUDA Center of Excellence
Dr. Hernández is a postdoctoral researcher at the Barcelona Supercomputing Center in Spain. Previously he was a full-time professor at Tecnológico de Monterrey, Campus Ciudad de México, Mexico, from 2010 to 2012. As a professor, he advised postgraduate theses on real-time crowd and traffic simulation and human-computer interaction, and co-directed the NVIDIA CUDA Teaching Center Initiative from 2010 to 2013. He currently holds a National Research System Fellowship (SNI-C) from the Mexican Research Council (CONACyT), Mexico. Dr. Hernández focuses his research at the intersection of real-time crowd simulation, human-computer interaction and visualization of inhabited virtual environments using high-performance computing. He has published research papers on these topics in international journals, conference venues and book chapters.
Isaac Rudomin Senior Researcher, Barcelona Supercomputing Center - CUDA Center of Excellence

Industry trends in the race to exascale imply the availability, in the coming years, of cluster computing with hundreds to thousands of cores per chip. Programming such systems is challenging because of their heterogeneous architecture, so novel programming models that facilitate this process are necessary. In this talk we present the case of simulation and visualization of crowds. We analyze and compare two programming models, OmpSs and CUDA, and show that OmpSs allows us to exploit all the resources, combining CPU and GPU while taking care of memory management, scheduling, communication and synchronization automatically. We will present experimental results obtained on the Barcelona Supercomputing Center GPU cluster, as well as describe several modes used for visualizing the results.

Level: All
Type: Talk
Tags: Visualization - In-Situ & Scientific; Supercomputing; Real-Time Graphics; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room LL21C
View Recording
View PDF

S5419 - Implementing Graph Analytics with Python and Numba

Siu Kwan Lam Software Engineer, Continuum Analytics, Inc
Siu Kwan Lam has a B.S.+M.S. degree in Computer Engineering from San Jose State University. He taught CUDA at San Jose State University during his senior year and has researched TCP covert channel detection for NSF, STC, and TRUST. At Continuum Analytics, he is a developer of Numba and NumbaPro.
Stanley Seibert Scientific Software Developer, Continuum Analytics, Inc
Dr. Stanley Seibert has a Ph.D. in experimental high energy physics from the University of Texas at Austin. He performed his postdoctoral research at Los Alamos National Laboratory and University of Pennsylvania on experiments studying neutrinos and searching for dark matter. Stan has been evangelizing the use of Python and GPU computing in high energy physics since 2007, and has worked on a number of applications using Python, C++ and CUDA, including maximum likelihood parameter estimation in large data sets, Monte Carlo optical simulations, and information-theoretic approaches to experiment design. Prior to joining Continuum Analytics, Stan was Chief Data Scientist at Mobi.

We demonstrate how to implement the densest k-subgraph algorithm of Papailiopoulos et al. using the Numba CUDA compiler for Python. With the rise of social networks, more data scientists want to study the connections within and between the communities that dynamically organize on the Internet. Python is a very productive language for data scientists, but, on its own, may not provide the performance needed to analyze big data sets. To bridge this gap, the Numba compiler allows CUDA kernels to be written directly in the Python language and compiled for GPU execution. Using the densest k-subgraph algorithm as an example, we will show how the agility of Python can be combined with the high performance of GPU computing for graph analytics.
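For readers new to the problem, the quantity the talk optimizes is easy to state in plain Python. The sketch below is a brute-force illustration only, not the low-rank algorithm of Papailiopoulos et al. nor the GPU-parallel Numba version from the session; the graph and function names are invented for the example:

```python
from itertools import combinations

def density(edges, nodes):
    """Edge density of the subgraph induced by `nodes`:
    edges inside the subset divided by the maximum possible k*(k-1)/2."""
    nodes = set(nodes)
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    k = len(nodes)
    return inside / (k * (k - 1) / 2)

def densest_k_subgraph(edges, vertices, k):
    """Exhaustive search over all k-subsets (exponential; only for tiny graphs).
    The GPU version parallelizes the scoring of candidate subsets instead."""
    return max(combinations(vertices, k), key=lambda s: density(edges, s))

# Toy graph: a triangle {0, 1, 2} plus a pendant vertex 3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
best = densest_k_subgraph(edges, range(4), 3)
print(sorted(best))  # the triangle: [0, 1, 2]
```

The brute force makes the objective concrete; the point of the talk is that realistic instances need both a smarter algorithm and GPU throughput.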

Level: All
Type: Talk
Tags: Defense; Developer - Algorithms; Big Data Analytics; Developer - Programming Languages

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210C
View Recording
View PDF

S5422 - E4-ARKA: ARM64+GPU+IB is Now Here (Presented by E4 Computer Engineering)

Piero Altoè Project Manager, E4 Computer Engineering
Piero Altoè is Project Manager of the E4 Computer Engineering ARM project. He has been working on ARM solutions for HPC since 2012, collaborating with many research centers and industrial partners. He received his PhD in chemical physics from the University of Bologna in 2007 and has been working at E4 since 2011.

E4 Computer Engineering introduces ARKA, the first server solution based on a 64-bit ARM SoC dedicated to HPC. The compute node is boosted by discrete NVIDIA K20 GPUs and has both 10Gb Ethernet and FDR InfiniBand networks by default. This talk describes the hardware configuration of the compute node in detail and, to demonstrate the unique capabilities of the ARM+GPU+IB combination, reports many synthetic benchmarks and application tests, with particular attention to molecular dynamics software.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Supercomputing; Life & Material Science

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210D
View Recording
View PDF

S5760 - Real-Time, Content-Driven Representations at Twitter

Clement Farabet Senior Software Engineer, Twitter
Clement Farabet is a senior software engineer at Twitter, where he leads the effort on representation learning for all things Twitter. Clement Farabet received a Master’s Degree in Electrical Engineering with honors from Institut National des Sciences Appliquées (INSA) de Lyon, France in 2008. His Master’s thesis work on reconfigurable hardware for deep neural networks was developed at the Courant Institute of Mathematical Sciences of New York University with Professor Yann LeCun, and led to a patent. He then joined Professor Yann LeCun’s laboratory in 2008, as a research scientist. In 2009, he started collaborating with Yale University’s e-Lab, led by Professor Eugenio Culurciello. This joint work later led to the creation of TeraDeep (www.teradeep.com). In 2010, he started the PhD program at Université Paris-Est, co-advised by Professors Laurent Najman and Yann LeCun. His thesis focused on real-time image understanding/parsing with deep convolutional networks. The main contributions of his thesis were multi-scale convolutional networks and graph-based techniques for efficient segmentations of class prediction maps. He graduated in 2013, and went on to cofound Madbits, a company that focused on representing, understanding and connecting images. Madbits got acquired by Twitter in 2014.

Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Manually defining features to represent this data is showing its limits. In this talk, I provide an overview of how automated, content-driven representations, enabled by modern deep-learning algorithms, allow us to build adaptive systems that capture the richness of this content. Specifically, the presentation focuses on deep representations for images and images+text.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210A
View Recording

S5870 - Audi Piloted Driving: In the Fast Lane to the Future

Daniel Lipinski Senior Engineer, Audi of America
Daniel started working for Audi in 2008 as the lead developer for the European Traffic Sign Recognition system. In 2012 he joined the Volkswagen Electronics Research Lab (ERL) in Silicon Valley, where he led the adaptation of several driver assistance systems to the U.S. market. Daniel is now the project lead of one of the most comprehensive Volkswagen Group and Audi research projects for piloted driving. One of his project cars is "Jack", the Audi piloted driving concept car that successfully completed the 550-mile automated driving road test from Silicon Valley to Las Vegas. Lipinski studied Computer and Communications Systems Engineering at the Technical University in Braunschweig, Germany.

On the eve of CES 2015, Audi, ERL and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will play a central role in the future.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room LL21F
View Recording
View PDF

S5118 - Impressions: The Global Impact of Culture, Imagery and Visual Communication

Don Levy President & Cultivator, Smith Brook Farm
Don Levy has been at the forefront of the entertainment industry's digital transformation, developing "the intersection of entertainment and technology" throughout his career and at Sony Pictures Entertainment (Columbia Pictures/Sony Pictures Digital) from 1995-2012. He founded Smith Brook Farm in 2012 as a creative consultancy and is also the co-founder of Spud Media, LLC, a new entertainment venture serving the family market. Levy attended New York University, received his B.A. from the University of Denver and earned certificates from UCLA's Anderson School of Business. Don is a member of the Academy of Motion Picture Arts & Sciences, serving on its feature animation nominating committee and recently chaired a working group for the Science and Technology Council. He also is a member of The Television Academy's Interactive Peer Group, The Visual Effects Society, ASIFA Hollywood, the International Photographers Guild and METAL, the Media, Entertainment and Technology Alpha Leaders organization. Levy is a frequent speaker on the subjects of innovation, digital creativity, education and visual effects. His 2012 talk on the principles and evolution of visual effects at the TED Conference in Long Beach, CA was posted on TED.com in January 2013. He is active in local education issues and organizes TEDxConejo in association with the Conejo Valley (Thousand Oaks, Ca) Unified School District.

We are what we see. The question is: how does what we see influence our lives and the lives of future generations? We live in a visual world. This has brought us closer together and enabled people everywhere to share everything from the latest pop-culture phenomenon to the most catastrophic news. Infographics and animation explain every subject. From an early age, I've appreciated the power of images to move people. Today, the line between fact and fiction is virtually gone. Many of the images that impressed me in my most formative years were of dreams and hope and aspiration. Others made me think. With a curiosity born of my Hollywood experience in the dream factory, and thinking back on how the pictures of my own youth continue to influence me, I'll share some thoughts and ideas.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Augmented Reality & Virtual Reality; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21D
View Recording
View PDF

S5135 - GPU-Driven Large Scene Rendering in OpenGL

Christoph Kubisch Senior Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Christoph Kubisch is a Senior Developer Technology Engineer for NVIDIA Corporation, where he focuses on OpenGL real-time rendering techniques suitable for CAD/DCC and scientific applications. He collaborates with external partners and NVIDIA's internal teams to optimize current and future rendering algorithms. Prior to joining NVIDIA, Christoph was a researcher on hardware accelerated visualization techniques for medical datasets at the Otto-von-Guericke University of Magdeburg. Furthermore, he has worked as technical artist creating game art, technology and polygon modeling tools.
Pierre Boudier Quadro Software Architect, NVIDIA
Pierre Boudier is a Software Architect for NVIDIA Corporation, where he focuses on OpenGL for the Quadro product line. Prior to NVIDIA, Pierre spent 10 years leading OpenGL software at AMD, including a full rewrite of the OpenGL driver stack, and was responsible for representing software requirements for new hardware architectures, with a special emphasis on performance/mm2 and performance/watt.

We will present the latest OpenGL technology from NVIDIA (NV_command_list) and rendering algorithms for large scenes, typically found in CAD/DCC applications. Through the use of powerful new OpenGL extensions, the GPU can be leveraged very efficiently to do more work autonomously of the CPU. We provide algorithms and usage scenarios for scenes made up of many parts (millions), including the GPU creating its own work for rendering (occlusion culling) and transformation updates. The data management minimizes data transfers and offers high flexibility for making changes to the scene, so that interactive editing and viewing of large data sets is possible.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Developer - Performance Optimization; Developer - Algorithms; Rendering & Ray Tracing

Day: Tuesday, 03/17
Time: 16:00 - 16:50
Location: Room LL21B
View Recording
View PDF

S5154 - Parallel Breadth First Search on GPU Clusters

Bryan Thompson Chief Scientist, SYSTAP, LLC
Mr. Bryan Thompson (SYSTAP, LLC) has 30+ years of experience as a technologist, inventor and researcher in cloud computing and big data. He leads SYSTAP's research team investigating GPU-accelerated distributed architectures for graph databases and graph mining, which, together with the SCI Institute, in 2014 first published results for executing Breadth-First Search on a cluster of 64 GPUs at up to 30 billion traversed edges per second. He is the lead architect for BlazeGraph®, an open source distributed graph database used by Fortune 500 companies, including EMC, Autodesk, and Yahoo!. His experience spans cloud computing; graph databases; the semantic web; web architecture; relational, object, and RDF database architectures; knowledge management and collaboration; artificial intelligence and connectionist models; natural language processing; metrics, scalability studies, benchmarks and performance tuning; and decision support systems. He is an expert in Java, C, and C++ with an emphasis on concurrent programming. He is the co-founder and Chief Scientist of SYSTAP, LLC. Previous positions include co-founder, President and CTO of GlobalWisdom, Inc., and Executive Vice President and Senior Scientist with Cognitive Technologies, Inc.

The goal of this session is to demonstrate our work on scalable, high-performance BFS on GPU clusters. Our implementation achieves over 30 billion edges traversed per second on a cluster of 64 GPUs. The SIMT architecture of the GPUs, the imbalance between GPU memory and communication bandwidths, and the irregular nature of graphs make it difficult to develop efficient, scalable graph analytics programs. In this session, we present the secret ingredients of our BFS implementation that help us overcome those difficulties and achieve high performance and scalability. We also show the performance and scalability characteristics of our implementation on a wide range of synthetic and real-life graphs. This is collaborative work with Dr. Martin Berzins and Harish Kumar Dasari from the University of Utah.
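As a rough illustration of what "BFS" means here, the serial, level-synchronous traversal pattern that GPU implementations parallelize and distribute is sketched below (a minimal sketch for orientation only, not SYSTAP's actual code; on a GPU cluster the frontier expansion is done in parallel and the visited check becomes a bitmap):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS: the pattern GPU implementations parallelize.

    adj: dict mapping vertex -> list of neighbor vertices.
    Returns dict mapping vertex -> BFS level (depth from source).
    """
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:            # expanded in parallel on a GPU
            for v in adj.get(u, []):
                if v not in level:    # visited check (a bitmap on a GPU)
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

The "traversed edges per second" metric counts every neighbor inspection in the inner loop, which is why irregular degree distributions make load balancing hard.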

Level: Intermediate
Type: Talk
Tags: Defense; Big Data Analytics; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210C
View Recording
View PDF

S5185 - A CUDA Implementation of the High Performance Conjugate Gradient (HPCG) Benchmark

Everett Phillips HPC Software Engineer, NVIDIA
Everett works on HPC applications in the Tesla Performance Group at NVIDIA.

This talk will present the details of a CUDA implementation of the HPCG benchmark, including key optimization strategies and performance results on a wide range of GPU systems: from the smallest CUDA-capable platform, the Jetson TK1, to the largest GPU supercomputers, Titan (Cray XK7 at ORNL) and Piz Daint (Cray XC30 at CSCS). HPCG was recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. HPCG solves a large sparse linear system of equations using a multigrid-preconditioned conjugate gradient algorithm, and is designed to represent modern application workloads.
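For readers unfamiliar with the benchmark's kernel, the iteration structure of conjugate gradient is sketched below in plain, unpreconditioned form (illustrative only; HPCG itself adds a multigrid preconditioner, runs on a huge sparse system, and its `matvec` is the sparse matrix-vector product that dominates GPU time):

```python
def cg(matvec, b, tol=1e-10, max_iter=100):
    """Unpreconditioned conjugate gradient for a symmetric positive-definite A.

    matvec: function computing A @ x for a vector x (list of floats).
    b: right-hand side. Returns the approximate solution x.
    """
    n = len(b)
    x = [0.0] * n
    r = b[:]                          # residual r = b - A x, with x = 0
    p = r[:]                          # search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol:              # converged: residual is tiny
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Each iteration is one sparse matrix-vector product plus a few dot products and vector updates, which is why HPCG stresses memory bandwidth rather than the dense floating-point throughput that HPL measures.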

Level: Intermediate
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC; Developer - Algorithms

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 212B
View Recording

S5212 - High Performance Indexing of Large Data Sets Using GPU

Massimo Bernaschi Professor, National Research Council of Italy
Highly-Rated Speaker
Massimo Bernaschi is with CNR, the National Research Council of Italy, as Chief Technology Officer of the Institute for Applied Computing. He is also an Adjunct Professor of Systems Programming at "Sapienza" University in Rome and a Trainer in Digital Forensics at "Sapienza" and Modena Universities. Before joining CNR in 1998, Massimo worked for ten years at the IBM European Center for Scientific and Engineering Computing, where he developed the IBM PVMe product and received two Outstanding Technical Achievement Awards. His main scientific interests are parallel computing; modelling of complex systems (finance and biology); systems and network security; and high performance computing. He is the author of about 150 papers in peer-reviewed journals and international conferences. In 2012, Massimo Bernaschi was named a "CUDA Fellow".

Learn how to use multiple GPUs and CUDA to speed up text analysis, indexing and searching of textual data. We present a new framework to index large data sets of heterogeneous data. Our approach is based on a combination of HPC techniques aimed at improving the efficiency and reliability of the indexing process. The solution we propose is scalable and exploits in-memory computing to minimize I/O operations and enhance performance. Moreover, we describe the CUDA-based parallelization of the most compute-intensive tasks involved in the indexing process. The integration of the CUDA components within an architecture that is mostly Java-based led us to develop a technique for Java-CUDA interoperability that can be applied to other applications. Some visualisation results will also be presented.

Level: All
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210D
View Recording
View PDF

S5307 - NVIDIA IndeX, the In-Situ Visualization Software, Merges Compute Cycles with Graphics Cycles

Tom-Michael Thamm Director, Software Product Management, NVIDIA
Mr. Thamm is Director of Software Product Management at the NVIDIA Advanced Rendering Center (ARC) in Berlin, Germany, and is responsible for all software products, such as NVIDIA mental ray, NVIDIA Iray, and NVIDIA IndeX. With his team, he manages and coordinates customer support as well as general product definition and positioning. Mr. Thamm has worked for NVIDIA ARC, and before that mental images, for over 24 years. He has led several key software projects and products, such as the new NVIDIA IndeX product for in-situ visualization of huge datasets. He studied Mathematics.
Marc Nienhaus Sr. Manager, NVIDIA IndeX Product, NVIDIA
Marc leads the NVIDIA IndeX product development and its adoption in application domains. Since NVIDIA IndeX's advent in 2007, he has been responsible for its research, development and software architecture. In close collaboration with Tom-Michael Thamm, he has established NVIDIA IndeX as the solution for in-situ visualisation of large-scale data. Marc holds a PhD in computer science from the Hasso Plattner Institute at the University of Potsdam and a Master's in Mathematics from the University of Muenster.
Mahendra Roopa Solution Expert, NVIDIA IndeX, NVIDIA
Mahendra Roopa works as a Solutions Expert for NVIDIA IndeX. His role involves exploring new domains in the area of 3D visualization by building prototypes and plugins based on NVIDIA IndeX. He also supports customers with integrating NVIDIA IndeX into their visualization pipelines. Mahendra holds a Master's degree in computer graphics from the University of Bonn and a bachelor's degree from Visvesvaraya Technological University, Bangalore.

We will present a technical overview of NVIDIA IndeX as an in-situ technology. In addition, we describe an interactive workflow between compute and graphics cycles within NVIDIA IndeX. A real-time live demo on the NVIDIA VCA cluster using more than 1 TB of scientific data underscores the power of the in-situ technology.

Level: All
Type: Talk
Tags: Visualization - In-Situ & Scientific; Data Center, Cloud Computing & HPC; Graphics Virtualization; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 16:00 - 16:50
Location: Room LL21C
View Recording

S5342 - Outscale/Dassault Cloud Implementation with NVIDIA GRID and Cisco UCS (Presented by Cisco)

Laurent Seror Founder & CEO, Outscale
Laurent Seror graduated in 1996 and cut his IT teeth with the deployment of a system of networks and business applications for his squadron as part of his military service. In 1997, he set up the web hosting company AGARIK with two partners. In 1999, his Internet hosting career continued with providing support to AMEN for all its server, network, domain name reservation and online payment infrastructures. In 2006, AGARIK acquired SOFT2YOU, a leading provider of hosted applications, and joined the outsourcing division of BULL. In 2008, Laurent Seror set up the network operator BULL PI, and the first terrestrial network to use RAMAN technology to reduce latency. He also implemented the first AGARIK Cloud offer. In 2009, he was appointed to head up the Network & Security unit and merged the network, storage and servers activities to offer companies convergent offers compatible with the latest technological advances. In March 2010, Laurent Seror joined Dassault Systèmes as its Head of Cloud Computing, with responsibility for developing its SaaS, PaaS and IaaS offers. In October 2010, he founded Outscale at the initiative of Dassault Systèmes. His key goal was to develop and launch the ultimate IaaS Cloud offer in the French and international markets. Laurent Seror is a graduate of the Université Pierre et Marie Curie Paris VI and the Université Claude Bernard in Lyon I, and holds an engineering degree in industrial chemistry from the Ecole supérieure de Chimie Physique Electronique in Lyon.

A customer success story for NVIDIA GRID and Cisco UCS.

Level: All
Type: Talk
Tags: Graphics Virtualization

Day: Tuesday, 03/17
Time: 16:00 - 16:50
Location: Room LL20D
View Recording

S5373 - GPU + Drones + 3D Imaging for Precision Farming and Construction

Bingcai Zhang Tech Fellow, BAE Systems
Dr. Zhang is a technical fellow at BAE Systems, the premier global defense and aerospace company. He joined BAE Systems in September 1995 right out of the University of Wisconsin-Madison, where he earned his Ph.D. in engineering and an M.S. in computer science. His research interests are: (1) geospatial information technology and 3D mapping; (2) robot vision and unmanned systems; and (3) 3D geoweb search. He has held positions as chief architect, chief photogrammetrist, R&D manager, and technical fellow with BAE Systems. Dr. Zhang has three inventions: (1) Embedded Photogrammetry, (2) Next Generation Automatic Terrain Extraction (NGATE), and (3) Automatic 3D Object Extraction (AFE). Embedded photogrammetry is a concept to embed a precise 3D measurement component called photogrammetry into non-photogrammetry applications such as GIS and CAD. NGATE generates 3D terrain models from stereo images. AFE is a production-capable system that automatically extracts 3D objects such as houses, buildings, and trees from a digital surface model or LiDAR point clouds.

Agriculture and construction are two of the largest industries in the world. The democratization of 3-D imaging technology through drones, digital cameras, and GPUs makes it applicable to precision farming and construction. Precision farming can increase crop yields, reduce pollution, save water, and increase productivity. The demand for precision farming continues to grow as more people live on a planet with fixed natural resources. Timely, precise 3-D measurements are important for construction, yet today most of these measurements are obtained manually. BAE Systems is developing GPU-accelerated 3-D imaging technology with drone images for precision farming and construction.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210B
View Recording
View PDF

S5472 - Real-Time Data Compression for Mass Spectrometry

Jose de Corral Principal Consulting Engineer, Waters Corporation
Jose de Corral received his B.S. in Electrical Engineering from Universidad Politécnica de Madrid, and his M.S. in Software Engineering from Harvard University. Jose has a long career at Waters, where he started in 1983. He has been involved in many R&D design projects, specializing in analog electronic design, feedback control systems, and embedded software development. Jose’s preferences evolved toward the design of complex algorithms for data processing and instrument control. Since 2007, his main focus has been in Computer Graphics and GPU Computing.

Learn how the GPU enables a technique to perform Mass Spectrometry data compression in real time. Mass Spectrometry data is large, and it is getting larger with every new generation of instruments, presenting a serious data storage problem. The GPU performs this compression algorithm in real time while the data is being acquired by the instrument, resulting in less data reaching the file system and reduced post-acquisition data processing time. Given the amount of computation involved, typically trillions of floating point operations, a conventional CPU solution cannot keep up with real-time acquisition.

Level: Intermediate
Type: Talk
Tags: Life & Material Science

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 212A
View Recording
View PDF

S5599 - Gesture Recognition: Using a Multi Sensor Approach

Shalini Gupta Senior Research Scientist, NVIDIA Research
Shalini Gupta is a Senior Research Scientist at NVIDIA Research. Formerly, she was a Senior Mobile Computer Vision Engineer at NVIDIA, and an Imaging Scientist at Texas Instruments. Shalini received her Doctoral degree in Electrical and Computer Engineering from the University of Texas at Austin in 2008.

For accurate and power-efficient in-vehicle hand-gesture recognition, we built a novel multi-sensor system comprising a short-range radar, a color camera, and a depth camera, which together make the system robust to variable lighting conditions. The radar and depth sensors are jointly calibrated, and a convolutional deep neural network fuses data from the multiple sensors to classify the gestures. The algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car during the day and at night, while consuming significantly less power than purely vision-based systems.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21F
View Recording
View PDF

S5643 - Advanced Rendering Solutions from NVIDIA

Phillip Miller Director of NVIDIA Advanced Rendering Products, NVIDIA
Highly-Rated Speaker
Mr. Miller directs NVIDIA's Advanced Rendering offerings, ranging from the Iray and mental ray shipping within leading products in Design and Entertainment to the OptiX ray tracing framework used extensively within private and commercial applications. He has led major software products for 20 years, including the 3D animation efforts at Autodesk and the Web design products at Adobe. He holds a Master of Architecture from the University of Illinois and is a registered architect.

Learn about the latest breakthroughs and offerings in NVIDIA's Advanced Rendering Solutions, which scale smoothly from local GPU rendering to remote supercomputer clusters. New capabilities and possibilities in Iray® and mental ray® will be explored and demonstrated, along with what's possible with the latest in NVIDIA OptiX™ for accelerating custom ray tracing development. Industry trends and production examples will also be explored as advances in both interactive and production rendering continue to revolutionize workflows.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Product Design & Styling; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 03/17
Time: 16:00 - 16:50
Location: Room LL21E
View Recording

S5811 - TK1-Based Solutions for Intelligent Video Analytic Applications

Hai Tao CEO, Beijing Vion Technology Inc. (BVT)
Hai Tao is the founder and CEO of Beijing Vion Technology Inc. He has 25 years of experience in image processing and computer vision. Prior to BVT, he was an associate professor at UC Santa Cruz. Dr. Tao received his Ph.D. degree from University of Illinois at Urbana Champaign.

This talk demonstrates how GPU-based embedded computer vision systems are transforming the world of video processing in several vertical markets, including ATM safety, intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. By taking full advantage of the TK1's 300+ GFLOPS of computing power, BVT has built and deployed embedded systems for people counting, shopping traffic gender and age analysis, perimeter monitoring, violence and chasing detection, and ATM service area protection. These application systems require the development of custom computer vision algorithms and their efficient implementation on the GPU. We will also demonstrate how the world's first TK1-based smart cameras are being developed for various applications, including license plate recognition, face recognition and crowd management. Compared to the previous DSP-based smart camera solution, the powerful embedded GPU-based solution is the first that can support imaging sensor resolutions up to 12 megapixels. The talk will also provide technical details on the CUDA implementation of several computer vision algorithms.

Level: All
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210G
View Recording

S5872 - Worlds Collide: What Happens When VDI Meets GPU? (Presented by Citrix)

Gunnar Berger Chief Platform Strategist and CTO, Citrix
Gunnar Berger is the Chief Technology Officer for Citrix's Desktop and Applications group. Previously Gunnar was a Research Director at Gartner working in the Technical Professionals research team. He covered server and client virtualization and private/public delivered desktops (DaaS). At Gartner, Mr. Berger spent considerable time with end-user organizations, advising them on architecture and best practices for both server virtualization and desktop transformation initiatives. Mr. Berger has worked with client virtualization technologies since 1999 and is recognized as a thought leader in the end-user computing space. Mr. Berger has spent the majority of his career as a specialized consultant focused in what is now called end-user computing. He specializes in desktop virtualization, which includes DaaS, storage, networking, server, personalization and virtualization technologies.

As Windows and productivity applications like Microsoft Office become more and more graphically aware, and as graphics-rich applications demand mobility, security and increased accessibility, providing employees with graphics processing economically and at scale is more critical than ever. In this session attendees will learn how to break free from the traditional physical-GPU-per-user model. Virtualizing the GPU in a hosted environment with XenApp and XenDesktop opens a wide range of new mobile workstyles, cloud, and DaaS-based offerings for a wider range of employees. We will also dive into the security and compliance benefits of keeping your sensitive apps and data where you have the strongest control over them: your data center. Come armed with your questions and use cases. No topic is too controversial for our experts.

Level: All
Type: Talk
Tags: Graphics Virtualization; Press-Suggested Sessions: Professional Graphics

Day: Tuesday, 03/17
Time: 16:00 - 16:50
Location: Room LL20C
View Recording
View PDF

S5873 - Optimized GPU Kernels for Deep Learning

Amir Khosrowshahi CTO and Co-Founder, Nervana Systems
Amir is CTO of Nervana Systems, a startup bringing unprecedented performance and scale to deep learning. He has a Ph.D. from UC Berkeley, and an MA and BA from Harvard.

Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant performance improvements over existing methods. In particular, we show how operations such as convolutions and dense matrix multiply can be efficiently implemented using a custom assembler to attain state-of-the-art performance on the NVIDIA Maxwell GPU architecture. Additionally, we can significantly reduce memory bandwidth and run much larger models by using limited precision with a minimal tradeoff in model accuracy.

Level: Beginner
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210A
View Recording
View PDF

S5296 - Memory-Efficient Heterogeneous Speech Recognition Hybrid in the GPU-Equipped Mobile Devices

Alexei V. Ivanov CTO, Verbumware Inc.
Alexei V. Ivanov has a background in engineering and computer science. He received his Ph.D. in Theoretical Foundations of Computer Science in 2004 from Belarussian State University of Informatics and Radioelectronics. He also holds a MSc degree in Electrical Engineering from Moscow Institute of Physics and Technology (State University). He has working experience both in academia (University of Trento, Moscow Institute of Physics and Technology) and industry (Pearson Knowledge Technologies, USA; Speech Technology Center, Russia; Lernout & Hauspie Speech Products NV, Belgium). Alexei has broad experience in speech processing and recognition systems. His current research interests include adaptive conversational machines; web-integration of individual multimedia experiences; speech characterization technology; integration of para-linguistic knowledge into the process of speech recognition and interpretation.

Weighted Finite State Transducer (WFST)-based speech recognition systems permit efficient implementation within the GPU computational paradigm. Our previous research has shown that speech recognition with GPUs can be done in a fast, accurate and power-efficient manner. However, completely compiled non-trivial WFSTs are too bulky to fit into the memory footprint of a typical mobile device. This problem is the most fundamental obstacle to the proliferation of autonomous mobile speech recognition technology. In this presentation we demonstrate a way to overcome this difficulty: a Tegra K1 device equipped with 2 GB of RAM will perform autonomous recognition of English speech on a mid-sized vocabulary (20K words) task defined by a tri-gram language model.

Level: All
Type: Talk
Tags: Signal & Audio Processing; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210C
View Recording
View PDF

S5566 - GPU Errors on HPC Systems: Characterization, Quantification and Implications for Architects and Operations

James Rogers Director of Operations, Oak Ridge Leadership Computing Facility (OLCF), Oak Ridge National Laboratory (ORNL)
Highly-Rated Speaker
As NCCS Director of Operations, Jim Rogers is responsible for managing day-to-day operation of systems and developing plans for future generations of systems and infrastructure. Rogers has broad experience in high-performance computing, having provided strategic-planning, technology-insertion, and integration support for multiple computing centers, including those at the U.S. Army Corps of Engineers Engineer Research and Development Center, the Aeronautical Systems Center, the National Aeronautics and Space Administration (NASA) Goddard Space Flight Center, the NASA Ames Research Center, the Defense Intelligence Agency, and the Alabama Supercomputer Center. He comes to the center from Computer Sciences Corporation in Huntsville, Alabama, where he was a principal solutions architect. He has also been part of the Supercomputing (SC) series of conferences, most recently as the executive director for SC05 and as a member of the SC Steering Committee.

Titan, the world's #1 Open Science Supercomputer, consists of more than 18,000 GPUs that scientists from various domains such as astrophysics, fusion, climate, and combustion use routinely to run large-scale simulations. Unfortunately, while the performance efficiency of GPUs is well understood, their resilience characteristics in a large-scale computing system have not been fully evaluated. We present a detailed study to provide a thorough understanding of GPU errors on a large-scale GPU-enabled system. Our data spans more than 18 months, gathered on the Titan supercomputer at the Oak Ridge Leadership Computing Facility. We present several findings from our field data and discuss the implications of our results for future GPU architects, current and future HPC centers.

Level: All
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 212B
View Recording
View PDF

S5658 - Single CUDA Block Implementation of Time Synchronous Viterbi Search for Speech Recognition

Nigel Cannings Chief Technical Officer, Chase Information Technology Services Limited
Nigel Cannings was educated in England at Brentwood School and in the USA at Milton Academy in Boston, qualified as a solicitor in 1993, and has worked for some of the world's largest law firms and software companies. In 2004 Nigel moved out of law and into the business sector, making a number of investments in small software companies. He founded Docusite as a research and development vehicle for advanced natural language processing and voice recognition technology, gaining prestige clients including AXA Investment Managers, before merging with Chase ITS in 2009 to take advantage of Chase's wide range of insurance clients. Nigel contributes regularly to a number of publications, including the Huffington Post and the Global Legal Post, and has been featured on the front page of the Wall Street Journal and in its video on the advanced techniques used by Intelligent Voice to track trader wrongdoing, as well as blogging on the Intelligent Voice website. A keen technologist, Nigel is always on the lookout for new challenges and new ways of stretching existing techniques and technology. He has gained UK government recognition by way of a large grant for high-tech research exploring leading-edge problems in speech research, such as ultra-high-speed GPU-accelerated speech recognition and emotional analysis of telephone calls.

The time-synchronous Viterbi search algorithm for automatic speech recognition is implemented using a counter-intuitive single-CUDA-block approach. Decoding of a single utterance is carried out on a single streaming multiprocessor (SM), and multiple utterances are decoded simultaneously using CUDA streams. The single CUDA block approach is shown to be substantially more efficient, and it enables overlapping of CPU and GPU computation by merging tens of thousands of separate CUDA kernel calls per utterance. The proposed approach has the disadvantage of a large GPU global memory requirement because of the simultaneous decoding feature. However, the latest GPU cards with up to 12 GB of global memory fulfill this requirement, and full utilization of the GPU card is possible using all available SMs.
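To make the "time-synchronous" structure concrete, here is a minimal serial sketch of Viterbi decoding over a small HMM (illustrative only, not the presenters' implementation): at each frame, every state hypothesis is extended in lockstep, and that per-frame state expansion is the loop a single CUDA block would execute in parallel across its threads.

```python
def viterbi(log_emit, log_trans, log_init):
    """Time-synchronous Viterbi search over an N-state model.

    log_emit: T x N matrix of log emission scores per frame.
    log_trans: N x N matrix of log transition scores.
    log_init: length-N initial log scores.
    Returns the best state sequence of length T.
    """
    T, N = len(log_emit), len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(N)]
    back = []                            # backpointers per frame
    for t in range(1, T):                # time-synchronous outer loop
        prev, score, bp = score, [0.0] * N, [0] * N
        for s in range(N):               # states expanded in parallel on a GPU
            best = max(range(N), key=lambda q: prev[q] + log_trans[q][s])
            bp[s] = best
            score[s] = prev[best] + log_trans[best][s] + log_emit[t][s]
        back.append(bp)
    # Trace back from the best final state.
    path = [max(range(N), key=lambda s: score[s])]
    for bp in reversed(back):
        path.append(bp[path[-1]])
    return path[::-1]
```

In the single-block design described above, the whole T-frame loop stays inside one kernel on one SM, avoiding a kernel launch per frame, while CUDA streams run other utterances' decoders on the remaining SMs.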

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Signal & Audio Processing

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210D
View Recording

S5740 - Clarifai: Scaling Deep Learning

Matthew Zeiler CEO, Clarifai
Matthew Zeiler, PhD, Founder and CEO of Clarifai Inc., studied machine learning and image recognition with several pioneers in the field of deep learning at the University of Toronto and New York University. His insights into neural networks produced the top 5 results in the 2013 ImageNet classification competition. He founded Clarifai to push the limits of practical machine learning, which will power the next generation of intelligent applications and devices.

The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. Although the fundamental techniques were developed in the 1980s and 1990s, it was only recently that they were applied at large scale, due to the advent of general-purpose GPU computing and the availability of internet-scale datasets. The deep learning experts at Clarifai have spent years working alongside pioneers of the field and form a team with vast experience developing new deep learning techniques and building state-of-the-art systems that solve real problems. In this talk we will present some of the latest technologies we have developed and show how they can be applied to power a new generation of intelligent applications.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210A
View Recording

S5761 - Achieving Real-Time Performances on Facial Motion Capture and Animation on Mobile GPUs

Emiliano Gambaretto Director of Research, Mixamo
Emiliano Gambaretto obtained a PhD degree in Bioengineering from Politecnico di Milano (Italy) in 2011 for his research on Markerless Motion Capture. Part of his research was carried out at Stanford Biomotion Lab in 2006. He also received a Master's Degree in Biomedical Engineering from Politecnico di Milano and a Diplome d'Ingenieur from Ecole Centrale de Lyon (France) in 2006. He was part of Mixamo's founding team in 2008. He's currently Director of Research at Mixamo. His job consists of designing and developing the technologies behind Mixamo's services. That includes motion models, auto-rigging, real-time face animation, integration with 3rd-party software and industry standards.

3D Animation is one of the most prominent forms of contemporary art, with millions of people drawn to its emotional power in movie theaters and games every year. Mixamo developed a GPU-powered facial capture and animation technology to enable users to animate a character's face in real time. This technology, originally targeted at desktop and laptop GPUs, is now available on mobile thanks to the improved performance of new-generation hardware. This presentation will focus on the challenges faced and the strategies adopted to port this technology to Tegra K1-powered devices. We adopted two parallel approaches: one optimized our tracking algorithm and ported our code to CUDA (from OpenCL); the other completely changed the facial tracking paradigm, focusing on an intrinsically faster machine learning approach based on a cascade of simple regressors. We will compare the performance and strengths of both approaches.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Developer - Performance Optimization; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210B
View Recording

S5785 - Acceleration of a Molecular Modelling Code for the Analysis and Visualization of Weak Interactions between Molecules

Michael Krajecki Laboratory Head, Université de Reims Champagne-Ardenne
Michaël Krajecki defended his Ph.D. in Computer Science at the University of Metz in 1998 and has been a full professor of Computer Science at the University of Reims, France, since 2005. He currently heads the ICT Research Center (CReSTIC) and the ROMEO High Performance Computing Center. His research interests are mainly focused on parallel algorithms, GPU computing and combinatorial optimization.

At the interface between Chemistry, HPC and Biochemistry, this research work aims to exploit the recent "NCI" (Non-Covalent Interactions) analysis method for molecular docking simulations using new software, AlgoGen, within Drug-Design studies. NCI is a breakthrough in the field. The use of a GPU to accelerate this scientific application is very attractive with a view to exploiting NCI in molecular docking, which is a very challenging tool in Medicinal Chemistry. A first GPU-accelerated version of the NCI code is proposed here.

Level: Intermediate
Type: Talk
Tags: Life & Material Science

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 212A
View Recording
View PDF

S5914 - Fly Me to the Moon: The Role of GPUs in Lunar Exploration (Presented by GE)

Kevin Peterson CTO, Astrobotic
Kevin Peterson manages Astrobotic's technology development. Peterson has fielded a dozen autonomous systems, including autonomous UXO survey, high-speed desert and urban driving, construction, and naval systems. Peterson was the technical lead for Carnegie Mellon's DARPA Grand Challenge teams. At Astrobotic, Peterson leads 5 active NASA contracts for Autolanding for Robotic Precursor Missions; Reliable, Hardened GPS-Denied Navigation and Landing; Resource Aware Planning for Shadowed and Uncertain Domains; Long-Range Prediction of Non-Geometric Terrain Hazards for Reliable Planetary Rover Traverse; and Non-Geometric Terrain Sensing for Autonomous Excavation and Site Work.

Since the beginning of the space age, access to the Moon has been limited to a select few. Only three governments have landed robotic spacecraft on the lunar surface – the United States, the former Soviet Union, and China. The cost and complexity of missions to the Moon have restricted this activity to large national governments that invest hundreds of millions of dollars per mission. This paradigm is shifting as commercial lunar delivery services pave the way for low-cost access beyond low Earth orbit. GPUs are powering the design, analysis, and flight of these services. This talk presents the use of GPUs in the design and flight of Astrobotic's Griffin lander.

Level: All
Type: Talk
Tags: Embedded Systems

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210G
View Recording

S5918 - Ubiquitous Perceptive 3D Sensing for a Smart Internet of Things

Louay Eldada CEO, Quanergy Systems, Inc.
Louay Eldada is founder and CEO of Quanergy Systems, Inc., a privately held Silicon Valley-based technology company developing and manufacturing smart sensors and sensing systems. Louay is a serial entrepreneur, having founded and sold three businesses to Fortune 100 companies. His fourth start-up, Quanergy, is developing compact low-cost high-performance high-reliability LiDAR (light detection and ranging) sensors and software used for capturing and processing 3D mapping data in real time. In transportation, the data will be utilized to greatly improve the accuracy and reliability of on-board driver safety systems and enhance them with object recognition and scenario analysis capability, as well as enable autonomous driving in the future. Quanergy has established early partnerships with global automotive, mining and digital mapping companies, and will be expanding its market footprint into logistics, robotics, aeronautics, security, and 3D-aware consumer electronics. Prior to Quanergy, Dr. Eldada was VP of technology at Amprius, a developer of lithium ion batteries based on silicon nanowire anodes. He was earlier CSO of SunEdison where he led innovation programs in photovoltaic and energy storage systems, after serving as CTO of HelioVolt, where he led the development of thin film photovoltaic technologies. He was earlier CTO of DuPont Photonic Technologies, a business that was formed from the acquisition by DuPont of Telephotonics, a company that he founded and where he led as CTO the development of optoelectronic telecommunication modules. His first industry job was at Honeywell, where he started the Telecom Photonics business and directed its R&D division. The success of the business led to its acquisition by Corning, where he continued to direct technical development. 
He chaired and organized 160 conferences; delivered 200 keynotes and invited talks/courses; published 270 technical papers, books and book chapters; received 50 technical awards and has 65 patents. He studied business administration at Harvard, MIT and Stanford and holds a Ph.D. in optoelectronics from Columbia University.

Innovations in perceptive smart sensors comprising solid state 3D LiDARs and GPUs with artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, light controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world. The largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room LL21F
View Recording

S5697 - Data Centric Interactive Visualization of Very Large Data

Bruce D'Amora Sr. Technical Staff Member, Computational Science, IBM T. J. Watson Research Center
IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598 (damora@us.ibm.com). Mr. D’Amora is a Senior Technical Staff Member in the Computational Sciences department of the Data-Centric Computing group. He is currently focusing on frameworks to enable computational steering and visualization for high-performance computing applications. Previously, Mr. D’Amora was the chief architect of Cell Broadband Engine-based platforms used to accelerate applications for creating digital animation and visual effects. He has been a lead developer on many projects ranging from applications to microprocessors and holds a number of hardware and software patents. He joined IBM Research in 2000 after serving as the Chief Software Architect for the IBM Graphics development group in Austin, Texas, where he led the OpenGL development effort from 1991 to 2000. He holds Bachelor's degrees in Microbiology and Applied Mathematics from the University of Colorado and a Master's degree in Computer Science from National Technological University.
Gordon Fossum Advisory Engineer, Computational Sciences, Thomas J. Watson Research Center
IBM Research Division, Thomas J. Watson Research Center, P.O. Box 218, Yorktown Heights, New York 10598 (fossum@us.ibm.com). Mr. Fossum is an Advisory Engineer in Computational Sciences at the Thomas J. Watson Research Center. He received a B.S. degree in Mathematics and Computer Science from the University of Illinois in 1978, an M.S. in Computer Science from the University of California, Berkeley in 1981, and attained "all but dissertation" status from the University of Texas in 1987. He subsequently joined IBM Austin, where he has worked on computer graphics hardware development, Cell Broadband Engine development, and OpenCL development. He is an author or coauthor of 34 patents, has received a "high value patent" award from IBM and was named an IBM Master Inventor in 2005. In January 2014, he transferred into IBM Research, to help enable visualization of “big data” in a data-centric computing environment.

The traditional workflow for high-performance computing simulation and analytics is to prepare the input data set, run a simulation, and visualize the results as a post-processing step. This process generally requires multiple computer systems designed for accelerating simulation and visualization. In the medical imaging and seismic domains, the data to be visualized typically comprise uniform three-dimensional arrays that can approach tens of petabytes. Transferring this data from one system to another can be daunting and in some cases may violate privacy, security, and export constraints.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Visualization - In-Situ & Scientific

Day: Tuesday, 03/17
Time: 17:00 - 17:25
Location: OpenPOWER Booth
View Recording

S5704 - System Management Tool for OpenPOWER

Song Yu Development Manager, IBM STG China
Song Yu is a Development Manager for IBM STG China.
Li Guang Cheng xCAT Senior Architect, IBM STG China
Li Guang Cheng is an xCAT Senior Architect for IBM STG China.
Ma Yuan Liang Manager of System Development, Teamsun
Bio to come.
Chen Qing Hong System Architect, Teamsun
Bio to come.

OpenPOWER is a new-generation platform, and infrastructure-level management is its most important requirement as OpenPOWER machines come into wide use in both cloud and non-cloud settings. In the cloud, end users normally care about SaaS or PaaS, but cloud administrators must consider how to manage the OpenPOWER physical nodes that provide those services. They must also be able to quickly and automatically provision physical machines and physical nodes into the cloud. How to offer self-service for physical nodes is a new challenge in the public cloud.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 17:30 - 17:55
Location: OpenPOWER Booth
View Recording
View PDF

S5922 - IBM

Srini Chari Managing Partner, Cabot Partners
Bio to come.

Level: All
Type: Talk
Tags: OpenPOWER

Day: Tuesday, 03/17
Time: 18:00 - 18:15
Location: OpenPOWER Booth
View Recording

S5703 - Accelerated Photodynamic Cancer Therapy Planning with FullMonte on OpenPOWER

Jeffrey Cassidy Ph.D Candidate, Electrical and Computer Engineering, University of Toronto
Jeffrey Cassidy, MASc, PEng, is a Ph.D. candidate in Electrical and Computer Engineering at the University of Toronto.

Photodynamic therapy (PDT) is a minimally-invasive cancer therapy which uses a light-activated drug (photosensitizer/PS). When the photosensitizer absorbs a photon, it excites tissue oxygen into a reactive state which causes very localized cell damage. The light field distribution inside the tissue is therefore one of the critical parameters determining the treatment's safety and efficacy. While FDA-approved and used for superficial indications, PDT has yet to be widely adopted for interstitial use for larger tumours using light delivered by optical fibres due to a lack of simulation and planning optimization software.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Life & Material Science; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 18:30 - 18:45
Location: OpenPOWER Booth
View Recording

S5686 - Enabling Coherent FPGA Acceleration

Allan Cantle President & Founder, Nallatech
Allan is the founder of Nallatech, established in 1993, which specializes in compute acceleration using FPGAs. As CEO, Allan focused Nallatech on helping customers port critical codes to Nallatech's range of FPGA accelerators, and pioneered several early tools that increased porting productivity. Previously, at BAE Systems, he was heavily involved in architecting real-time, heterogeneous computers that tested live weapon systems and contained many parallel processors, including microprocessors, DSPs, and FPGAs. Allan holds a 1st Class Honors EE BEng degree from Plymouth University and an MSc in Corporate Leadership from Napier University.

The presentation will introduce CAPI, the Coherent Accelerator Processor Interface, and will detail the CAPI HDK (Hardware Development Kit) implementation offered to OpenPOWER customers through Nallatech. Several high-level examples will show where FPGA acceleration brings significant performance gains and how these gains can often be further enhanced by the coherent CAPI interface. Programming methodologies for the accelerator will also be explored: customers can either leverage pre-compiled accelerated libraries that run on the accelerator or write their own accelerated functions in OpenCL.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Tuesday, 03/17
Time: 18:45 - 19:00
Location: OpenPOWER Booth
View Recording

S5206 - So You Want to Deploy High Resolution Graphics Desktop Virtualization

Chip Charnley Technical Expert: Client Technologies, Ford Motor Company

A review of the process and results of a high-resolution graphics proof of concept for implementing XenApp and XenDesktop, conducted jointly by Ford Motor Company, Cisco, and Citrix.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Manufacturing; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL20C
View Recording
View PDF

S5225 - University's Desktop Virtualization Delivers Graphics-Intense Apps on Any Device

George Thornton VP of Engineering, Logical Front
George Thornton is the Vice President of Engineering for Logical Front, a premier technology integration firm with offices in Texas, Utah, and Florida focusing on complex technology solutions in the public and commercial space. He currently oversees all new technology advancements for the company and has ultimate responsibility for the Engineering department that runs all customer implementations, hosted environments, and support. He is credited with developing the company's virtual implementation strategies and designs that have been successfully deployed to hundreds of customer entities servicing over 100,000 virtual desktops. George has more than 15 years of experience in the public and private sector designing, implementing, and managing complex IT environments and projects. His professional background includes Senior Engineer/Architect roles for various state agencies, where he was responsible for the first successful rollout of wireless handsets that extended email and internet services to state officials. He also served as Director of Network Operations and eventually Director of Technology for an 8,000+ student K-12 school district in Texas, where he was among the first technology integrators to implement VDI successfully in an educational environment.
Jim Galib IT Director, Roger Williams University
Jim Galib has been with Roger Williams University (RWU) for over 20 years, the last 7 of which he has served as the Director of Information Technology. He and his dedicated staff provide cutting edge infrastructure to RWU, while still maintaining fiscal responsibility. He has been on the cutting edge of using virtual desktop infrastructure (VDI) in higher education. Under his leadership, RWU has become one of the first universities in the country to have their College of Architecture fully virtualized, allowing their students greater access to labs and applications, while reducing students out-of-pocket costs on hardware and software.
Ryan Tiebout Systems Operations Manager, Roger Williams University
Ryan received his B.S. in Computer Science from RWU and is currently pursuing a Master's in Information Technology at UMass Lowell. He has worked in the RWU IT department for the past 12 years, mostly administering systems and databases. He has experience with Windows, SQL, vSphere, and XenServer.

The rapid evolution of technology is changing the way we learn, work, and educate. Attend this session to hear from Roger Williams University and learn how they overcame their challenges with a solution from Logical Front, NVIDIA, Citrix, and Dell. Specifically, hear how they provide students remote access to graphics-intensive apps like AutoCAD, Revit, and Adobe Creative Suite 6; improve 3D rendering and user experience, even during peak traffic times; and allow students the flexibility to work from anywhere, on any device.

Level: All
Type: Talk
Tags: Graphics Virtualization

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21F
View Recording
View PDF

S5306 - Direct Convolution for Deep Neural Network Classification on Tegra X1

Alan Wang Compute Architect, NVIDIA
Alan is a GPU Architect in the computer vision field at NVIDIA. He is experienced in parallelization, performance modeling, and architecture-specific tuning. Alan is currently working on 2D convolution projects. Before joining the computer architecture team, he worked on graphics tracing and FPGA architecture & EDA software.

We prototype a direct convolution implementation to accelerate classification with a deep neural network. We take the Overfeat network as an example, analyzing some of its properties, such as its math/memory ratio and input/coefficient ratio. We then discuss the workload distribution of the implementation and how we partition the computation into CUDA blocks. We also dive into details of how we optimize for data reuse, including the use of 3D textures for input pixels and a coefficient layout designed for coalesced stores. Experiments with Overfeat Layer 6 on Tegra X1 show that we currently achieve 75% utilization of peak GFLOPS, with room for further optimization as future work.
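The math/memory ratio the abstract analyzes falls directly out of the structure of direct convolution: each output pixel performs R×S multiply-adds over an input window that overlaps its neighbors' windows. As a hedged illustration only (a minimal single-channel NumPy sketch, not the speaker's CUDA implementation):

```python
import numpy as np

def direct_conv2d(x, w):
    """Naive direct 2D convolution with 'valid' padding."""
    H, W = x.shape
    R, S = w.shape
    out = np.zeros((H - R + 1, W - S + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Each output pixel costs R*S multiply-adds over a reused
            # RxS input window -- the source of the math/memory ratio.
            out[i, j] = np.sum(x[i:i + R, j:j + S] * w)
    return out

x = np.arange(16.0).reshape(4, 4)   # toy 4x4 input
w = np.ones((3, 3))                 # toy 3x3 filter
print(direct_conv2d(x, w))          # 2x2 output of window sums
```

A GPU version would map the two output loops onto CUDA blocks and threads, with the overlapping input windows served from texture or shared memory to exploit the reuse.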

Level: Advanced
Type: Talk
Tags: Developer - Performance Optimization; Video & Image Processing

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210G
View Recording
View PDF

S5327 - Hydra: Pixar's Real-Time Render Engine for Feature Film Assets and Workflows

Jeremy Cowles GPU Team Technical Lead, Pixar Animation Studios
Jeremy Cowles is the GPU team lead in Pixar's software research and development group and co-designer of the Hydra render engine and Universal Scene Description. In his free time, he is an open source and computer graphics enthusiast and part-time demoscener.

Feature film production assets present a difficult performance challenge for the real-time graphics pipeline, however by leveraging low driver overhead OpenGL APIs, NVIDIA OpenGL extensions, OpenSubdiv, and Universal Scene Description, the Hydra render engine achieves flexible, high fidelity, real-time performance with unmodified production assets. This talk covers architectural details, trade-offs, performance metrics, and the integration of OpenSubdiv and USD.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics; Developer - Performance Optimization; Rendering & Ray Tracing

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room LL21D

S5371 - VMD: Visualization and Analysis of Biomolecular Complexes with GPU Computing

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and Associate Director of the NVIDIA CUDA Center of Excellence at the University of Illinois. Mr. Stone is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. Mr. Stone was named an NVIDIA CUDA Fellow in 2010. He also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

This talk will showcase recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray supercomputers. This presentation will highlight the use of in-place OpenGL rendering and GPU ray tracing for interactive and batch mode rendering of images and movies, CUDA just-in-time (JIT) compilation for increasing the performance of data-driven visualization and analysis algorithms, and GPU accelerated analysis of results of hybrid structure determination methods that combine data from cryo-electron microscopy and X-ray crystallography with all-atom molecular dynamics simulations.

Level: Intermediate
Type: Talk
Tags: Visualization - In-Situ & Scientific; Life & Material Science; Big Data Analytics

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21C
View Recording
View PDF

S5410 - Accelerating Graph Algorithms on Emerging Architectures

Antonino Tumeo Research Scientist, Pacific Northwest National Laboratory
Highly-Rated Speaker
Dr. Antonino Tumeo received the M.S. degree in Informatics Engineering in 2005 and the Ph.D. degree in Computer Engineering in 2009, both from Politecnico di Milano, Italy. Since February 2011, he has been a research scientist in PNNL's High Performance Computing group. He joined PNNL in 2009 as a postdoctoral research associate. Previously, he was a postdoctoral researcher at Politecnico di Milano. His research interests are modeling and simulation of high-performance architectures, hardware-software codesign, FPGA prototyping, and GPGPU computing.
Mahantesh Halappanavar Research Scientist, PNNL
Dr. Mahantesh Halappanavar joined Pacific Northwest National Laboratory in December 2009. His work focuses on parallel graph algorithms and spans several applications, including contingency analysis of electric power grids, statistical textual analysis, numerical linear algebra, information security, and fault tolerance. He explores the interplay of algorithm design, architectural features, and input characteristics, targeting massively multithreaded architectures such as the Cray XMT and emerging multicore (Intel, AMD) and manycore (NVIDIA) platforms. Mahantesh graduated in 2009 with a Ph.D. in Computer Science from Old Dominion University, Norfolk, Virginia. His doctoral research was in the emerging interdisciplinary field known as combinatorial scientific computing (CSC), which employs combinatorial algorithmic techniques to solve scientific computing problems. He developed new approximation algorithms for graph matching, a fundamental combinatorial problem with numerous applications in science and engineering. He also developed software targeting the Department of Energy's leadership-class machines for the approximate graph matching problem and demonstrated scalability across tens of thousands of processors.

This talk discusses approaches we are pursuing to speed up large-scale graph algorithms on emerging architectures. Breadth First Search (BFS) has become one of the premiere benchmarks for new high performance computing systems through the Graph 500. However, there exist many more combinatorial algorithms with significant applications to real-world problems (scientific computing, machine learning, community detection, analysis of various networks, bioinformatics, etc.); examples are matching and graph clustering. We will highlight the pros and cons of using GPUs for these algorithms, contrasting them with other novel architectures, with a perspective on their scaling to large datasets and on whether, besides code optimization, a rethinking of the algorithms is required.
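For readers unfamiliar with the Graph 500 kernel the abstract references, a minimal level-synchronous BFS sketch (the toy adjacency list is illustrative; a GPU version expands each frontier in parallel, claiming unvisited vertices with atomics):

```python
def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency-list dict.

    Returns the BFS level (hop distance) of every reachable vertex.
    """
    level = {source: 0}
    frontier = [source]
    d = 0
    while frontier:
        next_frontier = []
        for u in frontier:            # on a GPU: one thread per frontier vertex/edge
            for v in adj[u]:
                if v not in level:    # atomically claimed in a parallel implementation
                    level[v] = d + 1
                    next_frontier.append(v)
        frontier = next_frontier      # swap frontiers; one level per iteration
        d += 1
    return level

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3]}
print(bfs_levels(adj, 0))   # → {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

The irregular, data-dependent frontier sizes seen here are exactly what makes such combinatorial algorithms harder to map to GPUs than dense numerical kernels.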

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms; Supercomputing

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room 210C

S5442 - High-Quality Rasterization

Chris Wyman Research Scientist, NVIDIA
Chris joined NVIDIA Research in 2012. Previously, he served as an associate professor of computer science at the University of Iowa. He has a PhD in computer science from the University of Utah. His research interests focus on realistic, real-time rendering including problems on lighting, global illumination, shadows, materials, participating media, and many related issues.

We describe three new rendering algorithms that rasterize many samples per pixel, taking advantage of Maxwell GPU features to make images that are sharper and less aliased. "ACAA" is a simple variation of MSAA that uses less memory. "AGAA" brings MSAA quality to deferred rendering, while shading less than twice per pixel. And thirdly, "FTIZB" renders alias-free hard shadows with 32 samples per pixel at real-time speeds.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Video & Image Processing; Rendering & Ray Tracing; Media & Entertainment

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21B
View Recording
View PDF

S5545 - Electronics & APIs: The Aftermarket's new Bondo™

John Waraniak Vice President of Vehicle Technology, Specialty Equipment Market Association (SEMA)
John Waraniak has been vice president of Vehicle Technology at the Specialty Equipment Market Association (SEMA) since May 2006. In this role, Waraniak helps leading automotive aftermarket companies understand the latest, as well as emerging vehicle technology challenges, develop solutions and capitalize on new revenue and business opportunities. With more than 25 years of diverse experience at automotive, motorsports, aerospace and consumer products companies, Waraniak is an expert in vehicle systems engineering, motorsports program management and lean product-process development. John is currently a director of the Carroll A. Campbell Jr., Graduate Engineering Center Industrial Advisory Board at Clemson University where he is helping to develop, train and mentor tomorrow's automotive engineers today and align advanced vehicle technology with next-generation talent development. Prior to joining SEMA, Waraniak held executive management positions with global and entrepreneurial companies, including TATA Motors, Johnson Controls, General Motors, Hughes Aircraft, Northrop and No Fear. Waraniak earned a bachelor's degree in mechanical engineering from the University of Michigan. He has a master's degree in mechanical and industrial engineering from the University of Illinois and a master's degree in engineering management from West Coast University. He also graduated from the California Institute of Technology's Executive Engineering Management Program. Born in the Motor City of Detroit, Michigan, Waraniak is an avid auto industry, action sports and motocross enthusiast. He lives in West Bloomfield, Michigan with his wife Terri and has two sons, Scott and Jeff.
John Ellis Managing Director, Ellis & Associates
Most recently, John served as Global Technologist and Head of the Ford Developer Program with Ford Motor Company. While at Ford, he was tasked with expanding Ford's "brought-in" strategy of integrating mobile technology into the vehicle. He oversaw a team of developers and engineers responsible for creating the "connected car" and striking the right balance between embedded and off-board technology so that drivers can seamlessly extend their mobile lives into their vehicles. During most of his career, John worked for Motorola where he held key leadership positions in engineering; product management; software & services; marketing and strategy. While there, John participated in developing, marketing and selling Motorola's mobile software and services, their software developer ecosystem (developer.motorola.com), and industry-leading Open Source Software program (opensource.motorola.com). For the past 17 years, John has been the managing director of Ellis & Associates, the management consulting firm he founded in 1997. The firm has expanded its reach over the years, and currently offers expertise in international business and culture, open source and automotive software.

As the automotive industry relies on electronics and software for more and more active safety capabilities, how does a software or electronics company deliver their exciting value while ensuring that what they deliver doesn't "break" the vehicle? Drawing heavily on the Vehicle Dynamics Program, The Specialty Equipment Market Association ("SEMA") has developed the Vehicle Electronics Program to ensure that the next generation of in-car electronics realizes its full potential. Learn about this new program including the new proposed federal motor vehicle standard, FMVSS 150. In addition, we'll cover the resources and opportunities available to developers for designing and customizing vehicles.

Level: All
Type: Talk
Tags: Automotive; Product Design & Styling

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room LL20D
View Recording
View PDF

S5580 - Application of GPUs to Classification Problems Using Deep Learning Architectures

Elliot English Senior Data Scientist, MetaMind
At MetaMind, Dr. English develops state-of-the-art algorithms for computer vision using novel types of deep neural networks. Prior to joining MetaMind, Dr. English completed his PhD at Stanford and held a postdoctoral fellowship at Lawrence Berkeley National Laboratory developing accurate and highly parallel algorithms for fluid dynamics and climate simulation.

In this talk, we discuss the latest techniques for solving image classification, localization, and detection problems on a multi-GPU architecture. We will cover issues and algorithms associated with training convolutional neural networks, as well as other network architectures, on small clusters of GPUs.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210A
View Recording
View PDF

S5584 - Improving GPU Utilization with the Multi-Process Service (MPS)

Priyanka Sah Compute Developer Technology, NVIDIA
Priyanka Sah is a Compute DevTech Engineer (GPU high performance computing) at NVIDIA. After spending two years with the Indian Space Research Organisation developing and implementing parallel image-processing algorithms for satellite imagery, she earned a master's degree in Computer Science and Engineering at IIT Delhi. She subsequently worked on life science and weather simulation codes as a CUDA consultant before joining NVIDIA's Developer Technology group, where she works across a number of HPC application domains, helping customers develop for the GPU and working at the leading edge of HPC performance.

Heterogeneous clusters with multi-core CPUs and one or many GPUs per node have gained wide popularity in scientific computing. While MPI-based distributed-memory applications commonly assign multiple MPI ranks to each node, sharing GPUs among multiple processes can incur significant context-switching overhead or lead to under-utilization of GPU resources, especially in the strong-scaling regime. Using the Multi-Process Service (MPS) allows efficient sharing of GPUs among multiple CPU processes, leading to better utilization of GPU resources and higher performance. This talk will focus on legacy MPI applications and demonstrate how to efficiently overlap work from multiple processes on the GPU, and how to profile code under MPS on a node using newly released tools in CUDA 6.5.
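The per-node MPS setup the abstract describes is normally driven from a job script; a hedged sketch of those steps, expressed in Python for clarity. The directory paths and the `my_mpi_app` binary are illustrative assumptions, while `nvidia-cuda-mps-control` and the `CUDA_MPS_*` environment variables are the standard MPS interface:

```python
import os

# Illustrative locations; MPS does not mandate these particular paths.
pipe_dir = "/tmp/nvidia-mps"
log_dir = "/tmp/nvidia-mps-log"

# 1. Environment shared by the MPS control daemon and all client processes.
env = dict(os.environ,
           CUDA_MPS_PIPE_DIRECTORY=pipe_dir,
           CUDA_MPS_LOG_DIRECTORY=log_dir)

# 2. Start the daemon once per node, then launch the MPI job as usual.
#    Every rank whose environment points at pipe_dir becomes an MPS client,
#    so the ranks share one GPU context instead of context-switching.
start_daemon = ["nvidia-cuda-mps-control", "-d"]
launch_job = ["mpirun", "-np", "8", "./my_mpi_app"]   # hypothetical binary

# 3. Shut the daemon down when the job finishes.
stop_daemon = "echo quit | nvidia-cuda-mps-control"

print(" ".join(start_daemon))
print(" ".join(launch_job))
print(stop_daemon)
```

In a real batch script these would be executed (e.g. via `subprocess.run(..., env=env)`) on each node before and after the MPI launch.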

Level: Beginner
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 212B
View Recording
View PDF

S5586 - Accelerating the Cure: GPU-Driven Drug Discovery for Targets in Cancer

Rommie Amaro Associate Professor, Chemistry and Biochemistry; Director, National Biomedical Computation Resource, University of California, San Diego
Rommie E. Amaro, faculty in the Department of Chemistry and Biochemistry at the University of California, San Diego (UCSD), is a native of the south side of Chicago. She received her B.S. (Chemical Engineering, 1999) and Ph.D. (Chemistry, 2005) from the University of Illinois at Urbana-Champaign. She was a NIH postdoctoral fellow with Andy McCammon (UCSD). Rommie is the recipient of an NIH New Innovator Award, the Presidential Early Career Award for Scientists and Engineers, and the ACS COMP Outstanding Junior Faculty Award. Research in her lab is broadly concerned with the development and application of state-of-the-art computational and theoretical techniques to investigate the structure, function, and dynamics of complex biological systems. The lab focuses mainly on targeting neglected diseases, Chlamydia, influenza, and cancer, and works closely with experimental collaborators to catalyze the discovery of new potential therapeutic agents. The Amaro Lab is also keenly interested in developing new multiscale simulation methods and novel modeling paradigms that scale from the level of atoms to whole cells, and beyond.

This session discusses our work under the Compute-the-Cure award. We are dramatically accelerating the drug discovery pipeline for targets in cancer by incorporating the outstanding advances achieved in molecular dynamics (MD) simulations via GPU technologies into computer-aided drug design. GPU-based MD can be used to rapidly reveal novel druggable binding sites that are otherwise "hidden" in x-ray crystallographic structures and offer novel opportunities for drug discovery. I will also describe the development of automated GPU-based workflows to facilitate the broad adoption of GPU-based technologies in anti-cancer therapeutic discovery programs, with the hope to accelerate the discovery of new and safer medicines.

Level: All
Type: Talk
Tags: Life & Material Science; Computational Physics; Education & Training; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 212A
View Recording
View PDF

S5645 - Easy Photorealism with NVIDIA Iray

Phillip Miller Director of NVIDIA Advanced Rendering Products, NVIDIA
Highly-Rated Speaker
Mr. Miller directs NVIDIA's Advanced Rendering offerings, ranging from the Iray and mental ray renderers shipping within leading products in Design and Entertainment to the OptiX ray tracing framework used extensively within private and commercial applications. He has led major software products for 20 years, including the 3D animation efforts at Autodesk and the Web design products at Adobe. He holds a Master of Architecture from the University of Illinois and is a registered architect.

Come learn how you can create stunning photorealistic imagery and animations with interactive ease by employing Iray within your favorite 3D applications. A full spectrum of Iray possibilities will be discussed, working within tools like 3ds Max, Cinema 4D, Maya, Revit, and Rhino, each unfolding the latest capabilities of the new Iray 2015 framework. Distributed rendering to local machines or powerful VCA clusters will also be explored. The cross-use of material and light descriptions between applications will also be demonstrated.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Product Design & Styling; Media & Entertainment; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21E
View Recording

S5676 - Advancing the OpenPOWER Vision

Gordon MacKean Sr. Director with the Hardware Platforms team, Google
Gordon MacKean is a Sr. Director with the Hardware Platforms team at Google. He leads the team responsible for the design and development of the server and storage products used to power Google data centers. Prior to Google, Gordon held management and design roles at several networking companies including Matisse Networks, Extreme Networks, and Nortel Networks. Gordon is a founder of OpenPOWER Foundation and serves as the Chairman of the Board of Directors. Gordon holds a Bachelors degree in Electrical Engineering from Carleton University.

It's been nearly a year since the public launch of OpenPOWER, and the technology leaders that make up our community have made significant progress toward our original goals. While growth of the membership is a critical factor, our success will come from the technology provided through the 'open model' and the high-value solutions enabled by leveraging that technology. Please join us as we highlight the key components our member community has contributed to that 'open model' and spotlight some examples of high-value solutions enabled by members leveraging our combined capabilities and strengths.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Press-Suggested Sessions: General Interest

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 220C

S5751 - Stereovision and the Future of Autonomous Machines

Edwin Azzam CTO, STEREOLABS
Edwin Azzam co-founded STEREOLABS in 2010. As STEREOLABS’s Chief Technical Officer, Edwin is responsible for leading the company’s product development and technology strategy in stereoscopic image processing. Prior to founding STEREOLABS, Edwin was a project manager at Airbus Defence and Space. Edwin holds a Master’s degree in Optics & Image Processing from Institut d’Optique, France, as well as a Master’s degree in Management from ESSEC Business School. He is a PhD supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise for the assessment of national research projects in the field of computer vision and 3D image processing. Edwin was honored twice with the National Innovation Prize by the French Ministry of Research. Between 2010 and 2014, Edwin received 10 different distinctions for his achievements in the stereoscopic 3D field. In 2010, he won the European Innovation Award with STEREOLABS which recognizes the most promising and innovative technological companies in Europe.

Discover how stereovision and 3D depth sensing on mobile GPUs enable the development of future autonomous cars, drones and robots. We will discuss the benefits and challenges of using stereo cameras as depth sensing sensors, and how to leverage the power of embedded GPU to overcome these challenges. We will also show demonstrations of how the technology can be used to create 3D surrounding reconstruction, detect obstacles and navigate autonomously.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Automotive; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210B
View Recording

S5786 - Summit: The Next Peak in HPC

Tjerk Straatsma Manager, Scientific Computing Group, Oak Ridge National Laboratory
Dr. Straatsma manages the Scientific Computing group in the National Center for Computational Sciences. This center is the site for the Oak Ridge Leadership Computing Facility and houses the largest supercomputer for open science in the United States.

Hybrid CPU+GPU architectures are a response to the power limitations imposed by the end, in the last decade, of processor clock-rate scaling. This limitation continues to drive supercomputer architecture designs toward massively parallel, hierarchical, and/or hybrid systems, and we expect that, for the foreseeable future, large leadership computing systems will continue this trajectory in order to address science and engineering challenges for government, academia, and industry. Consistent with this trend, the U.S. Department of Energy's (DOE) Oak Ridge Leadership Computing Facility (OLCF) has signed a contract with IBM to bring a next-generation supercomputer to Oak Ridge National Laboratory (ORNL) in 2017. This new supercomputer, named Summit, will deliver at least five times the performance of Titan, the OLCF's current hybrid CPU+GPU leadership system, on science applications, and will become the next peak in leadership-class computing systems for open science.

Level: All
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210D

S5137 - Rapidly Prototyping Automotive User Experiences at Jaguar Land Rover

Matt Jones Head of Future Infotainment, Jaguar Land Rover
Matt Jones is the Technical Lead for the Next Generation of Infotainment Systems at Jaguar Land Rover, a Board Member at the Linux Foundation, and the former Vice President of the GENIVI Alliance, leading the push for open infotainment standards and for leading-edge features deployed rapidly to the customer. To this end, he works actively with Linux and the open source community, especially within Automotive Grade Linux, the Tizen Project (IVI branch), and GENIVI. At Jaguar Land Rover he was responsible for the specification and rollout of the latest generation of Linux-based infotainment systems, with a large team in the UK, a completely new team and facility in Portland, OR dedicated to open software for automotive, and close links to the automotive infotainment ecosystem.

Learn how Jaguar Land Rover is using the power of GPU to design, create and test next generation user interfaces for cars.

Level: All
Type: Talk
Tags: Automotive; Embedded Systems; Real-Time Graphics; Manufacturing; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room LL20D
View Recording

S5256 - Accelerating the NAS Multi-Zone Scalar Pentadiagonal (SP-MZ) Parallel Benchmark with OpenACC

Christopher Stone Research Scientist, Computational Science and Engineering, LLC
Dr. Stone has been developing high-performance, parallel algorithms professionally for over 10 years and specializes in GPU co-processing for a variety of computational science and engineering disciplines. His primary area of interest is the intersection of applied numerical methods, computational fluid dynamics, and high-throughput computing. Dr. Stone is the owner of Computational Science and Engineering, LLC, an independent consulting firm, and has been providing HPC R&D services since 2006. Prior to founding CSE, Dr. Stone was a Research Engineer at the Georgia Institute of Technology School of Aerospace Engineering, a Visiting Assistant Professor of Mechanical Engineering at Northern Arizona University, and a Research Scientist with the Applied Research Group at Intelligent Light. Dr. Stone received his B.S. in Physics from Wofford College and his Ph.D. from the Georgia Institute of Technology School of Aerospace Engineering.
Bracy Elton Senior Computational Scientist, Engility Corporation
Dr. Bracy Elton is a Signal & Image Processing On-Site at Wright-Patterson Air Force Base in the User Productivity Enhancement, Technology Transfer, and Training (PETTT) activity of the DoD High Performance Computing Modernization Program (HPCMP). His background includes high performance scientific computing, parallel numerical methods, demanding signal processing applications, and high performance Fast Fourier Transforms. He was previously with the Ohio Supercomputer Center, the Scientific Libraries Group at Cray Inc., and the Supercomputer Group at Fujitsu America, Inc. He received Ph.D. and M.S. degrees in Computer Science from the University of California, Davis, and a B.S. cum laude in Computer Science and Math (double major) from Pacific Lutheran University.

Explore the latest techniques for accelerating implicit scalar pentadiagonal CFD codes using OpenACC. In this session we will compare the performance of different data lifetimes and array access patterns found in the NAS Multi-Zone Scalar Pentadiagonal (SP-MZ) parallel benchmark. We will also learn techniques to increase the performance of this challenging algorithm through asynchronous kernel execution and interfacing with optimized CUDA libraries. We will survey the performance of the SP-MZ benchmark on the NVIDIA Kepler GPU and compare to single and multi-core CPU performance.

Level: Intermediate
Type: Talk
Tags: Developer - Performance Optimization; Computational Physics; Developer - Algorithms; OpenACC

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 210G

S5317 - Development of a GPU Accelerated Visual Tracking Framework

David Concha Researcher, Universidad Rey Juan Carlos
David received his B.Sc. degree in Computer Science from Universidad Rey Juan Carlos (URJC) in 2011 and is currently a Ph.D. student and grant holder at Universidad Rey Juan Carlos. His research interests focus on Computer Vision and GPU Computing. Some of his recent research exploits graphics hardware to accelerate Computer Vision algorithms. In particular, David uses GPUs to accelerate methods related to 3D/2D motion tracking, medical image reconstruction, face recognition, high-definition depth map computation, and image segmentation.

This session presents the development of a visual tracking system whose ultimate goal is to track multiple articulated objects. Throughout the development, different technologies for GPU programming are used, like OpenGL, Cg and CUDA; various types of sensor such as cameras or Kinects; and different methodologies like particle filters, Kalman filter or Variable Neighborhood Search (VNS) metaheuristic.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 210B
View Recording
View PDF

S5463 - GPU-Accelerated Virtual Screening: Rationale, Challenges, and Case Studies

Olexandr Isayev Research Scientist, University of North Carolina at Chapel Hill
Dr. Isayev is a Research Scientist in the Laboratory for Molecular Modeling at the University of North Carolina at Chapel Hill. His research topics include materials science, molecular modeling, and big data. See complete biography at http://www.researchgate.net/profile/Olexandr_Isayev

With the unprecedented growth of chemical databases incorporating billions of synthetically feasible chemicals, the current generation of cheminformatics software tools is not capable of handling, characterizing, and processing such extremely large chemical libraries. In this presentation, we will discuss the rationale and the main challenges (both theoretical and technical) of screening very large repositories of virtual compounds. We will present several proof-of-concept case studies on screening large libraries (≥ 1 billion compounds) using our novel GPU-accelerated cheminformatics platform to (1) rapidly compute chemical descriptors, (2) identify molecules with a defined bioactivity, and (3) identify materials with a desired property.

Level: All
Type: Talk
Tags: Life & Material Science; Big Data Analytics; Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 212A
View Recording

S5548 - FMM Goes GPU: Smooth Trip or Bumpy Ride?

Bartosz Kohnke Software Developer, Max Planck Institute for Biophysical Chemistry
Bartosz Kohnke is a software developer at the Max Planck Institute for Biophysical Chemistry in Göttingen, in the Department of Theoretical and Computational Biophysics. His job is CUDA parallelization and optimization of the Fast Multipole Method, which will become part of the GROMACS software. Before that, Bartosz worked on efficient implementations of Super-Resolution Fluctuation Imaging (SOFI) algorithms, researching different parallelization techniques in the Laboratory of Cellular Dynamics (MPI Göttingen). He holds an MSc in Applied Computer Science from Georg-August-Universität Göttingen, Germany, with a specialization in Scientific Computing.

The N-body problem provides a very simple yet scientifically relevant algorithm for utilizing modern GPUs. However, its computational complexity is O(N^2). The Fast Multipole Method (FMM) reduces runtime and complexity to an optimal O(N) for any required precision. In this talk, we present our CUDA-enabled, templated C++ implementation. The algorithm requires several operators, partly depending upon each other, to exchange information in a tree-like data structure. We especially focus on the use of unified memory to minimize porting effort and on dynamic parallelism to achieve a better computational workload. We will present timings and scalings for all FMM operators and discuss remaining bottlenecks, such as tree dependencies and redundancies in the kernel setup.
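For context, the talk's starting point is the direct pairwise evaluation that FMM replaces. A minimal host-side sketch of that O(N^2) baseline (illustrative only, not the speaker's CUDA code; the function name is hypothetical):

```python
import math

def direct_potential(charges, positions):
    """O(N^2) direct sum: every particle interacts with every other.
    This is the cost FMM reduces to O(N) via multipole expansions."""
    n = len(charges)
    phi = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[i][0] - positions[j][0]
            dy = positions[i][1] - positions[j][1]
            dz = positions[i][2] - positions[j][2]
            r = math.sqrt(dx * dx + dy * dy + dz * dz)
            phi[i] += charges[j] / r
    return phi

# Two unit charges a unit distance apart: each sees potential 1.0.
print(direct_potential([1.0, 1.0], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))  # → [1.0, 1.0]
```

The doubly nested loop is what makes the direct method prohibitive at large N, which motivates the tree-structured operator hierarchy the talk describes.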

Level: Intermediate
Type: Talk
Tags: Supercomputing; Developer - Algorithms; Computational Physics

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 212B
View Recording
View PDF

S5622 - Dekko: A Framework for Real-Time Preview for VFX

Damien Fagnou Global Head, VFX Operations, MPC
Highly-Rated Speaker
After finishing university with a Master's in Computer Science in France, Damien worked for an animated series, implementing the technology to speed up the motion capture pipeline and rendering. He later accepted a job helping set up the workflow at Attitude Studios and then, with his sights set on moving overseas, took on the role of Tools and Workflow Programmer at Climax in the UK. In 2003, Damien transferred his skills to the film industry and started at leading VFX post-production studio MPC to work on Troy, implementing preview tools and city rendering scripts. In 2005, Damien became R&D lead on Charlie and the Chocolate Factory, 10,000 BC and Narnia. Damien then moved closer to production and became MPC's Stereographer, working on movies including Pirates of the Caribbean: On Stranger Tides, the Harry Potter films and, more recently, Prometheus. After a few years in production, Damien returned to his software roots and became Global Head of Software, overseeing software development efforts across the company. Recently Damien moved up to a wider role as Global Head of VFX Operations, bringing together his expertise in both software and production to continue to evolve and refine the creation processes across all feature-film VFX work at MPC.

In this session we will discuss the challenges and benefits of interactively visualizing large scenes in modern big-budget VFX-driven movies. We will share examples of the scale and complexity we experienced in our recent productions at MPC, and the value of being able to visualize them without long offline render processes. We will show initial results of our work using NVIDIA's OptiX framework and Fabric Engine to assemble and render large scenes in an interactive environment, taking advantage of the power of high-end GPUs.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room LL21D

S5672 - The Disruptive Technology of OpenPOWER

Brad McCredie Vice President, IBM Power Systems Development

The OpenPOWER Foundation is certainly carrying strong momentum as it enters its second year. As we look forward, there are many things still to be done to take the next step on our journey toward creating a broadly adopted, innovative, and open platform for our industry. I will share my Top Ten List of OpenPOWER Projects to Disrupt the Data Center. Anything and everything is fair game on this list, across all disciplines, technologies, and markets. Come join us for a fun look at how the OpenPOWER Foundation will continue to shake up the Data Center.

Level: All
Type: Talk
Tags: OpenPOWER; Data Center, Cloud Computing & HPC; Supercomputing; Press-Suggested Sessions: General Interest

Day: Wednesday, 03/18
Time: 09:30 - 09:45
Location: Room 220C

S5753 - Turbomachinery R&D Acceleration Using Titan

Ravi Srinivasan Aero/Thermodynamic Engineer , Dresser-Rand
Dr. Ravi Srinivasan received his Ph.D. in Aerospace Engineering from Texas A&M University in 2005. His graduate and post-graduate research areas were related to simulation of high-speed flow phenomena. Since joining Dresser-Rand as an Aero/Thermodynamic engineer in 2009, he has focused on the design and CFD modeling of supersonic flow in turbo-machines, including multi-stage and non-linear harmonic modeling.

Dresser-Rand (D-R) is an industrial partner of the Oak Ridge Leadership Computing Facility (OLCF) and utilizes the Titan platform to accelerate turbomachinery research and development. To take advantage of the computing infrastructure at OLCF, D-R has engaged a third-party CFD software provider to add and modify computational fluid dynamics (CFD) solver modules. The developments include enhancing the scalability of the flow solver through better grid partitioning, implementing GPU-based acceleration, and significantly improving I/O performance. Turbomachinery design at D-R is complemented by an optimization process. Titan is the enabling technology that accelerates this process by significantly reducing database generation time, and has made it possible to implement optimization as part of R&D. Successful compressor component designs derived from optimization have been experimentally tested by D-R. The steps undertaken for optimization will be presented.

Level: Intermediate
Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 210D
View Recording
View PDF

S5709 - OpenPOWER Keynote: OpenPOWER Foundation Technical Initiatives

Jeff Brown Chief Engineer , IBM
Jeffrey D. Brown, IBM Server and Technology Group. Jeff is an IBM Distinguished Engineer and member of the IBM Academy of Technology. He received a B.S. in Electrical Engineering and a B.S. in Physics from Washington State University in 1981. He received his M.S. degree in Electrical Engineering from Washington State University in 1982. Jeff has over 25 years of experience in VLSI development including processor, memory, and IO subsystem development projects for IBM multi-processor systems and servers. He is the coauthor of more than 40 patent filings. He has been the Chief Engineer on several processor and SOC chip development programs including Waternoose for the XBOX360 and Power Edge of Network. Jeff is currently actively involved in the OpenPOWER Foundation and chairs the Technical Steering Committee.
Sandy Woodward Senior Technical Staff Member , IBM
Bio to come.
Rakesh Sharma Senior Technical Staff Member, IBM Power Systems I/O and Communications Architect, IBM
Bio to come.

As Chair of the OpenPOWER Technical Steering Committee, Mr. Brown will discuss the technical agenda of the OpenPOWER Foundation and the structure of the foundation's workgroups. He will describe the scope and objectives of key workgroups as well as their relationships to each other. A roadmap of workgroup activities will illustrate when the community can expect key results. The presentation will also cover three key initiatives within the OpenPOWER Foundation: work recently started to enable active foundation member engagement in workgroups focused on application solution sets, IO device enablement, and compliance.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Press-Suggested Sessions: General Interest

Day: Wednesday, 03/18
Time: 09:45 - 10:45
Location: Room 220C

ECS5000 - Emerging Companies Summit Opening Address

Jeff Herbst Vice President of Business Development, NVIDIA
Jeff has responsibility for overall ecosystem development, mergers and acquisitions strategy, investments, partnerships, and other strategic business relationships and transactions. Prior to joining NVIDIA in 2001, Herbst was worldwide head of corporate and business development at AltaVista, as well as a partner with the law firm Wilson Sonsini.

Don't miss Jeff Herbst kicking off the Emerging Companies Summit.

Level: All
Type: Talk
Tags: Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:00 - 10:15
Location: Room 220B

S5149 - Attacking HIV with Petascale Molecular Dynamics Simulations on Titan and Blue Waters

James Phillips Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
James Phillips is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. He has a Ph.D. in Physics from the University of Illinois. Since 1999, James has been the lead developer of the highly scalable parallel molecular dynamics program NAMD, for which he received a Gordon Bell Award in 2002. His research interests include improving the performance and accuracy of biomolecular simulations through parallelization, optimization, hardware acceleration, better algorithms, and new methods.

The highly parallel molecular dynamics code NAMD was one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007, and is now used to perform petascale biomolecular simulations, including a 64-million-atom model of the HIV capsid, on the GPU-accelerated Cray XK7 Blue Waters and ORNL Titan machines. Come learn the opportunities and pitfalls of taking GPU computing to the petascale, the importance of CUDA 6.5 and Kepler/Maxwell features in combining multicore host processors and GPUs in a legacy message-driven application, and the promise of remote graphics for improving productivity and accessibility in petascale biology.

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Supercomputing; Graphics Virtualization

Day: Wednesday, 03/18
Time: 10:00 - 10:50
Location: Room 212A
View Recording
View PDF

S5200 - Big Data on a Budget: Cost Efficient Large-Scale Graph Analytics

Joe Schneible, Ph.D. Enterprise Software Solutions, Technica Corporation
Joseph Schneible, Ph.D., is a Software Engineer leading the Independent Research and Development team at Technica Corporation in Dulles, Virginia. His research focuses on a systems-based approach to optimizing graph analytics and machine-learning algorithms for commodity hardware. Dr. Schneible holds a Ph.D. in Physics from Syracuse University, where his research focused on parallel simulations of quantum field theory. Prior to joining Technica, he performed postdoctoral research as a member of the High Performance Computing Lab at The George Washington University. His research interests include the use of GPUs to accelerate simulations and analytics.

Attendees will take away an appreciation for the nuances involved in performing large-scale graph analytics on a budget. The discussion will center on utilizing graphics processing hardware in a limited-memory environment, including insights into data storage structures for I/O-efficient processing as well as the application of the massive parallelism of the GPU to real-world graph data.
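As an illustration of the kind of compact, I/O-friendly storage structure the abstract alludes to, a Compressed Sparse Row (CSR) adjacency packs a graph into two flat arrays that stream well from disk and map naturally onto GPU memory. A minimal sketch (a common textbook layout, not necessarily Technica's actual format; names are hypothetical):

```python
def to_csr(num_vertices, edges):
    """Build CSR adjacency: offsets[v]..offsets[v+1] indexes v's
    neighbors in the flat targets array -- two arrays instead of
    many small per-vertex lists."""
    offsets = [0] * (num_vertices + 1)
    for src, _ in edges:
        offsets[src + 1] += 1
    for v in range(num_vertices):          # prefix sum over degrees
        offsets[v + 1] += offsets[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1][:]               # next write slot per vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets

def neighbors(csr, v):
    offsets, targets = csr
    return targets[offsets[v]:offsets[v + 1]]

g = to_csr(4, [(0, 1), (0, 2), (2, 3), (1, 3)])
print(neighbors(g, 0))  # → [1, 2]
```

Because both arrays are contiguous, they can be read sequentially in large blocks, which is what makes this layout attractive when the graph exceeds GPU or host memory.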

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms; Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210C
View Recording
View PDF

S5205 - Real-Time and High Resolution Feature Tracking and Object Recognition

Peter Andreas Entschev Software Engineer, ArrayFire
Peter Entschev is currently a software engineer at ArrayFire, where he primarily works on concurrent computer vision problems. He received his Bachelor's degree in Telecommunication Systems and Master's degree in Computer Science from the Federal University of Technology - Paraná (UTFPR), Brazil. Before joining ArrayFire, he worked on real-time computer vision research at SEW-Eurodrive in Germany and on system administration and development of Linux distributions for the Brazilian government.

This session will cover real-time feature tracking and object recognition in high-resolution videos using GPUs and productive software libraries including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches have been proposed to solve these problems, some of the most important being SIFT, SURF, and ORB. Traditionally, these approaches are so computationally complex that processing more than a few frames per second is impossible. Using an NVIDIA K20 GPU with ORB, we are able to process more than 30 frames per second on images on the order of 10,000x10,000 pixels. Multiple quality and timing benchmarks will be presented, covering some of the most robust feature tracking methods.
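ORB produces binary descriptors, which are compared by Hamming distance; that brute-force matching step is the embarrassingly parallel core that GPUs accelerate. A minimal host-side sketch of the idea (illustrative only, not ArrayFire's API; names are hypothetical):

```python
def hamming(d1: int, d2: int) -> int:
    """Hamming distance between two binary descriptors packed as ints."""
    return bin(d1 ^ d2).count("1")

def match(query, train):
    """Brute-force nearest-neighbor matching: for each query descriptor,
    return the index of the closest training descriptor. On a GPU, all
    query/train pairs are evaluated in parallel."""
    return [min(range(len(train)), key=lambda j: hamming(q, train[j]))
            for q in query]

q = [0b1010, 0b1111]
t = [0b1000, 0b1110, 0b0111]
print(match(q, t))  # → [0, 1]
```

Real descriptors are 256-bit, so each comparison is a handful of XOR-and-popcount operations, which is why binary features like ORB map so well to GPU hardware.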

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210B
View Recording
View PDF

S5282 - Avoiding Shared Memory Bank Conflicts in Rate Conversion Filtering

Mrugesh Gajjar Lead Research Engineer, Siemens Corporate Technology
Mrugesh Gajjar is an engineer with Siemens Corporate Technology in Bangalore, working with the Siemens Ultrasound Division in Mountain View on GPU-based implementations of ultrasound signal processing algorithms. He holds a Master's in Information & Communication Technology from Dhirubhai Ambani Institute (DA-IICT), India. He has 10 years of research and industrial experience; his interests include parallel computing and computer systems, with a focus on signal processing applications. He has 6 international publications and 2 patents pending.

Shared memory bank conflicts can be a significant performance limiter, depending on thread-dependent access patterns. We will present ideas on how to reduce shared memory bank conflicts in rate conversion filtering, a frequently used signal processing function in a variety of tasks such as image resizing. We find severe performance degradation for specific downsampling factors in rate conversion due to heavy bank conflicts in shared memory. We propose a novel technique for avoiding them via the use of scrambled addressing across threads. This technique is applicable more generally across many GPU architectures. We will demonstrate its effectiveness with specific examples and performance measurements on NVIDIA GPUs, and leave the attendee with ideas on how to identify and mitigate bank conflicts.
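The mechanism behind the degradation can be modeled on the host: shared memory is divided into banks (32 word-wide banks on Kepler-class GPUs), and a warp's access is serialized by the most-contended bank, so a stride equal to the bank count is worst-case while skewed ("scrambled") addressing restores one access per bank. A minimal sketch under those assumptions (hypothetical helper, not the authors' kernel):

```python
NUM_BANKS = 32  # assumption: Kepler-style 32 banks, word-indexed addresses

def max_bank_conflict(addresses):
    """Worst-case serialization factor for one warp's shared-memory
    access: the number of threads hitting the most-contended bank."""
    counts = {}
    for a in addresses:
        b = a % NUM_BANKS
        counts[b] = counts.get(b, 0) + 1
    return max(counts.values())

warp = range(32)
strided = [t * 32 for t in warp]        # stride == bank count: all hit bank 0
scrambled = [t * 32 + t for t in warp]  # skewed addressing: distinct banks
print(max_bank_conflict(strided), max_bank_conflict(scrambled))  # → 32 1
```

Adding a thread-dependent offset changes the stride to 33, which is coprime with the bank count, so the 32-way conflict collapses to conflict-free access; the talk's scrambled addressing generalizes this idea to rate conversion access patterns.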

Level: Intermediate
Type: Talk
Tags: Developer - Performance Optimization; Medical Imaging; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210G
View Recording
View PDF

S5291 - Slicing the Workload: Multi-GPU Rendering Approaches

Ingo Esser DevTech ProViz, NVIDIA
Ingo Esser is a Senior DevTech Engineer in NVIDIA's Professional Solutions Group, where he helps ISVs improve their rendering algorithms. These ISVs mostly work in the automotive and oil & gas domains, where rendering complex surfaces or visualizing large datasets is an issue. He holds a Diploma in Computer Science from the Chair for Computer Graphics and Multimedia at RWTH Aachen, Germany.

Since modern workstation applications are becoming less CPU-bound thanks to more efficient rendering pipelines, the GPU can become the bottleneck in a system, and multi-GPU rendering becomes an option to further speed up the rendering process. The first part of this talk will cover the available tools for multi-GPU programming, including GPU-bound OpenGL contexts and functionality for synchronization and data transfer. The second part will dive into the details of designing a multi-threaded rendering pipeline that can split up and distribute rendering tasks across a set of GPUs. Several split approaches and their resulting scaling behavior will be presented and discussed.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Rendering & Ray Tracing; Media & Entertainment

Day: Wednesday, 03/18
Time: 10:00 - 10:50
Location: Room LL21B
View Recording

S5374 - Training and Support System in the Cloud for Search and Rescue Missions

Pawel Musialik Programmer and Young Researcher, Institute of Mathematical Machines
Pawel is a graduate of the Warsaw University of Technology and currently a Ph.D. candidate at the Military University of Technology in Warsaw. Since February 2012, Pawel has been a young scientist and programmer at the Institute of Mathematical Machines. His current research topics are semantic maps, 3D point cloud analysis, and quantitative and qualitative reasoning. Pawel possesses over 5 years of C++ experience, 2 years as a CUDA programmer, and 4 years of experience in academic lecturing.

This work concerns the development of a training and support system for SAR missions based on NVIDIA GRID technology. The architecture of the cloud system will be discussed. The system can be deployed in the disaster zone as a mobile data centre or in a typical data centre. We developed software tools for registering and gathering robotic data (3D point clouds) into a common coordinate system. The rendering of 3D data is accessible via the SaaS (Software as a Service) model. The software is dedicated to SAR teams working with modern UAVs (Unmanned Aerial Vehicles) and UGVs (Unmanned Ground Vehicles). GRID technology helps with the integration of many data sources and visualisation over Ethernet. The training system uses these 3D maps as reference training areas for rigid-body simulation of robots.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Signal & Audio Processing; Data Center, Cloud Computing & HPC; Defense

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room LL21F
View Recording
View PDF

S5379 - Porting CloverLeaf to CUDA Fortran

Greg Ruetsch Applied Engineer, NVIDIA
Greg Ruetsch is a Senior Applied Engineer at NVIDIA, where he works on CUDA Fortran and performance optimization of HPC codes using CUDA Fortran. His background is in mechanical engineering and applied mathematics, and he has held positions at Stanford University's Center for Turbulence Research and Sun Microsystems Laboratories.

This talk will discuss aspects of porting the CloverLeaf hydrodynamics code to CUDA Fortran. In particular, the use of unified or managed memory in CUDA Fortran is discussed in the context of the CloverLeaf code as well as in general code development. The use of the read-only data cache from CUDA Fortran is also discussed, as well as the use of new reduction intrinsics.

Level: Intermediate
Type: Talk
Tags: Supercomputing

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 212B
View Recording
View PDF

S5396 - Pimp My Ride: How to Mod Cars with Tegra

Dave Anderson Sr. Automotive Solutions Architect, NVIDIA
Dave Anderson is Senior Automotive Solutions Architect for NVIDIA's Automotive Business Unit, which provides the industry with powerful, yet efficient, processors and solutions for infotainment, digital instrument clusters and advanced driver assistance systems. Dave has more than 14 years of engineering experience in the technology industry. Prior to joining NVIDIA in March 2011, he served in several engineering and technical roles at Altera, Trilogy Marketing, Sirius Satellite Radio and Visteon. He also has U.S. Patents for a Flexible LED Backlighting Circuit and a Cross-Point Matrix for Infrared Touchscreen. He earned a BSEE in Electrical and Computer Engineering from Purdue University.

Tapping into in-vehicle architectures for infotainment and driver information applications is a huge challenge. We will examine several production cars as examples and provide insight into how NVIDIA automotive Tegra processors can be retrofitted into these cars as a proof-of-concept for next-generation digital clusters and infotainment systems.

Level: All
Type: Talk
Tags: Automotive; Video & Image Processing; Embedded Systems; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room LL20D
View Recording

S5541 - Revolutionize Your Modeling and Design Workflow with CATIA Live Rendering, Iray and NVIDIA VCA

Pierre Maheut Creative Designer and Product Expert, Dassault Systèmes
A graduate in Mechanical Engineering, Industrial Product Design and Innovation Management, Pierre has managed the user experience of CATIA Creative Design since 2008, connecting the dots between engineering and pure design thinking.
Xavier Melkonian Director CATiA Design / Shape Director, Dassault Systèmes
With over 20 years of experience in the design industry, including work with Autodesk and Alias, Xavier has led the "CATiA Design" product line strategy for five years, delivering best-in-class creative design, shape modeling and visualization solutions across all industries to boost product design innovation and gain the "competitive advantage by design".

Using a concrete example with an actual CAD model running in CATIA, this session shows how CATIA Live Rendering breaks down the barrier between industrial modeling and realistic rendering for design. Powered by Iray and coupled with the NVIDIA VCA, it delivers real-time photorealistic rendering and unprecedented batch-rendering speed for all of your marketing assets. Follow a live creation workflow from ideation to marketing assets using the 3DEXPERIENCE platform.

Level: All
Type: Talk
Tags: Manufacturing; Rendering & Ray Tracing; Product Design & Styling; Visualization - Large Scale & Multi-Display; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 10:00 - 10:50
Location: Room LL21A
View Recording
View PDF

S5553 - The Fabric Engine DFG: GPU Visual Programming for Visual Effects

Peter Zion Co-founder and Chief Architect, Fabric Software Inc.
Highly-Rated Speaker
Peter Zion is co-founder and chief architect at Fabric Engine Inc. Peter is the principal architect and implementor of the core of Fabric Engine, a platform for the development of high-end 3D production tools. In this role Peter has designed and developed the KL programming language, a high-level, single-source language that supports efficient parallel computation on both CPUs and GPUs.

In this session you will learn how the Fabric Engine Data-Flow Graph (DFG) provides an easy-to-use but powerful node-based visual programming interface for programming CUDA GPUs. Fabric Engine is a platform for the creation of effects, simulations and tools for the media and entertainment industry, where visual programming is a popular development paradigm. Fabric Engine can be used standalone as well as integrated into popular off-the-shelf applications such as Maya, 3D Studio Max, and Softimage from Autodesk and Nuke from The Foundry. It can also serve as a visual programming environment for general CPU and GPU programming.

Level: Beginner
Type: Talk
Tags: Media & Entertainment; Developer - Programming Languages; Real-Time Graphics

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room LL21D
View Recording

S5604 - Visualization Toolkit: Faster, Better, Open Scientific Rendering and Compute

Robert Maynard R&D Engineer, Kitware, Inc.
Robert Maynard joined Kitware in March 2010 as a Research and Development Engineer. He received his B.S. in Computer Science from Laurentian University in May 2007. After graduating, Robert spent three years at MIRARCO, where he was the primary programmer on ParaViewGeo, a fork of ParaView designed for the mining industry. He is an active contributor to the CMake, DAX, ParaView, and VTK projects and has contributed extensively to the build systems of multiple open source projects.
Marcus Hanwell Technical Leader, Kitware, Inc.
Marcus D. Hanwell is a Technical Leader in the Scientific Computing group at Kitware, Inc. He leads the Open Chemistry project, which focuses on developing open-source tools for chemistry, bioinformatics, and materials science research. He completed an experimental Ph.D. in Physics at the University of Sheffield, a Google Summer of Code developing Avogadro and Kalzium, and a postdoctoral fellowship combining experimental and computational chemistry at the University of Pittsburgh before moving to Kitware, Inc. in late 2009. He is a member of the Blue Obelisk, blogs, is @mhanwell on Twitter, and is active on Google+. He has also written several guest posts for opensource.com and the Kitware Source. He is passionate about open science, open source, and making sense of increasingly large scientific data to understand the world around us. Dr. Hanwell has played a key role in developing new development workflows as Kitware's open source projects moved to Git, works on Gerrit code review integration, runs CDash@Home cloud-based testing with Gerrit code review, and contributes to next-generation build systems in the VTK, ITK, and Titan projects. He has also been awarded and led Phase I and Phase II SBIR projects to further develop open-source tools for computational chemistry, and has taken part in international collaborations to establish open standards for data exchange in chemistry. Additionally, Dr. Hanwell has been an active member of several open-source communities. He is one of the core developers of Avogadro, an open-source, 3D, cross-platform molecular visualization and editing application/library. He has been an active member of the Gentoo and KDE communities, and is a member of the KDE e.V. His work on Avogadro was featured by Trolltech on their "Qt in Use" pages, and he was selected as a Qt Ambassador. He won a Blue Obelisk award for his work in open chemistry, and continues to develop and promote open approaches in chemistry and related fields.
Throughout his career he has also worked on combining experimental and computational approaches to validate theories and models, and continues to seek ways in which software tools can make comparison and validation simpler.

The Visualization Toolkit (VTK) is an open source scientific visualization framework. We will describe the new VTK rendering backend, which targets modern GPUs and takes advantage of the flexible programmable pipeline. This has resulted in significant improvements in rendering performance, with large geometries (20 million+ triangles) rendered over 100 times faster, without significant API changes and with near-identical rendering. This offers a drop-in replacement for existing applications and a turn-key open source visualization framework for new ones. VTK-m offers highly parallel and efficient algorithms for scientific data; its architecture, and how it will interact with VTK, will be discussed.

Level: Beginner
Type: Talk
Tags: Visualization - In-Situ & Scientific; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 10:00 - 10:50
Location: Room LL21C
View Recording
View PDF

S5718 - GPU Enhanced Molecular Dynamics of Lipid Membrane Systems

Russell Devane Senior Scientist, Procter & Gamble
Russell DeVane joined P&G in 2011 where he is currently a Senior Scientist in the Corporate Functions Modeling and Simulation group. Prior to joining P&G, he was an Assistant Research Professor in the Department of Chemistry at Temple University in Philadelphia, PA. He received a B.S. degree in chemistry from Florida Southern College and a Ph.D. in computational chemistry from the University of South Florida. After his graduate work he did postdoctoral work at the University of Pennsylvania as a National Science Foundation Bioinformatics Postdoctoral Fellow. Russell’s work focuses on the investigation of soft matter using molecular dynamics simulations and high performance computing.

Lipid membrane systems show up in a broad range of industrially relevant applications. From human skin to fabric enhancers, the complex behavior of lipid systems presents challenges to product designers. At Procter & Gamble we are using GPU-enhanced molecular dynamics to probe these complexities, not only to help interpret experimental measurements but also to drive future experiments and refine our mechanistic understanding of processes. Aided by high performance computing resources provided through the DOE INCITE program, the level of complexity that we can capture in a simulation has progressed significantly. ***This talk is part of the "Accelerating Industrial Competitiveness through Extreme-Scale Computing" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.***

Level: All
Type: Talk
Tags: Supercomputing; Life & Material Science; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210D
View Recording

S5783 - Thinking Outside the Cartridge: Modern Ideas Applied to Archaic Devices

Jake Taylor Developer, Fuse
Hailing from the wilderness of northern Idaho, Jake Taylor has always had a knack for dissecting things and finding creative solutions to problems. From a very early age he showed a deep interest in computers and electronics, and has been a dedicated software enthusiast and hacker for the better part of his young life. On the side, he also enjoys making electronic music and being a poster child for college dropouts everywhere. Jake is currently located in Oslo, Norway, where he works at Fuse building the next generation of app development tools. NVScene 2008 was his first demoscene event, and he's excited to return.

Ever wanted to make something that looks and sounds awesome with a gaming console you've played on your entire life? Ever wanted to build a proper, modern toolset to do so and experiment with functional programming at the same time? Then this talk is for you! We'll take a look at two recent Super Nintendo demos, and, in particular, the ideas and methodologies applied when making them. First, we'll go over some of the details of the SNES' quirky hardware and the usual methods of making it tick. Building from there, we'll look at how most of this can be reduced to "simple" data processing, and how modern development techniques can be applied to make this simpler and more interesting. All in all, this talk aims to show how applicable modern programming practices can be to unexpected problem domains, and how inspiring it can be to work with creative, out-of-the-box solutions. After all, a little ancient console dev never hurt anyone, right?

Level: All
Type: Talk
Tags: NVScene; Game Development

Day: Wednesday, 03/18
Time: 10:00 - 10:50
Location: Room LL20A
View Recording

ECS5001 - CEO Show & Tell: Paracosm

Amir Rubin CEO, Paracosm
Amir co-founded Paracosm in Jan 2013, and currently serves as CEO. Prior to Paracosm, Amir spent the past 10 years developing 3D-vision systems and 3D-simulation applications.

He co-founded his first company, Prioria Robotics, while he was still a computer-engineering student at the University of Florida. At Prioria he developed vision systems for small drones. Most recently, he was employee #1 at a successful Florida startup, Shadow Health, which develops interactive healthcare simulations. He also holds a patent for weighing cows based on 3D-imagery and photographs of them.

Paracosm enables robots and augmented reality applications to understand and interact with the world around them. Their core technology is a spatial intelligence platform that provides the tools to collaboratively capture interior spaces, generate 3D maps, and create immersive experiences. They are a team of engineers, artists, and dreamers based in Gainesville, FL and Cambridge, MA.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 10:15 - 10:30
Location: Room 220B

ECS5002 - CEO Show & Tell: Herta Security

Javier Rodríguez Saeta CEO, Herta Security
Javier Rodríguez Saeta received B.S., M.S. and Ph.D. degrees in Electrical (Telecommunication) Engineering from the Technical University of Catalonia (UPC), Barcelona, Spain, in 2000 and 2005, respectively. He also holds a B.A. in Business Administration from the Open University of Catalonia (UOC) and an MBA from ESADE Business School. In 2000 he worked for Robert Bosch GmbH in Hildesheim, Germany. In 2001 he joined Biometric Technologies, S.L., in Barcelona, Spain, where he was the R&D Manager. He founded Herta Security in 2009 and became the company's CEO.

Herta Security is a world leader in the development of cutting-edge facial recognition solutions. Based in Barcelona, Spain, with offices in Madrid and London, the company offers fast, accurate, robust, end-customer-oriented solutions for video surveillance, access control, and marketing requirements.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:30 - 10:45
Location: Room 220B

S5145 - Accelerating R Applications with CUDA

Patric Zhao Senior GPU Architect, NVIDIA
Patric is a Senior GPU Architect in the HPC field at NVIDIA. He has seven years of experience developing scientific and engineering applications and is experienced in parallelization, performance modeling and architecture-specific tuning. Patric is currently working on molecular dynamics and big data projects. Before joining NVIDIA, he worked on distributed processing and algorithm optimization for EDA software at Cadence.

R is a free software environment that provides a programming language and built-in libraries of mathematical operations for statistics, data analysis, machine learning and much more. In this talk, I will give an overview of applying GPUs in R, focusing on three topics. First, I will introduce accelerating R computations with CUDA libraries, including applying the drop-in library (nvBLAS) with zero coding effort, and a step-by-step guide to calling CUDA-accelerated libraries such as cuFFT. Second, I will show how to accelerate legacy code with directives (OpenACC) and how to write your own CUDA algorithms in R. Third, I will illustrate how to use CUDA tool-chain components such as nvprof, cuda-memcheck and cuda-gdb with R. Finally, I will present CUDA-accelerated results for several R benchmarks.

Level: Intermediate
Type: Talk
Tags: Developer - Programming Languages; Big Data Analytics; OpenACC

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210H
View Recording

S5156 - Coordinating More Than 3 Million CUDA Threads for Social Network Analysis

Adam McLaughlin Graduate Research Assistant, Georgia Institute of Technology
Adam McLaughlin is a fourth year Ph.D. student at Georgia Tech. He works in the High Performance Computing (HPC) lab as a graduate research assistant for Professor David Bader. His current research focuses on utilizing Graphics Processing Units (GPUs) for fast parallel execution of algorithms that traverse unstructured network data sets such as crawls of the internet or the social structure of Facebook. He has been a research intern for Los Alamos National Laboratory (LANL), Advanced Micro Devices (AMD), and NVIDIA. When he's not using grep he can likely be found listening to post-hardcore acts such as Dance Gavin Dance or Hail the Sun, reminiscing his days as a semi-professional poker player, or laughing at Arrested Development, The Whitest Kids U'Know, and stand up comedy.

Graphs that model social networks, numerical simulations, and the structure of the Internet are enormous and cannot be manually inspected. A popular metric used to analyze these networks is Betweenness Centrality (BC), which has applications in community detection, power grid contingency analysis, and the study of the human brain. However, these analyses come with a high computational cost. Here we present several hybrid GPU implementations, providing good performance on graphs of arbitrary structure rather than just scale-free graphs as was done previously. We achieve up to 13x speedup on high-diameter graphs and an average of 2.71x speedup overall over the best existing GPU algorithm. We observe near linear speedup and performance exceeding tens of GTEPS when running BC on 192 GPUs.
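As a point of reference for the quantity these GPU implementations accelerate, here is a minimal CPU-side sketch of Brandes' algorithm for betweenness centrality on unweighted graphs. This is an illustration, not the authors' code; the `adj` dict-of-neighbours graph representation is an assumption made for the example.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs.
    adj: dict mapping each node to an iterable of its neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        pred = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}; sigma[s] = 1
        dist = {v: -1 for v in adj}; dist[s] = 0
        q = deque([s])
        while q:
            v = q.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:          # first visit
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:  # shortest path via v
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Back-propagate dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += (sigma[v] / sigma[w]) * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On a path graph a-b-c this assigns the middle vertex a score of 2.0 (one contribution per endpoint source); analyses of undirected graphs conventionally halve the scores, and GPU implementations like those in this talk parallelize the per-source loop.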

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Supercomputing; Developer - Algorithms

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210C
View Recording
View PDF

S5221 - Tracking Objects Better, Faster, Longer

Alptekin Temizel Associate Professor, Middle East Technical University
Dr. Alptekin Temizel is an associate professor at the Informatics Institute, Middle East Technical University (METU). He received his B.Sc. in Electrical and Electronic Engineering from METU, Ankara, Turkey (1999) and his Ph.D. from the Centre for Vision, Speech and Signal Processing, University of Surrey, UK (2006). Between 1999 and 2001 he worked as a research assistant at the University of Hertfordshire, UK. He co-founded Visioprime Ltd., UK, a company developing intelligent video systems for security and surveillance applications, and worked there as a senior research engineer between 2001 and 2006. Since 2006 he has been a professor in the Graduate School of Informatics at METU and a consultant to several R&D companies. He is the principal investigator of the Virtual Reality and Computer Vision Research Group (VRCV), an NVIDIA CUDA Teaching Center and CUDA Research Center. His main research interests are image and video processing, video surveillance, computer vision, parallel programming and GPU programming.

In this talk, we demonstrate a real-time long-term tracker, Hybrid-TLD (H-TLD), based on the recently proposed Tracking-Learning-Detection (TLD) framework. TLD simultaneously tracks the object, learns its appearance, and detects when it re-appears. While it has shown promising results, its high computational cost prohibits running it at higher resolutions and frame rates. We present our analysis of the framework and our modifications to make it work effectively in a CPU-GPU hybrid setting, with high utilization of both processing units using OpenMP and CUDA. Our results show that a 10.25x speedup at 1920x1080 resolution can be obtained. The source code of the H-TLD library has been made publicly available.
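The core idea of such a hybrid setting, keeping the CPU and GPU busy at the same time on different stages of each frame, can be sketched as follows. This is a conceptual illustration only, not the H-TLD code: `track_on_cpu`, `detect_on_gpu`, and `fuse` are hypothetical stand-ins for the OpenMP tracker, the CUDA detector, and the result-fusion step.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: in the real system these would be an OpenMP
# tracking routine and a CUDA detection kernel launch.
def track_on_cpu(frame):
    return ('track', frame)

def detect_on_gpu(frame):
    return ('detect', frame)

def fuse(track_result, detect_result):
    # Combine the tracker and detector output for one frame.
    return (track_result, detect_result)

def process(frames):
    results = []
    with ThreadPoolExecutor(max_workers=2) as pool:
        for frame in frames:
            # Launch both stages concurrently so CPU and GPU work overlap.
            t = pool.submit(track_on_cpu, frame)
            d = pool.submit(detect_on_gpu, frame)
            results.append(fuse(t.result(), d.result()))
    return results
```

The point of the structure is that neither processing unit waits for the other within a frame; the frame time is bounded by the slower stage rather than the sum of both.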

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210B
View Recording
View PDF

S5297 - A Simulation of Global Atmosphere Model NICAM on TSUBAME2.5 Using OpenACC

Hisashi Yashiro Research Scientist, RIKEN Advanced Institute for Computational Science
Hisashi Yashiro is a research scientist at the RIKEN Advanced Institute for Computational Science, where he is a member of the Computational Climate Science Research Team. He is a core developer of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM), a Japanese next-generation climate model, and contributed to its performance optimization and to large-scale simulation on the K computer. His specialty is atmospheric chemistry. He is also a member of the Application Development Team of the Exa-scale Computing Project at RIKEN AICS.

OpenACC was applied to a global high-resolution atmosphere model, the Nonhydrostatic ICosahedral Atmospheric Model (NICAM). We succeeded in executing the dynamical core test without rewriting any kernel subroutines specifically for GPU execution; only 5% of the lines of source code were modified, demonstrating good portability. Performance and scalability were evaluated on the TSUBAME2.5 supercomputer. The results showed that the kernels generated by OpenACC achieved good performance, commensurate with the memory performance of the GPU, as well as good weak scalability. A large-scale simulation carried out using 2,560 GPUs achieved 60 TFLOPS.

Level: All
Type: Talk
Tags: Supercomputing; Computational Physics; OpenACC

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 212B
View Recording
View PDF

S5520 - Building High Performance Input-Adaptive GPU Applications with Nitro

Saurav Muralidharan Ph.D. Student, University of Utah
Saurav Muralidharan is a fifth year Ph.D. student in the CTOP research group at the University of Utah. His Ph.D. advisor is Prof. Mary Hall. He also collaborates with Michael Garland and Albert Sidelnik at NVIDIA Research, where he was an intern during the summers of 2013 and 2014. His research focuses on programming models and autotuning systems for parallel architectures, especially GPUs. The goal of his research is to help programmers build performance portable software with minimal effort.

Many irregular parallel computations, such as sparse matrix-vector multiply (SpMV) and sorting, have multiple implementations (a.k.a. variants), each suited to a different class of inputs. In this talk, attendees will learn about the Nitro automatic performance tuning framework and how it can be used to build high-performance, input-adaptive GPU applications that automatically use the optimal variant for a given input data set. This is accomplished using a machine-learning-based model that maps properties of the input data set to variants. We will present the Nitro C++ library and Python-based tuning interface and demonstrate their use in tuning five high-performance CUDA benchmarks. Finally, we will show how heuristics in Nitro reduce training time and other overheads.
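The variant-selection idea can be illustrated with a toy dispatcher. This is a hypothetical sketch, not Nitro's actual C++/Python API: it extracts one feature of the input, here matrix density, and maps it to a variant with a hand-set threshold, where Nitro instead consults a trained machine-learning model.

```python
def density(matrix):
    """Fraction of non-zero entries; one simple input feature."""
    rows, cols = len(matrix), len(matrix[0])
    nnz = sum(1 for row in matrix for x in row if x != 0)
    return nnz / (rows * cols)

def spmv_dense(matrix, x):
    # Variant 1: plain dense row-times-vector products.
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def spmv_sparse(matrix, x):
    # Variant 2: touch only the non-zero entries (CSR-style traversal).
    out = []
    for row in matrix:
        s = 0.0
        for j, a in enumerate(row):
            if a != 0:
                s += a * x[j]
        out.append(s)
    return out

def spmv_adaptive(matrix, x, threshold=0.25):
    """Pick a variant from an input feature; an autotuner like Nitro
    would replace this fixed threshold with a learned classifier."""
    variant = spmv_dense if density(matrix) >= threshold else spmv_sparse
    return variant(matrix, x)
```

For example, `spmv_adaptive([[1, 0], [0, 2]], [3, 4])` routes to the dense variant (density 0.5) and returns `[3, 8]`; a nearly empty matrix would route to the sparse variant with the same result.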

Level: Advanced
Type: Talk
Tags: Developer - Performance Optimization; Developer - Tools & Libraries

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210G
View Recording
View PDF

S5593 - Flexible, Scalable and Secure Deployment of Engineering Workstations with vGPU

V Balaji CIO, TATA Technologies Limited
Mr. Balaji brings more than 18 years of IT and process re-engineering and transformation experience to his role as Tata Technologies Chief Information Officer. As CIO, Mr. Balaji is responsible for the flow of information, coordination, assessment, and synchronization of all organization policies and standardization requirements for the information enterprise of Tata Technologies. Prior to his 2007 appointment as CIO, Mr. Balaji led the INCAT Enterprise Solutions Group. His distinguished career includes leadership roles at Harman/Becker Automotive Systems, Inc.; Anchor Glass Container Corporation; and the Ladish Co., Inc. Mr. Balaji possesses broad IT knowledge from his long history in the aerospace, automotive, and consumer goods sectors. Mr. Balaji earned a bachelor’s degree in Mechanical Engineering from the Indian Institute of Technology, Madras, as well as a Master of Science Degree in Manufacturing and Systems Engineering from Ohio State University in the U.S.

In this session, learn how Tata Technologies tested VDI to address security and network-latency-related performance challenges at its ODC in Pune. You will also hear how the IT team evaluated vGPU technology to add flexibility and capacity scalability for the PLM lab and the training and learning environment, creating a private cloud that supports about eight engineering users per server node and delivers high-end, graphics-intensive work to thin clients. The private cloud is planned to serve vGPU capabilities in phases, and it is estimated that the move to engineering VDI will optimize costs by 30 percent. The private cloud has already begun to drive delivery transformation and is slated to span further business cases in India, the UK and the US.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Automotive; Data Center, Cloud Computing & HPC; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room LL21F

S5619 - BLINK: A GPU-Enabled Image Processing Framework

Mark Davey HPC Lead Engineer, The Foundry
Mark Davey joined The Foundry in 2011, heading up the HPC group to bring device agnostic image processing frameworks to a number of key products and plug-ins. Before his role at The Foundry, Mark was employed as Technology Manager at Grandeye, a leading manufacturer of security solutions, where he helped create innovative 360 degree cameras complete with sophisticated video analytics. Other roles have seen Mark working in fields as diverse as Augmented Reality Surgery, 3D Foetal Ultrasound, and Document Analysis and Classification. Mark obtained his Physics degree from University College London.

We present BLINK, a language and framework for developing image processing algorithms across a range of computation devices. BLINK-based algorithms are automatically translated to optimised code for both GPUs and CPUs. This "write-once" approach enables us to target both existing and new GPU hardware with minimal extra effort. Many algorithms produce visibly different results if mathematical operations are allowed to differ across platforms. Therefore BLINK has been designed to ensure numerically identical results between NVIDIA GPUs and CPUs. BLINK is at the heart of a number of key Foundry plug-ins and applications. An overview of this work and performance profiles will be presented, highlighting the speed gains achieved by using NVIDIA GPUs.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room LL21D
View Recording
View PDF

S5633 - Robust Speech Recognition for Cars

Ian Lane Assistant Research Professor, Carnegie Mellon University
Ian Lane is an Assistant Professor at Carnegie Mellon University. He leads the speech and language-processing group at CMU Silicon Valley and performs research in the areas of speech recognition, spoken language understanding and speech interaction. Ian and his group are developing methods to accelerate speech and language technologies using GPUs.

One aspect of speech recognition work at Carnegie Mellon University is specifically focused on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.

Level: All
Type: Talk
Tags: Automotive; Signal & Audio Processing; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room LL20D
View Recording

S5719 - Hybrid Simulations Using CPU-GPU Paradigm for Reacting Flows

Jeremiah Lee Staff Scientist, United Technologies Research Center
Dr. Jeremiah Lee has 20 years of experience in the general area of applied mathematics and computational reactive flow. His experience spans applied mathematics, high performance computing, large-scale code development, algorithm development, data reduction, DNS of fundamental flame structures, turbulent flame modeling, acoustics and characteristics in CFD, dynamic chemical kinetics reduction, transition flows, spray dynamics, non-equilibrium multiphase flows (e.g. superheated fluids), and GPU computing and its application to large-scale CFD. His career in applied mathematics started with an investigation of the aerodynamics of baseballs at the Cooper Union. He developed a spectral-element-based DNS code for reactive flow as a graduate student at Princeton. In 1996, he joined the Swiss Federal Institute of Technology (ETH-Z) in Zurich, Switzerland, where he worked for six years as a research staff member. In 2002, he returned to the USA and worked at the Combustion Research Facilities at Sandia National Laboratories. Two years later, he joined the combustion group at UTRC. He has 18 publications in refereed journals and has been an invited speaker at Princeton University, the University of Southern California, the University of Connecticut, and ICDERS.

GPU technology is attractive for computation-intensive simulations such as Computational Fluid Dynamics (CFD) of reacting flows. A hybrid CPU-GPU paradigm was benchmarked by simulating a canonical CFD problem: a complex turbulent reactive flow including detailed chemistry that is typically burdensome for CPU-based calculations. We achieved a 2-5x overall speed-up with CPU-GPU simulations compared to CPU-only simulations. Details of the CFD problem, the hybrid methodology, performance-metric definitions and benchmarking results will be presented. This promising technology, if exploited properly, could quickly enable accurate predictions of finite-rate chemistry effects, such as pollutant emissions from combustors. ***This talk is part of the "Accelerating Industrial Competitiveness through Extreme-Scale Computing" series chaired by Jack Wells, Director of Science, National Center for Computational Sciences, Oak Ridge National Laboratory.***

Level: Intermediate
Type: Talk
Tags: Supercomputing; Computational Fluid Dynamics; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210D
View Recording
View PDF

ECS5003 - CEO Show & Tell: Clarifai

Matthew Zeiler Founder & CEO, Clarifai
Matthew received a Ph.D. in machine learning and image recognition, working with the pioneers of deep learning and convolutional neural networks. His research produced the world's best image recognition system in 2013. Matt's goal is to solve everyday problems with high-tech solutions.

Clarifai was founded in 2013 by Matthew Zeiler to bring the world's best image recognition technology to market. Its first image recognition systems held the top 5 spots for classifying objects in images in the ImageNet 2013 competition. Since then, Clarifai's deep learning systems have improved by orders of magnitude in speed, vocabulary size and memory footprint, and have expanded beyond images to extract knowledge from all forms of data.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:45 - 11:00
Location: Room 220B

S5679 - How Ubuntu is Enabling OpenPOWER and Innovation

Randall Ross Ubuntu Community Manager for OpenPOWER & POWER8, Canonical
Randall Ross is Canonical's Ubuntu Community Manager for OpenPOWER & POWER8.

Learn how Canonical's Ubuntu is enabling OpenPOWER solutions and cloud-computing velocity. Ubuntu powers the majority of cloud deployments. Offerings such as Ubuntu Server, Metal-as-a-Service (MAAS) hardware provisioning, orchestration (Juju, Charms, and Charm Bundles), workload provisioning, and OpenStack installation technologies simplify managing and deploying OpenPOWER-based solutions in OpenStack, public, private, and hybrid clouds. OpenPOWER-based systems are designed for scale-out and scale-up cloud and analytics workloads and are poised to become the go-to solution for the world's (and your business's) toughest problems.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 10:45 - 11:00
Location: Room 220C

ECS5004 - CEO Show & Tell: Jibo

Cynthia Breazeal Chief Scientist, Jibo
Dr. Cynthia Breazeal is Chief Scientist of Jibo, Inc. She is also an Associate Professor at the MIT Media Lab, where she directs the Personal Robots Group. Breazeal is recognized as a prominent innovator at the intersection of technology and society and as the pioneer of social robotics. Her research spans both the creation of intelligent and socially responsive robots and the study of their impact on people's quality of life across education and learning, creativity, health, telecommunications, aging, entertainment, play, and more. Jibo, Inc. brings the technologies, design insights, and user experience of social robots to the home as the world's first family robot, helping busy families manage, care for, coordinate, and connect with loved ones with greater ease, engagement, efficacy, and fun. As an open platform, Jibo enables third-party developers to bring the engagement and emotional lift of social robots to their apps and services.

Described by the company as the "world's first family robot," Jibo looks straight out of Pixar, but the plans that founder and Chief Scientist Breazeal has for the in-home social robot are very real. Jibo first appeared on the scene last summer as an Indiegogo crowdfunding campaign, bringing in the tidy sum of $2.3 million, and just recently announced it has raised $25.3 million in Series A funding. It has also taken almost 5,000 pre-orders to date, which are expected to start shipping at the end of this year. What will Jibo do? When fully realized, Jibo will engage as a helpful part of the family: a companion who knows the other members, provides them with personalized messages and reminders, serves as the family photographer, tells stories to the kids, and more. As an open third-party developer platform, Jibo's skill set will continue to expand, eventually providing services like ordering pizza and much more. The company's goal is for Jibo to "help families manage, care for, coordinate, and connect with greater ease, engagement, efficiency, and fun."

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 11:00 - 11:15
Location: Room 220B

S5483 - GPU Power through JavaScript for Anyone with Universe 2.0 SDK

Sean Safreed Co-founder, Red Giant
Highly-Rated Speaker
Sean Safreed is co-founder of Red Giant Software and a 16-year veteran of the computer graphics industry. The company started with two products and has since grown to offer more than 50 products with a team that spans the United States and Canada. Before founding Red Giant in 2002, he worked on Apple's QuickTime team. At Silicon Graphics, he led efforts to add innovative video features to the company's hardware systems. At Puffin Designs, he worked as a product manager on Commotion, a ground-breaking video paint application that originated at Industrial Light and Magic.

Red Giant Universe is a set of tools for creating visual effects across a wide range of popular DCC apps, now accessible to artists with basic JavaScript programming skills. The system enables users to create in minutes or hours what used to take days or weeks to write in a mainstream programming language. This session follows on from the 2014 introductory session, with expanded coverage of the SDK, JavaScript examples, and new additions to the system for real-time vector rendering and photo-based rendering, all running in real time on the GPU.

Level: All
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Real-Time Graphics

Day: Wednesday, 03/18
Time: 11:00 - 11:25
Location: Room LL21D
View Recording

S5644 - Flexible Cluster Rendering with NVIDIA VCA

Phillip Miller Director of NVIDIA Advanced Rendering Products, NVIDIA
Highly-Rated Speaker
Mr. Miller directs NVIDIA's Advanced Rendering offerings, ranging from the Iray and mental ray renderers shipping within leading products in design and entertainment to the OptiX ray tracing framework used extensively within private and commercial applications. He has led major software products for 20 years, including the 3D animation efforts at Autodesk and the web design products at Adobe. He holds a Master of Architecture from the University of Illinois and is a registered architect.
Ankit Patel Senior Product Manager, VCA, NVIDIA
As a Senior Product Manager, Ankit Patel works to deliver a product that allows everyone to harness the power of GPUs to help them realize their dreams, whether through creative storytelling or building amazing machines.

Learn how NVIDIA Visual Computing Appliances (VCAs) are enabling a wide variety of rendering solutions to scale across hundreds of GPUs and stream their results back for interactive sessions of unprecedented performance. Commercial solutions employing Iray, V-Ray RT, and OptiX will all be shown working with a remote cluster of VCAs. The mechanics of supporting the VCA from applications, managing clusters, and possibilities for streaming will also be explored.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Product Design & Styling; Visualization - Large Scale & Multi-Display

Day: Wednesday, 03/18
Time: 11:00 - 11:50
Location: Room LL21E
View Recording

ECS5005 - CEO Show & Tell: MirriAd

Mark Popkiewicz CEO, MirriAd
Mark Popkiewicz is CEO of Mirriad, a video technology company built with circa $30m of investment capital. A regular speaker at major industry events as well as on TV and radio, Mark has led several technology businesses to global market leadership. He has a technology and commercial background with a truly global outlook. To date, Mark has spent more than half his career working with US-based businesses and has set up 30 operations around the world, including in the BRIC and MINT countries.
He was previously a director of companies including BBC (commercial) Ventures, Mobile Media, Lucent, and Eicon. Mark has grown and improved the performance of both small and large businesses, with various trade exits and IPOs.

Launched in 2008 with a mission to revolutionize advertising for the Skip Generation, Mirriad's patented computer vision technology creates a new standard in advertising where a brand integration is an affordable, scalable ad unit running in multiple pieces of content. The resulting ads are seamless, authentic, and work across TV, tablet and mobile screens, building brand awareness and brand sentiment without interrupting the viewing experience. In 2013, an important aspect of Mirriad's imaging technology won an Academy Award.

Level: All
Type: Talk
Tags: Video & Image Processing; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 11:15 - 11:30
Location: Room 220B

S5347 - Shadertoy: From the Web to Virtual Reality

Pol Jeremias Cofounder, Beautypi
Highly-Rated Speaker
Pol Jeremias is passionate about technology and art, and explores their intersection through engineering and an imaginative mindset. He holds a Master's in CS from the University of Southern California. He worked on Star Wars 1313 at LucasArts, and today he is part of SoMa Play Inc, a AAA game studio.
Inigo Quilez Cofounder, Beautypi
Highly-Rated Speaker
Inigo Quilez is fascinated by the potential of using code and math to build visual beauty. After many years in the demoscene, Inigo worked in virtual reality and real-time rendering. Today he is at Oculus, inventing techniques, drawing, and creating procedural imagery.

What lies beyond the web? In this talk, the Shadertoy.com creators will cover how Shadertoy has evolved over the years. The website started as a small community for creating and sharing procedural shaders, grew to host more than 4,000 creations, evolved over time into a playground for generating sound on the GPU, and has finally transitioned to virtual reality. Join the Shadertoy team for 25 minutes of live-coding, GPU-generated music, procedural visuals, virtual reality, and Shadertoy, lots of Shadertoy.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics; Augmented Reality & Virtual Reality

Day: Wednesday, 03/18
Time: 11:30 - 11:55
Location: Room LL21D
View Recording

S5699 - On Chip Controller (OCC)

Todd Rosedahl Chief Power/Thermal/Energy Management Engineer on POWER, IBM
Todd Rosedahl is IBM's Chief Power/Thermal/Energy Management Engineer on POWER. Todd has worked on power, thermal, and energy management for his entire 22-year career at IBM and has over 20 related patents. He led the firmware effort for the On Chip Controller (OCC), which was recently released as open source.

The On Chip Controller (OCC) is a co-processor that is embedded directly on the main processor die. The OCC can be used to control the processor frequency, power consumption, and temperature in order to maximize performance and minimize energy usage. This presentation will include an overview of the power, thermal, and performance data that the OCC can access as well as the various control knobs, including adjusting the processor frequency and memory bandwidth. Details about the OCC processor, firmware structure, loop timings, off-load engines, and bus accesses will be given along with descriptions of example algorithms, system settings, and potential future enhancements.
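To illustrate the flavor of such a control loop, here is a toy sketch only: the frequency steps, power cap, sensor readings, and function names below are invented for illustration and are not the OCC's actual firmware interfaces.

```python
# Toy closed-loop power-cap controller in the spirit of the OCC:
# read a power sensor, compare against a cap, and step the CPU
# frequency up or down. All values here are hypothetical.

FREQ_STEPS_MHZ = [2061, 2561, 3061, 3525]  # allowed frequency levels

def next_frequency(power_w, power_cap_w, freq_idx):
    """Return the new frequency index given the current power draw."""
    if power_w > power_cap_w and freq_idx > 0:
        return freq_idx - 1          # over cap: throttle down one step
    if power_w < 0.9 * power_cap_w and freq_idx < len(FREQ_STEPS_MHZ) - 1:
        return freq_idx + 1          # comfortable headroom: speed up
    return freq_idx                  # within band: hold steady

# Simulate a few control iterations with made-up sensor readings.
readings_w = [180, 210, 250, 240, 190, 170]
idx = 3  # start at the maximum frequency
for power in readings_w:
    idx = next_frequency(power, power_cap_w=225, freq_idx=idx)
print(FREQ_STEPS_MHZ[idx])
```

The real OCC runs its firmware at fixed loop timings against hardware sensors and controls memory bandwidth as well; this sketch only shows the decide-and-step structure of such a loop.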

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 12:00 - 12:25
Location: OpenPOWER Booth
View Recording
View PDF

S5700 - HPC Solution Stack on OpenPOWER

Jing Li Software Development Manager, IBM, STG China
Jing Li is a Software Development Manager at IBM STG China. He leads a team responsible for developing an HPC cluster management product and HPC Cloud-as-a-Service offerings.

This demo will show how IBM OpenPOWER can serve as the foundation of a complete high-performance computing solution. From HPC cluster deployment, job scheduling, system management, and application management to the scientific computing workloads that run on top of them, all of these components can be built on the IBM OpenPOWER platform with good usability and performance. The demo also shows how simply a complete x86-based HPC stack can be migrated to the OpenPOWER platform.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 12:30 - 12:55
Location: OpenPOWER Booth
View Recording
View PDF

S5701 - POWER8 Microprocessor

Satish Kumar Sadasivam Senior Performance Engineer and Master Inventor, IBM STG
Satish Kumar Sadasivam is a Senior Performance Engineer and a Master Inventor at IBM STG, responsible for compiler and hardware performance analysis and optimization of IBM Power processors and compilers. He has 9+ years of experience in computer architecture, covering a wide range of domains including performance analysis, compiler optimization, HPC, competitive analysis, and processor validation. Currently he is responsible for delivering performance leadership for the POWER8 processor on emerging workloads. He also evaluates competitor (Intel) microarchitecture designs in detail and provides feedback to the POWER9 hardware design to address next-generation computing needs. He has filed more than 15 patents, achieved his 5th Invention Plateau, and has several publications to his credit.

The primary objective of this presentation is to provide the OpenPOWER user community with a performance evaluation methodology based on the advanced instrumentation capabilities available in the POWER8 microprocessor, and to present a case study on how the CPI stack cycle-accounting model can be used effectively to evaluate the performance of SPEC CPU2006 workloads in various SMT modes.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 13:00 - 13:25
Location: OpenPOWER Booth
View Recording

S5729 - Creating Interactive Visuals for Large Audiences

Joel Pryde Technical Lead, Stimulant
After getting his start in Carnegie Mellon University's Entertainment Technology Center, Joel pursued a career in console game development and UI design. Working at studios such as Activision, Microsoft, and Gas Powered Games, Joel developed games for Vista, next-gen consoles, and some of the first Microsoft Surface tech demos. Joel is an experienced 3D graphics and physics developer and currently works at Stimulant building large-scale interactive experiences.

This session will cover what Joel has learned from building interactive visuals for festivals, conferences, and public places, and some of the challenges of working in these venues. This spans both his professional work at Stimulant (http://stimulant.com) and his side projects, in which he has created a number of very large-scale interactive pieces for festivals such as Decibel, conferences like CES, and various other public venues. The talk includes a quick high-level overview of his work and some of the tools and practices that have served him in building these creations and allowing them to react to music, the audience, and changes in the environment.

Level: All
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Wednesday, 03/18
Time: 13:00 - 13:50
Location: Room LL20A
View Recording

S5702 - SuperVessel: OpenPOWER R&D Cloud with Operation and Practice Experience Sharing

Yong Hua Lin Senior Technical Staff Member and Senior Manager of Cloud Infrastructure Group, IBM Research
Yonghua Lin is a Senior Technical Staff Member and Senior Manager of the Cloud Infrastructure group at IBM Research, where she has worked on system architecture research for 12 years. Over the past 10 years her work has covered many of IBM's multicore processors, including the IBM network processor, the IBM Cell processor, PRISM, IBM POWER6, and POWER7. In 2007 she initiated work on mobile infrastructure on the cloud, which has since become today's Network Function Virtualization. She led the IBM team that built the first optimized cloud for 4G mobile infrastructure, successfully demonstrated at ITU, Mobile World Congress, and elsewhere. She is the founder of the SuperVessel cloud supporting OpenPOWER research and development in industry. She has more than 40 patents granted worldwide and publications in top conferences and journals.

The SuperVessel cloud (www.ptopenlab.com) is a cloud platform built on top of POWER/OpenPOWER architecture technologies. It aims to provide open remote access for all ecosystem developers and university students. We (IBM Research China, the IBM System Technology Lab in China, and partners) built and launched this cloud more than three months ago, and it has rapidly attracted public users from more than 30 universities, including those from GCG and the United States.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 13:30 - 13:55
Location: OpenPOWER Booth
View Recording

S5712 - The Next Peak in HPC

Jack Wells Director of Science, Oak Ridge National Laboratory (ORNL)
Jack Wells is the Director of Science for the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). He is responsible for devising the strategy to ensure cost-effective, state-of-the-art scientific computing at the NCCS, which hosts the Department of Energy's Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science national user facility, and Titan, currently the fastest supercomputer in the United States. In ORNL's Computing and Computational Sciences Directorate, Wells has previously led both the Computational Materials Sciences group in the Computer Science and Mathematics Division and the Nanomaterials Theory Institute in the Center for Nanophase Materials Sciences. During an off-site assignment from 2006 to 2008, he served as a legislative fellow for U.S. Senator Lamar Alexander of Tennessee, providing information about high-performance computing, energy technology, and science, technology, engineering, and mathematics education policy issues. Wells began his ORNL career in 1990, conducting resident research for his Ph.D. in physics from Vanderbilt University. Following a three-year postdoctoral fellowship at the Harvard-Smithsonian Center for Astrophysics, he returned to ORNL in 1997 as a staff scientist and Wigner Fellow. Jack is an accomplished practitioner of computational physics and has been sponsored in his research by the Department of Energy's Office of Basic Energy Sciences. Jack has authored or co-authored over 70 scientific papers and edited one book, spanning nanoscience, materials science and engineering, nuclear and atomic physics, computational science, and applied mathematics.
Tjerk Straatsma Manager, Scientific Computing Group, National Center for Computational Sciences (NCCS)
Dr. Tjerk P. Straatsma is an internationally recognized scientist with more than 30 years of experience in the development, efficient implementation and application of advanced modeling and simulation methods as key scientific tools in the study of chemical and biomolecular systems, complementing analytical theories and experimental studies. His research focuses on the development of computational techniques that provide unique and detailed atomic level information that is difficult or impossible to obtain by other methods, and that contributes to the understanding of the properties and function of these systems. Dr. Straatsma joined Oak Ridge National Laboratory in 2013, where he manages the Scientific Computing group in the National Center for Computational Sciences. This center is the site for the Oak Ridge Leadership Computing Facility and houses the largest supercomputer for open science in the United States. Prior to joining ORNL, Dr. Straatsma was a Laboratory Fellow at Pacific Northwest National Laboratory, and Director for the eXtreme Scale Computing Initiative to build the capabilities needed to enable scientific advancements and breakthroughs in selected domain sciences through computational modeling and simulation on next-generation, extreme-scale computers. He established the Computational Biology and Bioinformatics research group and was a member of the development team for the NWChem massively parallel computational chemistry software suite. Dr. Straatsma earned his Ph.D. in Mathematics and Natural Sciences from the University of Groningen.

Hybrid CPU+GPU architectures are a response to the power limitations imposed by the end of processor clock-rate scaling in the last decade. This limitation continues to drive supercomputer architecture designs toward massively parallel, hierarchical, and/or hybrid systems, and we expect that, for the foreseeable future, large leadership computing systems will continue on this trajectory in order to address science and engineering challenges for government, academia, and industry. Consistent with this trend, the U.S. Department of Energy's (DOE) Oak Ridge Leadership Computing Facility (OLCF) has signed a contract with IBM to bring a next-generation supercomputer to Oak Ridge National Laboratory (ORNL) in 2017.

Level: All
Type: Talk
Tags: OpenPOWER

Day: Wednesday, 03/18
Time: 13:30 - 13:55
Location: Room 220C

ECS5019 - Early Stage Challenge Panel & Contestants Introduction

Come hear Jeff Herbst (NVIDIA) and Scott Budman (NBC) introduce the panel and contestants for the Early Stage Challenge.

Level: All
Type: Talk
Tags: Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room 220B

S5133 - From Biological Cells to Populations of Individuals: Complex Systems Simulations with CUDA

Paul Richmond Research Fellow, University of Sheffield
Dr. Paul Richmond is a Vice-Chancellor's Research Fellow in the Department of Computer Science at the University of Sheffield. He is currently working on accelerating complex systems simulations using accelerator architectures such as GPUs. His research interests concern the software engineering challenges of describing complex systems with high-level or domain-specific tools, and of automatically mapping such descriptions to parallel and distributed hardware architectures. Dr. Richmond is particularly interested in applying agent-based techniques to cellular biology, computational neuroscience, and pedestrian and transport systems, as well as in working closely with industry through the University of Sheffield's Advanced Digital Research Centre and the Transport Innovation Centre.

Complex systems are prevalent throughout various levels of biology, from the molecular and cellular scales through to populations of interacting individuals. This talk discusses how a formal state-based representation of agents within a complex system can be simulated and visualized at large scale using the open-source FLAME GPU framework. Methods for code generation from XML documents and the use of CUDA streams for heterogeneous state execution are presented. Examples include cellular tissue modelling and large-scale crowd dynamics.
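The state-based idea can be sketched in a few lines of plain Python (illustrative only, not the FLAME GPU API; all names are ours): agents are partitioned by state each step, and one update function per state is applied to its whole partition, mirroring how per-state kernels are launched on the GPU.

```python
# Minimal state-based agent simulation: agents are grouped by state,
# and one update function per state is applied to the whole group,
# analogous to FLAME GPU's per-state kernels. Names are illustrative.

from collections import defaultdict

def move(agent):
    agent["x"] += agent["vx"]
    if agent["x"] >= 10:
        agent["state"] = "resting"   # state transition

def rest(agent):
    agent["energy"] += 1
    if agent["energy"] >= 3:
        agent["state"] = "moving"

UPDATES = {"moving": move, "resting": rest}

agents = [{"state": "moving", "x": float(i), "vx": 1.0, "energy": 0}
          for i in range(5)]

for step in range(8):
    # Partition agents by state, then run each state's update as a batch.
    by_state = defaultdict(list)
    for a in agents:
        by_state[a["state"]].append(a)
    for state, group in by_state.items():
        for a in group:
            UPDATES[state](a)

print(sorted(a["state"] for a in agents))
```

Because the partition is rebuilt at the start of each step, a mid-step state change only takes effect on the next step, which is what makes batching all agents in one state safe.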

Level: All
Type: Talk
Tags: Big Data Analytics; Developer - Tools & Libraries; Life & Material Science

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room 210D
View Recording
View PDF

S5147 - Faster Convolutional Neural Networks by Separable Filters

Che-Rung Lee Professor, National Tsing Hua University
Che-Rung Lee received his B.S. and M.S. degrees in Computer Science from National Tsing Hua University Taiwan in 1996 and 2000 respectively, and the Ph.D. degree in Computer Science from University of Maryland, College Park in 2007. He joined the Department of Computer Science at National Tsing Hua University as an assistant professor in 2008 and became an associate professor in 2013. His research interests include numerical algorithms, scientific computing, high-performance computation, and cloud computing. He is the chair of CCOE (CUDA Center Of Excellence) in NTHU (National Tsing Hua University). He is a member of IEEE and SIAM.

Learn how to accelerate the training of convolutional neural networks (CNNs) for image recognition on the GPU with separable filters. The approach uses Singular Value Decomposition (SVD) to approximate each 2D filter by the product of two 1D filters, and then performs two 1D convolutions consecutively. The GPU implementation consists of two kernels. The first is a batched SVD routine that can decompose multiple small matrices on the GPU simultaneously. The second computes the convolutions, combining three methods that use different memory spaces depending on the filter size. Experimental results show that the implementation achieves 1.35x~2.66x speedup in the forward and backward passes compared to state-of-the-art GPU implementations of CNNs.
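The core idea can be sketched in NumPy (an illustrative sketch, not the authors' GPU implementation; the function names are ours): SVD factors a 2D filter into a 1D column filter and a 1D row filter, and two consecutive 1D convolutions then reproduce the 2D convolution, exactly when the filter is rank-1 and approximately otherwise.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (correlation) for reference."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def separate_filter(kernel):
    """Approximate a 2D filter as the outer product of two 1D filters via SVD."""
    U, s, Vt = np.linalg.svd(kernel)
    col = U[:, 0] * np.sqrt(s[0])   # vertical 1D filter
    row = Vt[0, :] * np.sqrt(s[0])  # horizontal 1D filter
    return col, row

# A rank-1 (exactly separable) Gaussian-like kernel for demonstration.
g = np.array([1.0, 2.0, 1.0])
kernel = np.outer(g, g) / 16.0

col, row = separate_filter(kernel)
image = np.random.rand(16, 16)

full = conv2d(image, kernel)
# Two consecutive 1D passes: columns first, then rows.
vert = conv2d(image, col[:, None])
sep = conv2d(vert, row[None, :])

print(np.allclose(full, sep))  # rank-1 kernel: the results match
```

For a k-by-k filter this replaces k*k multiply-adds per output pixel with 2k, which is the source of the speedup the talk reports.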

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room 210A
View Recording

S5213 - Effective Planning for Density and Performance in a Virtual Desktop Deployment with NVIDIA GRID™

Jason Southern Senior Solution Architect, NVIDIA
Jason has been in desktop and application virtualization since before it had a name. After deploying one of the earliest large-scale implementations of Citrix WinFrame in Europe during the late 90s, Jason joined Citrix in 2002, where he stayed for almost a decade in a variety of roles. Jason has also spent time at ecosystem vendors AppDNA (which led to his returning to Citrix during the acquisition) and AppSense, where he ran the EMEA Strategic Alliances organization. Since 2013, Jason has been a Senior Solutions Architect for NVIDIA focused on GRID technologies, working on a broad variety of projects and developing several best practices for optimising remote graphics in conjunction with the virtualization vendors.

Adding GPUs to a virtual desktop deployment is only the beginning: the deployment must then be optimised to make the best use of the GPUs and deliver the best experience to the end user. In this session we will discuss: choosing between pass-through, vGPU, and API-intercept methods; selecting the right vGPU profile and card; benchmarking and the effects of virtualization; optimizing the virtual infrastructure; and fine-tuning remote graphics protocols. The session will include real-world examples and demonstrations of the impact minor changes have on performance.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL21F
View Recording
View PDF

S5295 - Next Generation Surround-View for Cars

Miguel Sainz Principal Engineer, Computer Vision, NVIDIA
Miguel Sainz is a Principal Engineer in Computer Vision at NVIDIA. His research interests and focus include real-time 3D graphics, image-based modelling and rendering, camera calibration, 3D model reconstruction from images, tracking, and image processing. Prior to joining NVIDIA, Miguel received a degree in Electrical Engineering from the Polytechnic University of Catalonia (UPC), Spain, and a Ph.D. in Electrical and Computer Engineering from the University of California, Irvine.
Timo Stich Sr. Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Timo Stich is a Senior Developer Technology Engineer for NVIDIA Corporation. His focus is on image processing applications of Graphics Processors. Prior to joining NVIDIA he was research staff in Computer Graphics and Image Processing at the Max-Planck-Institute for Computer Science and the Computer Graphics Lab of Brunswick University, Germany. He received a diploma degree in Computer Science from Mannheim University, Germany and a Ph.D. degree from the Brunswick University, Germany.

A robust proof-of-concept Surround-Vision and Top-View system for cars uses four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU work and rendering of the final views. Topics covered will include the placement and calibration of the cameras, color correction, and data preprocessing. A technical deep dive will highlight common visual artifacts in Top-View visualizations and present the algorithmic building blocks to correct those errors.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room LL20D
View Recording
View PDF

S5345 - VMware Horizon 6 View with NVIDIA GRID: A Practical Discussion of a Real-World Deployment

Jeff Weiss NVIDIA GRID SA Manager, NVIDIA
Jeff Weiss is the GRID SA Manager for North America, working with the Solution Architecture & Engineering team at NVIDIA. Prior to joining NVIDIA, Jeff's pedigree included a seven-year stint at VMware as an EUC Staff Engineer, as well as time at Symantec and Sun Microsystems. Along with his current focus on vGPU-enabled end-user computing, his experience includes datacenter business continuity/disaster recovery solutions, software infrastructure identity management, and email security/archiving tools. During his tenure, he has architected, sold, and deployed complex solutions for a wide array of both public- and private-sector accounts, from commercial to government, healthcare to education. Prior to working in sales, he spent 14 years as a networking and datacenter manager at Hughes Aircraft. Jeff is based in Los Angeles, CA.
Randall Siggers Solutions Architect, Textron Inc.
Randall is a Solutions Architect for Textron with 20 years of experience in IT. He specializes in researching, analyzing, and implementing emerging technology. His current projects involve SCCM, cloud technology, imaging systems, and VDI. He enjoys traveling and speaking at various tech conferences, including VMworld 2014 and the AIA CIO LFRT. In his spare time he enjoys building custom gaming rigs, vintage BMX, and import tuning.
Ali Rizvi PLM Support Analyst, Bell Helicopter
Ali Rizvi is the PLM Support Analyst at Bell Helicopter. Ali joined Bell in 2014 and during his tenure has worked on VDI and several other projects. Ali has a passion for exploring VDI functionality to resolve user issues and enhance user productivity and experience. His current responsibilities involve collaborating with the Engineering Operations team to develop CATIA methodology configurations and define VDI customizations. He is also working through complex Enovia-CATIA interoperability problems with software vendor support.

Jacobs Engineering has validated and expanded the capabilities of the Horizon 6 View architecture paired with NVIDIA GRID cards for use in its design departments and remote offices worldwide. This architecture can now provide a feature-rich 3D design environment to local and remote users without sacrificing security, reliability, or performance. Jacobs now has the ability to take existing and emerging platforms and combine them in a way that offers a new set of expanded capabilities to its users and customers. They have found success where many others have failed. Come hear firsthand how Jacobs Engineering accomplished this feat directly from the project's lead architect, Randall Siggers, and VMware EUC Staff Engineer, Jeff Weiss.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL20C

S5387 - Performance Gains Achieved Through Modern OpenGL in the Siemens' DirectModel Rendering Engine

Jeremy Bennett Senior Software Engineer, Siemens PLM Software
Jeremy Bennett is a Senior Software Engineer at Siemens PLM Software with over 10 years of industry experience. He is one of the principal developers of the DirectModel large-model visualization toolkit and holds several DirectModel and computer-graphics-related patents. He jointly received the Siemens PLM 2012 Invention of the Year award. His research interests include high-performance computer graphics and visualization, computer architecture, and large-scale geometric simplification methods. Jeremy holds a B.S. in Computer Engineering and an M.S. in Human Computer Interaction, and is pursuing a Ph.D. in Human Computer Interaction and Computer Engineering at Iowa State University.
Michael Carter Senior Key Expert, Siemens PLM Software
Michael B. Carter has been employed by Siemens PLM Software and its antecedent Engineering Animation, Inc. since 1993. He is one of the principal architects of the DirectModel large-model visualization toolkit. He jointly received the EDS Patent of the Year Award in 2004 for Clustered Backface Culling and the Siemens PLM 2012 Invention of the Year award. He holds patents for DirectModel and computer-graphics related technologies. His research interests include high-performance computer graphics and visualization, computer architecture, geometric compression techniques, and large-scale geometric simplification methods. Mike holds a Ph.D. degree in Computer Engineering from the Iowa State University.

Advances in GPU technology have opened the door to significant performance gains for applications willing to use the modern OpenGL APIs. This talk will detail how the DirectModel scene graph and rendering engine has adapted its rendering architecture to handle not only today's advances but tomorrow's, and how the use of these technologies has significantly increased rendering performance.

Level: Intermediate
Type: Talk
Tags: Manufacturing; Real-Time Graphics; Developer - Performance Optimization; Rendering & Ray Tracing

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL21A
View Recording
View PDF

S5388 - OpenACC for Fortran Programmers

Michael Wolfe Compiler Engineer, NVIDIA
Highly-Rated Speaker
Michael Wolfe
Michael Wolfe works on the Portland Group compilers at NVIDIA. He has over 35 years of experience developing languages and compilers for high performance and parallel computers in industry and academia. He has published one textbook, "High Performance Compilers for Parallel Computing," and a number of technical papers.

Learn how to program NVIDIA GPUs using Fortran with OpenACC directives. The first half of this presentation will introduce OpenACC to new GPU and OpenACC programmers, providing the basic material necessary to start successfully using GPUs for your Fortran programs. The second half will be intermediate material, with more advanced hints and tips for Fortran programmers with larger applications that they want to accelerate with a GPU. Among the topics to be covered will be dynamic device data lifetimes, global data, procedure calls, derived type support, and much more.

Level: All
Type: Talk
Tags: OpenACC; Developer - Programming Languages; Supercomputing

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room 210H
View Recording
View PDF

S5409 - Custom Iray Applications and MDL for Consistent Visual Appearance Throughout Your Pipeline.

Dave Hutchinson Chief Operating Officer, Lightworks
Dave Hutchinson
Mr. Hutchinson is Chief Operating Officer at Lightworks, where he leads sales, product development, engineering, and client support. He has a deep understanding of a range of visualization technologies, matched with a clear view of what is required to successfully deliver commercial visualization solutions. He is currently focused on bringing Iray technology to both developers and manufacturers who require commercial-grade, physically based rendering within their applications and workflows.
Dave Coldron Product Director, Lightworks
Dave Coldron
Mr. Coldron leads Product Management across all Lightwork Design solutions, with key responsibility for the Lightworks Iray+ product and Iray+ for 3DSMax. He has over 20 years' experience in the computer graphics industry, specifically in providing integrated solutions for the mechanical CAD, lighting simulation, architectural, and interior design markets. He has an art and design background combined with wide-ranging knowledge of 3D graphics technology, interaction design, and usability. Using these skills, he drives products forward with compelling digital content and a design process aimed at supporting end users' workflows.

Take a tour through the possibilities that Iray physically based visualization and GPU scaling can unlock for your interactive photoreal applications and workflows. We demonstrate how Iray features and technology can be integrated and exposed within your existing digital tools, like the new breakthrough Iray+ for 3DSMax plug-in. Iray can enable your entire workflow, from design and validation through marketing and consumer experiences, with the same consistent, photorealistic, MDL-powered visualization. Whether you want to build custom standalone applications and integrations, or use remote visualization to enable mobile, collaborative, or cloud workflows, you will leave this presentation with a very clear view of the next steps needed to achieve your Iray goals.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Manufacturing; Developer - Tools & Libraries

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL21E
View Recording

S5439 - Animating Singing Volcanoes: Real-Time Shading in Presto for Pixar's Short Film "Lava"

Dirk Van Gelder Engineering Lead, Pixar Animation Studios
Dirk Van Gelder
Dirk Van Gelder joined Pixar Animation Studios in 1997 as a software engineer for the Academy Award®-nominated film A Bug's Life and the Academy Award®-winning short film Geri's Game, working on animation software and the studio's first use of subdivision surfaces. Dirk has worked on software for every Pixar movie since, including the ground-up rewrite of the studio's proprietary animation system, Presto. Currently, Dirk leads the Character team in the Pixar Studio Tools Department.
Byron Bashforth Technical Director, Pixar Animation Studios
Byron Bashforth has been at Pixar for 15 years and has shaded characters for "Monsters, Inc", "Finding Nemo", "The Incredibles", "Ratatouille", "Up", and "Brave". Most recently, he was the Character Shading Lead for "Monsters University" and the Shading Lead for "Lava".

This session will show how GPUs are used in Pixar's Presto animation system to give animators the feedback they need to make volcanoes sing. Most of the talk will consist of live demos of Presto with the characters from Pixar's latest short film, "Lava". A challenge on that film was showing rock surfaces sliding against each other, and using transparent cutout effects to achieve a particular look around the mouth. We will show how multipass OpenGL drawing techniques and custom GLSL shaders allow animators to see these effects, with lighting and shadows, in real time while they work.

Level: All
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room LL21D

S5478 - I Can't Believe It's Not Just Molecular Dynamics (It's Machine Learning Too).

Scott LeGrand Principal Engineer, Amazon
Highly-Rated Speaker
Scott Le Grand is currently a principal engineer at Amazon, working on the personalization team. He developed the first molecular modeling system for home computers, Genesis, in 1987; Folderol, the distributed computing project targeting the protein folding problem, in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar, the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts; it currently accounts for ~2.6 petaFLOPS of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is optimizing the performance of deep neural networks by day and continuing to work on AMBER by night.

There is a surprising algorithmic overlap between deep neural networks (DNNs) and molecular dynamics. This talk will describe the bidirectional technology transfers between these two seemingly disparate fields that have resulted from applying the wisdom gained porting the AMBER molecular dynamics package across 4 generations of NVIDIA GPUs over the past 6 years to the development of a deep neural network system. Finally, I will present record-breaking AMBER performance numbers for Maxwell GPUs and GPU clusters.

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room 212A
View Recording
View PDF

S5504 - Tackling Performance Bottlenecks in the Diversifying CUDA HPC Ecosystem: A Molecular Dynamics Perspective

Szilárd Páll PhD student, KTH Royal Institute of Technology
Szilárd is a Ph.D. student in theoretical and computational biophysics at the KTH Royal Institute of Technology. With a background in CS and engineering, his main focus has been parallel algorithms and methods for molecular dynamics; he is also a core developer of the GROMACS open-source molecular dynamics package.

The rapid evolution of CUDA GPU architecture and the new heterogeneous platforms that break the hegemony of x86 offer opportunities for performance optimization, but also pose challenges for the scalable heterogeneous parallelization of the GROMACS molecular simulation package. This session will present our latest efforts to harness recent CUDA architectures to improve the algorithmic efficiency and performance of our molecular dynamics kernels. We will also discuss load-balancing and latency-hiding challenges emphasized by the expansion of GPU-accelerated platforms with CPUs ranging from power-optimized ARM architectures to extreme-performance, highly multi-threaded POWER and Xeon CPUs. Come learn about our experiences in developing portable, heterogeneous, high-performance code!

Level: Advanced
Type: Talk
Tags: Supercomputing; Life & Material Science; Computational Physics

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room 212B
View Recording
View PDF

S5578 - Massively-Parallel Vector Graphics

Diego Nehab Professor, IMPA
Diego Nehab
Diego Nehab received a Ph.D. in Computer Science from Princeton University in 2007. He then joined Microsoft Research in Redmond as a post-doctoral researcher. In 2010, he moved back to Brazil and joined IMPA as a professor, where he now works in topics related to parallelism, computer graphics, and image processing.

In this talk, we will describe the first massively parallel vector graphics rendering pipeline. Traditional rendering methods draw shapes one after the other into an output image, or use sequential algorithms to build acceleration data structures before rendering all pixels in parallel. We present an acceleration data structure that can be built efficiently and in parallel for all input segments. We also show how to share samples between pixels in parallel to enable production-quality antialiasing filters and a large number of samples per pixel. The pipeline is feature-rich, and renders complex vector graphics with state-of-the-art quality and performance. The talk will be particularly interesting to researchers and developers who deal with rendering of complex 2D content.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL21B
View Recording
View PDF

S5660 - Scientific Visualization on GPU Clusters

Peter Messmer Senior DevTech Engineer, NVIDIA
Highly-Rated Speaker
Peter Messmer
Peter Messmer is a senior software engineer in NVIDIA's Developer Technology organization, working with clients to accelerate their scientific discovery process with GPUs. One area of his current research is to investigate how to utilize the GPUs in high-performance computing systems for data analysis and visualization. Prior to joining NVIDIA, Peter spent more than 15 years developing HPC- and GPU-accelerated applications for industry and government clients, ranging from simulating next-generation particle accelerators or electromagnetic problems to modeling the behavior of dust on the surface of the Moon. Peter holds an MSc and PhD in physics from ETH Zurich, Switzerland, with specialization in kinetic plasma physics and nonlinear optics.

Learn how to visualize your data on GPU-accelerated supercomputers. In this presentation, we will give an overview of data analysis and visualization on GPU-accelerated supercomputers and clusters. In the first part, we will describe the steps necessary to use the GPUs in a remote supercomputer for visualization. We will then provide a brief overview of ParaView, one of the most widely used visualization applications, touching on topics like parallel compositing and in-situ visualization of GPU-resident data.

Level: Beginner
Type: Talk
Tags: Visualization - In-Situ & Scientific; Graphics Virtualization; Supercomputing; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL21C
View Recording
View PDF

S5683 - Porting Scientific Applications to OpenPOWER

Dirk Pleiter Professor and research group leader, Jülich Supercomputing Centre
Prof. Dr. Dirk Pleiter is a research group leader at the Jülich Supercomputing Centre (JSC) and professor of theoretical physics at the University of Regensburg. At JSC he leads the work on application-oriented technology development. He is currently principal investigator of the Exascale Innovation Center, the NVIDIA Application Lab at Jülich, as well as the newly established POWER Acceleration and Design Center. He has played a leading role in several projects for developing massively parallel special-purpose computers, including QPACE.

While significant experience using GPUs with processors based on the x86 ISA has accumulated over the past years, GPU-accelerated systems with POWER processors have become available only very recently. In this talk we report on early experiences porting selected scientific applications to GPU-accelerated POWER8 systems. We will explore basic performance features through micro-benchmarks, but our main focus will be on results for full applications or mini-applications. These have been selected so that hardware characteristics can be explored for applications with significantly different performance signatures. The application domains range from physics to the life sciences, and all are in need of supercomputing resources.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 14:00 - 14:15
Location: Room 220C
View Recording

S5784 - GPU Unchained

Timothy Lottes GPU Developer
Timothy Lottes
Timothy Lottes is a GPU focused developer with professional history at Epic, NVIDIA, Industrial Light & Magic, and a side track as a professional fine art landscape photographer.

A live voyage through a collection of low-level and advanced GPU programming topics with a focus on unconventional thinking. We start with an interactive look at driving the CPU from the GPU: a GL-based pipeline where shaders write to a command buffer that the CPU executes, enabling GPU-driven reconfiguration of resources and the rendering pipeline. We then explore methods to use this kind of rapid development tool for manual, run-time, profile-guided optimization, and continue with a visual exploration of advanced filtering techniques for real-time ray-based rendering, including methods that enable 1080p ray-marching at 120 Hz and beyond.

Level: Beginner
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room LL20A
View Recording

S5825 - HP/NVIDIA Solutions for HPC Compute and Visualization Performance (Presented by HP)

Ed Turkel Group Manager, HPC Business Development, HP Servers, HP
Ed manages the worldwide product marketing team for the High Performance Computing (HPC) business at Hewlett-Packard. The HPC business delivers integrated solutions for HPC with maximum performance and efficiency, enabling innovative research, engineering, and analytics. Ed's team is responsible for developing HP's solutions and go-to-market strategy for HPC, working closely with HP's customers to develop the solutions that enable them to best achieve their business and research outcomes. Ed has almost 35 years of experience in HPC, including 30 years with HP, in various technical, marketing, and business roles.

High Performance Computing is characterized by user demand for increasing levels of computational performance, combined with exploding volumes of data, to accomplish their science, engineering, or analytics workloads. Demands for performance growth are becoming increasingly limited by the power, space and cost of deployment of new systems, while exploding data volumes challenge traditional client/server computing models. For years, HP has partnered with NVIDIA to develop HPC solutions that are purpose-built for compute and visualization performance and scalability, while delivering innovative energy and space efficiency, with a focus on customer ROI. This session will showcase HP and NVIDIA's latest technologies and solutions in use today by leaders in the HPC community, plus trends for the future.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room 210C
View Recording
View PDF

S5690 - Life at the Intersection: OpenPOWER, Open Compute, and the Future of Cloud Software & Infrastructure

Aaron Sullivan Senior Director and Distinguished Engineer, Rackspace
Aaron Sullivan is a Senior Director and Distinguished Engineer at Rackspace, focused on infrastructure strategy. Aaron joined Rackspace's Product Development organization in late 2008 in an engineering role focused on servers, storage, and operating systems. He moved to Rackspace's Supply Chain/Business Operations organization in 2010, focused mostly on next-generation storage and datacenters. He became a Principal Engineer in 2011 and a Director in 2012, supporting a variety of initiatives, including the development and launch of Rackspace's first Open Compute platforms. He became a Senior Director and Distinguished Engineer in 2014. These days, he spends most of his time working on next-generation server technology, designing infrastructure for Rackspace's Product and Practice Areas, and supporting the growth and capabilities of Rackspace's Global Infrastructure Engineering team. He also frequently represents Rackspace as a public speaker, writer, and commentator. He has been involved with the Open Compute Project (OCP) since its start at Rackspace, and became formally involved in late 2012. He is Rackspace's lead for OCP initiatives and platform designs. Aaron is serving his second term as an OCP Incubation Committee member and sponsors the Certification & Interoperability (C&I) project workgroup. He supported the C&I workgroup as they built and submitted their first test specifications. He has also spent time working with the OCP Foundation on licensing and other strategic initiatives.

Open hardware has the potential to disrupt the datacenter and the world of software development in very positive ways. OpenPOWER takes that potential a few steps further, both in the core system and with technologies like CAPI. These innovations raise the possibility of performance and efficiency improvements of a magnitude not seen for a long time. This talk will explore the past experience and current impressions of someone who has worked at the intersection of OpenStack and Open Compute for a few years. It will cover his experience working with teams building and integrating hardware and software for large-scale as-a-Service deployments of OpenStack Nova and Ironic on Open Compute hardware.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 14:15 - 14:30
Location: Room 220C
View Recording

ECS5006 - Early Stage Challenge: GeekSys

Luiz Vitor Martinez Cardoso CEO, GeekSys
Luiz Vitor Martinez Cardoso
Luiz is a 26-year-old engineer and entrepreneur who was named one of the most innovative professionals in both communication and marketing in South America. Luiz earned his dual engineering degree in computer and electronics engineering from a top Brazilian school and has prior experience in academia and in small, mid-size, and multinational companies, including GE. A self-learner, Luiz has a very special way of seeing the world, being able to anticipate trends, architect technologies from scratch, and deliver them to market. Today Luiz is dedicating all his efforts to making GeekSys the leader in the Store Performance Management (SPM) field.

GeekSys is the most innovative and awarded brick-and-mortar analytics start-up in Brazil today. GeekSys was born from the curiosity of two engineering classmates who, back in 2010, were trying to develop a marketing tool to understand how customers behave in front of a shop window. After two years of intense R&D, GeekSys released its first product into the market. From 2012 to 2014, GeekSys released three more products and received 7 awards for technology, innovation, and business models. GeekSys was the first company in the world able to read and quantify purchase intention inside physical stores and translate it into a more natural language for retailers. GeekSys was also the creator of the Store Performance Management (SPM) concept, comparable to CRM and ERP. Today GeekSys has big customers in Brazil and is integrating all its previous complex technologies into a single, uniform platform, as never seen before. GeekSys is working hard to lead the retail analytics market, and its key advantage is the ability to use technology as a path to value.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:30 - 14:38
Location: Room 220B

S5255 - Power Efficient Visual Computing on Mobile Platforms

Brant ZHAO GPU Architect, NVIDIA
Brant is a GPU architect at NVIDIA Shanghai, focusing on GPU computing analysis and architecture investigation. His work targets performance- and power-optimized implementations of computing applications on the current generation of GPUs, as well as architecture improvements for the next generation of GPUs to help current applications achieve better GPU utilization and power efficiency.

Tegra K1 brings a desktop-class GPU to the mobile world, making it possible for mobile platforms to succeed at increasingly complex visual computing tasks. With more powerful future Tegra-family chips, many more compute applications are expected in the mobile world. Besides performance tuning, it is also critical to make these applications power-efficient, as they run on mobile devices with a limited power budget. In this work, we will present a methodology for power analysis and optimization of mobile computing workloads. Three case studies will illustrate the three items of the methodology: (1) analyze the whole pipeline at the system level; (2) use the energy-efficient features of the target platform; (3) reduce the total instruction count to save energy.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210B
View Recording
View PDF

S5425 - GPU-Accelerated Network Centrality

Erik Saule Assistant Professor, University of North Carolina at Charlotte, Department of Computer Science
Erik Saule has been an Assistant Professor in the Computer Science department of UNC Charlotte since August 2013. He received his License and Maîtrise in Computer Science in 2003 from the University of Versailles, France, and his Master's and Ph.D. in Computer Science in 2005 and 2008, respectively, from the Grenoble Institute of Technology, France. Dr. Saule was a post-doctoral researcher in the Department of Biomedical Informatics at The Ohio State University from 2009 to 2013. His research interests revolve around the efficient use of modern computing platforms (multi-core CPU, many-core CPU, GPU, clusters, accelerated clusters, SSD) for compute-intensive or data-intensive applications. In recent years, he has studied in particular the use of accelerators for sparse computations.

This session is about how to efficiently compute shortest-path-based network centrality metrics on the GPU. Performing shortest-path computation on a GPU is an expensive operation because of the many idempotent operations (computation and memory accesses) that must be performed to ensure the computation is correct. We will show how to interleave shortest-path-based computations in the context of network centrality metrics to reduce the number of memory accesses and to maximize their coalescing. We will also see how the representation of the network in memory is key to balancing thread divergence and the number of atomic operations.

Level: Advanced
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210D
View Recording
View PDF

S5532 - Safe and Seamless Integration of Tegra into the In-Vehicle Network

Stefaan Sonck Thiebaut General Manager, OpenSynergy
Stefaan  Sonck Thiebaut
Stefaan Sonck Thiebaut, PhD, is the general manager of OpenSynergy, where he is responsible for overall product development and the technical direction of the company. One of the co-founders of OpenSynergy, he received his doctorate from Stanford University in the USA and has over 20 years of experience in software development.

Virtualization is playing an increasingly important role in the development of in-vehicle systems. Users of the NVIDIA Vibrante SDK/PDK can use OpenSynergy's integrated automotive solution to realize CAN communication and AUTOSAR compliance within the timing and safety constraints required by the automotive industry. In addition, learn how the solution allows controlled communication between virtualized operating systems and the vehicle networks while maintaining the isolation between both.

Level: Intermediate
Type: Talk
Tags: Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room LL20D
View Recording
View PDF

S5629 - Reconstruction Networks for Efficient Face Detection and Landmark Localization

Bo Yu Visiting Researcher, Carnegie Mellon University
TBD
Ian Lane Assistant Research Professor, Carnegie Mellon University
Ian Lane
Ian Lane is an Assistant Professor at Carnegie Mellon University. He leads the speech and language processing group at CMU Silicon Valley and performs research in the areas of speech recognition, spoken language understanding, and speech interaction. Ian and his group are developing methods to accelerate speech and language technologies using general-purpose graphics processing units (GPUs). His group has already obtained a 1000x speedup for signal processing tasks, a 100x speedup for Viterbi training, and over a 20x speedup for complex tasks such as graph search. These new technologies have enabled the group to explore novel interaction paradigms for human-machine interaction.

In this talk we introduce Reconstruction Networks, a novel neural-network structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an image without explicitly performing image segmentation or generating keypoint descriptors. We show that Reconstruction Networks can learn the structure of faces and facial landmarks automatically, even under varied poses and illumination conditions, and surpass state-of-the-art performance for face detection and facial landmark localization while requiring only a fraction of the computational cost.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210A
View Recording

S5682 - Introducing the Little-Endian OpenPOWER Software Development Environment and Its Application Programming Interfaces

Michael Gschwind STSM & Manager, System Architecture, IBM Systems & Technology Group
Michael Gschwind, PhD is an STSM & Manager for System Architecture in the IBM Systems & Technology Group. Michael is also a Fellow at IEEE and Member of the IBM Academy of Technology - IBM Master Inventor.

Over the past three decades, the Power Architecture has been an important asset in IBM's systems strategy. During that time, Power-based systems have powered desktops, technical workstations, embedded devices, game consoles, supercomputers, and commercial UNIX servers.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 14:30 - 14:45
Location: Room 220C
View Recording

S5733 - Augmented Reality with Google's Project Tango and NVIDIA Technology

Wil Braithwaite Senior Applied Engineer for VFX, NVIDIA
Wil Braithwaite
Wil Braithwaite has worked for 15 years in visual effects at studios in London and Los Angeles, in positions ranging from research, technical direction, and compositing to CG supervision and MOCAP supervision. He pioneered the use of graphics hardware in the VFX workflow, which led to his role at NVIDIA as a Senior Applied Engineer, where he specializes in consulting, training, and assisting development for VFX studio projects utilizing NVIDIA technologies.

This talk presents a system for the visualization of professional graphics, such as ray tracing, on a low-latency device, such as a head-mounted display or tablet. I will describe the issues encountered, and the algorithms used. The example I will demonstrate showcases the NVIDIA® VCA cluster for cloud-based rendering, NVENC for low-latency video encoding, and Google's Project Tango with the Tegra K1 processor for pose tracking and video decoding. The demo system presented can also serve graphics to multiple low-latency devices, such as a Virtual Reality HMD, at a rate much faster than the graphics are rendered.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Real-Time Graphics; Media & Entertainment

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room LL21D
View Recording

ECS5007 - Early Stage Challenge: FluiDyna GmbH

Thomas Indinger CEO, FluiDyna GmbH
Thomas Indinger
Dr. Thomas Indinger graduated in 1994 from the Universität der Bundeswehr München in Mechanical Engineering and in 2000 from the Technische Universität Dresden in Applied Mechanics. In 2005 he received his doctoral degree from the Technische Universität München in Fluid Mechanics. In 2006 Dr. Indinger founded FluiDyna. Also since 2006 he has been Head of Automotive Aerodynamics at the Institute of Aerodynamics and Fluid Mechanics of the Technische Universität München, where he completed his Habilitation in 2013 and was appointed Associate Professor (Senior Lecturer) for Fluid Mechanics.

FluiDyna GmbH provides a wide range of research and development services. Composed of the terms "Fluid" and "Dynamics", the name already indicates its know-how and ultimate competitive edge. Its core expertise lies particularly in the development and application of numerical methods for flow simulation and thermodynamics. FluiDyna GmbH also belongs among the experts in the demanding field of GPU-based HPC for fluid-mechanical problems. FluiDyna assists both customers who require advice on how to configure a GPU-based supercomputing system and companies and institutions that need individually programmed, customized software for CFD, modeling, and development purposes. Founded in 2006, the company already has excellent references and partnerships, including NVIDIA Corporation, Realtime Technology AG, Silicon Graphics International Corp., and TEK Microsystems Inc. FluiDyna's solutions have been commissioned by research institutes and public-sector clients as well as by industry, particularly manufacturers and suppliers of passenger and commercial vehicles, civil and military aircraft, as well as the pharmaceutical industry and its suppliers.

Level: All
Type: Talk
Tags: Computational Physics; Supercomputing; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:40 - 14:48
Location: Room 220B

S5687 - NVIDIA Tesla Accelerated Computing Platform for IBM Power

John Ashley Senior IBM Software Developer Relations Manager, NVIDIA
John's focus is on enabling IBM Enterprise Software to take advantage of GPU acceleration. His special interest areas include data science, visualization, computational finance, and radio astronomy.

Learn how applications can be accelerated on IBM POWER8 systems with the NVIDIA® Tesla® Accelerated Computing Platform, the leading platform for accelerating big data analytics and scientific computing. The platform combines the world's fastest GPU accelerators, the widely used CUDA® parallel computing model, NVLink, a high-speed GPU interconnect for supercomputers, and a comprehensive ecosystem of software developers, software vendors, and datacenter system OEMs to accelerate discovery and insight.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 14:45 - 15:00
Location: Room 220C
View Recording

ECS5008 - Early Stage Challenge: Ersatz labs

Dave Sullivan CEO, Ersatz Labs
David Sullivan, Ersatz Labs CEO, has been programming since age 12, is a passionate company starter, and has been working on deep learning for 4 years. He was previously the founder of BlackCloud BSG. He has a bachelor's degree from UC San Diego.

Ersatz is a GPU-backed platform for building and deploying machine learning algorithms, specifically the most recent "deep learning" algorithms: brain-like computational systems such as convolutional and recursive neural nets. It's currently hard for vertical domain experts to become experts in data science and vice versa, and there isn't any easy plug-and-play software. There are many different network architectures and models, the most recent deep learning techniques need GPUs for a 40x speed-up, mathematics is hard for programmers, and programming is hard for mathematicians. Ersatz addresses this with a drag-and-drop UI that requires no programming. The benefits are: 1. We provide the GPU hardware, in the cloud or as an appliance, so the customer doesn't have to configure it. 2. We provide an easy-to-use graphical user interface that wraps the latest curated machine learning techniques, so there is no install and the library integration is all done for the user. 3. Bayesian optimization chooses the parameters for the user, or power users can choose them themselves. 4. The model is provided to the user, either as parameters or via an API.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Big Data Analytics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:50 - 14:58
Location: Room 220B

ECS5009 - Early Stage Challenge: Replica Labs

Jack Morrison CTO, Replica Labs
Jack got started with programming during his undergrad at Bowdoin College, where he got hooked on robotics and computer vision as a member of the Northern Bites RoboCup team. Since then, he's spent his hours working on optical navigation for UAVs and researching distributed SLAM with cellphones. He resides in Boulder, Colorado with his fiancée and their pets.

Replica Labs is a computer vision company focused on dense reconstruction from video feeds alone. Using the highly parallelizable power of NVIDIA's CUDA core technology, we are able to translate single-lens video feeds, available from any smartphone, into dense and highly accurate 3D reconstructions. Replica Labs' current focus is to use this core technology to disrupt the way consumers take measurements of objects in their everyday lives. With our robust software solution, we are able to reconstruct objects in a consumer's home with sub-millimeter accuracy! Replica's first product, Rendor, will empower billions of phones to become 3D scanners, transforming the landscape of 3D rendering and the reach such information will have in e-commerce. Developed by a small team of computer vision scientists and engineers, Rendor is currently in an open beta phase of testing.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 15:00 - 15:08
Location: Room 220B

S5139 - Showing the Missing Middle: Enabling OpenACC Performance Analysis

Guido Juckeland Lead Hardware Accelerator Group, Technische Universität Dresden - ZIH
Guido received his Ph.D. for his work on performance analysis for hardware accelerators. He coordinates the work of the CCoE at Technische Universität Dresden and also represents TU Dresden at the SPEC High Performance Group and OpenACC committee.

Learn how OpenACC runtimes now also expose performance-related information and how it can be used to show where your OpenACC applications are wasting clock cycles. The talk will show how profilers can connect to OpenACC applications to record how much time is spent in OpenACC regions and what device activity they generate. See how this can be turned into a natural timeline-based visualization that shows in great detail what an OpenACC application is doing at any point in time.
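The flow sketched in this abstract, callbacks firing on region enter/exit and feeding a timeline, can be illustrated with a small sketch. This is not the OpenACC profiling interface itself; it is a hypothetical Python analogue (`RegionProfiler` and the region names are invented for illustration):

```python
import time
from collections import defaultdict

class RegionProfiler:
    """Toy analogue of a profiling-tool consumer: callbacks fire when a
    region is entered/exited and feed a timeline of (region, start, end)."""
    def __init__(self):
        self.timeline = []   # completed (region, start, end) records
        self._open = {}      # regions currently being timed

    def enter(self, region):
        self._open[region] = time.perf_counter()

    def exit(self, region):
        start = self._open.pop(region)
        self.timeline.append((region, start, time.perf_counter()))

    def totals(self):
        # Aggregate the timeline into time-per-region, as a profiler
        # would before rendering a timeline view.
        acc = defaultdict(float)
        for region, start, end in self.timeline:
            acc[region] += end - start
        return dict(acc)

prof = RegionProfiler()
prof.enter("compute_region"); sum(x * x for x in range(100000)); prof.exit("compute_region")
prof.enter("data_region"); list(range(1000)); prof.exit("data_region")
totals = prof.totals()
```

A real tool would receive these events from the OpenACC runtime rather than from explicit calls, and would also attach the device activity each region triggers.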

Level: All
Type: Talk
Tags: OpenACC; Developer - Tools & Libraries; Developer - Performance Optimization; Supercomputing

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210H
View Recording
View PDF

S5159 - Tuning the Performance of Convolutional Neural Network for Image Classification on GPU

Joseph Wang Principal Specialist, Alibaba
Joseph is in charge of the domain-specific computing team at Alibaba Cloud Computing, a subsidiary of Alibaba Group. He focuses on performance optimization of computationally intensive domain problems and algorithms targeting specific hardware architectures. Before joining Alibaba, Joseph worked as a development manager at Oracle's Asia development center for 5 years. Before that, he gained HPC experience working at AMD's multi-processor resource integration center.

Convolutional neural networks (CNNs) have achieved an impressive suite of results on image classification. Industry adoption, for instance by Alibaba, also indicates bright prospects. In this talk we will present several methods to optimize and accelerate GPU implementations of convolutional neural networks. An optimized implementation is given as an example; it has a smaller memory footprint and performs 1.4 to 3 times faster than Caffe.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210A
View Recording
View PDF

S5204 - GPU-Based Scene Generation for Flight Simulation

Tim Woodard Chief Technology Officer, Diamond Visionics
Highly-Rated Speaker
Mr. Tim Woodard is the Chief Technology Officer at Diamond Visionics, with over 18 years of experience specializing in the design and development of software architectures for real-time PC-based image generation using Agile development processes, advanced C++, and modern OpenGL techniques. Mr. Woodard has received patents for the real-time simulator database generation technology which forms the basis of Diamond Visionics' GenesisRTX worldwide database generation system. GenesisRTX provides high-fidelity generation, visualization, and manipulation of visual databases at run-time directly from source data on low-cost PC-based platforms, eliminating the need for traditional labor-intensive off-line database production processes. Tim has published and presented at GTC, I/ITSEC, IMAGE, ASQ, and ITEC conferences.

Flight simulation is incredibly demanding from a graphics perspective. Both fidelity and performance are of utmost importance. By leveraging modern GPU capabilities, it is now possible to greatly increase both performance and fidelity by orders of magnitude when compared to traditional scene-graph approaches. Furthermore, both significant consolidation of hardware and distributed rendering are now possible, greatly simplifying large-scale simulator facility design and maintenance. Learn how modern GPU-based approaches are being utilized to provide high-quality training for today's pilots.

Level: Intermediate
Type: Talk
Tags: Real-Time Graphics; Visualization - Large Scale & Multi-Display; Developer - Algorithms

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL21B
View Recording
View PDF

S5215 - Maximize the Performance of your Cluster: Marrying GPUs and Dataflow Graph Processing

Nam-Luc Tran R&D Manager, EURA NOVA
Highly-Rated Speaker
Since joining EURA NOVA as a R&D researcher, Nam-Luc has published numerous times in the fields of big data and distributed computing, including storage, modeling and processing, with collaborations among the top Belgian universities (ULB, UCL, ULg). EURA NOVA is a private company located in Belgium and focused on solving industrial challenges with the most advanced technical innovations.

Get the best out of your processing cluster by equipping nodes with a GPU. Many distributed processing models have emerged in recent years, driven by the need to scale out applications and by the affordability of clusters running on commodity hardware. Among these, the dataflow graph processing model is the most general, representing jobs as distributed operators (nodes) connected by data channels (edges). In this talk, we explain how we have extended an existing dataflow graph processing framework to fully take GPU resources in the cluster into account. We show how this paradigm fully exploits the batch and streaming features of the GPU in a distributed job. Finally, we present our model for scheduling on this heterogeneous processing framework.
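The core scheduling question, which device should run each operator of the dataflow graph, can be sketched with a generic greedy earliest-finish heuristic. This is not EURA NOVA's scheduler; the operator names and per-device cost estimates are invented for illustration:

```python
def topo(ops, deps):
    """Topological order of the dataflow graph (DFS over dependencies)."""
    seen, order = set(), []
    def visit(op):
        if op in seen:
            return
        seen.add(op)
        for dep in deps.get(op, []):
            visit(dep)
        order.append(op)
    for op in ops:
        visit(op)
    return order

def schedule(ops, deps, costs):
    """Place each operator on the device with the earliest finish time,
    given when its inputs are ready and when each device frees up."""
    free = {"cpu": 0.0, "gpu": 0.0}   # time each device becomes available
    finish, placement = {}, {}
    for op in topo(ops, deps):
        ready = max((finish[d] for d in deps.get(op, [])), default=0.0)
        dev = min(free, key=lambda d: max(ready, free[d]) + costs[op][d])
        start = max(ready, free[dev])
        finish[op] = start + costs[op][dev]
        free[dev] = finish[op]
        placement[op] = dev
    return placement, finish

# A three-operator pipeline: cheap I/O stays on the CPU, the heavy
# map operator lands on the GPU because it finishes sooner there.
ops = ["read", "map", "reduce"]
deps = {"map": ["read"], "reduce": ["map"]}
costs = {"read":   {"cpu": 1, "gpu": 5},
         "map":    {"cpu": 10, "gpu": 2},
         "reduce": {"cpu": 3, "gpu": 4}}
placement, finish = schedule(ops, deps, costs)
```

A production framework would refine the cost estimates at runtime and account for data-transfer costs on the edges, but the placement decision has this shape.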

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210D
View Recording
View PDF

S5246 - Innovations in OptiX

David McAllister OptiX Engineering Manager, NVIDIA
Highly-Rated Speaker
David McAllister is the engineering manager of NVIDIA's OptiX ray tracing engine and has been in the OptiX group for five years. Before OptiX he was a GPU architect, having joined NVIDIA in 2000 and worked on GPUs from the GeForce 3 through Fermi. David received his Ph.D. in Computer Science from UNC Chapel Hill and has been in the computer graphics industry since 1989. He resides in Salt Lake City, Utah, USA.

OptiX is the industry's premier ray tracing engine in terms of performance, functionality, and adoption. We will present three recent advances in OptiX. First, the renovation of the core of OptiX, including using an LLVM-based compiler pipeline, which brings several performance benefits and opens the door for long-desired new features. Second, the OptiX VCA allows OptiX-based applications to transparently use NVIDIA Visual Computing Appliance for massively parallel, shared, remote rendering. Third, we will share exciting results of our top partners and their recent successes with OptiX.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Product Design & Styling

Day: Wednesday, 03/18
Time: 15:00 - 15:50
Location: Room LL21E
View Recording
View PDF

S5252 - StarPU: Programming for Heterogeneous MultiGPU Systems

Joao Gazolla Ph.D. Student, Universidade Federal Fluminense
Joao Gazolla has been a Ph.D. candidate at Universidade Federal Fluminense since 2012 under the supervision of Dr. Esteban Walter Gonzalez Clua in the MediaLab-UFF group (www.ic.uff.br/~medialab). He received his M.Sc. degree in Computer Science from the Universidade Federal Fluminense (UFF) in 2010 and a B.S. degree in Computer Science from the Federal University of Vicosa (UFV) in 2009. He is also a researcher at the Media Lab at UFF, an NVIDIA Center of Excellence, and a fellow of the Global Cyber Bridges project of Florida International University (FIU). He has experience in computer science with emphasis on GPGPUs, simulation, optimization, bioinformatics and image analysis.
Esteban Clua Associate Professor, Universidade Federal Fluminense
Esteban Walter Gonzalez Clua graduated in Computer Science at Universidade de São Paulo and holds master's and Ph.D. degrees in Computer Science. Today Esteban is an associate professor in computer science at Universidade Federal Fluminense, in Rio de Janeiro, and director of UFF Medialab. Esteban is one of the founders of SBGames, the Brazilian Symposium of Digital Entertainment and Video Games (the largest conference on the subject in South America), director of the IGDA Rio Academy, president of the Brazilian Computing Society's game committee, and a member of the program committees of many video game conferences, such as ACM SIGGRAPH Sandbox, IEEE GameOn and SBC SBGames. In 2007 he received the prize for the personality who most contributed to the growth of the video game industry in Brazil, and in 2009 he received the Young Scientist of the State of Rio de Janeiro prize. Esteban is the coordinator of the first Latin American NVIDIA Center of Excellence.

Learn implementation techniques for using heterogeneous multi-GPU systems. We will give you an overview of the framework and teach you how to exploit the power of StarPU, a unified run-time system for heterogeneous multi-core architectures that gives a unified view of computational resources. Attendees will learn how to use the framework through a strategy that includes code examples and programming demonstrations.

Level: Beginner
Type: Talk
Tags: Supercomputing; Developer - Performance Optimization; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 212B
View Recording
View PDF

S5290 - High Performance Computing on Mobile Devices through Distributed Shared CUDA

Edgar Josafat Martinez Noriega Master Student, The University of Electro-Communications, Tokyo
Engineering degree in Computer Science from the National Polytechnic Institute of Mexico. Second-year master's student at The University of Electro-Communications in Japan. Research assistant at Keio University, Japan.

Through a GPU virtualization tool (DS-CUDA), we remotely use an NVIDIA GPU on our local network to accelerate a molecular dynamics (MD) simulation inside an Android device (NVIDIA SHIELD™). We implement a NaCl MD simulation on Android and accelerate the computation of forces, velocities and coordinates using CUDA through the DS-CUDA tool. A laptop equipped with a GeForce GTX 680M (server) is connected to our LAN using Gigabit Ethernet; the Android device (client) is connected to the same LAN using Wi-Fi 802.11n, and server and client communicate over a TCP socket. We reached up to 420 Gflops in force computation on a simulation with 5832 ions, 5700 times faster than the 0.073 Gflops delivered by a CPU implementation on the NVIDIA SHIELD™.
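The client/server pattern described here, ship inputs over a TCP socket, compute remotely, return results, can be sketched in miniature. This is not the DS-CUDA wire protocol; it is a hypothetical stand-in where a JSON line plays the role of the marshalled kernel arguments and squaring plays the role of the GPU kernel:

```python
import json, socket, threading

def serve_once(host="127.0.0.1"):
    """Minimal stand-in for a DS-CUDA-style server: receive one batch of
    numbers, do the 'accelerated' work, send the results back."""
    srv = socket.socket()
    srv.bind((host, 0))          # let the OS pick a free port
    srv.listen(1)
    port = srv.getsockname()[1]

    def handler():
        conn, _ = srv.accept()
        data = json.loads(conn.makefile().readline())
        result = [x * x for x in data]   # placeholder for a GPU kernel
        conn.sendall((json.dumps(result) + "\n").encode())
        conn.close()
        srv.close()

    threading.Thread(target=handler, daemon=True).start()
    return port

def offload(port, values):
    """Client side: ship the inputs over TCP and block for the results."""
    cli = socket.create_connection(("127.0.0.1", port))
    cli.sendall((json.dumps(values) + "\n").encode())
    result = json.loads(cli.makefile().readline())
    cli.close()
    return result

port = serve_once()
out = offload(port, [1.0, 2.0, 3.0])   # → [1.0, 4.0, 9.0]
```

The talk's measurements make the trade-off concrete: the round trip only pays off when the remote compute saving (420 vs. 0.073 Gflops here) dwarfs the network latency.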

Level: Intermediate
Type: Talk
Tags: Visualization - In-Situ & Scientific; Life & Material Science

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL21C
View Recording
View PDF

S5393 - Evolution of an NVIDIA GRID™ Deployment

Erik Bohnhorst GRID Solution Architect, NVIDIA
Erik Bohnhorst is a Senior GRID Solution Architect at NVIDIA based in Stuttgart, Germany. After 7 years working at HP and focusing on Client Virtualization, Erik joined NVIDIA to support the largest Graphics Accelerated Client Virtualization opportunities in central Europe. Erik regularly shares his experience and technical understanding of Client Virtualization opportunities at technical events like BriForum, HP Discover, VMworld, E2EVC and other industry focused events.
Ronald Grass Systems Engineer, Citrix
Ronald Grass is a Citrix systems engineer based near Cologne, Germany. He supports Citrix Solutions Advisors with his technical skills in desktop and application virtualization during presales engagements and PoCs. After 6 years of Citrix consulting for different CSAs, he joined Citrix Systems in 2008, where he helped drive desktop virtualization initiatives as a subject matter expert for XenDesktop. As an SME he has deep technical knowledge of desktop virtualization technologies, covering all aspects from provisioning and management up to all kinds of user experience enhancements (HDX™). Since the introduction of XenServer GPU pass-through and NVIDIA GRID/XenServer vGPU support, Ronald has focused on HDX 3D Pro, supporting many implementations of NVIDIA GRID for the automotive industry, educational institutions, architecture and manufacturing. With the help of these real-world experiences, he frequently provides feedback to product management and engineering to continuously improve NVIDIA GRID/XenServer vGPU deployments.

In this session you will learn about the entire lifecycle of how a customer moved from a local workstation deployment to a centralized, remote solution with Citrix XenDesktop and NVIDIA GRID™. The session will guide you through the lifecycle of the project, covering the business need, the proof of concept, challenges, learnings and the actual deployment. You will walk away from this session with a better understanding of the challenges and solutions of an NVIDIA GRID™ opportunity.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 15:00 - 15:50
Location: Room LL20C
View Recording

S5433 - Auto-Tuning Kernel Launch Parameters for Maximum Performance

Joshua Anderson Senior Research Area Specialist, University of Michigan
Joshua Anderson is a Research Area Specialist in the Laboratory for Computational Nanoscience & Soft Matter Simulation at the University of Michigan, where he is the lead developer of HOOMD-blue, a high performance particle simulation tool. Dr. Anderson graduated from Michigan Tech in 2005 with B.S. degrees in Computer Science and Physics. He received his Ph.D. degree in Condensed Matter Physics from Iowa State University in 2009. His research interests include high performance computing using GPUs, nanoparticle self-assembly, and polymer physics.

Learn how to efficiently auto-tune kernel launch parameters and attain maximum performance in your application. Launch parameters have a large effect on the run time of the kernel, and there is no way to know the best choice a priori. Auto-tuning parameters provides maximum performance under any circumstances. This talk introduces the auto-tuning method used in the HOOMD-blue particle simulation toolkit and shows how you can use the same technique in your own applications. In practice, the method is very effective at finding the optimal launch parameters, and it can retune new parameters as the application runs. Retuning is important in Molecular Dynamics (MD) and Monte Carlo (MC) applications where the kernel workload can change drastically as the application run progresses.
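HOOMD-blue's actual tuner is not reproduced here, but the benchmark-and-select idea the abstract describes can be sketched as follows. The `toy_kernel` cost model (padding plus per-block overhead) is invented for illustration; in real CUDA code the callable would launch the kernel with the candidate block size:

```python
import time

def autotune(kernel, candidates, nsamples=5):
    """Benchmark each candidate launch parameter and keep the fastest.
    `kernel` is any callable taking the tunable parameter."""
    best, best_t = None, float("inf")
    for block in candidates:
        start = time.perf_counter()
        for _ in range(nsamples):
            kernel(block)
        elapsed = (time.perf_counter() - start) / nsamples
        if elapsed < best_t:
            best, best_t = block, elapsed
    return best

def toy_kernel(block, n_items=1000):
    """Stand-in workload: items are padded up to a whole number of
    blocks, and each block carries a fixed launch overhead."""
    n_blocks = (n_items + block - 1) // block
    for _ in range(n_blocks * block + 200 * n_blocks):
        pass

best = autotune(toy_kernel, [32, 64, 128, 256, 512, 1024])
```

Rerunning `autotune` periodically as the simulation progresses gives the retuning behavior the talk describes for MD and MC workloads whose kernel cost drifts over time.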

Level: Advanced
Type: Talk
Tags: Life & Material Science; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 212A
View Recording
View PDF

S5558 - Publishing Medical Image Studies with NVIDIA GRID™

G Allan Johnson Charles E Putman Professor of Radiology,Physics, and Engineering, Duke University
Dr. Johnson earned his Ph.D. in physics at Duke in magnetic resonance under Professor Walter Gordy from 1969-1974. He joined the Duke Department of Radiology in 1974 and was responsible for installing the first CT scanner at Duke. As the Director of Diagnostic Physics for Duke Medical Center, he was intimately engaged in translating CT technology into application from 1979-2009. He was responsible for the installation of the world's first clinical high-field (1.5 T) MR system at Duke in 1983 and was active in translating MR technology into clinical use from 1982-2005. During that time, a desire to limit research on large animals (dogs) stimulated his interest in the engineering challenges of translating imaging methodologies to small animals. He founded the Center for In Vivo Microscopy in 1986 as an NIH National Resource (P41 EB015897). His interest remains in developing novel technologies for small animal imaging and translating those technologies to important biomedical questions.

Biomedical imaging has experienced exponential growth over the last 30 years in instrumentation, applications, and data volume. Traditional methods for publishing imaging studies are no longer adequate. We describe a new paradigm for publication of biomedical imaging libraries: collections of large multi-dimensional images curated around a central theme. Libraries are shared via an NVIDIA GRID K2 server running Citrix. Several libraries have been assembled: 1) a 0.5 TB multidimensional atlas of the mammalian brain based on MRI of the mouse, rat, and monkey; 2) a 0.25 TB interactive CT/MR atlas of the mouse with both in vivo and ex vivo MR and CT at microscopic resolution; and 3) clinical libraries for teaching and surgical planning.

Level: All
Type: Talk
Tags: Graphics Virtualization; Medical Imaging

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL21F
View Recording

S5577 - Building State-of-Art Face Processing Pipeline with GPU

Shuchang Zhou Principal Scientist, Megvii Inc.
Shuchang Zhou received his bachelor's degree from Tsinghua University in 2004 and his master's degree from the National University of Singapore in 2005. Before joining Megvii in 2014, he worked as an assistant professor at the Chinese Academy of Sciences and as a software engineer at Google. He holds multiple U.S. and international patents.

Megvii Inc. has revisited face-related problems with deep learning techniques powered by GPUs. Substantial progress has been made, and performance keeps increasing with the inflow of data. This brings facial recognition closer to solving the identity problem, which is fundamental to the security, credibility and accountability of the Internet. The availability and power-efficiency of GPUs enables Megvii to explore deeper and more complex neural network topologies, handle higher-resolution images and videos, and extend to embedded devices with more limited power profiles. As of this writing, Face++ by Megvii is a leading cloud face recognition service provider; it has processed more than 40 billion images and runs on 50 million devices.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210B
View Recording
View PDF

S5587 - Benchmarking Real-World In-Vehicle Applications

Michael Carstens-Behrens Managing Director, mycable GmbH
Michael Carstens-Behrens is an embedded processor system architect. He has worked for more than 20 years on high-end automotive and industrial embedded systems. Since 2001 he has been the owner-manager of mycable, a German consulting and design house for customized high-performance embedded solutions.

Learn how to perform a critical use-case analysis to ensure your high-end embedded system provides the required application-specific performance. Typical GPU and CPU benchmarks return performance values under optimized conditions, but real-world applications, such as infotainment systems, will find the bottlenecks in your system. Find them before the project fails, or find options to transfer tasks to the GPU (e.g. using CUDA). Attendees will see how to transform a system architecture into a "System Resource Model", then find the "Critical Use Cases" of the application and match them against this model. This practical approach will show how to set up benchmarks that emulate use cases in parallel under reproducible conditions, based on an example automotive infotainment system.

Level: All
Type: Talk
Tags: Automotive; Embedded Systems; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL20D
View Recording
View PDF

S5611 - TurbulenceFD 2: Distributed Sparse Grid Fluid Simulation and Rendering for VFX

Jascha Wetzel Software Engineer, Jawset Visual Computing
Jascha is a longtime software engineer and founder of Jawset Visual Computing. He designed and developed the TurbulenceFD fluid simulation and rendering software. He has a Master's degree in Computer Science (Dipl.-Inform.) from Aachen University of Technology, Germany.

Voxel-based fluid simulation for VFX is currently mostly done on dense grids, but their poor spatial adaptivity restricts scalability. On GPUs, scalability is further restricted by the limited GPU memory. This talk gives an overview of the architecture of TurbulenceFD 2 (TFD2) and how it handles fluid simulation and rendering. TFD2 implements a distributed sparse grid simulation and rendering framework that is highly adaptive and combines the memory and compute power of multiple GPUs and CPUs.

Level: All
Type: Talk
Tags: Media & Entertainment

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL21D
View Recording

S5666 - A True Story: GPU in Production for Intraday Risk Calculations

Regis Fricker Quantitative Analyst, Societe Generale
Régis Fricker has been a quantitative analyst at Société Générale since 2007. He is in charge of GPU projects for the fixed income and forex library. He received an MSc in engineering from Centrale Lille in 2004 and an MSc in financial mathematics from Paris VII in 2006.

Explore a real-life case of GPU implementation in the context of trading and risk management of numerically highly demanding financial computations. In this talk, we will present how, at Société Générale, we overcame the practical difficulties and technical puzzles of putting GPUs into a concrete production environment. We will first show how GPUs have changed the life of the trading desks by speeding up their pricing capabilities and delivering faster risk analyses. We will then examine specific questions such as: How do you use NVIDIA GPUs in a managed (.NET) library? How do you use this technology in the specific context of distributed financial calculation? Insights will be provided on the problems we encountered at each step and on the innovative solutions we implemented to address them.

Level: Beginner
Type: Talk
Tags: Finance

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210C
View Recording
View PDF

S5673 - WebGL Visualization Tools and GPUs for Marketing of Robotics and Automation Products

Steve Rueckhaus Digital Marketing Specialist, Yaskawa America, Inc. Motoman Robotics Division
Steve Rueckhaus stumbled into the GPU world in an effort to find more engaging ways to demonstrate robotics via the web. Of his 20-plus years in marketing and media, Steve draws on his background in industrial automation to craft stories that reach a targeted audience and help convert readers to sales leads. As digital marketing specialist for the Motoman Robotics Division of Yaskawa America, Inc., Steve leads all aspects of web campaigns, including search engine optimization, content creation, design and analytics.

Yaskawa Motoman successfully improved the speed and quality of rendering processes to promote its latest robotic and automation solutions by leveraging the strengths of WebGL visualization applications (CL3VER) and NVIDIA's Quadro GPU technology. Gain insight into how Yaskawa's Sales & Marketing Group provides interactive 3D marketing experiences to enhance the promotion of next generation robotic solutions.

Level: All
Type: Talk
Tags: Manufacturing; Product Design & Styling

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room LL21A
View Recording
View PDF

S5678 - Accelerator Opportunities with OpenPOWER

Nick Finamore Product Marketing Manager for Software Development Tools, Altera Corporation
For the past 3 years Nick has been leading Altera's computing acceleration initiative and the marketing of Altera's SDK for OpenCL. Previously Nick held several leadership positions at early-stage computing and networking technology companies, including Netronome, Ember (SiLabs) and Calxeda. Nick also had an 18-year career at Intel, where he held several management positions, including general manager of the network processor division.

The OpenPOWER architecture provides unique capabilities that will enable highly effective and differentiated acceleration solutions. The OpenPOWER Accelerator Workgroup is chartered to develop both the hardware and software standards that give vendors the ability to develop these solutions. The presentation will cover an overview of the benefits of the OpenPOWER architecture for acceleration solutions. We will provide an overview of the Accelerator Workgroup's plans and standards roadmap, give an overview of the OpenPOWER CAPI development kit, and walk through an example of a CAPI-attached acceleration solution.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 15:00 - 15:15
Location: Room 220C
View Recording

S5727 - On Finishing Creative Projects

Thomas Mann CEO, Framefield
Tom was fascinated by computers even before owning one: he learned programming by writing assembler code on paper and testing it on a friend's East German home computer. Soon thereafter he began developing computer games and tools on "real" computers. In an unsuccessful attempt to become less nerdy, he took classes in figure painting and attained a degree in architecture. Through his student job, he became an expert in real-time graphics and was invited to SIGGRAPH twice. His obsession with procedural computer graphics and motion design led him to the demoscene, where he won many competitions and awards. After a decade of freelancing as an interface designer and developer, he finally started Framefield, a Berlin-based design agency, to satisfy his love for computer graphics and design by spending as much time as possible with the best team he has ever met.

Building on his talk at GTC 2014, Thomas will discuss Framefield's creative design process for real-time animations: how to turn abstract ideas and concepts into moving images, make consistent design decisions, get inspired by programming along the way, and tweak the look and timing into a finished product.

Level: All
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Wednesday, 03/18
Time: 15:00 - 15:50
Location: Room LL20A
View Recording

ECS5010 - Early Stage Challenge: Insilico Medicine, Inc

Alex Zhavoronkov CEO, Insilico Medicine, Inc
Prior to co-founding Insilico Medicine, Dr. Zhavoronkov held numerous positions in academia as well as senior management positions in both IT and biotechnology. He started his career in semiconductors, holding management positions at PMC-Sierra (Nasdaq: PMCS) and ATI Technologies (Nasdaq: AMD), before switching to computational systems biology. He served as the CEO of Mediox Inc. and director of GTCBio. He co-founded the First Oncology Research and Advisory Center, a personalized medicine company acquired by Pathway Pharmaceuticals in Hong Kong. Dr. Zhavoronkov is an adjunct professor at the Moscow Institute of Physics and Technology. Since 2010 he has published over thirty research articles in peer-reviewed journals and two books, including the bestselling "The Ageless Generation: How Biomedical Advances Will Transform the Global Economy". Dr. Zhavoronkov received two bachelor's degrees from Queen's University (computer science and commerce), a master's in biotechnology from Johns Hopkins University, and a Ph.D. in physics and mathematics from Moscow State University.

Insilico Medicine is a Baltimore-based company utilizing advances in genomics and big data analysis for in silico drug discovery, drug repurposing for aging and age-related diseases, and personalized preventative medicine. The company utilizes a set of proprietary platforms to analyze gene expression, microRNA, genomic, proteomic and metabolomic profiles to discover, repurpose or combine drugs. Its OncoFinder™, GeroScope™ and PharmAtlas™ systems have received praise from multiple research groups, and its pathway activation analysis algorithms have been licensed by other innovative drug discovery companies. The company aims to repurpose and discover drugs for aging and age-related diseases. It also provides a broad range of services to pharmaceutical companies interested in drug repurposing, early project termination, companion diagnostic development and improving clinical trial enrollment. The company takes a unique approach to applying Big Data across a broad range of applications, with no direct competition across all fields but abundant competition in each specific field. Calico and Human Longevity Inc. appear to be likely competitors going after practical applications of aging research.

Level: All
Type: Talk
Tags: Life & Material Science; Big Data Analytics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:10 - 15:18
Location: Room 220B

S5685 - The Future of Interconnect with OpenPOWER

Scot Schultz Director, HPC and Technical Computing, Mellanox Technologies
Scot Schultz is an HPC technology specialist with broad knowledge of operating systems, high-speed interconnects and processor technologies. Joining Mellanox in early 2013 as Director of HPC and Technical Computing, Schultz is a 25-year veteran of the computing industry who, prior to joining Mellanox, spent 17 years at AMD in various engineering and leadership roles, including strategic HPC technology ecosystem enablement. Scot has been instrumental in the growth and development of numerous industry standards-based organizations, including OpenPOWER, the OpenFabrics Alliance, the HPC Advisory Council and many others.

Mellanox ConnectX-4 EDR 100Gb/s technology was introduced in November at the SC'14 conference in New Orleans, LA. ConnectX-4 EDR 100Gb/s with CAPI support tightly integrates with the POWER CPU at the local bus level and provides faster access between the POWER CPU and the network device. We will discuss the latest interconnect advancements that maximize application performance and scalability on OpenPOWER architecture, including enhanced flexible connectivity with the latest Mellanox ConnectX-3 Pro Programmable Network Adapter. The new programmable adapter provides maximum flexibility for users to bring their own customized applications such as IPSEC encryption, enhanced flow steering and Network Address Translation (NAT), data inspection, data compression and others.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 15:15 - 15:30
Location: Room 220C
View Recording

ECS5011 - Early Stage Challenge: Artomatix

Eric Risser CTO, Artomatix
Eric Risser
Dr. Eric Risser is an expert in the combined fields of artificial intelligence and computer graphics and the pioneer of the Artomatix technology. He authored six technical publications during his academic career at Columbia University (master's) and Trinity College Dublin (PhD). He has given talks multiple times each at top industry and academic conferences such as the Game Developers Conference (GDC) and SIGGRAPH. He has been invited to speak at a number of companies and institutions, including Pixar and Princeton University, and has consulted for Adobe. Eric has also authored a chapter in NVIDIA's GPU Gems book series.

Founded in 2014, Artomatix is an early-stage company. Our mission is to develop the next generation of art tools, automating digital media creation. Ballooning art development costs are the number-one pain point for both the video game and animation industries. Artomatix directly addresses this problem by applying machine learning and big data concepts to art creation, enabling the computer to take over many tedious and time-consuming aspects of making art. Contemporary tools are stuck on a model in which human artists are the sole source of creativity. Artomatix, by contrast, has built a solution that fully or partially automates digital art creation, allowing a single human artist to do the work of a team. Art creation is no longer the bottleneck on how large and vibrant a virtual world can be. Artomatix brings the richness and variety of the real world into the digital realm.

Level: All
Type: Talk
Tags: Game Development; Machine Learning & Deep Learning; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:20 - 15:28
Location: Room 220B

ECS5012 - Early Stage Challenge: QM Scientific

Faris Alqadah CEO, QM Scientific
Faris Alqadah
Faris leads the overall vision of QM Scientific: to be the smartest shopping intelligence platform, answering consumers' everyday questions simply, accurately and in real time. In addition, he leads data science development and holds a PhD in the field from the University of Cincinnati. Prior to QM Scientific, Faris built very large-scale consumer propensity, segmentation and recommender systems as a senior data scientist at PayPal. Previously, he served as a fellow at the Johns Hopkins School of Medicine, where he applied data science research to challenging problems in genomics and proteomics. His data science research has been published in leading peer-reviewed conferences and journals, was awarded a Best Doctoral Forum Poster Award and was twice nominated for Best Paper Awards. For fun, Faris bangs his head to epic heavy metal music and fights monsters with his two children.

QM Scientific (QMS) is a shopping intelligence company whose platform empowers consumers to make smart buying decisions in real time. Targeting the grocery retail vertical, our goal is to let consumers answer everyday questions such as "What is the best store to shop at right now for my list?", "Are there cheaper alternatives for the products I buy regularly?" and "How much do I spend on diapers and beer monthly?" simply, accurately and in real time. The QMS platform uses proprietary data science, computer vision and natural language processing technology to intelligently extract, connect and organize millions of products and prices from thousands of sources, including the web, partner datasets and receipt/product images. In December 2014, QMS launched PriceSwarm, a grocery price comparison app built on the QMS platform. With PriceSwarm, users create shopping lists in natural language and the platform recommends a store to shop at by optimizing over price, quality, shopping behavior and location. In addition, users contribute real-time prices and receive personalized cost-saving recommendations and analytics simply by snapping a picture of their receipt.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:30 - 15:38
Location: Room 220B

S5130 - Many-Body Forces for Molecular Dynamics

Peter Eastman Senior Software Engineer, Stanford University
Highly-Rated Speaker
Peter Eastman
Peter Eastman is a software engineer in the bioengineering department at Stanford University. His research focuses on developing high performance software for molecular modeling, particularly the OpenMM toolkit. His previous work also includes many years working on bioinformatics and computer graphics.

Learn to implement many-body forces on a GPU. Interactions involving three or more atoms are becoming increasingly important in molecular simulations. They present unique challenges not found in conventional pairwise forces. I will describe how we implemented them in OpenMM, with an emphasis on optimization strategies to minimize thread divergence and avoid unnecessary memory access. I will also discuss how you can use OpenMM as a library to implement your own many-body forces without writing a line of CUDA code.
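The cost structure behind the abstract's "unique challenges" can be sketched in miniature. The toy below is not OpenMM code; it is a hedged pure-Python illustration (force constant and equilibrium angle chosen arbitrarily) of why many-body terms are harder than pairwise ones: the energy loop runs over atom triplets rather than pairs.

```python
import itertools
import math

def angle_energy(positions, k=100.0, theta0=math.radians(109.5)):
    """Toy three-body potential: a harmonic penalty on the angle at the
    middle atom of every atom triplet (j is the vertex). Illustrates why
    many-body forces scale worse than pairwise ones: the loop visits
    O(N^3) triplets instead of O(N^2) pairs."""
    def sub(a, b): return tuple(x - y for x, y in zip(a, b))
    def dot(a, b): return sum(x * y for x, y in zip(a, b))
    def norm(a): return math.sqrt(dot(a, a))

    total = 0.0
    for i, j, m in itertools.permutations(range(len(positions)), 3):
        if i > m:  # count each unordered angle (i, j, m) only once
            continue
        v1 = sub(positions[i], positions[j])
        v2 = sub(positions[m], positions[j])
        cos_t = max(-1.0, min(1.0, dot(v1, v2) / (norm(v1) * norm(v2))))
        total += 0.5 * k * (math.acos(cos_t) - theta0) ** 2
    return total

# Three atoms forming a right angle at the origin: three distinct angles
pos = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```

On a GPU, the practical optimization problem is pruning this triplet space with neighbor lists while keeping threads in a warp on similar work, which is where the divergence issues mentioned above come from.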

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Computational Physics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 212A
View Recording
View PDF

S5258 - Dense 3D Culture Rendering Using NVIDIA Solutions in Immersive Fast-Jet Simulators

William Paone Technical Lead Engineer, Level 5: Image Generator Product Development, Boeing
Highly-Rated Speaker
William Paone
Bill Paone is the Technical Lead Engineer for Fixed Wing Image Generation Product Development and Integration for Boeing Training Systems and Government Services (TS&GS). Bill has led image generation development and production for multiple worldwide programs. With an emphasis on visual simulation design, his career also includes development and production engineering in multiple fields, with lead positions in hardware design, software design, and image generator design and development for immersive training systems.

NVIDIA solutions used in immersive visual systems for flight simulation have allowed image generators to render complex scenes, including dense 3D terrain culture such as buildings, trees, roads and towers. When rendered with high-resolution, photo-specific imagery, this culture improves low- to medium-altitude flight realism and situational awareness. However, adding dense culture on top of already complex terrain-skin rendering puts heavy stress on the system and its GPUs. New NVIDIA technology has allowed immersive image generator designs to be scaled to manageable and deliverable sizes. This talk will discuss how the GPU roadmap has made this possible, with low-level flight examples and the source types being used for urban rendering.

Level: Beginner
Type: Talk
Tags: Real-Time Graphics; Visualization - Large Scale & Multi-Display; Manufacturing

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21B
View Recording
View PDF

S5323 - Achieving Near-Native GPU Performance in the Cloud

John Paul Walters Project Leader, USC Information Sciences Institute
John Paul Walters
John Paul Walters is a Project Leader and Computer Scientist at USC's Information Sciences Institute (ISI). He is the technical lead for ISI's HPC OpenStack initiative and is the PI of ISI's NVIDIA CUDA Research Center. He leads research programs in high performance computing, embedded cloud computing, and space processing. His research interests span bioinformatics, multicore and accelerator programming, and fault tolerance. He developed the widely-used GPU-HMMER implementation of the HMMER sequence analysis suite. He received his PhD from Wayne State University in 2007, and his BA from Albion College in 2006, both in computer science.

Explore the use of GPUs in virtualized environments. In this session we describe how GPUs can be used within virtual environments with near-native performance. We begin by showing GPU performance across four hypervisors: VMware ESXi, KVM, Xen, and LXC. After presenting the performance characteristics of each platform, we extend the results to the multi-node case, with nodes interconnected by QDR InfiniBand. We demonstrate multi-node GPU performance using GPUDirect-enabled MPI, achieving 97-99% of the efficiency of a non-virtualized system. Examples are drawn from signal processing, big data analytics, and molecular dynamics. The session concludes with a discussion of the next steps in extending HPC to virtual environments, including our work with the OpenStack platform.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Supercomputing

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210E
View Recording
View PDF

S5324 - Interactive Visual Exploration of Peridynamic-Based Fracture Simulation

Chakrit Watcharopas Postdoc at Clemson University and Assistant Professor at Kasetsart University, School of Computing, Clemson University and Dept. of Computer Science, Kasetsart University
Chakrit Watcharopas
Chakrit Watcharopas received a B.S. in Statistics from Thammasat University, an M.S. in Computer Science from the University of Southern California, and a Ph.D. in Computer Science from Clemson University. In 2005, Chakrit took up a teaching and research faculty position in the Department of Computer Science, Kasetsart University, Thailand. He has taught courses on various computer graphics topics, such as interactive computer graphics and rendering with GPUs, computer animation, and crowd simulation. Chakrit is currently a postdoctoral researcher in the Visual Computing Division, School of Computing at Clemson University, where he focuses his research on woven cloth simulation and fracture simulation.

Simulating fracture has been an area of interest in graphics for many years. Beyond the computational expense needed to achieve realistic fracture, even the simplest of techniques often requires repeated iteration to fine-tune the many parameters that control the simulation. In this session, we will focus on one particular technique, fracture using spring-mass systems, with the goal of better understanding the capabilities of a variant of spring-mass fracture known as peridynamics. Coupled with a framework for visualization, our method allows users to simultaneously compare multiple fracture simulation runs across different parameter settings. We present experimental results and report new extensions to our peridynamic-based fracture simulation, implemented in CUDA on Tesla K40s.
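At their core, spring-mass and peridynamic fracture both reduce to a bond-breaking criterion: a bond fails once its stretch exceeds a critical value. The following is a hedged, toy 1D sketch of that criterion (not the authors' CUDA implementation; all names and thresholds are illustrative):

```python
def surviving_bonds(positions, rest_length, critical_stretch):
    """1D chain of nodes joined by springs; a spring 'breaks' (fracture)
    when its stretch s = (L - L0) / L0 exceeds the critical value.
    Returns the surviving bonds as index pairs. The critical stretch is
    exactly the kind of parameter that needs the repeated fine-tuning
    described in the abstract."""
    bonds = []
    for i in range(len(positions) - 1):
        length = abs(positions[i + 1] - positions[i])
        stretch = (length - rest_length) / rest_length
        if stretch <= critical_stretch:
            bonds.append((i, i + 1))
    return bonds
```

In a full peridynamic model each node is bonded to every neighbor within a horizon rather than only to adjacent nodes, which is what makes the per-timestep bond update a good fit for one GPU thread per bond.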

Level: Intermediate
Type: Talk
Tags: Visualization - In-Situ & Scientific; Computational Physics; Game Development

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21C
View Recording
View PDF

S5334 - A Fast, Portable and Robust Calibration Approach for Stochastic Volatility Models

Matthew Dixon Assistant Professor, University of San Francisco
Matthew Dixon
Matthew Dixon is an Assistant Professor at the University of San Francisco and teaches in both the MSAN and MSFA programs. Matthew has a background as a quant in industry and consults in the areas of algorithmic trading, financial risk management and venture capital. He co-chairs the IEEE/ACM workshop on high performance computational finance at Supercomputing and has published around 20 peer-reviewed technical articles. Matthew has held postdoctoral and visiting research professor positions at Stanford and UC Davis. He is a certified financial risk manager and holds a PhD in Applied Math from Imperial College, an MS in Parallel and Scientific Computing (with distinction) from Reading University and an MEng in Civil Engineering from Imperial College.

The ability to rapidly recalibrate financial derivative models such as stochastic volatility models reduces model risk due to the reliance on stale option chain quotes. This talk will address the following objectives: (1) Gain insight into the challenges of robustly recalibrating stochastic volatility (SV) models and how frequent recalibration reduces error in pricing; (2) Learn about the challenges of deploying the same modeling codebase on GPUs and multi-core CPUs and, (3) Understand how the Xcelerit platform can be used to efficiently deploy C++ written SV models on GPUs and multi-core CPUs.
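The recalibration loop the abstract refers to can be illustrated with the simplest possible case. The sketch below is a hedged stand-in, not a stochastic volatility calibration: it inverts a single Black-Scholes parameter (implied volatility) from a quoted price, with all numbers chosen arbitrarily, to show the quote-to-parameter inversion that must run fast enough to avoid relying on stale quotes.

```python
import math

def bs_call(S, K, T, r, sigma):
    """Black-Scholes European call price (closed form)."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0):
    """Recalibrate sigma to a quoted price by bisection. The call price
    is monotone increasing in sigma, so bisection always converges.
    A real SV calibration solves a multi-parameter least-squares fit
    over a whole option chain, which is why it benefits from GPUs."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Running this inversion per quote across an option chain, repeatedly as quotes update, is the embarrassingly parallel workload that platforms like Xcelerit map onto GPUs and multi-core CPUs from one codebase.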

Level: Beginner
Type: Talk
Tags: Finance

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210C
View Recording

S5337 - Recent Advances in Multi-GPU Graph Processing

Giancarlo Carbone Ph.D. Student, Sapienza University of Rome
Giancarlo Carbone
Giancarlo Carbone received his master's degree in Computer Science from the University "Federico II" of Naples (Italy). He worked at IBM for about 13 years in research and development, with 6 patents filed. He is currently a PhD student in the Department of Computer Science at "Sapienza" University of Rome. His interests include GPU computing, parallel algorithms, and digital forensics.

Learn how to use GPUs as a computing platform to solve problems with irregular memory access patterns and low arithmetic intensity. We have shown that a proper data-to-threads mapping, combined with techniques to reduce data traffic, makes it possible to achieve excellent performance in the traversal, via a level-synchronous Breadth First Search (BFS), of large-scale graphs (i.e., millions of nodes and billions of edges) on multi-GPU systems. We will present our recent activities in GPU-based graph processing: a new implementation of BFS based on a 2D partitioning that exploits the atomic operations of the Kepler architecture, two solutions to the st-connectivity problem, and all-pairs shortest path. Some of these can be of immediate use in the analysis of large data sets.
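The level-synchronous pattern named in the abstract can be sketched in plain Python. This is a hedged serial sketch with an example graph of my own, not the authors' multi-GPU code; on a GPU, each frontier expansion becomes one kernel launch and the visited test an atomic operation.

```python
from collections import defaultdict

def level_sync_bfs(edges, source):
    """Level-synchronous BFS: expand the entire frontier each iteration,
    the structure that maps onto one GPU kernel launch per BFS level.
    Returns a dict mapping each reachable vertex to its BFS level."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)            # treat the graph as undirected
    level = {source: 0}
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:          # on a GPU: one thread/warp per vertex
            for v in adj[u]:
                if v not in level:  # on a GPU: an atomic test-and-set
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level
```

The irregularity is visible even here: frontier sizes vary wildly between levels and neighbor lists vary per vertex, which is exactly the load-imbalance and data-traffic problem the 2D partitioning scheme addresses.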

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Data Center, Cloud Computing & HPC; Developer - Algorithms

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210D
View Recording
View PDF

S5366 - Extended OpenACC Programming to Exploit GPU-Specific Features Still at a High Level

Seyong Lee Computer Scientist, Oak Ridge National Laboratory
Seyong Lee
Seyong Lee is a computer scientist at Oak Ridge National Laboratory. Dr. Lee is an expert in program analysis, parallelizing compilers, and runtime systems for High Performance Computing (HPC) and emerging architectures, including multi-cores and hardware accelerators. His current research focuses on productive programming environments for future heterogeneous computing. He developed an open-source compiler framework, the Open Accelerator Research Compiler (OpenARC), which supports the full feature set of the OpenACC programming model and provides a research environment for efficient and fault-tolerant accelerator computing. Previously, Dr. Lee pioneered another compiler framework for automatic OpenMP-to-Graphics Processing Unit (GPU) translation and optimization. This work has been cited extensively (his paper on this research was selected as the most cited paper among all papers published in the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) between 2009 and 2014) and won the best student paper award at the Supercomputing 2010 conference. He has also worked on various research projects, such as compiler-driven adaptive execution, resource availability prediction in fine-grained cycle-sharing systems, MapReduce with communication overlap (MaRCO), and Internet-sharing middleware and collaboration (iShare).

We present an extended OpenACC programming model that fully exploits GPU-specific features while remaining at a high level. Directive-based accelerator programming models such as OpenACC have arisen as an alternative solution for GPU programming. However, too much abstraction in the directive models makes it difficult for users to control architecture-specific features, incurring a large performance gap between the directive models and low-level CUDA/OpenCL. We propose and implement new OpenACC extensions to support 1) hybrid programming of unified memory and separate memory and 2) exploiting GPU-specific memories and synchronizations in an abstract manner. Experimental results show that the extended OpenACC programs can perform similarly to low-level CUDA programs while remaining at a high level.

Level: Intermediate
Type: Talk
Tags: OpenACC; Developer - Programming Languages; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210H
View Recording
View PDF

S5368 - Nonlinear Structured Prediction Using the GPU

Alexander Schwing Postdoctoral Fellow, University of Toronto
Alexander Schwing
Alex Schwing's research interests are optimization algorithms, statistical models, and the parallelization of implementations for high-performance computing environments. An interesting playground for all three fields is inference and structured prediction in machine learning as well as computer vision, in particular 3D scene understanding. He is currently working with Prof. Ruslan Salakhutdinov and Prof. Raquel Urtasun as a postdoc in the machine learning group of the Computer Science department at the University of Toronto. In 2013 he completed his PhD under the supervision of Prof. Marc Pollefeys, Prof. Tamir Hazan and Prof. Raquel Urtasun in the Computer Vision and Geometry (CVG) group of the computer science department of ETH Zurich (ETHZ).

Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e.g., a single object within an image. We show how to enrich deep learning to jointly predict a set of random variables while leveraging learned variable correlations. To this end we present an efficient GPU driven algorithm based on neural networks that is able to jointly capture nonlinearities for multiple variables and their correlations.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210A
View Recording
View PDF

S5546 - GPU Accelerated Haze Removal on Tegra K1

Bin Zhou Adjunct Research Professor, University of Science and Technology of China
Bin Zhou
Dr. Bin Zhou is the director and chief scientist of the Marine Information Processing Laboratory (MIPL) at the Institution of Oceanography, Shandong Academy of Sciences. He serves as an Adjunct Research Professor in the School of Information Science and Technology at USTC and is an NVIDIA CUDA Fellow. He is the PI of the CUDA Research Center (CRC) in the Institute of Advanced Technology (IAT), USTC. In MIPL, he leads a team working on information processing systems for marine environmental pollution and natural hazard monitoring and ocean-atmosphere simulation. In the CRC, he conducts research on drone control, video processing, and computer vision algorithms on the NVIDIA GPU/CUDA platform.

This talk shows how the Tegra K1 GPU accelerates the dehazing process for outdoor computer vision systems. Toxic haze has become a major air pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method into the dehazing process, very good results are achieved. However, the huge processing requirements pose big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Experiments show a 156x speedup compared to the ARM CPU. The results show that the Tegra K1 has great potential for embedded real-time computer vision processing.
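The core of the dark channel prior method can be sketched briefly. This is a hedged, minimal pure-Python version of only the first step (the real pipeline also estimates atmospheric light and a transmission map from this channel; image format and patch size here are my own choices):

```python
def dark_channel(image, patch=1):
    """Dark channel of an RGB image (nested lists of (r, g, b) tuples,
    values in [0, 1]): for each pixel, the minimum over all three
    channels within a local window. He et al.'s prior observes that in
    haze-free outdoor images this value is near 0, so large dark-channel
    values indicate haze. Each output pixel depends only on a small
    neighborhood, which is why the computation maps so well onto one
    GPU thread per pixel."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = []
            for dy in range(-patch, patch + 1):
                for dx in range(-patch, patch + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        vals.append(min(image[yy][xx]))  # min over R, G, B
            out[y][x] = min(vals)                        # min over window
    return out
```

The windowed-minimum structure (no data dependencies between output pixels) is what makes a large speedup over a serial ARM CPU implementation plausible on an embedded GPU like the Tegra K1.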

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210B
View Recording

S5582 - Cloud Gaming & Application Delivery with NVIDIA® GRID™ Technologies

Franck Diard GRID Chief SW Architect, NVIDIA
Highly-Rated Speaker
Franck Diard
Franck Diard is Chief Software Architect at NVIDIA. In 15 years at NVIDIA, he has started, designed, built and helped deliver significant PC technologies like SLI, Optimus hybrid graphics and GRID cloud graphics platforms. In 2007, he was named the first software Distinguished Engineer at NVIDIA in recognition of his outstanding contributions and engineering creativity, with 66 US patents granted so far. He holds a Ph.D. in Computer Science from University of Nice-Sophia Antipolis.

This session presents the future of game engines and application delivery running in the cloud and the technologies behind NVIDIA® GRID™. The audience will learn how the key components of NVIDIA® GRID™, such as optimal capture, efficient compression, fast streaming, and low-latency rendering, make cloud gaming and application delivery possible. Franck will demonstrate how these components fit together, how to use the GRID APIs, and how to optimize their usage to deliver the ultimate experience, with live demos.

Level: Beginner
Type: Talk
Tags: Graphics Virtualization

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21F
View Recording

S5667 - A GPU-Accelerated Bundle Adjustment Solver

Lukáš Polok Researcher, Brno University of Technology
Lukáš Polok
Born in Brno (Czech Republic), Lukáš received an MSc in Computer Science with a specialization in computer graphics and multimedia at Brno University of Technology, where he is presently employed as a researcher, working on his thesis on using GPUs for general-purpose computation.
Simon Pabst R&D Programmer, Double Negative VFX
Simon Pabst
Simon joined the Double Negative VFX R&D team in 2012. Before that, he obtained his Master's degree (Diplom) and PhD (Dr. rer. nat.) in Computer Science from the University of Tübingen (Germany), working on cloth simulation and collision detection for deformable objects.
Jeff Clifford Head of R&D, Double Negative VFX
Jeff Clifford
With a background in Physics and an MSc in Applied Optics from Imperial College London, Jeff has pursued a career in the London film post-production industry since joining Double Negative VFX in 2000. As a member of the R&D team he wrote the DNB voxel renderer, which has since been used in over 50 films. He developed DNeg's own 64-bit version of the 2D compositing software Shake, and subsequently transitioned the 2D department's tools to a stereo pipeline based around the compositing software Nuke. He has experience developing many 2D and 3D tools for film production. More recently he has moved into the role of Head of R&D to oversee the strategic direction of internal tool development and the use of 3rd-party technology at DNeg.

This talk will give an overview of film production processes, focusing on high-quality 3D reconstruction of the scene and how GPU acceleration applies to it, as well as the math and algorithms behind it. Several solutions exist for accelerating the algorithms involved in the 3D reconstruction process, but very few are concerned with online quality assessment of the reconstructed areas. This is mostly due to the computational load of the algorithms that compute the reconstruction error. The presented approach proposes efficient solutions based on a GPU implementation of the matrix operations involved. It differs from existing solutions by exploiting the inherent sparse, block structure of the underlying system matrices. The work is part of the EU FP7 IMPART project: impart.upf.edu.

Level: All
Type: Talk
Tags: Media & Entertainment; Developer - Algorithms; Video & Image Processing

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21D

S5689 - Using NVM Express SSDs and CAPI to Accelerate Data-Center Applications in OpenPOWER Systems

Stephen Bates Technical Director, CSTO, PMC-Sierra
Stephen Bates is a Technical Director at PMC-Sierra, Inc. He directs PMC's Non-Volatile Memory characterization program and is an architect for PMC's Flashtec™ family of SSD controllers. Prior to PMC he taught at the University of Alberta, Canada. Before that he worked as a DSP and ECC engineer. He has a PhD from the University of Edinburgh and is a Senior Member of the IEEE.

NVM Express is a standards-based method of communicating with PCIe-attached Non-Volatile Memory. An NVM Express open-source driver has been an integrated part of the Linux kernel since March 2012 (version 3.3) and allows for very high performance. There are currently NVM Express SSDs on the market that can achieve read speeds of over 3GB/s. We present results for a platform consisting of an NVM Express SSD, a CAPI accelerator card, and a software stack running on a POWER8 system. We show how the threading of the POWER8 CPU can be used to move data from the SSD to the CAPI card at very high speeds, and we implement accelerator functions inside the CAPI card that can process the data at these speeds. We discuss several applications that can be served by this combination of NVMe SSD and CAPI.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 15:30 - 15:45
Location: Room 220C
View Recording
View PDF

S5789 - The Fast Lane from Silicon Valley to Munich

Uwe Higgen BMW Group Technology Office USA, BMW Group
Uwe Higgen
Effective September 1, 2014, Uwe Higgen was appointed Head of the BMW Group Technology Office USA. In this position, he is responsible for accelerating the delivery of automotive innovation to customers through the evaluation, development and design of new technologies. Higgen oversees a team of highly talented engineers specializing in Connected Car, Electromobility, Powertrain, Autonomous Driving and User Experience/Interface Design. His Silicon Valley-based team produces work that enables BMW to be the future, see the future and reimagine the future of world-class automotive engineering for individual mobility. The BMW Group Technology Office USA focuses on human-machine interfaces, mechatronics, infotainment, telematics and creating new portals and opportunities for business communication. Prior to his arrival in the U.S., Higgen served as Head of the BMW Group AppCenter in Munich. In this role, he was responsible for the integration of information and entertainment smartphone applications into the vehicle. Higgen began his career at BMW Group in 2001. He holds a master's degree in Computer Science from the Carl von Ossietzky University of Oldenburg.

Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development – including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.

Level: Beginner
Type: Talk
Tags: Automotive; Embedded Systems; Computer Vision & Machine Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL20D
View Recording
View PDF

S5815 - Programming Pointers to Optimize your In-Situ Visualization Pipeline

Shalini Venkataraman Senior Applied Engineer, NVIDIA
Highly-Rated Speaker
Shalini Venkataraman
Shalini Venkataraman is a Senior Applied Engineer in NVIDIA’s Professional Solutions Group where she works on using GPUs to solve large-scale imaging and visualization problems in Medical, Oil&Gas and Scientific Computing domains. Prior to that she was a researcher at various High Performance Computing centers in the US and Singapore. Her interests are in parallel and large data visualization. She has a MS in Computer Science from the Electronic Visualization Lab, University of Illinois-Chicago and BS from the National University of Singapore.

We will show programming tips and techniques to maximize performance in your in-situ visualization pipeline. Specific topics include creating OpenGL contexts offscreen, using hardware-based encoding for rendered images, and optimizing readback performance using multi-context, multi-threaded OpenGL.

Level: Intermediate
Type: Talk
Tags: Visualization - In-Situ & Scientific; Real-Time Graphics

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21A
View Recording

ECS5013 - Early Stage Challenge: Redshift Rendering Technologies

Nicolas Burtnyk CEO, Redshift Rendering Technologies
Nicolas Burtnyk
Prior to co-founding Redshift, Nicolas was Studio Technical Director at Double Helix Games, where he led the studio's central technology group, overseeing the development of two generations of a proprietary game engine, renderer and associated tools. He is an expert in human-computer interaction, specializing in 3D tools for the film, television, video game and industrial design industries.

Redshift Rendering Technologies develops "Redshift", the world's fastest, most fully-featured, most flexible GPU-accelerated renderer. Redshift has already been used in production by hundreds of small-to-midsize studios and freelancers around the world, and is currently being evaluated by some of the industry's larger players. Redshift Rendering Technologies was founded by Panagiotis ("Panos") Zompolas, Rob Slater and Nicolas Burtnyk, game development veterans who have actively participated in the GPU rendering and GPU computing revolution. Redshift proves that final-frame GPU-accelerated rendering can compete with the industry's top CPU rendering solutions as a cost-effective, high-performance alternative.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:40 - 15:48
Location: Room 220B

S5693 - Tyan OpenPOWER Products and Future Product Plans

Albert Mu General Manager, Business Unit, Tyan
Albert Mu is Vice President at MiTAC Computing Technology Corporation and General Manager of the Tyan Business Unit. From 2005 to 2008 he was with Intel as General Manager of the Global Server Innovation Group (GSIG), with a charter to develop differentiated system platform products for the Internet portal data center and cloud segments. Prior to Intel, Albert Mu was Vice President and General Manager of the Network, Storage, and Server Group (NSSG) at Promise Technologies, Inc. and Corporate Vice President and Chief Technology Officer at Wistron Corporation. Prior to Wistron, he was Vice President of Engineering at Clarent Corporation and worked at Cisco, HaL Computer and MIPS Computer. Mr. Mu received a BSEE degree from National Chiao Tung University, Hsinchu, Taiwan, an MSEE degree from the University of Texas at Austin, and an MS in Engineering Management from Stanford University.

TYAN will introduce itself and brief the audience on the contributions it has made to the OpenPOWER community over the past twelve months, and will also share its future product plans and associated milestones. Invited to participate in the OpenPOWER Foundation, TYAN developed an OpenPOWER reference board in the spirit of innovation and collaboration that defines the OpenPOWER architecture, and contributed the associated reference design to the community. In this presentation, TYAN will share its value proposition with the community and reveal its future product plans and milestones to the audience of the first OpenPOWER Summit 2015.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 15:45 - 16:00
Location: Room 220C
View Recording

ECS5014 - Early Stage Challenge: GIS Federal

Nima Negahban CTO, GIS Federal
Nima Negahban
Nima has developed innovative big data systems across a wide spectrum of market sectors, ranging from biotechnology to high-speed trading systems. He was the core developer and architect of GPUdb, working in conjunction with Amit Vij to revolutionize large-scale distributed data processing using many-core devices. As part of the management team, Nima helps define strategic goals and is actively involved in GIS Federal business operations. He holds a BS in computer science from the University of Maryland.

GIS Federal started in 2009 with two partners and no external funding. The company started as a consulting services company to government organizations, mainly the US Army, providing Subject Matter Expert support for geospatial intelligence and analyzing streaming big data. The two partners created GPUdb in 2009, a distributed, GPU-accelerated database with a built-in rendering pipeline. The company uses the profits from its consulting services to help fund the development of GPUdb.

Level: All
Type: Talk
Tags: Big Data Analytics; Supercomputing; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:50 - 15:58
Location: Room 220B

ECS5015 - Early Stage Challenge: NE Scientific LLC

Andrea Borsic CEO, NE Scientific LLC
Andrea Borsic
Andrea Borsic is the CEO and founder of NE Scientific LLC. Andrea has substantial experience in Medical Imaging, Computational Physics, Visualization, and High Performance Computing on GPUs, and he is the recipient of a National Cancer Institute small business grant directed at the development of a computerized platform for guiding a specific surgical intervention for treating cancer. Dr. Borsic was a faculty member at the Thayer School of Engineering at Dartmouth from 2006 to 2012, where he conducted research in the area of Computer Science applied to Medicine, developing new modalities and algorithms for breast and prostate cancer imaging and detection, and taught courses in the fields of Medical Image Analysis, Visualization, and GPU Accelerated Computing. Dr. Borsic holds a PhD in Electronic Engineering from Oxford Brookes University, UK, and an MSc and BSc in EE from the Polytechnic of Turin, Italy.

NE Scientific is a startup company founded in 2012, operating in the fields of Scientific Computing, Medical Image Analysis, Modeling, and Visualization. NE Scientific is currently engaged in projects for grading prostate cancer with Magnetic Resonance Imaging, and for imaging lungs in the intensive care unit with a novel imaging modality based on sensing tissues with low-intensity electric fields. NE Scientific collaborates with Dartmouth College in the US and with the University of Manchester in Europe. Recently NE Scientific has been sponsored by the National Cancer Institute to develop an intraoperative guidance platform for a specific cancer treatment called Radio Frequency Ablation, in which radio-frequency energy is used to kill malignant tissues by thermal ablation. In developing a planning and guidance platform for this application, GPUs play a critical and enabling role: to guide the intervention, a coupled electrical/thermal partial differential equation must be solved several times per second. NE Scientific was able to secure funding by successfully demonstrating accelerated prototypes of the main algorithms involved.

Level: All
Type: Talk
Tags: Medical Imaging; Computational Physics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 16:00 - 16:08
Location: Room 220B

S5126 - GPU Accelerated Backtesting and Machine Learning for Quant Trading Strategies

Daniel Egloff Partner InCube Group, Managing Director QuantAlea, InCube Group and QuantAlea
Highly-Rated Speaker
Daniel Egloff
In 2008 Daniel Egloff set up his own software engineering and consulting company and founded QuantAlea by the end of 2009. Since then he has advised several high profile clients on quantitative finance, software development and high performance computing. In 2014 QuantAlea and InCube merged and he became partner of InCube Group and Managing Director of QuantAlea. He is a well-known expert in GPU computing and parallel algorithms and successfully applied GPUs in productive systems for derivative pricing, risk calculations and statistical analysis. Before setting up his own company he had spent more than fifteen years in the financial service industry, where his work revolved around derivative pricing, risk management with a special focus on market and credit risk, and high performance computing on clusters and grids. He studied mathematics, theoretical physics and computer science at the University of Zurich and the ETH Zurich, and has a Ph.D. in mathematics from the University of Fribourg, Switzerland.

In algorithmic trading, large amounts of time-series data are analyzed to derive buy and sell orders so that the strategy is profitable while risk measures stay at an acceptable level. Bootstrapped walk-forward optimization is becoming increasingly popular as a way to avoid curve fitting and data snooping. It is computationally extremely expensive, but distributes very well across a GPU cluster. We present a framework for bootstrapped walk-forward optimization of trading strategies on GPU clusters, which allows us to analyze strategies in minutes instead of days. Moreover, we show how signal generation can be combined with machine learning to make strategies more adaptive, further improving robustness and profitability.
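The walk-forward idea described above can be sketched independently of any GPU framework: roll an in-sample optimization window forward through the series, bootstrap each window to reduce curve-fitting risk, and score the chosen parameter only on unseen data. The stand-in objective and parameter set below are purely illustrative, not the presenters' actual strategy.

```python
import random

def walk_forward_splits(n, in_sample, out_sample):
    """Yield (train, test) index ranges for rolling walk-forward windows."""
    start = 0
    while start + in_sample + out_sample <= n:
        yield (range(start, start + in_sample),
               range(start + in_sample, start + in_sample + out_sample))
        start += out_sample

def bootstrap(indices, rng):
    """Resample a window with replacement (the 'bootstrapping' step)."""
    idx = list(indices)
    return [rng.choice(idx) for _ in idx]

def best_param(prices, train, params, rng, n_boot=20):
    """Pick the parameter that scores best across bootstrap resamples.
    The objective here is a toy stand-in, not a real trading signal."""
    scores = {p: 0.0 for p in params}
    for _ in range(n_boot):
        sample = bootstrap(train, rng)
        for p in params:
            scores[p] += sum(prices[i] for i in sample) / p
    return max(scores, key=scores.get)

rng = random.Random(0)
prices = [100 + rng.gauss(0, 1) for _ in range(300)]
for train, test in walk_forward_splits(len(prices), in_sample=100, out_sample=50):
    p = best_param(prices, train, params=[5, 10, 20], rng=rng)
    oos = sum(prices[i] for i in test) / len(test)  # out-of-sample score only
    print(f"window ending {test[-1]}: param {p}, OOS mean {oos:.2f}")
```

Because each window and each bootstrap resample is independent, this loop is embarrassingly parallel, which is why the workload maps so well onto a GPU cluster.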

Level: All
Type: Talk
Tags: Finance; Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210C
View Recording
View PDF

S5144 - Cluster Monitoring and Management Tools

Rajat Phull Software Engineer, NVIDIA
Rajat Phull
Rajat is a software engineer on the NVIDIA team responsible for monitoring and management tools. His primary focus is the NVML library and the associated suite of nvidia-smi utilities. He has been working on GPU-related projects for 5+ years.
Rob Todd Software Engineer, NVIDIA
Rob is a software engineer on the NVIDIA team responsible for monitoring and management tools. His primary focus is the NVIDIA Validation Suite. He has been working on HPC-related projects for 10+ years.

Learn about the monitoring and management tools that NVIDIA provides for professional GPUs in HPC, cluster and datacenter environments. This talk will provide a high level overview of the relevant APIs and utilities, dive more deeply into several new features from recent CUDA releases, and review how this functionality is integrated into user environments and 3rd-party software.

Level: Beginner
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Supercomputing

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210E
View Recording
View PDF

S5187 - Real-Time Camera Tracking in the "1st & 10" System

Louis Gentry Principal Software Engineer, Sportvision
Louis Gentry
Louis Gentry is the lead developer for football, emerging sports, and core technologies at Sportvision and is responsible for the design and implementation of real-time broadcast rendering platforms. He has years of prior experience working in computer graphics and video for SGI, Pinnacle Systems, and other companies. In the last ten years at Sportvision, Louis has designed and implemented key technologies and systems used on broadcasts for ESPN, FOX, ABC, NFL Network, CBS, and other clients.
Rand Pendleton Senior Scientist and Advisor, Sportvision
Rand Pendleton is a Senior Scientist and Advisor at Sportvision, assisting on various development projects with an emphasis on field deployment, camera tracking, and algorithms. Prior to Sportvision, Rand worked in defense-related contract research, as a high-power microwave consultant, as an engineering physicist at the Stanford Linear Accelerator Center, and as a microwave tube engineer at Varian Associates.

Sportvision's "1st & 10" real-time system for displaying graphics during American football games has traditionally relied on hardware to calibrate and compute camera parameters necessary for inserting the "yellow line" and other effects into the scene. The hardware solution is limited to lock-down, broadcast cameras only. The vast compute power available in GPUs today provided a means for expanding the system to support both lock-down and mobile cameras without the need for hardware sensors. In this presentation, we will discuss how the optical camera tracking system works and its use on live NFL broadcasts.

Level: All
Type: Talk
Tags: Media & Entertainment; Augmented Reality & Virtual Reality; Video & Image Processing; Real-Time Graphics

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL21D
View Recording
View PDF

S5224 - Unleashing The Power Of GPUs Over The Web

Vishal Vaidyanathan Partner, Royal Caliber
Highly-Rated Speaker
Vishal Vaidyanathan
Vishal Vaidyanathan graduated from Stanford University in 2007 with a Ph.D. in Computational Chemistry and an M.S. in Financial Mathematics. He developed the first Folding@Home client that used GPUs to accelerate biomolecular simulations by 50 times over what was previously possible. From 2007-2009 Vishal worked at Goldman Sachs developing the first fully automated high frequency trading solution for the US Treasury desk in New York. Subsequently as co-founder of a startup in Silicon Valley, he developed low-latency trading systems and HFT strategies for futures contracts. Vishal joined Royal Caliber as a partner in 2012 where he has developed HPC solutions across a wide spectrum of application areas.

GPUs have demonstrated regime-changing performance in a wide variety of applications. But there remain many engineering challenges to the adoption of GPUs in the mainstream, especially when operating at scale. We present a new infrastructure that provides a suite of GPU-driven machine learning and graph algorithms as a web service. The effortless usability of an HTTP API unlocks the power of GPU computing with none of the attendant complexities. As examples, we will show interactive analytics on web-scale graphs and deep learning on large data sets using nothing more than a modern web browser.

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Data Center, Cloud Computing & HPC; Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room 210D
View Recording

S5226 - Single Precision Hybrid Model for Molecular Dynamics Simulations

Ross Walker Associate Professor, University of California San Diego
Ross Walker
Ross Walker is an Associate Research Professor at the San Diego Supercomputer Center, an Adjunct Associate Professor in the Department of Chemistry and Biochemistry at the University of California, San Diego, and an NVIDIA Fellow. He runs the Walker Molecular Dynamics Lab in San Diego where he leads a team that develops advanced techniques for Molecular Dynamics Simulations supporting work aimed at improved drug and biocatalyst design. His work includes leading the development of new force fields for simulation of lipid membranes, improved Quantum Mechanical/Molecular Mechanical models, automated force field parameter refinement techniques and the development of the world's fastest GPU accelerated Molecular Dynamics software released as the AMBER Molecular Dynamics engine PMEMD.
Scott LeGrand Senior Engineer, Amazon
Highly-Rated Speaker
Scott Le Grand is currently a principal engineer at Amazon working on the personalization team. He developed the first molecular modeling system for home computers, Genesis, in 1987; Folderol, the distributed computing project targeted at the protein folding problem, in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar, the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts, which currently accounts for ~2.6 petaFLOPS of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is optimizing the performance of deep neural networks by day and continuing to work on AMBER by night.

In this talk we will highlight our work developing what we term the SPXP precision model. This is the first fully single- and fixed-precision hybrid model to provide conservation of energy in MD simulations equivalent to full double-precision runs, but without the need for double-precision arithmetic. By exploiting the nature of fixed-precision arithmetic and custom machine-code accumulator functions, we can effectively emulate double-precision behavior on the latest generation of GPUs.
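The key property of fixed-precision accumulation mentioned above can be illustrated in a few lines. This is a hypothetical sketch, not the actual SPXP code: each contribution is scaled to a 64-bit integer, so the accumulated sum is exact and independent of summation order, unlike a single-precision floating-point running sum whose result depends on the order in which GPU threads add their terms.

```python
import random

SCALE = 1 << 40  # fixed-point scale factor (chosen here for illustration)

def to_fixed(x):
    """Convert a float to 64-bit fixed-point representation."""
    return int(round(x * SCALE))

def from_fixed(i):
    """Convert a fixed-point value back to a float."""
    return i / SCALE

rng = random.Random(42)
forces = [rng.uniform(-1.0, 1.0) for _ in range(100000)]

# Integer addition is associative, so any summation order (e.g. across
# GPU threads) gives a bit-exact, reproducible result.
acc_fwd = sum(to_fixed(f) for f in forces)
acc_rev = sum(to_fixed(f) for f in reversed(forces))
assert acc_fwd == acc_rev

print(from_fixed(acc_fwd))
```

This order-independence is what makes energy conservation checks meaningful on massively parallel hardware: the same simulation produces the same trajectory regardless of thread scheduling.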

Level: All
Type: Talk
Tags: Life & Material Science; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room 212A
View Recording

S5235 - Displacement Mapping: a New Method to Achieve Realistic Geo-Specific Feature Densities

Brett Chladny Project Manager, Renaissance Sciences Corporation
Brett Chladny
With over 16 years of experience, Brett Chladny has worked on multiple programs requiring his design, development, and management skills. Brett has a Master's degree in Computer Science from the University of Missouri-Columbia and began his career at Silicon Graphics. He then worked for MultiGen-Paradigm, where he became technical lead for the F-16 NVG and IR program at Luke AFB. Brett currently works for Renaissance Sciences, developing methods to improve the physical accuracy of sensor and out-the-window visual simulations. He is also the Principal Investigator for an ongoing Navy Phase II SBIR project.

Explore new techniques for identifying, representing, and rendering realistic numbers of geo-specifically placed features in synthetic environments. Adding 3D models to synthetic environments greatly enhances the visual cues that enable the perception of depth, motion, and realism. However, constraints in hardware performance and budgets often limit the number of 3D features in the scene. This session presents an innovative automated process that leverages geospatial data sources and GPU tessellation technologies to inject realistic numbers of features. A framework for extracting feature information from commonly available data will be discussed. We will also explore a new library that uses the power of modern GPUs to achieve near-constant rendering performance regardless of feature density.

Level: All
Type: Talk
Tags: Real-Time Graphics; Visualization - Large Scale & Multi-Display

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL21B
View Recording
View PDF

S5336 - VDI Evolution at the Speed of GRID - VDI 2.0 IS Here! (Presented by Cisco)

Shawn Kaiser Technical Solutions Architect, Cisco Systems
Shawn Kaiser
Shawn Kaiser is a member of the Data Center Solutions Architecture team, focusing on the Virtualization Competency. Shawn's primary background is in Virtual Infrastructure Architecture and Design, where he designed and ran one of the first production virtualization environments with VMware ESX 1.5. He has acted as a consultant for a number of years, helping customers realize the benefits of virtualization through many assessments, designs, and implementations. Shawn's primary role at Cisco is to support other Sales Engineers and customers around opportunities for Virtualization and VDI with the Unified Computing System (UCS).
Jason Marchesano Technical Solutions Architect, Cisco Systems, Inc.
Jason Marchesano
Jason Marchesano is a Technical Solutions Architect for Cisco Systems, Inc., focused on server and desktop virtualization on x86 platforms. Jason has experience with server, storage, and networking infrastructure architectures, having worked as a customer and then as a systems integration consultant in the IT field before joining Cisco Systems.

2015 will be the year for VDI and desktop virtualization. Learn how Cisco and NVIDIA have partnered to deliver the next-generation desktop virtualization solution with GRID vGPU acceleration. VDI is not the same animal it used to be: user requirements and expectations have changed, as have the operating systems and applications that feed the beast. Join industry solution experts Shawn Kaiser and Jason Marchesano to discuss how you can evolve to VDI 2.0 and literally put the past to rest.

Level: All
Type: Talk
Tags: Graphics Virtualization; Press-Suggested Sessions: Professional Graphics

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210F
View Recording
View PDF

S5385 - Benchmarking 3D Workloads at Scale on NVIDIA GRID with Horizon View 6 Using View Planner (Presented by VMWare)

Banit Agrawal Senior Performance Engineer, VMware
Dr. Banit Agrawal is a Sr. Performance Engineer at VMware. He has filed several patents in the areas of VMware View, 3D graphics, remote display protocols, VMware View Planner, and performance troubleshooting. He has given numerous talks at VMworld and external academic conferences. Recently he has been focusing on Docker container performance. He holds a PhD in Computer Science from the University of California, Santa Barbara.
Luke Wignall GRID Performance Engineering Manager, NVIDIA
Highly-Rated Speaker
Luke Wignall
Luke came to NVIDIA after working as an owner of an integrator/VAR, as a sales engineer, solution architect, consultant, and system administrator with both VMware and Citrix technologies in both public and private industry. An early evangelist of virtualization, Luke now sees the ability to bring GPU to the end user experience as the missing "special sauce" that brings virtual desktops to the next level.
Lan Vu Performance Engineer, VMware
Dr. Lan Vu works on performance engineering at VMware, focusing on optimizing the performance and scalability of 3D graphics virtualization solutions. Prior to joining VMware, she worked at the Parallel Distributed Systems Lab at the University of Colorado Denver for 5 years, with a research focus on high-performance methods in data mining. She has been involved in multiple GPU-related projects and has published two peer-reviewed papers on GPGPU. She holds a Ph.D. in Computer Science & Information Systems from the University of Colorado Denver.

If you are looking for guidance and a tool for scaling 3D workloads in Horizon 6 with View on NVIDIA GRID GPUs, you have come to the right session. In this session, we provide a deep dive on scale testing of various 3D workloads using the View Planner 3.5 tool. View Planner 3.5 is a capacity-planning tool that supports real user workloads, including office applications, video, audio, and interactive (mouse) tests; characterizes the true user experience for desktops; and offers a bring-your-own-applications (BYOA) feature. Using BYOA, we show how you can quickly characterize your GRID GPU and obtain scaling results for different 3D workloads and benchmarks while meeting the desired user experience.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room LL20C
View Recording

S5457 - Maximizing Face Detection Performance on GPUs

Paulius Micikevicius Developer Technology Engineer, NVIDIA
Paulius Micikevicius is a developer technology engineer at NVIDIA, focusing on performance analysis and optimization. Prior to joining NVIDIA he was an assistant professor of Computer Science at Armstrong Atlantic State University. Paulius holds a PhD in Computer Science from the University of Central Florida and a BSc from Midwestern State University.

In this talk we look at GPU performance optimization for face detection using various techniques and features, including cascades with Haar-like features and multi-block local binary patterns. For each approach we examine implementation tradeoffs and performance limiters, as well as how performance depends on the data. We also investigate optimization by combining the approaches and by additional pruning of work.
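The core data structure behind Haar-like features is the integral image (summed-area table), which lets the sum over any rectangle be evaluated in four lookups, independent of rectangle size. The sketch below is a generic illustration of that idea, not NVIDIA's GPU implementation; the feature layout is the classic two-rectangle case.

```python
def integral_image(img):
    """img: 2D list of pixel values; returns an (h+1) x (w+1) summed-area table."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row = 0  # running sum of the current image row
        for x in range(w):
            row += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in rectangle [x, x+w) x [y, y+h) via 4 table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    return rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)

img = [[1] * 8 for _ in range(8)]      # uniform image -> feature responds 0
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 8, 8))   # → 0
```

On a GPU, thousands of candidate windows evaluate such features in parallel against the same shared integral image, which is why memory-access patterns and early cascade rejection dominate the performance tradeoffs the talk examines.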

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210B
View Recording
View PDF

S5515 - Porting Apps to Titan: Results from the Inaugural GPU Hackathon

Mi Sun Min Computational Scientist, Mathematics and Computer Science Division, Argonne National Laboratory
Mi Sun Min
Mi Sun Min has been a Computational Scientist in the MCS Division of Argonne National Laboratory since 2011.
Fernanda Foertter HPC User Assistance Specialist, Oak Ridge National Laboratory
Highly-Rated Speaker
Fernanda Foertter
Fernanda Foertter is a member of the User Assistance Team at the National Center for Computational Sciences (NCCS) located at Oak Ridge National Laboratory (ORNL). This team is responsible for assisting all users at the Oak Ridge Leadership Computing Facility (OLCF). Fernanda is responsible for the training program at the center and represents OLCF at both the OpenACC and OpenMP organizations.
Adam Simpson To come, Oak Ridge National Laboratory
Adam Simpson is a member of the User Assistance team at the National Center for Computational Sciences (NCCS) located at Oak Ridge National Laboratory (ORNL). Adam's primary role involves user support for all OLCF compute resources with a particular focus on Titan's 18,688 GPUs. When not answering user questions Adam is typically busy creating GPU documentation and tutorials.
Steven Young Postdoctoral Researcher, Oak Ridge National Laboratory
Steven Young
Dr. Steven R. Young holds a PhD in Computer Engineering and his research focuses on the field of machine learning. His dissertation work included modeling an analog implementation of a deep learning method and analyzing the effects upon performance. He has worked on projects using deep learning for various computer vision tasks including classification and detection.
Seth Johnson R&D Staff, Monte Carlo Methods , Oak Ridge National Laboratory
Seth received his Ph.D. in Nuclear Engineering and Radiological Sciences from the University of Michigan, Ann Arbor. He also has an M.S.E. in Nuclear Engineering and Radiological Sciences from the University of Michigan, Ann Arbor and a B.S. in Nuclear Engineering (minor in Mathematics), Summa Cum Laude, from Texas A&M University, College Station.

This session will showcase the results of the inaugural GPU Hackathon held at the Oak Ridge Leadership Computing Facility. The event hosted six teams paired with mentors over a week, during which applications were ported to GPUs using OpenACC directives. The talk will describe the progress of each team from beginning to end, as well as details of their implementations. Best practices, lessons learned, and anecdotes from the mentors who participated in this training event will be shared.

Level: All
Type: Talk
Tags: OpenACC; Developer - Programming Languages; Supercomputing

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210H
View Recording
View PDF

S5536 - Bringing Physically Based Rendering to your Application

Martin-Karl Lefrancois DevTech Lead, NVIDIA
Martin-Karl Lefrançois is a senior graphics software engineer at NVIDIA ARC in Berlin, where the team under his lead constantly explores the best use of NVIDIA ARC products. After graduating with a degree in computer science and mathematics from the University of Sherbrooke in Quebec, he worked as a graphics developer for nearly ten years at Softimage in Montreal and Tokyo, led the core game engine team at A2M, and spent over a decade at mental images before joining NVIDIA in Berlin.

A study in how key applications have incorporated NVIDIA® Iray® and how you can do the same. The presentation begins with an overview of NVIDIA Iray, then shows examples of Iray integrations, an overview of the supported geometry types, an introduction to MDL, and rendering with the NVIDIA Visual Computing Appliance. Come to this session to learn how to create a custom rendering experience that fits your application.

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Product Design & Styling

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room LL21E
View Recording
View PDF

S5610 - High Performance In-Situ Visualization with Thousands of GPUs

Evghenii Gaburov Supercomputing, SURFsara (CUDA Research Center)
Highly-Rated Speaker
Evghenii Gaburov
Evghenii Gaburov received his MPhys with Astrophysics from the University of Leicester. He continued with Ph.D. research at the University of Amsterdam, working on stellar dynamics, stellar collisions, and parallel processing on GPUs. Afterwards he spent two years at Leiden Observatory investigating the impact of strong magnetic fields on accretion disks around supermassive black holes, and continued this research at Northwestern University on the prestigious CIERA and NASA Hubble postdoctoral fellowships. He later joined SURFsara, the Dutch national supercomputing center, where he helps researchers harness the computing power packed into modern parallel processors.
Jeroen Bédorf Postdoc, CWI
Jeroen Bédorf received his MSc in Computer Science at the University of Amsterdam after which he continued to do a Ph.D. at Leiden University in Astronomy. His research was focused on fast N-body methods for GPU accelerators and merging galaxies. After obtaining his Ph.D. he started a postdoc at the CWI (Centre for Math and Computer Science) in Amsterdam where he works on tomographic image reconstructions.

In-situ visualization is one of the major themes in HPC. The ability to attach a massively parallel visualization tool to a live simulation can be valuable to researchers whose simulations may last days or even weeks on a supercomputer. High-performance in-situ visualization allows researchers to visualize, interact with, and analyze data in real time, enabling an efficient and intuitive discovery process. The ability of Tesla GPUs to compute and render simultaneously enables a wide range of high-performance in-situ analysis scenarios with little overhead. In this talk we'll present our first attempt at this using the US Titan and Swiss Piz Daint supercomputers, along with our solutions for the rendering pipeline that allowed us to achieve ~10 fps on 1,024 GPUs.

Level: Intermediate
Type: Talk
Tags: Visualization - In-Situ & Scientific; Supercomputing; Astronomy & Astrophysics

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL21C
View Recording

S5628 - Simulation-Based CGI for Automotive Applications

Benoit Deschamps CGI Solutions - Team Leader, PSA Peugeot Citroën
Benoit Deschamps
Benoit received his Master's degree in Imaging & Multimedia from the University of Bordeaux. Prior to becoming Team Leader for Imaging Solutions at PSA Peugeot Citroën, Benoit worked as a Project Manager for Imaging Solutions at PSA and as a Development Engineer for an automotive company.

To reduce the gap between a physical mock-up and a virtual mock-up, a combination of real-time rendering and simulation enables better decision making. Leveraging NVIDIA OptiX to develop specific automotive tools, we are able to run simulations and visualize solutions to a wide range of problems, such as finding the vehicle geometry that best minimizes gravel impact on the door. In addition, tools such as RTT DeltaGen enable photoreal results that help us experiment with and visualize changing vehicle designs; for example, when changing the slope of the windshield, how are elements inside the car affected by the reflective properties of the glass?

Level: All
Type: Talk
Tags: Manufacturing; Automotive; Rendering & Ray Tracing; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL21A
View Recording
View PDF

S5637B - ZFAS - The Brain of Piloted Driving at Audi

Matthias Rudolph Head of Architecture Driver Assistance Systems, Audi AG
Matthias Rudolph
Dr. Rudolph studied Electrical Engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics, with a minor in mathematics, from Iowa State in 1999. After holding various positions at Audi, he took over leadership of the department "Architecture Driver Assistance Systems" in 2009. The zFAS project is one of the department's core developments. Dr. Rudolph is a member of the management at Audi.

During the last several years, Audi and its partners have developed a platform that enables piloted driving and piloted parking. At CES 2015, the system was shown driving piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the NVIDIA Tegra K1 as its core element. This talk will start with the history of and motivation for piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL20D
View Recording

S5694 - Key-Value Store Acceleration with OpenPOWER

Michaela Blott Principal Engineer, Xilinx
Michaela Blott graduated from the University of Kaiserslautern in Germany. She has worked in research institutions (ETH and Bell Labs) as well as development organizations, and was deeply involved in large-scale international collaborations such as NetFPGA-10G. Her expertise spans high-speed networking, emerging memory technologies, data centers, and distributed computing systems, with a focus on FPGA-based implementations. Today she works as a principal engineer at Xilinx Labs in Dublin, heading a team of international researchers. Her key responsibility is exploring applications, system architectures, and new design flows for FPGAs in data centers.

Distributed key-value stores such as memcached form a critical middleware application within today's web infrastructure. However, typical x86-based systems yield limited performance scalability and high power consumption, as their architecture, optimized for single-thread performance, is not well matched to the memory-intensive and parallel nature of this application.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 16:00 - 16:15
Location: Room 220C
View Recording

ECS5016 - Early Stage Challenge: SYSTAP, LLC

Brad Bebee CEO, SYSTAP, LLC
Brad Bebee
Brad has a passion for helping customers navigate complex technology and business challenges and for delivering products and solutions that solve them quickly and effectively. He has focused on participating in and running businesses that apply novel and advanced technology solutions to new mission and business problems. Over the course of his career, he has served as a CTO and CFO, managed operating divisions, and performed advanced technology development for commercial and government customers. His technology experience ranges from early work in modeling methodologies and knowledge representation, dating back to precursors of DARPA's DAML program, to more recent work with large-scale data analytics using the Hadoop ecosystem, Accumulo, and related technologies. In his current role with SYSTAP, LLC, he is focused on bringing products for high-performance graph databases and analytics into business and mission areas. He specializes in the architecture, design, and development of advanced technology systems; in this capacity, he has directed the design and implementation of multiple system development efforts and has led and overseen multiple advanced technology development programs ranging from research pilots to enterprise agency deployments. Brad has extensive experience in architecture and software modeling methodologies, where he has led and collaborated on multiple publications, receiving recognition for his research. In 2006, he was selected as a participant in the National Academy of Engineering's 2006 U.S. Frontiers of Engineering Symposium.

SYSTAP, LLC was founded with the vision of building high-quality, highly scalable software solutions for big graphs. While graph problems may look similar to other big data challenges from the outside, they have very different computational workloads and scaling requirements. Techniques that work at a small scale will often fail to deliver on larger graphs. SYSTAP's solutions fill the gap created by this "big graph anti-pattern". We believe the only way to get scaling and high throughput for graph traversal and graph mining is to get the architecture, the software, and the hardware right. SYSTAP's MapGraph is a new and disruptive technology for organizations that need to process large graphs in near-real time using GPUs. It cost-effectively brings the capabilities of high-performance computing (HPC) to your organization's biggest and most time-critical graph challenges. MapGraph provides a familiar vertex-centric graph programming model, but its GPU acceleration is hundreds of times faster than competing CPU-only technologies and up to 50,000 times faster than graph technologies based on key-value stores such as HBase, Titan, and Accumulo.
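
The vertex-centric ("think like a vertex") model mentioned above can be illustrated with a toy CPU example; this is a generic sketch of the pattern, not MapGraph's actual API, and the function name is hypothetical:

```python
# Hypothetical illustration of the vertex-centric model: in each
# superstep, every active vertex pushes updates along its edges.
# Here the computation is a frontier-based breadth-first search.

def vertex_centric_bfs(adj, source):
    """adj: dict vertex -> list of neighbors. Returns hop distances."""
    dist = {v: None for v in adj}
    dist[source] = 0
    frontier = {source}
    step = 0
    while frontier:
        step += 1
        nxt = set()
        for u in frontier:            # each active vertex...
            for v in adj[u]:          # ...scatters along its edges
                if dist[v] is None:
                    dist[v] = step
                    nxt.add(v)
        frontier = nxt
    return dist

adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(vertex_centric_bfs(adj, 0))  # → {0: 0, 1: 1, 2: 1, 3: 2}
```

On a GPU, the inner scatter loop is what gets parallelized across threads; the frontier set is the list of active vertices for the next superstep.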

Level: All
Type: Talk
Tags: Big Data Analytics; Data Center, Cloud Computing & HPC; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 16:10 - 16:18
Location: Room 220B

S5688 - PGI Compilers for OpenPOWER Platforms

Douglas Miles Director, PGI Compilers & Tools
Doug Miles has been director of PGI Compilers & Tools since 2003. Prior to joining PGI in 1993, he was an applications engineer at Cray Research Superservers and Floating Point Systems.

High-performance computing (HPC) systems are now built around a de facto node architecture: high-speed, latency-optimized, SIMD-capable CPUs coupled to massively parallel, bandwidth-optimized accelerators. In recent years, as many as 90% of the Top 500 systems relied entirely on x86 CPUs. OpenPOWER and the increasing success of accelerator-based systems offer an alternative that promises unrivaled multi-core CPU performance and closer coupling of CPUs and GPUs through technologies like NVIDIA's NVLink high-speed interconnect.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 16:15 - 16:30
Location: Room 220C
View Recording

ECS5017 - Early Stage Challenge: INTEMPORA

Xavier Rouah CTO, INTEMPORA
Xavier Rouah
Xavier Rouah is CTO at INTEMPORA SA. A graduate engineer in embedded systems, he worked with French car manufacturers on experiments with cooperative intelligent transport systems. This professional background helped him understand the needs and complexity of advanced driving assistance systems. Since joining INTEMPORA, Xavier has worked on porting INTEMPORA's RTMaps software to embedded systems. To meet the growing compute needs of intensive image processing algorithms for driving assistance systems, he started working on heterogeneous architectures with scientists and experts from car manufacturers. He is responsible for designing the architecture for integrating intensive computation technologies into RTMaps.

Intempora is an independent software vendor and the provider of the RTMaps technology. RTMaps is a modular software development and execution platform for the design of real-time, heterogeneous, multi-sensor applications and systems. Intempora is strongly present in Advanced Driving Assistance Systems (ADAS) R&D activities up to autonomous car projects (with customers such as Renault, PSA, Valeo, Honda, ESG, SAIC Motor, … and numerous research labs such as INRIA, CEA, IFSTTAR, VEDECOM, DLR, Shanghai Jiao Tong, …), as well as in the robotics domain (THALES, DGA, DCNS, Airbus Group, …), cognitive load assessment, and advanced multimodal HMI development. RTMaps and the Intempora team accompany researchers and engineers in all stages of their ADAS and autonomous vehicle software development process:
• Datalogging and data playback of multiple high-bandwidth sensor data streams (cameras, CAN & LIN bus, GPS, lidars, radars, IMUs, etc.)
• Data management
• Real-time or offline data processing and data fusion for perception
• Navigation and communication
• Decision making
• Command-control
• Multimodal HMI development
• Human factors studies
• Validation and benchmarking

Level: All
Type: Talk
Tags: Automotive; Video & Image Processing; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 16:20 - 16:28
Location: Room 220B

ECS5018 - Early Stage Challenge: Vote & Award Ceremony

Share in the excitement: come see the Early Stage Challenge participants and help the judges select the winning start-up company!

Level: All
Type: Talk
Tags: Emerging Companies Summit

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 220B

S5176 - Fast Triangle Counting for Social Network Analytics on the K40

Oded Green COO, ArrayFire
Oded Green
Oded Green received a Ph.D. in Computational Science and Engineering from Georgia Tech. Oded has an MSc in electrical engineering and a BSc in computer engineering, both from the Technion. Oded's work has focused on optimizing and designing algorithms for high performance computing systems. Oded recently joined ArrayFire as COO.

In this session we will explore a new approach for counting triangles in networks that partitions the work at multiple parallel granularities. This new approach is highly scalable and is appropriate for both sparse and dense networks.
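
The core of triangle counting can be illustrated with a serial set-intersection sketch; the multi-granularity parallel partitioning is the talk's contribution and is not shown here, and the function name is ours:

```python
# Hypothetical sketch: exact triangle counting by adjacency-set
# intersection. For each edge (u, v) with u < v, the common
# neighbors w > v close a triangle, so each triangle is counted once.

def count_triangles(adj):
    """adj: dict mapping node -> set of neighbors (undirected graph)."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbor of u and v closes one triangle.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total

# A 4-clique contains C(4,3) = 4 triangles.
clique = {i: {j for j in range(4) if j != i} for i in range(4)}
print(count_triangles(clique))  # → 4
```

On a GPU, the per-edge intersections are independent, which is what makes the work partitionable at the vertex, edge, and intersection granularities.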

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 210D
View Recording
View PDF

S5319 - CUDA in Urban Search and Rescue: Mission Planning Module for the ICARUS Project

Pawel Musialik Programmer and Young Researcher, Institute of Mathematical Machines
Pawel Musialik
Pawel is a graduate of the Warsaw University of Technology and currently a Ph.D. candidate at the Military University of Technology in Warsaw. Since February 2012 Pawel has been a young scientist and programmer at the Institute of Mathematical Machines. His current research topics are semantic maps, 3D point cloud analysis, and quantitative and qualitative reasoning. Pawel possesses over 5 years of C++ experience, 2 years as a CUDA programmer, and 4 years of experience in academic lecturing.

This session will concentrate on mission planning for search and rescue personnel and how CUDA can help in this task. Urban search and rescue is a challenging and important activity in today's society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) concentrates on aiding these efforts by providing robotic components for rescue teams. Adding mobile robots into the fray raises the need for additional planning effort, which would consume a lot of time using classical approaches. We will present how this can be avoided by using CUDA-based mission planners for solving tasks like path planning, patrol routing, communication relay location, etc. A number of CUDA-implemented algorithms will be shown along with example results.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 210A
View Recording
View PDF

S5330 - GPU-Accelerated Finite Element Analysis and Design Optimization on the Cloud

Krishnan Suresh Associate Professor, University of Wisconsin, Madison
Highly-Rated Speaker
Krishnan Suresh
Krishnan Suresh is currently an Associate Professor in the Department of Mechanical Engineering, University of Wisconsin, Madison. He graduated in 1998 from Cornell with a Ph.D. in Mechanical Engineering. He later served as an Engineering Manager at Kulicke and Soffa Industries, Philadelphia from 1998 through 2002. He has received numerous peer-reviewed grants, including the prestigious NSF Career award in 2007. His research interests are in topology optimization, finite element analysis and high-performance computing. He has co-authored over 35 journal papers, and several conference papers, two of which have received best-paper awards from ASME.

The audience will learn how GPUs can accelerate cloud-based finite element analysis and design optimization. The computational challenges underlying such tasks will be discussed, followed by their solution through fast GPU linear solvers. A case-study involving integration of massively parallel GPU computing with modern browser technology will demonstrate and identify new frontiers in engineering.
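
The sparse linear solves that dominate finite element analysis can be illustrated with a conjugate gradient iteration, the classic Krylov method that GPU solver libraries accelerate; this is a generic textbook sketch, not the speaker's implementation:

```python
# Generic conjugate gradient sketch for the symmetric positive
# definite systems K u = f that arise in finite element analysis.
# GPU solvers accelerate the matrix-vector products and reductions.

import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# 1-D stiffness-like tridiagonal SPD system as a stand-in for K.
A = np.diag([2.0] * 5) + np.diag([-1.0] * 4, 1) + np.diag([-1.0] * 4, -1)
b = np.ones(5)
u = conjugate_gradient(A, b)
print(np.allclose(A @ u, b))  # → True
```

In practice the matrix is stored in a sparse format and the `A @ p` product is the kernel that benefits most from GPU bandwidth.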

Level: Intermediate
Type: Talk
Tags: Manufacturing; Computational Physics; Data Center, Cloud Computing & HPC; Product Design & Styling

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room LL21A
View Recording

S5363 - Roadmap for Many-Core Visualization Software in DOE

Jeremy Meredith Senior Research Scientist, Oak Ridge National Laboratory
Jeremy Meredith
Jeremy Meredith is a senior research scientist in the Future Technologies Group at Oak Ridge National Laboratory, where his research interests include emerging computing architectures and large-scale visualization and analysis. He is a recipient of the 2008 ACM Gordon Bell Prize and a 2005 R&D100 Award. Jeremy received his MS in Computer Science from Stanford University and his BS from the University of Illinois at Urbana-Champaign. Jeremy is a member of the NVIDIA CUDA Center of Excellence at the Georgia Institute of Technology.

Visualization and data analysis is an important part of the US DOE investment in HPC to solve challenging scientific problems. As HPC systems become more reliant on many-core technology, three DOE projects are addressing various aspects of this challenge. PISTON provides cross-platform algorithms, EAVL provides advanced data models, and Dax provides execution models. This talk will briefly review these projects and highlight some of the successes each project has had. We then discuss our roadmap to consolidate the features of these three frameworks into a unified system called VTK-m.

Level: All
Type: Talk
Tags: Visualization - In-Situ & Scientific; Data Center, Cloud Computing & HPC; Visualization - Large Scale & Multi-Display

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room LL21C
View Recording
View PDF

S5480 - GPU-Optimized Algorithms for Coarse-Grained MD Simulations of Protein-Nanoparticle Biocorona Formation

Samuel Cho Assistant Professor, Wake Forest University
Samuel Cho
Samuel S. Cho is an assistant professor with a joint appointment in the Departments of Computer Science and Physics at Wake Forest University. He has published his interdisciplinary computational biophysics research in MD simulations of protein and RNA dynamics, folding, and assembly in over 30 papers in peer-reviewed journals and conference proceedings, including four as first author in the high-impact-factor journal Proceedings of the National Academy of Sciences. To date, he has received funding from NSF, NVIDIA, and Google. Dr. Cho received B.S. degrees in Biochemistry and Computer Science from the University of Maryland, Baltimore County, where he was also an undergraduate researcher with Alexander D. Mackerell, Jr. He went on to the University of California, San Diego, where he received a Ph.D. in Physical Chemistry with Peter G. Wolynes. He then performed postdoctoral research with Dave Thirumalai at the University of Maryland, College Park, where he was awarded the NIH (NRSA) Postdoctoral Fellowship. His broad interdisciplinary research interests in computational biophysics are focused on biomolecular folding and assembly mechanisms.

We will describe the GPU-optimized algorithms we developed to perform novel coarse-grained MD simulations of 15 apolipoproteins (243 residues each) interacting with a silver nanoparticle represented by 500 individual beads. Advances in nanomedicine, in which drugs are delivered into areas of the cell that were previously inaccessible, are being realized through nanoparticle development, but nanoparticles readily interact with biomolecular species, resulting in biocorona formation and nanotoxicity. We will outline the GPU-optimized neighbor list and cell list algorithms, as well as the bit-wise shift compression algorithms that decrease data transfer between GPUs, that were necessary to perform these MD simulations.
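
The cell list idea mentioned above can be sketched on the CPU; a minimal illustration with hypothetical function names, assuming a cubic periodic box, not the speakers' GPU implementation:

```python
# Hypothetical cell-list sketch: particles are binned into cells of
# side >= the interaction cutoff, so a neighbor search only examines
# the 27 surrounding cells instead of all N particles.

from collections import defaultdict

def build_cell_list(positions, box, cutoff):
    """positions: list of (x, y, z); box: cubic box side length."""
    ncell = max(1, int(box / cutoff))   # cells per dimension
    size = box / ncell
    cells = defaultdict(list)
    for i, (x, y, z) in enumerate(positions):
        key = (int(x / size) % ncell,
               int(y / size) % ncell,
               int(z / size) % ncell)
        cells[key].append(i)
    return cells, ncell

def neighbor_candidates(cells, ncell, key):
    """Particle indices in cell `key` and its 26 periodic neighbors."""
    cx, cy, cz = key
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out.extend(cells.get(((cx + dx) % ncell,
                                      (cy + dy) % ncell,
                                      (cz + dz) % ncell), []))
    return out
```

On a GPU the binning is typically done with a parallel sort by cell index; this sketch only shows the data structure, and it assumes at least three cells per dimension so periodic neighbors do not duplicate.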

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Computational Physics; Developer - Algorithms

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 212A
View Recording

S5642 - Canvas: GPU Image Processing on Giant Surfaces

Thomas Soetens Founder and Research Director, Immersive Design Studios
Thomas Soetens
Thomas Soetens (1972) graduated in 1992 with an MFA in Visual Arts from the St-Lucas School of Arts in Belgium. After practicing as a painter, he co-founded Workspace Unlimited in 2001 and founded Immersive Design Studios in 2007 where he currently acts as its research and development director. Immersive Design Studios is an interdisciplinary design and technology company based in Montreal utilizing the potential of 3D game technology in corporate events, architecture, cultural new-media installations, and real-time collaborative environments. Thomas Soetens has initiated several research projects and workshops in collaboration with an international network of institutions, companies, and universities. He is frequently invited to participate in lectures and presentations and his work has been highlighted in numerous publications.

We will discuss how we are bridging the transition from FPGA- to GPU-based image processing with our proprietary software, CANVAS: a GPU image-processing platform designed for various AV applications including multi-screen warping, blending, pixel-mapping, and color matching. We will present a case study based on a project at Montreal's Bell Centre hockey arena, featuring projections on ice during the 2013 NHL playoffs. The installation required image warping and blending with 12 overlapping projectors, each set of 6 projectors mapping in 6K onto the arena ice. The use of CANVAS allowed for pixel-by-pixel resolution and easy warping and blending, as well as cutting the projector calibration time from 8-12 hours down to just 15 minutes. Attendees will learn how to push the limits of their GPUs.
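
The edge-blending step between overlapping projectors can be sketched as a per-pixel weight ramp; this is a generic linear-ramp illustration with a hypothetical function name, not CANVAS's actual algorithm:

```python
# Hypothetical edge-blend sketch: in the overlap region of two
# projectors, per-pixel weights ramp from 1 to 0 (and 0 to 1) so
# the summed intensity across the seam stays constant.

def blend_weight(x, overlap_start, overlap_end):
    """Left projector's weight at column x; right projector gets 1 - w."""
    if x <= overlap_start:
        return 1.0
    if x >= overlap_end:
        return 0.0
    t = (x - overlap_start) / (overlap_end - overlap_start)
    return 1.0 - t   # linear ramp; real systems use gamma-corrected curves

# Anywhere in the overlap, the two projectors' weights sum to 1.
w_left = blend_weight(150, 100, 200)
print(w_left + (1 - w_left))  # → 1.0
```

On the GPU this weight is applied per pixel in a fragment shader, together with the warp that maps each projector's frame onto the surface.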

Level: All
Type: Talk
Tags: Media & Entertainment; Visualization - Large Scale & Multi-Display; Video & Image Processing

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room LL21D
View Recording
View PDF

S5691 - Reflections on Migrating IBM APP Genomic Workflow Acceleration to IBM POWER8

Chandler Wilkerson Project Lead, Research Computing Support Group, Rice University
Chandler has taken the lead on all IBM POWER related projects within Rice’s Research Computing Support Group since 2008, including a pre-GA deployment of POWER7 servers that turned into a 48-node cluster, Blue BioU. The RCSG team maintains a collection of different HPC resources purchased through various grants, and is experienced in providing as uniform a user experience between platforms as possible.

Migrating any workflow to a new hardware platform generates challenges and requires adaptability. With the transition from POWER7 to POWER8, the addition of PowerKVM obviates the need for VIOS and provides the opportunity to manage virtual machines on the POWER platform in a much more Linux-friendly manner. In addition, a number of changes to Red Hat's Enterprise Linux operating system between versions 6 and 7 (7 being required for full POWER8 support at the time of this project's start) have required modifying the standard processes outlined in the tested IBM solution. This presentation will take attendees through the growing pains and lessons learned while migrating a complex system to a new platform.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 16:30 - 16:45
Location: Room 220C
View Recording

S5681 - Changing the Game: Burst Buffer Technologies

Jeff Sisilli Senior Director of Product Marketing, DataDirect Networks
Jeff Sisilli, senior director of product marketing at DataDirect Networks, has over 12 years of experience creating and driving enterprise hardware, software, and professional services offerings and effectively bringing them to market. Jeff is often quoted in storage industry publications for his expertise in software-defined storage and moving beyond traditional approaches to decouple performance from capacity.

Planning for exascale, accelerating time to discovery and extracting results from massive data sets requires organizations to continually seek faster and more efficient solutions to provision I/O and accelerate applications. New burst buffer technologies are being introduced to address the long-standing challenges associated with the overprovisioning of storage by decoupling I/O performance from capacity. Some of these solutions allow large datasets to be moved out of HDD storage and into memory quickly and efficiently. Then, data can be moved back to HDD storage once processing is complete much more efficiently with unique algorithms that align small and large writes into streams, thus enabling users to implement the largest, most economical HDDs to hold capacity.
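
The write-aggregation idea described above, absorbing small scattered writes on a fast tier and draining them to disk as large sequential writes, can be sketched as follows; the `BurstBuffer` class and its methods are hypothetical illustrations, not any vendor's implementation:

```python
# Hypothetical burst-buffer sketch: small, scattered writes are
# absorbed into an append-only log ("stream") with an index, then
# replayed to slow backing storage in one sequential pass.

class BurstBuffer:
    def __init__(self):
        self.stream = bytearray()   # fast tier: append-only log
        self.index = []             # (file_offset, stream_offset, length)

    def write(self, file_offset, data):
        # Absorb the write at memory speed; no seek on the slow tier.
        self.index.append((file_offset, len(self.stream), len(data)))
        self.stream += data

    def drain(self, backing):
        """Replay the log into `backing` (modeled as a dict offset -> bytes)."""
        for file_off, s_off, n in self.index:
            backing[file_off] = bytes(self.stream[s_off:s_off + n])
        self.index.clear()
        self.stream = bytearray()

bb = BurstBuffer()
bb.write(4096, b"aaaa")   # small, scattered writes land in the log
bb.write(0, b"bbbb")
store = {}
bb.drain(store)           # one sequential replay to the slow tier
```

The payoff is that the slow tier only ever sees large streaming I/O, which is why capacity can be served by the largest, most economical HDDs.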

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 16:45 - 17:00
Location: Room 220C
View Recording

S5680 - Using Docker in High Performance Computing in OpenPOWER Environment

Sam Sanjabi Advisory Software Engineer, IBM Systems & Technology Lab (Canada)
Xuebin Min Advisory Software Engineer , IBM (China)

OpenPOWER will be one of the major platforms in high performance computing (HPC). IBM Load Sharing Facility (LSF) is one of the best-known cluster workload management products, designed to extract the maximum computational capacity from HPC clusters, and LSF has been proven to run well on the OpenPOWER platform. As an open platform for developers and system administrators to build, ship, and run applications, Docker has been widely used in the cloud. Could we extend Docker's benefits to HPC? Yes, we can. By integrating LSF and Docker on the OpenPOWER platform, we achieved better application deployment in OpenPOWER HPC.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 17:00 - 17:25
Location: OpenPOWER Booth
View Recording

S5708 - China POWER Technology Alliance (CPTA)

Zhu YaDong Chairman, SuZhou PowerCore
Zhu YaDong
Zhu YaDong is chairman of the SuZhou PowerCore company, a Platinum member of the OpenPOWER Foundation, and CEO of the Research Institute of Jiangsu Industrial Technology, a Gold member of the OpenPOWER Foundation. He is a member of the China MIIT-IBM project leadership team. He earned his MBA at Nanjing University and Beijing University and attended the Guanghua School of Management. He has many years of experience in the IT industry, having served as Director of the Jiangsu province network center and COO of PalmSource, Inc. China, among other roles.

The objective of this talk is to position the China POWER Technology Alliance (CPTA) as a mechanism to help global OpenPOWER Foundation members engage with Chinese organizations on POWER-based implementations in China.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 17:00 - 17:15
Location: Room 220C
View Recording

S5684 - One-Click Hadoop Cluster Deployment on OpenPower Systems Running KVM and Managed by Openstack

Pradeep Surisetty Linux KVM (PowerKVM & zKVM) Test Lead, Linux Technology Centre
Pradeep is the Linux KVM (PowerKVM & zKVM) Test Lead for the Linux Technology Centre in Bangalore.

Hadoop workloads are memory- and compute-intensive, and Power servers are an excellent choice for them: POWER is the first processor family designed to accelerate big data workloads. We implemented a PowerKVM-based Hadoop cluster solution on Power Systems and validated the performance of a teradata workload on PowerKVM virtual machines to ensure consolidation of Hadoop workloads on PowerKVM. This session covers how the capabilities of OpenPOWER and OpenStack simplify deployment of a Hadoop solution on Power virtual machines. We will also share the VM and Hadoop cluster configurations that yield better performance, demonstrating one-click Hadoop cluster deployment on OpenPOWER systems running KVM and managed by OpenStack.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 17:15 - 17:30
Location: Room 220C

S5692 - Trusted Computing Applied in OpenPOWER Linux

Mao Qiu Yin Director, Teamsun
Mao Qiu is a Director at Teamsun.
Zhiqiang Tian Software Developer, Teamsun
Zhiqiang is an SW Developer at Teamsun.

Computer system security is receiving more and more emphasis from the Chinese government, which has created its own security standards. As a new open platform, OpenPOWER urgently needs to meet China's trusted computing security standard and provide a prototype system that conforms to the specifications, in order to satisfy the demands of OpenPOWER ecosystem development in China.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing

Day: Wednesday, 03/18
Time: 17:30 - 17:45
Location: Room 220C
View Recording

S5923 - Present and Future Leadership Computers at Oak Ridge National Laboratory

Jack Wells Director of Science, Oak Ridge National Laboratory
Jack Wells is the Director of Science for the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL). He is responsible for devising the strategy to ensure cost-effective, state-of-the-art scientific computing at the NCCS, which hosts the Department of Energy's Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science national user facility, and Titan, currently the fastest supercomputer in the United States. In ORNL's Computing and Computational Sciences Directorate, Wells has previously led both the Computational Materials Sciences group in the Computer Science and Mathematics Division and the Nanomaterials Theory Institute in the Center for Nanophase Materials Sciences. During an off-site assignment from 2006 to 2008, he served as a legislative fellow for U.S. Senator Lamar Alexander of Tennessee, providing information about high-performance computing, energy technology, and science, technology, engineering, and mathematics education policy issues. Wells began his ORNL career in 1990, conducting resident research for his Ph.D. in Physics from Vanderbilt University. Following a three-year postdoctoral fellowship at the Harvard-Smithsonian Center for Astrophysics, he returned to ORNL as a staff scientist in 1997 as a Wigner Fellow. Jack is an accomplished practitioner of computational physics and has been sponsored in his research by the Department of Energy's Office of Basic Energy Sciences. Jack has authored or co-authored over 70 scientific papers and edited 1 book, spanning nanoscience, materials science and engineering, nuclear and atomic physics, computational science, and applied mathematics.

Pending

Level: All
Type: Talk
Tags: OpenPOWER

Day: Wednesday, 03/18
Time: 17:30 - 17:55
Location: OpenPOWER Booth
View Recording

S5677 - Enabling Financial Service Firms to Compute Heterogeneously with Gateware Defined Networking (GDN)

John Lockwood CEO, Algo-Logic Systems, Inc.
John W. Lockwood, CEO of Algo-Logic Systems, Inc., is an expert in building FPGA-accelerated applications. He has founded three companies focused on low-latency networking, Internet security, and electronic commerce, and has worked at the National Center for Supercomputing Applications (NCSA), AT&T Bell Laboratories, IBM, and Science Applications International Corp (SAIC). As a professor at Stanford University, he managed the NetFPGA program from 2007 to 2009 and grew the Beta program from 10 to 1,021 cards deployed worldwide. As a tenured professor, he created and led the Reconfigurable Network Group within the Applied Research Laboratory at Washington University in St. Louis. He has published over 100 papers and patents on topics related to networking with FPGAs and served as principal investigator on dozens of federal and corporate grants. He holds BS, MS, and PhD degrees in Electrical and Computer Engineering from the University of Illinois at Urbana/Champaign and is a member of IEEE, ACM, and Tau Beta Pi.

Stock, futures, and option exchanges; market makers; hedge funds; and traders require real-time knowledge of the best bid and ask prices for the instruments that they trade. By monitoring live market data feeds and computing an order book with Field Programmable Gate Array (FPGA) logic, these firms can track the balance of pending orders for equities, futures, and options with sub-microsecond latency. Tracking the open orders by all participants ensures that the market is fair, liquidity is made available, trades are profitable, and jitter is avoided during bursts of market activity.
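
The order-book tracking described above can be modeled in software; a minimal sketch assuming a simple aggregate-by-price-level book (the `OrderBook` class and its methods are hypothetical names, not Algo-Logic's API), where the FPGA version performs the same updates in hardware at sub-microsecond latency:

```python
# Hypothetical software model of a limit order book: price levels
# with aggregate open quantity per side, exposing best bid and ask.

class OrderBook:
    def __init__(self):
        self.bids = {}   # price -> total open quantity to buy
        self.asks = {}   # price -> total open quantity to sell

    def add(self, side, price, qty):
        book = self.bids if side == "buy" else self.asks
        book[price] = book.get(price, 0) + qty

    def cancel(self, side, price, qty):
        book = self.bids if side == "buy" else self.asks
        book[price] -= qty
        if book[price] <= 0:
            del book[price]          # level emptied: remove it

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

ob = OrderBook()
ob.add("buy", 100.0, 5)
ob.add("buy", 101.0, 2)
ob.add("sell", 102.0, 3)
print(ob.best_bid(), ob.best_ask())  # → 101.0 102.0
```

A hardware book replaces the dictionaries with fixed-size price-level arrays so every feed message updates the book in a bounded number of clock cycles.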

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Finance

Day: Wednesday, 03/18
Time: 17:45 - 18:00
Location: Room 220C
View Recording

S5695 - FPGA Acceleration in a Power8 Cloud

Fei Chen Staff Researcher, Next-Generation Systems, IBM, China Research Lab
Fei Chen works at the IBM China Research Lab, focusing on cloud and big data. He received his B.S. from Tsinghua University, China, and his Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences, in 2011. He worked on hardware design for many years and now focuses on integrating heterogeneous computing resources into the cloud.

OpenStack is one of the most popular software platforms for running a cloud. It manages hardware resources like memory, disks, and x86 and POWER processors, and provides IaaS to users. Beyond these, more kinds of hardware resources, such as GPUs and FPGAs, can also be managed by OpenStack and provided to users. FPGAs are widely used across many kinds of applications, and the POWER8 processor integrates an innovative interface called CAPI (Coherent Accelerator Processor Interface) for a direct connection between an FPGA and the POWER8 chip. CAPI not only provides a low-latency, high-bandwidth, cache-coherent interconnection between the user's accelerator hardware and the application software, but also makes the accelerator easier to program for hardware developers.

Level: All
Type: Talk
Tags: OpenPOWER; Supercomputing; Data Center, Cloud Computing & HPC

Day: Wednesday, 03/18
Time: 18:00 - 18:25
Location: OpenPOWER Booth
View Recording

S5705 - Data Center and Cloud Computing Market Landscape and Challenges

Manoj Roge Director, Wired/Data Center Solutions, Xilinx
Manoj Roge is Director of Wired & Data Center Solutions Planning at Xilinx. Manoj is responsible for product/roadmap strategy and for driving technology collaborations with partners. Manoj has spent 21 years in the semiconductor industry, with the past 10 years in the FPGA industry, in various engineering and marketing/business development roles with increasing responsibilities. Manoj has been instrumental in driving broad, innovative solutions through his participation in multiple standards bodies and consortiums. He holds an MBA from Santa Clara University, an MSEE from the University of Texas, Arlington, and a BSEE from the University of Bombay.

In this talk, we will gain an understanding of the data center and cloud computing market landscape and its challenges, discuss the technology challenges that limit the scaling of cloud computing, which is growing at an exponential pace, and wrap up with insights into how FPGAs combined with general-purpose processors are transforming next-generation data centers with tremendous compute horsepower, low latency, and extreme power efficiency.

Level: All
Type: Talk
Tags: OpenPOWER; Data Center, Cloud Computing & HPC; Supercomputing

Day: Wednesday, 03/18
Time: 18:00 - 18:25
Location: OpenPOWER Booth
View Recording
View PDF

S5791 - Introduction to OPAL: the OpenPower Abstraction Layer

Stewart Smith OPAL Architect, IBM
Stewart Smith is the OPAL Architect at IBM, spending his days inside the Open Power Abstraction Layer, the open source firmware for OpenPOWER systems. He maintains skiboot, the runtime and boot firmware that runs on OPAL systems. Previously he worked for Percona as Director of Server Development where he oversaw development of many of Percona's software products. He comes from many years' experience in databases and free and open source software development. He was one of the founding core developers of the Drizzle Database Server and has previously worked for MySQL AB and Sun Microsystems on Drizzle, MySQL and MySQL Cluster. He's often found cycling, running, brewing beer, taking photos and talking about free and open source software (sometimes all at the same time).

A tour of the boot and runtime components of OpenPOWER firmware: the boot process, skiboot (boot and runtime services), the petitboot bootloader, and where OEM customizations are possible.

Level: All
Type: Talk
Tags: OpenPOWER

Day: Wednesday, 03/18
Time: 18:15 - 18:30
Location: OpenPOWER Booth
View Recording

S5792 - Customizing and Contributing to OPAL: The OpenPower Abstraction Layer

Stewart Smith OPAL Architect, IBM
Stewart Smith is the OPAL Architect at IBM, spending his days inside the Open Power Abstraction Layer, the open source firmware for OpenPOWER systems. He maintains skiboot, the runtime and boot firmware that runs on OPAL systems. Previously he worked for Percona as Director of Server Development where he oversaw development of many of Percona's software products. He comes from many years' experience in databases and free and open source software development. He was one of the founding core developers of the Drizzle Database Server and has previously worked for MySQL AB and Sun Microsystems on Drizzle, MySQL and MySQL Cluster. He's often found cycling, running, brewing beer, taking photos and talking about free and open source software (sometimes all at the same time).

An overview of how to build and release OPAL for OEMs, where we will go over the OPAL development and release processes and cover how to work with upstream. This session is useful for OEMs and those deploying POWER systems who may want to customize their firmware.

Level: All
Type: Talk
Tags: OpenPOWER

Day: Wednesday, 03/18
Time: 18:45 - 19:00
Location: OpenPOWER Booth
View Recording

S5120 - Breakthrough Science on GPU Clusters

John Taylor Research Director, CSIRO
John Taylor
Dr Taylor is currently Research Director, CSIRO eResearch & Computational and Simulation Sciences at CSIRO's Digital Productivity and Services Flagship. He has broad international leadership experience working in large and complex research and teaching organisations including the University of Chicago, the Australian National University, Montclair State University, CSIRO, Argonne National Laboratory, NASA, and Lawrence Livermore National Laboratory. Dr Taylor has written more than 150 articles and books on computational and simulation science, climate change, global biogeochemical cycles, materials science, image analysis, air quality, and environmental policy, from the local to the global scale, spanning science, impacts, and environmental policy. He has worked as a computational scientist and group leader both in the Mathematics and Computer Science Division at Argonne National Laboratory and in the Atmospheric Science Division at Lawrence Livermore National Laboratory. Dr Taylor was Senior Fellow in the Computation Institute at the University of Chicago, and has served on the Advisory Panel of the Scientific Computing Division of the US National Center for Atmospheric Research (NCAR) and on the US National Energy Research Scientific Computing Center NUGEX Advisory Committee. Dr Taylor is a Fellow of the Clean Air Society of Australia and New Zealand, a current member of the Board of the National eResearch Collaboration Tools and Resources (NeCTAR) project, a $47M Australian Government project conducted as part of the Super Science Initiative, and a member of the Scientific Advisory Committee of the Victorian Life Sciences Computing Initiative.
Tomasz Bednarz Projects Leader and Computational Research Scientist, CSIRO
Tomasz Bednarz
Dr Tomasz Bednarz currently works as a Computational Research Scientist and Projects Leader at CSIRO's Digital Productivity and Services Flagship. He joined CSIRO in early 2009, initially working as a 3-D Visualisation Software Engineer at the CSIRO Queensland Centre for Advanced Technologies. In early 2011 he moved to Sydney to work on image analysis using GPGPUs and heterogeneous architectures, and led the NeCTAR-funded cloud-based image analysis and processing toolbox project (http://cloudimaging.net.au). Currently, he leads the Platform for Big Data Analytics and Visual Analytics project, connecting data analytics, statistical modelling, image analytics, machine learning and visualisation into one stack of reusable solutions running on CSIRO infrastructure. His broad range of expertise, spanning image analysis, numerical simulations and experiments with fluids, visualisation, computer graphics, the demoscene (https://www.youtube.com/watch?v=UmS6LtNwMcE) and human-computer interaction, is evidenced by the quality and number of his publications (http://www.researcherid.com/rid/A-7376-2011). He runs the Brisbane GPU Meet-up group, is active in the ACM SIGGRAPH International Resources Committee, and leads the Khronos Group chapter. He actively promotes the use of computational and visualisation techniques in science and research.

This presentation will outline CSIRO's accelerated computing strategy, its development and its achievements over the past 5 years. We will provide a detailed description of the accelerated computing facility. Experiences with implementing and managing the facility will be discussed. Examples of projects from the accelerated computing program, which partners computational scientists with science teams, will be presented. Finally, we will consider the future directions of CSIRO's accelerated computing strategy, including the accelerated computing facility and its associated programs. Steve McMahon, Solution Architect and Senior Systems Administrator at CSIRO, is a co-author of this talk.

Level: All
Type: Talk
Tags: Supercomputing; Big Data Analytics; Life & Material Science; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 212B
View Recording

S5207 - GiMMiK: Generating Bespoke Matrix-Multiplication Kernels for NVIDIA GPUs

Freddie Witherden Ph.D. Student, Imperial College London
Freddie Witherden
Freddie Witherden studied Physics with Theoretical Physics at Imperial College London between 2008 and 2012, earning an MSci degree with first-class honours. His master's thesis was on the development of a parallel Barnes-Hut type treecode for simulating laser-plasma interactions. He is currently a Ph.D. candidate in the Department of Aeronautics at Imperial College London under the supervision of Dr Peter Vincent.

Learn how run-time code generation can be used to generate high-performance matrix-multiplication kernels for GPUs. In this talk, I will introduce GiMMiK, an open-source framework for generating bespoke kernels for performing block-by-panel type matrix-matrix multiplications. The techniques employed by GiMMiK will be described in detail. Benchmarks comparing GiMMiK to cuBLAS will be presented and speed-ups of up to 10x will be demonstrated. Specific applications of GiMMiK in the field of high-order computational fluid dynamics will also be highlighted.
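
The core idea, generating a routine with the entries of one fixed operator matrix baked in as literal constants so that zero entries cost nothing at run time, can be sketched in plain Python (the real framework emits GPU kernels; the names and generated source below are illustrative, not GiMMiK's actual API):

```python
def generate_kernel(A):
    """Emit Python source for y = A @ x with A fixed at generation time."""
    rows, cols = len(A), len(A[0])
    lines = ["def matmul(x):", "    y = [0.0] * %d" % rows]
    for i in range(rows):
        terms = ["%r * x[%d]" % (A[i][j], j)
                 for j in range(cols) if A[i][j] != 0.0]  # zeros are skipped
        if terms:
            lines.append("    y[%d] = %s" % (i, " + ".join(terms)))
    lines.append("    return y")
    return "\n".join(lines)

# Bake a small operator matrix into a bespoke routine and run it.
A = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
ns = {}
exec(generate_kernel(A), ns)
print(ns["matmul"]([1.0, 2.0]))  # -> [2.0, 6.0, 3.0]
```

Because the operator matrices in block-by-panel multiplications are known ahead of time and often sparse, specializing the code to one matrix removes both the zero-entry arithmetic and the loads of the matrix itself.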

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Developer - Performance Optimization; Computational Physics; Supercomputing

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room 210B
View Recording
View PDF

S5217 - Accelerated Sparse Matrix Multiplication for Quantum Chemistry with CP2K on Hybrid Supercomputers

Ole Schütt Ph.D. Student, ETH Zürich
Ole Schütt
Ole studied Physics and Computer Science at TU Braunschweig and FU Berlin. From 2009 to 2012 he served as an assistant at the Zuse Institute Berlin. He is currently completing his Ph.D. at ETH Zürich (c/o Prof. Dr. Joost VandeVondele, Nanoscale Simulations).

Learn how we achieve great GPU performance with an auto-tuned sparse matrix multiplication library, enabling quantum simulation of millions of electrons. Our tool of choice is CP2K, a leading code in the field of electronic structure and simulation. By exploiting locality and sparsity, this code achieves linear computational complexity for DFT, allowing for novel science. Massive parallelism over thousands of GPUs leads to excellent time to solution. The major computational kernel is block-sparse matrix-matrix multiplication. We will discuss results and development insights, including GPU kernels and latency-hiding, node-parallel techniques. We propose sparse matrix multiplication as a powerful abstraction for formulating streaming algorithms in general.
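
The shape of that kernel can be sketched on the CPU: matrices are stored as maps from block coordinates to small dense blocks, and only pairs of blocks that actually exist contribute (the data layout and names below are an illustration, not CP2K's actual implementation, which batches the small dense products onto the GPU):

```python
def dense_mm(a, b):
    """Plain dense product of two small blocks (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]

def block_sparse_mm(A, B):
    """C = A @ B where A and B map (block_row, block_col) -> dense block.
    Only existing block pairs are multiplied, so the cost tracks the
    sparsity of the matrices rather than their full dimensions."""
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k != k2:
                continue
            prod = dense_mm(a, b)
            if (i, j) in C:  # accumulate into an existing block
                C[(i, j)] = [[x + y for x, y in zip(r1, r2)]
                             for r1, r2 in zip(C[(i, j)], prod)]
            else:
                C[(i, j)] = prod
    return C

I = [[1.0, 0.0], [0.0, 1.0]]
A = {(0, 0): I, (1, 1): [[2.0, 0.0], [0.0, 2.0]]}
B = {(0, 0): [[3.0, 0.0], [0.0, 3.0]], (1, 0): I}
C = block_sparse_mm(A, B)
```

In the GPU setting, the list of small block products becomes a stream of independent multiply-accumulate tasks, which is what makes this abstraction a good fit for streaming algorithms.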

Level: Beginner
Type: Talk
Tags: Life & Material Science; Developer - Performance Optimization; Supercomputing

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room 212A
View Recording
View PDF

S5220 - Solutions for Efficient Memory Access for Cubic Lattices and Random Number Algorithms

Matteo Lulli Ph.D. Candidate, 'Sapienza' University of Rome
Matteo Lulli
Matteo is a Ph.D. candidate at "Sapienza" University of Rome. His thesis subject is the development of a new out-of-equilibrium finite-size scaling method to measure the critical parameters (critical temperature and critical exponents) of a continuous phase transition. The new method has been successfully applied to the 3D Ising spin glass and can be used for any kind of system undergoing such a transition. Using GPUs, he obtained competitive results, comparable to those from dedicated FPGA hardware and the most accurate so far. His thesis advisors are Prof. Massimo Bernaschi, Prof. Giorgio Parisi and Prof. Andrea Pelissetto.

The cubic stencil is one of the most common data layouts for on-lattice algorithms, and high-quality random numbers are useful in many areas. Based on the lessons we learned during the development of a highly tuned implementation of a Monte Carlo (MC) simulator for the three-dimensional Ising spin glass, we present solutions for an efficient memory access pattern for the cubic stencil and for lagged-Fibonacci-like PRNGs, in particular the well-known Mersenne Twister MT19937. We will show both single- and multi-GPU results, highlighting the advantages of our approach in multi-GPU settings as well, and compare the performance of our PRNG implementations with that of the cuRAND library.
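
For reference, an additive lagged-Fibonacci generator computes x[n] = (x[n-r] + x[n-s]) mod 2^32, so each new value depends only on two values a fixed distance back; the circular state buffer this implies is exactly what makes the memory access pattern the interesting part of a GPU implementation. A minimal serial sketch (the lags and seeding below are arbitrary demo choices, not the ones from this work):

```python
R, S, MOD = 17, 5, 2 ** 32  # lags and modulus, chosen for illustration only

def lagged_fibonacci(seed_state):
    """Generator yielding x[n] = (x[n-R] + x[n-S]) mod 2^32 from an
    initial state of R words, kept in a circular buffer of length R."""
    state = list(seed_state)
    n = 0
    while True:
        # state[n % R] holds x[n-R]; the second index wraps to x[n-S].
        new = (state[n % R] + state[(n + R - S) % R]) % MOD
        state[n % R] = new  # overwrite the oldest word in place
        n += 1
        yield new

gen = lagged_fibonacci(range(1, 18))   # seed state x[0..16] = 1..17
first = [next(gen) for _ in range(3)]  # x[17], x[18], x[19]
```

On a GPU the circular buffer is typically split so that many independent streams update their lagged words with coalesced loads, which is the access-pattern problem the talk addresses.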

Level: Intermediate
Type: Talk
Tags: Computational Physics; Developer - Algorithms

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room 210F
View Recording
View PDF

S5232 - GPU Acceleration of WSMP (Watson Sparse Matrix Package)

Natalia Gimelshein Developer Technology Engineer Compute, NVIDIA
Natalia Gimelshein received an M.S. in Aerospace Engineering from Penn State University in 2001. Her subsequent research focused on using DSMC and model kinetic equations for rarefied gas dynamics problems. She is a co-author of several journal, conference and review papers on various topics in rarefied gas dynamics. She joined NVIDIA in 2014.
Steve Rennich Sr. DevTech Engineer - Compute, NVIDIA
Steve Rennich received his Ph.D. in Aeronautics and Astronautics from Stanford University in 1997. He has performed extensive work in computational fluid dynamics, plant morphogenesis, and parallel optimization of multibody dynamics and finite element analysis. Most recently his interests have concerned the optimization of massively parallel algorithms for scientific computing in general and optimizing sparse linear solvers on GPUs in particular. Steve has been with NVIDIA as a Sr. Developer Technology Engineer - Compute for 4 years.

The Watson Sparse Matrix Package (WSMP) is a well-known collection of algorithms for efficiently solving sparse systems of linear equations that has long been among the best-performing sparse solver codes on the CPU. Recently, the direct sparse solver capabilities of WSMP have been modified to leverage GPU computing, resulting in significant performance improvements. This talk will focus on detailing the very non-invasive approach used to accelerate WSMP's direct sparse capabilities using GPUs. Performance results for the case of both single-node and distributed-memory solves will also be presented. This work was done in collaboration with Seid Koric from NCSA and Anshul Gupta from IBM.

Level: Intermediate
Type: Talk
Tags: Developer - Performance Optimization; Developer - Algorithms; Supercomputing

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 210G
View Recording
View PDF

S5233 - GPU Acceleration Using OpenACC and C++ Classes

Mathew Colgrove Dev Tech Software Engineer, NVIDIA
Mathew Colgrove
Mathew Colgrove is a Dev Tech Software Engineer with NVIDIA's Portland Group team. Mat's primary role is to help users port code to accelerators using OpenACC and CUDA Fortran, as well as assisting with general programming questions. Prior to his current position, he was the Quality Assurance manager responsible for both building and maintaining PGI's proprietary automated testing environments. Mat is also NVIDIA's SPEC representative (www.spec.org) on the CPU and HPG committees.

This tutorial provides strategies for using OpenACC to accelerate C++ classes. Examples illustrate topics such as member functions, inheritance, templates, containers, the implicit 'this' pointer, private data and deep copies. OpenACC 2.0 features such as unstructured data regions and the "routine" directive are highlighted. We also discuss current limitations and the future directions of OpenACC. Familiarity with OpenACC is recommended but not required.

Level: Intermediate
Type: Talk
Tags: OpenACC; Developer - Programming Languages; Supercomputing

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 210D
View Recording
View PDF

S5265 - Customer Success Story: Desktop Virtualization with NVIDIA GRID for a Large Construction Company

Jits Langedijk Senior Consultant, PQR
Jits Langedijk
Jits Langedijk has over 12 years of experience in delivering remote desktops and applications to end users. Within the expertise area of Application and Desktop Delivery (ADD), his primary focus is on Citrix products. Co-writing the '3D Graphics for Virtual Desktops Smackdown' and carrying out various projects involving 3D graphics have given him deep knowledge, both in theory and in practice.

Learn how one of the largest construction companies in the Netherlands successfully implemented a VDI environment with NVIDIA GRID, Citrix XenDesktop and VMware virtualization. Hear about their use cases and lessons learned, as well as how to run Autodesk, BIM and other applications with VDI.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Manufacturing; Press-Suggested Sessions: Professional Graphics

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room LL20D
View Recording
View PDF

S5310 - GPU Computing for Distributed Acoustic Sensing

Marzban Palsetia Technical Advisor, Halliburton
Marzban Palsetia is a technical advisor at Halliburton, where he develops software and algorithms for fiber optics technologies. He has more than fifteen years of experience in digital signal processing and has developed applications ranging from a mobile indoor positioning system to a Synthetic Aperture Radar processing system for NASA’s LRO and Chandrayaan lunar missions. Prior to joining Halliburton, he was with Microsoft Corporation for six years and Vexcel Corporation for ten years, both in Boulder, CO. He holds a Master’s degree from the University of Florida, Gainesville and a Bachelor’s degree from the University of Bombay, India.

Distributed Acoustic Sensing (DAS) is a fiber optic technology deployed in energy production by Pinnacle, a Halliburton Service. DAS, based on Rayleigh scattering principles, is used to determine acoustic strain over several kilometers, effectively turning the fiber into a series of virtual microphones. DAS data analysis involves processing of high volume rate (> 400 Mbytes/sec) data with algorithms for data correction, spectral filtering, and spectrogram and image generation. We show processing speed-ups with GPU-adapted algorithms that far exceed single- and multi-CPU implementations, reducing processing time from the order of a day to a few minutes.

Level: All
Type: Talk
Tags: Energy Exploration; Signal & Audio Processing; Video & Image Processing

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room 210E

S5331 - GPUs and Machine Learning: A Look at cuDNN

Sharan Chetlur Software Engineer, NVIDIA
Highly-Rated Speaker
Sharan Chetlur
Sharan Chetlur is an engineer at NVIDIA working in the CUDA Libraries and Algorithms Group. He currently works in the fields of Deep Learning and Neural Networks, and is a developer of the cuDNN library. Previously, his focus was on applications in Linear Algebra, working as a developer on the cuBLAS and cuSparse libraries. Sharan holds a Master's Degree in Computer Engineering from the University of Florida.

We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, neural network framework developers had to implement these low-level routines for GPUs on an ad hoc basis, optimizing individual computational kernels by hand and repeating this work as new parallel processors emerged. cuDNN alleviates this burden by providing tuned black-box implementations of these functions. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage across GPU generations. We discuss supported functionality, algorithmic implementation details and performance achieved.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 210A
View Recording

S5386 - VMD: Publication-Quality Ray Tracing of Molecular Graphics with OptiX

John Stone Senior Research Programmer, University of Illinois at Urbana-Champaign
Highly-Rated Speaker
John Stone
John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and Associate Director of the NVIDIA CUDA Center of Excellence at the University of Illinois. Mr. Stone is the lead developer of VMD, a high performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. Mr. Stone was named an NVIDIA CUDA Fellow in 2010. Mr. Stone also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing.

This session will describe the adaptation of the popular molecular graphics program VMD to support both batch and interactive ray tracing using NVIDIA OptiX, on computers ranging from laptops all the way up to large-scale Cray XK7 supercomputers such as Blue Waters and Titan. We will describe the benefits of custom VMD-specific geometric primitives and memory layouts, and relate our experiences adapting the Tachyon CPU-based ray tracing engine used by VMD to NVIDIA's OptiX GPU ray tracing framework. The session will present performance data for workstation- and supercomputer-class visualizations, integration of OptiX into VMD, interactive ray tracing, many example movies and visualizations, and avenues for further improvement.

Level: Intermediate
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Visualization - In-Situ & Scientific

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room LL21E
View Recording
View PDF

S5525 - NeuroGPU : Accelerating Biophysical Neuronal Modeling with CUDA

Roy Ben-Shalom Post-Doc Researcher, UCSF Neurology Department
Roy Ben-Shalom
I am a post-doctoral research fellow in Kevin Bender's lab at the Keck Center for Integrative Neuroscience at the University of California, San Francisco. My research throughout graduate school focused on accelerating neuron simulation models using GPUs through CUDA. I published two papers on the topic: the first examined simulation of ion channels and the second focused on simulating entire neurons. Using CUDA, I developed a method to accelerate neuronal simulations, and we showed that our application runs three orders of magnitude faster than the typically used method (the NEURON simulation environment). In my current research, I am developing a realistic model of pyramidal cells in the prefrontal cortex of the brain and examining the effect of dopamine on excitability.

Learn how to implement fast, cheap and realistic neuronal modeling through NeuroGPU, the first open-source, biophysically rigorous compartmental neuronal modeling environment built for GPUs. In this talk we will discuss: 1) an overview of the mathematics of neuronal modeling, 2) computational challenges imposed by traditional modeling environments, and 3) how these can be overcome through implementation in CUDA. Examples of advanced scientific computing tasks, including evolutionary algorithms for model optimization, will be provided using NeuroGPU.

Level: All
Type: Talk
Tags: Medical Imaging; Life & Material Science; Supercomputing; Developer - Algorithms

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room LL21B
View Recording
View PDF

S5544 - Map-D: Hyper-Interactive GPU-Powered Visualytics for Big Data

Todd Mostak Co-Founder/President, Map-D
Todd Mostak
Before Map-D, Todd was a researcher at MIT CSAIL. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic and eventually working as a translator for an Egyptian newspaper. Later, as a Research Fellow at Harvard’s Kennedy School of Government, he focused on analysis of Islamism using forum and social media datasets. His frustration with the inability of existing tools to allow for the interactive exploration of large Twitter datasets motivated him to create Map-D.

As people wish to interactively explore increasingly larger datasets, existing tools are unable to deliver acceptable performance. The distributed nature of systems like Spark leads to latencies detrimental to interactive data exploration, while single-node visualization solutions like Tableau and Qlikview are not powerful enough to deliver sub-second response times for even intermediate-sized datasets. In this talk, we will argue that dense GPU servers, containing 4-16 GPUs each, can provide analytics query throughput exceeding what can be achieved on even large clusters, while avoiding the latencies and complications associated with running over a network. We will look at MapD, which can query and visualize multi-billion row datasets in milliseconds, as an example of such a system. Finally, we will show how the significantly higher performance achievable with a GPU system translates into new modes and paradigms of data analysis.

Level: All
Type: Talk
Tags: Big Data Analytics; Real-Time Graphics; Data Center, Cloud Computing & HPC; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room LL21C
View Recording

S5641 - Delta Mush: Smoothing Deformations While Preserving Detail for VFX and Game Characters

Joe Mancewicz Principal Graphics Scientist, Rhythm & Hues Studios
Joe Mancewicz
Mr. Mancewicz, principal graphics scientist at Rhythm and Hues Studios, has spent the past 17 years doing character animation and visual effects for film and commercials. Mr. Mancewicz has been writing software and doing research at Rhythm since 2000. Although his primary focus is on character rigging and animation software development, he has participated in a wide range of R&D projects, from CFD to art-directable stress analysis response. He was an integral part of the special effects team that contributed to two Academy Award-winning movies in the category of Best Achievement in Visual Effects: The Golden Compass in 2008 and Life of Pi in 2013. In 2014 he was personally recognized as one of the two core engineers in development of the Voodoo software, which was selected for a Technical Achievement Award by the Academy of Motion Picture Arts and Sciences. Mr. Mancewicz has also assisted faculty at University of Michigan AOSS with occlusion computation and visualizations of satellite deployment.

Peek under the hood of Rhythm & Hues Studios' powerhouse cleanup deformer, Delta Mush. Delta Mush is a Voodoo deformer that smooths arbitrary deformation of a polygonal mesh without smoothing away the original detail of the model. It does not require meticulous up-front tuning, it easily accommodates model and rig changes, and it has proven to be versatile far beyond cleanup. It has been used in all character rigs at R&H since it was developed in 2010.
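
The widely described core of the technique is: smooth the deformed mesh, then add back per-vertex detail deltas captured once against a smoothed rest pose. A toy 1-D sketch of that idea (the production deformer stores each delta in a local surface frame so detail survives rotation; this illustration skips that and works on a polyline of floats):

```python
def smooth(points, iterations=10):
    """Simple Laplacian smoothing with pinned endpoints."""
    p = list(points)
    for _ in range(iterations):
        p = [p[0]] + [(p[i - 1] + p[i + 1]) / 2.0
                      for i in range(1, len(p) - 1)] + [p[-1]]
    return p

def delta_mush(rest, deformed):
    """Smooth the deformed shape, then restore the detail deltas
    measured between the rest shape and its smoothed version."""
    deltas = [r - s for r, s in zip(rest, smooth(rest))]
    return [s + d for s, d in zip(smooth(deformed), deltas)]

rest = [0.0, 1.0, 0.0, 1.0, 0.0]       # zig-zag surface detail
deformed = [0.0, 1.0, 0.0, 1.0, 5.0]   # same detail plus a coarse pose change
out = delta_mush(rest, deformed)
```

A useful sanity property falls straight out of the construction: applying the deformer to the undeformed rest pose returns the rest pose exactly, since the smoothing it removes is precisely the smoothing the stored deltas put back.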

Level: All
Type: Talk
Tags: Media & Entertainment; Game Development; Real-Time Graphics

Day: Thursday, 03/19
Time: 09:00 - 09:25
Location: Room LL21D
View Recording
View PDF

S5867 - 3D Cloud Workstations: Scyld Cloud Workstation (Presented by Penguin Computing)

Gary Yee Senior Software Engineer, Penguin Computing
Gary Yee has been a senior software engineer for Penguin Computing since 2011 after graduating from CU Boulder. He has been designing and implementing HPC cluster management and monitoring tools as well as remote desktop solutions as part of the Penguin Computing On Demand team.
Thomas Ruge CEO and Founder, Colorado Code Craft
Thomas Ruge, CEO of Colorado Code Craft, Ltd., is a lifelong serial entrepreneur with a successful track record in the fields of computer graphics, cloud middleware and virtualization. Thomas has held leadership and management positions at ModViz, Mersive Technologies, NVIDIA and Siemens. Before founding Colorado Code Craft, he was the founder and CTO of ModViz, which he successfully sold to NVIDIA in 2008. In the early part of his career he co-founded multiple startups in Germany. Thomas holds an MBA from Columbia Business School, an MBA from the Haas School of Business, UC Berkeley, and a master's degree in quantum physics from the Technical University of Munich.

Learn why Scyld Cloud Workstation, a browser-based, high-quality, low-bandwidth, 3D-accelerated desktop, can greatly streamline cloud HPC workflows. Penguin's Scyld Cloud Workstation provides significant time savings by eliminating the need to download large data files when using the cloud for HPC workflows. For example, a typical manufacturing engineer using a Scyld Cloud Workstation can run real-time, interactive GUI tools and 3D visualizations in the cloud, removing the traditional barrier of moving large files down to on-premises workstations. Additionally, since there is no browser plug-in or application installation needed, the difficulty of security changes is eliminated, allowing for easy integration with industry security policies.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Graphics Virtualization

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room LL21F
View Recording

S5238 - Multiphysics Simulation Using GPUs

Arman Pazouki Research Associate, University of Wisconsin-Madison
Arman Pazouki
Arman is currently a post-doctoral researcher at the University of Wisconsin-Madison. He received his Ph.D. in Mechanical Engineering and an M.S. in Engineering Mechanics, both from the University of Wisconsin-Madison; an M.S. in Mechanical Engineering from Sharif University of Technology, Iran; and a B.S. in Mechanical Engineering from the University of Tehran, Iran. He has more than 10 years of experience developing code for fluid dynamics and fluid-solid interaction and more than 5 years of experience developing GPU-enabled codes with CUDA.

We present a GPU-based framework for the fully-resolved simulation of interacting rigid and deformable solid objects that move in fluid flow. The fluid dynamics is based on a meshless approach. Moving Lagrangian markers, distributed in the fluid domain as well as on the solid surfaces, are used to capture the fluid dynamics, fluid-solid, and solid-solid interactions. Mass and momentum exchange between neighbor markers are determined in a parallel spatial subdivision algorithm. The solid objects' distributed forces are reduced in parallel via Thrust reduction algorithms and later used for temporal updates via lightweight GPU kernels. Scenarios containing tens of thousands of floating rigid and flexible objects were exercised on several GPU architectures, and linear scalability was demonstrated.
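
The spatial subdivision step the abstract mentions can be sketched serially: markers are binned into uniform grid cells so that a neighbor search for mass/momentum exchange only visits the 27 surrounding cells rather than all markers (the function names and cell size below are illustrative, not the framework's actual code):

```python
from collections import defaultdict
from itertools import product

def build_grid(markers, h):
    """Hash each marker position into its integer grid-cell coordinates."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(markers):
        grid[(int(x // h), int(y // h), int(z // h))].append(idx)
    return grid

def neighbors(markers, grid, h, i):
    """Indices of markers within distance h of marker i, checking only
    the 3x3x3 block of cells around marker i's own cell."""
    x, y, z = markers[i]
    cx, cy, cz = int(x // h), int(y // h), int(z // h)
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        for j in grid.get((cx + dx, cy + dy, cz + dz), []):
            if j != i:
                xj, yj, zj = markers[j]
                if (x - xj) ** 2 + (y - yj) ** 2 + (z - zj) ** 2 <= h * h:
                    found.append(j)
    return found

markers = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 1.0, 1.0)]
grid = build_grid(markers, h=0.1)
print(neighbors(markers, grid, 0.1, 0))  # -> [1]
```

On the GPU the same binning is typically done with a parallel sort of cell keys, after which each thread walks the 27 cells for one marker independently.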

Level: Advanced
Type: Talk
Tags: Computational Fluid Dynamics; Computational Physics; Developer - Algorithms; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room 210B
View Recording
View PDF

S5241 - GPUs in GAMESS: The Story of Libcchem

Dave Tomlinson Graduate Student, Iowa State University/Ames Lab
Dave Tomlinson
Dave completed his undergraduate degree at Iowa State University in 2011 before joining Mark Gordon's research group. He is currently pursuing his PhD in physical chemistry. His primary research interests are in HPC, GPUs, and the application of new technologies to GAMESS.

Learn about GPU acceleration in the General Atomic and Molecular Electronic Structure System (GAMESS), one of the most widely used freely available electronic structure codes today. The focus of this talk is libcchem, a high performance library developed for GAMESS to provide both high performance CPU and GPU code for performance-critical methods. An overview of the methods in libcchem and how they are impacted by GPUs, as well as a comparison of GAMESS CPU and GPU code, will be given.

Level: Beginner
Type: Talk
Tags: Life & Material Science

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room 212A
View Recording
View PDF

S5273 - Real-Time Heston Stochastic Volatility Tracking via GPUs for Streaming Transactions Data

Yong Zeng Professor, University of Missouri at Kansas City
Yong Zeng
Yong Zeng is a professor in the Department of Mathematics and Statistics at the University of Missouri-Kansas City. His main research interests include mathematical finance, financial econometrics, stochastic nonlinear filtering, and Bayesian statistical analysis. Notably, he has developed statistical analysis via filtering for financial ultra-high-frequency data. He has published in Mathematical Finance, International Journal of Theoretical and Applied Finance, Applied Mathematical Finance, Applied Mathematics and Optimization, IEEE Transactions on Automatic Control, and Statistical Inference for Stochastic Processes, among others. He co-edited 'State Space Models: Applications to Economics and Finance', a Springer 2013 book volume. He has held visiting professorships at Princeton University and the University of Tennessee. He received his B.S. from Fudan University in 1990, M.S. from the University of Georgia in 1994 and Ph.D. from the University of Wisconsin at Madison in 1999, all in statistics.

Volatility is influential in investment, risk management and security valuation, and is regarded as one of the most important financial market indicators. For a model that fits the stylized facts of transactions data well, this session demonstrates how online tracking of Heston stochastic volatility is made possible by GPU computing. The evolving distribution of the volatility and other quantities as new trades occur is governed by a stochastic partial differential equation (SPDE). Numerically solving this SPDE as new data flows in provides the tracking of volatility. The algorithm can be parallelized, with each group of threads solving a PDE using the red-black Gauss-Seidel algorithm. The workload sharing among GPUs is embarrassingly parallel, and the code scales linearly with the number of GPUs.
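
For readers unfamiliar with the red-black scheme: coloring the grid like a checkerboard makes every same-colored point independent of the others, so all red points can update in parallel, then all black points. A serial sketch of one sweep, shown on the 2-D Laplace equation rather than the Heston SPDE (this is an illustration of the scheme, not the session's code):

```python
def red_black_sweep(u):
    """One red pass then one black pass of Gauss-Seidel for Laplace's
    equation on a square grid with fixed boundary values."""
    n = len(u)
    for color in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:  # same-color points are independent
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                      + u[i][j - 1] + u[i][j + 1])
    return u

# Unit square: u = 1 on the top edge, 0 on the other three edges.
n = 17
u = [[1.0 if i == 0 else 0.0 for _ in range(n)] for i in range(n)]
for _ in range(500):
    red_black_sweep(u)
# By symmetry the converged center value is exactly 1/4.
print(round(u[n // 2][n // 2], 3))
```

On a GPU, each color pass maps naturally onto one kernel launch, with every thread updating one point of that color.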

Level: Intermediate
Type: Talk
Tags: Finance; Supercomputing; Big Data Analytics

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room 210C
View Recording

S5392 - Demonstrating Innovative Reservoir Modeling Workflows Enabled by a GPU-Accelerated Implicit Simulator

Dave Dembeck Director, Software Engineering, Stone Ridge Technology
Dave Dembeck
Dave specializes in the development and commercialization of novel software products, combining emerging technologies in systems and software engineering while managing growth, execution and organizational ambiguity. Dave has been working in GPU-accelerated HPC for over a decade and now leads a multidisciplinary effort at Stone Ridge Technology.

Learn how the speed and compute density of GPUs are transforming engineering workflows. We have built a fully-accelerated reservoir simulator that reduces run times from hours to minutes. This increased speed has shifted emphasis from long single runs to much faster workflows where ensembles of hundreds of simulations are available for evaluation by engineers. We present real-field results, generated by our simulator up to 50x faster than by current commercial offerings. We discuss our workflow acceleration tools, which display the ensemble results while maintaining model context.

Level: Beginner
Type: Talk
Tags: Energy Exploration; Computational Physics; Visualization - In-Situ & Scientific; Supercomputing

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room 210E
View Recording
View PDF

S5430 - Interactive Modelling and Rendering of Clouds

Jesper Mosegaard Head of Research and Innovation, Alexandra Instituttet
Jesper Mosegaard
Jesper is Head of Research & Innovation at the Computer Graphics Lab of the Alexandra Institute, leading a group of 8 researchers and software developers in adding value to Danish companies through research-based innovation. Jesper's main research areas are computer graphics and accelerated computation. He received his PhD in 2006 from the University of Aarhus for his dissertation on surgical simulation for children with malformed hearts. Jesper has published 28 peer-reviewed publications and 3 other publications, including an animation for the SIGGRAPH 2005 Electronic Art and Animation Catalog and a SIGGRAPH 2006 Emerging Technology. His publications cover GPGPU topics and applications in rendering, physics-based animation, medical image processing and surgical simulation.

In this presentation we explain how five small Danish animation/VFX companies and a Danish research institute worked together with the vision of increasing productivity and the visual quality of clouds for small creative companies; these effects typically require long waits for simulation or rendering. Our solution is fully interactive through utilization of the GPU. The graphics artist can manipulate mesh geometry and get interactive updates, in final rendering quality, of clouds with wispy features and multi-scattered light. We will explain how we carefully selected and implemented GPU algorithms going from meshes to voxel fields with wispy cloud appearances. We will also argue for an industry that needs more interactive tools to truly take advantage of the creative process.

Level: Advanced
Type: Talk
Tags: Media & Entertainment; Real-Time Graphics

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room LL21E
View Recording
View PDF

S5447 - GPU vs Xeon Phi: Performance of Bandwidth Bound Applications with a Lattice QCD Case Study

Mathias Wagner Postdoc, Indiana University
Theoretical physicist Dr. Mathias Wagner is currently working in the physics department at Indiana University. After receiving his PhD from the Technical University of Darmstadt in 2009, he moved to Bielefeld University in 2010, where he focused on CUDA implementations of Lattice QCD simulations. At Indiana University he continues working on high-performance Lattice QCD simulations on GPUs, collaborating intensively with researchers from the National Center for Supercomputing Applications at the University of Illinois and with the developers of the QUDA library.

Accelerators have become a key ingredient in HPC. GPUs had a head start and are already widely used in HPC applications but now are facing competition from Intel's Xeon Phi accelerators. The latter promise comparable performance and easier portability and even feature a higher memory bandwidth - key to good performance for a wide range of bandwidth-bound HPC applications. In this session we compare their performance using a Lattice QCD application as a case study. We give a short overview of the relevant features of the architectures and discuss some implementation details. Learn about the effort it takes to achieve great performance on both architectures. See which accelerator is more energy efficient and which one takes the performance crown at about 500 GFlop/s.
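As a rough guide to why such applications are bandwidth bound, the attainable performance can be sketched with a simple roofline estimate. All peak-performance, bandwidth, and arithmetic-intensity numbers below are illustrative placeholders, not figures from the talk:

```python
# Roofline model: attainable GFlop/s is limited either by the device's peak
# compute rate or by arithmetic intensity (flops per byte moved) times the
# memory bandwidth. Numbers used below are illustrative only.

def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Return the roofline bound for a kernel with the given arithmetic intensity."""
    return min(peak_gflops, flops_per_byte * bandwidth_gbs)

if __name__ == "__main__":
    # A low-intensity lattice stencil kernel sits on the bandwidth roof ...
    print(attainable_gflops(peak_gflops=1310.0, bandwidth_gbs=250.0, flops_per_byte=2.0))
    # ... while a high-intensity kernel would hit the compute roof instead.
    print(attainable_gflops(peak_gflops=1310.0, bandwidth_gbs=250.0, flops_per_byte=10.0))
```

For bandwidth-bound kernels the first term never binds, which is why the raw memory bandwidth of each accelerator matters so much in this comparison.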

Level: All
Type: Talk
Tags: Computational Physics; Developer - Performance Optimization; Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room 210F
View Recording
View PDF

S5589 - Scaling Data Visualization with GPUs and Design

Leo Meyerovich CEO & Co-Founder, Graphistry, Inc.
Leo Meyerovich co-founded Graphistry, Inc. in Q1 2014 to enable big interactive visualizations to run everywhere. Graphistry advances upon the founding team's work at UC Berkeley: they built the first parallel web browser and Superconductor, a GPU-accelerated visualization scripting language. Leo's past research in programming language design explored automatic parallelization (PLDI SRC awards for Superconductor), the first reactive web language (OOPSLA best paper for Flapjax), sociological foundations (OOPSLA best paper and SIGPLAN best-of-year highlight), and most cited, verified security (ConScript and Margrave). Off-hours, Leo experiments with human computation.

GPUs are ushering in a new era of data visualization. Today, shoving one hundred thousand query results into a chart makes an illegible mess and kills interactivity. The good news is that infovis researchers invented smarter layouts that maximize visibility. The bad news is that these layouts and basic interactions are computationally intensive enough that analysts can no longer simply slide a slider, drag a graph cluster, etc. With the availability of GPUs, however, the rules have changed. This talk shows examples of smarter designs and how we use GPUs to turn them into interactive tools. For experts, we will discuss how running in browsers and even phones led to Graphistry's tiered GPU visualization engine approach, and touch on our use of WebGL, WebCL, and our own in-house libraries.

Level: Beginner
Type: Talk
Tags: Big Data Analytics; Web Acceleration; Visualization - In-Situ & Scientific

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room LL21C
View Recording

S5615 - Acquiring Dramatic Gains in Image Quality: GPU-Accelerated Beamforming

Andre Lehovich Image Scientist, Decision Sciences Medical
Andre Lehovich is an imaging scientist in San Diego, currently working on next-generation ultrasound systems with Decision Sciences Medical (http://www.dsmedco.com). He has also worked on image reconstruction in muon tomography, nuclear medicine, and fMRI. He was an undergraduate at Brown University and received a Ph.D. in applied mathematics from University of Arizona. His personal website is at http://chippingsparrow.com/

Synthetic-aperture ultrasound systems offer a potential for dramatic gains in image quality, compared to classical ultrasound, provided one can do the data acquisition and computations quickly enough. In our synthetic aperture implementation, we are able to achieve 10 fps images by using a GPU to accelerate the beamforming process, a significant speedup over the frame rates available using CPU processing.

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room LL21B
View Recording
View PDF

S5624 - GPU-Accelerated Image Processing for Modern Moving Images: Tachyon Wormhole

Lance Maurer CEO and founder, Cinnafilm, Inc.
Lance Maurer is the CEO and founder of Cinnafilm, Inc., a software engineering company dedicated to the development of the highest-quality image processing solutions for both cinema and broadcast. Cinnafilm has developed solutions that have positively impacted many of the most valuable film and television projects of all time. Lance's background is as a mechanical engineer in the aerospace industry, and he continues to design solutions for the largest American rocket programs, including Atlas, NMD, Delta IV, and even the new NASA SLS program.

Cinnafilm CEO and founder Lance Maurer will discuss Tachyon Wormhole, a scalable, real-time, GPU-accelerated tool for lengthening or shortening video by precise amounts, avoiding the need for added editorial. This permits creating new commercial breaks and revenue opportunities. Processing is performed simultaneously on video, audio, and captions, and the system also offers professional transcoding, motion-compensated frame-rate conversion, and unlimited format conversions. Wormhole is a software engineering marvel, receiving both the "Best of Show" award at NAB 2014 and the prestigious HPA Engineering Excellence Award for 2014. Wormhole is a joint project between Cinnafilm and Wohler Technologies.

Level: All
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Real-Time Graphics

Day: Thursday, 03/19
Time: 09:30 - 09:55
Location: Room LL21D
View Recording
View PDF

S5151 - Voting And Shuffling For Fewer Atomic Operations

Elmar Westphal Scientific Programmer, Forschungszentrum Jülich GmbH
Highly-Rated Speaker
Elmar Westphal has been working as a programmer and cluster architect at Forschungszentrum Juelich for more than 15 years. In recent years he has ported simulation programs from different fields of computational physics to single- and multi-GPU systems and developed CUDA-based building blocks, libraries, and applications, mostly for molecular dynamics and micromagnetism simulations.

Even though atomic operations became much faster with the introduction of the Kepler architecture, they are still a bottleneck in many algorithms and applications. This is especially true for operations that are not natively supported on the device and have to be implemented using atomicCAS loops (e.g. double-precision additions), because modification of the same data by multiple threads within the same warp will, due to warp divergence, also stall the threads that are already done. This talk will show how to use warp votes and shuffle operations to pre-combine data within a warp by destination address, in parallel. This can significantly reduce the total number of atomic operations in a kernel call and eliminates CAS-loop iterations caused by conflicts within the same warp.
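The effect of this pre-combining can be illustrated with a small host-side sketch. This is plain Python emulating the outcome of the warp-level votes and shuffles, not CUDA code; `naive_atomics` and `warp_aggregated_atomics` are hypothetical names chosen for the illustration:

```python
# Emulation of warp-aggregated atomics: instead of every lane issuing its own
# atomic add, lanes targeting the same destination address pre-combine their
# values, and one "leader" lane per address performs a single atomic. On the
# GPU the grouping is done with warp votes and shuffles; here we only count
# how many atomic operations each strategy would issue.

from collections import defaultdict

WARP_SIZE = 32

def naive_atomics(addresses, values, memory):
    """One atomic add per lane; returns the number of atomics issued."""
    for addr, val in zip(addresses, values):
        memory[addr] += val
    return len(addresses)

def warp_aggregated_atomics(addresses, values, memory):
    """Pre-combine per destination address; one atomic per unique address."""
    combined = defaultdict(float)
    for addr, val in zip(addresses, values):  # emulates the shuffle combine
        combined[addr] += val
    for addr, val in combined.items():        # leader lanes issue the atomics
        memory[addr] += val
    return len(combined)

if __name__ == "__main__":
    addrs = [i % 4 for i in range(WARP_SIZE)]  # 32 lanes hitting 4 addresses
    vals = [1.0] * WARP_SIZE
    m1, m2 = defaultdict(float), defaultdict(float)
    print(naive_atomics(addrs, vals, m1))             # 32 atomics
    print(warp_aggregated_atomics(addrs, vals, m2))   # 4 atomics, same sums
```

With 32 lanes colliding on 4 addresses, the atomic count drops from 32 to 4 while the final sums are identical, which is exactly the saving the talk targets.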

Level: Advanced
Type: Talk
Tags: Developer - Performance Optimization; Developer - Algorithms

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210G
View Recording
View PDF

S5152 - GPU-Accelerated Undecimated Wavelet Transform for Film and Video Denoising

Hermann Fuerntratt Senior Researcher, Joanneum Research
Hermann Fürntratt studied Telematics at the Graz University of Technology, where he received his MSc in 1997. During his studies, his special focus was on medical image processing. He then worked for more than a year in the UK for a company focused on digital color correction and is now a senior researcher in the Audiovisual Media research group of the DIGITAL institute at JOANNEUM RESEARCH. He introduced CUDA at the Audiovisual Media group and implemented a real-time GPU-accelerated template-tracking library based on block matching. His research activities comprise porting all sorts of algorithms to the GPU with CUDA.

The Undecimated Wavelet Transform (UWT) is a valuable tool for all kinds of image and video enhancement tasks such as denoising, deconvolution, and super-resolution. Due to its translation invariance, it provides superior results compared with the classical discrete wavelet transform, but at the cost of significantly higher computational complexity. In this session, we will present a highly efficient GPU implementation of the UWT for 16-bit or 32-bit floating-point images, based on modern GPU implementation strategies like register blocking and the computation of multiple outputs per thread. Furthermore, we will show how the UWT is used within a novel film and video denoising algorithm which is able to deal with very different kinds of noise, such as film grain and digital sensor noise.
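For readers unfamiliar with the transform, one level of a generic undecimated ("à trous") wavelet decomposition can be sketched in a few lines. This is a textbook sketch with an assumed B3-spline kernel and circular boundaries, not the presenters' GPU implementation; note how shifting the input simply shifts the coefficients, which is the translation invariance mentioned above:

```python
import numpy as np

# One level of the "a trous" undecimated wavelet transform. Unlike the
# decimated DWT there is no subsampling, so a shifted input yields shifted
# coefficients (translation invariance). B3-spline smoothing kernel assumed.

B3 = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def uwt_level(signal, level=0):
    """Return (approximation, detail) for one UWT level of a 1D signal.

    At level j the kernel taps are spaced 2**j samples apart (the "holes");
    circular boundary handling keeps the sketch short.
    """
    step = 2 ** level
    approx = np.zeros_like(signal, dtype=float)
    for k, w in enumerate(B3):
        offset = (k - 2) * step            # taps at -2s, -s, 0, +s, +2s
        approx += w * np.roll(signal, -offset)
    detail = signal - approx               # wavelet (detail) coefficients
    return approx, detail
```

Shifting the input by five samples shifts both output bands by exactly five samples, whereas a decimated transform would mix the shift into different coefficients entirely.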

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room LL21D
View Recording
View PDF

S5189 - AeroFluidX: A Next Generation GPU-Based CFD Solver for Engineering Applications

Bjoern Landmann Development Engineer, FluiDyna GmbH
Bjoern Landmann is a development engineer at FluiDyna GmbH.

The presentation shows the potential of GPU acceleration for reducing turn-around times of industrial CFD applications. FluiDyna is addressing this issue in a modular approach: the library "Culises" was developed to accelerate matrix operations originating from arbitrary problems. This approach can be complemented by a second module that generates the linear system directly on the GPU, the resulting code being less general but allowing higher speed-up. The code aeroFluidX is a finite-volume solver dedicated to incompressible aerodynamics, combining a SIMPLE algorithm for unstructured grids with state-of-the-art RANS turbulence modelling. MPI parallelization allows calculations to be split across multiple GPU-enabled nodes, leading to speed-ups of 2.5-3x for industrial-scale problems.

Level: All
Type: Talk
Tags: Computational Fluid Dynamics; Computational Physics; Supercomputing; Automotive

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210B
View Recording
View PDF

S5190 - Sharing Physically Based Materials Between Renderers with MDL

Jan Jordan Software Product Manager MDL, NVIDIA
Mr. Jordan is the product manager for the NVIDIA Material Definition Language. He is a graduate engineer of applied computer science from the Fachhochschule für Wirtschaft und Technik Berlin, Germany, and has a B.Sc. in computer science from the RTC Galway, Ireland. Before joining NVIDIA, his diverse working experience spanned from research on practical VR applications to working as an art director in computer games. He is a long-time member of NVIDIA's Advanced Rendering team, where his focus has been on enabling material workflows across many different applications and renderers.
Lutz Kettner Senior Manager, Rendering Software and Material Definition, NVIDIA
Mr. Kettner leads the design and engineering efforts for the Material Definition Language, MDL, and the Iray renderer from the NVIDIA Advanced Rendering Center. He has been working on leading software products in advanced rendering, language design, API design, and geometry for 19 years. He is known for his influential work on the open-source Computational Geometry Algorithms Library, CGAL. He holds a Ph.D. in Computer Science from ETH Zurich, Switzerland, worked as a researcher at the University of North Carolina at Chapel Hill, and led a research group at the Max Planck Institute in Saarbrücken, Germany. He served on ISO and ECMA standardization committees.

The basics of NVIDIA's Material Definition Language (MDL) will be discussed, showing how a single material can be used to define matching appearances between different renderers and rendering techniques. End users will learn how physically based materials can be defined, while developers will learn what's entailed in supporting MDL within their own product or renderer.

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Media & Entertainment; Product Design & Styling

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room LL21E
View Recording
View PDF

S5199 - Simulating What is Measured - Closing the Loop Between Experiment and Simulation

Michael Bussmann Junior Group Leader, Computational Radiation Physics, Helmholtz-Zentrum Dresden - Rossendorf
Highly-Rated Speaker
Michael Bussmann is the leader of the Junior Group - Computational Radiation Physics. His group provides open source computational tools for plasma physics, particle acceleration, advanced light sources and large-scale data analysis. His research spans from radiation tumor therapy to particle acceleration, image reconstruction and astrophysics. Michael loves GPUs but thinks there is never enough register memory available.
Axel Huebl PHD Student, Helmholtz-Zentrum Dresden - Rossendorf
Axel Huebl is a Ph.D. student in Physics at Helmholtz-Zentrum Dresden-Rossendorf, Dresden, Germany. He is one of the main developers of the many-GPU code PIConGPU and was one of the Gordon Bell Prize finalists in 2013. His main interests are laser-driven acceleration of electron and ion beams with high-power lasers and probing the dynamics of plasmas at the femtosecond and nanometer scale.

With GPU-accelerated simulations, frames-per-second in-situ visualization and visual analytics are becoming a reality; the increased scalability of such codes allows the time to obtain a solution to be reduced significantly. This also makes it possible to run large-scale parameter surveys for optimization. We will present recent activities on integrating complex particle-accelerator simulations into a reconstruction loop for matching experimental measurements to simulation. This requires putting simulations in a loop with large-scale data analysis, synthetic diagnostics, image reconstruction techniques, and interactive in-situ visualization. We will show how the different building blocks of such a tool chain can be accelerated using GPUs and discuss the combination of these tools.

Level: All
Type: Talk
Tags: Visualization - In-Situ & Scientific; Computational Physics; Medical Imaging; Machine Learning & Deep Learning

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room LL21F
View Recording
View PDF

S5270 - Implementing Radar Algorithms on CUDA Hardware

Pietro Monsurro Research Assistant, University of Rome "Sapienza"
Pietro Monsurrò is a Research Assistant at the University of Rome "Sapienza", where he teaches Analog Electronics. His research covers analog ICs, behavioral models, and digital calibration techniques for mixed-signal systems. He has industrial experience in radar and sonar algorithms, RF and analog ICs, satellite communication systems, and localization techniques.

This talk investigates the implementation of radar algorithms on GPUs. The focus is on electronically scanned search radars. GPUs enable us to develop high performance digital processing systems with limited development time. It is possible to employ a single commercial board to perform all the algorithms of a search radar including downconversion, amplitude/phase correction, pulse compression, beam forming, spectrum analysis, and CFAR noise floor estimation.
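As an illustration of one stage of such a processing chain, pulse compression reduces to a matched filter: correlate the received signal with the transmitted chirp. The sketch below is a generic textbook example with assumed chirp parameters, not the presenter's implementation:

```python
import numpy as np

# Pulse compression via a frequency-domain matched filter. A weak, delayed
# copy of a linear-FM chirp buried in a long receive window compresses to a
# sharp peak at the echo's delay. Chirp parameters here are arbitrary.

def lfm_chirp(n, bandwidth=0.4):
    """Baseband linear-FM chirp of n samples (normalized frequency sweep)."""
    t = np.arange(n)
    return np.exp(1j * np.pi * bandwidth * t * t / n)

def pulse_compress(received, pulse):
    """Circular cross-correlation of the received signal with the pulse."""
    n = len(received)
    return np.fft.ifft(np.fft.fft(received) * np.conj(np.fft.fft(pulse, n)))

if __name__ == "__main__":
    pulse = lfm_chirp(128)
    echo = np.zeros(1024, dtype=complex)
    echo[300:428] = 0.1 * pulse               # weak echo starting at sample 300
    out = np.abs(pulse_compress(echo, pulse))
    print(int(np.argmax(out)))                # peak lands at the echo delay, 300
```

Each stage in the radar chain (downconversion, beamforming, CFAR, and so on) maps similarly well onto batched FFTs and elementwise kernels, which is what makes a single commercial GPU board sufficient.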

Level: Beginner
Type: Talk
Tags: Signal & Audio Processing; Computational Physics; Developer - Algorithms; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210D
View Recording
View PDF

S5404 - Tuning Low-Mode Deflation for Systems with Multiple Right-Hand Sides on GPUs

Alexei Strelchenko Computational Physics Developer, Fermilab
Alexei Strelchenko is a computational physics developer at the Fermi National Accelerator Laboratory. He received a Ph.D. degree from Leipzig University, Germany. From 2009 to 2012 he worked on the Lattice QCD on GPU Architectures project at the Computation-based Science and Technology Research Centre (CaSToRC). He also participated in the European PRACE-1IP (Partnership for Advanced Computing in Europe) and LinkSCEEM-2 projects. His main research interests are computational physics, general-purpose computing on graphics processing units (GPGPU), and Lattice QCD.

In this session we will describe how to employ the mixed-precision technique to accelerate solutions of large sparse linear systems with multiple right-hand sides on GPUs. We will focus on the incremental eigCG algorithm, which computes a number of small-magnitude eigenvalues while solving the first few (Hermitian) systems with Conjugate Gradient, and then reuses this information to deflate the CG solver for the remaining systems. While the mixed-precision technique itself is a well-known optimization approach for linear solvers, its use for the eigenvector computation within eigCG requires special consideration. We will discuss implementation aspects of the mixed-precision deflation and analyse its efficiency using the example of Lattice QCD fermion-matrix inverters.
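For reference, the deflation step described above can be summarized in standard notation; this is a generic eigCG-style sketch, not necessarily the exact formulation used in the talk:

```latex
% Given approximate low-mode eigenpairs (\lambda_i, u_i) of the Hermitian
% matrix A, each new right-hand side b is started from the deflated guess
\[
  x_0 \;=\; U \left( U^\dagger A U \right)^{-1} U^\dagger b
      \;\approx\; \sum_{i=1}^{k} \frac{u_i^\dagger b}{\lambda_i}\, u_i ,
  \qquad U = [\, u_1, \dots, u_k \,],
\]
% after which CG runs on the deflated residual r_0 = b - A x_0 and converges
% at a rate governed by the reduced effective condition number
\[
  \kappa_{\mathrm{eff}} \;\approx\; \frac{\lambda_{\max}}{\lambda_{k+1}} .
\]
```

The low modes are thus removed from the CG iteration for every subsequent right-hand side, which is where the speed-up for multiple systems comes from.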

Level: Advanced
Type: Talk
Tags: Computational Physics; Developer - Algorithms; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210F

S5414 - GPU-Enabled VDI and Rendering at Architecture and Engineering Firm HDR

Clint Pearson IT Infrastructure Systems Lead, HDR, Inc.
A GTC alumnus, Clint Pearson has been in technology for over 20 years, working or consulting as an IT infrastructure administrator and data center systems architect. Clint has designed and managed complex computing environments and has led the virtualization of most systems at HDR for the past 5 years. HDR is a global engineering and architecture (AEC) firm with over 200 offices spanning 19 world time zones, headquartered in Omaha, Nebraska. Over the past several years at HDR, Clint has been instrumental in the design and implementation of GPU-enabled VMware Horizon View systems. In addition to technology design and consulting, Clint has spoken at various technology conferences and professional organization group events.
Jeremy Korell IT Infrastructure Systems Lead, HDR, Inc.
A GTC alumnus, Jeremy Korell has been an advocate and champion of all things technology for over 15 years, working or consulting in every major industry/sector in one capacity or another, including as a college instructor, in Federal and private-sector contracting, in Java and .NET development, and in IT infrastructure architecture and administration. Jeremy has worked in various capacities within HDR's IT Infrastructure Systems group for the past 7 years. HDR is a global engineering and architecture (AEC) firm with over 200 offices spanning 19 world time zones, headquartered in Omaha, Nebraska. Over the past several years at HDR, Jeremy has been instrumental in the successful design and implementation of GPU-enabled Citrix XenApp and VMware Horizon View published desktops and applications. In addition to technology design and consulting, Jeremy has spoken at various technology conferences and professional organization group events.

At GTC 2013, Clint and Jeremy caught the vision of GPU-enabled VDI using NVIDIA GRID™, as well as many other applications of GRID for HDR, a global engineering and architecture design firm based in Omaha, Nebraska. Ever since, Clint and Jeremy have been leading the HDR IT group to fund and implement a GPU-enabled VMware View system that enables global work-sharing from a central data center.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC; Manufacturing; Press-Suggested Sessions: Professional Graphics

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room LL20D
View Recording
View PDF

S5437 - GPU Acceleration of Acquisition Footprint Removal in Post-Stack Seismic Data

Jonathan Marbach Sr. Software Engineer, CGG GeoSoftware
Jonathan Marbach received his Ph.D. in Computer Science from the University of Colorado at Boulder in 2010 where he pursued improving multi-viewer virtual reality for large-scale immersive systems. His interest in GPUs has continued through his work bringing GPU-acceleration of geoscientific imaging algorithms to CGG’s Insight Earth, a seismic interpretation package. A three-time GTC speaker, Jon enjoys sharing his experiences implementing algorithms on the GPU to help encourage others to do the same. When not spending time making things go faster, Jon enjoys slowing down to spend time with his daughters Genevieve and Penelope, and his wife Amy.

Learn how GPU-accelerated acquisition footprint removal improves seismic interpretation results and workflow throughput. Even in modern seismic surveys, acquisition footprint can persist in post-stack 3D surveys, causing artifacts in downstream interpretation workflows. CGG's Insight Earth can now perform structure-oriented de-striping, including removing oblique footprint, in record-time via GPU acceleration. In this talk, the presenters will not only demonstrate the benefits of these advances to interpreters, but will discuss how their perspective on GPU acceleration has changed after several years of inclusion in their commercial interpretation system.

Level: Beginner
Type: Talk
Tags: Energy Exploration; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210E
View Recording
View PDF

S5441 - GPU-Accelerated Quantum ESPRESSO: Achievements and Challenges in Running Real Science Cases

Filippo Spiga HPC Application Specialist / GPU Developer, High Performance Computing Service (University of Cambridge) / Quantum ESPRESSO Foundation
Highly-Rated Speaker
Filippo is an HPC Application Specialist working at the High Performance Computing Service (HPCS) at the University of Cambridge. Previously he worked at top-level research institutes and high-performance computing centres (ICHEC, CINECA, CERN), in enterprise R&D (IBM Research), and in wide multi-institutional collaborations (PRACE and EUAsiaGrid). While he was at ICHEC, the QE-GPU project started; it then grew rapidly, attracting users from both academia and industry. As a member of the Quantum ESPRESSO Foundation, he is responsible for several aspects of the QE-GPU project: new developments, bug fixing, code maintenance, dissemination, and coordinating new people who want to contribute (directly or indirectly). In August 2013 he was appointed one of the four Directors of the Foundation. His main interests include general GPGPU programming, numerical algorithms for GPUs, development of mixed multi-core CPU and GPU code, and scientific application porting.

Quantum ESPRESSO is an integrated suite of computer codes for electronic-structure calculations and materials modeling at the nano-scale. The GPU-accelerated Quantum ESPRESSO project started in early 2011 and has evolved and extended beyond its initial goals; the QE-GPU plug-in is now used by many users who run their calculations on everything from small workstations to big supercomputers. Some new features have been under development and testing for a long time to ensure robustness, correctness, longevity, and portability; top performance is not always the first priority. The aim of this talk is to present the challenges and achievements of running on various heterogeneous systems, illustrated by real science cases gathered from Quantum ESPRESSO users.

Level: All
Type: Talk
Tags: Life & Material Science; Supercomputing; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 212A
View Recording

S5470 - Enabling Efficient Use of UPC and OpenSHMEM PGAS Models on GPU Clusters

Dhabaleswar K. (DK) Panda Professor, The Ohio State University
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 300 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group (http://mvapich.cse.ohio-state.edu), is currently being used by more than 2,225 organizations worldwide (in 73 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 223,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

Learn about extensions that enable efficient use of Partitioned Global Address Space (PGAS) models like OpenSHMEM and UPC on supercomputing clusters with NVIDIA GPUs. PGAS models are gaining attention for providing shared-memory abstractions that make it easy to develop applications with dynamic and irregular communication patterns. However, the existing UPC and OpenSHMEM standards do not allow communication calls to be made directly on GPU device memory. This talk discusses simple extensions to the OpenSHMEM and UPC models to address this issue. Runtimes that support these extensions, optimize data movement using features like CUDA IPC and GPUDirect RDMA, and exploit overlap are presented. We demonstrate the use of the extensions and the performance impact of the runtime designs.

Level: Intermediate
Type: Talk
Tags: Supercomputing; Developer - Tools & Libraries; Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 212B
View Recording
View PDF

S5534 - 3D Backprojection: Meeting the Challenge for Performance in Medical Imaging

Lars Nyland Architect, NVIDIA
Highly-Rated Speaker
Lars Nyland is a GPU architect at NVIDIA, working on the architecture of GPUs for the purpose of general-purpose programming. He works on features for computing, such as new instructions for the compiler and caches. He also explores how applications run on the GPU, looking for ways to improve their performance by improving the architecture.
Julien Demouth Developer Technologist, NVIDIA
Highly-Rated Speaker
Julien Demouth is a member of the Developer Technology team at NVIDIA where he works on accelerating applications on GPUs. He holds a Ph.D in Computational Geometry from INRIA / Université Nancy 2 in France.
Feiwen Zhu GPU Architect, NVIDIA
Feiwen Zhu has been a GPU architect at NVIDIA Shanghai for two years. He received his master's degree from Fudan University. He focuses on predicting, analyzing, and optimizing HPC applications for GPU architectures.
Sky Wu Senior GPU Architect, NVIDIA
Sky Wu is a senior GPU architect at NVIDIA Shanghai with experience in computational chemistry, numerical optimization, etc. He explores ways of improving application performance in these fields by improving the GPU architecture.

In this session, we present the implementation in CUDA of a backprojection kernel for Cone-Beam Computed Tomography (CBCT) and study the performance of this kernel from a GPU architectural point of view. We will explain how we measured the utilization of different components of a GPU by this kernel. Our focus will be on Kepler and Maxwell architectures.

Level: Beginner
Type: Talk
Tags: Medical Imaging; Signal & Audio Processing

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room LL21B
View Recording
View PDF

S5547 - Retail Bank: 400 Times Faster

Jun Xie Chief Technology Officer, Lactec
Dr. Jun Xie is one of the pioneers of data warehouse and data mining techniques as applied to the telecom and banking industries. He has 20 years of working experience spanning more than 40 large projects. He currently leads a team focused on the development and application of GPU techniques in the banking industry.

We present a database query engine that uses the GPU to speed up querying a database table. The engine has been applied to a CRM project at a large bank, where we observed queries running 400 times faster than with DB2.

Level: All
Type: Talk
Tags: Finance; Big Data Analytics; Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room 210C
View Recording
View PDF

S5612 - Fighting Malware With GPUs in Real Time

Peter Kovac Senior Researcher, Avast Software
Peter Kovac has been working for Avast for nearly five years, currently holding the position of Senior Researcher. He is one of the authors of the GPU database that powers the classifier discussed in this talk. Peter believes in simple solutions for complex problems and likes to read fantasy books.

Dive deep into the problem of protecting electronic devices such as PCs, smartphones, and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even using a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are directly used to generate rules to identify similar samples on our customers' machines.

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room LL21C
View Recording
View PDF

S5665 - GPUs and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Olga Russakovsky Ph.D. Student , Computer Science, Stanford University
Olga Russakovsky (http://ai.stanford.edu/~olga) is a computer science PhD student at Stanford University advised by Professor Fei-Fei Li. Her main research interest is in computer vision, specifically focusing on large-scale object detection and recognition. For the past two years she has been the lead organizer of the international ImageNet Large Scale Visual Recognition Challenge which was featured in the New York Times, MIT Technology Review, and other international media venues. She has organized several workshops at top-tier computer vision conferences: the ImageNet challenge workshop at ICCV’13 and ECCV’14, the upcoming workshop on Large-Scale Visual Recognition and Retrieval at CVPR’15, and the new Women in Computer Vision workshop at CVPR’15. During her PhD she collaborated closely with NEC Laboratories America and with Yahoo! Research Labs. She was awarded the NSF Graduate Fellowship and the CRA undergraduate research award.
Alex Berg Assistant Professor, UNC Chapel Hill
Alex is interested in all aspects of computer vision and related problems in other fields. His thesis was on shape and object recognition in images using a new take on deformable templates. He also works on large-scale machine learning algorithms for object recognition and detection, image retrieval, recognizing and synthesizing human action in video, recovering human body poses from photographs, detecting and identifying human faces in images, detecting vehicles in images, and more.

This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of large-scale image recognition, a history of the ILSVRC and an overview of current techniques and trends in image classification and object detection, as well as the role that GPUs have played in this challenge.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room 210A
View Recording

S5860 - Real Virtuality: Adventures in WebVR

Antti Jädertpolm (Fthr / TPOLM) CEO, Vizor.io
Fthr/TPOLM has been a demoscene dude since 1994, making demos, graphics, music, and code. On the artistic side, he's made a career in freelance illustration, mainly making artwork for electronic music artists (Kettel, Eedl, Lackluster, Sense, and Ilkae, among others). On the business side, he's been involved in startups making innovative web platforms: 2005's Splice Music was the first online music sequencer, 2008's SongHi a gamified music learning platform, and in 2011 he founded Engi, a web-based visual programming tool for WebGL. He is now the CEO of Vizor.io, a startup concentrating on WebVR.

In this session, we will go over the brief history of online VR (WebVR) and what the future holds for the medium. We will showcase different frameworks and tools that are currently available and being used for WebVR and discuss how both businesses and art can benefit from it. Antti will demonstrate how he has used VR to create unique demoscene related effects through his own visual programming interface and share his experiences with VR in general.

Level: All
Type: Talk
Tags: NVScene; Augmented Reality & Virtual Reality; Web Acceleration

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room LL20A
View Recording
View PDF

S5865 - Supermicro's Application Optimized GPU System Solutions: Winning Strategies for Selecting Best Platforms (Presented by Supermicro)

Don Clegg VP Marketing & Business Development, Supermicro
Highly-Rated Speaker
Don Clegg
Bio to come.

As GPU-enabled computing matures, selecting the best hardware platform is more essential than ever. Successful enterprises understand the importance of optimizing compute power and density, I/O bandwidth and latency, plus electrical power-efficiency and cooling to ideally match the intended application within the specified budget. Supermicro, with its industry-leading building-block solutions, delivers the most comprehensive range of GPU-optimized platforms on the market. This presentation, featuring Supermicro's FatTwin™, Twin™, SuperBlade™, and rack/tower building blocks, will highlight some of the most important architectural innovations to consider when selecting the best GPU platforms.

Level: All
Type: Talk
Tags: Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room LL21A
View Recording

S5227 - Financial Risk Modeling on Low-power Accelerators: Experimental Performance Evaluation of TK1 with FPGAs

Rajesh Bordawekar Research Staff Member, IBM T. J. Watson Research Center
Rajesh Bordawekar
Rajesh Bordawekar is a research staff member in the Programming Technologies department at the IBM T. J. Watson Research Center. Rajesh studies interactions between applications, programming languages/runtime systems, and computer architectures. He is interested in understanding how modern hardware (multi-core processors, GPUs, and SSDs) impacts the design of optimal algorithms for main-memory and out-of-core problems. His current interest is exploring software-hardware co-design of analytics workloads. Specifically, he has been investigating how GPUs could be used to accelerate key analytics kernels in text analytics, data management, graph analytics, and deep learning.

We experimentally implement key financial risk modeling algorithms (e.g., Monte Carlo pricing) on the NVIDIA TK1 and compare its performance against an FPGA implementation. We compute both FLOPS/dollar and FLOPS/watt, and describe the pros and cons of using the two different architectures for implementing financial risk modeling algorithms.
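The kind of kernel this session benchmarks can be illustrated with a minimal sketch (assumed for illustration, not the speaker's code): a seeded Monte Carlo pricer for a European call under geometric Brownian motion, sanity-checked against the closed-form Black-Scholes price. All function names and parameters here are illustrative.

```python
import math
import random

def mc_european_call(s0, k, r, sigma, t, n_paths, seed=42):
    """Price a European call by simulating terminal prices under GBM."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    payoff_sum = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)              # one standard normal per path
        st = s0 * math.exp(drift + vol * z)  # terminal asset price
        payoff_sum += max(st - k, 0.0)       # call payoff
    return math.exp(-r * t) * payoff_sum / n_paths  # discounted mean

def bs_european_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price, used here only as a sanity check."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    n = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * n(d1) - k * math.exp(-r * t) * n(d2)
```

Each path is independent, which is exactly what makes this workload attractive on both GPUs and FPGAs: the per-path loop body maps directly onto parallel threads or pipeline stages.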

Level: Intermediate
Type: Talk
Tags: Finance; Embedded Systems; Developer - Algorithms

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210C
View Recording
View PDF

S5254 - Fast and Scalable Eigenvalue Solvers for 3D Photonic Crystals on GPUs

Weichung Wang Professor, National Taiwan University
Weichung Wang
Weichung Wang is a Professor at the Institute of Applied Mathematical Sciences, National Taiwan University. He received his Ph.D. degree in applied mathematics from the University of Maryland at College Park in 1996. He is interested in developing numerical algorithms and software for high-performance scientific computing. His research usually involves numerical linear algebra, computational optimization, parallel computing, and their applications.

Explore new algorithms and techniques to solve large-scale Maxwell eigenvalue problems arising in simulations for bandgap engineering. Using the proposed algorithms and implementations, we have successfully computed the desired multiple interior eigenvalues of eigensystems with dimensions as large as 4.2 million within 100 seconds on a single GPU. The techniques extend to multiple GPUs to solve eigenvalue problems with different wave vectors simultaneously, so we can shorten the time to plot a complete band structure diagram from days to minutes. The code also achieves almost linear scalability on parallel computers ranging from a workstation with multiple GPUs to a cluster with homogeneous or heterogeneous CPUs and GPUs.

Level: All
Type: Talk
Tags: Computational Physics; Life & Material Science; Developer - Algorithms; Supercomputing

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210F
View Recording
View PDF

S5263 - MAPS: Optimizing Massively Parallel Applications Using Device-Level Memory Abstraction

Eri Rubin VP R&D, SagivTech LTD.
Eri Rubin
Eri Rubin has over 20 years of experience as a software developer. Prior to joining SagivTech, Eri was a Team Leader of CUDA Development at OptiTex. He worked as a Senior Graphics Developer for IDT-E Toronto, on animation movies and TV specials. Eri has a Master of Science in Parallel Computing, from the Hebrew University of Jerusalem. He received his Bachelor of Science in Computer Science & Life Sciences and also studied Animation for 3 years at Bezalel Arts Academy, Jerusalem.

In this talk, we present MAPS: a novel library that helps developers write CUDA kernels faster, easier and with better performance, without losing flexibility. This library exposes a set of data structures and iterators, similar to STL containers, eliminating the need for complex index calculations that appear when implementing memory optimizations. The resulting code is shorter and simpler. Under the hood, the library implements complex platform-specific memory optimizations. Benchmarks show that the library has minimal overhead compared to implementing such optimizations manually, and sometimes even surpasses their performance. The library is header-only and open source.
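MAPS itself is a CUDA C++ library, and its real API is not reproduced here; the following Python sketch only illustrates the idea the abstract describes, contrasting manual, error-prone index arithmetic with a container/iterator style in which neighborhood access carries no explicit indices. All names are hypothetical.

```python
def blur_manual(grid, w, h):
    """3-point horizontal blur with explicit, error-prone index math
    (the style MAPS aims to eliminate)."""
    out = [0.0] * (w * h)
    for y in range(h):
        for x in range(w):
            left  = grid[y * w + max(x - 1, 0)]      # clamped left neighbor
            mid   = grid[y * w + x]
            right = grid[y * w + min(x + 1, w - 1)]  # clamped right neighbor
            out[y * w + x] = (left + mid + right) / 3.0
    return out

def windows(grid, w, h):
    """Iterator yielding each cell's clamped 3-point window: the index
    calculations live here, once, instead of in every kernel."""
    for y in range(h):
        for x in range(w):
            yield (grid[y * w + max(x - 1, 0)],
                   grid[y * w + x],
                   grid[y * w + min(x + 1, w - 1)])

def blur_windowed(grid, w, h):
    """Same blur written against the iterator: shorter and index-free."""
    return [sum(win) / 3.0 for win in windows(grid, w, h)]
```

On a GPU, the same separation lets the library transparently route window accesses through shared memory or texture caches without touching user code.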

Level: Intermediate
Type: Talk
Tags: Developer - Performance Optimization; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210G
View Recording
View PDF

S5316 - DAG-Scheduled Linear Algebra Using Template-Based Building Blocks

Jonathan Hogg Researcher, Science and Technology Facilities Council
Highly-Rated Speaker
Jonathan Hogg
Jonathan is a researcher and developer at STFC's Rutherford Appleton Laboratory, located near Oxford, England. He has developed a number of solvers and matrix scaling and ordering routines for sparse symmetric matrices, with a particular view towards mathematical optimization applications.

We describe our experiences using DAG-driven algorithms built from templated BLAS-like building blocks to implement LAPACK-like functionality at the single kernel level. There will be a particular focus on the strong scaling of multiple small dense factorizations, as required for sparse direct methods. The main objective is to overlap expensive latency-bound pivoting operations with highly parallel matrix-matrix multiplication operations. As the latter are dependent on the output of previous pivoting decisions, a directed acyclic graph (DAG) scheduler is implemented using global memory to manage fine-grained inter-block parallelism.
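As a toy sequential analogue of the scheduling idea (assumed for illustration, not the authors' GPU implementation), the sketch below orders the tasks of a small factorization DAG so that each matrix-multiply block runs only after the pivoting step it depends on, using Kahn's topological-sort algorithm. Task names are hypothetical.

```python
from collections import deque

def dag_schedule(deps):
    """deps: {task: set of prerequisite tasks}. Returns an execution order
    in which every task follows all of its dependencies (Kahn's algorithm)."""
    indegree = {t: len(d) for t, d in deps.items()}
    children = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            children[p].append(t)
    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:   # last dependency just completed
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("cycle in task graph")
    return order
```

On the GPU the "ready queue" lives in global memory and blocks pull work concurrently, but the dependency bookkeeping is the same.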

Level: Advanced
Type: Talk
Tags: Developer - Algorithms; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210H
View Recording
View PDF

S5318 - Rolls-Royce Hydra on GPUs Using OP2

Istvan Reguly Postdoctoral Research Associate, University of Oxford
Istvan Reguly
Istvan is a Postdoctoral Research Associate at the University of Oxford, working with Prof. Mike Giles on the acceleration of structured and unstructured mesh computations, collaborating with Rolls-Royce and the Atomic Weapons Establishment in the UK to introduce GPU support and future-proof their codes.

Learn how a Domain Specific Language can be used to accelerate a full-scale industrial CFD application. With OP2, you can easily describe your computational problem at a high level, and then generate CUDA code. We show how parallelization on an unstructured mesh is handled over a cluster of GPUs, and how a range of optimizations can be automatically applied during code generation for GPUs, such as conversion from Array-of-Structures to Structure-of-Arrays and the use of shared memory or caches to improve data reuse. We demonstrate that a 4x performance increase can be achieved with a K40 GPU over a server CPU, and present scaling up to 16 GPUs.
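The Array-of-Structures to Structure-of-Arrays conversion mentioned above can be sketched in Python (illustrative only; OP2's generator emits CUDA, and these helper names are not OP2's API). In SoA layout, consecutive GPU threads reading the same field touch consecutive memory, so loads coalesce.

```python
def aos_to_soa(records):
    """Turn a list of per-element dicts (AoS) into a dict of per-field
    lists (SoA): thread i reading field 'rho' then touches soa['rho'][i],
    which is contiguous across threads."""
    return {field: [r[field] for r in records] for field in records[0]}

def soa_to_aos(soa):
    """Inverse transform, useful for checking the conversion round-trips."""
    fields = list(soa)
    n = len(soa[fields[0]])
    return [{f: soa[f][i] for f in fields} for i in range(n)]
```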

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Developer - Programming Languages; Computational Physics

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210B
View Recording
View PDF

S5350 - Justifying Reverse Time Migration Order of Accuracy on NVIDIA GPUs

Marcel Nauta Software Developer, Acceleware
Marcel Nauta
Marcel joined Acceleware in 2012 to advance the company's high-performance seismic software for multi-core CPUs and NVIDIA GPUs. His areas of expertise include the analysis and implementation of numerical algorithms for electromagnetic and seismic wave equations. Marcel has published a number of technical papers on advanced finite-difference time-domain methods and was awarded the NSERC CGS-M for his Master's. He currently holds an AITF Industry Associates award for his research at Acceleware. Marcel has a B.Sc. in Physics and an M.Sc. in Electrical Engineering from the University of Calgary.

Theoretical Full Wave Modelling improvements present compromises between various hardware metrics such as computational intensity, GPU/CPU memory usage, and hard disk requirements. Varying the spatial order of accuracy changes a kernel from memory bound to compute bound and strongly affects register usage. This non-linear relationship between compute cost and order of accuracy determines the optimal configuration on a given hardware architecture. The first part of the presentation will focus on optimizing GPU kernels with varying spatial and temporal orders of accuracy. The second part will discuss benchmarks that show the optimal throughput of RTM jobs. Both isotropic and TTI kernels will be considered to illustrate flavours of the wave equation with differing computational intensity.

Level: Intermediate
Type: Talk
Tags: Energy Exploration; Computational Physics

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210E
View Recording
View PDF

S5412 - GPUDirect: Integrating the GPU with a Network Interface

Davide Rossetti Software Engineer, NVIDIA
Davide Rossetti has a degree in Theoretical Physics from Sapienza Rome University and is currently a senior engineer at NVIDIA Corp. His research focuses on high performance computing (parallel computing, high-speed networking architectures, numerical simulations), while his interests span different areas such as HPC, computer graphics, operating systems, I/O technologies, GPGPUs, embedded systems, and real-time systems.

In the GPU off-loading programming model, the CPU is the initiator, i.e. it prepares and orchestrates work for the GPU. In GPU-accelerated multi-node programs, the CPU has to do the same for the network interface as well. But the truth is that both the GPU and the network have sophisticated hardware resources, and these can be effectively short-circuited so as to get rid of the CPU altogether. Meet PeerSync, a set of CUDA-InfiniBand Verbs interoperability APIs that opens up a wide range of possibilities. It also provides a scheme to go beyond the GPU-network duo, i.e. to effectively apply the same ideas to other third-party devices.

Level: Advanced
Type: Talk
Tags: Supercomputing; Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 212B
View Recording
View PDF

S5445 - Building the Best User Experience with Citrix XenApp & NVIDIA® GRID™

Thomas Poppelgaard Technology Evangelist, Poppelgaard.com
Thomas Poppelgaard
Thomas Poppelgaard is a Citrix Technology Professional (CTP), technology evangelist, and subject matter expert in remote graphics. Thomas is an independent consultant and enables companies around the world to virtualize 2D/3D graphics from the cloud or their own private datacenters. Thomas specializes in delivering virtualized CAD/CAE/CAM/entertainment/media applications with Citrix, VMware, and Microsoft. Thomas has extensive experience with Virtual Desktop Infrastructure (VDI) and Server Based Computing (SBC) using these technologies with NVIDIA. Thomas has been working with Citrix HDX 3D Pro and NVIDIA Graphics since 2008.

Citrix XenApp (formerly Citrix WinFrame Server, Citrix MetaFrame Server and Citrix Presentation Server) is an application virtualization product that allows users to connect to their corporate applications from a wide range of computer systems and mobile devices. XenApp can host applications on central servers and allow users to interact with them remotely, or stream and deliver them to user devices for local execution. In this session, learn from customer cases how and why NVIDIA GRID provided the best user experience, and how to build a better user experience with applications such as Google Earth, Adobe Reader, and MS Office in a Citrix XenApp environment with NVIDIA GRID.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room LL20C
View Recording
View PDF

S5448 - Real-Time Telemetry Group Variant of Shaped Offset Quadrature Phase Shift Keying (SOQPSK-TG) Communications with CUDA

Andrew McMurdie Research Assistant, Brigham Young University
Andrew McMurdie
Andrew McMurdie is a Masters student at Brigham Young University in Electrical Engineering. He expects to graduate in April of 2015 with an emphasis on communications, signal processing, and algorithm design.

In this session we discuss our CUDA implementation of frame synchronization, frequency offset estimation, channel equalization, and demodulation for integrated Network Enhanced Telemetry (iNET) formatted SOQPSK-TG communications. The application is aeronautical telemetry downlinks. Algorithmic improvements yielding better parallelization allow us to receive and process samples in real time at a sample rate greater than 20 Mb/s. Multiple channel equalizers are implemented and tested to produce multiple output bit streams. Bit-error rates for tests with real data are presented, showing that the system can efficiently equalize and process the data.

Level: Intermediate
Type: Talk
Tags: Signal & Audio Processing; Developer - Algorithms; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 210D
View Recording
View PDF

S5538 - Fast Method to Find Critical Points of the Electron Density in Large Systems

Jorge Garza Professor, Universidad Autonoma Metropolitana
Jorge Garza
Jorge Garza obtained his Ph.D. at the Universidad Autónoma Metropolitana-Iztapalapa (UAMI) in Mexico City by studying confinement effects on the electron structure of atoms within the context of density functional theory. After his Ph.D., he gained experience in parallel programming techniques at the Pacific Northwest National Laboratory, working with the quantum chemistry code suite NWChem. Dr. Garza holds a position as full professor at the UAMI and has published around 70 scientific reports related to quantum chemistry supported by parallel computing. In 2008, he was responsible for the installation of the fastest supercomputer in Latin America. Now, Dr. Garza applies parallel programming techniques to heterogeneous computing.

Learn how to distribute on GPUs the evaluation of some scalar fields involved in quantum chemistry methods. In particular, we analyze the electron density in large systems. We find critical points, bond paths, and molecular graphs quickly by accelerating all evaluations (the density and its derivatives) with GPUs. Additionally, we show how the evaluation of atomic properties, defined within the atoms-in-molecules approach, has been implemented on GPUs. This presentation is the final stage of one application designed to be executed on GPUs to analyze scalar and vector fields.

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Supercomputing; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room 212A
View Recording
View PDF

S5540 - Building a Life-Size Automultiscopic Display Using Consumer Hardware

Andrew Jones Research Programmer, USC Institute for Creative Technologies
Andrew Jones has been a researcher in the Graphics Lab at the USC Institute for Creative Technologies since 2002. His research has covered reconstructing the Parthenon in Athens, high dynamic range photography, and 3D scanning of human faces, bodies, and performances. Currently, Andrew is finishing up his PhD work on rendering for automultiscopic 3D displays.

Automultiscopic displays allow multiple users to experience 3D content without the hassle of special glasses or head gear. Such displays generate many simultaneous images with high angular density, so that each eye perceives a distinct and different view. This presents a unique challenge for content acquisition and rendering. In this talk, we explain how to build an automultiscopic display using off-the-shelf projectors, video-splitters, and graphics cards. We also present a GPU-based algorithm for rendering a large number of views from a sparse array of video cameras.

Level: Intermediate
Type: Talk
Tags: Visualization - Large Scale & Multi-Display; Augmented Reality & Virtual Reality; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room LL21F
View Recording
View PDF

S5618 - GPU Computing: A VFX Plug-In Developer's Perspective

Stephen Bash Senior Software Developer, GenArts Inc.
Stephen Bash
After graduating with a degree in aeronautical and astronautical engineering from Purdue University, Stephen began his career at MIT's Lincoln Laboratory working in a variety of fields including propulsion, control systems, modeling and simulation, real time signal processing, and high performance computing. In 2010 he moved into the entertainment industry with GenArts Inc. where he is a senior software developer on their flagship product, Sapphire Plug-ins, the leading specialized visual effects software for the media and advertising industries worldwide.

Making GPU plug-ins is hard! This talk is a very personal view on why, how CUDA helps, where it hurts, some of the emerging challenges, and what makes using CUDA for image-processing visual effects so worthwhile. I'll talk about multi-GPU, the challenges of mixed languages, multiple OSes and supporting lots of hosts. We'll get into some technical details such as APIs and libraries, but it will be easily understood by anyone.

Level: Beginner
Type: Talk
Tags: Media & Entertainment

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room LL21D
View Recording
View PDF

S5861 - Android Performance Patterns: Flow

Etienne Caron Android Team Lead, TrueKey
Etienne's been an active member of the Android developer community in Montreal since 2010, and has been regularly devoting time towards mentoring startups, developers and students in the mobile space. As part of the Montreal Android GDG, he organises meetups, hackathons, and gives regular talks on Android-related subjects. He's now part of the Google Developer Expert program. He got his start in programming in the mid-80s, obsessing over C64 demos, and went on to organize the NAID 95/96 demoparties. He is now Android team lead for TrueKey, Intel's digital identity manager. Previously, Etienne worked on projects ranging from high-availability stock trading software platforms all the way to large-scale municipal bike sharing systems (London's Cycle Hire amongst others). He lives in Montreal with his wonderful wife and daughter.

On mobile devices, tactile feedback provides a very close, personal interaction with users. Lack of speed or sluggishness compromises this feedback loop, and multiple UX studies have shown this has a very real impact on users and how they use your software. Fluid feedback can have a huge impact on getting your work noticed and adopted by users. A well-crafted UI/UX can induce 'flow', or hyperfocus, in your users; something demos usually excel at provoking in viewers. In this session, we'll leverage demoscene know-how to create rich dynamic user interfaces, combining shader rendering tricks with traditional Android UI elements. We'll also learn how to efficiently use the Android platform tools to keep your framerate at a rock-solid 60fps.

Level: All
Type: Talk
Tags: NVScene; Rendering & Ray Tracing

Day: Thursday, 03/19
Time: 13:00 - 13:50
Location: Room LL20A
View Recording

S5111 - NVIDIA GRID™ and vGPU: Best Practices for Designing and Monitoring

Florian Becker Sr. Director - Strategic Alliances, Lakeside Software, Inc.
Florian Becker
Florian Becker is an experienced software executive with a primary focus on end-user computing and virtualization. Florian led the worldwide consulting solutions practice at Citrix for many years, where he introduced the Citrix Desktop Transformation methodology and service offerings. Since 2013, Florian has been leading Lakeside Software's strategic alliances with a focus on Citrix, NVIDIA, and other key partners. Prior to his time at Citrix, Florian held various positions in software development, technical services, product management and consulting services at Epic Systems, a leading provider of healthcare software. Florian holds a Master of Science degree in Computer Information Systems from the University of Miami and a Bachelor of Science equivalent in Physics from the Munich Technical University.
Ben Murphy Senior Applications Engineer and Product Manager, Lakeside Software Inc.
Ben Murphy
Ben Murphy is a product manager and senior applications engineer for Lakeside Software, a leading big data analytics company focused on end-user computing. Ben manages the SysTrack MarketPlace program and has worked closely with NVIDIA to create dynamic planning reports for the implementation and monitoring of vGPU enabled VDI systems and physical systems. He holds both BS and MS degrees in Mechanical Engineering from Northwestern University.

Learn how to implement NVIDIA GRID™ technology in virtual desktop and application environments for graphics accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameters and GPU utilization, and use the data to accurately size and scale the vGPU implementation in VDI use cases. Monitor virtual GPUs to proactively detect changes in the performance requirements of the end-user community, manage the end-user experience, and pinpoint performance bottlenecks in the environment.

Level: Beginner
Type: Talk
Tags: Graphics Virtualization; Manufacturing

Day: Thursday, 03/19
Time: 14:00 - 14:50
Location: Room LL20C
View Recording

S5116 - Out-of-Core Proximity Computation on GPU for Particle-Based Fluid Simulations

Duksu Kim Senior Researcher, Korea Institute of Science and Technology Information (KISTI)
Duksu Kim
Duksu Kim is currently a senior researcher at KISTI (Korea Institute of Science and Technology Information), South Korea. He received his Ph.D. degree in Computer Science in 2014 from KAIST. His research interests include proximity computation and large-scale parallel computing.

Learn how to use your GPU for massive-scale particle-based fluid simulations that require more memory than the video memory provides. We introduce a novel GPU-based neighbor search algorithm used in particle-based fluid simulations such as SPH. With the proposed method, we can efficiently handle a massive-scale particle-based fluid simulation with limited GPU video memory in an out-of-core manner. We have demonstrated that our method robustly handles massive-scale benchmark scenes consisting of up to 65 million particles and requiring up to 16 GB of memory by using a GPU with only 3 GB of memory. It shows up to 26 times higher performance compared to NVIDIA's mapped memory technique and 51 times higher performance compared to a CPU core.
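The uniform-grid neighbor search at the heart of SPH-style simulations can be sketched as follows (a minimal CPU-side illustration under assumed names, not the speaker's out-of-core GPU code): particles are binned into cells of side h, so a radius-h query only has to inspect the 27 surrounding cells.

```python
from collections import defaultdict

def build_grid(points, h):
    """Bin particle indices into cells of side h."""
    grid = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        grid[(int(x // h), int(y // h), int(z // h))].append(i)
    return grid

def neighbors(points, grid, h, i):
    """Indices of particles within distance h of particle i."""
    x, y, z = points[i]
    cx, cy, cz = int(x // h), int(y // h), int(z // h)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):   # scan the 27 candidate cells
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    px, py, pz = points[j]
                    if j != i and (px - x)**2 + (py - y)**2 + (pz - z)**2 <= h * h:
                        result.append(j)
    return result
```

The out-of-core challenge the talk addresses is then deciding which blocks of this grid to keep resident in the 3 GB of video memory while the full 16 GB data set streams through.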

Level: Intermediate
Type: Talk
Tags: Computational Fluid Dynamics; Computational Physics; Real-Time Graphics; Developer - Algorithms

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210B
View Recording
View PDF

S5160 - Experiences in Porting Scientific Applications to GPUs Using OpenACC

Saber Feki Computational Scientist, KAUST
Saber Feki
Saber Feki received his PhD in computer science at the University of Houston in 2010. His research focused on automatic performance tuning using machine learning techniques. In 2011, he joined the oil and gas company TOTAL as an HPC Research Scientist working on seismic imaging applications using different programming models including CAF, OpenACC and HMPP. Currently, he holds the position of computational scientist at the KAUST Supercomputing Laboratory, where he collaborates with various researchers in computational sciences including seismic imaging and electromagnetics. He also delivers training and teaches various courses such as MPI and OpenACC.
Ahmed Al-Jarro Principal Researcher, Fujitsu Laboratories of Europe Ltd
Ahmed Al-Jarro
Ahmed Al-Jarro received the BEng degree in electronic engineering with Spanish and the PhD degree in electrical and electronic engineering from the University of Nottingham, UK, in 2001 and 2004, respectively. He has held several research positions in computational electromagnetics at academic institutions including the University of Nottingham, UK; King Abdullah University of Science and Technology, KSA; and University College London, UK. He currently holds the position of Principal Researcher at Fujitsu Laboratories of Europe, London, UK. His current research interests include the development of large-scale, massively parallel simulation tools that exploit emerging many- and multi-core computing architectures.

Learn how to effectively use the directive-based OpenACC programming model to accelerate scientific applications and easily harness the computational power of GPUs. We share in this session our experiences in porting and tuning three applications to GPUs using OpenACC: (i) an explicit seismic imaging kernel used in the Reverse Time Migration and Full Waveform Inversion applications, widely used in oil and gas exploration, where we show that fine tuning some of its clauses results in better performance, (ii) an implicit solver used in CFD for simulating the fluid structure interaction of flow over airfoil, and (iii) a CEM code that is based on the time-domain volume-integral-equation for simulating transient electromagnetics using both CAPS and PGI compilers.

Level: Intermediate
Type: Talk
Tags: OpenACC; Developer - Programming Languages; Computational Physics; Developer - Performance Optimization; Energy Exploration

Day: Thursday, 03/19
Time: 14:00 - 14:50
Location: Room 220C
View Recording
View PDF

S5211 - Accurate Floating-Point Summation in CUB

Uri Verner Ph.D. Candidate / Intern, Technion / NVIDIA
Uri Verner
Uri is a PhD student at the Technion (Israel), where his research focuses on processing real-time data streams on GPU-based systems. Uri interned with the DevTech compute group at NVIDIA in summer 2014.

We address the problem of accurate parallel floating-point summation. Two issues with current methods for parallel summation of floating-point numbers on GPUs are (1) loss of precision due to error propagation, and (2) the bitwise-exact result is not reproducible with a different architecture or configuration. We present a new efficient method for parallel accurate summation of an array of floating point numbers in CUB. The method computes a full-precision sum by recovering and keeping track of the round-off error. The method is implemented using parallel primitives such as sort and scan, and so it takes advantage of future optimizations of these primitives on new architectures. Our method can reduce the number of iterations in some iterative linear solvers, such as those used in lattice QCD.
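The error-recovery idea the abstract describes can be illustrated with Knuth's two-sum transformation, which yields the exact rounding error of each addition so the running error can be carried alongside the partial sum. This is a plain sequential compensated-summation sketch, not the CUB implementation.

```python
def two_sum(a, b):
    """Knuth's error-free transformation:
    returns (s, e) with s = fl(a + b) and a + b = s + e exactly."""
    s = a + b
    bp = s - a
    e = (a - (s - bp)) + (b - bp)
    return s, e

def accurate_sum(values):
    """Sum with round-off recovery: track the error each float add drops."""
    total = 0.0
    err = 0.0                 # accumulated round-off
    for v in values:
        total, e = two_sum(total, v)
        err += e              # keep what the float add lost
    return total + err
```

For example, `accurate_sum([1e16, 1.0, -1e16])` recovers 1.0, whereas naive left-to-right summation loses the 1.0 entirely and returns 0.0.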

Level: All
Type: Talk
Tags: Developer - Algorithms; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210H
View Recording
View PDF

S5219 - Delivering Production Deployments Using Virtualization and NVIDIA GRID™

Adam Jull CEO, IMSCAD
Adam Jull
Adam started IMSCAD to bridge the knowledge gap between the virtualization platform providers and the design-focused ISVs, their resellers, and ultimately their customers. Adam started out in sales within the telecoms industry before moving to virtualization. He was introduced to Autodesk in 2008 and could immediately see the benefits for their customers in the virtualization of their products. IMSCAD was formed to explore the potential of this opportunity. The initial focus was supporting Autodesk and their global channel partners with the virtualization of their design software. This led to the forming of strategic partnerships with key players like NVIDIA, HP, IBM and numerous other ISVs who operate within this space.

Attendees will hear IMSCAD case studies of real-world deployments using NVIDIA GRID™ and various design applications on Citrix, along with the challenges faced when deploying this technology. If you're looking to virtualize with NVIDIA GRID, this session is a must-see.

Level: All
Type: Talk
Tags: Graphics Virtualization; Data Center, Cloud Computing & HPC; Product Design & Styling; Manufacturing

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room LL20D
View Recording
View PDF

S5311 - To 3D or not to 3D? Why GPUs Are Critical for 3D Mass Spectrometry Imaging

Eri Rubin VP, R&D, SagivTech Ltd.
Eri Rubin
Eri Rubin has over 20 years of experience as a software developer. Prior to joining SagivTech, Eri was a Team Leader of CUDA Development at OptiTex. He worked as a Senior Graphics Developer for IDT-E Toronto, on animation movies and TV specials. Eri has a Master of Science in Parallel Computing, from the Hebrew University of Jerusalem. He received his Bachelor of Science in Computer Science & Life Sciences and also studied Animation for 3 years at Bezalel Arts Academy, Jerusalem.

Big-data problems emerge in the analysis of biological samples. Advanced acquisition methods that provide 3D mass spectrometry information, along with sophisticated learning algorithms, call for fast computation methods. GPUs are an enabling technology for the analysis of ever-growing mass spectrometry data. Come hear about the machine learning algorithms migrated to the GPU environment, including Probabilistic Latent Semantic Analysis and Hierarchical Clustering Distance Calculation, with accelerations of one to two orders of magnitude. This work was carried out under the framework of 3D Massomics, a European FP7-funded project that includes partners with expertise in imaging mass spectrometry, analytical chemistry, medicine, statistics, bioinformatics, and parallel computing.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Life & Material Science; Big Data Analytics

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210A
View Recording
View PDF

S5358 - Beyond Pair Potential: A CUDA Implementation of REBO Potential

Przemyslaw Tredak Ph.D. Student, University of Warsaw
Przemyslaw Tredak
Przemyslaw Tredak is a Ph.D. student at the Faculty of Physics, University of Warsaw. His main research interest is the simulation of crystal growth in covalent systems using molecular dynamics and GPUs. During his Ph.D. he has performed simulations of the growth of graphene on silicon carbide substrates.

Classical molecular dynamics is a very important method in computational physics, chemistry and biology. It is also very computationally demanding, which is why it was among the first scientific methods to be ported to GPUs. However, only some types of potentials used in MD, namely pair potentials, were ported. Other types, like the REBO many-body potential, which is very important for simulating systems of carbon and hydrogen, are still computed on the CPU. The reason lies in the huge complexity of many-body potentials, as well as in the lack of an efficient communication scheme between threads that would resolve race conditions without atomic operations. This work shows a method of overcoming these difficulties in a CUDA implementation of the 2nd-generation REBO potential, and the speedup achieved.

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Computational Physics; Developer - Algorithms

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 212A
View Recording
View PDF

S5416 - Accelerad: Daylight Simulation for Architectural Spaces Using GPU Ray Tracing

Nathaniel Jones Ph.D. Student, MIT
Nathaniel Jones
Nathaniel Jones researches sustainable architectural design at the Massachusetts Institute of Technology, where he is a doctoral student in the Building Technology Program. His research interests include automated translation protocols for producing simulation-ready building energy performance models from architectural CAD models, fast radiant heat transfer simulation using graphics processing units, optimization algorithms for selecting building massing and materials according to performance-based metrics, and the development of architectural design software tools to aid energy-conscious design. Previously, Nathaniel was a research associate at Cornell University, where he earned his Master of Architecture degree in 2009. His work there in the Program of Computer Graphics aimed to provide architects with feedback on the energy efficiency of buildings during conceptual and schematic design, and resulted in a 10,000-fold speedup in the calculation of direct incident radiation on buildings.

This talk introduces Accelerad, a simulation tool for modeling naturally and artificially lit spaces using the NVIDIA® OptiX™ ray tracing engine. Three challenges encountered in implementing physically-based ray tracing on the GPU are presented: (1) the need for large numbers of bounces, which leads to poor warp coherence; (2) the use of irradiance caching, which does not naturally lend itself to parallelism; and (3) the need for validation against physical measurement. The solutions implemented in Accelerad are described, along with test results showing that Accelerad achieves accuracy comparable to current best simulation practice in the building industry while running up to fifty times faster.
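The irradiance-caching challenge comes from the technique's sequential nature: a query either reuses nearby stored samples or triggers an expensive new hemisphere sample, which later queries then depend on. A minimal Python sketch of the basic cache idea, in the spirit of Ward-style irradiance caching (the class, weighting rule, and radii here are illustrative assumptions, not Accelerad's implementation):

```python
import math

class IrradianceCache:
    """Minimal irradiance-cache sketch: a stored sample is reused for
    any query point inside its validity radius, weighted by proximity."""

    def __init__(self):
        self.samples = []  # list of (position, irradiance, radius)

    def store(self, pos, irradiance, radius):
        self.samples.append((pos, irradiance, radius))

    def lookup(self, pos):
        """Return interpolated irradiance, or None on a cache miss
        (a miss is where a renderer would trace a new hemisphere sample)."""
        weighted = []
        for p, e, r in self.samples:
            d = math.dist(pos, p)
            if d < r:
                weighted.append((1.0 - d / r, e))  # simple proximity weight
        if not weighted:
            return None
        return sum(w * e for w, e in weighted) / sum(w for w, _ in weighted)
```

The parallelism problem is visible even in this sketch: whether a point is a hit or a miss depends on which samples earlier queries have already stored, so threads in a warp cannot simply process queries independently.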

Level: All
Type: Talk
Tags: Rendering & Ray Tracing; Visualization - In-Situ & Scientific; Computational Physics

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room LL21E
View Recording
View PDF

S5497 - Accelerating Proton Computed Tomography on Heterogeneous Systems

Thomas Uram Principal Software Developer, Argonne National Laboratory
Thomas Uram
Thomas Uram is a staff member at the Argonne Leadership Computing Facility and the Computation Institute at the University of Chicago. His interests include parallel programming, next-generation architectures, and large-scale analysis and visualization.

Proton computed tomography is a medical imaging technology with the potential to produce more accurate volumetric reconstructions at a lower radiation dose than X-ray computed tomography, albeit with greater computational demands. While in X-ray CT the photons propagate through the target volume in straight lines, the protons in pCT are scattered by the material in the target volume, resulting in a curvilinear path that must be approximated, and a system of equations that must be iteratively solved. We describe the adaptation of the two dominant compute phases for the GPU, compare their performance on the CPU and GPU, and describe efforts to improve the GPU performance. The first phase achieves an 11x speedup; the second phase involves a sparse iterative solver and achieves a 2x speedup.
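The system of equations in the second phase is the classic setting for row-action solvers such as Kaczmarz/ART, where each equation in turn projects the current estimate onto its hyperplane. A minimal Python sketch of that family of solvers (illustrative of the kind of sparse iterative solve described, not the talk's exact method or parameters):

```python
import numpy as np

def kaczmarz(A, b, sweeps=200, relax=1.0):
    """Row-action (Kaczmarz/ART) solve of A x = b.

    Each sweep visits every row i and moves x toward the hyperplane
    A[i] . x = b[i]; `relax` damps the step, as is common in CT
    reconstruction variants.
    """
    m, n = A.shape
    x = np.zeros(n)
    row_norms = np.einsum("ij,ij->i", A, A)  # squared row norms
    for _ in range(sweeps):
        for i in range(m):
            if row_norms[i] == 0.0:
                continue
            residual = b[i] - A[i] @ x
            x += relax * (residual / row_norms[i]) * A[i]
    return x
```

The GPU difficulty the abstract alludes to also shows up here: the row updates are sequential by construction, so parallel versions must process blocks of rows concurrently and reconcile their updates.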

Level: Intermediate
Type: Talk
Tags: Medical Imaging

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room LL21B
View Recording

S5507 - High-Performance Broadcast with GPUDirect RDMA and InfiniBand Hardware Multicast for Streaming Applications

Dhabaleswar K. (DK) Panda Professor, The Ohio State University
Highly-Rated Speaker
Dhabaleswar K. (DK) Panda
Dhabaleswar K. (DK) Panda is a Professor of Computer Science and Engineering at the Ohio State University. He has published over 350 papers in major journals and international conferences. The MVAPICH2 (High Performance MPI over InfiniBand, iWARP and RoCE) open-source software package, developed by his research group (http://mvapich.cse.ohio-state.edu), is currently being used by more than 2,225 organizations worldwide (in 73 countries). This software has enabled several InfiniBand clusters to get into the latest TOP500 ranking during the last decade. More than 223,000 downloads of this software have taken place from the project's website alone. He is an IEEE Fellow and a member of ACM.

Learn about the latest developments in middleware design that boost the performance of GPGPU-based streaming applications. Several middleware packages already support communication directly from GPU device memory and optimize it using features offered by the CUDA toolkit. Some also take advantage of novel features, such as the hardware-based multicast offered by high-performance networks like InfiniBand, to boost broadcast performance. This talk will focus on the challenges of combining and fully utilizing GPUDirect RDMA and hardware multicast in tandem to support a high-performance broadcast operation for streaming applications. Performance results will be presented to demonstrate the efficacy of the proposed designs.

Level: Intermediate
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Developer - Tools & Libraries; Supercomputing

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210D
View Recording

S5510 - Real-Time Image Segmentation for Homeland Security Exploiting Hyper-Q Concurrency

Fanny Nina-Paravecino Ph.D. Candidate, Northeastern University
Fanny Nina-Paravecino
Fanny Nina-Paravecino is a Ph.D. candidate in Computer Engineering at Northeastern University, where she belongs to the Northeastern University Research Group (NUCAR) under the supervision of Dr. David Kaeli. She received her B.S. summa cum laude in Computer Engineering from the University of San Antonio Abad of Cusco in Perú in 2005, and her M.Sc. in Computer Engineering from the University of Puerto Rico at Mayaguez in 2011. Her undergraduate thesis, "Virtual Framework to Simulate an Industrial Robot," built with OpenGL 3D graphics and C#, received the highest grade. Her research interests focus on high-performance optimization, with an emphasis on parallel architectures. She is highlighted in Women & CUDA on the NVIDIA website.

This talk will describe how concurrent kernel execution with Hyper-Q can impact our national security. By exploiting 32 concurrent work queues between the host and the device, we can identify the contents of baggage using CT images. This talk focuses on using Hyper-Q for real-time image segmentation as applied to luggage scanning at airports. Image segmentation plays a key role in this compute pipeline; the accuracy and real-time constraints of the application pose computational barriers. We discuss our ability to scale the number of streams using Hyper-Q, running on an NVIDIA GK110. We achieve a ~47x speedup when processing 32 megapixels versus an optimized OpenMP implementation running on an Intel Core i7-3770K.

Level: Beginner
Type: Talk
Tags: Defense; Video & Image Processing; Developer - Algorithms; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room LL21C
View Recording
View PDF

S5570 - Accelerating Derivatives Contracts Pricing Computation with GPGPUs

Daniel Augusto Magalhães Borges da Silva Manager, BMFBOVESPA
Daniel Augusto Magalhães Borges da Silva
Daniel is the manager in charge of risk management and calculation systems in BMFBOVESPA. He received his B.S. in Computer Science from PUC - SP and graduate degree in software engineering from UNICAMP. Daniel has an MBA in Derivatives and Capital Markets at BMFBOVESPA / USP Educational Institute and is currently enrolled in a Master's Program in Economics at FEA - USP.
Alexandre Barbosa Associate Director Pricing and Risk Systems, BMFBOVESPA
Alexandre Barbosa is an Associate Director at BMFBOVESPA. He has a Bachelor of Information Systems and a MBA in Capital Markets and Derivatives. He is responsible for Pricing Systems, Risk Calculation Systems and Risk Scenarios Management Systems.

Explore new techniques used by BMFBOVESPA, the Brazilian stock exchange, in the implementation of a new Close-out Risk Evaluation system (CORE) that saved $5 billion in collateral. CORE uses a set of GPGPUs to produce future price estimates on time. Session topics include: a high-level overview of the BMFBOVESPA clearing house; the CORE risk system; coding guidelines and the interface between CPU and GPU, as calculation routines needed to be identical in both environments; the use of GPUs to calculate 1.32 billion prices and its importance in a crisis event; performance analysis comparing CPU and GPU timings, showing that the CPU alone is not powerful enough; the multi-GPU, three-tier production environment; and daily usage and market results. No prior knowledge is required to attend this session.

Level: All
Type: Talk
Tags: Finance

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210C
View Recording
View PDF

S5623 - Game Engine Technology to Build Advanced Scientific Software Applications

Michele Isernia VP Strategy and Alliances, HUE AS
Michele Isernia
Mik has been the VP of Strategy and Alliances at HUE (Norway) since October 2012, based in Boulder, Colorado. Prior to HUE he spent 2 years with NVIDIA Corp., 3 years with Microsoft Corp., and over 3 years with Hewlett-Packard. He was also a co-founder of two technology start-ups, in high performance computing (Engineered Intelligence/GEAR6) and bio sensors (Logisens). Mik has held various management positions in R&D and consulting for HP and Apollo Computers in several countries, always focusing on technical computing, science, and research. His education includes Computer Science at the University of Milan, Italy, as well as an Executive MBA at the Stanford School of Business in California.

HueSpace is the only game engine for scientific computing, combining lightning-fast computation on GPUs and CPUs with state-of-the-art, domain-oriented, multi-dimensional visualization and intelligent handling of E&P data in an all-in-one, easy-to-use toolkit. Applications that rely on HueSpace benefit from unparalleled interactivity and scalability. The HueSpace Core Engine defines a scalable Object Model, which it uses to implement an event-driven, multi-threaded dataflow architecture, efficiently managing the interaction between the Data, Compute, and Visualization systems to deliver exceptional application and system performance with few limitations on data size. HueSpace truly realizes NVIDIA's vision of visual computing for science.

Level: All
Type: Talk
Tags: Energy Exploration; Developer - Tools & Libraries; Manufacturing

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210E
View Recording
View PDF

S5636 - Designing Studio Infrastructure for Uncompressed 6K Workflow: Using Adobe Premiere for House of Cards and Gone Girl

Jeff Brue CTO, Open Drives
Jeff Brue
Jeff Brue is a technical consultant specializing in extremely high-performance studio architecture. He has designed the compute architecture and storage infrastructure of over 40 facilities and productions worldwide, and founded Open Drives in 2011 to provide the next generation of storage infrastructure for data-centric production. His credits include infrastructure for productions such as Lawless, American Hustle, and House of Cards, and his background in visual effects and post production includes work on over 135 features and 60 commercial productions. Jeff is currently the CTO of Open Drives in Santa Monica.

Jeff Brue, CTO of Open Drives and post production engineer for House of Cards and Fox's upcoming Gone Girl, will discuss the infrastructure challenges and solutions of working in 6K. The talk will cover system requirements and the unique scenarios that arise when visual effects are integrated deeply and seamlessly into editorial through Adobe Premiere. Jeff will also discuss designing large-scale editorial and VFX deployments with HP and NVIDIA.

Level: All
Type: Talk
Tags: Media & Entertainment; Press-Suggested Sessions: Professional Graphics

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room LL21D
View Recording
View PDF

S5650 - Porting and Optimizing GTC-P Code to NVIDIA GPU

Bei Wang HPC Fusion Energy Specialist, Princeton University
Bei Wang is currently an Associate Research Scholar in the Princeton Institute for Computational Science and Engineering at Princeton University. She received her Ph.D. from the University of California, Davis. Her research interests include particle-in-cell algorithms for kinetic plasma simulations, high-performance computing applications, and performance analysis and modeling.

The Gyrokinetic Toroidal Code at Princeton (GTC-P) is a highly scalable particle-in-cell (PIC) code for studying microturbulence in magnetically confined plasmas. As a representative PIC code, GTC-P includes algorithmic-level "scatter" and "gather" operations, which feature random memory access, potential fine-grained synchronization, and low computational intensity, making this class of irregular codes challenging to optimize on current HPC architectures. In this talk, we will present our efforts in porting and optimizing the GTC-P code on NVIDIA GPUs. In particular, we will discuss the redesign of the "shift" kernel for the Kepler architecture. The performance of the code will be demonstrated on the top 7 supercomputers worldwide.

Level: Intermediate
Type: Talk
Tags: Computational Physics; Supercomputing; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 14:00 - 14:25
Location: Room 210F
View Recording
View PDF

S5728 - Graphics Programming Through the Ages

Michael Dille Sr. Computer Scientist, SGT/NASA Ames Research Center
Michael Dille
Dr. Dille is a Sr. Computer Scientist at SGT/NASA Ames Research Center where he designs space station hardware and works on the navigation system for an upcoming lunar rover mission. In his spare time, he tinkers with the many fun rabbit holes to be found in retrocomputing restoration and modding.
Keith Bare Filesystems Developer, NetApp
Keith Bare
Mr. Bare is a filesystems developer at NetApp Pittsburgh, where he works on data mobility features of Clustered Data ONTAP. Outside of work, he participates in a variety of projects ranging from systems administration to FPGA hackery, and plays oboe in several community music ensembles. Both speakers are core members of cmucc, an American demo group that has earned several first-place rankings and hosts the Demosplash party at Carnegie Mellon University in Pittsburgh, Pennsylvania.

Many new computers over the years have generated great excitement with ever more powerful graphical abilities. To facilitate this, machine designers developed a variety of creative (if largely now arcane) programming interfaces that allowed software authors to squeeze impressive displays from scant computational resources, an art perfected by the demoscene. This talk will focus on a few famous case studies such as the Commodore 64, the Amiga, and early PCs while exploring how each respective architecture influenced the style and appearance of demos on that platform. This chronology of hardware history provides the context to then appreciate the evolution of demos from machine-specific skills demonstrations to immersive graphical simulations, reaching the modern emphasis on aesthetics and production while offering a nod to today's "low-fi" demoscene that retains a focus on pure programming challenge.

Level: All
Type: Talk
Tags: NVScene; Real-Time Graphics

Day: Thursday, 03/19
Time: 14:00 - 14:50
Location: Room LL20A
View Recording

S5821 - POWER8 and GPUs: Helping Unfold the Intricate Loops of Genome Architecture (Presented by IBM)

Ido Machol Lead Scientific Programmer, Baylor College of Medicine
Ido Machol
Ido is a leader at the Baylor College of Medicine working to find answers to complex computational biology questions. For many years, he worked as a software team leader, exercising daily problem solving in software development and requirements deciphering. He learned that delivering a quality product on time requires good planning of activities. Leading development teams for complex systems at small and large companies taught him a lot about personal communication, human interactions, and human-machine interaction. As a team leader, Ido is concerned with the quality and applicability of the software he develops, always looking for new creative methods to accomplish ever more demanding tasks. Ido graduated from The Academic College of Tel-Aviv, Yaffo with a B.Sc. in Computer Science. His team at the Baylor College of Medicine, led by Erez Lieberman Aiden, was recently selected as one of the five finalists for NVIDIA's Global Impact Award for "groundbreaking work that addresses social, humanitarian and environmental problems".

Develop new approaches and algorithms for high-throughput, systematic identification of chromatin loops between genomic regulatory elements, utilizing Tesla GPUs to efficiently search the space of possible chromatin interactions in parallel for true chromatin loops. This team is working with IBM POWER8 and NVIDIA Tesla GPU technologies to create customized algorithms that enable genomics scientists to see fine details of genome folding and learn more about genetic regulation. The resulting maps of looping revealed thousands of hidden switches not previously known to exist. For genes that cause diseases or cancers, locating these switches is essential. GPUs speed up these algorithms by up to 200x, reducing the time to process a single chromosome from a week to less than a coffee break.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Developer - Algorithms; Life & Material Science; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 14:00 - 14:50
Location: Room LL21A
View Recording
View PDF

S5146 - Data Movement Options for Scalable GPU Cluster Communication

Benjamin Klenk PhD Student, Ruprecht-Karls University of Heidelberg
Benjamin Klenk
Benjamin Klenk is a PhD student at Ruprecht-Karls University of Heidelberg (Germany), currently working in the Computer Engineering Group of Prof. Holger Fröning at the Institute of Computer Engineering. After completing his Master's in Computer Engineering, he started his PhD in October 2013. His research interests include parallel computing, GPUs, interconnection networks, and communication optimizations for distributed GPUs. Benjamin has contributed to the GGAS project and implemented a communication method based on put/get semantics for GPUs. He has authored and co-authored several papers for conferences and workshops.

In this talk we will explore how existing communication models map to GPUs and what advantages specialized communication models for GPUs offer. GPU computing is used pervasively for many reasons including performance increase and improved energy efficiency. The Green500 list reveals that the top 10 most energy-efficient computing clusters rely on GPU acceleration. GPU computing at cluster-level is challenging though, as communication models match poorly and hybrid programming models like CUDA+MPI have to be employed. This talk provides observations and insights from experiments with different communication models, and shows promising paths to overcome these limitations.

Level: Intermediate
Type: Talk
Tags: Data Center, Cloud Computing & HPC; Supercomputing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210D
View Recording

S5193 - Compact Cancer Killers: Simulating Next-Generation Laser-Driven Ion Accelerators with GPUs

Michael Bussmann Junior Group Leader "Computational Radiation Physics", Helmholtz-Zentrum Dresden - Rossendorf
Highly-Rated Speaker
Michael Bussmann is the leader of the Junior Group "Computational Radiation Physics". His group provides open source computational tools for plasma physics, particle acceleration, advanced light sources and large-scale data analysis. His research spans from radiation tumor therapy to particle acceleration, image reconstruction and astrophysics. Michael loves GPUs but thinks there is never enough register memory available.
Axel Huebl PhD Student, Helmholtz-Zentrum Dresden - Rossendorf
Axel Huebl is a PhD student in Physics at Helmholtz-Zentrum Dresden - Rossendorf, Dresden, Germany. He is one of the main developers of the many-GPU code PIConGPU and was a Gordon Bell Prize finalist in 2013. His main interests are laser-driven acceleration of electron and ion beams with high-power lasers and probing the dynamics of plasmas at the femtosecond and nanometer scale.

Radiation therapy with ion beams precisely targets the tumor, leaving surrounding healthy tissue unharmed. Usually, ion accelerators are huge in size and thus only found in few facilities worldwide. Using high-power laser systems for accelerating the ions could reduce the size and cost of such systems, potentially increasing the number of treatment facilities and thus giving more patients access to this promising therapy method. In order to bring laser acceleration of ions to application, realistic simulations of the acceleration process are needed. We present PIConGPU, a relativistic particle-in-cell plasma simulation code implemented on GPUs that is ideal for optimizing laser ion acceleration.

Level: All
Type: Talk
Tags: Computational Physics; Life & Material Science; Supercomputing; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210F
View Recording
View PDF

S5197 - Acceleration of Electromagnetic Scattering from Discrete Bodies of Revolution (DBOR)

Eric Dunn Electromagnetic Research Scientist, Leidos
Eric Dunn
Dr. Eric Dunn currently serves as an Electromagnetic Research Scientist with Leidos, as well as a Professorial Lecturer in the Department of Mathematics at George Washington University. His research interests include CEM hybrid techniques and GPU hardware acceleration of parallel algorithms. Prior to this, his focus was on full-wave methods, such as higher-order finite element boundary integral methods, for body-of-revolution geometries.

1987 was the year that "The Simpsons" first aired on TV - now the longest-running scripted show in television history. Perhaps a slightly lesser known streak is that in 1987 one of the first studies of electromagnetic scattering from discrete bodies of revolution (DBOR) was published. Today, more than 25 years later, this same algorithm is still in use. Come join us to learn how we have used the latest in GPU technology to continue to accelerate this legendary algorithm - including our newest results using a library called Momentous (developed by TechX) that enables distributed GPU matrix factorization.

Level: Beginner
Type: Talk
Tags: Defense; Computational Physics; Supercomputing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room LL21C
View Recording
View PDF

S5210 - GPU-Accelerated Spectral Caustic Rendering of Homogeneous Caustic Objects

Budianto Tandianus Research Associate, Nanyang Technological University
Budianto Tandianus
Budianto Tandianus received his bachelor's degree in Computer Science from STMIK Mikroskil in Medan, Indonesia, in 2005. He then continued his studies at Nanyang Technological University in Singapore, obtaining an M.Sc. in Digital Media Technology in 2008. He is currently pursuing his PhD at Nanyang Technological University under the supervision of Prof. Seah Hock Soon and Dr. Henry Johan, with a research topic in caustic computation.

We propose a two-step acceleration scheme for spectral caustics rendering that takes into account information across the visible wavelengths of the scene: the index of refraction (caustic object), light power, and material reflectance (surface). In the first step, we analyze the index of refraction and cluster the wavelengths based on refraction direction similarity in order to reduce intersection tests. In the second step, we consider the surrounding objects' properties (material reflectance and light power) and compute the refinement amount for each wavelength cluster. Our accelerated algorithm produces rendering results close to the reference images with a significant speedup. We implement both acceleration steps using OptiX, a GPU ray tracing engine built on top of CUDA.
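The first step can be pictured as grouping wavelengths whose refraction directions are nearly identical, so one representative ray suffices per group. Below is a minimal Python sketch of that idea; dispersion is modeled with an assumed Cauchy law n(lam) = A + B / lam^2, and the coefficients, incidence angle, and tolerance are illustrative values, not the paper's:

```python
import math

def refraction_angle(n, theta_i):
    """Snell's law for a ray entering a medium of index n from air."""
    return math.asin(math.sin(theta_i) / n)

def cluster_wavelengths(wavelengths_nm, theta_i=math.radians(30.0),
                        tol=math.radians(0.05), A=1.45, B=4500.0):
    """Greedily cluster wavelengths whose refraction directions agree
    within `tol`, so a single ray can represent each cluster."""
    clusters = []
    for lam in sorted(wavelengths_nm):
        theta = refraction_angle(A + B / lam**2, theta_i)
        if clusters and abs(theta - clusters[-1]["theta"]) < tol:
            clusters[-1]["members"].append(lam)
        else:
            clusters.append({"theta": theta, "members": [lam]})
    return clusters
```

Since dispersion is strongest at short wavelengths, clusters naturally come out smaller near 400 nm and larger near 700 nm, which is where a per-cluster refinement step (the paper's second stage) earns its keep.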

Level: Beginner
Type: Talk
Tags: Rendering & Ray Tracing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room LL21E
View Recording
View PDF

S5249 - Groovy and GPU: Enhancing Pricing Performance and Quant Productivity

Felix Grevy Director of Product Management, Misys
Felix Grevy
Felix Grevy, Director of Product Management at Misys, has worked in the finance industry for 15 years. After various roles in development, sales, and product management, he currently leads the technology strategy for Capital Markets, where GPU computing is one of the key topics for delivering value to customers.
Bram Leenhouwers Senior Architect, Misys
Bram Leenhouwers
Bram Leenhouwers is a senior architect at Misys. He leads the Fusion Parallel Platform team responsible for the Misys GPU pricing platform. Prior to Misys, he created two startups and worked at ESI, a security software company. When he's not optimizing OpenCL code, you can find him playing guitar or video games with Dévi, his hardcore gamer wife.

Discover how Misys quants use a Groovy DSL to write efficient GPU-enabled pricing models without any OpenCL or CUDA knowledge. By allowing progressive migration from legacy code to GPU-enabled models, this framework leverages GPGPU strengths to achieve high-performance pricing with a very short learning curve. The session consists of a global overview of the framework, along with simple pricing examples demonstrating the strengths and ease of use of this approach. We will also discuss how technical concerns are separated from financial modeling to maximize quants' efficiency while leaving room for continuous platform improvement on the development side.

Level: Beginner
Type: Talk
Tags: Finance

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210C
View Recording
View PDF

S5329 - How Schlumberger Leveraged NVIDIA GPUs Using Open Inventor Toolkit

Michael Heck Technology Advisor, FEI - Visualization Sciences Group
Michael Heck
Michael Heck is the Technology Advisor for the Visualization Sciences Group of FEI Company. He has been involved in writing, managing, teaching and applying 3D visualization toolkits for scientific and engineering applications for over 30 years, surviving many different graphics platforms, languages and APIs. Mike is an Electrical Engineering graduate of the University of Pittsburgh. He has been a speaker or instructor at many conferences including GTC, SEG and SIGGRAPH.
Oyvind Yrke Project Manager, Schlumberger
Oyvind Yrke
Oyvind is currently a project manager in Petrel Visualization. He has a Master's degree in Engineering Cybernetics and experience in oil and gas software as a developer, architect, and project manager.

This session will explain how Schlumberger's Petrel leverages NVIDIA GPUs through the Open Inventor toolkit. Seismic interpretation use cases will be presented, including volume rendering for geobody recognition and height field rendering for huge horizons.

Level: Intermediate
Type: Talk
Tags: Energy Exploration; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210E
View Recording
View PDF

S5443 - Fast Sparse Matrix Multiplication for QMD using Parallel Merge

Jamaludin Mohd Yusof Research Scientist, Los Alamos National Laboratory
Jamaludin  Mohd Yusof
Jamal Mohd-Yusof is a member of the Collaborative Programming team in the Applied Computer Science group at LANL. He worked on Open Science programming for Roadrunner, where he developed a novel distributed tri-diagonal solver that was instrumental in achieving significant speedup of the CFDNS-RR fluid dynamics code. He has been working with advanced architectures for several years and teaches OpenCL courses at LANL. He is currently developing and profiling physics algorithms for a variety of advanced architectures, and was part of the team that won an R&D 100 Award for the Mantevo software suite, as a developer of the CoMD proxy-app. This work was performed with S. Mniszewski, M. Cawkwell and A. Niklasson of LANL.
Nikolay Sakharnykh Developer Technology Engineer, NVIDIA
Nikolay Sakharnykh
Nikolay is an HPC engineer with experience in scientific research and software development focusing on computational techniques related to physics, chemistry, and biology.

We present a novel sparse matrix formulation that uses modified merge algorithms. In contrast to conventional sparse matrix algorithms, which suffer from data divergence within large work arrays, this method allows us to maintain contiguous data layouts at all stages of the process, and to take advantage of ideas from optimized parallel merge algorithms for efficient GPU performance. Performance comparisons are presented. We are motivated by quantum mechanical simulations of atomic systems, which are limited by the computational cost of the eigenvalue solution. Linear-scaling methods have been developed that require multiplication of large sparse matrices, where the number of non-zeros per row can be relatively large, although still much less than the matrix dimension.
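The core of merge-based sparse matrix multiplication is that each output row of C = A * B can be formed by merging the (scaled) sorted rows of B selected by the non-zeros of A's row: the merged stream stays sorted by column, so duplicate columns are adjacent and can be summed in place without scattered work arrays. A minimal Python sketch of that idea (the data layout and helper names are illustrative, not the authors' GPU code):

```python
import heapq

def scaled_row(row, factor):
    """Yield a sorted (col, val) row of B scaled by a value from A."""
    for col, val in row:
        yield (col, val * factor)

def spgemm_row(a_row, B):
    """Compute one row of C = A * B by k-way merge.

    a_row: sorted list of (col, val) non-zeros of one row of A.
    B: dict mapping a row index to its sorted (col, val) list.
    heapq.merge keeps the combined stream ordered by column, so equal
    columns arrive consecutively and are summed into the last entry.
    """
    streams = [scaled_row(B.get(k, []), av) for k, av in a_row]
    out = []
    for col, val in heapq.merge(*streams):
        if out and out[-1][0] == col:
            out[-1] = (col, out[-1][1] + val)
        else:
            out.append((col, val))
    return out
```

On a GPU the same merge structure maps onto coordinated threads rather than a heap, but the payoff is the same: contiguous, sorted output with no race conditions on scattered accumulators.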

Level: All
Type: Talk
Tags: Life & Material Science; Computational Physics; Developer - Algorithms

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 212A
View Recording

S5482 - Protecting Intellectual Property: CAD/CAM for Contractors and Countries of Concern

Fred Devoir Sr. Architect, Textron Inc.
Fred Devoir
Fred Devoir is a senior architect with 18 years of experience designing and developing complex systems. He deployed a Citrix environment for NASA to provide the ISS with desktop computing for astronauts in space, as well as a 1,500-seat Citrix environment for design engineering at NASA JSC. He designed and is in the process of deploying a 375-seat CAD/CAM environment for Bell Helicopter, and is currently designing a 150-seat deployment for Cessna Aircraft (the Scorpion project) to protect intellectual property shared with contractors and countries of concern.

Attend this session and join a discussion about protecting intellectual property in a distributed design and development environment. We will explore the challenges of deploying 3D CAD/CAM toolsets for contract designers as well as remote manufacturing engineering facilities in countries of concern.

Level: Intermediate
Type: Talk
Tags: Graphics Virtualization; Manufacturing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room LL20D
View Recording
View PDF

S5492 - Fast Digital Tomosynthesis for LIVE Radiation Therapy

Alexandros-Stavros Iliopoulos Ph.D. candidate, Department of Computer Science, Duke University
Alexandros-Stavros Iliopoulos
Alexandros-Stavros Iliopoulos is a Ph.D. candidate in the Department of Computer Science, Duke University. His research interests include developing efficient and robust computational models on parallel architectures, with applications to reconstruction from limited data, and signal/image processing. He is a Fulbright scholar, and received his Diploma in Electrical and Computer Engineering from the Aristotle University of Thessaloniki, Greece, in 2011.

Learn about the recently developed LIVE radiation oncology imaging system for 4D localization of moving tumors, and how its computational reconstruction algorithm may enable clinical applicability during adaptive radiation therapy treatments. We discuss the approach of LIVE for high-fidelity reconstruction from a partial patient scan, together with its clinical significance and resulting computational challenges. By exploiting the GPU computing model and using a novel algorithm formulation, we obtain a simple and efficient reconstruction process, allowing LIVE to go into clinical trials for the first time. We present results with patient data, and remark on remaining challenges.

Level: All
Type: Talk
Tags: Medical Imaging; Video & Image Processing; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room LL21B
View Recording
View PDF

S5518 - Exploiting GPU Caches in Sparse Matrix Vector Multiplication

Yusuke Nagasaka Master's Student, Tokyo Institute of Technology
Yusuke Nagasaka is a master's student at Tokyo Institute of Technology, Tokyo, Japan, working under Prof. Satoshi Matsuoka. He received his B.S. from the same institution in 2014. His research interests include sparse matrix formats for many-core architectures.

We present a technique for sparse matrix-vector multiplication (SpMV) that fully exploits the GPU's caches. Many sparse algorithms, such as conjugate gradient (CG), are dominated by SpMV, which involves random memory accesses to the input vector. The problem is more severe on GPUs because their caches are small. Our new sparse matrix formats for many-core processors significantly increase the cache hit ratio by segmenting the matrix along its columns. Performance evaluations show speedups of up to 3.0x in SpMV and 1.12x in CG compared to cuSPARSE and recently proposed formats such as SELL-C-sigma. For iterative methods, we also devise an auto-tuning mechanism for the segment sizes.
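The column-segmenting idea can be sketched on the CPU with NumPy/SciPy. This is a minimal illustration of the principle only — `segmented_spmv` and `seg_width` are hypothetical names, not the presenters' GPU formats — but it shows why restricting each pass to one column segment keeps the touched slice of the input vector small enough to stay cache-resident:

```python
import numpy as np
from scipy.sparse import csr_matrix, random as sparse_random

def segmented_spmv(A, x, seg_width=1024):
    """Multiply sparse A by dense x one column segment at a time.

    Each pass reads only x[start:end], so on a cache-constrained
    processor the accessed slice of the input vector stays resident
    instead of being evicted by scattered full-width accesses.
    """
    n_rows, n_cols = A.shape
    y = np.zeros(n_rows)
    for start in range(0, n_cols, seg_width):
        end = min(start + seg_width, n_cols)
        # Partial product over one column segment of A.
        y += A[:, start:end] @ x[start:end]
    return y

A = csr_matrix(sparse_random(2000, 2000, density=0.01, random_state=0))
x = np.ones(2000)
y = segmented_spmv(A, x, seg_width=256)
assert np.allclose(y, A @ x)  # same result as an unsegmented SpMV
```

On a real GPU the segment size would be tuned to the cache capacity, which is presumably what the auto-tuning mechanism mentioned above selects.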

Level: All
Type: Talk
Tags: Developer - Algorithms; Supercomputing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210H
View Recording
View PDF

S5592 - Using OpenCL for Performance-Portable, Hardware-Agnostic, Cross-Platform Video Processing

Dennis Adams Director of Technology, Sony Creative Software Inc.
Dennis Adams has been with Sony Creative Software (and Sonic Foundry before it) for 15 years, creating features for Vegas Pro editing software and working on the video processing engine and effects. He led the Vegas Pro OpenCL GPU acceleration project, which shipped in 2011 and included dozens of accelerated video processing effects and transitions.

This talk will discuss how Sony Creative Software used OpenCL to build a 4K video pipeline in Vegas Pro and the new Catalyst Prepare applications. It will cover the design as well as the promises and pitfalls of writing over 100 OpenCL kernels for all aspects of video processing from color management to plug-in video effects.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room LL21D
View Recording
View PDF

S5756 - Sparse Fluid Simulation in DirectX

Alex Dunn Developer Technology Engineer - Graphics, NVIDIA
As a Developer Technology Engineer for NVIDIA, Alex spends his days passionately working to advance real-time visual effects in games. A graduate of Abertay University's Games Technology course, Alex got his first taste of graphics programming on consoles. At NVIDIA, he develops cutting-edge programming techniques to ensure the highest quality and best possible player experience.

Learn how to simulate and render game-ready, high-resolution fluid in real time on the GPU using DirectX. We'll present a new method for sparsely simulating and rendering traditional grid-based fluid systems. By utilizing a simple CPU prediction algorithm, we can update the GPU's virtual memory table to reflect only the active areas of a simulation volume, providing compressed memory storage and hardware-level memory translation for region lookups. This CPU prediction mechanism has a much wider use case than fluid simulation alone, and is a must-know technique for anyone planning to use tiled resources in the future.
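The CPU prediction step can be sketched independently of DirectX. The sketch below is an illustrative assumption, not the presenter's implementation: `TILE`, `predict_active_tiles`, and the conservative reach rule are all hypothetical, and in a real engine the resulting tile set would drive the tiled-resource page-table update (e.g. `UpdateTileMappings` in Direct3D) rather than a Python set:

```python
import numpy as np

TILE = 16  # tile edge length in simulation cells (hypothetical page size)

def predict_active_tiles(active, velocity_max, dt):
    """CPU-side prediction: keep a tile mapped, plus every neighbor the
    fluid could reach this step, so the GPU page table covers only the
    live region of the simulation volume.

    'active' is a set of (i, j) tile coordinates; the returned set is
    what would be committed to the GPU's virtual memory table.
    """
    # Conservative bound on how many tile boundaries fluid can cross.
    reach = max(1, int(np.ceil(velocity_max * dt / TILE)))
    predicted = set()
    for (i, j) in active:
        for di in range(-reach, reach + 1):
            for dj in range(-reach, reach + 1):
                predicted.add((i + di, j + dj))
    return predicted

# One fluid blob occupying a single tile: with this velocity and time
# step, only the tile and its 8 neighbors need backing memory.
active = {(4, 4)}
nxt = predict_active_tiles(active, velocity_max=20.0, dt=0.5)
print(len(nxt))  # 9 tiles mapped instead of the whole volume
```

The payoff is that memory cost scales with the fluid's surface of activity rather than the full grid resolution, which is what makes high-resolution grids affordable.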

Level: Advanced
Type: Talk
Tags: Computational Fluid Dynamics; Real-Time Graphics; Developer - Algorithms

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210B
View Recording
View PDF

S5788 - Recent Advances in Deep Learning at Microsoft: A Selected Overview

Li Deng Partner Research Manager, Microsoft
Li Deng received the Ph.D. degree from the University of Wisconsin-Madison. He was an assistant, tenured associate, and full professor at the University of Waterloo, Ontario, Canada, during 1989-1999, and then joined Microsoft Research, Redmond, USA, where he currently leads R&D of application-focused deep learning and machine intelligence as Partner Research Manager of its Deep Learning Technology Center. He is a Fellow of the IEEE, and Editor-in-Chief of IEEE Signal Processing Magazine and of IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Since 2009, Microsoft has engaged with academic pioneers of deep learning and has created industry-scale successes in speech recognition as well as in speech translation, object recognition, automatic image captioning, natural language, multimodal processing, semantic modeling, web search, contextual entity search, ad selection, and big data analytics. Much of this success is attributable to the availability of big datasets for training deep models, powerful general-purpose GPU computing, and innovations in deep learning architectures and algorithms. This talk gives a selected overview highlighting our work in some of these exciting applications, as well as the lessons we have learned along the way about which tasks are best solved by deep learning methods.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210A
View Recording