GPU Technology Conference

March 24-27, 2014 | San Jose, California
Slidecasts of GTC sessions are available now for conference registrants – please “Sign In” to view.
PDFs of presentation slides will be available by mid-April. Registrants must log in to view slidecasts and PDFs.
For non-registrants, this GTC content will be available at the end of April on GTC On Demand.

GPU Technology Conference Schedule Planner

S4704 - Data Processing and Analytics for Defense

Christopher White ( Program Manager, DARPA )
Dr. Chris White joined DARPA as a program manager in August 2011. His focus is on developing the enabling technology required for efficiently processing, analyzing and visualizing large volumes of data in a military, mission-oriented context. Dr. White previously served DARPA as its country lead for Afghanistan and in-theater member of the Senior Executive Service supporting the commander of the NATO International Security Assistance Force, the Combined Joint Staff branch for Intelligence, the Afghan Threat Finance Cell and the regional military commands. Prior to joining DARPA as government staff, Dr. White was a researcher in DARPA's Information Innovation Office where he created techniques to better understand, measure and model social media and large networks of information. Dr. White was a Research Fellow at Harvard University's School of Engineering and Applied Sciences and the Johns Hopkins University's Human Language Technology Center of Excellence, researching large-scale data analytics for graphs and networks, natural language processing, machine learning and statistical methods for heterogeneous sources in real-world applications. Dr. White holds Ph.D. and M.S. degrees in Electrical Engineering from the Johns Hopkins University and a B.S. in Electrical Engineering from Oklahoma State University.

Join this session to learn about the DARPA XDATA program, created by the DoD to efficiently process and analyze vast amounts of mission-oriented information for Defense activities. Data science programs in the DoD aim to meet the challenges of big data by developing computational techniques and software tools for processing and analyzing this information. As part of this exploration, the DARPA XDATA program aims to address the need for scalable algorithms for processing and visualization of imperfect and incomplete data. Because of the variety of DoD users, XDATA also intends to create human-computer interaction tools that can be easily customized for different missions. Finally, to enable large-scale data processing in a wide range of potential settings, XDATA plans to release open source software toolkits to enable collaboration among the applied mathematics, computer science and data visualization communities.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI; Defense; Large Scale Data Analytics; Recommended Press Session – HPC-Science

Day: Monday, 03/24
Time: 09:30 - 09:55
Location: Room 210F

S4617 - A High Level API for Fast Development of High Performance Graphic Analytics on GPUs

Zhisong Fu ( CUDA Researcher, SYSTAP )
Zhisong Fu is a CUDA researcher at SYSTAP, LLC where he works on efficient GPU graph processing based on Merrill's BFS code. Zhisong is a Ph.D. candidate in the School of Computing at the University of Utah, Salt Lake City and received his Bachelor of Science in Computer Science from Zhejiang University in Hangzhou, China.

The goal of this session is to demonstrate how our high-level abstraction enables developers to quickly develop high-performance graph analytics programs on GPUs, with up to 3 billion edges traversed per second on a Tesla or Kepler GPU. High-performance graph analytics are critical for a large range of application domains. The SIMT architecture of GPUs and the irregular nature of graphs make it difficult to develop efficient graph analytics programs. In this session, we present an open source library that provides a high-level abstraction for efficient graph analytics with minimal coding effort. We use several specific examples to show how to use our abstraction to implement efficient graph analytics in a matter of hours.
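
For a rough sense of the per-edge work such an abstraction hides, the sketch below shows one BFS level-expansion step over a CSR graph in plain CUDA; the kernel and parameter names are illustrative assumptions, not the library's actual API.

// Hypothetical sketch of the kind of per-edge work a graph-analytics
// abstraction hides: one BFS level expansion over a CSR graph.
#include <cuda_runtime.h>

__global__ void bfsLevelStep(const int* rowOffsets,  // CSR row pointers
                             const int* colIndices,  // CSR column indices
                             int* levels,            // -1 = unvisited
                             int numVertices,
                             int currentLevel,
                             int* frontierNotEmpty)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVertices || levels[v] != currentLevel) return;

    // Expand every neighbor of a vertex discovered in the current level.
    for (int e = rowOffsets[v]; e < rowOffsets[v + 1]; ++e) {
        int nbr = colIndices[e];
        if (levels[nbr] == -1 && atomicCAS(&levels[nbr], -1, currentLevel + 1) == -1)
            *frontierNotEmpty = 1;   // at least one vertex joins the next level
    }
}

A host loop would relaunch this kernel with an increasing level until no vertex joins the next frontier.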

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 10:00 - 10:25
Location: Room 210F

S4611 - Speeding Up GraphLab Using CUDA

Vishal Vaidyanathan ( Partner, Royal Caliber )
Vishal graduated from Stanford University in 2007 with a Ph.D. in Computational Chemistry and an M.S. in Financial Mathematics. He developed the first Folding@Home client that used GPUs to accelerate biomolecular simulations by 50 times over what was previously possible. From 2007-2009 Vishal worked at Goldman Sachs developing the first fully automated high frequency trading solution for the US Treasury desk in New York. Subsequently as co-founder of a startup in Silicon Valley, he developed low-latency trading systems and HFT strategies for futures contracts. Vishal joined Royal Caliber as a partner in 2012.

We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible - with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.
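
As a minimal illustration of the Gather-Apply idea (not the speakers' framework), the following CUDA kernel performs one GAS-style PageRank iteration over a CSR representation of incoming edges; all names are hypothetical, and the sketch assumes every vertex has at least one outgoing edge.

// Minimal sketch of one Gather-Apply step of GAS-style PageRank: each thread
// gathers contributions along a vertex's incoming edges, then applies the
// PageRank update.
#include <cuda_runtime.h>

__global__ void pagerankGatherApply(const int* inRowOffsets,  // CSR of incoming edges
                                    const int* inColIndices,
                                    const int* outDegree,
                                    const float* rankOld,
                                    float* rankNew,
                                    int numVertices,
                                    float damping)
{
    int v = blockIdx.x * blockDim.x + threadIdx.x;
    if (v >= numVertices) return;

    // Gather: sum rank / out-degree over in-neighbors.
    float acc = 0.0f;
    for (int e = inRowOffsets[v]; e < inRowOffsets[v + 1]; ++e) {
        int u = inColIndices[e];
        acc += rankOld[u] / outDegree[u];
    }
    // Apply: standard PageRank update.
    rankNew[v] = (1.0f - damping) / numVertices + damping * acc;
}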

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 11:00 - 11:25
Location: Room 210F

S4608 - Extending Python for High-Performance Data-Parallel Programming

Siu Kwan Lam ( Software Engineer, Continuum Analytics, Inc )
Siu Kwan Lam has B.S. and M.S. degrees in Computer Engineering from San Jose State University. He has researched TCP covert channel detection for NSF STC TRUST and taught CUDA at San Jose State University during his senior year. At Continuum Analytics, he is the primary developer for NumbaPro and maintains the open-source LLVMPY project.

Our objective is to design a high-level data-parallel language extension to Python on GPUs. This language extension cooperates with the CPython implementation and uses Python syntax for describing data-parallel computations. The combination of rich library support and language simplicity makes Python ideal for subject matter experts to rapidly develop powerful applications. Python enables fast turnaround time and flexibility for custom analytic pipelines to react to immediate demands. However, CPython has been criticized as being slow and the existence of the global interpreter lock (GIL) makes it difficult to take advantage of parallel hardware. To solve this problem, Continuum Analytics has developed LLVM based JIT compilers for CPython. Numba is the open-source JIT compiler. NumbaPro is the proprietary compiler that adds CUDA GPU support. We aim to extend and improve the current GPU support in NumbaPro to further increase the scalability and portability of Python-based GPU programming.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 13:00 - 13:25
Location: Room 210F

S4609 - High-Performance Graph Primitives on GPU: Design and Implementation of Gunrock

Yangzihao Wang ( Ph.D. Student, UC Davis )
Yangzihao Wang is a Computer Science Ph.D. student at UC Davis. His advisor is Prof. John Owens. His main research interests are 1) the structure of parallelism and locality in irregular algorithms such as graph algorithms on the GPU; 2) exascale computing and data analysis using multiple GPUs.

Gunrock is a CUDA library for graph primitives that refactors, integrates, and generalizes best-of-class GPU implementations of breadth-first search, connected components, and betweenness centrality into a unified code base useful for future development of high-performance GPU graph primitives. The talk will share experience on how to design the framework and APIs for computing efficient graph primitives on GPUs. We will focus on the following two aspects: 1) Details of the implementations of several graph algorithms on GPUs. 2) How to abstract these graph algorithms using general operators and functors on GPUs to improve programmer productivity.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Large Scale Data Analytics; Defense

Day: Monday, 03/24
Time: 13:30 - 13:55
Location: Room 210F

S4460 - Peer-to-Peer Molecular Dynamics and You

Scott LeGrand ( Principal Engineer, Amazon Web Services )
Highly-Rated Speaker
Scott LeGrand is currently a principal engineer at Amazon Web Services. He developed the first molecular modeling system for home computers, Genesis, in 1987, Folderol, the distributed computing project targeted at the protein folding problem in 2000, and BattleSphere, a networkable 3D space shooter for the Atari Jaguar the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts and which currently accounts for ~2.6 petaFLOPs of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is developing life science services on Amazon's Elastic Compute Cloud (EC2).

Recent code optimization within AMBER has improved single-node performance by up to 30% and multi-GPU scaling by up to 70%. The latter was achieved by aggressive use of peer-to-peer copies and RDMA. This has unleashed new time scale regimes for sampling and simulation on low-end GPU clusters, beating every known software-based molecular dynamics codebase in existence at the time of submission. This talk will first cover how AMBER's already efficient single-node performance was made even more so; then the challenge not only of enabling peer-to-peer copies between GPUs, but of obtaining hardware capable of supporting them; and finally, up-to-the-minute results using MVAPICH2 and OpenMPI for RDMA directly between GPUs on separate nodes connected by dual-line FDR InfiniBand.
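
The peer-to-peer copies mentioned above correspond to CUDA's peer-access API. A minimal sketch, not AMBER code, of enabling peer access between two GPUs and issuing a direct device-to-device copy:

// Enable peer-to-peer access between devices 0 and 1, then copy a buffer
// directly from device 0 to device 1 without staging through host memory.
#include <cuda_runtime.h>
#include <cstdio>

int main()
{
    const size_t bytes = 1 << 20;
    float *buf0 = nullptr, *buf1 = nullptr;

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);
    if (!canAccess) { printf("P2P not supported between devices 0 and 1\n"); return 1; }

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);       // allow device 0 to reach device 1
    cudaMalloc(&buf0, bytes);

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);       // and vice versa
    cudaMalloc(&buf1, bytes);

    // Direct GPU-to-GPU copy; with peer access enabled this avoids the host.
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

    cudaFree(buf1);
    cudaSetDevice(0);
    cudaFree(buf0);
    return 0;
}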

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms; Supercomputing

Day: Tuesday, 03/25
Time: 13:00 - 13:50
Location: Room LL21E

S4506 - Indexing Documents on GPU - Can You Index Web in Real Time?

Michael Frumkin ( Sr. Computer Architect, NVIDIA )
Michael has contributed to many areas of parallel and high-performance computing, including a parallel image inspection algorithm (KLA-Tencor), benchmarking of parallel and distributed applications (NASA), performance modeling of parallel applications on many-core systems (Intel), and optimization of a search engine and implementation of control mechanisms in computer-defined networks (Google).

An index of web documents provides the basis for search and decision making. Traditionally, GPUs are used to run applications having a lot of parallelism and a small degree of divergence. We show that GPUs are also able to outperform CPUs for an application that has a large degree of parallelism, but medium divergence. Specifically, we concentrate on the text processing used to index web documents. We present indexing algorithms for both GPU and CPU and show that the GPU outperforms the CPU on two common workloads. We argue that a medium-sized GPU-enabled cluster would be able to index all internet documents in one day. Indexing of web documents on the GPU opens a new area for GPU computing. Companies that provide search services spend a lot of cycles on indexing. Faster and more energy-efficient indexing on the GPU may provide a valuable alternative to the CPU-only clusters used today.
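
As a highly simplified sketch of what per-document text processing on the GPU can look like (not the presenter's implementation), the kernel below assigns one thread per document, hashes whitespace-delimited tokens, and accumulates a crude term-frequency table; the table size and tokenization rules are illustrative assumptions.

// One thread per document: hash whitespace-delimited tokens into a
// fixed-size term-count table.
#include <cuda_runtime.h>

#define TABLE_SIZE (1 << 20)   // hypothetical hash-table size

__device__ unsigned int fnv1a(const char* s, int len)
{
    unsigned int h = 2166136261u;
    for (int i = 0; i < len; ++i) {
        h ^= (unsigned char)s[i];
        h *= 16777619u;
    }
    return h;
}

__global__ void indexDocuments(const char* text, const int* docStart,
                               const int* docEnd, int numDocs,
                               unsigned int* termCounts)
{
    int doc = blockIdx.x * blockDim.x + threadIdx.x;
    if (doc >= numDocs) return;

    int tokenStart = -1;
    for (int i = docStart[doc]; i <= docEnd[doc]; ++i) {
        bool isSpace = (i == docEnd[doc]) || text[i] == ' ' || text[i] == '\n';
        if (!isSpace && tokenStart < 0) tokenStart = i;          // token begins
        if (isSpace && tokenStart >= 0) {                        // token ends
            unsigned int h = fnv1a(text + tokenStart, i - tokenStart);
            atomicAdd(&termCounts[h % TABLE_SIZE], 1u);          // crude TF table
            tokenStart = -1;
        }
    }
}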

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210B

S4831 - GAIA: The GPU-Accelerated Distributed Database and Computational Framework Solving the Infinite Data Problem

Nima Negahban ( CTO, GIS Federal )
Nima has developed enterprise-scale platforms and leading technologies across a wide spectrum of market sectors, ranging from biotechnology to high-speed trading systems. Beyond his cutting-edge technical expertise, Nima has a proven track record of leveraging his technical prowess to create compelling business strategies. Nima holds a B.S. in computer science from the University of Maryland.

GAIA is a distributed database and computational framework designed to leverage the GPU. GAIA's unique semantic type system, coupled with its near-real-time processing, query, and visualization capability, has made it the solution for government agencies coming to grips with querying and visualizing high-volume data streams. GAIA has been distributed to multiple government agencies, including the Army, Navy, and DHS.

Session Level: All
Session Type: Talk
Tags: Defense; Supercomputing; Big Data Analytics & Data Algorithms; Cloud Visualization

Day: Tuesday, 03/25
Time: 13:00 - 13:25
Location: Room 210A

S4462 - GPUs and Regular Expression Matching for Big Data Analytics

Alon Shalev Housfater ( Software Developer, IBM )
Alon Shalev Housfater has been a software engineer at IBM's Hardware Acceleration Laboratory since 2011. He specializes in applying computational acceleration technology to enterprise computing. Alon holds a PhD in Electrical and Computer Engineering from the University of Toronto, where he studied fundamental performance limits of broadcast communication systems.

Regular expression based pattern matching is a key enabling technology for a new generation of big data analytics. We'll describe several key use cases that require high-throughput, low-latency regular expression pattern matching. A new GPU-based regular expression technology will be introduced and its basic performance characteristics presented. We'll demonstrate that the GPU enables impressive performance gains in pattern matching tasks and compare its performance against latest-generation processors. Finally, we'll examine the key challenges in using such accelerators in large software products and highlight open problems in GPU implementation of pattern matching tasks.
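
Regular expression engines commonly compile a pattern to a deterministic finite automaton (DFA). The sketch below, which is not IBM's technology, shows the general idea on a GPU: each thread walks one input record through a precompiled DFA transition table and records whether an accepting state was reached.

// Each thread matches one record against a precompiled DFA.
#include <cuda_runtime.h>

#define NUM_SYMBOLS 256

__global__ void dfaMatch(const char* records,      // concatenated records
                         const int* recStart,      // start offset of each record
                         const int* recLen,        // length of each record
                         int numRecords,
                         const int* transition,    // [numStates][256] next-state table
                         const char* accepting,    // 1 if state is accepting
                         char* matched)            // output flag per record
{
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= numRecords) return;

    int state = 0;                                 // DFA start state
    char hit = 0;
    const char* p = records + recStart[r];
    for (int i = 0; i < recLen[r]; ++i) {
        state = transition[state * NUM_SYMBOLS + (unsigned char)p[i]];
        hit |= accepting[state];                   // match anywhere in the record
    }
    matched[r] = hit;
}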

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 13:30 - 13:55
Location: Room 210B

S4320 - Packet-based Network Traffic Monitoring & Analysis with GPUs

Wenji Wu ( Network Researcher, Fermilab )
Wenji Wu received the doctorate degree in computer engineering in 2003 from the University of Arizona, Tucson. He is currently a network researcher in Fermi National Accelerator Laboratory. His research interests include high performance networking, operating systems, and distributed systems.

In high-speed networks, network traffic monitoring and analysis applications may require enormous raw compute power and high I/O throughputs, especially when traffic scrutiny on a per-packet basis is needed. Under those conditions, the applications face tremendous performance and scalability challenges. The GPU architecture fits well with the features of packet-based network monitoring and analysis applications. At Fermilab, we have prototyped a GPU-assisted network traffic monitoring & analysis system, which analyzes network traffic on a per-packet basis. We implemented a GPU-accelerated library for network traffic capturing, monitoring, and analysis. The library consists of various CUDA kernels, which can be combined in various ways to perform monitoring and analysis tasks. In this talk, we will describe our architectural approach in developing a generic GPU-assisted network traffic monitoring and analysis capability. Multiple examples will be given to demonstrate how to use GPUs to analyze network traffic.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Computational Physics; Supercomputing; Numerical Algorithms & Libraries; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room 210B

S4561 - Virtual Screening of One Billion Compound Libraries Using Novel GPU-Accelerated Cheminformatics Approaches

Olexandr Isayev ( Research Scientist, University of North Carolina at Chapel Hill )
Olexandr Isayev is a Research Scientist at UNC Eshelman School of Pharmacy at the University of North Carolina at Chapel Hill. His research interests include the broad areas of cheminformatics, molecular dynamics and materials science. Olexandr earned his Ph.D. in Theoretical Chemistry in 2008 and worked at Case Western Reserve University and US Army ERDC prior to joining UNC this year.

Recent years have seen an unprecedented growth of chemical databases incorporating tens of millions of available compounds and up to 170 billion synthetically feasible chemical compounds. They offer unprecedented opportunities for discovering novel molecules with the desired therapeutic and safety profile. However, current cheminformatics technologies and software relying on conventional CPUs are not capable of handling, characterizing, and virtually screening such "Big Data" chemical libraries. We present the first proof-of-concept study of GPU-accelerated cheminformatics software capable of calculating chemical descriptors for a library of a billion molecules. Furthermore, we demonstrate the ability of GPU-based virtual screening software to rapidly identify compounds with specific properties in extremely large virtual libraries. We posit that in the era of big data explosion in chemical genomics, GPU computing represents an effective and inexpensive architecture to develop and employ a new generation of cheminformatics methods and tools.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 14:00 - 14:25
Location: Room LL21E

S4651 - Deep Learning Meets Heterogeneous Computing

Ren Wu ( Distinguished Scientist, Baidu )
Dr. Ren Wu is a distinguished scientist at Baidu. He was the lead architect for Heterogeneous System Architecture (HSA), and before that he was the Principal Investigator of the CUDA Research Center at HP Labs. Dr. Wu is renowned for pioneering the idea of using GPUs to accelerate big data analytics as well as for his work on GPU-accelerated large-scale clustering algorithms. At Baidu, Dr. Wu is leading the effort to build the company's heterogeneous computing platform - a turbo engine to power Baidu's business and to unlock a new kind of intelligence.

The rise of the internet, especially the mobile internet, has accelerated the data explosion - a driving force for the great success of deep learning in recent years. Behind the scenes, heterogeneous high-performance computing is another key enabler of that success. In this talk, we will share some of the work we did at Baidu. We will highlight how big data, deep analytics and high-performance heterogeneous computing can work together with great success.

Session Level: All
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms; Supercomputing; Video & Image Processing; Recommended for All Press

Day: Tuesday, 03/25
Time: 14:00 - 14:50
Location: Room LL21B

S4133 - OpenMM Molecular Dynamics on Kinases: Key Cancer Drug Targets Revealed with New Methods and GPU Clusters

Vijay Pande ( Professor and Director, Stanford University )
Highly-Rated Speaker
Prof. Pande is currently the Director of the Program in Biophysics, Director of the Folding@home Distributed Computing project, and a Professor of Chemistry and (by courtesy) of Structural Biology and of Computer Science at Stanford University. His current research centers on the development and application of novel cloud computing simulation techniques to address problems in chemical biology. In particular, he has pioneered novel distributed computing methodology to break fundamental barriers in the simulation of kinetics and thermodynamics of proteins and nucleic acids. As director of the Folding@home project (http://folding.stanford.edu), Prof. Pande has, for the first time, directly simulated protein folding dynamics with quantitative comparisons with experiment, often considered a "holy grail" of computational biology. His current research also includes novel computational methods for drug design, especially in the areas of protein misfolding and related diseases such as Alzheimer's Disease. Prof. Pande received a BA in Physics from Princeton University in 1992. Prof. Pande has won numerous awards, including the Michael and Kate Bárány Award for Young Investigators from the Biophysical Society (2012), the Thomas Kuhn Paradigm Shift Award from the American Chemical Society (2010), Fellow of the American Physical Society (2008), the Irving Sigal Young Investigator Award from the Protein Society (2006), the MIT Indus Global Technovator's Award (2004), a Henry and Camille Dreyfus Teacher-Scholar award (2003), being named to MIT's TR100 (2002), and being named a Frederick E. Terman Fellow (2002).

Learn how to use GPU-enabled molecular dynamics codes, parallelized on a cluster of 100 GPUs, and sample key conformational transitions. When applied to protein kinase molecules, key targets in anti-cancer drugs, these methods reveal new insights into how to target new drugs to these systems.

Session Level: Beginner
Session Type: Talk
Tags: Molecular Dynamics; Big Data Analytics & Data Algorithms; Bioinformatics & Genomics; Computational Physics; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 14:30 - 15:20
Location: Room LL21E

S4459 - Parallel Lossless Compression Using GPUs

Evangelia Sitaridi ( Ph.D. Candidate, Columbia University )
Evangelia Sitaridi is a Ph.D. candidate in the Computer Science Department of Columbia University in the City of New York. Her Ph.D. research focuses on database processing using GPUs. Before starting her Ph.D. she graduated with B.Sc. and M.Sc. degrees from the Department of Informatics and Telecommunications of the University of Athens. During the summers of 2012 and 2013 she interned at IBM Almaden and T.J. Watson.

Given the high cost of enterprise data storage, compression is becoming a major concern for the industry in the age of Big Data. Attendees can learn how to efficiently offload data compression to the GPU, leveraging its superior memory and compute resources. We focus on the DEFLATE algorithm, a combination of the LZSS and Huffman entropy coding algorithms, used in common compression formats like gzip. Both algorithms are inherently serial, and trivial parallelization methods are inefficient. We show how to parallelize these algorithms efficiently on GPUs and discuss trade-offs between compression ratio and increased parallelism to improve performance. We conclude our presentation with a head-to-head comparison to a multi-core CPU implementation, demonstrating up to half an order of magnitude performance improvement using a single Kepler GPU. This is joint work with IBM researchers Rene Mueller and Tim Kaldewey.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:00 - 15:25
Location: Room 210B

S4882 - First Glimpse into the OpenPOWER Software Stack with Big Data Workload Example (Presented by IBM)

Ken Rozendal ( Distinguished Engineer, IBM )
Ken Rozendal is the chief architect for IBM's Linux Technology Center. Previously, he was lead architect for IBM's Linux kernel development and earlier for IBM's AIX kernel development organization. He created the original version of the CD-ROM filesystem in AIX and worked on the AIX design for diskless systems, the virtual memory and filesystem support for SMP systems, workload management, and application checkpoint/restart.
Keith Campbell ( Senior Software Engineer, IBM )
Keith Campbell is a senior software engineer at IBM, working in the Hardware Acceleration Laboratory since 2012. He specializes in applying computational acceleration technology to enterprise computing. Keith received a Bachelor of Mathematics degree from the University of Waterloo in Waterloo, Ontario, with majors in Computer Science and Combinatorics & Optimization.

The OpenPOWER Foundation (http://www.open-power.org/) is an open alliance of companies working together to expand the hardware and software ecosystem based on the POWER architecture. This collaboration across hardware and software vendors enables unique innovation across the full hardware and software stack. OpenPOWER ecosystem partners and developers now have more choice, control and flexibility to optimize at any level of the technology from the processor on up for next-generation, hyperscale and cloud datacenters. Integrating support for NVIDIA GPUs on the POWER platform enables high performance enterprise and technical computing applications such as Big Data and analytics workloads. This presentation will cover the software stack and developer tools for OpenPOWER, the planned support for CUDA, and a proof of concept showing GPU acceleration. This proof of concept will be available as a demo in the IBM booth.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers; Debugging Tools & Techniques; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 15:00 - 15:50
Location: Room LL20D

S4266 - Concurrency and Overlapping: Fully Exploit Every Single GPU for Large-Scale Machine Learning

Yun Zhu ( Ph.D. Candidate, Georgia State University )
Yun Zhu is currently a Ph.D. student in the Computer Science Department at Georgia State University. He received his B.S. at East China University of Science & Technology and his M.S. at Emory University. His research interests include machine learning algorithms and high-performance computing.

Learn how to tackle large-scale machine learning models that exceed the device memory capacity of a GPU without using additional GPUs, while maintaining performance competitive with configurations that keep everything on board: (1) Remove the device memory limitation by splitting the parameter set into small parts that fit in device memory and updating them one by one. (2) Neutralize the overhead caused by the extra memory transfers of small parameter subsets between host and device using a) overlapping of memory copies and kernel execution and b) concurrent kernels. (3) Figure out the computational power limit of the GPU and select the optimal configuration of concurrency. An example will be given using a Restricted Boltzmann Machine (RBM) with billions of parameters.
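
Point (2a) maps onto CUDA streams. The sketch below, assuming pinned host memory and a placeholder update kernel (updateChunk is hypothetical, not the presenter's code), alternates two streams so that the copy of one parameter chunk overlaps the kernel working on another.

// Overlap host-device copies of parameter chunks with kernel execution
// using two CUDA streams and double-buffered device storage.
#include <cuda_runtime.h>

__global__ void updateChunk(float* params, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) params[i] += 0.01f;       // placeholder for the real update rule
}

void processParameters(float* hostParams /* allocated with cudaMallocHost */,
                       int totalParams, int chunkSize)
{
    cudaStream_t streams[2];
    float* devChunk[2];
    for (int s = 0; s < 2; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&devChunk[s], chunkSize * sizeof(float));
    }

    for (int offset = 0, c = 0; offset < totalParams; offset += chunkSize, ++c) {
        int s = c & 1;                    // alternate between the two streams
        int n = (totalParams - offset < chunkSize) ? totalParams - offset : chunkSize;
        cudaMemcpyAsync(devChunk[s], hostParams + offset, n * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        updateChunk<<<(n + 255) / 256, 256, 0, streams[s]>>>(devChunk[s], n);
        cudaMemcpyAsync(hostParams + offset, devChunk[s], n * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    for (int s = 0; s < 2; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaFree(devChunk[s]);
        cudaStreamDestroy(streams[s]);
    }
}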

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room LL21B

S4410 - Visualization and Analysis of Petascale Molecular Simulations with VMD

John Stone ( Senior Research Programmer, Associate Director CUDA Center of Excellence, University of Illinois )
Highly-Rated Speaker
John Stone is a Senior Research Programmer in the Theoretical and Computational Biophysics Group at the Beckman Institute for Advanced Science and Technology, and Associate Director of the NVIDIA CUDA Center of Excellence at the University of Illinois. Mr. Stone is the lead developer of VMD, a high-performance molecular visualization tool used by researchers all over the world. His research interests include molecular visualization, GPU computing, parallel processing, ray tracing, haptics, and virtual environments. Mr. Stone was named an NVIDIA CUDA Fellow in 2010. Mr. Stone also provides consulting services for projects involving computer graphics, GPU computing, and high performance computing. Prior to joining the University of Illinois in 1998, Mr. Stone helped develop the award-winning MPEG Power Professional line of video compression tools at Heuris.

We present recent successes in the use of GPUs to accelerate challenging molecular visualization and analysis tasks on hardware platforms ranging from commodity desktop computers to the latest Cray XK7 supercomputers. This talk will focus on recent algorithm developments and the applicability and efficient use of new CUDA features on state-of-the-art Kepler GPUs. We will present the latest performance results for GPU accelerated trajectory analysis runs on Cray XK7 petascale systems and GPU-accelerated workstation platforms. We will conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

Session Level: Intermediate
Session Type: Talk
Tags: Molecular Dynamics; Supercomputing; Big Data Analytics & Data Algorithms; Scientific Visualization

Day: Tuesday, 03/25
Time: 15:30 - 16:20
Location: Room LL21E

S4447 - Rhythm: Harnessing Data Parallel Hardware for Server Workloads

Sandeep Agrawal ( Student, Duke University )
Sandeep Agrawal is a third-year Ph.D. student in the Computer Science Department at Duke University. His interests broadly lie in the field of computer and systems architecture, especially in the area of energy and performance optimization.

We present Rhythm, a framework for high-throughput servers that exploits similarity across web service requests to improve server throughput and energy efficiency. Present work in data center efficiency primarily focuses on scale-out, with off-the-shelf hardware used for individual machines, leading to inefficient usage of energy and area. Rhythm improves upon this by harnessing data-parallel hardware to execute "cohorts" of web service requests, grouping requests together based on similar control flow and using intelligent data layout optimizations. An evaluation of the SPECWeb Banking workload for future server platforms on the GTX Titan achieves 4x the throughput (reqs/sec) of a Core i7 at efficiencies (reqs/Joule) comparable to a dual-core ARM Cortex A9.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 15:30 - 15:55
Location: Room 210B

S4515 - Real-Time Geospatial Processing with NVIDIA® GPUs and CUDA-Based GIS Library

Srinivas Reddy ( CTO, SRIS )
Mr. Reddy is CTO at SRIS, a leading technology provider focusing primarily on GPU technology. He is responsible for overseeing the design and development of applications on Geospatially-Enabled Supercomputers and architecting next generation HPCs and HTCs using GPUs. He currently leads efforts to bring innovative solutions from various vendor partnerships to his clients. Mr. Reddy was a Senior Solutions Architect at IBM's Advanced Solutions, a leading provider of massive analytics serving federal markets. In 2009, Mr. Reddy received the prestigious NISC Chairman's Thought Leadership Award in recognition of his outstanding service, commitment to excellence, innovative approaches and solutions resulting in a positive impact to his customers' missions, high-quality performance, and noteworthy dedication to the realization of his client's objectives. Mr. Reddy has a B.S. degree in Zoology from Louisiana Tech University and a dual MBA/MS in Logistics and Transportation Management from Smith School of Business at University of Maryland at College Park.

GPUs have been highly successful in gaming technology and have become nearly universal in mobile devices, tablets, and laptops in their role as graphics processors and GIS platforms. User communities have been using GIS algorithms and libraries on CPUs for real-time computations. Since ours was a data-centric problem, we proposed that developing these GIS algorithms on NVIDIA GPUs should increase processing speed and efficiency. To test this hypothesis, we designed an architecture that included a Dell R720 with two NVIDIA K20X GPUs. We found that these algorithms lend themselves perfectly to implementation on GPUs. Furthermore, GPUs became extremely important for their ability to process large amounts of data in real time. Final testing revealed that processing speed increased by 200X (200/sec to 20,000/sec). In this session we will lay out the pitfalls of the current methods, our proposed architecture, and some details about the findings.

Session Level: Intermediate
Session Type: Talk
Tags: Defense; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210D

S4656 - Machine Learning with GPUs: Fast Support Vector Machines without the Coding Headaches

Stephen Tyree ( Graduate Student, Washington University in St. Louis )
Stephen Tyree is a Ph.D. student in the Department of Computer Science and Engineering at Washington University in St. Louis. He holds a Bachelor's degree in computer science and mathematics and a Master's degree in computer science, both from the University of Tulsa. His research focuses on parallel and approximate methods for fast and scalable machine learning.

Speeding up machine learning algorithms has often meant tedious, bug-ridden programs tuned to specific architectures, all written by parallel programming amateurs. But machine learning experts can leverage libraries such as cuBLAS to greatly ease the burden of development and make fast code widely available. We present a case study in parallelizing Kernel Support Vector Machines, powerful machine-learned classifiers which are very slow to train on large data. In contrast to previous work which relied on hand-coded exact methods, we demonstrate that a recent approximate method can be compelling for its remarkably simple implementation, portability, and unprecedented speedup on GPUs.
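
To give a flavor of the library-based approach: the dominant cost in kernel SVM training is the kernel (Gram) matrix, and for a linear kernel that matrix is a single cuBLAS GEMM call. A minimal sketch, assuming the data matrix is already resident on the GPU in column-major order (function and variable names are illustrative):

// Compute the linear kernel matrix K = X^T X with one cuBLAS call.
// X holds d features x n samples, column-major, on the device.
#include <cublas_v2.h>
#include <cuda_runtime.h>

void computeGramMatrix(cublasHandle_t handle,
                       const float* d_X,   // d x n, column-major, on device
                       float* d_K,         // n x n output, on device
                       int d, int n)
{
    const float alpha = 1.0f, beta = 0.0f;
    // K = 1.0 * X^T * X + 0.0 * K
    cublasSgemm(handle, CUBLAS_OP_T, CUBLAS_OP_N,
                n, n, d,
                &alpha, d_X, d, d_X, d,
                &beta, d_K, n);
}

A nonlinear kernel such as the RBF can then be formed from this matrix and the per-sample norms with an element-wise follow-up kernel.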

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms; Numerical Algorithms & Libraries

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room LL21B

S4674 - Parallel Decomposition Strategies in Modern GPU

Sean Baxter ( Research Scientist, NVIDIA )
Highly-Rated Speaker
Sean is a research scientist at NVIDIA.

Learn strategies to decompose algorithms into parallel and sequential phases. These strategies make algorithmic intent clear while enabling performance portability across device generations. Examples include scan, merge, sort, and join.
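
One concrete instance of this decomposition (a generic pattern, not the speaker's code) is a reduction split into a sequential phase, where each thread accumulates a strided range of inputs, and a parallel phase, where the block combines partial sums in a shared-memory tree:

// Launch with 256 threads per block; blockSums then holds one partial sum
// per block, which a second pass (or the host) can combine.
#include <cuda_runtime.h>

__global__ void reduceSum(const float* in, float* blockSums, int n)
{
    __shared__ float sdata[256];
    int tid = threadIdx.x;

    // Sequential phase: grid-stride accumulation keeps each thread busy.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + tid; i < n; i += gridDim.x * blockDim.x)
        acc += in[i];
    sdata[tid] = acc;
    __syncthreads();

    // Parallel phase: tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) sdata[tid] += sdata[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}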

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization

Day: Tuesday, 03/25
Time: 16:00 - 16:25
Location: Room 210B

S4686 - NVIDIA GRID for VDI: How To Design And Monitor Your Implementation

Florian Becker ( Sr. Director, Strategic Alliances, Lakeside Software Inc. )
Florian Becker leads the global alliance with Citrix and the Citrix ecosystem for Lakeside Software. Prior to that, Florian led the worldwide consulting solutions team at Citrix Systems, where he and his team developed the implementation best practices and professional services offerings for virtual application and desktop use cases for Citrix Consulting Services and System Integrators. With more than 15 years of experience in the high-tech and software industry and experience in user-focused design, Florian is uniquely positioned to bridge the worlds of IT and the end-user. A physicist by training, Florian holds a Master of Science in Information Systems from the University of Miami and completed the Customer Focused Innovation curriculum at Stanford University.
Ben Murphy ( Senior Applications Engineer and Product Manager, Lakeside Software Inc. )
Ben Murphy is a Senior Applications Engineer and Product Manager for the MarketPlace program at Lakeside Software, the makers of SysTrack. His primary work focuses on mathematical data analysis and reporting to transform data collected from distributed end points and servers into meaningful recommendations and information. Currently he is engaged in ongoing work with NVIDIA to provide both data driven assessments of GPU needs in production environments and provide ongoing insight into GPU consumption and performance for both physical and virtual systems.

Learn how to implement NVIDIA GRID technology in virtual desktop and application environments for graphics-accelerated use cases. Apply industry-leading best practices to accurately assess the user community's current graphical application parameters and GPU utilization, and use the data to accurately size and scale the vGPU implementation in VDI use cases. Monitor virtual GPUs to proactively detect changes in the performance requirements of the end-user community, manage the end-user experience, and pinpoint performance bottlenecks in the environment.

Session Level: Beginner
Session Type: Talk
Tags: Graphics Virtualization Summit; Remote Graphics & Cloud-Based Graphics; Big Data Analytics & Data Algorithms; Computer Aided Design; Recommended Press Session – Graphics Virtualization

Day: Tuesday, 03/25
Time: 16:00 - 16:50
Location: Room 210F

S4215 - GPU-Accelerated Large-Scale Dense Subgraph Detection

Andy Wu ( Research Scientist, Xerox Research Center )
Andy Wu is a researcher working on large-scale data analytics projects at XRCW (Xerox Research Center Webster, NY). In 2011, he graduated with a Ph.D. in Computer Science from Washington State University, Pullman, WA. The focus of his Ph.D. research was on solving large-scale computational biology problems using high-performance computers, and his research interests include parallel computing, string algorithms, graph algorithms and computational biology. He has published several papers in top-ranked conferences and journals, including TPDS, SC, ICPP, Nature Genetics, etc. He is using GPUs to solve large-scale graph analytics problems at XRCW.

The large-scale dense subgraph detection problem has been an active research area for decades, with numerous applications in the web and bioinformatics domains, and numerous algorithms have been designed to tackle this graph kernel. Due to computational limitations, traditional approaches are infeasible when dealing with large-scale graphs with millions or billions of vertices. In this presentation, we propose a GPU-accelerated dense subgraph detection algorithm to solve the large-scale dense subgraph detection problem. It successfully maps the irregular graph clustering problem onto the GPGPU platform, and extensive experimental results demonstrate strong scalability on GPU computing platforms.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Bioinformatics & Genomics

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210B

S4680 - Exploiting the GPU for High Performance Geospatial Situational Awareness Involving Massive and Dynamic Data Sets

Bart Adams ( Software Engineering Manager, Luciad )
Dr. Bart Adams, Software Engineering Manager, has been with Luciad since April 2009. He holds an M.Sc. and Ph.D. in Engineering (University of Leuven, Belgium) and spent two years as a post-doctoral researcher at Stanford University, USA. Within the R&D group at Luciad, he manages research on novel algorithms for high-performance computation and visualization on desktop and mobile devices.

Geospatial Situational Awareness (SA) engines face stringent accuracy and performance requirements. Large volumes of static and dynamic data need to be analyzed and visualized, in both 2D and 3D and in various geographic projections, at sub-centimeter accuracy and interactive update rates. In contrast to game engines, where this data can be pre-processed and stored in optimized data structures, the data comes in any form and needs to be interpreted on-the-fly. This talk will discuss these challenges and the advanced GPU rendering techniques and algorithms that address them. We will show that by exploiting the GPU, terabytes of terrain and imagery data, in combination with highly dynamic data streams that can contain millions of tracks and multiple radar feeds as well as orthorectified UAV video streams, can be handled on a world-scale theater at update rates of over 60Hz.

Session Level: Advanced
Session Type: Talk
Tags: Defense; Combined Simulation & Real-Time Visualization; Big Data Analytics & Data Algorithms; Desktop & Application Virtualization

Day: Tuesday, 03/25
Time: 16:30 - 16:55
Location: Room 210D

S4525 - Building Random Forests on the GPU with PyCUDA

Alexander Rubinsteyn ( Ph.D. Student, NYU )
Alex Rubinsteyn is a Computer Science Ph.D. student at NYU. His interests are a high variance mixture distribution around programming language implementation and machine learning.

Random Forests have become an extremely popular machine learning algorithm for making predictions from large and complicated data sets. The currently highest performing implementations of Random Forests all run on the CPU. We implemented a Random Forest learner for the GPU (using PyCUDA and runtime code generation) which outperforms the currently preferred libraries (scikits-learn and wiseRF). The "obvious" parallelization strategy (using one thread-block per tree) results in poor performance. Instead, we developed a more nuanced collection of kernels to handle various tradeoffs between the number of samples and the number of features.

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL21B

S4612 - Speeding Up GraphLab Using CUDA

Vishal Vaidyanathan ( Partner, Royal Caliber )
Vishal graduated from Stanford University in 2007 with a Ph.D. in Computational Chemistry and an M.S. in Financial Mathematics. He developed the first Folding@Home client that used GPUs to accelerate biomolecular simulations by 50 times over what was previously possible. From 2007-2009 Vishal worked at Goldman Sachs developing the first fully automated high frequency trading solution for the US Treasury desk in New York. Subsequently as co-founder of a startup in Silicon Valley, he developed low-latency trading systems and HFT strategies for futures contracts. Vishal joined Royal Caliber as a partner in 2012.

We demonstrate how describing graph algorithms using the Gather-Apply-Scatter (GAS) approach of GraphLab allows us to implement a general-purpose and extremely fast GPU-based framework for describing and running graph algorithms. Most algorithms and graphs demonstrate a large speedup over GraphLab. We show that speedup is possible when using multiple GPUs within a box and that processing of large graphs is possible - with the latest Tesla cards, over 48GB of GPU memory can be available within a single box. Example algorithms will include PageRank, BFS, and SSSP. The precursor to this work serves as the basis for other attempts at a GPU-based GAS framework.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Performance Optimization

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room 210B

S4716 - Interactive 3D Data Visualization of 700 GB

Tom-Michael Thamm ( Director, Software Product Management Advanced Rendering, NVIDIA )
Mr. Thamm is Director of Software Product Management at the NVIDIA Advanced Rendering Center (ARC) in Berlin, Germany, and is responsible for all software products, such as NVIDIA mental ray, NVIDIA Iray, and NVIDIA IndeX. With his team, he manages and coordinates customer support as well as general product definition and positioning. Mr. Thamm has worked for NVIDIA ARC, and before that for mental images, for over 20 years. He has led several key software projects and products, such as the new NVIDIA IndeX product for geospatial visualization of huge datasets. He studied Mathematics.
Jörg Mensmann ( Senior Graphics Software Engineer, NVIDIA IndeX, NVIDIA )
Joerg Mensmann is a Senior Graphics Software Engineer at NVIDIA, with a focus on large-scale and distributed volume rendering. Prior to joining NVIDIA, he was a member of the research staff in the Visualization and Computer Graphics group at the University of Münster, Germany, where he helped build a flexible visualization framework for medical volume data. He holds a Diplom degree and a PhD in Computer Science from the University of Münster.

A technical presentation of the latest version of NVIDIA IndeX (TM), with an emphasis on large volumetric data visualization. IndeX is a scalable GPU-based software framework that renders high-quality images at interactive frame rates.

Session Level: All
Session Type: Talk
Tags: Rendering & Animation; Big Data Analytics & Data Algorithms; Clusters & GPU Management; Combined Simulation & Real-Time Visualization

Day: Tuesday, 03/25
Time: 17:00 - 17:25
Location: Room LL20B

S4338 - The Energy Case for Graph Processing on Hybrid CPU and GPU Systems

Elizeu Santos-Neto ( Ph.D. Student, Electrical & Computer Engineering, University of British Columbia )
Elizeu's research focuses on the characterization and design of online peer production systems such as peer-to-peer networks and collaborative tagging communities. Prior to joining UBC, he received a B.Sc. in Computer Science from the Universidade Federal de Alagoas and a M.Sc. in Computer Science from the Universidade Federal de Campina Grande, in Brazil.

This work reports on a power and performance analysis of large-scale graph processing on hybrid (i.e., CPU and GPU), single-node systems. On these systems, graph processing can be accelerated by mapping the graph layout so that the algorithmic tasks exercise the processing units where they perform best; however, GPUs have a much higher TDP, so their impact on overall energy consumption is unclear. An evaluation on large real-world graphs, as well as on synthetic graphs as large as 1 billion vertices and 16 billion edges, shows that efficiency - in terms of both performance and power - can be achieved.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Energy Exploration

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 210B

S4488 - GooFit: Massively parallel likelihood fitting using GPUs

Rolf Andreassen ( Postdoctoral Fellow, University of Cincinnati )
Dr. Andreassen's research interests lie mainly in the creation of tools for physics analysis. He began developing software for HEP purposes as a summer student at CERN, writing a C++ API for the ISAJET event-generation package. As a Master's and Ph.D. student he wrote custom modules for analysing BABAR data, and later a fast simulation of the DIRC component of the SuperB experiment. Dr. Andreassen is the designer and lead developer of the GooFit fitting package, and has given courses at the University of Cincinnati and at CERN in CUDA programming and use of GooFit. He is involved with the QuarkNet outreach program, bringing high-school students and teachers to the university to gain experience with HEP theory and research.

We present the GooFit maximum likelihood fit framework, which has been developed to run effectively on general purpose graphics processing units (GPUs) to enable next-generation experimental high energy physics (HEP) research. Most analyses of data from HEP experiments use maximum likelihood fits. Some of today's analyses use fits which require more than 24 hours on traditional multi-core systems. The next generation of experiments will require computing power two orders of magnitude greater for analyses which are sensitive to New Physics. Our GooFit framework, which has been demonstrated to run on NVIDIA GPU devices ranging from high-end Teslas to laptop GeForce GTs, uses CUDA and the Thrust library to massively parallelize the per-event probability calculation. For realistic physics fits we achieve speedups, relative to executing the same algorithm on a single CPU, of several hundred.
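
GooFit builds on CUDA and Thrust, as noted above. As a rough sketch of the per-event parallelism (not GooFit's actual code), the kernel below evaluates the negative log-likelihood term of each event under a Gaussian PDF; the terms are then summed by a reduction (for example thrust::reduce) to give the value the minimizer requests at each step.

// Each thread computes -log p(x_i | mu, sigma) for one event under a
// normalized Gaussian PDF; a separate reduction sums the terms.
#include <cuda_runtime.h>
#include <math.h>

__global__ void gaussianNLLTerms(const float* events, int numEvents,
                                 float mu, float sigma, float* nllTerms)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numEvents) return;

    float z = (events[i] - mu) / sigma;
    // 0.918938533f is 0.5 * log(2 * pi).
    nllTerms[i] = 0.5f * z * z + logf(sigma) + 0.918938533f;
}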

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Numerical Algorithms & Libraries; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 212A

S4706 - A GPU-Based Computational Framework for Large-Scale Critical Infrastructure Mapping Using Satellite Imagery

Dilip Patlolla ( R & D Staff, Oak Ridge National Laboratory )
Dilip Patlolla is a research staff member in the Geographic Information Science and Technology (GIST) Group at the Oak Ridge National Laboratory. He leads the development of large-scale critical infrastructure mapping using advanced computing methods. His primary responsibilities include opening up new domains of application for HPC, FPGAs, and GPUs by researching and developing computing algorithms, and ensuring the best possible performance on current and next-generation architectures. Dilip received his M.S. from the University of Tennessee, Knoxville and is the recipient of ORNL's 2013 Significant Event Award.

Assessing and monitoring critical infrastructure from space is a cost-effective and efficient solution. Satellite images are now available at spatial resolutions and acquisition rates that make image-driven, large-scale mapping and monitoring of critical infrastructure a viable possibility. However, processing huge volumes of high-spatial-resolution imagery is not a trivial task. Often solutions require advanced algorithms capable of extracting, representing, modeling, and interpreting scene features that characterize spatial, structural, and semantic attributes. Furthermore, these solutions should be scalable, enabling analysis of big image datasets: at half-meter pixel resolution the earth's surface has roughly 600 trillion pixels, and the requirement to process at this scale at repeated intervals demands highly scalable solutions. In this research, we present a GPU-based computational framework designed for identifying critical infrastructure from large-scale satellite or aerial imagery to assess vulnerable populations.

Session Level: All
Session Type: Talk
Tags: Defense; Video & Image Processing; Supercomputing; Big Data Analytics & Data Algorithms; Recommended Press Session – HPC-Science

Day: Tuesday, 03/25
Time: 17:30 - 17:55
Location: Room 210D

S4494 - Preliminary Work on Fast Radix-Based k-NN Multi-Select on the GPU

Roshan D'Souza ( Associate Professor, University of Wisconsin - Milwaukee )
Roshan D'Souza is an associate professor in the Mechanical Engineering Department at the University of Wisconsin-Milwaukee. He obtained a Ph.D. in Mechanical Engineering in 2003 from the University of California, Berkeley. His research interests include parallel algorithms for Monte-Carlo type algorithms with applications in systems biology, and parallel image processing.

In this presentation we describe an efficient multi-level parallel implementation of the most significant bit (MSB) radix sort-based multi-select algorithm (k-NN). Our implementation processes multiple queries within a single kernel call, with each thread block/warp simultaneously processing different queries. Our approach is incremental and reduces memory transactions through the use of bit operators, warp voting functions, and shared memory. Benchmarks show significant improvement over previous implementations of k-NN search on the GPU.
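
A greatly simplified, single-query sketch of MSB radix selection (not the authors' warp-level, multi-query implementation): descend one bit at a time from the most significant bit, count the candidates whose current bit is zero, and follow the bucket that contains the k-th element. After all bits, the accumulated prefix is the value of the k-th smallest key.

// Find the k-th smallest 32-bit key (k is 1-based) by MSB radix descent.
#include <cuda_runtime.h>

__global__ void countBitClear(const unsigned int* keys, int n,
                              unsigned int mask, unsigned int prefix,
                              unsigned int bit, unsigned int* count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int k = keys[i];
    // Still inside the candidate bucket, and the examined bit is 0?
    if ((k & mask) == prefix && (k & bit) == 0)
        atomicAdd(count, 1u);
}

unsigned int radixSelect(const unsigned int* d_keys, int n, unsigned int k)
{
    unsigned int mask = 0, prefix = 0, *d_count;
    cudaMalloc(&d_count, sizeof(unsigned int));

    for (int b = 31; b >= 0; --b) {
        unsigned int bit = 1u << b, zeros = 0;
        cudaMemset(d_count, 0, sizeof(unsigned int));
        countBitClear<<<(n + 255) / 256, 256>>>(d_keys, n, mask, prefix, bit, d_count);
        cudaMemcpy(&zeros, d_count, sizeof(unsigned int), cudaMemcpyDeviceToHost);

        mask |= bit;                 // this bit of the prefix is now decided
        if (k > zeros) {             // the k-th element lives in the 1-bucket
            prefix |= bit;
            k -= zeros;
        }
    }
    cudaFree(d_count);
    return prefix;                   // value of the k-th smallest key
}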

Session Level: Intermediate
Session Type: Talk
Tags: Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 09:00 - 09:25
Location: Room LL21D

S4691 - Map-D: A GPU Database for Interactive Big Data Analytics

Thomas Graham ( Co-founder, Map-D )
Before co-founding Map-D, Tom was a researcher at Harvard Law School where he focused on the intersection between social networks, big data and law reform. His research centered around privacy and the development of social science methodologies that allow legal scholars, governments and interest groups to more effectively incorporate social network data into their decision-making processes. Tom lived in China for many years where he studied Chinese and dabbled in Chinese cooking and calligraphy. Tom is admitted to the New York Bar and was previously an attorney with Davis Polk in Hong Kong, where he focused on capital markets and M&A across Asia's emerging markets. He is also admitted to practice law in Australia. Tom holds a LLM from Harvard Law School and a LLB, BA and Dip. Languages from Melbourne University.
Todd Mostak ( Co-founder, Map-D )
Todd was a researcher at MIT CSAIL, where he worked in the database group, before co-founding Map-D. Seeking adventure upon finishing his undergrad, Todd moved to the Middle East, spending two years in Syria and Egypt teaching English, studying Arabic and eventually working as a translator for an Egyptian newspaper. He then completed his MA in Middle East Studies at Harvard University, afterwards taking a position as a Research Fellow at Harvard’s Kennedy School of Government, focusing on the analysis of Islamism using forum and social media datasets. The impetus to build Map-D came from how slow he found conventional GIS tools to spatially aggregate and analyze large Twitter datasets.

Interactive big data analytics: using GPUs to power the world's fastest database. As part of an emerging conversation between HPC and enterprise, this talk will focus on the future of high-performance big data analytics from enterprise, government and scientific perspectives while tracking the challenges posed by data collection, hardware integration and interface design. But there is more at stake than data-driven cost savings; these perspectives are framed by the need to socialize and democratize high-power big data analytics to the advantage of all. Map-D is an ultra-fast GPU database that allows anyone to interactively analyze and visualize big data in real time. Built into GPU memory, Map-D's unique architecture runs 70-1000X faster than other in-memory databases and big data analytics platforms. We will also showcase Map-D's public demos, including TweetMap, which maps over 1 billion tweets in real time, and Campaign Finance Map, which unravels the influence of money on political discourse over time.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Combined Simulation & Real-Time Visualization; Supercomputing; Large Scale Data Visualization & In-Situ Graphics; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 09:00 - 09:50
Location: Room 210B

S4222 - Red Fox: An Execution Environment for Relational Query Processing on GPUs

Haicheng Wu ( Ph.D. Student, Georgia Institute of Technology )
Haicheng Wu is a Ph.D. student in the Computer Architecture and Systems Lab (CASL) at the Georgia Institute of Technology under the direction of Professor Sudhakar Yalamanchili. He received his B.S. in Electrical Engineering (EE) from Shanghai Jiao Tong University in 2006 and his M.S. in Electrical and Computer Engineering (ECE) from the Georgia Institute of Technology in 2009. His research project is developing compiler tool chains for heterogeneous architectures with a focus on GPU-based systems. Specifically, he is developing a compiler, Red Fox, for accelerating large-scale data warehousing applications on cloud architectures augmented with GPU accelerators. He received the NVIDIA Graduate Fellowship twice (2012-2014).

This session will present the Red Fox system. Attendees will leave understanding GPU performance when executing relational queries over large data sets, as typically found in data warehousing applications, and the automatic compilation flow of kernel fusion, which can be applied to other applications.
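
Kernel fusion combines adjacent relational operators into a single kernel so that intermediate results never round-trip through global memory. A minimal hand-written illustration (not Red Fox's generated code) fusing a selection with a SUM aggregation:

// Selection (price > threshold) fused with a SUM aggregation: the filtered
// intermediate result is never materialized in global memory.
#include <cuda_runtime.h>

__global__ void fusedFilterSum(const float* price, const float* quantity,
                               int numRows, float threshold, float* totalRevenue)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < numRows && price[i] > threshold)                // selection predicate
        atomicAdd(totalRevenue, price[i] * quantity[i]);    // aggregation, fused
}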

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210B

S4331 - Fast and Easy GPU Offloading for Computational Finance

Lukasz Mendakiewicz ( Software Development Engineer in Test II, Microsoft Corp )
Lukasz Mendakiewicz
Łukasz Mendakiewicz is a software engineer at Microsoft, where he focuses on the customer experience with parallel programming models for C++. He is especially interested in GPGPU acceleration, and puts this passion to work on C++ AMP. He holds an M.S. in Computer Science from AGH UST in Krakow, Poland, with a thesis on implementing real-time global illumination algorithms on a GPU.

This session provides insight on how to obtain superior performance for computational finance workloads without compromising developer productivity. C++ AMP technology lets you write C++ STL-like code that runs on GPUs (and CPUs) in a platform- (Windows and Linux) and vendor-agnostic manner. The session will start with an overview of C++ AMP, dive into C++ AMP features, list various compilers that support C++ AMP, and showcase the performance characteristics of options pricing workloads written using C++ AMP code. Attend this talk to see how you can write productive, easy-to-maintain code that offers superior performance, letting you write code once and exploit the hardware to its fullest.

Session Level: Intermediate
Session Type: Talk
Tags: Finance; Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 210C

S4572 - An Elegantly Simple Design Pattern for Building Multi-GPU Applications

Bob Zigon ( Sr Staff Research Engineer, Beckman Coulter )
Highly-Rated Speaker
Bob Zigon
Bob Zigon is a Sr Staff Research Engineer and has worked at Beckman Coulter for 11 years. He has degrees in Computer Science and Mathematics from Purdue University. He was the architect of Kaluza, an NVIDIA Tesla powered analysis application for flow cytometry. He's now working in particle characterization and analytical ultracentrifugation. His interests include high performance computing, numerical analysis and software development for life science.

GPU-based applications can be architected in different ways. The simplest approach has a client application that is tightly coupled to a single GPU. The second approach has a client application that is tightly coupled to multiple GPUs by way of operating system threads and GPU contexts. Finally, in scientific computing, a common pattern is to use MPI, multiple Intel cores and multiple GPUs that work cooperatively to solve a fixed problem. This session will describe a design pattern that loosely couples a client application to a collection of GPUs by way of a public domain "data structure server" called Redis. The approach works well for both fat-client and thin-client applications. The compelling aspects of the approach are 1) the ease of debugging and 2) the ease with which multiple GPUs can be added to handle increased user load.
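
A minimal sketch of that loose coupling is shown below. It assumes a hypothetical pop_job/push_result queue API standing in for Redis list operations (for example BLPOP/RPUSH issued through a client library); it is not the speaker's implementation, and the queue helpers are stubbed in memory so the example compiles and runs on its own.

```cuda
// Worker-process sketch: each GPU worker pulls jobs from a shared queue and
// pushes results back, so the client never binds directly to a GPU.
// pop_job/push_result are hypothetical stand-ins for Redis list operations
// (e.g., BLPOP "jobs" / RPUSH "results" issued through a client library);
// they are stubbed in memory here so the sketch compiles and runs standalone.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Job { int id; std::vector<float> data; };

static bool queue_has_work = true;
bool pop_job(Job *job) {                    // stub: yields one fake job, then "empty"
    if (!queue_has_work) return false;
    queue_has_work = false;
    job->id = 42;
    job->data.assign(1024, 1.0f);
    return true;
}
void push_result(int id, const std::vector<float> &out) {   // stub for pushing to "results"
    printf("job %d done, out[0] = %f\n", id, out[0]);
}

__global__ void process(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];       // placeholder for the real per-job kernel
}

int main(int argc, char **argv) {
    cudaSetDevice(argc > 1 ? atoi(argv[1]) : 0);   // each worker process owns one GPU

    Job job;
    while (pop_job(&job)) {                 // block until work arrives (stubbed here)
        int n = (int)job.data.size();
        float *d_in, *d_out;
        cudaMalloc(&d_in,  n * sizeof(float));
        cudaMalloc(&d_out, n * sizeof(float));
        cudaMemcpy(d_in, job.data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

        process<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

        std::vector<float> result(n);
        cudaMemcpy(result.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
        push_result(job.id, result);
        cudaFree(d_in);
        cudaFree(d_out);
    }
    return 0;
}
```

Because each worker owns exactly one GPU and one queue connection, scaling to more GPUs amounts to starting more worker processes behind the same queue, which is the ease-of-scaling property the abstract highlights.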

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 10:00 - 10:25
Location: Room 212B

S4353 - Terrestrial 3D Mapping with Parallel Computing Approach

Janusz Bedkowski ( Researcher, Institute of Mathematical Machines )
From 2006 to 2012, Janusz was a researcher at the Industrial Research Institute for Automation and Measurements, Warsaw, Poland. Prior to this, Janusz was a researcher and lecturer at the Institute of Automation and Robotics at Warsaw University of Technology, Warsaw, Poland, and a researcher at the Institute of Mathematical Machines, Warsaw, Poland. Janusz's research interests are parallel computing in CUDA for robotic applications and creating simulation tools for mobile robot operators' training.

This work concerns the parallel implementation of a 3D mapping algorithm. Several nearest-neighborhood search strategies are compared. The accuracy of the final 3D mapping is evaluated with geodetic precision. This work can be used in several applications such as mobile robotics and spatial design. Attendees will learn how to choose a proper nearest-neighbor search strategy for 3D data registration, how to build accurate 3D maps, how to evaluate a 3D mapping system with geodetic precision, and what the influence of parallel programming is on performance and accuracy.

Session Level: Intermediate
Session Type: Talk
Tags: Computer Vision; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room 210B

S4811 - Extreme Machine Learning with GPUs

John Canny ( Professor, UC Berkeley )
John Canny
John Canny is a professor in computer science at UC Berkeley. He is an ACM dissertation award winner and a Packard Fellow. Since 2002, he has been developing and deploying large-scale behavioral modeling systems. He designed and prototyped production systems for Overstock.com, Yahoo, Ebay, and Quantcast. He also data mines for a hobby, and led the Netflix contest for a couple of months in 2006. Other recent projects include sensorless sensing: sensing stress from mouse and mobile phone data, computer-mediated learning and analysis of MOOC data.

BIDMach is an open-source library for GPU-accelerated machine learning. BIDMach on a single GPU node exceeds the performance of all other tools (including cluster systems on hundreds of nodes) for the most common machine learning tasks. BIDMach is an easy-to-use, interactive environment similar to SciPy/Matlab, but with qualitatively higher performance. The session will discuss: Performance: BIDMach follows a "Lapack" philosophy of building high-level algorithms on fast low-level routines (like BLAS). It exploits the unique hardware features of GPUs to provide more than order-of-magnitude gains over alternatives. Accuracy: Monte-Carlo methods (MCMC) are the most general way to derive models, but are slow. We have developed a new approach to MCMC which provides two orders of magnitude speedup beyond hardware gains. Our "cooled" MCMC is fast and improves model accuracy. Interactivity: We are developing interactive modeling/visualization capabilities in BIDMach to allow analysts to guide, correct, and improve models in real time.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI; Scientific Visualization; Bioinformatics & Genomics

Day: Wednesday, 03/26
Time: 10:30 - 10:55
Location: Room LL21F

S4201 - GPU Acceleration of Sparse Matrix Factorization in CHOLMOD

Steven Rennich ( Senior HPC Developer Technology Engineer, NVIDIA )
Highly-Rated Speaker
Steven Rennich
Steven Rennich is a Sr. NVIDIA HPC Developer Technology Engineer. His primary activities include promoting the use of GPUs in computational structural mechanics and the development and optimization of parallel algorithms for direct and iterative solvers for sparse linear systems. Steve holds a Ph.D. in Aeronautics and Astronautics from Stanford University where his research involved computational fluid mechanics and vortex system instabilities. Prior to joining Nvidia, Steve spent many years parallelizing structural analysis and rigid body dynamics codes.
Tim Davis ( Professor, University of Florida )
Tim Davis
Tim Davis is a professor in Computer and Information Science and Engineering at the University of Florida. He is a Fellow of the Society for Industrial and Applied Mathematics (SIAM), in recognition of his work on sparse matrix algorithms. His software for sparse direct methods appears in hundreds of applications in industry, academia, and government labs, including MATLAB (x=A\b), Mathematica, NASTRAN, Cadence, Mentor Graphics, Google Ceres (StreetView, PhotoTours), IBM, Berkeley Design Automation, Xyce, and many others. For a full CV, see http://www.cise.ufl.edu/~davis/background.html .

Sparse direct solvers, and their requisite factorization step, are a critical component of computational engineering and science codes. High performance is typically achieved by reducing the sparse problem to dense sub-problems and applying dense math kernels. However, achieving high performance on a GPU is complicated due to the range of sizes of the dense sub-problems, irregular memory access patterns, and the limited communication bandwidth between the host system and the GPU. This talk will describe the high factorization performance achieved in CHOLMOD using the GPU and discuss in detail key techniques used to achieve this performance including minimizing communication and maximizing concurrency.
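
As a hedged illustration of that strategy (this is not CHOLMOD's code), the sketch below offloads one dense Schur-complement-style update, C := C - B * B^T, to the GPU through cuBLAS; the sub-problem sizes and matrix contents are made up for the example.

```cuda
// Illustrative only: a dense sub-problem of a sparse factorization handed to
// a standard dense math kernel (cublasDgemm) on the GPU. In a supernodal
// factorization, many such updates of varying size arise from the sparse matrix.
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

int main() {
    const int m = 512, k = 128;                 // size of one dense sub-problem (made up)
    std::vector<double> B(m * k, 0.01), C(m * m, 1.0);

    double *dB, *dC;
    cudaMalloc(&dB, B.size() * sizeof(double));
    cudaMalloc(&dC, C.size() * sizeof(double));
    cudaMemcpy(dB, B.data(), B.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C.data(), C.size() * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);

    const double alpha = -1.0, beta = 1.0;      // C := C - B * B^T
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_T,
                m, m, k, &alpha, dB, m, dB, m, &beta, dC, m);

    cudaMemcpy(C.data(), dC, C.size() * sizeof(double), cudaMemcpyDeviceToHost);
    printf("C[0] after update = %f\n", C[0]);

    cublasDestroy(handle);
    cudaFree(dB);
    cudaFree(dC);
    return 0;
}
```

The difficulty the talk addresses is that real factorizations generate many such updates of widely varying size, so minimizing host-GPU transfers and overlapping small updates become the dominant concerns.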

Session Level: Intermediate
Session Type: Talk
Tags: Numerical Algorithms & Libraries; Big Data Analytics & Data Algorithms; Computational Structural Mechanics

Day: Wednesday, 03/26
Time: 14:00 - 14:50
Location: Room LL20D

S4507 - Evaluation of Parallel Hashing Techniques

Rajesh Bordawekar ( Research Staff Member, IBM T. J. Watson Research Center )
Rajesh Bordawekar
Dr. Rajesh Bordawekar is a research staff member at the IBM T. J. Watson Research Center in Yorktown Heights, NY. He received his M.S. and Ph.D. in Computer Engineering in 1993 and 1996, respectively.

This presentation will cover techniques for implementing hashing functions on the GPU. We will describe various parallel implementations of hashing techniques (e.g., cuckoo hashing, partitioned hashing, Bin-Hash, and Bloom filters) and then present different ways of implementing these functions on the GPU, with emphasis on data structures that exploit the GPU's data-parallel features as well as its memory constraints.
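
As a minimal, hedged sketch of one of these structures, the kernels below insert and query keys in a GPU-resident Bloom filter using a simple multiplicative hash. This illustrates the data-parallel style under discussion rather than any particular implementation from the session; the host driver is omitted.

```cuda
// Kernel-only sketch of a GPU Bloom filter: one thread per key, k hash
// functions derived from one multiplicative hash with different seeds.
#include <cuda_runtime.h>

__device__ unsigned int bloom_hash(unsigned int key, unsigned int seed) {
    key ^= seed;
    key *= 2654435761u;            // Knuth-style multiplicative hash
    return key;
}

// Each thread inserts one key by setting k bits in the filter with atomicOr.
__global__ void bloom_insert(const unsigned int *keys, int n,
                             unsigned int *filter, int filter_bits, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    for (int j = 0; j < k; ++j) {
        unsigned int bit = bloom_hash(keys[i], 0x9e3779b9u * (j + 1)) % filter_bits;
        atomicOr(&filter[bit / 32], 1u << (bit % 32));
    }
}

// Each thread tests one key; it is "possibly present" only if all k bits are set.
__global__ void bloom_query(const unsigned int *keys, int n,
                            const unsigned int *filter, int filter_bits, int k,
                            unsigned char *hit) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned char present = 1;
    for (int j = 0; j < k; ++j) {
        unsigned int bit = bloom_hash(keys[i], 0x9e3779b9u * (j + 1)) % filter_bits;
        if (!(filter[bit / 32] & (1u << (bit % 32)))) present = 0;
    }
    hit[i] = present;
}
```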

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 14:00 - 14:25
Location: Room 210B

S4395 - Real-Time Quantification Filters for Multidimensional Databases

Peter Strohm ( Software Developer, Jedox AG )
Peter Strohm
Peter Strohm obtained his diploma in Computer Science from the University of Freiburg, Germany, in 2008. After that he joined the Inline Processing Team at Fraunhofer Institute for Physical Measurement Techniques IPM, Freiburg, as a software developer for parallel real-time applications. Since 2013, he has been with Jedox as a GPU developer.

Learn how GPUs can speed up real-time calculation of advanced multidimensional data filters required in data analytics and business intelligence applications. We present the design of a massively parallel "quantification" algorithm which, given a set of dimensional elements, returns all those elements for which ANY (or ALL) numeric cells in the respective slice of a user-defined subcube satisfy a given condition. Such filters are especially useful for the exploration of big data spaces, for zero-suppression in large views, or for top-k analyses. In addition to the main algorithmic aspects, attendees will see how our implementation solves challenges such as economic utilization of the CUDA memory hierarchy or minimization of threading conflicts in parallel hashing.
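
As a simplified, hedged sketch of that idea (not Jedox's implementation), the kernel below assigns one dimensional element to each thread block, scans that element's slice of cells, and records whether ANY cell exceeds a threshold; the ALL variant simply tests the negated condition. The offsets[] layout is an assumption made for the example, and the host driver is omitted.

```cuda
// Kernel-only sketch of an ANY-style quantification filter: keep[elem] = 1 if
// any cell in the element's slice satisfies the condition (here, value > threshold).
#include <cuda_runtime.h>

__global__ void quantify_any(const float *cells, const int *offsets,
                             float threshold, unsigned char *keep) {
    int elem = blockIdx.x;                       // one block per dimensional element
    int begin = offsets[elem], end = offsets[elem + 1];

    __shared__ int found;
    if (threadIdx.x == 0) found = 0;
    __syncthreads();

    // Threads stride over the element's slice; the "!found" test is only an
    // opportunistic early-out, correctness comes from the final check below.
    for (int i = begin + threadIdx.x; i < end && !found; i += blockDim.x) {
        if (cells[i] > threshold) atomicExch(&found, 1);
    }
    __syncthreads();

    if (threadIdx.x == 0) keep[elem] = (unsigned char)found;
}
```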

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Finance

Day: Wednesday, 03/26
Time: 14:30 - 14:55
Location: Room 210B

S4526 - GPU Accelerated Genomics Data Compression

BingQiang Wang ( Manager of Technical Computing, BGI )
BingQiang Wang
BingQiang Wang completed his doctorate in computational chemistry at East China University of Science and Technology (ECUST), China in 2006. From March 2005, he was a research scientist at the Shanghai Supercomputing Center, dedicated to high performance computing enabled research in computational chemistry and life science. In March 2010 he joined BGI as head of high performance computing, to develop solutions for emerging life science challenges. He also served as an adjunct assistant professor at the Chinese University of Hong Kong (CUHK) for the first half of 2012.

The session begins with a review of existing compression algorithms and the characteristics of genomics data formats. A general GPU-accelerated compression framework is then introduced, featuring 1) adaptive compression scheme tuning, 2) optimized, GPU-accelerated compression algorithms, and 3) column-major storage. This approach fully exploits the similarity within individual columns in popular genomics data formats by using an appropriate compression scheme (a combination of algorithms); the GPU is then employed to speed up compression and decompression, delivering several-fold higher bandwidth.
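
As a toy illustration of why column-major storage helps (a hedged sketch, not BGI's framework): alignment positions in a sorted column are highly similar, so delta-encoding them in parallel produces small values that a general-purpose coder then compresses well.

```cuda
// Parallel delta-encoding of one integer column: delta[i] = pos[i] - pos[i-1].
#include <cuda_runtime.h>
#include <cstdio>

__global__ void delta_encode(const int *pos, int *delta, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    delta[i] = (i == 0) ? pos[0] : pos[i] - pos[i - 1];   // small values compress well
}

int main() {
    const int n = 8;
    int h_pos[n] = {100, 103, 103, 110, 112, 119, 125, 130};
    int h_delta[n];
    int *d_pos, *d_delta;
    cudaMalloc(&d_pos, n * sizeof(int));
    cudaMalloc(&d_delta, n * sizeof(int));
    cudaMemcpy(d_pos, h_pos, n * sizeof(int), cudaMemcpyHostToDevice);

    delta_encode<<<1, 256>>>(d_pos, d_delta, n);

    cudaMemcpy(h_delta, d_delta, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%d ", h_delta[i]);   // prints: 100 3 0 7 2 7 6 5
    printf("\n");
    cudaFree(d_pos);
    cudaFree(d_delta);
    return 0;
}
```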

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room LL21D

S4644 - Getting Big Data Done On a GPU-Based Database

Ori Netzer ( VP, Product Management , SQream Technologies )
As the VP of Product Management for SQream Technologies, Ori is responsible for mapping the company's product road map. Ori is a big data thought leader and his main goal is to share his views with other professionals within the industry.

We will provide an in-depth analysis of our in-production, GPU-based technology for big data analytics, highlighting how our database benefits telecom companies. We will do this by explaining the key features of our technology, noting that our database provides close to real-time analytics and delivers up to 100X faster insights, all in a very cost-effective manner. We will elaborate on these features and more in order to provide a clear understanding of how our technology works and why it is beneficial for telecom companies.

Session Level: All
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:00 - 15:25
Location: Room 210B

S4471 - High Speed Analysis of Big Data Using NVIDIA GPUs and Hadoop

Partha Sen ( COO, Fuzzy Logix )
Partha Sen
Partha has a passion for solving complex business problems using quantitative methods, data mining and pattern recognition. For a period of about 12 years from 1995 to 2007, Partha pursued this passion as a hobby and developed about 100 algorithms and over 700 quantitative models. These algorithms and models are the basis for the solutions being implemented by Fuzzy Logix today. Before founding Fuzzy Logix, Partha worked at Bank of America where he held senior management positions in the commercial and investment bank and in the portfolio strategies group. In the commercial and investment bank, Partha led the initiative to build a quantitative model driven credit rating methodology for the entire commercial loan portfolio. The methodology is used by the bank for allocating reserves against potential losses from loans. In the portfolio strategies group, Partha led a team to devise various strategies for effectively hedging the credit risk for the bank's commercial loan portfolio and for minimizing the impact of mark-to-market volatility of the portfolio of hedging instruments (Credit Default Swaps, Credit Default Swaptions, and CDS Indexes). Partha was also responsible for managing the Quantitative Management Associate Program at Bank of America. This is a two-year associate development program which has groomed over 75 quantitative managers within the enterprise. Prior to working at Bank of America, Partha held managerial positions at Ernst and Young and Tata Consultancy Services. He has a Bachelor of Engineering, with a major in computer science and a minor in mathematics from the Indian Institute of Technology. He also has an MBA from Wake Forest University.

Performing analytics on data stored in Hadoop can be time consuming. While Hadoop is great at ingesting and storing data, getting timely insight out of the data can be difficult, which reduces effectiveness and time-to-action. The use of NVIDIA GPUs to accelerate analytics on Hadoop is an optimal solution that drives high price-to-performance benefits. In this session, we'll demonstrate a solution using NVIDIA GPUs for the analysis of big data in Hadoop. The demo will show how you can leverage the Hadoop file system, its MapReduce architecture and GPUs to run computationally intense models, bringing together both data and computational parallelism. Methods demonstrated will include classification techniques such as decision trees, logistic regression and support vector machines, and clustering techniques like k-means, fuzzy k-means and hierarchical k-means on marketing, social and digital media data.
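
To make the flavor of such models concrete, here is a hedged, kernel-only sketch (not Fuzzy Logix's code) of the k-means assignment step, the data-parallel core of the clustering techniques listed above; the host driver and the centroid-update reduction are omitted.

```cuda
// Each thread assigns one record to its nearest centroid (squared Euclidean distance).
#include <cuda_runtime.h>
#include <cfloat>

// points: n x dim (row-major), centroids: k x dim (row-major), assign: n labels
__global__ void kmeans_assign(const float *points, const float *centroids,
                              int n, int dim, int k, int *assign) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float best = FLT_MAX;
    int best_c = 0;
    for (int c = 0; c < k; ++c) {
        float dist = 0.0f;
        for (int d = 0; d < dim; ++d) {
            float diff = points[i * dim + d] - centroids[c * dim + d];
            dist += diff * diff;
        }
        if (dist < best) { best = dist; best_c = c; }
    }
    assign[i] = best_c;
}
```

In a Hadoop setting, each mapper would hand its local partition of records to a kernel like this, which is where the data parallelism (across records) and computational parallelism (across GPU threads) meet.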

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Finance; Bioinformatics & Genomics; Recommended Press Session – HPC-Science

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room 210B

S4628 - BWT Indexing: Big Data from Next Generation Sequencing and GPU

Jeanno Cheung ( Research Engineer, HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory )
Jeanno Cheung
Jeanno Cheung is a research engineer in HKU-BGI Bioinformatics Algorithms and Core Technology Laboratory. He works on bioinformatics applications and other projects that utilize parallel computing platforms such as CUDA and MIC.

With the rapid improvement in DNA sequencing technologies, huge amounts of sequencing data can be produced in a time- and cost-efficient manner (e.g., it costs only a few thousand US dollars to produce 100 gigabases in a day). Compressed full-text indexing based on the BWT has been found to be very useful in speeding up the analysis of high-throughput sequencing data. In this talk we consider two major problems in this context, namely, alignment of sequencing data onto a reference genome (for genetic variation detection), and indexing of sequencing data. These two problems have different applications and different technical challenges. We show how the GPU can be exploited to achieve tremendous improvement in each case. In particular, our alignment solution makes it feasible to conduct NGS analysis even in the time-critical clinical environment; for example, 30+ fold whole genome sequencing data of a human (~100 gigabases) can be aligned and analyzed in a few hours, with sensitivity and accuracy even higher than before.

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 15:30 - 15:55
Location: Room LL21D

S4169 - Accelerate Distributed Data Mining with Graphics Processing Units

Nam-Luc Tran ( R&D Engineer, EURA NOVA )
Nam-Luc Tran
Nam-Luc graduated in 2010 from the ULB in the department of biomedical engineering and is currently an R&D engineer at EURA NOVA, a private research company. Nam-Luc has led many research projects in the fields involved in big data, such as storage, distributed processing and architecture, with multiple collaborations with the ULB and the UCL.

Numerous distributed processing models have emerged, driven by (1) the growth in volumes of available data and (2) the need for precise and rapid analytics. The most famous representative of this category is undoubtedly MapReduce; however, other more flexible models exist based on the dataflow graph (DFG) processing model. None of the existing frameworks, however, has considered the case where the individual processing nodes are equipped with GPUs to accelerate parallel computations. In this talk, we discuss this challenge and the implications of the presence of GPUs on some of the processing nodes for the DFG representation of such heterogeneous jobs and for the scheduling of the jobs, with big data mining as the principal use case.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210B

S4536 - An Approach to Parallel Processing of Big Data in Finance for Alpha Generation and Risk Management

Yigal Jhirad ( Head of Quantitative Strategies and Risk Management , Cohen & Steers )
Yigal  Jhirad
Yigal D. Jhirad, Senior Vice President, is Director of Quantitative Strategies and a Portfolio Manager for Cohen & Steers’ options and real assets strategies. Mr. Jhirad heads the firm’s Investment Risk Committee. He has 26 years of experience. Prior to joining the firm in 2007, Mr. Jhirad was an executive director in the institutional equities division of Morgan Stanley, where he headed the company’s portfolio and derivatives strategies effort. He was responsible for developing, implementing and marketing quantitative and derivatives products to a broad array of institutional clients, including hedge funds, active and passive funds, pension funds and endowments. Mr. Jhirad holds a BS from the Wharton School. He is a Financial Risk Manager (FRM), as Certified by the Global Association of Risk Professionals. He is based in New York.
Blay Tarnoff ( Senior Software Engineer, Cohen & Steers )
Blay Tarnoff
Blay Tarnoff is a senior applications developer and database architect. He specializes in array programming and database design and development. He has developed equity and derivatives applications for program trading, proprietary trading, quantitative strategy, and risk management. He is currently a consultant at Cohen & Steers and was previously at Morgan Stanley.

This session discusses the convergence of parallel processing and big data in finance as the next step in the evolution of risk management and trading systems. We advocate that risk management in finance should evolve from traditional inter-day, top-down metrics to an intra-day, bottom-up approach using signal generation and pattern recognition. We have also determined that parallel processing is a key tool for absorbing greater insights into market patterns, providing "trading DNA" and more effective tools to manage risk in real time.

Session Level: All
Session Type: Talk
Tags: Finance; Big Data Analytics & Data Algorithms; Numerical Algorithms & Libraries

Day: Wednesday, 03/26
Time: 16:00 - 16:50
Location: Room 210C

S4836 - Merging ADAS and Infotainment to Create a Connected, Cloud-Enhanced, Vehicle Safety System

Roger Lanctot ( Associate director, Global Automotive Practice, Strategy Analytics )
Roger Lanctot has a powerful voice in the definition of future trends in automotive safety and infotainment systems. Roger draws on 25 years' experience in the technology industry as an analyst, journalist and consultant. Roger has conducted and participated in major industry studies, created new research products and services, and advised clients on strategy and competitive issues throughout his career. Roger is a prolific writer and blogger and is a frequent featured speaker at industry events on such topics as driver distraction, smartphone connectivity, customer relationship management and usage-based insurance. He has an AB in English from Dartmouth College.

The barriers between in-vehicle infotainment systems and safety systems are falling and off-board connections are proliferating. The combination of these two trends is enabling entirely new user experiences in connected vehicles while setting the stage for new revenue opportunities. The challenge for car makers and their suppliers is to build big data opportunities upon consumer usage, vehicle diagnostic and environmental data. Leveraging vehicle data will create new value propositions, mitigate driver distraction and help drivers avoid accidents and, eventually, deliver autonomous driving. But justifying and implementing connectivity is still in its earliest phases. This session will help define the combination of applications, services, content and enabling technologies that will speed enhanced safety to the market.

Session Level: All
Session Type: Talk
Tags: Automotive; Big Data Analytics & Data Algorithms; In-Vehicle Infotainment (IVI) & Safety

Day: Wednesday, 03/26
Time: 16:00 - 16:25
Location: Room 210A

S4249 - Histograms in CUDA: Privatized for Fast, Level Performance

Nicholas Wilt ( Author, The CUDA Handbook )
Nicholas Wilt
Nicholas Wilt has been programming professionally for more than twenty-five years in a variety of areas, including industrial machine vision, graphics, and low-level multimedia software. While at Microsoft, he served as the development lead for Direct3D 5.0 and 6.0, built the prototype for the Desktop Window Manager, and did early GPU computing work. At NVIDIA, he worked on CUDA from its inception, designing and often implementing most of CUDA’s low-level abstractions. Now at Amazon, Mr. Wilt is working on cloud computing technologies relating to GPUs.

Histograms are an important statistical tool with a wide variety of applications, especially in image processing. Naive CUDA implementations suffer from low performance on degenerate input data due to contention. This presentation will show how to use "privatized" (per-thread) histograms to balance performance of the average case against data-dependent performance of degenerate cases.
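
As a hedged sketch of the privatization idea, the kernel below uses per-block shared-memory copies of the histogram (the per-thread variant described in the talk carries the idea further): contention from degenerate inputs then stays in fast shared memory, and the private copies are merged into the global histogram once at the end. The host driver is omitted.

```cuda
// Privatized 256-bin histogram: one shared-memory copy per block, merged with atomics.
#include <cuda_runtime.h>

#define NUM_BINS 256

__global__ void histogram_privatized(const unsigned char *data, int n,
                                     unsigned int *global_hist) {
    __shared__ unsigned int local[NUM_BINS];         // this block's private copy
    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        atomicAdd(&local[data[i]], 1u);               // contention stays within the block
    }
    __syncthreads();

    for (int b = threadIdx.x; b < NUM_BINS; b += blockDim.x) {
        atomicAdd(&global_hist[b], local[b]);         // merge private copies once
    }
}
```

Even with this scheme, a degenerate input where every byte has the same value still serializes on one shared-memory counter per block, which is exactly the case the per-thread privatization in the talk is designed to level out.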

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Video & Image Processing

Day: Wednesday, 03/26
Time: 16:30 - 16:55
Location: Room 210B

S4265 - RDMA GPU Direct for the Fusion-io ioDrive

Robert Wipfel ( Fellow, Fusion-io )
Robert started his career at a parallel processing startup, and then at INMOS, worked on a distributed operating system for the Transputer. Robert next helped Unisys and Intel jointly enter the commercial parallel processing market. He worked on single system image Unix and Oracle Parallel Server. At Novell Robert was an architect or engineering lead for various Data Center products that integrated clustering, virtualization, and network storage. His work on management products combined web-scale automation, process orchestration and a federated CMDB to create IT as a Service. Robert joined Fusion-io as an architect and helped the company deliver its second generation ioMemory product line. He is presently chief architect for the ION Data Accelerator all-Flash SCSI storage appliance.

Learn how to eliminate I/O bottlenecks by integrating Fusion-io's ioDrive Flash storage into your GPU applications. The first part of this session is a technical overview of Fusion-io's PCIe attached ioDrive. The second part presents developer best practices and tuning for GPU applications using ioDrive based storage. Topics will cover threading, pipe-lining, and data path acceleration via RDMA GPU Direct. Demos and example code showing integration between RDMA GPU Direct and Fusion-io's ioDrive will be given.

Session Level: Intermediate
Session Type: Talk
Tags: Performance Optimization; Finance; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 16:30 - 17:20
Location: Room 212B

S4534 - A High-Speed 2-Opt TSP Solver for Large Problem Sizes

Martin Burtscher ( Associate Professor, Texas State University )
Highly-Rated Speaker
Martin Burtscher
Martin Burtscher is Associate Professor in the Department of Computer Science at Texas State University. He received the BS/MS degree from ETH Zurich and the Ph.D. degree from the University of Colorado at Boulder. Martin's research interests include efficient parallelization of programs for GPUs, performance assessment and optimization, and high-speed data compression. He is a senior member of the IEEE, its Computer Society, and the ACM. Martin has co-authored over 75 peer-reviewed publications, including a book chapter in NVIDIA's GPU Computing Gems, is the recipient of an NVIDIA Academic Partnership award, and is the PI of a CUDA Teaching Center.

Learn how to process large program inputs at shared-memory speeds, using a 2-opt TSP solver as the example. Our implementation employs interesting code optimizations such as biasing results to avoid computation, inverting loops to enable coalescing and tiling, introducing non-determinism to avoid synchronization, and parallelizing each operation rather than across operations to minimize thread divergence and drastically lower the latency of result production. The final code evaluates 68.8 billion moves per second on a single Titan GPU.
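
For orientation, here is a hedged, kernel-only sketch of the underlying problem (the talk's optimizations are layered on top of a step like this). A 2-opt move reverses the tour segment between positions i+1 and j, changing the tour length by gain = d(t[i],t[j]) + d(t[i+1],t[j+1]) - d(t[i],t[i+1]) - d(t[j],t[j+1]); each thread scores one (i, j) pair, and the most negative gain is the move to apply. The host-side reduction to the best move and the gain-array initialization are omitted.

```cuda
// Evaluate all non-adjacent 2-opt moves of a tour; gain[i*n + j] < 0 means an improving move.
#include <cuda_runtime.h>

__device__ float dist2d(const float2 *city, int a, int b) {
    float dx = city[a].x - city[b].x, dy = city[a].y - city[b].y;
    return sqrtf(dx * dx + dy * dy);
}

// tour: permutation of n cities; gain: n*n entries, only valid (i, j) pairs are written.
__global__ void eval_2opt(const float2 *city, const int *tour, int n, float *gain) {
    int i = blockIdx.y;                                   // first cut edge (t[i], t[i+1])
    int j = blockIdx.x * blockDim.x + threadIdx.x;        // second cut edge (t[j], t[j+1])
    if (i >= n - 3 || j <= i + 1 || j >= n - 1) return;   // keep edges distinct, non-wrapping

    int a = tour[i], b = tour[i + 1], c = tour[j], d = tour[j + 1];
    gain[i * n + j] = dist2d(city, a, c) + dist2d(city, b, d)
                    - dist2d(city, a, b) - dist2d(city, c, d);
}
```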

Session Level: Advanced
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Supercomputing; Programming Languages & Compilers

Day: Wednesday, 03/26
Time: 17:00 - 17:25
Location: Room 210B

S4553 - Productive Programming with Descriptive Data: Efficient Mesh-Based Algorithm Development in EAVL

Jeremy Meredith ( Senior Research Scientist, Oak Ridge National Laboratory )
Jeremy Meredith is a senior research scientist in the Future Technologies Group at Oak Ridge National Laboratory, where his research interests include emerging computing architectures and large-scale visualization and analysis. He is a recipient of the 2008 ACM Gordon Bell Prize and a 2005 R&D100 Award. Jeremy received his MS in Computer Science from Stanford University and his BS from the University of Illinois at Urbana-Champaign. Jeremy is part of the Keeneland Heterogeneous Computing Project and the NVIDIA CUDA Center of Excellence at the Georgia Institute of Technology.

Learn about the data-parallel programming model in EAVL and how it can be used to write efficient mesh-based algorithms for multi-core and many-core devices. EAVL, the Extreme-scale Analysis and Visualization Library, contains a flexible scientific data model and targets future high performance computing ecosystems. This talk shows how a productive programming API built upon an efficient data model can help algorithm developers achieve high performance with little code. Discussions will include examples and lessons learned.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Scientific Visualization

Day: Wednesday, 03/26
Time: 17:30 - 17:55
Location: Room 210B

S4603 - GPU-Accelerated Algorithms in Bioinformatics and Data Mining

Bertil Schmidt ( Professor, Johannes Gutenberg University Mainz )
Bertil Schmidt
Bertil Schmidt is a tenured full Professor of Computer Science at JGU Mainz. He heads the "Parallel and Distributed Architectures" group, which is highly active in the area of designing, implementing and evaluating new scalable methods and parallelized software for bioinformatics, with a particular focus on NGS. Based on its pioneering work on using GPUs for bioinformatics, the group was awarded a "CUDA Research Center" and a "CUDA Teaching Center" by NVIDIA. For the paper "MSA-CUDA: Multiple Sequence Alignment on Graphics Processing Units with CUDA", the group received the best paper award at the prestigious IEEE ASAP 2009 Conference in Boston.

The development of scalable algorithms and tools is of high importance to bioinformatics and data mining. In this session, you will learn about the efficient usage of CUDA to accelerate prominent algorithms in both areas. In particular, GPU acceleration of the following methods will be discussed: (1) the Smith-Waterman algorithm on Kepler (CUDASW++ 3.0) compared to an equivalent Xeon Phi implementation (SWAPHI); (2) short read alignment (CUSHAW2-GPU and CUSHAW3); (3) clustering of protein structures; (4) alignment of time series with a Dynamic Time Warping-inspired similarity measure; and (5) an effective scalable clustering algorithm for large data sets that builds upon the concept of divide-and-conquer.

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 17:30 - 17:55
Location: Room LL21D

S4934 - Signal Processing Libraries for High Performance Embedded Computing (Presented by GE)

David Tetley ( Senior Technology Software Manager, GE Intelligent Platforms )
David is the Engineering Manager for GE Intelligent Platforms' High Performance Embedded Computing centre of excellence. His team works closely with customers to develop integrated CPU and GPU platforms for signal and image processing applications for the military and aerospace market. He is also responsible for GE's AXIS software libraries and graphical tools. He graduated from the University of Bath in the UK with a B.Eng in Electronic and Communication engineering.

High Performance Embedded Computing (HPEC) is bringing hardware found in the Top500 supercomputers into the embedded market space. This is leading to Linux clusters consisting of a mixture of CPUs and GPUs being deployed to tackle signal and image processing applications such as those found on Intelligence, Surveillance and Reconnaissance (ISR) platforms. Developers, whilst wanting to take advantage of the potential performance benefits of GPUs, want to keep their code architecture-agnostic so it can be ported to other hardware platforms without significant re-design. Whilst CUDA and OpenCL are emerging to offer this capability at a lower programming level, the industry-standard Vector Signal and Image Processing API provides a higher level of abstraction with over 600 signal processing and vector math functions to choose from. This enables developers to build portable signal processing algorithms that can be targeted at either the CPU or GPU with no source code changes. This session provides an overview of the VSIPL standard and demonstrates the portability between CPU and GPU platforms.

Session Level: All
Session Type: Talk
Tags: Signal & Audio Processing; Numerical Algorithms & Libraries; Performance Optimization; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 17:30 - 17:55
Location: Room 212B

S4223 - Real-Time Imaging in Radio-Astronomy :A Fully GPU-Based Imager

Sanjay Bhatnagar ( Scientist, National Radio Astronomy Observatory )
Dr. Sanjay Bhatnagar is a Scientist at the National Radio Astronomy Observatory (NRAO), USA, and is the Chief Scientist for the observatory's Algorithms R&D Group. He holds a PhD in Physics and Astrophysics and worked at the Tata Institute of Fundamental Research (TIFR) in India before joining NRAO. His research interests are in the area of imaging and calibration algorithms for radio astronomical imaging and in their use in observational radio astrophysics research. He has worked extensively on related Big Data and High Performance Computing issues.
Pradeep Kumar Gupta ( Sr. System Software Engineer, NVIDIA )
Pradeep Kumar Gupta
Pradeep Gupta is a Compute Developer Technology Engineer at NVIDIA, where he supports developers with HPC and CUDA application development and optimization, and works to enable the GPU computing ecosystem in various universities and research labs across India and south East Asia. Before joining NVIDIA, Pradeep worked on various technologies including the Cell architecture and programming, MPI, OpenMP, and green data center technologies. Pradeep received a master's degree in research from the Indian Institute of Science (IISc), Bangalore. His research focused on developing compute-efficient algorithms for image denoising and inpainting using transform domains.

We are implementing a fully GPU-based imager for radio interferometric imaging, targeting high sensitivity, near real-time imaging. Modern interferometric radio telescopes generate many terabytes of data per observation, which need to be imaged in near-real time. Imaging software running on conventional computers currently takes many orders of magnitude longer. In this presentation, we will briefly describe the algorithms and describe in more detail their adaptation for GPUs in particular and for heterogeneous computing in general. We will discuss the resulting run-time performance on the GPU using real data from existing radio telescopes. Tests with our current implementation show a speed-up of up to 100x compared to the CPU implementation in the critical parts of processing, enabling us to reduce the memory footprint by replacing compute-and-cache with on-demand computing on the GPU. For scientific use cases requiring high resolution, high sensitivity imaging, such a GPU-based imager represents an enabling technology.

Session Level: Beginner
Session Type: Talk
Tags: Astronomy & Astrophysics; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 09:00 - 09:50
Location: Room LL21F

S4345 - Optical Character Recognition with GPUs: Document Processing Throughput Increased by a Magnitude

Jeremy Reed ( Research Assistant, University of Kentucky )
Jeremy Reed
Jeremy Reed received a B.S. degree in computer science from Centre College in 2001. He is currently pursuing his Ph.D. degree at the University of Kentucky and is advised by Dr. Raphael Finkel. He has worked in a variety of software development roles over the past 15 years and is currently employed as a software architect and research assistant. His research interests include artificial intelligence, optical character recognition and software engineering.

Learn how an OCR engine, built from scratch for the GPU, enables businesses to turn document images into searchable, editable text several orders of magnitude faster than is possible with currently available commercial software. Several case studies will be presented outlining the cost and technical benefits and use cases of the technology before diving deeper into the technical details of the software itself. A demo of the software will also be given.

Session Level: Beginner
Session Type: Talk
Tags: Video & Image Processing; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 09:00 - 09:25
Location: Room 210E

S4583 - Middleware Framework Approach for BigData Analytics Using GPGPU

Ettikan Kandasamy Karuppiah ( Principal Researcher , MIMOS Bhd )
Ettikan K.K. (Ph.D. in the area of distributed computing) is the Principal Researcher and Head of the Accelerative Technology Lab of the ICT Division at MIMOS. His current research interests include big/media data processing, multi-processors, GPGPU and FPGA, and network processing. Previously he was with Panasonic R&D (Panasonic's corporate research arm) as Principal Engineer and Group Manager of the Panasonic Kuala Lumpur Lab, with R&D responsibility for IP, AV, distributed and embedded communications protocols in home networking/network processing products. Prior to Panasonic, he was with the Intel Communication Group, responsible for network processor related R&D. He holds numerous international patents and publications and was directly involved in worldwide commercial product development in those organizations. (ettikan.org)

Current application of GPU processors to parallel computing tasks shows excellent results in terms of speed-ups compared to CPU processors. However, there is no existing middleware framework that enables automatic distribution of data and processing across heterogeneous computing resources for structured and unstructured big data applications. Thus, we propose a middleware framework for big data analytics that provides mechanisms for automatic data segmentation, distribution, execution, and information retrieval across multiple cards (CPU and GPU) and machines; a modular design for easy addition of new GPU kernels at both the analytic and processing layers; and information presentation. The architecture and components of the framework, such as multi-card data distribution and execution, data structures for efficient memory access, and algorithms for parallel GPU computation, are shown, along with results for various test configurations. Our results show that the proposed middleware framework provides an alternative, cheaper HPC solution to users.

Session Level: Intermediate
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Video & Image Processing; Finance

Day: Thursday, 03/27
Time: 09:00 - 09:25
Location: Room 210B

S4360 - Monte Carlo Calibration to Implied Volatility Surface: A New Computational Paradigm

Chuan-Hsiang Han ( Associate Professor, National Tsing-Hua University )
Chuan-Hsiang Han
Chuan-Hsiang Han is an Associate Professor in the Department of Quantitative Finance, National Tsing-Hua University, Taiwan; an Adjunct Associate Professor in the Department of Mathematics, National Taiwan University; co-director of the NVIDIA-NTHU Joint Lab on Computational Finance; software developer of the Volatility Information Platform (VIP); and a director of the Taiwan Financial Engineers and Traders Association.

This presentation offers a new possibility: that Monte Carlo simulation is capable of quickly solving the calibration problem for implied volatility surfaces. Dimension separation and standard error reduction constitute the two-stage procedure. The first stage aims to reduce the dimensionality of the optimization problem being solved by utilizing the Fourier transform representation of the volatility dynamics. The second stage provides a high performance computing paradigm for option pricing by standard error reduction; the GPU, as a parallel accelerating device, drastically increases the total number of simulations in addition to variance reduction algorithms. By virtue of its flexibility, this two-stage Monte Carlo method is applied to estimate various volatility models such as hybrid models and multiscale stochastic volatility models.
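
To ground the second stage, here is a hedged, self-contained sketch (not the speaker's code) of GPU Monte Carlo pricing of a European call under Black-Scholes dynamics, using antithetic variates as one simple standard-error-reduction technique; all parameters are illustrative.

```cuda
// Each thread simulates paths_per_thread antithetic pairs of terminal prices and
// writes its discounted mean payoff; the host averages the per-thread results.
#include <curand_kernel.h>
#include <cuda_runtime.h>
#include <vector>
#include <cstdio>

__global__ void mc_call(float S0, float K, float r, float sigma, float T,
                        int paths_per_thread, unsigned long long seed,
                        float *partial_means) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);

    float drift = (r - 0.5f * sigma * sigma) * T;
    float vol   = sigma * sqrtf(T);
    float sum = 0.0f;
    for (int p = 0; p < paths_per_thread; ++p) {
        float z = curand_normal(&state);
        // Antithetic pair: +z and -z share one draw; their payoffs partly cancel noise.
        float payoff1 = fmaxf(S0 * expf(drift + vol * z) - K, 0.0f);
        float payoff2 = fmaxf(S0 * expf(drift - vol * z) - K, 0.0f);
        sum += 0.5f * (payoff1 + payoff2);
    }
    partial_means[tid] = expf(-r * T) * sum / paths_per_thread;
}

int main() {
    const int threads = 256, blocks = 64, n = threads * blocks;
    float *d_means;
    cudaMalloc(&d_means, n * sizeof(float));

    // Illustrative inputs: S0=100, K=100, r=2%, sigma=20%, T=1 year.
    mc_call<<<blocks, threads>>>(100.0f, 100.0f, 0.02f, 0.2f, 1.0f, 1000, 1234ULL, d_means);

    std::vector<float> h(n);
    cudaMemcpy(h.data(), d_means, n * sizeof(float), cudaMemcpyDeviceToHost);
    double price = 0.0;
    for (int i = 0; i < n; ++i) price += h[i];
    printf("MC call price ~ %f\n", price / n);

    cudaFree(d_means);
    return 0;
}
```

A calibration loop would wrap a pricer like this, repricing the option surface for trial parameter sets; the GPU's ability to run far more paths per evaluation is what keeps the standard error small enough for the optimizer to converge.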

Session Level: Intermediate
Session Type: Talk
Tags: Finance; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 09:30 - 09:55
Location: Room 210C

S4483 - Recursive Interaction Probability: A New Paradigm in Parallel Data Processing

Richard Heyns ( Founder and CEO, brytlyt )
Richard Heyns
Richard is the CEO and founder of brytlyt limited. Richard's background is a BSc in Electro-Mechanical Engineering from the University of Cape Town (1995). He has since worked mainly in the Business Intelligence space. He has most recently worked on Big Data solutions for large retailers like Kroger in the USA and Tesco in the UK. Richard's passion is the interaction of exotic hardware, cool software and ingenious algorithms.

This session will describe Recursive Interaction Probability (RIP) and why it is a pretty cool algorithm. Time will be spent on benchmark analysis against other algorithms as well as performance within an operational database. The presentation will end with how RIP was implemented on a NVIDIA Kepler K20c, the design choices and how these affect performance. Use cases that play to the strengths of RIP as well as use cases that reveal its weaknesses will also be shared.

Session Level: Beginner
Session Type: Talk
Tags: Big Data Analytics & Data Algorithms; Numerical Algorithms & Libraries; Clusters & GPU Management

Day: Thursday, 03/27
Time: 09:30 - 09:55
Location: Room 210B

S4245 - Increasing Mass Spectrometer Sensitivity 20x with Real-Time GPU Processing

Evan Hauck ( Signal Processing Co-Op Member, LECO Corporation )
Evan Hauck
Evan Hauck is a high school senior who spends an hour each day at high school. The rest of his time is split between a Math/Science Center, Andrews University, and a co-op at LECO Corporation. Evan is an accomplished C# and GPU programmer who is at ease with all things parallel. Evan won an Intel Excellence in Computer Science Award in 2011 for his work in recreating the experiments of Comte de Buffon, claimed 1st place with his teammates in the National Institute of Aerospace's NASA Engineering Design Challenge in 2012, and presented four GPGPU talks in 2013 at CodeMash, GRDevDay, Microsoft, and ThatConference.

The monitoring of waste water for dioxins is important because these compounds are extremely toxic. One possible way to detect dioxins is by an analytical instrument called a Gas Chromatograph / Mass Spectrometer. This session summarizes our research aimed at increasing the sensitivity of a commercially available time-of-flight mass spectrometer without sacrificing resolution, mass range, or acquisition rate. In brief, we configured the mass spectrometer to pulse ions into the flight tube 20 times faster than originally designed, causing more ions to strike the detector per unit time, increasing sensitivity. However, because lighter, faster ions from one pulse overtake heavier ions from a previous pulse, the resulting mass spectra are severely intertwined, or multiplexed. Our work included developing a demultiplexing algorithm, which computes the theoretical source spectrum from the multiplexed data. Because the instrument generates 1.2GB/s, we designed and coded all algorithms for execution on a GTX Titan.

Session Level: Beginner
Session Type: Talk
Tags: Computational Physics; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 14:00 - 14:25
Location: Room 212A

S4502 - Training Random Forests on the GPU: Genomic Implications on HIV Susceptibility

Mark Seligman ( Principal, Rapidics LLC )
Mark's original training centered on mathematics and formal logic. For roughly twenty years, Mark developed compiler back ends for manufacturers of supercomputers and high-performance processors. For most of the last decade, Mark's work has focused on acceleration of algorithms used in bioinformatics, statistics and machine-learning.

The Random Forest (trademarked) algorithm is a powerful, versatile tool in machine learning. It consists of a training pass which builds a tree-based predictive model from a sample data set, followed by a tree-walking pass to generate predictions for new data. Recent efforts at acceleration have focused on the independence of both the construction, and walking, of distinct trees using, for example, multi-CPU and Hadoop-based approaches. Here, by contrast, we report progress in parallelizing the construction of individual trees themselves using the GPU. This enables the algorithm to treat very wide data sets, such as those common in genomic studies, in times significantly shorter than have been reported before now. This also makes practical iterative invocation and enables, for example, reweighted and variational applications of the algorithm. We demonstrate recent results on studies of HIV-susceptibility in subjects from Sub-Saharan Africa.

Session Level: Intermediate
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms; Machine Learning & AI; Supercomputing

Day: Thursday, 03/27
Time: 14:30 - 14:55
Location: Room 210H

S4682 - Developing a System For Real-Time Numerical Simulation During Physical Experiments in a Wave Propagation Laboratory

Darren Schmidt ( Numerical Computing Specialist, National Instruments )
Darren Schmidt
Darren Schmidt has worked for National Instruments in Austin, TX for almost two decades serving as a computation expert on a wide array of products and authoring several patents across multiple computational math domains. Currently, he works in NI's Scientific Research and Lead User Group defining, developing and deploying cutting edge systems for big analog data and large physics applications. These real-world applications demand use of a broad range of (co-)processor technologies in time-constrained situations for which he has amassed a great deal of intuition and experience.

ETH-Zurich is proposing a new concept for wave propagation laboratories in which the physical experiment is linked with a numerical simulation in real time. Adding live experimental data to a larger numerical simulation domain creates a virtual lab environment never before realized, enabling the study of frequencies inherent in important seismological and acoustic real-world scenarios. The resulting environment is made possible by a real-time computing system under development. This system must perform computations typically reserved for traditional (offline) HPC applications but produce results in a matter of microseconds. To do so, National Instruments is using the LabVIEW platform to leverage NI's fastest data acquisition and FPGA hardware with NVIDIA's most powerful GPU processors to build a real-time heterogeneous simulator.

Session Level: Intermediate
Session Type: Talk
Tags: Climate, Weather, Ocean Modeling; Big Data Analytics & Data Algorithms; Signal & Audio Processing; Numerical Algorithms & Libraries

Day: Thursday, 03/27
Time: 15:00 - 15:25
Location: Room 212B

S4149 - Efficient Computation of Radial Distribution Function on GPUs: Algorithm Design and Optimization

Yicheng Tu ( Associate Professor, University of South Florida )
Yicheng Tu
Yicheng Tu received a Bachelor's degree in horticulture from Beijing Agricultural University, China, and MS and PhD degrees in computer science from Purdue University (2003; 2007). He is currently an associate professor in the Department of Computer Science & Engineering at the University of South Florida, Tampa, Florida. His research is in energy-efficient database systems, scientific data management, high performance computing and data stream systems. Yicheng is a recipient of the NSF CAREER award (2013) and a member of ACM, IEEE, and ASEE.
Anand Kumar ( Graduate Student, University of South Florida )
Anand Kumar received BE degree in Computer Science & Engineering from Visvesvaraya Technological University, Belgaum, India and MS degree in Computer Science from IIIT Hyderabad, India. He is currently a senior PhD student in the Department of Computer Science & Engineering at the University of South Florida, Tampa, Florida. His interests are in management of big data, data compression, GPU computing and privacy in queries.

The radial distribution function (RDF) is a fundamental tool in the validation and analysis of particle simulation data. Computation of the RDF is a very time-consuming process; it may take days or even months to process a moderate number of data points (millions) on a CPU. We present an efficient technique to compute the RDF on GPUs, which takes advantage of shared memory, registers, and special instructions. Recent GPU architectures support the shuffle instruction, which can be used to share data between threads via registers. We exploit these features of the new architecture to improve the performance of the RDF algorithm. Further, we present the benefits of using different GPU optimization techniques to improve performance. The effect of algorithm behavior on the speedup is also presented in detail with the help of examples.
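
For context, the core of an RDF computation is a histogram of pair distances. Below is a hedged, baseline kernel-only sketch of that step (the talk's tuned version goes further with registers, shuffle-based sharing and tiling): each thread handles one particle, accumulates counts into a shared-memory histogram, and the per-block histograms are merged with atomics. The host driver and the g(r) normalization are omitted.

```cuda
// Baseline pair-distance histogram for RDF: shared-memory bins per block,
// merged into a global 64-bit histogram at the end.
#include <cuda_runtime.h>

#define RDF_BINS 256

__global__ void rdf_histogram(const float3 *pos, int n, float r_max,
                              unsigned long long *hist) {
    __shared__ unsigned int local[RDF_BINS];
    for (int b = threadIdx.x; b < RDF_BINS; b += blockDim.x) local[b] = 0;
    __syncthreads();

    float bin_width = r_max / RDF_BINS;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float3 pi = pos[i];
        for (int j = i + 1; j < n; ++j) {            // each unordered pair counted once
            float dx = pi.x - pos[j].x;
            float dy = pi.y - pos[j].y;
            float dz = pi.z - pos[j].z;
            float r = sqrtf(dx * dx + dy * dy + dz * dz);
            if (r < r_max) atomicAdd(&local[(int)(r / bin_width)], 1u);
        }
    }
    __syncthreads();

    for (int b = threadIdx.x; b < RDF_BINS; b += blockDim.x) {
        atomicAdd(&hist[b], (unsigned long long)local[b]);
    }
}
```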

Session Level: Intermediate
Session Type: Talk
Tags: Computational Physics; Big Data Analytics & Data Algorithms; Molecular Dynamics

Day: Thursday, 03/27
Time: 15:30 - 15:55
Location: Room 212A

S4424 - Hybrid Clustering Algorithms for Degenerate Primer Development on the GPU

Trevor Cickovski ( Assistant Professor of Computer Science, Eckerd College )
Trevor Cickovski
Trevor Cickovski received his Ph.D. from the University of Notre Dame in 2008. He worked for the Laboratory for Computational Life Sciences at Notre Dame and is currently an Assistant Professor of computer science in the Natural Sciences Collegium at Eckerd College. His areas of research include GPU computing, computational biology, and programming languages.

Analyzing portions of a genome during Polymerase Chain Reaction (PCR) analysis requires construction of a primer sequence that is complementary to the flanking regions of a target sequence, producing multiple copies of that portion of the genome. When analyzing multiple related genomes, the primer must be degenerate, containing an amount of uncertainty that we must minimize. We use graphics processing units (GPUs) to analyze the performance of a parallelized hierarchical clustering algorithm for grouping related genomes prior to degenerate primer construction, and also hybridize this algorithm with strategies from K-Means and Fuzzy C-Means. We demonstrate an order of magnitude improvement when running these algorithms on nearly one thousand sequences of more than seven thousand nucleotides from the human genome.

Session Level: Beginner
Session Type: Talk
Tags: Bioinformatics & Genomics; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 16:00 - 16:25
Location: Room 210H

S4538 - Real-Time RFI Rejection Techniques for the GMRT Using GPUs

Rohini Joshi ( Student, Drexel University )
Rohini Joshi
Rohini Joshi is currently a graduate student in the Electrical Dept at Drexel University. Rohini graduated from the University of Pune, India in 2011 and from then on was working on optimizing software pipelines using GPUs at the National Centre for Radio Astrophysics. Her research interests are in the domain of signal and image processing.

Radio frequency interference (RFI) is the primary enemy of sensitive multi-element radio instruments like the Giant Metrewave Radio Telescope (GMRT, India). Signals from radio receivers are corrupted with RFI from power lines, satellite signals, etc. RFI appears as spikes and bursts in raw voltage data and statistically shows up as outliers in a Gaussian distribution. We present an approach to tackle the problem of RFI, in real time, using a robust scale estimator such as the Median Absolute Deviation (MAD). Given the large data rate from each of the 30 antennas, sampled at 16 ns, it is necessary for the filter to work well within real-time limits. To accomplish this, the algorithm has been ported to GPUs to work within the GMRT pipeline. Presently, the RFI rejection pipeline runs in real time for 0.3-0.7 sec long data chunks. The GMRT will soon be upgraded to work at 10 times the current data rate. We are now working on improving the algorithm further so as to have the RFI rejection pipeline ready for the upgraded GMRT.
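
As a hedged, kernel-only sketch of the flagging rule behind this approach (not the GMRT pipeline code): once a robust location (median) and scale (MAD) have been estimated for a data chunk, each voltage sample can be flagged in parallel if it deviates by more than a chosen number of scaled MADs. Estimating the median and MAD themselves, and what is done with the flagged samples, are omitted here.

```cuda
// Flag a sample as RFI if |x - median| > k * 1.4826 * MAD
// (1.4826 rescales the MAD to an equivalent Gaussian sigma).
#include <cuda_runtime.h>

__global__ void mad_flag(const float *samples, int n,
                         float median, float mad, float k,
                         unsigned char *flag) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float dev = fabsf(samples[i] - median);
    flag[i] = (dev > k * 1.4826f * mad) ? 1 : 0;   // 1 = outlier, to be blanked or replaced downstream
}
```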

Session Level: Intermediate
Session Type: Talk
Tags: Astronomy & Astrophysics; Big Data Analytics & Data Algorithms; Signal & Audio Processing

Day: Thursday, 03/27
Time: 16:00 - 16:25
Location: Room LL21F

S4850 - Explore Dell Wyse Datacenter's Graphics Options for Virtual Desktop Computing (Presented by Dell)

Gary Radburn ( Head of Workstation Virtualization, Dell, Inc. )
Gary Radburn currently heads up the Workstation Virtualization initiatives globally at Dell Inc. With over 25 years' experience in the technology industry, ranging from Engineering to Sales and Marketing, Gary has had experience of all aspects of designing successful products and solutions and bringing them to market. He has worked for companies such as Digital Equipment, 3Com and most recently (for the past 13 years) Dell. Originating from the UK, where he managed the OptiPlex client business for EMEA, he went on to lead the Workstation Solutions team in the US and then championed graphics virtualization for Engineering applications. This has now become one of the most talked about topics from workstation customers today and will be for the foreseeable future. He still retains his English accent, thankfully assisted by the presence of BBC America on TV.

In this session we will explore the various options Dell supports in its Dell Wyse DataCenter solution offerings. This session will describe various platform offerings, such as the PowerEdge 720, PowerEdge C8220x and Precision 7610, with the various graphics card options. In addition, we will discuss the solution offerings around VMware View, Citrix XenDesktop, and Microsoft with Dell vWorkspace. Lastly, we will detail the capabilities of those solution offerings with various hypervisors, such as VMware vSphere, Citrix XenServer, and Microsoft Windows 2012. This will provide attendees with an overall view of what Dell can offer, giving customers multiple options to pick from.

Session Level: Beginner
Session Type: Talk
Tags: Graphics Virtualization Summit; Big Data Analytics & Data Algorithms

Day: Thursday, 03/27
Time: 16:00 - 16:50
Location: Room 210F

Talk
 

TUTORIAL

Presentation
Details

S4713 - Session 4: Deploying Your CUDA Applications Into The Wild (Presented by ArrayFire)

Umar Arshad ( Senior Software Engineer, CUDA Training Specialist, ArrayFire )
Umar Arshad is an engineer at ArrayFire where he primarily works on improving concurrency in ArrayFire and in applications using ArrayFire. He also created the CUDA and OpenCL optimization training material and regularly gives tutorials throughout the country. Before joining ArrayFire, Umar was a developer at Inovalon where he was involved with improving performance and designing large scale applications. Umar graduated from Georgia State University with a Master's degree in Computer Science. At GSU, he studied parallel programming and was the program chair of the university's ACM chapter.

Excited about CUDA but concerned about deployment? In this session, you will learn best practices for deploying your CUDA application and about how to resolve issues that commonly arise in the process. You will learn about scaling your application to multiple GPUs to handle large amounts of data (such as streams and/or files on disk). You will also learn about deploying your CUDA based applications in the cloud using Node.js, containers via Docker, etc.

Session Level: Intermediate
Session Type: Tutorial
Tags: Numerical Algorithms & Libraries; Clusters & GPU Management; Big Data Analytics & Data Algorithms; Mobile Applications

Day: Monday, 03/24
Time: 14:30 - 15:50
Location: Room 210B

Tutorial
 

SPECIAL EVENT

Presentation
Details

S4941 - Hangout: Big Data Analytics & Machine Learning

Sharan Chetlur ( CUDA Software Engineer, NVIDIA )
Alon Shalev Housfater ( Software Developer, IBM )
Martin Peniak ( Cognitive Robotics Researcher, CUDA Lecturer, Plymouth University )
John Tran ( Senior Research Scientist, NVIDIA )

Connect with NVIDIA engineers, devtechs and invited experts and get answers to all your burning questions.

Session Level: All
Session Type: Special Event
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI

Day: Monday, 03/24
Time: 14:00 - 15:50
Location: Concourse Pod C

S4942 - Hangout: Big Data Analytics & Machine Learning

Greg Diamos ( Software Engineer, NVIDIA )
Alon Shalev Housfater ( Software Developer, IBM )
Ren Wu ( Distinguished Scientist, Baidu )
John Tran ( Senior Research Scientist, NVIDIA )
Bryan Catanzaro ( Senior Research Scientist, NVIDIA )

Connect with NVIDIA engineers, devtechs and invited experts and get answers to all your burning questions.

Session Level: All
Session Type: Special Event
Tags: Big Data Analytics & Data Algorithms; Machine Learning & AI

Day: Tuesday, 03/25
Time: 11:00 - 12:50
Location: Concourse Pod C

S4965 - Hangout: GTC Speakers

Vishal Vaidyanathan ( Partner, Royal Caliber )
Nicholas Wilt ( Author, The CUDA Handbook )
Jonathan Rogers ( Assistant Professor, Georgia Institute of Technology )
Mohit Gupta ( Staff Software Engineer, Life Technologies )
Highly-Rated Speaker

to come

Session Level: All
Session Type: Special Event
Tags: Bioinformatics & Genomics; Machine Learning & AI; Big Data Analytics & Data Algorithms

Day: Tuesday, 03/25
Time: 13:00 - 14:00
Location: Concourse Pod C

S4955 - Hangout: GTC Speakers

Martin Burtscher ( Associate Professor, Texas State University-San Marcos )
Highly-Rated Speaker
H. Carter Edwards ( Principal Member of Technical Staff, Sandia National Laboratories )
Highly-Rated Speaker
Vijay Pande ( Professor and Director, Stanford University )
Highly-Rated Speaker

Session Level: All
Session Type: Special Event
Tags: Supercomputing; Molecular Dynamics; Big Data Analytics & Data Algorithms

Day: Wednesday, 03/26
Time: 13:00 - 14:00
Location: Concourse Pod B

Special event