GPU Technology Conference

April 4-7, 2016 | Silicon Valley

TALK

S6422 - Enhancing Visual Realism of Mixed Reality Applications with Stereo Vision

Edwin Azzam CTO, Stereolabs
Edwin Azzam co-founded STEREOLABS in 2010. As STEREOLABS's Chief Technical Officer, Edwin is responsible for leading the company's product development and technology strategy in stereo vision. Prior to founding STEREOLABS, Edwin was a project manager at Astrium Space Transportation, Paris. Edwin holds a Master's degree in Optics & Image Processing from Institut d'Optique, France, as well as a Master's degree in Management from ESSEC Business School. He is a Ph.D. supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise to assess national research projects in the field of computer vision and 3D image processing.

Discover how stereo vision and 3D depth sensing on GPU enable the development of mixed reality applications, which merge virtual information into a live 3D video stream of the real world. We will discuss the various stages of a real-time mixed reality processing pipeline, and how NVIDIA's GPU acceleration is integral to every step of the pipeline. We will also show demonstrations of how stereo depth sensing can be used to create 3D virtual playgrounds and real-time augmentation of the environment.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Virtual Reality & Augmented Reality; Video & Image Processing; Embedded

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210F

S6384 - NVIDIA CUDA® for Mobile

Yogesh Kini Manager, CUDA System Software, NVIDIA
Yogesh Kini manages the Tegra CUDA driver team at NVIDIA. For the last four years, he has been working on enabling GPU compute software on different Tegra platforms. His team is responsible for the CUDA API and system software on various embedded, automotive, and mobile platforms based on Tegra SoCs. He holds a B.S. from Manipal Institute of Technology, India.

This session covers a few important mobile use cases that can be accelerated using CUDA, including image processing, camera output post-processing, and real-time texture compression in graphics applications. Attendees will learn that: [1] Tegra has a unified memory architecture that applications can use to reduce total memory usage and power consumption. The use case presented demonstrates effective use of UVM on Tegra. [2] CUDA provides means to take inputs from a camera via EGLImage and EGLStreams interoperability, which can be used to post-process camera images using CUDA. The example presented demonstrates use of these CUDA APIs. [3] CUDA provides an API for interoperability with OpenGL ES. Texture compression in a graphics application is demonstrated as an example.
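
As background for point [1], here is a minimal, hypothetical sketch (not from the session) of how one allocation can be shared between CPU and GPU via CUDA managed memory; the kernel and frame size are purely illustrative:

    #include <cuda_runtime.h>
    #include <stdio.h>

    // Trivial per-pixel post-processing stand-in for a real camera filter.
    __global__ void invert(unsigned char *img, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) img[i] = 255 - img[i];
    }

    int main() {
        const int n = 1920 * 1080;
        unsigned char *img;
        // A single allocation visible to both CPU and GPU; on Tegra's
        // physically unified memory this avoids a separate device copy.
        cudaMallocManaged(&img, n);
        // ... fill img on the CPU, e.g., from a camera frame ...
        invert<<<(n + 255) / 256, 256>>>(img, n);
        cudaDeviceSynchronize();   // make GPU results visible to the CPU
        printf("first pixel: %d\n", img[0]);
        cudaFree(img);
        return 0;
    }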

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Tools & Libraries

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 210F

S6126 - GPU-Enabled Pavement Defect Detection: Looking for Cracks with Thousands of Eyes

Kristina Doycheva Research Assistant, Ruhr-University Bochum, Germany
Kristina Doycheva is pursuing her Ph.D. at Ruhr-University Bochum, where she works as a research assistant at the Chair of Computing in Engineering, Department of Civil Engineering. She received her M.S. in applied informatics from Ruhr-University Bochum, Germany, in 2013. Her research interests include high-performance image processing and machine learning. She is currently working on a project related to pavement defect detection.

Learn how to use GPUs for pavement defect detection. In recent years, a variety of vision-based methods for pavement defect detection have been proposed. However, these methods mostly process the images offline, which results in a large amount of data being persistently stored. To enable real-time pavement distress detection, image pre-processing steps, such as correction of nonuniform background illumination and noise removal, as well as pavement defect detection methods based on texture features and the wavelet transform, were implemented on GPUs. The achieved speed-up of the GPU implementation over a sequential implementation is approximately 10,000, and the execution time allows for processing more than 600 images per second.

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21E

S6397 - Real-Time Non-Rigid Image Registration Engine

Randall Miles Senior Research Scientist, Propulsion Science and Technology
Dr. Randall Miles is a physicist, algorithm developer, and senior research scientist at Propulsion Science and Technology. He is lead designer and developer for model database development activities, and key contributor on a variety of projects, including quantum chemistry calculations and radar cross section modeling of CFD fields.

Non-rigid image registration, i.e., morphing, allows a smaller footprint of seed images to be used to create a smooth and continuously changing series of images. We'll present a new high-speed toolkit for image morphing implemented using NVIDIA GPU technology. Runtime improvements of ~80% were achieved by implementing a succession of CUDA optimizations guided by Nsight profiler results. Tests were conducted using available simulated rocket plume images to calculate run times and create performance measures.

Level: All
Type: Talk
Tags: Aerospace & Defense; Video & Image Processing; Performance Optimization; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 2

S6314 - Java Image Processing: How Runtime Compilation Transforms Memory-Bound into Compute-Bound

Florent Duguet Founder, ALTIMESH
Florent Duguet founded Altimesh in 2008 in an effort to reduce the learning curve of GPU computing for high-level language developers. The outcome is the Hybridizer, which enables many-core computing in high-level programming environments such as .NET and Java. Florent graduated with a Ph.D. in computer graphics in 2005. He has implemented solutions for the financial services and oil and gas industries with a focus on GPGPU since 2007, starting from proof of concept and leading up to production.

A wide variety of image processing algorithms are inherently parallel. However, depending on filter size or neighborhood search pattern, memory access is critical for performance. We'll show how loop reordering and memory-locality fine-tuning help achieve the best performance. Using Hybridizer to automate the transformation of Java byte-code into CUDA source code, and using the new CUDA Runtime Compilation feature (NVRTC), we transformed execution from memory-bound to compute-bound. Applying this technique to oil and gas image processing algorithms results in interactive response times on production-size datasets.
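
As a sketch of what the runtime-compilation step looks like, here is a minimal NVRTC flow under stated assumptions: the kernel string stands in for generated code, the architecture option is a placeholder, and error checking is omitted:

    #include <nvrtc.h>
    #include <stdio.h>

    // Kernel source held as a string and compiled at run time, which lets a
    // code generator specialize loops and memory layout before compilation.
    const char *src =
        "extern \"C\" __global__ void scale(float *a, float s, int n) {\n"
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) a[i] *= s;\n"
        "}\n";

    int main() {
        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, src, "scale.cu", 0, NULL, NULL);
        const char *opts[] = { "--gpu-architecture=compute_52" };
        nvrtcCompileProgram(prog, 1, opts);

        size_t ptxSize;
        nvrtcGetPTXSize(prog, &ptxSize);
        char *ptx = new char[ptxSize];
        nvrtcGetPTX(prog, ptx);           // PTX ready for cuModuleLoadDataEx
        nvrtcDestroyProgram(&prog);

        printf("generated %zu bytes of PTX\n", ptxSize);
        delete[] ptx;
        return 0;
    }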

Level: All
Type: Talk
Tags: Performance Optimization; Energy Exploration; Video & Image Processing

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Marriott Salon 1

S6446 - A Fully Automated, High Performance Geolocation Improvement Workflow for Problematic Imaging Systems

Devin White Senior Research Scientist, Oak Ridge National Laboratory
Dr. Devin White is a senior research scientist at Oak Ridge National Laboratory and is a subject matter expert in the areas of quantitative social science, modeling complex adaptive systems, social network analysis, high performance computing, tactical airborne and spaceborne geopositioning, uncertainty propagation and analysis, image science, computer vision, multimodal image registration, data fusion, data visualization, imaging spectroscopy, lidar, SAR, and Earth observing systems. Devin is also a joint faculty professor of anthropology at the University of Tennessee, Knoxville. He previously served as a lead scientist for Exelis Visual Information Solutions and a scientist at Integrity Applications Incorporated, supporting large commercial and government customers.
Sophie Voisin Geospatial Software Engineer, Oak Ridge National Laboratory
Dr. Sophie Voisin is an engineer at Oak Ridge National Laboratory developing high performance computing methods for geospatial data analysis for the GIST group. She received her Ph.D. in computer science and image processing from the Universite de Bourgogne (France) in 2008 and joined ORNL in 2010 to work on numerous image processing related projects, successively performing quantitative analysis of neutron 2D and 3D image data; developing new techniques for eye-gaze data analysis, for which she is a co-recipient of an R&D 100 award (2014); and now implementing multidimensional image processing algorithms on GPU platforms for high performance computing analysis of satellite imagery.

Learn how hybrid CPU-GPU parallelization is being used to support rapid improvement of the geolocation accuracy of imagery collected by multiple airborne and spaceborne platforms. A sensor-agnostic, plugin-based framework with CUDA-enabled workflows was built to support photogrammetric and computer-vision processing tasks like image registration and orthorectification. Leveraging the complementary strengths of multicore CPUs and multiple Tesla K80 GPUs on each compute node required significant custom development to achieve optimal performance. We dramatically reduced per-image processing time and can handle multiple data streams simultaneously. The science behind two workflows will be presented, along with their performance metrics while executing on both bare-metal and virtual machines.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Supercomputing & HPC; Aerospace & Defense

Day: Tuesday, 04/05
Time: 16:00 - 16:50
Location: Room 210E

S6107 - Robust Model-Based 3D Head Pose Estimation

Shalini Gupta Senior Research Scientist, NVIDIA
Shalini Gupta has been a senior research scientist in the Mobile Visual Computing group of NVIDIA Research since April 2013. From 2011 to 2013, she worked as a senior mobile computer vision engineer at NVIDIA, where she designed and productized computer vision and computational photography solutions for mobile platforms and GPUs. She worked as an imaging and architecture scientist at Texas Instruments from 2008 to 2010, where she designed algorithms for the image signal processing pipeline of mobile phones. She has also worked at AT&T Laboratories on their IPTV project and at Advanced Digital Imaging Research, LLC, where she designed algorithms for 3D human face recognition. Shalini received her M.S. and Ph.D. in electrical and computer engineering from the University of Texas at Austin in 2004 and 2008, respectively. She received a B.S. in electronics and electrical communication engineering from Punjab Engineering College, India, in 2002. She is a recipient of the Summer Research Fellowship 2001, awarded by the Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India. Her primary research interests are image/signal processing, computer vision, and machine learning, and their application to scene understanding and interpretation.

Depth cameras have become cheap and ubiquitous. We introduce a computer vision algorithm for accurate, three-dimensional (3D) head pose (rotation and translation) estimation, which runs in near real time in CUDA. It works with different commodity depth sensors with minimal adaptation, handles large head rotations and occlusions gracefully, and does not require cumbersome subject initialization. Our algorithm has an angular error of 2 degrees and a translational error of 6 mm, and it outperforms all seven competing methods on a benchmark data set. Accurate head pose estimation is a fundamental problem in computer vision and a prerequisite for gaze estimation, facial animation capture, face recognition, driver monitoring, and head-coupled 3D perspective displays.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Intelligent Video Analytics (IVA)

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 210F

S6511 - Fast Compressed Sensing MRI Reconstruction Using Convolutional Sparse Coding on the GPU

Won-Ki Jeong Associate Professor, Ulsan National Institute of Science and Technology (UNIST)
Won-Ki Jeong is an associate professor in the School of Electrical and Computer Engineering at UNIST. Before joining UNIST, he was a research scientist in the Center for Brain Science at Harvard University. His research interests include scientific visualization, image processing, and general-purpose computing on the graphics processor in the field of biomedical image analysis. He has published 22 peer-reviewed research articles, two book chapters (NVIDIA GPU Gems), and one international patent. He received a Ph.D. in computer science from the University of Utah in 2008, where he was a member of the Scientific Computing and Imaging (SCI) Institute. He received the NVIDIA Graduate Fellowship in 2007 and is currently the principal investigator of the NVIDIA CUDA Research Center at UNIST.
Tran Minh Quan PhD Student, Ulsan National Institute of Science and Technology (UNIST)
Tran Minh Quan is a Ph.D. student in the School of Electrical and Computer Engineering at UNIST, Korea. His research interests are GPU computing and biomedical image processing. He received his B.S. in electrical engineering at KAIST, Korea.

Convolutional sparse coding is a relatively recent addition to well-known data-driven machine learning methods such as deep neural networks: it uses regularizers to approximate signals as a superposition of sparse feature maps convolved with a collection of filters. We'll introduce a fast alternating method for reconstructing highly undersampled MRI data. The proposed solution leverages the Fourier convolution theorem to accelerate the process of learning a set of filters and iteratively revises the MRI reconstruction based on the sparse codes found along the way. Finally, we'll show that our method is faster with GPU support and outperforms CPU implementations of state-of-the-art dictionary learning-based approaches.
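
For readers unfamiliar with the Fourier convolution theorem that underpins the filter-learning step, here is a generic cuFFT-based convolution sketch (not the authors' code; the filter is assumed zero-padded to the image size and both buffers already resident on the device):

    #include <cufft.h>
    #include <cuda_runtime.h>

    // Convolution theorem: spatial convolution becomes pointwise
    // multiplication in the Fourier domain (scaled by 1/N for cuFFT).
    __global__ void pointwiseMul(cufftComplex *a, const cufftComplex *b,
                                 int n, float scale) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            cufftComplex v;
            v.x = (a[i].x * b[i].x - a[i].y * b[i].y) * scale;
            v.y = (a[i].x * b[i].y + a[i].y * b[i].x) * scale;
            a[i] = v;
        }
    }

    void convolve2d(cufftComplex *dImg, cufftComplex *dFilt, int nx, int ny) {
        cufftHandle plan;
        cufftPlan2d(&plan, nx, ny, CUFFT_C2C);
        cufftExecC2C(plan, dImg, dImg, CUFFT_FORWARD);    // image to Fourier domain
        cufftExecC2C(plan, dFilt, dFilt, CUFFT_FORWARD);  // filter to Fourier domain
        int n = nx * ny;
        pointwiseMul<<<(n + 255) / 256, 256>>>(dImg, dFilt, n, 1.0f / n);
        cufftExecC2C(plan, dImg, dImg, CUFFT_INVERSE);    // back to image domain
        cufftDestroy(plan);
    }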

Level: All
Type: Talk
Tags: Medical Imaging; Video & Image Processing; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 212B

S6121 - GPU-Accelerated Computer Vision for Multimedia, Post Production and Surveillance

Hannes Fassold Senior Researcher, JOANNEUM RESEARCH
Hannes Fassold is a senior researcher in the Audiovisual Media Group of DIGITAL, the Institute for Information and Communication Technologies at Joanneum Research. His main research interests are algorithms for digital film restoration, content-based video quality analysis, and the efficient parallelization of these algorithms on the GPU. He received an M.S. in applied mathematics from Graz University of Technology in 2004. He has published several papers in these fields and is the principal investigator for the CUDA Research Center at DIGITAL.

Computer vision is at the core of many tools used in multimedia, post-production, and surveillance. We'll present some key computer vision algorithms for motion compensation, feature point extraction and tracking, SIFT descriptor extraction, and wavelet transform. We'll provide information about the significant speed-up we gained from porting these algorithms to the GPU and lessons learned from the process of porting. We'll give insight into how these algorithms are used in several applications, such as real-time video quality analysis (detection of dropouts and noise level), brand visibility monitoring in broadcast content, film and video restoration (dust and dirt removal, noise reduction, etc.), and traffic monitoring for wrong-way driver detection.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21C

S6526 - Beyond Standards: A New GPU-Aware Image Coding System

Pablo Enfedaque Ph.D. Student, Universitat Autònoma de Barcelona
Pablo Enfedaque is a third-year Ph.D. student with the Department of Information and Communications Engineering (dEIC) and the Department of Computer Architectures and Operating Systems (CAOS), Universitat Autònoma de Barcelona, Spain. He received a B.E. in computer science and an M.S. in high performance computing and information theory in 2012 and 2013, respectively. Pablo has been working with GPU architectures since his final degree project. His research interests include image coding, high performance computing, and parallel architectures.

Discover a new image coding system devised to exploit massive parallelism in a GPU. Current standards for the compression of images lack the kind of parallelism required for efficient implementation in GPUs. Although much effort is made to implement such standards in CUDA, most implementations obtain poor results. This session describes the main insights behind the proposed image coding system. Our starting point was JPEG2000. The core mechanisms of the standard were redefined to allow the type of parallelism required in SIMT computing. All the advanced features of the system are preserved, but it is no longer compatible with the standard. Performance results will be given, comparing state-of-the-art CPU and GPU implementations of JPEG2000 with the proposed system.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Performance Optimization; Signal & Audio Processing

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21B

S6153 - Fast Non-Rigid Registration for Mobile High-Dynamic Range Photography

Orazio Gallo Senior Research Scientist, NVIDIA
Orazio earned an M.S. degree in Biomedical Engineering from Politecnico di Milano (Italy). He then joined the Smith-Kettlewell Eye Research Institute, where he developed a novel bio-imaging technique capable of recording micrometric deformations of soft tissues. Subsequently, he joined the University of California at Santa Cruz, where he received a Ph.D. in Computer Engineering in 2011. During his studies in Santa Cruz, Orazio also interned at Canesta, Inc. (since acquired by Microsoft) and at the Nokia Research Center in Palo Alto. In September 2011, Orazio joined NVIDIA Research, where he currently works on the Mobile Visual Computing team. His interests span several areas of computer vision and computational photography. Orazio regularly serves on the program committees of the top computer vision and computational photography conferences (CVPR, ICCV, ICCP) and is an associate editor of the journal Signal Processing: Image Communication.

We present a method that leverages the computational power of GPUs to create a high-dynamic-range (HDR) photograph in the presence of camera motion and scene changes. Our approach is extremely fast and prevents the artifacts that arise from insufficient registration quality. Previous methods to address this problem are either accurate, but too slow for mobile devices, or fast, but prone to failing. As a comparison, our method runs in under 700ms on an NVIDIA-powered tablet for a pair of 5MP images, whereas previous state-of-the-art methods performing non-rigid registration take over a minute on desktops for a pair of 1MP images.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21B

S6238 - Realtime Raw Image and Video Processing on GPU

Fyodor Serzhenko CEO, Fastvideo
Fyodor Serzhenko is CEO of Fastvideo. His research interests include high-speed cameras and software for high-speed imaging, high-performance computing, and GPU image processing for video applications. He graduated from the Moscow Institute of Physics and Technology in 1989 and received his Ph.D. in semiconductor physics in 1993.

This session demonstrates how to achieve real-time image and video processing for RAW data on the GPU. We will present a detailed analysis of the Fastvideo SDK GPU image processing pipeline: RAW/DNG acquisition, preprocessing, demosaicing, denoising, color correction, tone mapping, resizing, sharpening, OpenGL output, and compression to MJPEG and H.264. All of this can now be done in real time on the GPU for 4K RAW data.

Level: All
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21C

S6691 - Multi-Source Fusion Using Deep Learning

Andrew Jenkins Senior Data Scientist, DigitalGlobe
Andrew Jenkins works for DigitalGlobe Inc. as a Senior Data Scientist focused on applying deep learning to multi-source spatial data such as satellite imagery, geo-tagged photos, and videos. Andrew is currently a PhD candidate in the Department of Geography and Geographic Information Science at George Mason University. He holds an MS degree in Geoinformatics and a BS in Computer and Information Science. Andrew previously worked as a government researcher at the US Army Engineer Research and Development Center, and prior to that spent eight years in the military.

DigitalGlobe's satellite constellation collects millions of square kilometers of Earth imagery daily, yielding high-resolution data of our planet. By employing deep learning algorithms and NVIDIA GPUs, DigitalGlobe processes imagery and detects objects at speeds orders of magnitude faster than ever before. Emergency responders require a multitude of information sources to support their mission, and DigitalGlobe uses several methodologies for fusing disparate data sets together. Social media, weather, other sensor types (e.g., radar/lidar), and satellite imagery can be fused to help decision makers answer questions. By combining the data sets based on their location and on common categories from the deep learning algorithms, emergency responders and analysts are able to automatically verify objects on the ground.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Video & Image Processing; Aerospace & Defense

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Marriott Salon 2

S6197 - Improving High Performance Image Resizing and Rotation: A Case Study of Texture Options

Ismayil Guracar Senior Key Expert, Siemens Medical Solutions, Ultrasound Business Unit
Highly-Rated Speaker
Ismayil Guracar has been working in the ultrasound imaging field for over 29 years. He is a senior key expert with the Innovations Applications Group at Siemens Healthcare, Ultrasound Business Unit in Mountain View, Calif. His interests include ultrasound image formation and high-performance, real-time signal processing, especially using GPUs. He holds 68 U.S. patents, has pioneered new ultrasound technologies in the areas of parametric and molecular imaging, and has contributed to the development of many successful diagnostic medical imaging products.

We present a case study on the use of textures for image resizing and rotation with conventional bilinear and high-quality cubic interpolation filtering, across various texturing options and data widths. Choices including CUDA arrays, pitched 2D arrays, and linear memory each offer benefits and drawbacks that depend on the particular demands and details of the application. We provide performance measurements from the latest Maxwell GPU architecture, which has a number of performance-improving advances over previous generations. We hope to provide information and insight to CUDA developers and demonstrate some benchmarking and measurement techniques with Nsight so that informed choices can be made about how best to match texture image processing options to application requirements.
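
As a flavor of the options compared, here is a minimal sketch (not the talk's code) that stores an image in a CUDA array, wraps it in a texture object, and resamples it with hardware bilinear filtering; all names and parameters are illustrative:

    #include <cuda_runtime.h>

    // Resample an image with hardware bilinear filtering via a texture object.
    __global__ void resample(cudaTextureObject_t tex, float *out,
                             int ow, int oh, float sx, float sy) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < ow && y < oh)
            out[y * ow + x] = tex2D<float>(tex, (x + 0.5f) * sx, (y + 0.5f) * sy);
    }

    cudaTextureObject_t makeTexture(const float *hImg, int w, int h) {
        cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
        cudaArray_t arr;
        cudaMallocArray(&arr, &desc, w, h);  // opaque, texture-optimized layout
        cudaMemcpy2DToArray(arr, 0, 0, hImg, w * sizeof(float),
                            w * sizeof(float), h, cudaMemcpyHostToDevice);

        cudaResourceDesc res = {};
        res.resType = cudaResourceTypeArray;
        res.res.array.array = arr;

        cudaTextureDesc td = {};
        td.addressMode[0] = cudaAddressModeClamp;
        td.addressMode[1] = cudaAddressModeClamp;
        td.filterMode = cudaFilterModeLinear;  // hardware bilinear interpolation
        td.readMode = cudaReadModeElementType;

        cudaTextureObject_t obj;
        cudaCreateTextureObject(&obj, &res, &td, NULL);
        return obj;
    }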

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Video & Image Processing; Real-Time Graphics

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21B

S6698 - How GPU Helps to Power Next Generation of ArcVideo Video Products and Service

Jin Huang CTO, ArcVideo, Inc
Jin Huang is CTO of ArcVideo, a software company spun off from ArcSoft, Inc. that focuses on providing video solutions and services to Chinese broadcasting and OTT customers. Having worked in multimedia for over ten years, spanning PC/mobile and server/cloud businesses, Jin is responsible for broadcast-grade video solutions with high performance, private/public cloud video SaaS services, and intelligent video content analytics products supporting millions of end users.

ArcVideo, a leading video solution company in China, provides video transcoding, video content analysis, and interactive video solutions running on physical or virtual servers and on private and public clouds for broadcasting and OTT customers. We take full advantage of Tesla and GRID GPU transcoding and generic CUDA capabilities to accelerate the video transcoding and post-processing pipeline, to enable deep learning training for fast video content recognition, and to power private and public cloud video services for content providers. The high performance of GPUs brings ArcVideo's next generation of video experiences, including VR and 4K HEVC broadcasting, and makes possible a real-time interactive video platform supporting millions of users.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Data Center & Cloud Computing

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21C

S6161 - Single Instruction Multiple Data for Computer Vision

Yen-Te Shih Sr. Compute Architect, NVIDIA
Yen-Te Shih works at NVIDIA on GPU architectures that run computer vision algorithms and applications.

Attendees will learn how to quickly port f32 code to an f16x2 version, predict the resulting performance, analyze the overhead, and design a tool or follow a standard procedure to directly translate existing f32 code to f16x2.
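
To give a feel for such a port, here is a hypothetical before/after pair (assuming a GPU with native fp16 arithmetic, sm_53 or newer, and an even element count):

    #include <cuda_fp16.h>

    // f32 version: one add per element.
    __global__ void add_f32(const float *a, const float *b, float *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    // f16x2 version: each 32-bit __half2 register packs two fp16 values,
    // so one __hadd2 instruction processes two elements at once.
    __global__ void add_f16x2(const __half2 *a, const __half2 *b,
                              __half2 *c, int n2) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n2) c[i] = __hadd2(a[i], b[i]);
    }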

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 212A

S6308 - Image Super-Resolution: From Sparse Coding to Deep Network

Zhaowen Wang Research Scientist, Adobe Systems Inc.
Zhaowen Wang is a research scientist at Adobe Systems, Inc. His research areas include image understanding and enhancement through machine learning algorithms, with a special interest in deep learning. Before joining Adobe, Zhaowen obtained a Ph.D. from the University of Illinois at Urbana-Champaign in 2014.

Learn how to combine a conventional signal processing model with a deep neural network to achieve state-of-the-art performance for single-image super-resolution. Representing an image signal with its sparse coefficients has been proven an effective prior for many image restoration problems, including super-resolution. We design a deep convolutional network that mimics the sparse coding model and at the same time has the same advantage of end-to-end optimization as other deep learning models. By unifying the strengths of a good image prior and large learning capacity, our method generates much better upscaling results than vanilla sparse coding and neural networks in both visual and numerical quality. The learned network has a very compact size and can be implemented efficiently on a GPU.

Level: Intermediate
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL21C

S6432 - Hand Gesture Recognition with 3D Convolutional Neural Networks

Pavlo Molchanov Research Scientist, NVIDIA
Pavlo Molchanov has been a research scientist at NVIDIA since May 2015. He received B.Sc. and M.Sc. degrees, with distinction, in radio technical systems, devices, and complexes from the National Aerospace University, Kharkov, Ukraine, in 2008 and 2010, respectively. He received his Ph.D. from Tampere University of Technology, Tampere, Finland, in the area of signal processing.

This presentation will describe the design of a multi-resolution 3D convolutional neural network for drivers' hand gesture recognition. The talk will include task-specific data augmentation strategies that help to achieve state-of-the-art performance on a publicly available dataset. Several aspects of multi-sensor fusion with deep neural networks will be discussed in detail.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room 210G

S6712 - GPU Powered Solutions in the Second Kaggle Data Science Bowl

Jon Barker Solution Architect, NVIDIA
Jon Barker joined NVIDIA in May 2015 as a Solution Architect. Since then, he has been helping customers design, implement, and optimize a variety of GPU-accelerated deep learning applications and has also provided internal and external deep learning training. Jon is particularly focused on the application of deep learning to problems in defense and national security. He graduated from the University of Southampton in the UK in 2007 with a PhD in Mathematics. Prior to joining NVIDIA, Jon worked for the UK Ministry of Defence and spent four years on secondment to the US Department of Defense, where he was a research scientist focused on data analytics and machine learning for multi-sensor data streams. To keep learning new data science skills, Jon has long been a competitor on Kaggle.

The second annual Data Science Bowl was an online data science contest that took place in early 2016 and was hosted on the Kaggle platform. The objective was to develop an algorithm that could accurately estimate the volume of the left ventricle of a human heart, at the points of maximum and minimum volume, from a time-series of multiple cross-sectional magnetic resonance imaging (MRI) images of the heart. The contest provided thousands of MRI images to train an algorithm, and the challenge was a natural fit for GPU-accelerated deep learning (DL). We'll hear some of the winning teams describe their approaches, discuss the complexities of working with sometimes messy clinical data, and learn how deep learning can be applied to a time-series of 3D images.

Level: Beginner
Type: Talk
Tags: Medical Imaging; Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Wednesday, 04/06
Time: 16:00 - 16:50
Location: Room LL21B

S6501 - Deep Correspondence Restricted Boltzmann Machine for Cross-Modal Retrieval

Ruifan Li Assistant Professor, Beijing University of Posts and Telecommunications
Ruifan Li is an assistant professor of computer science at Beijing University of Posts and Telecommunications, and affiliated with the Engineering Research Center of Information Networks, Ministry of Education. Ruifan received B.S. and M.S. degrees in control systems and in circuits and systems from Huazhong University of Science and Technology, China, in 1998 and 2001, respectively. In 2006, he received a Ph.D. in signal and information processing from BUPT and joined the School of Computer Science there. In 2011, he spent one year as a visiting scholar at the Information Sciences Institute, University of Southern California. Ruifan's current research activities include neural information processing, multimedia information processing, and statistical machine learning.

Learn how the correspondence restricted Boltzmann machine (Corr-RBM), a model built from classical deep learning components, is used for large-scale cross-modal retrieval, such as using a text query to retrieve images. We'll first illustrate the RBM as one of the building blocks of deep learning, then describe the architecture of the Corr-RBM model and its learning algorithm. We construct two deep neural structures using Corr-RBM for the task of cross-modal retrieval. A number of comparison experiments, with their hardware and software settings, are reported on three public real-world datasets, and we report the computational time on the NVIDIA Tesla K20c GPU for the largest dataset used in our experiments.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Algorithms; Video & Image Processing

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room 210D

S6657 - GPU-Based Real Time Reconstruction and Visualization of Cardiovascular 3D Ultrasound Images

Erik N. Steen Principal Engineer, GE Healthcare
Erik N. Steen is a principal engineer with GE Vingmed Ultrasound, responsible for technology strategy. He received his M.S. in computer science from the Norwegian Technical University in Trondheim in 1992 and his Ph.D. in the field of 3D medical image processing in 1996. He has worked at GE Vingmed Ultrasound in Norway since 1996. During the last few years, he has been actively involved in the development of a new GPU-based architecture for real-time ultrasound image reconstruction as well as real-time visualization of 3D cardiac images.

We'll cover the clinical and technical benefits of using GPUs for real-time reconstruction and visualization of 3-dimensional and 2-dimensional cardiovascular ultrasound images. The session has three main parts. First, we'll introduce some of the clinical challenges in cardiovascular ultrasound imaging. Second, we'll give an overview of a new image reconstruction architecture called cSound as well as some of the algorithms that have been implemented with this new architecture. The technical and clinical benefits of this architecture also will be discussed. Finally, we'll cover GPU-based real-time visualization of the reconstructed 3D images. Several examples of 2D and 3D ultrasound images will be shown.

Level: All
Type: Talk
Tags: Medical Imaging; Video & Image Processing

Day: Thursday, 04/07
Time: 09:30 - 09:55
Location: Room LL21B

S6549 - Real-Time Medical Imaging Using GPUs with a Non-Real-Time Operating System

Stefan Schneider Software Engineer, Siemens Healthcare
Stefan Schneider earned an associate engineer's degree in computer systems and information in 2001 and his M.S. in computational engineering in 2006. He became a certified CUDA programmer in 2011. He joined Siemens Healthcare in 2001, where he was responsible for 3D reconstruction on GPUs and mobile C-arm devices. In 2010, he became responsible for the 2D image processing pipeline, implemented in CUDA, on many Siemens Healthcare devices.

Is it possible to perform real-time X-ray imaging on a standard PC, even in critical situations? Learn how Siemens Healthcare managed to remove dedicated image processing hardware (such as FPGAs and DSPs) from many of its products and introduced NVIDIA GPUs to design and implement the imaging chain, from the frame-grabbing board to the display, on out-of-the-box hardware. This "harmonized image chain" (which we call "harmonIC") runs on many modalities, such as radiography, fluoroscopy, mammography, and mobile C-arm devices used in surgery, where reliability matters most. Additionally, scalability and extensibility are very important in medical imaging and will be covered in this presentation.

Level: All
Type: Talk
Tags: Medical Imaging; Video & Image Processing

Day: Thursday, 04/07
Time: 10:00 - 10:25
Location: Room LL21B

S6740 - GPU Powered Solutions in the Second Kaggle Data Science Bowl

Jon Barker Solution Architect, NVIDIA
Jon Barker joined NVIDIA in May 2015 as a Solution Architect. Since then, he has been helping customers design, implement, and optimize a variety of GPU-accelerated deep learning applications and has also provided internal and external deep learning training. Jon is particularly focused on the application of deep learning to problems in defense and national security. He graduated from the University of Southampton in the UK in 2007 with a PhD in Mathematics. Prior to joining NVIDIA, Jon worked for the UK Ministry of Defence and spent four years on secondment to the US Department of Defense, where he was a research scientist focused on data analytics and machine learning for multi-sensor data streams. To keep learning new data science skills, Jon has long been a competitor on Kaggle.

The second annual Data Science Bowl was an online data science contest that took place in early 2016 and was hosted on the Kaggle platform. The objective was to develop an algorithm that could accurately estimate the volume of the left ventricle of a human heart, at the points of maximum and minimum volume, from a time-series of multiple cross-sectional magnetic resonance imaging (MRI) images of the heart. The contest provided thousands of MRI images to train an algorithm, and the challenge was a natural fit for GPU-accelerated deep learning (DL). We'll hear some of the winning teams describe their approaches, discuss the complexities of working with sometimes messy clinical data, and learn how deep learning can be applied to a time-series of 3D images.

Level: Beginner
Type: Talk
Tags: Medical Imaging; Deep Learning & Artificial Intelligence; Video & Image Processing

Day: Thursday, 04/07
Time: 13:00 - 13:50
Location: Room LL21B

S6713 - Large Scale Video Processing for Virtual Reality

Arthur van Hoff CTO, Jaunt VR
Arthur van Hoff is a serial entrepreneur and was previously CTO at Flipboard. He started his career in Silicon Valley at Sun Microsystems, where he was an early developer of the Java programming language. Since then he has started several successful companies, including Marimba (IPO 1999), Strangeberry (acquired by TiVo), ZING (acquired by Dell), and Ellerdale (acquired by Flipboard). Arthur has expertise in machine learning, big data, mobile applications, 3D printing, and computational photography. He is originally from the Netherlands and has a master's degree in Computer Science from Strathclyde University in Glasgow.

Jaunt VR has developed a GPU-based, large-scale video processing platform that combines multiple HD camera streams in a radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud-based GPU instances.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

S6720 - Live Video Streaming in VR

Nicolas Burtey CEO, VideoStitch
Nicolas Burtey is Founder and CEO of VideoStitch and has worked in the VR and 360 industry for more than twelve years.

Learn how to stream live VR video from capture to the HMD, and all the processing steps in between.

Level: Beginner
Type: Talk
Tags: Virtual Reality & Augmented Reality; Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

S6743 - Large Scale Video Processing for Virtual Reality

Arthur van Hoff CTO, Jaunt VR
Arthur van Hoff is a serial entrepreneur and was previously CTO at Flipboard. He started his career in Silicon Valley at Sun Microsystems, where he was an early developer of the Java programming language. Since then he has started several successful companies, including Marimba (IPO 1999), Strangeberry (acquired by TiVo), ZING (acquired by Dell), and Ellerdale (acquired by Flipboard). Arthur has expertise in machine learning, big data, mobile applications, 3D printing, and computational photography. He is originally from the Netherlands and has a master's degree in Computer Science from Strathclyde University in Glasgow.

Jaunt VR has developed a GPU-based, large-scale video processing platform that combines multiple HD camera streams in a radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud-based GPU instances.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

S6763 - Analyzing Videos for Creating Personalized Viewing Experiences

Omar Javed Chief Scientist, ClipMine, Inc.
Omar Javed is ClipMine's Chief Scientist. His interests include video event understanding, multimedia indexing, object tracking, and online machine learning. Prior to joining ClipMine, he was a Principal Scientist and Technology Leader at SRI International. Omar was an Associate Editor of the Machine Vision and Applications journal from 2008 to 2015 and an area chair for CVPR 2008. His research article "Object Tracking: A Survey" was ranked #1 in ACM's list of most popular magazine and computing survey articles in 2007, and his article "Modeling Inter-Camera Space-Time and Appearance Relationships for Tracking across Non-Overlapping Views" was listed among the top-10 most-cited papers published in the Computer Vision and Image Understanding journal from 2007 to 2011.

Our session will focus on video analysis for rapid media personalization. In recent years, viewership of online video content has increased exponentially. However, current video players still support viewing much the way VCRs did for analog video: watch from start to end, with options to play, pause, fast-forward, and rewind. We have developed an automatic video processing system that rapidly analyzes videos on demand to generate tables of contents (ToCs) and personalized storyboards for video navigation. We will discuss the advantages of using GPUs for video analytics and how GPUs enabled us to generate the ToC of an hour-long HD video in seconds.

Level: All
Type: Talk
Tags: Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

TUTORIAL

S6160 - Best Practices in GPU-Based Video Processing

Thomas True Senior Applied Engineer for Professional Video and Image Processing, NVIDIA
Tom True is a Senior Applied Engineer at NVIDIA, where he works with developers to optimize the design and implementation of GPU-based professional video broadcast, digital post-production, and large-scale multi-GPU, multi-display visualization systems. Tom has a Bachelor of Science degree from the Rochester Institute of Technology and a Master of Science degree from the graphics lab at Brown University. Prior to joining NVIDIA, Tom was an application engineer at Silicon Graphics.

This session will explore best practices and techniques for the development of efficient GPU-based video and image processing applications. Topics to be discussed include threading models for efficient parallelism, CPU affinity to optimize system memory and GPU locality, image segmentation for overlapped asynchronous transfers, optimal memory usage strategies to reduce expensive data movement, and image format considerations to reduce and eliminate data conversions. Single and multi-GPU systems for uncompressed real time 4K video capture, processing, display and play-out will be considered. Takeaways should prove applicable to developers of video broadcast and digital post production systems as well as to developers of large scale visualization systems that require video ingest.
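
As an illustration of the overlapped-transfer pattern discussed here, a hedged sketch (not the tutorial's code) that slices a frame across CUDA streams so uploads, kernels, and downloads overlap; process() stands in for a real video kernel:

    #include <cuda_runtime.h>

    // Placeholder per-byte work standing in for a real video kernel.
    __global__ void process(unsigned char *buf, size_t n) {
        size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
        if (i < n) buf[i] = 255 - buf[i];
    }

    void processFrame(unsigned char *hFrame, size_t frameBytes, int nSlices) {
        size_t slice = frameBytes / nSlices;   // assume divisible, for brevity
        cudaStream_t *streams = new cudaStream_t[nSlices];
        unsigned char *dBuf;
        cudaMalloc(&dBuf, frameBytes);
        // Host memory must be pinned for copies to be truly asynchronous.
        cudaHostRegister(hFrame, frameBytes, cudaHostRegisterDefault);

        for (int i = 0; i < nSlices; ++i) {
            cudaStreamCreate(&streams[i]);
            size_t off = (size_t)i * slice;
            cudaMemcpyAsync(dBuf + off, hFrame + off, slice,
                            cudaMemcpyHostToDevice, streams[i]);
            process<<<(unsigned)((slice + 255) / 256), 256, 0, streams[i]>>>(
                dBuf + off, slice);
            cudaMemcpyAsync(hFrame + off, dBuf + off, slice,
                            cudaMemcpyDeviceToHost, streams[i]);
        }
        cudaDeviceSynchronize();   // wait for all slices
        for (int i = 0; i < nSlices; ++i) cudaStreamDestroy(streams[i]);
        cudaHostUnregister(hFrame);
        cudaFree(dBuf);
        delete[] streams;
    }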

Level: Advanced
Type: Tutorial
Tags: Media & Entertainment; Video & Image Processing; Large Scale and Multi-Display Visualization

Day: Monday, 04/04
Time: 09:00 - 10:20
Location: Room 210F

S6226 - High-Performance Video Encoding on NVIDIA GPUs

Abhijit Patait Director, Multimedia System Software, NVIDIA
Abhijit Patait manages the GPU Multimedia Software team at NVIDIA, which is responsible for video and audio software, the GRID SDK, the NVENC SDK, and others. Prior to NVIDIA, he worked at various U.S. organizations in areas such as baseband and audio signal processing, telecommunications software, VoIP, and video/image processing. Abhijit holds an MSEE from Missouri S&T University and an MBA from the University of California, Berkeley.
Eric Young GRID Developer Relations Manager , NVIDIA
Eric Young is an engineering manager in the Content and Technology group at NVIDIA. His team focuses on applied research with GRID technologies and video compression. Eric specializes in remote graphics, video compression, and technology for games.

We'll provide an overview of video encoding technologies available on NVIDIA GPUs. In particular, attendees can expect to learn the following: (1) an overview of the NVIDIA video encoding SDK (NVENC SDK); (2) new features in NVIDIA video encoding (NVENC) hardware with new GPU chips; (3) changes and new features in NVENC SDK 6.0 and NVENC SDK 7.0; and (4) differences between the NVENC SDK and the GRID SDK, and choosing the right SDK for a particular application.

Level: All
Type: Tutorial
Tags: Media & Entertainment; Video & Image Processing; Tools & Libraries

Day: Monday, 04/04
Time: 13:00 - 13:50
Location: Room 210F

POSTER

P6311 - QuickEye: Highly Efficient Face Detection & Recognition in Large Video Using Hadoop GPU-cluster

Illo Yoon Graduate Student, University of Seoul
Illo Yoon is working on designing an efficient hybrid scheduler for CPU+GPU map tasks in Hadoop. He designed and implemented QuickEye, a GPU-accelerated face detection and recognition system. Illo joined ParLab (the Parallel Software Design Lab at the University of Seoul) in 2014 as an undergraduate researcher and is expected to finish his B.S. and M.S. in August 2016.

Setting aside the debate about privacy, we cannot deny that CCTV has made a positive contribution to crime prevention. CCTV is everywhere, and more is coming: more and more video files are created, and they are getting bigger. It is difficult to handle these big video files with a single server, so let's try Hadoop! Hadoop is a big data framework that can easily be used in a distributed cluster environment. By default, Hadoop cannot utilize GPUs because it runs on the JVM, so we attached GPU code to Hadoop using JNI (Java Native Interface). We introduce the resulting system, called QuickEye, which decodes large video files and performs face detection and recognition using CUDA.
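
A hypothetical sketch of the JNI bridge described above (the class and method names are invented for illustration): a Hadoop map task written in Java calls a native method, and the C++ side moves the frame to the GPU for CUDA processing:

    #include <jni.h>
    #include <cuda_runtime.h>

    // Native implementation of a hypothetical Java method
    //   class QuickEye { native int detectFaces(byte[] frame); }
    extern "C" JNIEXPORT jint JNICALL
    Java_QuickEye_detectFaces(JNIEnv *env, jobject self, jbyteArray frame) {
        jsize n = env->GetArrayLength(frame);
        jbyte *hData = env->GetByteArrayElements(frame, NULL);

        unsigned char *dData;
        cudaMalloc(&dData, n);
        cudaMemcpy(dData, hData, n, cudaMemcpyHostToDevice);
        // ... launch face detection/recognition kernels on dData ...
        int facesFound = 0;   // placeholder for the kernels' result
        cudaFree(dData);

        env->ReleaseByteArrayElements(frame, hData, JNI_ABORT);
        return facesFound;
    }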

Level: Intermediate
Type: Poster
Tags: Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6313 - Large Time Series Single-Molecule Tracking Including Defocus and Motion Blur Control

Xu Xiaochun Research Associate, National University of Singapore
Xu Xiaochun is currently a Research Associate at the Mechanobiology Institute (MBI), Singapore. MBI is one of the Research Centres of Excellence at the National University of Singapore (NUS); its mission is to develop a new paradigm of biomedical research by focusing on the quantitative and systematic understanding of dynamic functional processes. Dr. Xu received his Bachelor's and Master's degrees in Biology from Huazhong University of Science and Technology (HUST), China, in 2008 and 2011, respectively. His research interests lie in the fields of bio-photonics, bioinformatics, and biomedical systems, including X-ray cone-beam microtomography, single-molecule tracking, cubic membranes, and ophthalmic devices. He has published several journal papers in these areas.

We'll present an operational tracking implementation for multi-channel microscopy time series from hundreds to tens of thousands of frames, depicting the dim traces of single fluorescent molecules moving over time. The characteristic shape of an optical point source is used to localize and trace thousands of molecules fast, accurately, and reliably over a timespan of several minutes.

Level: Beginner
Type: Poster
Tags: Video & Image Processing; Computational Biology

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6324 - Upscaling with Deep Convolutional Networks and Muxout Layers

Pablo Navarrete Michelini Principal Research Engineer, BOE Technology Group Co., Ltd.
Pablo Navarrete Michelini was born in Santiago, Chile. He received a B.Sc. in Physics (2001), a B.Sc. in Electrical Engineering (2001), and the Electrical Engineer degree (2002) from Universidad de Chile, Santiago, and a Ph.D. in Electrical Engineering from Purdue University, West Lafayette, in 2008. Pablo worked as a research intern in CIMNE at the Technical University of Catalonia in 2006 and as a visiting student research collaborator at Princeton University in 2006-2007. He was an Assistant Professor in the Department of Electrical Engineering at Universidad de Chile in 2008-2011 and a Senior Software Engineer working on video compression algorithms at Yuvad Technologies Co., Ltd. in Beijing, China, in 2011-2014. He has been a Principal Research Engineer at BOE Technology Group Co., Ltd. since 2014.

We consider the problem of super-resolution using convolutional networks. Previous work has shown the advantages of using convolutional networks to improve the quality of image upscaling. Unlike previous solutions, our method incorporates the image upsampling within the network structure. To achieve this goal, we propose a so-called Muxout layer that increases the size of image features by combining them in groups. The system structure is motivated by an interpretation of convolutional networks as adaptive filters and by classic interpolation theory. We use this interpretation to propose specialized initialization methods that are convenient for training deep structures. Our tests show state-of-the-art quality, high performance, and the ability for unsupervised learning on text images.

Level: Intermediate
Type: Poster
Tags: Deep Learning & Artificial Intelligence; Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6344 - Large Time Series Single-Molecule Tracking Including Defocus and Motion Blur Control

Xu Xiaochun Research Associate, National University of Singapore
Xu Xiaochun is currently a Research Associate at the Mechanobiology Institute (MBI), Singapore. MBI is one of the Research Centres of Excellence at the National University of Singapore (NUS); its mission is to develop a new paradigm of biomedical research by focusing on the quantitative and systematic understanding of dynamic functional processes. Dr. Xu received his Bachelor's and Master's degrees in Biology from Huazhong University of Science and Technology (HUST), China, in 2008 and 2011, respectively. His research interests lie in the fields of bio-photonics, bioinformatics, and biomedical systems, including X-ray cone-beam microtomography, single-molecule tracking, cubic membranes, and ophthalmic devices. He has published several journal papers in these areas.

We'll present an operational tracking implementation for multi-channel microscopy time series from hundreds to tens of thousands of frames, depicting the dim traces of single fluorescent molecules moving over time. The characteristic shape of an optical point source is used to localize and trace thousands of molecules fast, accurately, and reliably over a timespan of several minutes.

Level: Beginner
Type: Poster
Tags: Video & Image Processing; Computational Biology

Day: TBD, TBD
Time: TBD - TBD
Location: TBD
