GPU Technology Conference

April 4-7, 2016 | Silicon Valley
Check back often for session updates.

S6767 - VisionWorks™, A CUDA-Accelerated Computer Vision Library

Elif Albuz Computer Vision Software Lead, NVIDIA
Elif Albuz is the technical lead for the VisionWorks Toolkit at NVIDIA, driving features and optimizations with CUDA acceleration on Tegra GPUs. Before joining the Computer Vision Group, she led the CUDA FFT library; designed new algorithms for motion estimation, super-resolution, and frame-rate up-conversion and accelerated them on NVIDIA GPUs; designed architectures for error concealment and adaptive quantization for video codec hardware; and implemented low-level code for H.264 and MPEG-2 codecs. Prior to joining NVIDIA, she worked at Sony Electronics, leading the DVD decoder firmware stack used in DVD players and the PlayStation 2, implementing a real-time OS for multi-processor systems, and accelerating H.264 using SIMD in the Multimedia Research Labs. Elif Albuz holds a dual degree in Electrical Engineering and Computer Science, with a focus on Artificial Intelligence and Robotics, and a master's degree in Electrical Engineering, where she did research on content-based image retrieval, parallel architectures, and algorithms.

In this talk, we will introduce the NVIDIA VisionWorks™ toolkit, a software development package for computer vision (CV) and image processing. VisionWorks implements and extends the Khronos OpenVX standard, and it is optimized for CUDA-capable GPUs and SoCs, enabling computer vision applications on a scalable and flexible platform. VisionWorks provides a thread-safe API and a framework for seamlessly adding user-defined primitives. The talk will give an overview of the VisionWorks toolkit, the OpenVX API and framework, the VisionWorks-plus modules (including the Structure From Motion and Object Tracker modules), and computer vision pipeline samples showing how the library API is integrated into a computer vision pipeline on Tegra platforms.
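
For orientation, here is a minimal sketch of the graph-based OpenVX C API that VisionWorks implements; this is not VisionWorks sample code, and the image size and node choice (a Gaussian blur followed by a median filter) are arbitrary placeholders.

#include <VX/vx.h>
#include <stdio.h>

int main(void)
{
    /* The context owns all OpenVX objects. */
    vx_context context = vxCreateContext();

    /* Placeholder VGA images; real code would import camera frames. */
    vx_image input   = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
    vx_image blurred = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
    vx_image output  = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);

    /* Build a two-node processing graph: Gaussian blur -> median filter. */
    vx_graph graph = vxCreateGraph(context);
    vxGaussian3x3Node(graph, input, blurred);
    vxMedian3x3Node(graph, blurred, output);

    /* Verification lets the runtime validate and optimize the whole graph
       before it is executed (typically many times per second). */
    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);
    else
        printf("graph verification failed\n");

    vxReleaseGraph(&graph);
    vxReleaseImage(&input);
    vxReleaseImage(&blurred);
    vxReleaseImage(&output);
    vxReleaseContext(&context);
    return 0;
}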

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Embedded; Self-Driving Cars & Automotive

Day: Monday, 04/04
Time: 13:00 - 13:50
Location: Room LL20A

S6227 - Distributed Deep Learning at Scale

Soumith Chintala Research Engineer, Facebook AI Research
Soumith Chintala is a research engineer at Facebook AI Research. Prior to joining Facebook in August 2014, Soumith worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. In the past, Soumith has worked on state-of-the-art deep learning models for pedestrian detection, natural-image OCR, and depth images, among others, driving his research heavily with CUDA and multiple GPUs.

This talk provides a brief overview of deep learning research and the challenges involved in scaling it up across multi-GPU and multi-machine clusters while keeping the software flexible enough for research settings. We discuss the clear trends emerging in deep learning from an HPC perspective and give several examples from our work at Facebook AI Research.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 13:00 - 13:50
Location: Room 210D

S6391 - Bootstrapping Labels for One Hundred Million Images

Jimmy Whitaker Software Engineer, Digital Reasoning
Jimmy Whitaker is a software engineer at Digital Reasoning, a cognitive computing company focused on enabling humans to leverage big data to make decisions, where he has been pioneering computer vision efforts. Prior to joining Digital Reasoning, Jimmy completed his M.S. in computer science at the University of Oxford, where he achieved a distinction for his research in the field of steganalysis -- detecting hidden information in images.

We'll describe how we created an iterative labeling process to perform data science on 100 million+ images using a GPU-powered workflow with convolutional neural networks. Recently, deep learning techniques such as deep convolutional neural networks (ConvNets) have achieved state-of-the-art results in many computer vision tasks. The data-driven nature of deep learning normally requires a large number of labeled examples to achieve high accuracies. Unfortunately, much of the publicly available data on the web is not labeled, thus requiring human labelers for large datasets or unsupervised machine learning techniques. Our labeling process allows weak labels and a small number of strong labels to be used to create classifiers for very large datasets.

Level: Beginner
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Big Data Analytics; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210G

S6422 - Enhancing Visual Realism of Mixed Reality Applications with Stereo Vision

Edwin Azzam CTO, Stereolabs
Edwin Azzam co-founded Stereolabs in 2010. As Stereolabs' Chief Technical Officer, Edwin is responsible for leading the company's product development and technology strategy in stereo vision. Prior to founding Stereolabs, Edwin was a project manager at Astrium Space Transportation, Paris. Edwin holds a Master's degree in Optics & Image Processing from Institut d'Optique, France, as well as a Master's degree in Management from ESSEC Business School. He is a PhD supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise to assess national research projects in the field of computer vision and 3D image processing.

Discover how stereo vision and 3D depth sensing on GPU enable the development of mixed reality applications, which merge virtual information into a live 3D video stream of the real world. We will discuss the various stages of a real-time mixed reality processing pipeline, and how NVIDIA's GPU acceleration is integral to every step of the pipeline. We will also show demonstrations of how stereo depth sensing can be used to create 3D virtual playgrounds and real-time augmentation of the environment.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Virtual Reality & Augmented Reality; Video & Image Processing; Embedded

Day: Tuesday, 04/05
Time: 13:00 - 13:25
Location: Room 210F

S6362 - CNN Based Object Detection in Large Video Images

Tao Wang Chief Scientist, iQIYI ltd. Corp.
Dr. Tao Wang is chief scientist of iQIYI ltd. Corp., the biggest video sharing platform in China, where he works on computer vision and multimedia software applications. He received his Ph.D. in computer science from Tsinghua University in 2003. Tao then worked as a senior researcher in Intel Labs China. He has published more than 60 papers in IJCV, CVPR, CIVR, ICME, and ACM multimedia.

Object detection in real video images is more challenging than in curated image datasets. We'll present CNN-based object detection research on iQIYI's large collection of images and videos, which is used for content-based ads recommendation.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 210G

S6563 - Where Tegra Meets Titan: Asymmetric Computer Vision for Smartphones and Robotics

Tom Drummond Professor, Monash University
Tom Drummond has been a principal investigator on several EU Framework projects and is a chief investigator in the ARC Centre of Excellence for Robotic Vision. Tom studied mathematics for his B.A. at the University of Cambridge. In 1989, he emigrated to Australia and worked for CSIRO in Melbourne for four years before moving to Perth for his Ph.D. in computer science at Curtin University. In 1998, he returned to Cambridge as a postdoctoral research associate; he was later appointed a university lecturer and subsequently promoted to senior university lecturer. In 2010, he returned to Melbourne and took up a professorship at Monash University.

This presentation will argue that battery life and thermal limits will prevent small mobile devices from implementing the next generation of visual processing algorithms without external assistance from high performance computing. Several innovative methods of distributing these problems between lightweight and high-powered nodes will be explored for a number of visual processing applications relevant to smartphones and robotics. We'll illustrate how these problems can be mapped onto the thread model of GPUs and will present a couple of CUDA tricks used to maximize efficiency.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Virtual Reality & Augmented Reality

Day: Tuesday, 04/05
Time: 13:30 - 13:55
Location: Room 210F

S6384 - NVIDIA CUDA® for Mobile

Yogesh Kini Manager, CUDA System Software, NVIDIA
Yogesh Kini manages the Tegra CUDA driver team at NVIDIA. For the last four years, he has been working on enabling GPU compute software on different Tegra platforms. His team is responsible for the CUDA API and system software on various embedded, automotive, and mobile platforms based on Tegra SoCs. He holds a B.S. from Manipal Institute of Technology, India.

This session is about a few important mobile use cases that can be accelerated using CUDA, including image processing, camera output post-processing, and real-time texture compression in graphics applications. Attendees will learn that: [1] Tegra has a unified memory architecture that applications can use to reduce total memory usage and power consumption; the use case presented demonstrates effective use of UVM on Tegra. [2] CUDA provides a means to take inputs from a camera via EGLImage and EGLStreams interoperability, which can be used to post-process camera images with CUDA; the example presented demonstrates use of these CUDA APIs. [3] CUDA provides an API for interoperability with OpenGL ES; texture compression in a graphics application is demonstrated as an example.
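
As a point of reference rather than code from the session, here is a minimal CUDA sketch of the first item: a single managed allocation on Tegra's unified memory architecture that the CPU fills and a GPU kernel then post-processes, with no explicit cudaMemcpy. The kernel and image size are illustrative placeholders.

#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: invert an 8-bit grayscale image in place.
__global__ void invert(unsigned char* img, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = 255 - img[i];
}

int main()
{
    const int width = 1920, height = 1080, n = width * height;
    unsigned char* img = nullptr;

    // One allocation visible to both CPU and GPU on Tegra's unified
    // memory architecture -- no explicit cudaMemcpy is needed.
    cudaMallocManaged(&img, n);

    // CPU-side stand-in for camera output: fill the buffer on the host.
    for (int i = 0; i < n; ++i) img[i] = (unsigned char)(i & 0xFF);

    // GPU post-processing on the very same pointer.
    invert<<<(n + 255) / 256, 256>>>(img, n);
    cudaDeviceSynchronize();

    printf("first pixel after inversion: %d\n", img[0]);
    cudaFree(img);
    return 0;
}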

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Tools & Libraries

Day: Tuesday, 04/05
Time: 14:00 - 14:25
Location: Room 210F

S6116 - Towards Building a GPU Cloud Service for Human-Level Quality Image Understanding

Xiaodong He Senior Researcher, Microsoft
Xiaodong He is a senior researcher in the Deep Learning Technology Center, Microsoft Research, Redmond, Wash. He is also an affiliate full professor in the Department of Electrical Engineering at the University of Washington, Seattle, serving in the Ph.D. reading committee. His research interests include deep learning, speech, natural language, vision, information retrieval, and knowledge representation and management. He has published in IEEE TASLP, IEEE SPM, Proc. IEEE, ICASSP, ACL, EMNLP, NAACL, CVPR, SIGIR, WWW, CIKM, ICLR, NIPS, and other venues. He has received several awards, including the Outstanding Paper Award of ACL 2015. He and colleagues developed the MSR-NRC-SRI entry and the MSR entry that won No. 1 in the 2008 NIST Machine Translation Evaluation and the 2011 IWSLT Evaluation (Chinese-to-English), respectively, and the MSR image captioning system that won first prize at the MS COCO Captioning Challenge 2015. He has held editorial positions on several IEEE Journals and has served on the organizing and program committees of major speech and language processing conferences. He is a senior member of IEEE and a member of ACL.
Kenneth Tran Senior Research Engineer, Microsoft Research
Kenneth Tran is a senior research engineer in the Deep Learning Technology Center, Microsoft Research. Previously, he was a machine learning scientist in the Cloud Machine Learning group at Microsoft, building a machine learning platform that now powers Azure ML. His research interests include machine learning, optimization, and distributed computing.

Learn the latest deep learning techniques for semantic modeling of images, text, and knowledge graphs, all empowered by GPU computing and cloud services. We'll demonstrate how to build deep semantic models across different modalities, and how to apply these models to reach the best results in information retrieval, question answering, and image captioning benchmarks. In particular, facilitated by the recently announced Microsoft Azure GPU compute instances, we'll show how to use GPU clusters to extend the MSR image captioning system, which won first prize in the COCO Captioning Challenge at CVPR 2015, and to build a publicly available, large-scale, deep image understanding service that achieves state-of-the-art performance in generating novel captions for images.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Data Center & Cloud Computing

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 210D

S6126 - GPU-Enabled Pavement Defect Detection: Looking for Cracks with Thousands of Eyes

Kristina Doycheva Research Assistant, Ruhr-University Bochum, Germany
Kristina Doycheva is pursuing her Ph.D. at the Ruhr-University and is working as a research assistant at the Chair of Computing in Engineering, Department of Civil Engineering. She received her M.S. in applied informatics at the Ruhr-University Bochum, Germany in 2013. Her research interests include high-performance image processing and machine learning. She is now working on a project related to pavement defect detection.

Learn how to use GPUs for pavement defect detection. In recent years, a variety of vision-based methods for pavement defect detection have been proposed. However, these methods mostly process the images offline, which results in a large amount of data being persistently stored. To enable real-time pavement distress detection, image pre-processing steps, such as correcting nonuniform background illumination and removing noise, as well as pavement defect detection methods based on texture features and the wavelet transform, were implemented on GPUs. The achieved speed-up of the GPU implementation compared to a sequential implementation is approximately 10,000. The execution time allows for processing more than 600 images per second.
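
As a hedged illustration of one pre-processing step named above (this is not the authors' implementation), the CUDA kernel below flattens nonuniform background illumination by subtracting a heavily blurred copy of the image; producing that blurred background (for example, with a separable box or Gaussian filter) is left out, and the buffer sizes and test data are placeholders.

#include <cuda_runtime.h>
#include <cstdio>

// Remove low-frequency illumination: out = clamp(in - background + 128).
// 'background' is assumed to be a heavily blurred copy of 'in' (e.g., from a
// separable box or Gaussian filter, not shown here).
__global__ void flatten_illumination(const unsigned char* in,
                                     const unsigned char* background,
                                     unsigned char* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int v = (int)in[i] - (int)background[i] + 128;   // re-center around mid-gray
    out[i] = (unsigned char)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

int main()
{
    const int n = 1024 * 1024;                       // placeholder image size
    unsigned char *in, *bg, *out;
    cudaMallocManaged(&in, n);
    cudaMallocManaged(&bg, n);
    cudaMallocManaged(&out, n);
    for (int i = 0; i < n; ++i) { in[i] = (unsigned char)(100 + (i % 40)); bg[i] = 100; }

    flatten_illumination<<<(n + 255) / 256, 256>>>(in, bg, out, n);
    cudaDeviceSynchronize();
    printf("sample output pixel: %d\n", out[0]);

    cudaFree(in); cudaFree(bg); cudaFree(out);
    return 0;
}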

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21E

S6276 - Autonomous Robotic 3D Printing: Real-Time Path Planning with Computer Vision

Daghan Cam Architect, University College London
Daghan Cam is an architect and researcher based in London. He is the director of Daghan Cam Limited, which operates between architecture, technology, and research. He runs a post-graduate research cluster at UCL's Bartlett School of Architecture with Alisa Andrasek and Andy Lomas. He also leads research on GPU computing and he is a co-principal investigator of UCL as an NVIDIA GPU Research Center. Previously he worked with Zaha Hadid Architects. He taught workshops and gave lectures at AA Visiting Schools in Istanbul, Athens, London, and at Ecole d'architecture in Paris. His work on computational design and large-scale robotic fabrication has been widely exhibited, recently in San Francisco and in Milan Design Week 2015.

Teach your 3D printing robot how to adapt to unpredictable material behavior by using deep learning algorithms. We'll introduce a path planning strategy for iteratively correcting robot target positions in a 3D printing process by using an NVIDIA Jetson card attached to an industrial robotic arm. Initial path generation, visual tracking of material behavior in real-time, evaluation and recomputation of robot trajectories will be explained by code examples and video recordings from the fabrication process.

Level: Beginner
Type: Talk
Tags: Product & Building Design; Robotics & Autonomous Machines; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room LL21A

S6389 - Embedded Deep Learning for Object Detection and Classification in Aerial Images

Jon Barker Solution Architect, NVIDIA
Jon Barker is a solution architect with NVIDIA, helping customers and partners develop applications of GPU-accelerated machine learning and data analytics to solve defense and national security problems. He is particularly focused on applications of the rapidly developing field of deep learning. Prior to joining NVIDIA, Jon spent almost a decade as a government research scientist within the U.K. Ministry of Defence and the U.S. Department of Defense R&D communities. While in government service, he led R&D projects in sensor data fusion, big data analytics, and machine learning for multi-modal sensor data to support military situational awareness and aid decision making. He has a Ph.D. and B.S. in pure mathematics from the University of Southampton, U.K.

Learn how deep learning can be applied to object detection, localization, and tracking problems in remote sensing. We'll present a technical case study showing how a convolutional neural network (CNN) trained in the data center using DIGITS can be deployed to an embedded GPU system to carry out low-latency object detection, classification, and tracking in high-resolution aerial imagery. We'll compare different approaches to detection and localization tasks. An example will be given of integrating the Caffe deep learning framework for GPU-accelerated CNN inference with an OpenCV-based image and video processing pipeline. We'll also show how transfer learning can be accomplished using DIGITS to train a CNN when only a small task specific training dataset is available.
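
As a rough sketch of the kind of integration described (not the presenters' code), the following runs a Caffe network trained in DIGITS inside an OpenCV capture loop on the GPU. The prototxt, caffemodel, and video file names are placeholders, and the input-blob copy assumes a single-channel network input.

#include <caffe/caffe.hpp>
#include <opencv2/opencv.hpp>
#include <cstdio>
#include <cstring>

int main()
{
    caffe::Caffe::set_mode(caffe::Caffe::GPU);

    // Network definition and weights exported from DIGITS (placeholder paths).
    caffe::Net<float> net("deploy.prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("snapshot.caffemodel");

    caffe::Blob<float>* input = net.input_blobs()[0];   // 1 x C x H x W
    const int H = input->height(), W = input->width();

    cv::VideoCapture cap("aerial.mp4");                 // placeholder video
    cv::Mat frame;
    while (cap.read(frame)) {
        cv::Mat gray, resized;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::resize(gray, resized, cv::Size(W, H));
        resized.convertTo(resized, CV_32F);

        // Copy pixels into the network's input blob (single-channel case).
        std::memcpy(input->mutable_cpu_data(), resized.ptr<float>(),
                    sizeof(float) * H * W);

        net.Forward();
        const caffe::Blob<float>* prob = net.output_blobs()[0];
        std::printf("first class score: %f\n", prob->cpu_data()[0]);
    }
    return 0;
}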

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Robotics & Autonomous Machines; Aerospace & Defense

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Room 210G

S6397 - Real-Time Non-Rigid Image Registration Engine

Randall Miles Senior Research Scientist, Propulsion Science and Technology
Dr. Randall Miles is a physicist, algorithm developer, and senior research scientist at Propulsion Science and Technology. He is lead designer and developer for model database development activities, and key contributor on a variety of projects, including quantum chemistry calculations and radar cross section modeling of CFD fields.

Non-rigid image registration, i.e., morphing, allows a smaller footprint of seed images to be used to create a smooth and continuously changing series of images. We'll present a new high-speed toolkit for image morphing implemented using NVIDIA GPU technology. Time improvements of ~80% were seen through implementing a succession of CUDA optimizations guided by the Nsight profiler results. Tests were conducted using available simulated rocket plume images to calculate run times and create performance measures.

Level: All
Type: Talk
Tags: Aerospace & Defense; Video & Image Processing; Performance Optimization; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 14:30 - 14:55
Location: Marriott Salon 2

S6115 - Real-Time Free Viewpoint TV System Based on a New Panorama Stitching Framework

Pierre Boulanger Professor, University of Alberta
Pierre Boulanger has more than 30 years of experience in 3D computer vision, rapid product development, and the applications of virtual reality systems to medicine and industrial manufacturing. He worked for 18 years at the National Research Council of Canada as a senior research officer, where his primary research interests were 3D computer vision, rapid product development, and virtualized reality systems. He now has a double appointment as a professor in the University of Alberta's Department of Computing Science and its Department of Radiology and Diagnostic Imaging. He is currently the director of the Advanced Man Machine Interface Laboratory (AMMI) as well as the scientific director of the SERVIER Virtual Cardiac Centre. In 2013, Pierre was awarded the Cisco chair in healthcare solutions, a 10-year investment by Cisco Systems in the development of new IT technologies for healthcare in Canada. His main research topics are the development of new techniques for telemedicine, patient-specific modeling using sensor fusion, and the application of tele-presence technologies to medical training, simulation, and collaborative diagnostics.

With the advance of GPU and vision technologies, free viewpoint TV (FTV) will become a reality in the near future. Traditional videos such as those shown on TV or viewed on the Internet are passive and two-dimensional in nature. Viewers can only passively observe the events captured by a cameraman and have no ability to actively change their viewpoint once the video is recorded. On the contrary, FTV will allow the viewer to select an arbitrary viewpoint and thus enjoy a feeling of immersion into events such as an Olympic competition or a popular theatre show. In this presentation, we will describe a FTV system based on creating a real-time panorama from multiple pixel synchronized cameras using GPU and how to transmit this information using normal IPTV technologies.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 210F

S6570 - Deep Learning in Real-World Large-Scale Image Search and Recognition

Xian-Sheng Hua Senior Director/Researcher, Alibaba Group
Xian-Sheng Hua became a researcher and senior director of Alibaba Group in April 2015, leading the multimedia technology team in the Search Division. Before that, he was a senior researcher of Microsoft Research Redmond since 2013, working on web-scale image and video understanding and search, as well as related applications. He was a principal research and development lead in Multimedia Search for the Microsoft search engine, Bing, until 2011, where he led a team that designed and delivered leading-edge media understanding, indexing, and searching features. He joined Microsoft Research Asia in 2001 as a researcher. Since then, his research interests have been in the areas of multimedia search, advertising, understanding, and mining, as well as pattern recognition and machine learning. He has authored or co-authored more than 250 research papers in these areas and has filed more than 90 patents. He received his B.S. in 1996 and Ph.D. in applied mathematics in 2001 from Peking University, Beijing.

We'll introduce how deep learning helps realize a real-world visual search and recognition system. This topic has been studied for decades and became very hot again in recent years mainly due to the rapid development of deep learning and large-scale search techniques. Many visual search and recognition preliminary products are available to the public. However, have we solved all the big technical and non-technical challenges? Has ImageNet solved the recognition problem? What are the key factors of realizing a real-world visual recognition/search system? Are semantic gaps still there? Which direction is visual search/recognition going toward? What is still missing? We'll discuss all these based on a real-world, deep learning-based visual search and recognition.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room 210G

S6741 - Nervve High Speed Video Search, Powered by NVIDIA

Thomas Slowe CEO and Co-Founder, Nervve
Thomas Slowe identifies opportunities of scale in the market and designs game changing machine learning and interactive technologies. As CEO and Co-Founder of Nervve, Tom uses his technical and business development experience to position Nervve to change the way video and imagery can be used for intelligence, insight and action. Tom has over 19 years of experience in the area of machine learning as applied to video, imagery, and other two dimensional raster data. Prior to Nervve, he held a number of technical and executive positions where he was responsible for providing products and services to Fortune 500 companies in Retail, Advertising, Broadcast, Social, the US Intelligence Community and Department of Defense. Tom received his BSEE from Rutgers University and MS from MIT Media Laboratory.

Nervve is changing the way the world searches video by enabling industry leaders to target, measure, and monetize their media through visual search and analysis. We currently work with the Federal Government, Media and Entertainment, and Sports Media markets.

Level: Beginner
Type: Talk
Tags: Media & Entertainment; Big Data Analytics; Computer Vision & Machine Vision; Intelligent Video Analytics (IVA)

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL21C

S6775 - The Recent Advances in GPU-Based Intelligent Video Analysis

Hai Tao CEO and Founder, Beijing Vion Technology
Hai Tao is a founder and the CEO of Beijing Vion Technology, Inc., a company focused on developing world-leading computer vision and artificial intelligence algorithms and products, with various applications in intelligent transportation systems (ITS), public safety, and business intelligence. Hai holds more than 10 US patents and has published more than 130 papers in the field of image processing and computer vision. He received his B.S. and M.S. degrees in Automation from Tsinghua University in 1991 and 1993, respectively. He received his Ph.D. in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1999.

We'll demonstrate our recent progress in applying GPUs in several key computer vision sub-fields including video-based face recognition, vehicle attribute analysis, urban management event detection, and high density crowd counting. These algorithms combine the traditional feature-plus-classifier approach with the recent advances in deep learning to make high performance computer vision systems practical and enable products in several vertical markets including intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. In addition, we'll demonstrate a single-GPU video analytic box that can process up to 8 channels of analog or 2 channels of 1080p HD video inputs. A prototype 40-GPU server system capable of processing up to 80 channels of 1080p video inputs will also be introduced during this presentation.

Level: Beginner
Type: Talk
Tags: Intelligent Video Analytics (IVA); Algorithms; Computer Vision & Machine Vision

Day: Tuesday, 04/05
Time: 15:00 - 15:25
Location: Room LL20D

S6467 - Training My Car to See: Using Virtual Worlds

Antonio M. López Principal Investigator & Associate Professor, Computer Vision Center & Universitat Autònoma de Barcelona
Antonio Lopez is the head of the Advanced Driver Assistance Systems (ADAS) Group of the Computer Vision Center, and associate professor of the Computer Science Department, both at the Universitat Autonoma de Barcelona (UAB). In 1996, Antonio participated in the foundation of the Computer Vision Center at the UAB, where he has held different institutional responsibilities. Antonio has been principal investigator of numerous public and industrial projects, and is a co-author of a large number of top journal and conference papers. His research interests are vision-based object detection, semantic segmentation, domain adaptation, and computer graphics for training visual models. These topics are seen as key technologies to be applied in ADAS and autonomous driving.

Learn how realistic virtual worlds can be used to train vision-based classifiers that operate in the real world, i.e., avoiding the cumbersome task of collecting ground truth by manual annotation. Many vision-based applications rely on classifiers trained with annotated data. We avoid manual annotation by using realistic computer graphics (e.g. video games). However, the accuracy of the classifiers drops because virtual (training) and real (operation) worlds are different. We overcome the problem using domain adaptation (DA) techniques. In the context of vision-based driver assistance and autonomous driving, we present our DA experiences using classifiers based on both handcrafted features and CNNs. We show how GPUs are used in all the stages of our training and operation paradigm.

Level: Beginner
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room LL21E

S6534 - Exciting Practical Applications of Scalable Deep Learning and Image Recognition in the Cloud

Georgi Kadrev CEO, Imagga Technologies
Georgi Kadrev is co-founder and CEO of Imagga Technologies (http://imagga.com), one of the companies pioneering the image-recognition-as-a-service model, offering highly scalable cloud API to businesses and developers. Georgi graduated with an M.S. in technology entrepreneurship from Sofia University in 2009 and is currently an assistant professor and Ph.D. student in the Software Engineering department, specializing in practical deep-learning for image recognition. While leading Imagga, Georgi has won multiple technology, innovation, and entrepreneurship awards, most recently the best company award in the "Technology For The Big Players" track at South Summit, Madrid, October 2015.

We'll demonstrate how scalable image recognition based on deep-learning can greatly contribute to business cases varying from advertising and user profiling to content management and cloud services. We'll also discuss the technical challenges of providing scalable image recognition capable of handling huge loads of images, instant feedback loops, and customer-specific recognition tasks, and how we've addressed them using GPUs in the cloud. Ultimately, you'll benefit from our experience handling 80+ different practical cases and dive deep into the most exciting ones.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 15:30 - 15:55
Location: Room 210F

S6279 - Visual Feature Learning from Web Images and Click Log

Chen Fang Research Scientist, Adobe Systems
Chen Fang is a research scientist at Adobe Research. His interests include image recognition, image retrieval, deep learning, and large-scale machine learning. He obtained his Ph.D. from the computer science department at Dartmouth College.

Visual feature learning is a fundamental problem in computer vision. Existing solutions rely on deep learning and large-scale labeled datasets. However, it is often labor intensive and time consuming to collect such datasets. On the other hand, the internet offers raw visual data, i.e., images and videos, at massive scale and the associated user behavior data, e.g., click logs. We'll present a novel framework to learn visual features from such data, which completely forgoes the need of labeled datasets. We apply the proposed framework and its variants on two kinds of web data: images on a social website and their view history, and search log of a commercial image search engine. High-quality visual features are learned in both cases.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: Tuesday, 04/05
Time: 16:00 - 16:25
Location: Room 210F

S6745 - VQA: Visual Question Answering

Aishwarya Agrawal Ph.D. Student, Virginia Tech
Aishwarya Agrawal is a second year Ph.D. student at the Bradley Department of Electrical and Computer Engineering at Virginia Tech. She is a member of the Virginia Tech Machine Learning and Perception Lab and is advised by Dhruv Batra. Her research interests lie at the intersection of machine learning, computer vision and natural language processing with a focus on multi-modal Artificial Intelligence, e.g. Visual Question Answering (VQA).

We'll describe the task of free-form and open-ended Visual Question Answering (VQA). Given an image and a natural language question about the image (e.g., "What kind of store is this?", "How many people are waiting in the queue?", "Is it safe to cross the street?"), the machine's task is to automatically produce an accurate natural language answer ("bakery", "5", "Yes"). Answering any possible question about an image is one of the 'holy grails' of AI requiring integration of vision, language, and reasoning. We have collected and recently released a dataset containing >250,000 images, >750,000 questions, and ~10 Million answers (www.visualqa.org). We are also running VQA challenge (www.visualqa.org/challenge.html) which includes both an open-ended answering task and a multiple-choice task.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Big Data Analytics

Day: Tuesday, 04/05
Time: 16:30 - 16:55
Location: Room 210F

S6107 - Robust Model-Based 3D Head Pose Estimation

Shalini Gupta Senior Research Scientist, NVIDIA
Shalini Gupta has been a senior research scientist in the Mobile Visual Computing group of NVIDIA Research since April 2013. From 2011 to 2013, she worked as a senior mobile computer vision engineer at NVIDIA, where she designed and productized computer vision and computational photography solutions for mobile platforms and GPUs. She worked as an imaging and architecture scientist at Texas Instruments, from 2008 to 2010, where she designed algorithms for the image signal processing pipeline of mobile phones, at AT&T Laboratories on their IPTV project, and at Advanced Digital Imaging Research, LLC, where she designed algorithms for 3D human face recognition. Shalini received her M.S. and Ph.D. in electrical and computer engineering from the University of Texas at Austin in 2004 and 2008, respectively. She received a B.S. in electronics and electrical communication engineering from Punjab Engineering College, India, in 2002. She is a recipient of the Summer Research Fellowship 2001, awarded by the Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India. Her primary research interests are image/signal processing, computer vision, and machine learning, and their application to scene understanding and interpretation.

Depth cameras have become cheap and ubiquitous. We introduce a computer vision algorithm for accurate, three-dimensional (3D) head pose (rotation and translation) estimation, which runs in near real time in CUDA. It works with different commodity depth sensors with minimal adaptation, handles large head rotations and occlusions gracefully, and does not require cumbersome subject initialization. Our algorithm results in an angular error of 2 degrees and a translational error of 6 mm. It outperforms all seven competing methods on a benchmark data set. Accurate head pose estimation is an important fundamental problem in computer vision. It is a prerequisite for gaze estimation, facial animation capture, face recognition, driver monitoring, and head-coupled, 3D perspective displays.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Intelligent Video Analytics (IVA)

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room 210F

S6108 - High-Performance Pedestrian Detection on NVIDIA Tegra®

Max Lv GPU Architect, NVIDIA
Max Lv is a GPU architect in the Compute Architecture team at NVIDIA, focusing on computer vision applications on GPU and mobile GPU architecture. Before joining NVIDIA, he was a research assistant at the Parallel Processing Institute in Fudan University.

We'll present an innovative approach to efficiently mapping a popular pedestrian detection algorithm (HOG) onto an NVIDIA Tegra GPU. Attendees will learn new techniques to optimize a real computer vision application on Tegra X1, as well as several new architecture features of the Tegra X1 GPU.
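
For orientation only (this is not the presenter's Tegra-optimized implementation), the sketch below runs the standard HOG-plus-linear-SVM pedestrian detector from OpenCV's CUDA module over a video; the input file name is a placeholder.

#include <opencv2/opencv.hpp>
#include <opencv2/cudaobjdetect.hpp>

int main()
{
    cv::VideoCapture cap("driving.mp4");               // placeholder input
    cv::Ptr<cv::cuda::HOG> hog = cv::cuda::HOG::create();
    hog->setSVMDetector(hog->getDefaultPeopleDetector());

    cv::Mat frame, gray;
    cv::cuda::GpuMat d_gray;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        d_gray.upload(gray);

        std::vector<cv::Rect> pedestrians;
        hog->detectMultiScale(d_gray, pedestrians);     // HOG + linear SVM on the GPU

        for (const cv::Rect& r : pedestrians)
            cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
        cv::imshow("pedestrians", frame);
        if (cv::waitKey(1) == 27) break;                // Esc to quit
    }
    return 0;
}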

Level: Advanced
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 09:00 - 09:25
Location: Room LL21E

S6130 - 3D Deep Learning

Jianxiong Xiao Assistant Professor, Princeton University
Jianxiong Xiao is an assistant professor in the Department of Computer Science at Princeton University and the director of the Princeton Vision Group. He received his Ph.D. from the Computer Science and Artificial Intelligence Laboratory (CSAIL) at Massachusetts Institute of Technology (MIT). Jianxiong's research interests are in computer vision. He has been motivated by the goal of building computer systems that automatically understand visual scenes, both inferring the semantics and extracting 3D structure. Jianxiong focuses on 3D deep learning, RGB-D recognition and reconstruction, place-centric 3D context modeling, graphics for vision (synthesis for analysis), deep learning for autonomous driving, large-scale crowd-sourcing, and petascale big data. His work has received the Best Student Paper Award at the European Conference on Computer Vision (ECCV) in 2012 and Google Research Best Papers Award for 2012. Jianxiong was awarded the Google U.S./Canada Fellowship in Computer Vision in 2012, MIT CSW Best Research Award in 2011, and two Google Research Awards in 2014 and in 2015.

We'll discuss some of our research projects on 3D deep learning in computer vision, including our work using 3D convolutional neural networks on GPUs to learn 3D descriptors for point features, to model 3D shapes, and to parse 3D scenes. Finally, we'll talk about Marvin, a deep learning software framework for N-dimensional data that we developed for NVIDIA GPUs, which could impact other fields such as neuroscience, biology, medical imaging, and healthcare.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Deep Learning & Artificial Intelligence

Day: Wednesday, 04/06
Time: 09:30 - 09:55
Location: Room 210F

S6272 - Deep Learning Algorithms for Recognising the Features of Facial Ageing

Konstantin Kiselev Data Scientist, Facebrain, Inc
Konstantin Kiselev conducts research at the scientific startup Facebrain (co-founded by Dr. Alex Zhavoronkov) in the field of computer vision and deep learning, and is lead data scientist on a big data project for TechnoServ, a top-5 Russian IT company, and Beeline, a top-3 Russian mobile operator. Konstantin holds an M.S. in theoretical physics from Lomonosov Moscow State University. He has broad experience in software development of high-load systems and extensive knowledge in machine learning and big data. From 2014 to 2015, he was development lead for large IT systems for the Russian government at LANIT, a leading Russian software company. He received additional education in the big data and machine learning fields, took first place in the Microsoft Machine Learning Hackathon (June 2015), and participated in the deep learning team competition organized by MIPT (deephack.me, mipt.ru/en/, July 2015).

We'll discuss DNN applications for determining key facial skin biomarkers from a face photo. While there are many other factors that make it possible to determine human age with high accuracy, the most obvious one is how your face looks. Tracking facial wrinkles enables us to follow not only the skin ageing process as such, but also the results and efficiency of the treatment used. By following the dynamics of wrinkle appearance, it is possible to find out which treatment is more suitable for a particular face or skin type and hence provide recommendations.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 210G

S6653 - Bilateral and Trilateral Filters in Stereo Vision

Ryan Beethe Student, Colorado School of Mines
Ryan Beethe is a student at the Colorado School of Mines who loves robots. His recent research focus has been embedded stereo vision, and he was last year's NVIDIA CUDA Vision Challenge winner for his work on GPU-accelerated semi-global block matching.

We'll explain how bilateral and trilateral filters work and how they are used in modern stereo vision algorithms. Bilateral filters are the basis of many of the fastest, most effective local algorithms available today. Local algorithms are especially desirable because they are easily parallelizable. Topics covered with OpenCV-based, GPU-accelerated examples will include stereo vision basics, motivations for applying bilateral filters, pre- and post-processing stereo images with bilateral filters, and the limitations of bilateral filters in stereo vision. Additional topics will include bilateral-based adaptive support weight (ASW) correspondence searching, trilateral filters, and trilateral-based ASW correspondence searching.
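
As a concrete, hedged illustration of one topic listed above, the sketch below uses OpenCV's CUDA stereo module to compute a block-matching disparity map and then refine it with a disparity bilateral filter, which smooths disparities while respecting intensity edges in the guide image. The file names, disparity range, and filter parameters are placeholders, not values from the talk.

#include <opencv2/opencv.hpp>
#include <opencv2/cudastereo.hpp>

int main()
{
    // Rectified grayscale stereo pair (placeholder file names).
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

    cv::cuda::GpuMat d_left(left), d_right(right), d_disp, d_refined;

    const int ndisp = 64;
    cv::Ptr<cv::cuda::StereoBM> bm = cv::cuda::createStereoBM(ndisp, 19);
    bm->compute(d_left, d_right, d_disp);                 // raw local block matching

    // Edge-preserving post-processing: smooth the disparity map while
    // respecting intensity edges in the guide (left) image.
    cv::Ptr<cv::cuda::DisparityBilateralFilter> dbf =
        cv::cuda::createDisparityBilateralFilter(ndisp, /*radius=*/5, /*iters=*/3);
    dbf->apply(d_disp, d_left, d_refined);

    cv::Mat refined;
    d_refined.download(refined);
    cv::imwrite("disparity_refined.png", refined);
    return 0;
}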

Level: Beginner
Type: Talk
Tags: Computer Vision & Machine Vision; Embedded; IoT

Day: Wednesday, 04/06
Time: 10:00 - 10:25
Location: Room 210F

S6305 - 3D Point Cloud Registration Using GPU-Accelerated Expectation Maximization

Benjamin Eckart Ph.D. Student, Carnegie Mellon University
Benjamin Eckart is a Ph.D. candidate with the Robotics Institute at Carnegie Mellon University and an NVIDIA Graduate Fellow. Ben's research focuses on the creation of parallel algorithms for 3D robotic perception. He is currently exploring ways to use many-core architectures such as the GPU to rapidly create compact models to facilitate and unify common low-level perceptive tasks like segmentation, registration, and classification. Ben holds an M.S. in robotics from Carnegie Mellon University, an M.S. in electrical engineering from Tennessee Tech University, as well as a B.S. in computer science and a B.S. in computer engineering.

We'll discuss how to use GPUs to accelerate a common 3D spatial processing application, point cloud registration. Registration, or finding the relative rigid transform between two point clouds, forms a core component of many 3D vision algorithms such as object matching and environment reconstruction. We use the GPU to accelerate this process using a parallelized form of the Expectation Maximization (EM) algorithm. Using this novel EM construction can both accelerate registration as well as provide a natural geometric segmentation of the data, two processes that we show to be highly interrelated at the kernel level when deployed on a GPU. Finally, we discuss how GPU-accelerated registration can be used in the larger context of real-time 3D perception.
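
The snippet below is not the presenters' GPU implementation; it is a toy, CPU-only Eigen sketch of the EM idea described above: an E-step that forms soft (responsibility-weighted) correspondences and an M-step that solves for the rigid transform in closed form via the standard Kabsch/SVD update. The point count, sigma, and iteration count are arbitrary placeholders.

#include <Eigen/Dense>
#include <cstdio>
#include <vector>
#include <cmath>

// One EM-style iteration of rigid registration: soft correspondences in the
// E-step, then a closed-form (Kabsch) update of rotation R and translation t.
void em_rigid_step(const std::vector<Eigen::Vector3d>& src,
                   const std::vector<Eigen::Vector3d>& dst,
                   Eigen::Matrix3d& R, Eigen::Vector3d& t, double sigma)
{
    const double inv2s2 = 1.0 / (2.0 * sigma * sigma);
    std::vector<Eigen::Vector3d> virt(src.size());

    // E-step: each transformed source point gets a responsibility-weighted
    // average of the destination points as its "virtual" correspondence.
    for (size_t i = 0; i < src.size(); ++i) {
        Eigen::Vector3d p = R * src[i] + t;
        Eigen::Vector3d acc = Eigen::Vector3d::Zero();
        double wsum = 0.0;
        for (const Eigen::Vector3d& q : dst) {
            double w = std::exp(-(p - q).squaredNorm() * inv2s2);
            acc += w * q;
            wsum += w;
        }
        virt[i] = (wsum > 0.0) ? Eigen::Vector3d(acc / wsum) : p;
    }

    // M-step: closed-form rigid transform between src and its virtual matches.
    Eigen::Vector3d cs = Eigen::Vector3d::Zero(), cv = Eigen::Vector3d::Zero();
    for (size_t i = 0; i < src.size(); ++i) { cs += src[i]; cv += virt[i]; }
    cs /= (double)src.size(); cv /= (double)src.size();

    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();
    for (size_t i = 0; i < src.size(); ++i)
        H += (src[i] - cs) * (virt[i] - cv).transpose();

    Eigen::JacobiSVD<Eigen::Matrix3d> svd(H, Eigen::ComputeFullU | Eigen::ComputeFullV);
    Eigen::Matrix3d V = svd.matrixV();
    if ((V * svd.matrixU().transpose()).determinant() < 0.0) V.col(2) *= -1.0;  // fix reflections
    R = V * svd.matrixU().transpose();
    t = cv - R * cs;
}

int main()
{
    // Synthetic example: try to recover a known rotation/translation.
    Eigen::Matrix3d Rtrue =
        Eigen::AngleAxisd(0.3, Eigen::Vector3d::UnitZ()).toRotationMatrix();
    Eigen::Vector3d ttrue(0.5, -0.2, 0.1);

    std::vector<Eigen::Vector3d> src, dst;
    for (int i = 0; i < 200; ++i) {
        Eigen::Vector3d p = Eigen::Vector3d::Random();
        src.push_back(p);
        dst.push_back(Rtrue * p + ttrue);
    }

    Eigen::Matrix3d R = Eigen::Matrix3d::Identity();
    Eigen::Vector3d t = Eigen::Vector3d::Zero();
    for (int it = 0; it < 20; ++it)
        em_rigid_step(src, dst, R, t, /*sigma=*/0.2);

    std::printf("estimated t: %.3f %.3f %.3f\n", t.x(), t.y(), t.z());
    return 0;
}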

Level: Advanced
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; IoT

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room LL20D

S6723 - Which Whale Is It, Anyway? Face Recognition for Right Whales Using Deep Learning

Robert Bogucki Chief Science Officer, deepsense.io
Robert Bogucki is Chief Science Officer at deepsense.io, where he manages the R&D team and focuses on deep learning. He is also a successful Kaggle competitor. When tackling real-life problems, he particularly enjoys leveraging algorithms and computational power instead of, or in addition to, domain knowledge. His motivation for working in the IT industry is to take theoretical ideas and concepts and put them to good use.

With fewer than 500 North Atlantic right whales left in the world's oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction. To engage the data science community, NOAA Fisheries organized a competition hosted on Kaggle.com. The challenge was to automate the right whale recognition process, currently a painstaking, lengthy, and manual one, using a dataset of aerial photographs of individual whales. In this session, I will outline the winning solution, which is based on deep learning and convolutional neural networks.

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 210G

S6742 - Real-Time Person Tracking on Jetson with OpenPTrack

Jeff Burke Assistant Dean, Technology and Innovation, UCLA School of Theater, Film and Television
Jeff Burke is Assistant Dean for Technology and Innovation at the UCLA School of Theater, Film and Television (UCLA TFT). He has produced, managed, programmed and designed experimental performances, short films, new genre art installations and new facility construction internationally for more than 15 years. Jeff has been a faculty member since 2001 and today, in addition to his role developing technology and innovation strategy at TFT, is Co-PI and application team lead for the Named Data Networking project, a multi-campus effort supported by the National Science Foundation (NSF) and an international 25-member consortium to develop a future Internet architecture. In 2004, Burke co-founded UCLA TFT's Center for Research in Engineering, Media and Performance (REMAP), a collaboration with the Henry Samueli School of Engineering and Applied Science, which combines research, artistic production and community engagement. At REMAP, Burke's research has been supported by the NSF and NEA, Intel, Cisco, Trust for Mutual Understanding and the MacArthur Foundation, among others. From 2006-2012, he was area lead for participatory sensing at the NSF Center for Embedded Networked Sensing, helping to define a new application arena for mobile devices. In 2014, Jeff received a three-year Google Focused Award on the "Future of Storytelling," for work that will explore the intersection of storytelling and coding through research and production of original, interdisciplinary digital media works at UCLA TFT.

We'll provide an overview of OpenPTrack, a GPU-enabled, open-source project that enables real-time position tracking of many people using networked 3D imagers, which is now available for the Jetson TK1/TX1 embedded platform. OpenPTrack specifically targets innovative applications in education, arts, and culture, where it aims to meet a need for real-time person tracking that is reliably scalable over large areas, realistically deployable, and low cost. We'll cover the basic technical approach, UCLA REMAP's experience from real-world multi-imager deployments, and the technology roadmap, using Jetson, that aims to bring occlusion-resistant, real-time person tracking into the mainstream of interactive design and experimentation.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Robotics & Autonomous Machines; Embedded

Day: Wednesday, 04/06
Time: 10:30 - 10:55
Location: Room 210F

S6121 - GPU-Accelerated Computer Vision for Multimedia, Post Production and Surveillance

Hannes Fassold Senior Researcher, JOANNEUM RESEARCH
Hannes Fassold works at Joanneum Research, where he is a senior researcher in the Audiovisual Media Group of DIGITAL -- the Institute for Information and Communication Technologies. His main research interests are algorithms for digital film restoration, content-based video quality analysis, and the efficient parallelization of these algorithms on the GPU. He received an M.S. in applied mathematics from Graz University of Technology in 2004. He has published several papers in these fields and is the principal investigator for the CUDA Research Center at DIGITAL, Joanneum Research.

Computer vision is at the core of many tools used in multimedia, post-production, and surveillance. We'll present some key computer vision algorithms for motion compensation, feature point extraction and tracking, SIFT descriptor extraction, and wavelet transform. We'll provide information about the significant speed-up we gained from porting these algorithms to the GPU and lessons learned from the process of porting. We'll give insight how these algorithms are used in several applications like real-time video quality analysis (detection of dropouts and noise level), brand visibility monitoring in broadcast content, film and video restoration (dust and dirt removal, noise reduction, etc.), and traffic monitoring for wrong-way driver detection.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21C

S6293 - DeepFont: Font Recognition and Similarity Based on Deep Learning

Hailin Jin Principal Scientist, Adobe
Dr. Hailin Jin has been a principal scientist with Adobe Research since 2004. Hailin received his B.S. in automation from Tsinghua University, Beijing, in 1998. He then received his M.S. and Ph.D. in electrical engineering from Washington University in Saint Louis in 2000 and 2003, respectively. In 2003, he was a postdoctoral researcher at the Computer Science Department with the University of California at Los Angeles.

Font is one of the core elements in design. In this talk, we will present two technologies: one for recognizing the font in an image and another for suggesting fonts based on visual similarity. Both technologies are built upon improvements to the state of the art in deep learning. Our recognition system is trained with millions of images on NVIDIA GPUs. It is able to recognize over 7,500 fonts, achieves an accuracy higher than 80% (top-5), and produces a good font similarity measure for font selection and suggestion. The technologies presented are the foundation of the new font similarity feature in Adobe Photoshop.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room 210G

S6301 - Driver Face Analytics & Emotion Recognition Using Deep Learning

Modar Alaoui CEO, Eyeris
Modar Alaoui is a tech entrepreneur and expert in artificially intelligent vision technologies, deep learning, and ambient intelligence. Modar is founder and CEO of Eyeris, the worldwide leader in deep learning-based emotion recognition software. The company's flagship product, EmoVu, reads facial micro-expressions in real time and uses convolutional neural networks as a deep learning architecture to train and deploy its algorithm into a myriad of today's commercial applications. Modar combines a decade of experience between human-machine interaction and audience behavioral measurement. He is a frequent speaker and keynoter on "ambient intelligence" as the next frontier in AI, a winner of several technology and innovation awards, and has been featured in many major publications for his work.

We'll introduce you to ultra-lightweight vision software that reads facial micro-expressions in real time for use in driver monitoring systems in the next generation of vehicles. Using deep learning-based convolutional neural networks (CNNs) powered by GPUs, vision algorithms for embedded systems can now allow vehicles to constantly monitor drivers' inattention, cognitive awareness, and emotional distraction through a number of face analytics and emotion recognition technologies, in a thirtieth of a second. We'll also reveal the five most common applications of such technology in the automotive space, ranging from invisible reactive support systems to semi-autonomous driving. We'll also present a brief, highly rated live demo on stage toward the end of the session.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Intelligent Video Analytics (IVA)

Day: Wednesday, 04/06
Time: 14:00 - 14:25
Location: Room LL21E

S6153 - Fast Non-Rigid Registration for Mobile High-Dynamic Range Photography

Orazio Gallo Senior Research Scientist, NVIDIA
Orazio Gallo earned an M.S. degree in Biomedical Engineering from Politecnico di Milano (Italy). He then joined the Smith-Kettlewell Eye Research Institute, where he developed a novel bio-imaging technique capable of recording micrometric deformations of soft tissues. Subsequently, he joined the University of California at Santa Cruz, where he received a Ph.D. in Computer Engineering in 2011. During his studies in Santa Cruz, Orazio also interned at Canesta, Inc. (since acquired by Microsoft) and at the Nokia Research Center in Palo Alto. In September 2011, Orazio joined NVIDIA Research, where he currently works in the Mobile Visual Computing team. His interests span several areas of computer vision and computational photography. Orazio regularly serves on the program committees of the top computer vision and computational photography conferences (CVPR, ICCV, ICCP) and is an associate editor of the journal Signal Processing: Image Communication.

We present a method that leverages the computational power of GPUs to create a high-dynamic-range (HDR) photograph in the presence of camera motion and scene changes. Our approach is extremely fast and prevents the artifacts that arise from insufficient registration quality. Previous methods to address this problem are either accurate, but too slow for mobile devices, or fast, but prone to failing. As a comparison, our method runs in under 700ms on an NVIDIA-powered tablet for a pair of 5MP images, whereas previous state-of-the-art methods performing non-rigid registration take over a minute on desktops for a pair of 1MP images.

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21B

S6238 - Realtime Raw Image and Video Processing on GPU

Fyodor Serzhenko CEO, Fastvideo
Fyodor Serzhenko is the CEO of Fastvideo. His research interests include high-speed cameras and software for high-speed imaging, high-performance computing, and GPU image processing for video applications. He graduated from the Moscow Institute of Physics and Technology in 1989 and received a Ph.D. in semiconductor physics in 1993.

The goal of this session is to demonstrate how to achieve real-time image and video processing of RAW data on the GPU. We will present a detailed analysis of the Fastvideo SDK GPU image processing pipeline: RAW/DNG acquisition, preprocessing, demosaicing, denoising, color correction, tone mapping, resizing, sharpening, OpenGL output, and compression to MJPEG and H.264. This can now be done in real time on the GPU for 4K RAW data.
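
The code below is not Fastvideo SDK code; it is a minimal CUDA sketch of just the demosaicing stage of such a pipeline, a nearest-neighbor debayer of an RGGB pattern on synthetic 4K data, intended only to show the shape of one stage. The resolution, kernel configuration, and test pattern are arbitrary placeholders.

#include <cuda_runtime.h>
#include <cstdio>
#include <cstdint>

// Nearest-neighbor demosaic of a 16-bit RGGB Bayer image into packed RGB.
// Each thread handles one 2x2 Bayer cell; production pipelines use
// higher-quality, edge-aware interpolation.
__global__ void debayer_rggb_nn(const uint16_t* raw, uint16_t* rgb,
                                int width, int height)
{
    int cx = (blockIdx.x * blockDim.x + threadIdx.x) * 2;   // top-left of cell
    int cy = (blockIdx.y * blockDim.y + threadIdx.y) * 2;
    if (cx + 1 >= width || cy + 1 >= height) return;

    uint16_t r  = raw[cy * width + cx];                     // R
    uint16_t g1 = raw[cy * width + cx + 1];                 // G on the R row
    uint16_t g2 = raw[(cy + 1) * width + cx];               // G on the B row
    uint16_t b  = raw[(cy + 1) * width + cx + 1];           // B
    uint16_t g  = (uint16_t)(((uint32_t)g1 + (uint32_t)g2) / 2);

    for (int dy = 0; dy < 2; ++dy)                          // fill the 2x2 cell
        for (int dx = 0; dx < 2; ++dx) {
            int o = ((cy + dy) * width + (cx + dx)) * 3;
            rgb[o + 0] = r; rgb[o + 1] = g; rgb[o + 2] = b;
        }
}

int main()
{
    const int w = 3840, h = 2160;                           // 4K placeholder
    uint16_t *raw, *rgb;
    cudaMallocManaged(&raw, (size_t)w * h * sizeof(uint16_t));
    cudaMallocManaged(&rgb, (size_t)w * h * 3 * sizeof(uint16_t));
    for (int i = 0; i < w * h; ++i) raw[i] = (uint16_t)(i & 0x0FFF);

    dim3 block(16, 16), grid((w / 2 + 15) / 16, (h / 2 + 15) / 16);
    debayer_rggb_nn<<<grid, block>>>(raw, rgb, w, h);
    cudaDeviceSynchronize();
    printf("first demosaiced pixel: R=%d G=%d B=%d\n", rgb[0], rgb[1], rgb[2]);

    cudaFree(raw); cudaFree(rgb);
    return 0;
}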

Level: All
Type: Talk
Tags: Media & Entertainment; Video & Image Processing; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 14:30 - 14:55
Location: Room LL21C

S6497 - Neural Attention for Object Tracking

Brian Cheung Ph.D. Student, UC Berkeley
Brian Cheung is a Ph.D. student at UC Berkeley working with Professor Bruno Olshausen at the Redwood Center for Theoretical Neuroscience. His research interests lie at the intersection between machine learning and neuroscience.

With differentiable forms of attention being integrated into neural networks, end-to-end training with backpropagation is possible. We adopt the recently proposed attention mechanism in spatial transformer networks (STNs) into a recurrent architecture to perform object tracking. We show that this attention mechanism has significant overlap with the mechanism in deep recurrent attentive writer (DRAW) networks, which have been successfully used to create generative models of images. We present an end-to-end trainable recurrent attention model for tracking a variety of objects in video recorded by cameras mounted on an automobile.
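
As background for readers new to spatial transformer attention (this is the standard STN formulation, not material taken from the talk), the attention window is defined by a differentiable affine sampling grid:

\begin{pmatrix} x_i^{s} \\ y_i^{s} \end{pmatrix} = A_\theta \begin{pmatrix} x_i^{t} \\ y_i^{t} \\ 1 \end{pmatrix}, \qquad A_\theta = \begin{bmatrix} \theta_{11} & \theta_{12} & \theta_{13} \\ \theta_{21} & \theta_{22} & \theta_{23} \end{bmatrix}

Each target-grid coordinate (x_i^t, y_i^t) is mapped to a source-image coordinate, and because both the grid generation and the bilinear sampling are differentiable, the attention parameters \theta can be trained with backpropagation, which is what makes a recurrent tracking architecture built on this mechanism trainable end to end.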

Level: Advanced
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Robotics & Autonomous Machines; Computer Vision & Machine Vision; Self-Driving Cars & Automotive

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room 210G

S6522 - A Novel Neural Network Architecture for Representing Scene Structure

Eric Weiss Graduate Student, UC Berkeley
Eric Weiss is a Ph.D. student at UC Berkeley, working under Professor Bruno Olshausen at the Redwood Center for Theoretical Neuroscience. His work focuses on computational modeling of cognitive and neural processes using methods from statistics and machine learning.

Early work on deep image processing using recurrent neural networks with selective attention has yielded promising results. However, it is unclear whether standard recurrent network architectures are well suited to representing scene structure. We present a novel memory system that can efficiently store a high-level model of a scene. The proposed approach has several advantages: it is differentiable, easy to analyze, and has constant memory requirements. Additionally, we show that it is relatively straightforward to incorporate into a selective attention mechanism based on information-theoretic principles, enabling highly efficient image processing. We present results on a toy dataset.

Level: Intermediate
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence; Algorithms; IoT

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL20D

S6588 - GPU-Based Deep Learning in Cloud and Embedded Systems

Frederick Soo Chief Technology Officer, Nauto, Inc.
As chief technology officer and co-founder of Nauto, Inc., Frederick Soo has assembled a team of world-class computer vision and machine learning researchers and engineers and set them to building the core algorithms and hardware for Nauto's commercial products. Prior to joining Nauto, Fred studied the computational neurophysiology of the retina, receiving his Ph.D. in biophysics from Stanford University and completing post-doctoral fellowships at the University of Washington and Princeton University. His work experience includes McKinsey and Co., where he collaborated with Nauto co-founder Prof. Stefan Heck, and Soo Embedded Systems, where he built products from the ground up.

We'll present how Nauto uses deep learning in its distributed, vehicle-based compute and sensor network, and our learnings to date. Topics will include the performance of deep learning algorithms for computer vision in embedded systems, strategies for distributing compute across networks of embedded systems and in the cloud, and collecting and labeling data to maximize the performance of the system. Nauto's system is a dual-camera, windshield-mounted dashcam with GPS, IMU, wireless/cellular connection, and a SoC capable of running small CNNs in real time.

Level: All
Type: Talk
Tags: Self-Driving Cars & Automotive; Embedded; Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 15:00 - 15:25
Location: Room LL21E

S6596 - IFM Technologies: Intelligent Flying Machines for Indoor Applications

Marc Gyongyosi Founder, IFM Technologies
Marc Gyongyosi is a junior in computer science at Northwestern University's McCormick School of Engineering. For the past two years, he has been working closely with BMW's robotics research department to develop novel robotic systems that assist workers in BMW factories. At BMW, Marc's primary research focus is the implementation and development of cooperative lightweight robots. At Northwestern's The Garage, Marc is involved in two startups: at MDAR Technologies, he works on a novel 3D vision system for self-driving cars and other autonomous vehicles, and as the founder of IFM Technologies, he develops novel "Intelligent Flying Machines," i.e., Drones for Decisions. IFM Technologies aims to increase productivity and improve efficiency in everyday manufacturing and logistics processes.

We'll present recent advancements in leveraging the GPU on-board IFM Technologies' "Intelligent Flying Machines." IFM is providing industrial, indoor, flying platforms for data-driven decisions in the manufacturing and logistics industry. IFM provides a complete framework to collect, visualize, and leverage three-dimensional data analysis in indoor environments. Using the onboard GPU, IFM Technologies takes innovative production and logistics technology to a -- quite literally -- new dimension.

Level: Intermediate
Type: Talk
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision; IoT

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL20D

S6743 - Large Scale Video Processing for Virtual Reality

Arthur van Hoff CTO, Jaunt VR
Arthur van Hoff is a serial entrepreneur and was most recently CTO at Flipboard. He started his career in Silicon Valley at Sun Microsystems where he was an early developer of the Java programming language. Since then he has started several successful companies including Marimba (IPO 1999), Strangeberry (acquired by TiVo), ZING (acquired by Dell), and Ellerdale (acquired by Flipboard). Arthur has expertise in machine learning, big data, mobile applications, 3D printing, and computational photography. He is originally from the Netherlands and has a master's degree in Computer Science from Strathclyde University in Glasgow.

Jaunt VR has developed a GPU-based, large-scale video processing platform that combines multiple HD camera streams in a radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud-based GPU instances.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 04/06
Time: 15:30 - 15:55
Location: Room LL20C

S6321 - How Deep Learning Works for Automated Customer Service

Chenghua (Kevin) Li Chief Scientist of DNN Lab, JD.COM
Dr. Chenghua Li is the chief scientist in the deep neural network (deep learning) laboratory of JD, in charge of promoting the application of deep learning technologies in JD products. He was a data mining expert in the National Key Laboratories of Hisense, in charge of intelligent hardware innovation and the development of data mining. Chenghua has been researching and working in machine learning, especially neural networks and data mining, for decades. He has published more than 30 papers in world-leading academic journals such as Expert Systems with Applications, Information Processing and Management, Knowledge-Based Systems, and Neurocomputing, and holds more than 10 patents. He received his Ph.D. in data mining and machine learning at Chonbuk National University and finished his post-doctoral research at St. Francis Xavier University and York University in Canada. He was also a visiting scientist at MIT Media Lab.

Deep learning research and applications have seen numerous successes in the fields of image processing and speech recognition. In the field of natural language processing, however, it is still underutilized. This session will share the relevant technology and the development process of the intelligent customer service robot, as well as related machine learning, deep learning, and natural language processing technology. We'll also discuss the application of deep learning to natural language processing and automatic question answering systems, the role it plays in business, and how it enhances the ability to answer customer questions and boost customer satisfaction.

Level: All
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Robotics & Autonomous Machines; Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 16:00 - 16:25
Location: Room LL20D

S6417 - FireCaffe: Near-Linear Acceleration of Deep Neural Network Training on Compute Clusters

Forrest Iandola CEO, DeepScale
Forrest Iandola will complete his Ph.D. in EECS at UC Berkeley in spring 2016. Forrest has published more than 10 papers on computer vision and has gained applied computer vision research experience at companies such as NVIDIA and Microsoft.

One of the largest barriers to industrial adoption of deep learning is the time required to train models; it can take a week or more to train a high-quality deep neural network on a GPU workstation. We present FireCaffe, which trains state-of-the-art deep neural networks on a cluster of 32 GPUs with a 23x speedup over a single GPU.

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Big Data Analytics

Day: Wednesday, 04/06
Time: 16:30 - 16:55
Location: Room 210E

S6426 - Production Intelligence: GPU-Databases for Predictive Maintenance and In-Line Controlling in Automobile Manufacturing

Peter Strohm Senior Program Manager R&D, Jedox AG
Peter Strohm works as a project manager in the R&D department at Jedox, with a focus on GPU databases, business intelligence, and big data analysis. He obtained his diploma in computer science from the University of Freiburg, Germany, in 2008. After that, he joined the Inline Processing Team at the Fraunhofer Institute for Physical Measurement Techniques IPM, Freiburg, as a software developer for parallel real-time applications. Since 2013, Peter has been with Jedox as a GPU developer and manager for research projects.

Learn how in-GPU-memory databases optimize complex manufacturing processes by enabling real-time data input into big datasets, in-line decision making, and predictive maintenance. Manufacturing processes today generate large amounts of data, e.g., on the process itself, workpieces, machine sensor data, and parts delivered by external vendors. In the Production Intelligence project, our goal is to turn this unspecific data into "smart data" to gain better insight into the manufacturing process, e.g., to prevent machine shutdowns or decrease the number of defective parts. We'll present our solutions for streaming input data vectors into big datasets, analyzing incoming data in real time, and predicting production or system errors with the help of deep learning algorithms.

Level: All
Type: Talk
Tags: Big Data Analytics; Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Thursday, 04/07
Time: 10:00 - 10:25
Location: Room 210F

S6456 - Natural Human Interactivity in the World of AR & VR: A Pipe Dream or Reality?

Soulaiman Itani Founder & CTO, Atheer
Soulaiman Itani has spent his career trying to understand how the world operates and how to leverage that knowledge to improve everyday life. With that core belief he created Atheer, the goal of which is to advance human-centric computing technologies and empower users to have technology work with them in ways never thought possible only a few short years ago. His previous work includes designing cancer tests and treatments as well as creating models for robotics and unmanned aerial vehicles. He received his M.S. and Ph.D. in electrical engineering and computer science from the Massachusetts Institute of Technology.

As augmented and virtual reality gain traction in enterprises, smart glasses are graduating from a tiny monocular display to a 3D immersive experience, overlaying rich contextual information right where you need it. Now the questions coming into the spotlight are: "What's the optimal interaction model for the enterprise-class workflows powered by smart glasses? Can we combine hand gestures, voice, eye tracking, head motion, and contextualization to build a more intuitive and natural user interaction?" Augmented interactive reality promises to increase productivity and streamline workflows in ways that haven't been seen before.

Level: Beginner
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision

Day: Thursday, 04/07
Time: 10:00 - 10:25
Location: Room LL20C

S6561 - Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding

Song Han PhD student, Stanford University
Song Han is a fourth-year Ph.D. student with Prof. Bill Dally at Stanford University. His research interests are computer architecture, deep learning, and computer vision. His research focuses on improving the efficiency of neural networks so that they fit on mobile devices. Before joining Stanford, Song Han did his undergraduate studies at Tsinghua University, Beijing.

Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce "Deep Compression," which reduces the number of connections of deep neural networks by an order of magnitude and the total size of the networks by 35-49x without affecting their accuracy. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35x, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49x, from more than half a gigabyte to 11.3MB, again with no loss of accuracy. This allows the model to fit into on-chip SRAM cache rather than off-chip DRAM memory, and it also makes it possible to fit DNNs into mobile apps given their size limits.
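
To make the pruning stage of this pipeline concrete, here is a minimal, hypothetical C++ sketch (our illustration, not the authors' implementation; the threshold and layer values are made up):

#include <cmath>
#include <cstdio>
#include <vector>

// Zero out small-magnitude weights in place; returns the fraction of weights kept.
float prune_layer(std::vector<float>& weights, float threshold) {
    int kept = 0;
    for (float& w : weights) {
        if (std::fabs(w) < threshold) w = 0.0f;  // prune weak connections
        else ++kept;
    }
    return static_cast<float>(kept) / weights.size();
}

int main() {
    std::vector<float> layer = {0.8f, -0.01f, 0.03f, -0.6f, 0.002f, 0.4f};
    float kept = prune_layer(layer, 0.05f);      // hypothetical magnitude threshold
    std::printf("kept %.0f%% of connections\n", 100.0f * kept);
    return 0;
}

In the full method described above, the network is then retrained, the surviving weights are clustered into a small codebook (trained quantization), and the codebook indices are Huffman coded; only the combination of all three stages yields the reported 35-49x reductions.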

Level: Intermediate
Type: Talk
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Thursday, 04/07
Time: 10:30 - 10:55
Location: Room 210D

S6753 - How to Draw Living Virtual Reality Worlds

Beck Besecker CEO, Co-Founder, Marxent
Beck Besecker is co-founder and CEO of Marxent (marxent.com), the leader in enterprise Virtual Reality and Augmented Reality solutions for retailers and manufacturers. Prior to founding Marxent, Beck spent 13 years building interactive marketing solutions for Fortune 500 retailers and brands, including Target Stores and Tesco. In 1999, he founded Copient Technologies, which enabled large retailers to easily manage personalized promotions in-store and online. Copient was acquired by NCR in 2003. Beck then served as EVP of New Business at Catalina Marketing, the nation's largest in-store promotional network.

Imagine creating and populating an endless series of living, breathing 360-degree worlds with 3D products. Empowered to change features, textures and objects in real time, you can immediately view each freshly designed creation in 3D Virtual Reality on an Oculus Rift or in Augmented Reality on an iPad. Using the VisualCommerce(TM)-powered Lowe's Holoroom as an example, we'll share three tips for taking full advantage of NVIDIA graphics processing power to calculate and render VR-ready 3D graphics to create the ultimate real-time Virtual Reality experience.

Level: Intermediate
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Graphics Virtualization

Day: Thursday, 04/07
Time: 13:00 - 13:25
Location: Room LL20C

S6141 - Computer Vision Algorithm Acceleration Using GPGPU and the Tegra Processor's Unified Memory

Aaron Mosher Design and Analysis Engineer, The Boeing Company (Boeing Research & Technology)
Aaron Mosher is a design and analysis engineer at Boeing Research and Technology (BR&T). He is currently a team lead on multiple projects involving sensor processing, algorithms, and software. He has worked on autonomy and sensor processing technologies for both unmanned ground vehicles and unmanned air vehicles (UGV and UAV). He holds three patents on radar obstacle detection technology. His first experience with unmanned vehicles was a joint research project between Boeing, Carnegie Mellon University, and Duke University for the DARPA Grand Challenge unmanned ground vehicle competition in 2004. He has a B.S. in computer engineering from the University of Alabama in Huntsville, and an M.S. in systems engineering from Missouri Science and Technology. He has worked on a variety of projects, including ground vehicles, air vehicles, and communications systems.

We'll explain how the Tegra processor's GPGPU capabilities and Unified Memory enabled us to accelerate computer vision algorithms. Unified Memory has been available on CUDA before, but now the Tegra K1 and X1 architectures use Unified Memory at the physical layer. Our example computer vision algorithm detects objects in a scene and requires a high degree of computation on traditional desktop systems, making it an ideal candidate for GPGPU acceleration. We'll explain the adaptation difficulty involved in porting an existing algorithm to the Tegra, the challenges involved in taking advantage of GPGPU capabilities, and the Unified Memory advantages and disadvantages, on both the Tegra K1 and X1.
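
For reference, here is a minimal CUDA sketch of the Unified Memory pattern the talk builds on (our illustration, not the speaker's code; the thresholding kernel and image size are arbitrary):

#include <cuda_runtime.h>
#include <cstdio>

__global__ void threshold(unsigned char* img, int n, unsigned char t) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) img[i] = img[i] > t ? 255 : 0;      // toy per-pixel operation
}

int main() {
    const int n = 1920 * 1080;
    unsigned char* img = nullptr;
    cudaMallocManaged(&img, n);                    // one allocation visible to CPU and GPU
    for (int i = 0; i < n; ++i) img[i] = i % 256;  // CPU writes the frame directly
    threshold<<<(n + 255) / 256, 256>>>(img, n, 128);
    cudaDeviceSynchronize();                       // wait before the CPU reads the result
    std::printf("first pixel: %d\n", img[0]);
    cudaFree(img);
    return 0;
}

Because the Tegra K1 and X1 share physical memory between CPU and GPU, managed allocations like this avoid the explicit host-to-device copies a discrete-GPU port would otherwise need, which is one of the advantages the session weighs against the disadvantages.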

Level: Intermediate
Type: Talk
Tags: Embedded; Computer Vision & Machine Vision; Robotics & Autonomous Machines; Aerospace & Defense; IoT

Day: Thursday, 04/07
Time: 14:30 - 14:55
Location: Room LL20D

S6161 - Single Instruction Multiple Data for Computer Vision

Yen-Te Shih Sr. Compute Architect, NVIDIA
Yen-Te Shih works at NVIDIA on GPU architectures that run computer vision algorithms and applications.

Attendees will learn how to: quickly port f32 code to an f16x2 version; predict the performance; analyze the overhead; and design a tool, or follow a standard operating procedure, to directly translate existing f32 code to f16x2.
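
As a rough illustration of this kind of port (our example, not the speaker's tool or procedure; the kernel and names are invented), an f32 kernel and its f16x2 counterpart might look like this in CUDA:

#include <cuda_fp16.h>

// Original f32 version: one float per thread.
__global__ void scale_f32(const float* in, float* out, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i];
}

// f16x2 version: each thread handles one __half2, i.e., two packed fp16 values.
// Requires native fp16 arithmetic (e.g., Tegra X1, compute capability 5.3+).
__global__ void scale_f16x2(const __half2* in, __half2* out, float a, int n2) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n2) {
        __half2 a2 = __float2half2_rn(a);   // broadcast the scalar into both lanes
        out[i] = __hmul2(a2, in[i]);        // two multiplies in one instruction
    }
}

Packing two fp16 values into each __half2 halves the memory traffic and lets a single instruction operate on both lanes, which is where the performance prediction and overhead analysis covered in the session come in.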

Level: Intermediate
Type: Talk
Tags: Performance Optimization; Computer Vision & Machine Vision; Video & Image Processing

Day: Thursday, 04/07
Time: 14:30 - 14:55
Location: Room 212A

S6155 - Deep Learning Recommendation of Treatment from Electronic Data

David Ledbetter Data Science Consultant, Children's Hospital Los Angeles
David Ledbetter has an extensive and deep understanding of decision theory. He has experience implementing various decision engines, including convolutional neural networks, random forests, extra trees, and linear discriminant analysis. His particular area of focus is in performance estimation, where he has demonstrated a tremendous ability to accurately predict performance on new data in nonstationary, real-world scenarios. David has worked on a number of real-world detection projects, including detecting circulating tumor cells in blood, automatic target recognition utilizing CNNs from satellite imagery, make/model car classification for the Los Angeles Police Department using CNNs, and acoustic right whale call detection from underwater sonobuoys. Recently, David has been developing a CNN to generate personalized treatment recommendations to optimize patient outcomes using unstructured electronic medical records from 10 years of data collected from the Children's Hospital Los Angeles Pediatric Intensive Care Unit.

We construct a model that generates treatment predictions to optimize patient outcomes by leveraging information gleaned from the more than 10,000 patients who passed through the Pediatric Intensive Care Unit at Children's Hospital Los Angeles over more than 10 years. This is accomplished by converting unstructured, non-uniformly sampled patient information into a structured data representation that resembles an image, referred to here as a "patient snapshot." These patient snapshots elegantly enable convolutional neural networks to efficiently generate a basis.

Level: Intermediate
Type: Talk
Tags: Medical Imaging; Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: Thursday, 04/07
Time: 15:30 - 15:55
Location: Room LL21B

S6713 - Large Scale Video Processing for Virtual Reality

Arthur van Hoff CTO, Jaunt VR
Arthur van Hoff is a serial entrepreneur and was most recently CTO at Flipboard. He started his career in Silicon Valley at Sun Microsystems where he was an early developer of the Java programming language. Since then he has started several successful companies including Marimba (IPO 1999), Strangeberry (acquired by TiVo), ZING (acquired by Dell), and Ellerdale (acquired by Flipboard). Arthur has expertise in machine learning, big data, mobile applications, 3D printing, and computational photography. He is originally from the Netherlands and has a master's degree in Computer Science from Strathclyde University in Glasgow.

Jaunt VR has developed a GPU-based, large-scale video processing platform that combines multiple HD camera streams in a radial configuration into seamlessly stitched stereoscopic spherical panoramas. The approach uses complex computational photography algorithms that require sharded processing of the data across hundreds of cloud-based GPU instances.

Level: All
Type: Talk
Tags: Virtual Reality & Augmented Reality; Computer Vision & Machine Vision; Video & Image Processing

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

S6789 - Quasar (GPU Programming Language) on GDaaS Accelerates Coding from Months to Days (Presented by Cloudalize)

Bart Goossens CTO, Gepura (spinoff in incubation at UGent iMinds)
Since 2006, Bart Goossens has been a presenter at more than 30 scientific conferences in the domain of Image Processing/Medical Image Processing, such as IEEE International Conference on Image Processing, SPIE/IS&T Electronic Imaging, SPIE Optics + Photonics and SPIE Medical Imaging. He was also invited for a lecture at Banff International Research Station (BIRS), Banff, Canada in 2010 and for two lectures at the Image Processing Seminar of the University of Houston (dept. Mathematics) in 2013 and 2014, respectively.

Learn why the Quasar programming language for heterogeneous hardware on GDaaS accelerates parallel coding efforts from months to days by relieving the programmer of most device/platform-dependent issues. Quasar provides: (1) a low barrier of entry to parallel computing, (2) IDE tools, (3) multi-core CPU, GPU, and multi-GPU support, and (4) a runtime that automatically reconfigures your code depending on the context. GDaaS is a GPU Desktop as a Service platform that enables the instant provisioning and deployment of both Quasar and applications developed with Quasar. GDaaS provides flexible compute resources such as massive HPC power for deep learning development, big data optimization, instant distribution from your own website (APIs), and your own branding, and it supports a pay-as-you-use licensing model.

Level: All
Type: Talk
Tags: Programming Languages; Video & Image Processing; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD


TUTORIAL

Presentation
Details

S6739 - VisionWorks™ Toolkit Programming

Thierry Lepley Senior SW Engineer, NVIDIA
Thierry Lepley is Senior Computer Vision Engineer at NVIDIA and the NVIDIA representative in the OpenVX standardization group. His focus is on the development of optimized computer vision toolkits and libraries for real-time embedded systems. Earlier, Thierry was Principal Engineer at STMicroelectronics, working on many-core acceleration for computer vision, where he developed a compiler that automates the parallelization of image processing pipelines and the management of image tiling.

In this tutorial, we'll dive into programming with the VisionWorks toolkit, an NVIDIA SDK for computer vision (CV) that implements and extends the new OpenVX standard. The tutorial will cover the most important aspects of the API, such as data objects and CV graphs. We'll discuss NVIDIA extensions and how to interoperate with CUDA and OpenCV. Finally, we'll look at different ways of debugging and profiling an application developed with VisionWorks.
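
To give a flavor of the data objects and CV graphs the tutorial covers, here is a small host-code sketch written against the Khronos OpenVX C API that VisionWorks implements (our illustration, not tutorial material; the image size and the blur-then-gradient pipeline are arbitrary):

#include <VX/vx.h>
#include <stdio.h>

int main() {
    vx_context ctx = vxCreateContext();
    vx_graph graph = vxCreateGraph(ctx);

    // Data objects: an input frame plus intermediate and output images.
    vx_image in      = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image blurred = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image gx      = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_S16);
    vx_image gy      = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_S16);

    // Nodes: the pipeline is declared first, then verified and executed as a whole.
    vxGaussian3x3Node(graph, in, blurred);
    vxSobel3x3Node(graph, blurred, gx, gy);

    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);       // in practice, called once per frame
    else
        printf("graph verification failed\n");

    vxReleaseContext(&ctx);          // releases the graph and images created from it
    return 0;
}

Declaring the whole pipeline as a graph before executing it lets the runtime validate parameters once and optimize scheduling and intermediate storage across nodes, which is also where the NVIDIA extensions and the CUDA/OpenCV interoperability discussed in the tutorial fit in.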

Level: Intermediate
Type: Tutorial
Tags: Computer Vision & Machine Vision

Day: Monday, 04/04
Time: 13:00 - 14:20
Location: Room LL20A

S6729 - Teach Robotics with the New Jetson™ GPU Teaching Kit for Educators

Joe Bungo GPU Educators Program Manager, NVIDIA
Joe Bungo is the GPU Educators Program Manager at NVIDIA, where he enables the use of NVIDIA and GPU technologies in universities in a variety of ways, including curriculum and teaching material development, facilitation of academic ecosystems, and hands-on instructor workshops. Previously, he managed the university program at ARM Inc. and worked as an applications engineer there.
John Seng Professor, Cal Poly State University
John Seng is a professor in the Computer Science department at Cal Poly State University, San Luis Obispo. He is also part of the Cal Poly Computer Engineering Program.

As performance and functionality requirements of interdisciplinary robotics applications rise, industry demand for new graduates familiar with GPU-accelerated computer vision, machine learning and other robotics concepts grows. We'll introduce you to a comprehensive set of academic labs and university teaching material targeted at the NVIDIA Tegra-based Jetson embedded computing platform for use in introductory and advanced interdisciplinary robotics courses. The teaching materials start with the basics and focus on programming the Jetson platform, and include advanced topics such as computer vision, machine learning, robot localization and controls.

Level: All
Type: Tutorial
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision

Day: Monday, 04/04
Time: 14:30 - 15:20
Location: Room 212A

S6715 - VisionWorks™ Toolkit Programming Tutorial

Thierry Lepley Senior SW Engineer, NVIDIA
Thierry Lepley is Senior Computer Vision Engineer at NVIDIA and the NVIDIA representative in the OpenVX standardization group. His focus is on the development of optimized computer vision toolkits and libraries for real-time embedded systems. Earlier, Thierry was Principal Engineer at STMicroelectronics, working on many-core acceleration for computer vision, where he developed a compiler that automates the parallelization of image processing pipelines and the management of image tiling.

In this tutorial, we'll dive into programming with the VisionWorks toolkit, an NVIDIA SDK for computer vision (CV) that implements and extends the new OpenVX standard. The tutorial will cover the most important aspects of the API, such as data objects and CV graphs. We'll discuss NVIDIA extensions and how to interoperate with CUDA and OpenCV. Finally, we'll look at different ways of debugging and profiling an application developed with VisionWorks.

Level: Intermediate
Type: Tutorial
Tags: Computer Vision & Machine Vision

Day: Monday, 04/04
Time: 15:30 - 16:50
Location: Room 211B


HANDS-ON LAB

Presentation
Details

L6121 - Applied Deep Learning for Vision and Natural Language with Torch7

Nicholas Leonard Research Engineer, Element Inc.
Nicholas Leonard applies deep learning to biometric authentication using smartphones. He graduated from the Royal Military College of Canada in 2008 with a B.S. in computer science. Nicholas retired from the Canadian Army Officer Corps in 2012 to complete an M.S. in deep learning at the University of Montreal.

This hands-on tutorial targets machine learning enthusiasts and researchers and will cover applying deep learning techniques to classifying images and natural language data. The session is driven in Torch, a scientific computing platform with great toolboxes for deep learning and optimization, among others, and fast CUDA backends with multi-GPU support. Torch is supported by Facebook, Google, Twitter, and a strong community who actively open-source their code and packages.

Level: Beginner
Type: Hands-on Lab
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Tools & Libraries

Day: Tuesday, 04/05
Time: 15:00 - 16:30
Location: Room 210B

L6129 - VisionWorks Toolkit Hands-on (Computer Vision)

Thierry Lepley Senior Software Engineer, NVIDIA
[To Be Written]
Colin Tracey Senior System Software Engineer, NVIDIA
Colin Tracey has been with NVIDIA as a Senior System Software Engineer since 2011. He has worked on camera features for mobile devices including panorama, HDR, video stabilization, and object tracking. More recent work has been in ADAS and autonomous driving systems including surround view, obstacle detection, and sensor fusion.

In this hands-on session, we'll get practical experience with the VisionWorks™ toolkit, an NVIDIA SDK for computer vision (CV) that implements and extends the new OpenVX standard. The first step will be to install the VisionWorks toolkit, discover its structure and documentation, and run samples. In a second step, we will experiment with different ways of debugging and profiling an application developed with VisionWorks. Finally, we will do some programming to gain practical experience with the API.

Level: Intermediate
Type: Hands-on Lab
Tags: Computer Vision & Machine Vision

Day: Wednesday, 04/06
Time: 09:30 - 11:00
Location: Room 210C

L6124 - Teach GPU Accelerated Robotics: Hands-on with Jetson™ Robotics Teaching Kit

Joe Bungo GPU Educators Program Manager, NVIDIA
Joe Bungo is the GPU Educators Program Manager at NVIDIA, where he enables the use of NVIDIA and GPU technologies in universities in a variety of ways, including curriculum and teaching material development, facilitation of academic ecosystems, and hands-on instructor workshops. Previously, he managed the university program at ARM Inc. and worked as an applications engineer there.

As performance and functionality requirements of interdisciplinary robotics applications rise, industry demand for new graduates familiar with GPU-accelerated computer vision, machine learning and other robotics concepts grows. This hands-on tutorial introduces a comprehensive set of academic labs and university teaching material targeted at the NVIDIA Tegra-based Jetson embedded computing platform for use in introductory and advanced interdisciplinary robotics courses. The teaching materials start with the basics and focus on programming the Jetson platform, and include advanced topics such as computer vision, machine learning, robot localization and controls.

Level: Intermediate
Type: Hands-on Lab
Tags: Robotics & Autonomous Machines; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD


POSTER

Presentation
Details

P6310 - Deep Residual Networks - Ultra-Deep Neural Networks with 150+ layers

Jian Sun Principal Research Manager, Microsoft
Jian Sun was born in Xian, China, home of the Terracotta Army. He received a B.S., M.S., and Ph.D. from Xian Jiaotong University in 1997, 2000, and 2003, respectively. In 2003, he joined Microsoft Research Asia, and has been working in the fields of computer vision and computer graphics, with particular interests in building real-world working systems. His current primary research interests are computational photography, face recognition, and deep learning based image understanding.

Deeper neural networks are difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers---8x deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task.
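
For context, the residual building block the poster refers to can be written as (this is the standard formulation from the underlying paper, not additional poster content):

y = \mathcal{F}(x, \{W_i\}) + x

where x and y are the input and output of a block and \mathcal{F} is the residual mapping learned by a few stacked layers; when the dimensions of x and \mathcal{F}(x) differ, a linear projection W_s x can be applied on the shortcut connection.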

Level: Intermediate
Type: Poster
Tags: Computer Vision & Machine Vision; Deep Learning & Artificial Intelligence

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6325 - Which Whale Is It, Anyway? Face Recognition for Right Whales Using Deep Learning

Robert Bogucki Chief Science Officer, deepsense.io
Robert is the Chief Science Officer at deepsense.io, where he manages the R&D team and focuses on deep learning. He is also a successful Kaggle competitor. When tackling real-life problems, he particularly enjoys leveraging algorithms and computational power instead of, or in addition to, domain knowledge. His motivation for working in the IT industry is to take theoretical ideas and concepts and put them to good use.

With fewer than 500 North Atlantic right whales left in the world's oceans, knowing the health and status of each whale is integral to the efforts of researchers working to protect the species from extinction. To interest the data science community, NOAA Fisheries organized a competition hosted on Kaggle.com. The challenge was to automate the right whale recognition process, currently a painstaking, lengthy, manual process, using a dataset of aerial photographs of individual whales. In the poster, we outline the winning solution, which is based on deep learning and convolutional neural networks.

Level: Advanced
Type: Poster
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6327 - Fine-Tune For A Fortune: Transfer Learning Using DIGITS and GPUs

Valeriu Codreanu HPC consultant, SURFsara
Valeriu Codreanu is currently working as an HPC consultant at SURFsara, the Dutch supercomputing center. Since 2015, Valeriu has been PI for the CUDA Research Center at SURFsara. Before joining the team in Amsterdam, Valeriu was a postdoctoral researcher for three years at both Eindhoven University of Technology and the University of Groningen, working on GPU computing, computer vision, and embedded systems. Valeriu received his Ph.D. in Electrical Engineering from the Polytechnic University of Bucharest in 2011, with a thesis proposing efficient cooperation between multi-threaded and vector processors. His interests lie in the field of high-performance and energy-efficient machine learning systems.

Deep convolutional neural networks are widely accepted as the state-of-the-art solution for various computer vision problems. These commonly lead to a trade-off between network complexity and over-fitting, addressable by increasing the number of training examples, thus resulting in a lengthy training process. Moreover, more training examples may not even be available. Recent research suggests that this hurdle can be surmounted by using pre-trained complex networks and then fine-tuning them to fit specific datasets. We show that this approach allows for record-breaking performance on tasks ranging from natural image classification to handwritten character recognition. This is made possible by using high-performance NVIDIA GPUs in conjunction with the NVIDIA DIGITS training system.

Level: Intermediate
Type: Poster
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6330 - GPU Boosted Deep Learning in Real-time Face Alignment

Binglong Xie Chief Architect, HiScene Information Technology Co., Ltd.
Dr. Binglong Xie is chief architect at HiScene, a leading Chinese image recognition and augmented reality technology provider. Before joining HiScene, he was a Senior Staff Engineer at Qualcomm, leading heterogeneous acceleration of computer vision algorithms and applications on Snapdragon mobile platforms. Prior to that, he worked at Siemens Corporate Research on research and development in industrial inspection for Siemens Energy and on other computer vision projects for Siemens business units. He received his Ph.D. in Electrical Engineering from Lehigh University.

For the task of real-time face alignment, we employ a GPU server cluster to train a convolutional neural network. By taking advantage of both deep learning and GPU computing technologies, our algorithm outperforms all existing algorithms on the popular IBUG benchmark. In a photo editing application, our face alignment algorithm is integrated to locate precise facial keypoints, which provide the basis for further virtual facial makeup. Details of our algorithm are given in our poster, along with experimental results on the public benchmark.

Level: Intermediate
Type: Poster
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Algorithms

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6336 - Oblique-View Computed Tomography for 3D IC Package Inspection Using CUDA

Kyung-Chan Jin Principal Researcher, Korea Institute of Industrial Technology
Kyung-Chan Jin is a principal researcher at the Korea Institute of Industrial Technology.

This study focuses on a CUDA implementation of an oblique-view CT (computed tomography) technique for non-destructive internal inspection of 3D IC chips. Using 400 images projected from a phantom rotating in an oblique direction, we achieved 16 GUPS reconstructing a 512x512x512 phantom volume on an NVIDIA Quadro K6000 GPU, showing that the GPU performed 100 times faster than dual CPU processors in the CT reconstruction method.

Level: Intermediate
Type: Poster
Tags: Computer Vision & Machine Vision; Medical Imaging

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6341 - Accelerating Java Applications Using GPGPUs

James Clarkson PhD Student, University of Manchester
James Clarkson is a third-year Ph.D. student at the University of Manchester in the UK. He is a member of the Advanced Processor Technologies (APT) group under the supervision of Dr. Mikel Lujan and is actively researching ways to make hardware accelerators more programmable. Prior to starting his Ph.D., James worked for ARM on the EU-funded Mont Blanc project.

Over the last few years, we have been researching ways to exploit features of managed languages, such as Java, to simplify programming GPGPUs; we'll present our state-of-the-art prototype, Tornado. Tornado is a framework that allows Java programmers to write GPU-accelerated applications in 100% pure Java. It employs a task-based programming model, which makes it simple to compose complex processing pipelines that can execute multiple kernels across multiple GPGPUs. A key outcome of Tornado is that, with minimal refactoring of code, it is possible to port an application onto a GPGPU. We'll demonstrate a real-time computer vision application, ported from CUDA into Java, that reconstructs a 3D scene from a stream of RGB-D data.

Level: Beginner
Type: Poster
Tags: Programming Languages; Computer Vision & Machine Vision

Day: TBD, TBD
Time: TBD - TBD
Location: TBD

P6343 - GPU Boosted Deep Learning in Real-time Face Alignment

Binglong Xie Chief Architect, HiScene Information Technology Co., Ltd.
Dr. Binglong Xie is chief architect at HiScene, a leading Chinese image recognition and augmented reality technology provider. Before joining HiScene, he was a Senior Staff Engineer at Qualcomm, leading heterogeneous acceleration of computer vision algorithms and applications on Snapdragon mobile platforms. Prior to that, he worked at Siemens Corporate Research on research and development in industrial inspection for Siemens Energy and on other computer vision projects for Siemens business units. He received his Ph.D. in Electrical Engineering from Lehigh University.

For the task of real-time face alignment, we employ a GPU server cluster to train a convolutional neural network. By taking advantage of both deep learning and GPU computing technologies, our algorithm outperforms all existing algorithms on the popular IBUG benchmark. In a photo editing application, our face alignment algorithm is integrated to locate precise facial keypoints, which provide the basis for further virtual facial makeup. Details of our algorithm are given in our poster, along with experimental results on the public benchmark.

Level: Intermediate
Type: Poster
Tags: Deep Learning & Artificial Intelligence; Computer Vision & Machine Vision; Algorithms

Day: TBD, TBD
Time: TBD - TBD
Location: TBD
