GPU Technology Conference

March 17-20, 2015 | San Jose, California

S5552 - Transparent Parallelization of Neural Network Training

Cyprien Noel Software Engineer, Flickr / Yahoo Inc.
Cyprien has worked on high-performance distributed software in a variety of settings: a finance software startup, gaming, the Internet of Things, and an air traffic control simulator at NASA. He has lived in France and New York, and loves his new home, San Francisco. A couple of years ago he returned to his roots in machine learning, combining expertise in high-performance computing and neural networks.
Simon Osindero Senior Manager / Research Engineer, Flickr / Yahoo Inc.
Simon Osindero is currently a senior principal researcher and manager at Flickr, Yahoo Inc., where he leads efforts on applied machine learning. Prior to joining Yahoo, he was CTO and co-founder of LookFlow, a startup that combined state-of-the-art machine learning, computer vision, and information visualization methods to build a revolutionary search-and-discovery experience. (LookFlow was acquired by Yahoo at the end of 2013.) Before starting LookFlow he developed machine learning algorithms for natural language processing and semantic analysis as a researcher at Idilia Inc. He is perhaps best known for his contribution to the field of machine learning through his post-doctoral work on Deep Belief Networks at the University of Toronto, in collaboration with Geoff Hinton and Yee Whye Teh. His 2006 paper is widely credited as reigniting the current wave of interest in "deep learning". He holds: a PhD in Computational Neuroscience from the Gatsby Unit at UCL; an MSci in Experimental & Theoretical Physics along with BA/MA degrees in Physics, Molecular Biology, and Mathematics from the University of Cambridge; and diplomas in Photography and Design from Concordia University.

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library, allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe's complexity.
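The core idea behind asynchronous SGD can be sketched in a few lines: several workers update one shared parameter vector without locking, each iterating over its own shard of the data. The sketch below is a hypothetical, minimal illustration (Hogwild-style, on a least-squares problem); the actual extensions described in the talk live inside Caffe's C++ codebase.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def async_sgd(X, y, n_workers=4, epochs=50, lr=0.01):
    """Hogwild-style asynchronous SGD for least-squares regression."""
    w = np.zeros(X.shape[1])                      # shared parameters
    shards = np.array_split(np.arange(len(y)), n_workers)

    def worker(idx):
        for _ in range(epochs):
            for i in idx:
                grad = (X[i] @ w - y[i]) * X[i]   # per-sample gradient
                w[:] -= lr * grad                 # unsynchronized update

    with ThreadPoolExecutor(n_workers) as ex:     # one thread per "device"
        list(ex.map(worker, shards))
    return w
```

Because each worker writes to `w` without coordination, updates may race; the tolerance of SGD to such races is exactly what makes this style of parallelism attractive.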

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room 210A
View Recording
View PDF

S5637 - ZFAS - The Brain of Piloted Driving at Audi

Matthias Rudolph Head of Architecture Driver Assistance Systems, Audi AG
Dr. Rudolph studied Electrical Engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics from Iowa State in 1999, with a minor in mathematics. After holding various positions at Audi, in 2009 he took over as head of the department "Architecture Driver Assistance Systems". The zFAS project is one of the department's core developments. Dr. Rudolph is a member of management at Audi.

During the last several years, Audi has developed with partners a platform that enables piloted driving and piloted parking. At CES 2015 it was shown that the system can drive piloted on the highway from Silicon Valley to Las Vegas. The computational platform or brain of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and the motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room LL21F
View Recording

S5866 - Project Tango: Mobile 3D Tracking and Perception

Johnny Lee Lead, Project Tango, Google
Highly-Rated Speaker
Johnny Lee is the lead of Project Tango at Google, a focused effort to bring computer vision and advanced sensor fusion to mobile platforms. Previously, he helped Google X explore new projects as Rapid Evaluator and was a core algorithms contributor to the original Xbox Kinect. His YouTube videos demonstrating Wii remote hacks have surpassed 15 million views, and his TED talk became one of the most popular TED talk videos. In 2008, he received his PhD in Human-Computer Interaction from Carnegie Mellon University, and he has been recognized in MIT Technology Review's TR35.
James Fung Platform Software Lead, Project Tango, Google
James Fung has been applying GPUs to accelerate general-purpose parallel computing, with a focus on image processing and computer vision. He received his Ph.D. in Electrical and Computer Engineering from the University of Toronto. He worked in Developer Technology at NVIDIA, helping the adoption of GPU computer vision. He is currently the Platform Software Lead in Project Tango at Google.

Project Tango is a focused effort to accelerate the progress and adoption of 3D tracking and sensing technologies on mobile devices. It is a platform for developing advanced computer vision and sensor fusion algorithms to estimate the position and orientation of the device in real time, while simultaneously generating a 3D map of the environment. In this talk we discuss some of the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also show demonstrations of the current state of development, and discuss the role of 3D sensing in mobile gaming, indoor navigation, virtual reality, augmented reality, and autonomous drones. We hope you will join us on this journey. We believe it will be one worth taking.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Augmented Reality & Virtual Reality; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 212B

S5108 - Vision-Based Driver Assistance: Seeing the Way Forward

Ian Riches Director, Global Automotive Practice, Strategy Analytics
Ian Riches is a Director in the Global Automotive Practice at Strategy Analytics. He heads a research team that covers all aspects of embedded automotive electronic systems, semiconductors and sensors on a worldwide basis. His areas of research include powertrain, chassis, safety, security and body applications – including high-growth areas such as hybrid and electric vehicles and advanced driver assistance systems. Before joining Strategy Analytics, Ian spent two years as assistant editor of Automotive Engineer, the UK magazine published by the IMechE. He has also held the position of Press Officer/Technical Author for MTL, a safety-related electronic equipment manufacturing company. With over eighteen years of experience, he is one of the foremost industry analysts in the automotive electronics sector. Ian holds an MA in engineering from Cambridge University, UK, where he specialized in fluid dynamics, turbo-machinery and internal combustion engines.

This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications, and vehicle sectors that are driving growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21F
View Recording
View PDF

S5131 - Mobile Visual Search

Martin Peniak Parallel Computing Software Engineer, Cortexica
Martin works as a parallel computing software engineer at Cortexica, where he develops algorithms for discrete as well as mobile GPUs. Martin received his Ph.D. in GPU computing applied to cognitive robotics, and previously collaborated with the international EU FP7 ITALK and Poeticon++ consortia, which aimed at developing biologically-inspired artificial systems capable of progressively developing their cognitive capabilities through interaction with their environments. He also collaborated with ESA (the European Space Agency) on a project evolving neural network controllers for simulated Mars rover robots. In summer 2012, Martin worked at NVIDIA Research in Santa Clara, where he evaluated several machine learning algorithms on the next-generation GPU architecture. During his work at NVIDIA, he also developed a novel bio-inspired system for 3D object recognition. More recently, Martin gave a TEDx talk, the first to cover GPU computing and its implications for robotics.

Attendees will learn about Cortexica's FindSimilar™ technology. Its algorithms are based on the way the human visual cortex recognises images and objects, meaning that poor lighting conditions, rotated or skewed images, and other 'imperfect' objects can all be recognized accurately. In this presentation, you will learn about the challenges in the field of visual search and how our company addresses them by leveraging the processing power of GPUs, including the latest NVIDIA K1 processor. This session will include several demonstrations of our technology and the latest mobile applications using NVIDIA K1 processors to speed up visual search performance.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Embedded Systems

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210B
View Recording

S5182 - The Future of Human Vision: Preferential Augmentation Using GPUs

Muhammad Shamim Bioinformatics Programmer, Baylor College of Medicine
Muhammad Shamim is a bioinformatics programmer in Dr. Erez Lieberman Aiden's Lab at the Baylor College of Medicine, working on a variety of projects ranging from big data and genomics to augmented reality. Muhammad is a graduate of Rice University with a BS in Computer Science and a BA in Computational & Applied Mathematics and Cognitive Sciences.

Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.
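The "more general class of transformations" mentioned above can be pictured as an arbitrary per-pixel remapping of the camera image, which is exactly the kind of operation a GPU evaluates per frame in real time. The sketch below is our own toy illustration (not the speaker's code): a generic inverse-mapping `remap` plus one example warp, a hypothetical center magnifier, that no refractive lens could produce.

```python
import numpy as np

def remap(image, map_y, map_x):
    """Sample `image` at (map_y, map_x) for each output pixel (nearest neighbor)."""
    h, w = image.shape[:2]
    yy = np.clip(np.round(map_y).astype(int), 0, h - 1)
    xx = np.clip(np.round(map_x).astype(int), 0, w - 1)
    return image[yy, xx]

def magnify_center(h, w, strength=0.5):
    """Warp whose output pixels sample closer to the image center (magnification)."""
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    return cy + (ys - cy) * (1 - strength), cx + (xs - cx) * (1 - strength)
```

On a GPU the same remap is a single texture-lookup pass, so per-user, per-disorder transformations can run at display frame rates.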

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Video & Image Processing; Medical Imaging; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21C
View Recording
View PDF

S5581 - Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus Research Scientist , Facebook
Highly-Rated Speaker
Rob Fergus is an Associate Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University. He is also a Research Scientist at Facebook, working in their AI Research Group. He received a Masters in Electrical Engineering with Prof. Pietro Perona at Caltech, before completing a PhD with Prof. Andrew Zisserman at the University of Oxford in 2005. Before coming to NYU, he spent two years as a post-doc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William Freeman. He has received several awards including a CVPR best paper prize, a Sloan Fellowship & NSF Career award and the IEEE Longuet-Higgins prize.

This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component in training these models. The talk will conclude with a live demo.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210A
View Recording

S5201 - SMTool: A GPU based Satellite Image Analysis Tool

Dilip Patlolla R & D Staff, Oak Ridge National Laboratory
Highly-Rated Speaker
Dilip Patlolla is an R&D staff member in the Geographic Information Science and Technology (GIST) Group at Oak Ridge National Laboratory, which has been a pioneer in the development, implementation, and application of systems, science, and technology for geographic information. His primary responsibilities include opening up new domains of application for HPC, FPGAs, and GPUs by researching and developing computing algorithms, and ensuring the best possible performance on current and next-generation architectures. He leads the development of mapping and characterizing global-scale human settlements using advanced computing methods, and received ORNL's 2013 Significant Event Achievement Award for the effort.

This session will demonstrate our advanced satellite image analytics tool, referred to as SMTool, built on the CUDA platform to process city-scale sub-meter resolution satellite imagery to detect and discriminate man-made structures. Automated analysis of large-scale high-resolution satellite imagery requires computationally efficient image representation techniques that characterize the distribution of structures in the scene. The structures of interest could range from simple edges and lines to complex shapes of objects on the ground. Different representation techniques, and their careful implementation exploiting the GPU architecture, will be reviewed. We present results of SMTool from our ongoing work supporting global-scale population mapping and polio eradication and immunization efforts.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Big Data Analytics; Supercomputing; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210B
View Recording
View PDF

S5626 - Accelerating Computer Vision and Augmented Reality via GPGPU Computing

Jack Dashwood Senior PR & Marketing Manager, Metaio
Jack Dashwood is the PR & Marketing manager at Metaio Inc. He received his Bachelor of Science in Psychology, and Bachelor of Commerce in Finance from the University of Calgary, Canada, and later obtained a Master's in Global Business from the University of Victoria. Having recently relocated from Metaio's Munich HQ, Jack now oversees Metaio's PR and marketing activities in North America.

It is no secret that augmented reality is a computationally-intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL21C
View Recording
View PDF

S5713 - Collaborative Feature Learning from Social Media

Hailin Jin Principal Scientist, Adobe
Hailin received his Ph.D. degree in Electrical Engineering from Washington University in Saint Louis. His research interests include deep learning, 3D reconstruction, structure from motion, optical flow, and stereo. His work can be found in several Adobe products, including Photoshop, After Effects, Premiere Pro, and Behance.

Image feature representation plays an essential role in image recognition. The current state-of-the-art feature learning paradigm is supervised learning from labeled data. However, this paradigm requires large-scale category labels, which limits its applicability to domains where labels are hard to obtain. I will present a new data-driven feature learning paradigm which does not rely on category labels. Instead, we learn from user behavior data collected on social media. We use the image relationship discovered in the latent space from the user behavior data to guide the image feature learning. Also presented is a new large-scale image and user behavior dataset collected on Behance.net. The dataset consists of 1.9 million images and over 300 million view records from 1.9 million users.
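One simple way to picture "learning from user behavior instead of labels": view records induce an image-image affinity (images co-viewed by many users are likely related), and a latent space can be factored out of that affinity to guide feature learning. The sketch below is our own illustration of this idea, not Adobe's method; the function name and the eigendecomposition shortcut are ours.

```python
import numpy as np

def latent_from_views(view_matrix, dim=2):
    """view_matrix[u, i] = 1 if user u viewed image i.
    Returns one `dim`-dimensional latent vector per image, factored from
    the image-image co-view affinity via truncated eigendecomposition."""
    co = view_matrix.T @ view_matrix               # image x image co-view counts
    vals, vecs = np.linalg.eigh(co)                # symmetric, so eigh applies
    top = np.argsort(vals)[::-1][:dim]             # keep the strongest modes
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
```

Images that share viewers land close together in this latent space, which is the relationship the talk's approach uses in place of category labels.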

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210A
View Recording
View PDF

S5896 - A Performance, Energy and Accuracy Aware Benchmarking Methodology for Robot Vision

Luigi Nardi Research Associate, Imperial College London
Dr Luigi Nardi is a post-doctoral research associate at Imperial College London in the Software Performance Optimisation group. Luigi's primary role is to work on the co-design of high-performance computer vision systems, where performance, power and accuracy are part of the same optimisation space. Luigi earned his Ph.D. in computer science creating a new performance domain-specific language (DSL) in the context of automatic code generation for applied mathematics. He has almost 10 years of experience in parallel computing and more than 4 years of experience developing GPU-enabled codes using CUDA and OpenCL, from desktop to embedded. Prior to his current position, Luigi was a permanent researcher at Murex S.A.S., working on the acceleration of production-level computational finance codes for pricing evaluation and risk management on clusters of GPUs.

We introduce SLAMBench, a publicly-available software framework for quantitative, comparable and validatable experimental research to investigate trade-offs in performance, accuracy and energy consumption for real-time 3D scene understanding. 3D scene understanding offers great potential for a new level of scene modelling, localisation and real environmental interaction for many types of robot, but its high computational requirements mean that use on mass-market embedded platforms is challenging. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and a powerful mechanism for reliable accuracy comparison of different implementations and algorithms. We experimentally investigate SLAMBench execution time, energy and accuracy on a variety of multicore and GPU-accelerated platforms.
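The benchmarking pattern SLAMBench embodies, running several implementations of the same kernel on identical input and reporting runtime alongside accuracy against a reference, can be sketched generically. This is an illustrative harness of ours, not the actual SLAMBench API (which is C++/OpenMP/OpenCL/CUDA and also measures energy).

```python
import time

def benchmark(impls, reference, data, err_fn):
    """Run each implementation on `data`; report wall time and error
    versus the reference implementation's output."""
    results = {}
    for name, fn in impls.items():
        t0 = time.perf_counter()
        out = fn(data)
        elapsed = time.perf_counter() - t0
        results[name] = {"seconds": elapsed,
                         "error": err_fn(out, reference(data))}
    return results
```

Keeping time and error side by side in one table is what makes the performance/accuracy trade-off visible, the third axis in SLAMBench, energy, would be a platform-specific counter read around the same timed region.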

Level: All
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210G
View Recording
View PDF

S5251 - Accelerating Automated Image Processing Pipelines for Cameras with Novel CFAs on GPUs

Qiyuan Tian Ph.D. Candidate, Stanford University
Qiyuan Tian is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. He received his B.Eng. (2011) in Communication Science and Engineering at Fudan University, China, and his M.S. (2013) in Electrical Engineering at Stanford University. He studied as an undergraduate exchange student (2009) in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology. He is working on digital imaging, magnetic resonance imaging and neuroimaging.
Haomiao Jiang Ph.D. Candidate, Stanford University
Haomiao Jiang is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. He received B.A. (2011) in Information Security at Shanghai Jiao Tong University, China, and M.S. (2013) in Electrical Engineering at Stanford University. He is working with Professor Brian Wandell on color vision, display modeling and computational photography.

L3 (Local, Linear, Learned) is a new technology to automate and customize the design of image processing pipelines for cameras with novel architectures, such as unconventional color filter arrays. L3 classifies sensor image pixels into categories that are local in space and response, and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization. We accelerated the L3 pipeline on NVIDIA® Shield™ Tablets using GPUs for real-time rendering of video captured by a multispectral camera prototype. The combination of L3 and GPUs delivers high performance with low power for image processing on mobile devices.
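The classify-then-linear structure described above is what makes L3 embarrassingly parallel: every pixel independently picks a class and applies that class's learned matrix. The sketch below is a simplified, hypothetical rendering of that structure (class definitions and training here are ours; the real system derives classes from local space/response statistics and trains on camera simulations).

```python
import numpy as np

def l3_train(patches, targets, classes):
    """Fit one least-squares linear operator per class.
    patches: (N, P) sensor patches; targets: (N, K) calibrated outputs."""
    ops = {}
    for c in np.unique(classes):
        m = classes == c
        ops[c], *_ = np.linalg.lstsq(patches[m], targets[m], rcond=None)
    return ops

def l3_apply(patches, classes, ops):
    """Render: each pixel applies the operator of its class (parallel-friendly)."""
    k = next(iter(ops.values())).shape[1]
    out = np.zeros((len(patches), k))
    for c, W in ops.items():
        m = classes == c
        out[m] = patches[m] @ W
    return out
```

Since `l3_apply` is just a per-pixel matrix multiply gated by a class label, it maps directly onto a GPU kernel with one thread per pixel.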

Level: All
Type: Talk
Tags: Defense; Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210C
View Recording
View PDF

S5333 - SceneNet: 3D Reconstruction of Videos Taken by the Crowd on GPU

Chen Sagiv CEO, SagivTech Ltd.
Dr. Sagiv brings to SagivTech over 15 years of experience in the image processing industry both in Israel and the Netherlands. In addition to her activities with the company, she also collaborates with academic beneficiaries in Israel and Europe. Chen Sagiv holds a PhD from Tel Aviv University in Applied Mathematics, with specializations in texture analysis, filter banks and optimization problems.

If you visited a rock concert recently, you probably noticed how many people were taking videos of the scene using their mobile phone cameras. The aim of SceneNet is to use these multiple video sources to create a high-quality 3D video scene that can be shared via social networks. The SceneNet pipeline starts at the mobile device, where the video streams are acquired, pre-processed and transmitted to the server, where the various video streams are registered and submitted to 3D reconstruction. We will share the compute challenges of SceneNet and the GPU-based acceleration on mobile devices and the server, from pre-processing on the mobile device to extremely computationally demanding algorithms such as bundle adjustment and 3D reconstruction. SceneNet is a European FP7-funded project.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210B
View Recording
View PDF

S5571 - A High-Density GPU Solution for DNN Training

Franco Mana Research Engineer, NUANCE
Franco Mana graduated in Computer Science from the University of Turin, Italy. He joined CSELT (the Italian Telecom research lab) in 1986, where he developed his thesis in the field of artificial intelligence; his first scientific interest was automatic learning systems. He then moved to the field of neural networks, mainly applied to speech recognition. He is currently in the R&D division at NUANCE, engaged in algorithmic research for speech recognition, where he developed his know-how in using GPUs to speed up HMM-NN hybrid system training.

We at Nuance use DNNs in ASR systems in multiple languages and tasks, and we train DNNs with large amounts of data. DNNs are trained with gradient descent methods that are slow and difficult to parallelize across machines. In order to speed up the training process, we have developed training algorithms/recipes that can be used to train a DNN in parallel on multiple GPU devices, which can significantly reduce the DNN training time. We will present benchmark results that include the basic computational operations involved in DNN training (SGEMM, memory copy throughput, etc.) as well as the end-to-end training time on different GPU-based hardware configurations. In particular, we will benchmark systems based on the K10 versus systems based on the K80, with the number of GPUs varying from 1 to 16.
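The usual way to spread one gradient-descent step over multiple GPUs is synchronous data parallelism: each device computes a gradient on its shard of the minibatch, the gradients are averaged (an all-reduce), and one shared copy of the weights is updated. A minimal sketch of one such step, on a least-squares stand-in for the DNN (our illustration, not Nuance's recipe):

```python
import numpy as np

def data_parallel_step(w, X, y, n_devices, lr=0.1):
    """One synchronous data-parallel SGD step for least-squares regression."""
    shards = np.array_split(np.arange(len(y)), n_devices)
    grads = []
    for idx in shards:                          # one shard per "device"
        err = X[idx] @ w - y[idx]
        grads.append(X[idx].T @ err / len(idx)) # shard-local gradient
    return w - lr * np.mean(grads, axis=0)      # all-reduce, then update
```

With equal shard sizes this step is mathematically identical to the single-device minibatch step, which is why the benchmark question becomes purely one of hardware throughput (SGEMM, copy bandwidth) rather than convergence behavior.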

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Supercomputing; Signal & Audio Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210A
View Recording
View PDF

S5123 - Through the Eyes of a Car: Visualizing a Car's Camera System

Gernot Ziegler Senior Developer Technology Engineer (Computer Vision), NVIDIA
Highly-Rated Speaker
Dr Gernot Ziegler is an Austrian engineer with a PhD in Computer Science from the University of Saarbrücken, Germany, and an MSc degree in Computer Science and Engineering from Linköping University, Sweden. He pursued his PhD studies at the Max-Planck-Institute for Computer Science, where he specialized in GPU algorithms for computer vision and data-parallel algorithms for spatial data structures. After six years in NVIDIA's Developer Technology (Compute) team, working with high-performance computing, Gernot has now returned to his original research domain and consults on the use of GPU algorithms for computer vision in the automotive sector as a senior member of NVIDIA's Developer Technology team for Computer Vision.

Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or navigational street situations.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room LL21F
View Recording
View PDF

S5383 - Mobile 3D Mapping With Tegra K1

Karol Majek Researcher, Institute of Mathematical Machines
Karol is a PhD student and researcher at the CUDA Research Center at the Institute of Mathematical Machines, doing research in robotics. For the last two years he has focused on using CUDA technology for 3D mapping on robotic platforms. Currently he is working on embedding CUDA-enabled algorithms to run on the Tegra K1.

This work presents a 3D mapping algorithm implemented on the Tegra K1 device. The data processing pipeline is implemented in parallel using CUDA. The performance and accuracy of the final model are compared to mobile and desktop GPU results. This work shows how to replace traditional CUDA-enabled laptops with the embedded Tegra K1. Attendees will learn about the problems and challenges of embedding a parallel 3D mapping algorithm and how to improve its speed.

Level: Intermediate
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210G
View Recording
View PDF

S5429 - Creating Dense Mixed GPU and FPGA Systems With Tegra K1 Using OpenCL

Lance Brown Director - Radar, EW and HPC, Colorado Engineering Inc
Lance Brown has been in the COTS hardware world since 1999, after spending 5 years as a software engineer at Nortel and Motorola. Lance has been a field application engineer for Curtiss Wright and GE, supporting GPU, CPU and FPGA products for telecom, networking and defense. He is now the director of radar, EW and HPC at Colorado Engineering Inc, focusing on high-TFLOP, low-CSWAP systems. CEI's product line is based on 3D computing architectures co-developed with the Missile Defense Agency and the Naval Research Labs. Lance is a graduate of the University of Texas - Arlington with a BS in Computer Science Engineering.

With the introduction of comprehensive OpenCL support and IEEE 754 hard floating-point units for Altera FPGAs, and the availability of NVIDIA® Tegra® K1 GPUs, compact solutions that used to require many discrete boards can now be built in small form factors for Distributed Aperture Systems (DAS), Situational Awareness 360 (SA360), Digital Signal Processing (DSP), and hundreds of other high-performance embedded computing (HPEC) applications, from mil-aero to commercial to industrial to medical to consumer. In work funded by the Missile Defense Agency, Lance Brown will discuss the challenges and benefits of using multiple Altera Arria 10 FPGAs and multiple NVIDIA® Tegra® K1 GPUs on a single card to speed up six-degrees-of-freedom simulations.

Level: Intermediate
Type: Talk
Tags: Defense; Embedded Systems; Signal & Audio Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210C
View Recording
View PDF

S5459 - Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive Computing

Rajesh Bordawekar Research Staff Member, IBM T. J. Watson Research Center
Rajesh Bordawekar is a research staff member in the Programming Technologies department at the IBM T. J. Watson Research Center. Rajesh studies interactions between applications, programming languages/runtime systems, and computer architectures. His current interest is exploring software-hardware co-design of analytics workloads. Specifically, he has been investigating how GPUs could be used for accelerating key analytics kernels in text analytics, data management, graph analytics, and deep learning.
Ruchir Puri IBM Fellow, IBM T J Watson Research Center, Yorktown Hts, NY, IBM Research
Ruchir Puri is an IBM Fellow at the IBM Thomas J. Watson Research Center at Yorktown Heights, NY, where he leads high-performance design methodology and SW/HW acceleration research for all of IBM's enterprise server and system chip designs. Ruchir is a Fellow of the IEEE, an ACM Distinguished Speaker, and has been an IEEE Distinguished Lecturer. He received the Asian American Engineer of the Year award in 2014. Ruchir was also honored with a John von Neumann Chair at Bonn University, Germany, and has been an adjunct professor in the Dept. of Electrical Engineering at Columbia University, NY.

In this session you will learn how IBM is exploiting GPUs in its new OpenPOWER platform to accelerate Big Data analytics and cognitive computing solutions. The Hardware Acceleration Lab in IBM's Software Group is partnering with IBM Research to develop optimized heterogeneous computing solutions. With the creation of the OpenPOWER consortium last year, IBM has created an open ecosystem along with heterogeneous computing platforms that include NVIDIA's Tesla GPUs. GPUs are gaining traction in the enterprise as accelerators for Big Data analytics and cognitive computing workloads. This session will focus on industrial case studies and exploitation of GPUs. Some early results will also be shared.

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210D
View Recording
View PDF

S5474 - CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service

Dhruv Batra Assistant Professor, Virginia Tech
Dhruv Batra
Dhruv Batra is an Assistant Professor at the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he leads the VT Machine Learning & Perception group. He is a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytic Center (DAC). Prior to joining VT, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the campus of the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010, respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU and at MIT CSAIL. His research interests lie at the intersection of machine learning, computer vision, and AI, with a focus on developing scalable algorithms for learning and inference in probabilistic models for holistic scene understanding. He has also worked on other topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation, and distributed optimization for inference and learning in probabilistic graphical models. He was a recipient of the Carnegie Mellon Dean's Fellowship in 2007, the Google Faculty Research Award in 2013, the Virginia Tech Teacher of the Week in 2013, the Army Research Office (ARO) Young Investigator Program (YIP) award in 2014, and the National Science Foundation (NSF) CAREER award in 2014. His research is supported by NSF, ARO, ONR, Amazon, Google, Microsoft, and NVIDIA.

In this talk, attendees can expect to learn about CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms as a cloud service. Our goal is to democratize computer vision; one should not have to be a computer vision, big data, and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms. As a first step, CloudCV is focused on object detection and localization in images. CloudCV provides APIs for detecting whether any of 200 different object categories, such as entities (person, dog, cat, horse, etc.), indoor objects (chair, table, sofa, etc.), and outdoor objects (car, bicycle, etc.), is present in an image.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210B
View Recording
View PDF

S5631 - Speech: The Next Generation

Bryan Catanzaro Senior Researcher, Baidu
Highly-Rated Speaker
Bryan Catanzaro
Bryan Catanzaro is a research scientist at Baidu's new Silicon Valley Artificial Intelligence Laboratory, working with Adam Coates and Andrew Ng to create next generation systems for deep learning. He came to Baidu from NVIDIA, where he researched tools and libraries for making machine learning more efficient and easier to implement on parallel processors. He earned his Ph.D. from Berkeley where he built the Copperhead language and compiler, which allows Python programmers to use nested data parallel abstractions and gain high efficiency on contemporary parallel platforms.

Speech is the user interface of the future, but today's implementations often fail when we need them the most, such as in noisy environments or when the microphone isn't close at hand. At Baidu, an increasing fraction of our users employ speech interfaces to find what they are looking for. In this talk, I will show how next generation deep learning models can provide state-of-the-art speech recognition performance. We train these models using clusters of GPUs using CUDA, MPI and Infiniband.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210A
View Recording

S5362 - A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

John Long Post-doctoral Researcher, New York University Langone Medical Center
John Long
John is a postdoctoral researcher in the laboratory of Dr. György Buzsáki at the New York University Langone Medical Center. He received his PhD in neuroscience from the UC Berkeley Helen Wills Neuroscience Institute in 2011, in the Brain-Machine Interface laboratory of Dr. Jose Carmena. His current work in neuroscience leverages multiple camera photogrammetry and the power of GPUs to build 3D models of his neurophysiological subjects to study the relationships between memory formation in the brain, navigation, and action planning. He is also working within the clinical domain to develop a computer vision system for behaviorally diagnosing Parkinson's disease.

Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (≥ 100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Life & Material Science; Developer - Algorithms; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: HPC & Science

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210B
View Recording
View PDF

S5760 - Real-Time, Content-Driven Representations at Twitter

Clement Farabet Senior Software Engineer, Twitter
Clement Farabet
Clement Farabet is a senior software engineer at Twitter, where he leads the effort on representation learning for all things Twitter. Clement Farabet received a Master’s Degree in Electrical Engineering with honors from Institut National des Sciences Appliquées (INSA) de Lyon, France in 2008. His Master’s thesis work on reconfigurable hardware for deep neural networks was developed at the Courant Institute of Mathematical Sciences of New York University with Professor Yann LeCun, and led to a patent. He then joined Professor Yann LeCun’s laboratory in 2008, as a research scientist. In 2009, he started collaborating with Yale University’s e-Lab, led by Professor Eugenio Culurciello. This joint work later led to the creation of TeraDeep (www.teradeep.com). In 2010, he started the PhD program at Université Paris-Est, co-advised by Professors Laurent Najman and Yann LeCun. His thesis focused on real-time image understanding/parsing with deep convolutional networks. The main contributions of his thesis were multi-scale convolutional networks and graph-based techniques for efficient segmentations of class prediction maps. He graduated in 2013, and went on to cofound Madbits, a company that focused on representing, understanding and connecting images. Madbits got acquired by Twitter in 2014.

Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Manually defining features to represent this data is showing its limits. In this talk, I provide an overview of how automated, content-driven representations, enabled by modern deep-learning algorithms, enable us to build adaptive systems that capture the richness of this content. Specifically, the presentation focuses on deep representations for images and images+text.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210A
View Recording

S5870 - Audi Piloted Driving: In the Fast Lane to the Future

Daniel Lipinski Senior Engineer, Audi of America
Daniel Lipinski
Daniel started working for Audi in 2008 as the lead developer for the European Traffic Sign Recognition system. In 2012 he joined the Volkswagen Electronics Research Lab (ERL) in Silicon Valley, where he led the adaptation of several driver assistance systems to the U.S. market. Daniel is now the project lead for one of the most comprehensive Volkswagen Group and Audi research projects for piloted driving. One of his project cars is "Jack", the Audi piloted driving concept car that successfully completed the 550-mile automated driving road test from Silicon Valley to Las Vegas. Lipinski studied Computer and Communications Systems Engineering at the Technical University in Braunschweig, Germany.

On the eve of CES 2015, Audi, ERL, and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for over 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enables such complex technology is the massive growth in processing power, a field in which NVIDIA processors will play a central role in the future.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room LL21F
View Recording
View PDF

S5118 - Impressions: The Global Impact of Culture, Imagery and Visual Communication

Don Levy President & Cultivator, Smith Brook Farm
Don  Levy
Don Levy has been at the forefront of the entertainment industry's digital transformation, developing "the intersection of entertainment and technology" throughout his career and at Sony Pictures Entertainment (Columbia Pictures/Sony Pictures Digital) from 1995-2012. He founded Smith Brook Farm in 2012 as a creative consultancy and is also the co-founder of Spud Media, LLC, a new entertainment venture serving the family market. Levy attended New York University, received his B.A. from the University of Denver and earned certificates from UCLA's Anderson School of Business. Don is a member of the Academy of Motion Picture Arts & Sciences, serving on its feature animation nominating committee and recently chaired a working group for the Science and Technology Council. He also is a member of The Television Academy's Interactive Peer Group, The Visual Effects Society, ASIFA Hollywood, the International Photographers Guild and METAL, the Media, Entertainment and Technology Alpha Leaders organization. Levy is a frequent speaker on the subjects of innovation, digital creativity, education and visual effects. His 2012 talk on the principles and evolution of visual effects at the TED Conference in Long Beach, CA was posted on TED.com in January 2013. He is active in local education issues and organizes TEDxConejo in association with the Conejo Valley (Thousand Oaks, Ca) Unified School District.

We are what we see. The question is: how does what we see influence our lives and the lives of future generations? We live in a visual world. This has brought us closer together and enabled people everywhere to share everything from the latest pop culture phenomenon to the most catastrophic news. Infographics and animation explain every subject. From an early age, I've appreciated the power of images to move people. Today, the line between fact and fiction is virtually gone. Many of the images that impressed me in my most formative years were of dreams and hope and aspiration. Others made me think. With a curiosity born of my Hollywood experience in the dream factory, and thinking back on how the pictures of my own youth continue to influence me, I'll share with you some thoughts and ideas.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Augmented Reality & Virtual Reality; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21D
View Recording
View PDF

S5373 - GPU + Drones + 3D Imaging for Precision Farming and Construction

Bingcai Zhang Tech Fellow, BAE Systems
Bingcai Zhang
Dr. Zhang is a technical fellow at BAE Systems, the premier global defense and aerospace company. He joined BAE Systems in September 1995 right out of the University of Wisconsin-Madison, where he earned his Ph.D. in engineering and an M.S. in computer science. His research interests are: (1) geospatial information technology and 3D mapping; (2) robot vision and unmanned systems; and (3) 3D geoweb search. He has held positions as chief architect, chief photogrammetrist, R&D manager, and technical fellow with BAE Systems. Dr. Zhang has three inventions: (1) Embedded Photogrammetry, (2) Next Generation Automatic Terrain Extraction (NGATE), and (3) Automatic 3D Object Extraction (AFE). Embedded photogrammetry is a concept to embed a precise 3D measurement component called photogrammetry into non-photogrammetry applications such as GIS and CAD. NGATE generates 3D terrain models from stereo images. AFE is a production-capable system that automatically extracts 3D objects such as houses, buildings, and trees from a digital surface model or LiDAR point clouds.

Agriculture and construction are two of the largest industries in the world. The democratization of 3-D imaging technology, combining drones, digital cameras, and GPUs, is directly applicable to precision farming and construction. Precision farming can increase crop yields, reduce pollution, save water, and increase productivity; demand for it continues to grow as more people live on a planet with fixed natural resources. Timely, precise 3-D measurements are equally important for construction, where today most of these measurements are obtained manually. BAE Systems is developing GPU-accelerated 3-D imaging technology using drone images for precision farming and construction.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210B
View Recording
View PDF

S5599 - Gesture Recognition: Using a Multi Sensor Approach

Shalini Gupta Senior Research Scientist, NVIDIA Research
Shalini Gupta
Shalini Gupta is a Senior Research Scientist at NVIDIA Research. Formerly, she was a Senior Mobile Computer Vision Engineer at NVIDIA, and an Imaging Scientist at Texas Instruments. Shalini received her Doctoral degree in Electrical and Computer Engineering from the University of Texas at Austin in 2008.

For accurate and power-efficient in-vehicle hand-gesture recognition, we present a novel multi-sensor system comprising a short-range radar, a color camera, and a depth camera, which together make the system robust to variable lighting conditions. We jointly calibrate the radar and depth sensors, and use a convolutional deep neural network to fuse data from the multiple sensors and classify the gestures. The algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car, during the day and at night, while consuming significantly less power than purely vision-based systems.
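As a rough illustration of one way multi-sensor classification can work (a hypothetical late-fusion sketch with made-up scores, not the network described above), per-sensor class scores can be averaged before picking the winning gesture:

```python
# Hypothetical late fusion of per-sensor gesture scores (illustrative only).
def fuse_scores(per_sensor_scores):
    """Average class scores across sensors; return the winning class index."""
    n_sensors = len(per_sensor_scores)
    n_classes = len(per_sensor_scores[0])
    fused = [sum(scores[c] for scores in per_sensor_scores) / n_sensors
             for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)

# Radar, color, and depth each score three candidate gestures.
radar = [0.2, 0.5, 0.3]
color = [0.1, 0.7, 0.2]
depth = [0.3, 0.4, 0.3]
print(fuse_scores([radar, color, depth]))  # -> 1: gesture 1 wins after fusion
```

A jointly trained network, as in the talk, fuses earlier (at the feature level), but the robustness argument is the same: a sensor degraded by lighting is outvoted by the others.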

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21F
View Recording
View PDF

S5811 - TK1-Based Solutions for Intelligent Video Analytic Applications

Hai Tao CEO, Beijing Vion Technology Inc. (BVT)
Hai  Tao
Hai Tao is the founder and CEO of Beijing Vion Technology Inc. He has 25 years of experience in image processing and computer vision. Prior to BVT, he was an associate professor at UC Santa Cruz. Dr. Tao received his Ph.D. degree from University of Illinois at Urbana Champaign.

This talk demonstrates how GPU-based embedded computer vision systems are transforming the world of video processing in several vertical markets, including ATM safety, intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. By taking full advantage of the TK1's 300+ GFLOPS of computing power, BVT has built and deployed embedded systems for people counting, shopping traffic gender and age analysis, perimeter monitoring, violence and chasing detection, and ATM service area protection. These application systems require the development of custom computer vision algorithms and efficient implementation of these algorithms on the GPU. In addition, we will demonstrate how the world's first TK1-based smart cameras are being developed for various applications, including license plate recognition, face recognition, and crowd management. Compared to previous DSP-based smart camera solutions, the powerful embedded GPU-based solution is the first that can support imaging sensor resolutions up to 12 megapixels. The talk will also provide technical details on the CUDA implementation of several computer vision algorithms.

Level: All
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210G
View Recording

S5873 - Optimized GPU Kernels for Deep Learning

Amir Khosrowshahi CTO and Co-Founder, Nervana Systems
Amir is CTO of Nervana Systems, a startup bringing unprecedented performance and scale to deep learning. He has a Ph.D. from UC Berkeley, and an MA and BA from Harvard.

Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant performance improvements over existing methods. In particular, we show how operations such as convolutions and dense matrix multiply can be efficiently implemented using a custom assembler to attain state-of-the-art performance on the NVIDIA Maxwell GPU architecture. Additionally, we can significantly reduce memory bandwidth and run much larger models by using limited precision with a minimal tradeoff in model accuracy.
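As a toy illustration of the limited-precision tradeoff mentioned above (generic fixed-point rounding, not Nervana's kernels), weights can be rounded to a fixed number of fractional bits and the resulting worst-case error measured:

```python
# Illustrative fixed-point quantization: snap values to steps of 2**-frac_bits.
def quantize(x, frac_bits):
    scale = 2 ** frac_bits
    return round(x * scale) / scale

weights = [0.1, -0.37, 0.52, 0.9801]
q8 = [quantize(w, 8) for w in weights]   # 8 fractional bits (fine steps)
q2 = [quantize(w, 2) for w in weights]   # 2 fractional bits (coarse steps)
err8 = max(abs(w - q) for w, q in zip(weights, q8))
err2 = max(abs(w - q) for w, q in zip(weights, q2))
print(err8 <= err2)  # coarser precision gives a larger worst-case error
```

Halving the bits per value roughly halves memory traffic, so as long as the added rounding error barely moves model accuracy, the bandwidth savings are nearly free.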

Level: Beginner
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210A
View Recording
View PDF

S5740 - Clarifai: Scaling Deep Learning

Matthew Zeiler CEO, Clarifai
Matthew Zeiler
Matthew Zeiler, PhD, Founder and CEO of Clarifai Inc. studied machine learning and image recognition with several pioneers in the field of deep learning at University of Toronto and New York University. His insights into neural networks produced the top 5 results in the 2013 ImageNet classification competition. He founded Clarifai to push the limits of practical machine learning, which will power the next generation of intelligent applications and devices.

The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. Although the fundamental techniques were developed in the 1980s and 1990s, it was only recently that they were applied at large scale, due to the advent of general-purpose GPU computing and the availability of internet-scale datasets. The deep learning experts at Clarifai have spent years working alongside pioneers of the field and form a team with vast experience developing new deep learning techniques and building state-of-the-art systems that solve real problems. In this talk we will present some of the latest technologies we have developed and show how they can be applied to power a new generation of intelligent applications.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210A
View Recording

S5761 - Achieving Real-Time Performances on Facial Motion Capture and Animation on Mobile GPUs

Emiliano Gambaretto Director of Research, Mixamo
Emiliano Gambaretto
Emiliano Gambaretto obtained a PhD degree in Bioengineering from Politecnico di Milano (Italy) in 2011 for his research on Markerless Motion Capture. Part of his research was carried out at Stanford Biomotion Lab in 2006. He also received a Master's Degree in Biomedical Engineering from Politecnico di Milano and a Diplome d'Ingenieur from Ecole Centrale de Lyon (France) in 2006. He was part of Mixamo's founding team in 2008. He's currently Director of Research at Mixamo. His job consists of designing and developing the technologies behind Mixamo's services. That includes motion models, auto-rigging, real-time face animation, integration with 3rd-party software and industry standards.

3D animation is one of the most prominent forms of contemporary art, with millions of people drawn to its emotional power in movie theaters and games every year. Mixamo developed a GPU-powered facial capture and animation technology that enables users to animate a character's face in real time. This technology, originally targeted at desktop and laptop GPUs, is now available on mobile thanks to the improved performance of new-generation hardware. This presentation will focus on the challenges faced and the strategies adopted to port this technology to Tegra K1-powered devices. We adopted two parallel approaches: one optimized our tracking algorithm and ported our code from OpenCL to CUDA; the other completely changed the facial tracking paradigm, focusing on an intrinsically faster machine learning approach based on a cascade of simple regressors. We will compare the performance and strengths of both approaches.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Developer - Performance Optimization; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210B
View Recording

S5918 - Ubiquitous Perceptive 3D Sensing for a Smart Internet of Things

Louay Eldada CEO, Quanergy Systems, Inc.
Louay Eldada is founder and CEO of Quanergy Systems, Inc., a privately held Silicon Valley-based technology company developing and manufacturing smart sensors and sensing systems. Louay is a serial entrepreneur, having founded and sold three businesses to Fortune 100 companies. His fourth start-up, Quanergy, is developing compact low-cost high-performance high-reliability LiDAR (light detection and ranging) sensors and software used for capturing and processing 3D mapping data in real time. In transportation, the data will be utilized to greatly improve the accuracy and reliability of on-board driver safety systems and enhance them with object recognition and scenario analysis capability, as well as enable autonomous driving in the future. Quanergy has established early partnerships with global automotive, mining and digital mapping companies, and will be expanding its market footprint into logistics, robotics, aeronautics, security, and 3D-aware consumer electronics. Prior to Quanergy, Dr. Eldada was VP of technology at Amprius, a developer of lithium ion batteries based on silicon nanowire anodes. He was earlier CSO of SunEdison where he led innovation programs in photovoltaic and energy storage systems, after serving as CTO of HelioVolt, where he led the development of thin film photovoltaic technologies. He was earlier CTO of DuPont Photonic Technologies, a business that was formed from the acquisition by DuPont of Telephotonics, a company that he founded and where he led as CTO the development of optoelectronic telecommunication modules. His first industry job was at Honeywell, where he started the Telecom Photonics business and directed its R&D division. The success of the business led to its acquisition by Corning, where he continued to direct technical development. 
He has chaired and organized 160 conferences; delivered 200 keynotes and invited talks/courses; published 270 technical papers, books, and book chapters; received 50 technical awards; and holds 65 patents. He studied business administration at Harvard, MIT, and Stanford, and holds a Ph.D. in optoelectronics from Columbia University.

Innovations in perceptive smart sensors comprising solid state 3D LiDARs and GPUs with artificial intelligence software have reached a cost level that allows them to be deployed ubiquitously, supporting a smart Internet of Things (IoT). These smart sensors provide real-time information on billions of 'Things' and their surroundings (through 3D object detection, tracking, and classification) and, when needed, the ability to control them. The 'Things' include vehicles, infrastructure, buildings, homes, appliances, light controls, thermostats, medical devices, computers, and handheld devices. Growth of personal devices (phones, tablets, laptops, game consoles) is limited by the number of people in the world. The largest growth will come from connected devices in areas such as smart energy, home automation, and transportation.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room LL21F
View Recording

S5751 - Stereovision and the Future of Autonomous Machines

Edwin Azzam CTO, STEREOLABS
Edwin Azzam co-founded STEREOLABS in 2010. As STEREOLABS’s Chief Technical Officer, Edwin is responsible for leading the company’s product development and technology strategy in stereoscopic image processing. Prior to founding STEREOLABS, Edwin was a project manager at Airbus Defence and Space. Edwin holds a Master’s degree in Optics & Image Processing from Institut d’Optique, France, as well as a Master’s degree in Management from ESSEC Business School. He is a PhD supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise for the assessment of national research projects in the field of computer vision and 3D image processing. Edwin was honored twice with the National Innovation Prize by the French Ministry of Research. Between 2010 and 2014, Edwin received 10 different distinctions for his achievements in the stereoscopic 3D field. In 2010, he won the European Innovation Award with STEREOLABS which recognizes the most promising and innovative technological companies in Europe.

Discover how stereovision and 3D depth sensing on mobile GPUs enable the development of future autonomous cars, drones and robots. We will discuss the benefits and challenges of using stereo cameras as depth sensing sensors, and how to leverage the power of embedded GPU to overcome these challenges. We will also show demonstrations of how the technology can be used to create 3D surrounding reconstruction, detect obstacles and navigate autonomously.
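The depth sensing described above rests on stereo triangulation: for a rectified camera pair with focal length f (in pixels) and baseline B (in meters), a pixel disparity d corresponds to depth Z = f * B / d. A minimal sketch, with made-up camera parameters (700 px focal length, 12 cm baseline):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Triangulate depth (meters) from disparity for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px

# Hypothetical camera: 700 px focal length, 0.12 m baseline.
print(depth_from_disparity(700, 0.12, 35))  # 2.4 m
print(depth_from_disparity(700, 0.12, 7))   # 12.0 m: small disparity = far away
```

The inverse relationship is the core challenge for depth sensing at range: distant obstacles produce sub-pixel disparities, which is where GPU-accelerated dense matching earns its keep.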

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Automotive; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210B
View Recording

S5317 - Development of a GPU Accelerated Visual Tracking Framework

David Concha Researcher, Universidad Rey Juan Carlos
David Concha
David received his B.Sc. Degree in Computer Science from Universidad Rey Juan Carlos (URJC) in 2011 and is currently a Ph.D. student and grant holder at Universidad Rey Juan Carlos. His research interests focus on Computer Vision and GPU Computing. Some research works done recently, exploits the graphics hardware to accelerate Computer Vision algorithms. In particular, David uses GPUs to accelerate methods related to 3D/2D motion tracking, medical image reconstruction, face recognition, high-definition depth maps computation, image segmentation, etc.

This session presents the development of a visual tracking system whose ultimate goal is to track multiple articulated objects. Throughout the development, different technologies for GPU programming are used, like OpenGL, Cg and CUDA; various types of sensor such as cameras or Kinects; and different methodologies like particle filters, Kalman filter or Variable Neighborhood Search (VNS) metaheuristic.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 210B
View Recording
View PDF

S5205 - Real-Time and High Resolution Feature Tracking and Object Recognition

Peter Andreas Entschev Software Engineer, ArrayFire
Peter Andreas Entschev
Peter Entschev is currently a software engineer at ArrayFire, where he primarily works on concurrent computer vision problems. He has received his Bachelor's degree in Telecommunication Systems and Master's degree in Computer Science from the Federal University of Technology - Paraná (UTFPR), Brazil. Before joining ArrayFire, he worked on real-time computer vision research at SEW-Eurodrive in Germany and with system administration and development of Linux distributions for the Brazilian Government.

This session will cover real-time feature tracking and object recognition in high-resolution videos using GPUs and productive software libraries, including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches have been proposed to solve these problems, some of the most important being SIFT, SURF, and ORB. Traditionally, these approaches are so computationally complex that processing more than a few frames per second is impossible. Using an NVIDIA K20 GPU with ORB, we are able to process more than 30 frames per second on images on the order of 10000x10000 pixels. Multiple quality and timing benchmarks will be presented, covering some of the most robust feature tracking methods.
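One reason ORB maps so well to GPUs is that its descriptors are binary strings compared by Hamming distance, which reduces matching to XOR and popcount. A toy sketch with 8-bit descriptors (real ORB descriptors are 256 bits, and this is not ArrayFire's implementation):

```python
def hamming(a, b):
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")

def best_match(query, train):
    """Index of the training descriptor nearest to the query."""
    return min(range(len(train)), key=lambda i: hamming(query, train[i]))

train = [0b10110100, 0b01001011, 0b11110000]
print(best_match(0b10110101, train))  # -> 0: differs from train[0] by one bit
```

Brute-force matching like this is embarrassingly parallel, one thread per query-train pair, which is exactly the shape of workload that keeps a K20 busy at 30 fps on huge frames.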

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210B
View Recording
View PDF

ECS5001 - CEO Show & Tell: Paracosm

Amir Rubin CEO, Paracosm
Amir Rubin
Amir co-founded Paracosm in Jan 2013, and currently serves as CEO. Prior to Paracosm, Amir spent the past 10 years developing 3D-vision systems and 3D-simulation applications.

He co-founded his first company, Prioria Robotics, while he was still a computer-engineering student at the University of Florida. At Prioria he developed vision systems for small drones. Most recently, he was employee #1 at a successful Florida startup, Shadow Health, which develops interactive healthcare simulations. He also holds a patent for weighing cows based on 3D-imagery and photographs of them.

Paracosm enables robots and augmented reality applications to understand and interact with the world around them. Their core technology is a spatial intelligence platform that provides the tools to collaboratively capture interior spaces, generate 3D maps, and create immersive experiences. They are a team of engineers, artists, and dreamers based in Gainesville, FL and Cambridge, MA.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 10:15 - 10:30
Location: Room 220B

ECS5002 - CEO Show & Tell: Herta Security

Javier Rodríguez Saeta CEO, Herta Security
Javier Rodríguez Saeta
Javier Rodríguez Saeta received the B.S., M.S. and Ph.D. degrees in Electrical (Telecommunication) Engineering from the Technical University of Catalonia, UPC, Barcelona (Spain), in 2000 and 2005, respectively. He has also received the B.A. degree in Business Administration by the Open University of Catalonia (UOC), and the MBA by ESADE Business School. In 2000 he worked for Robert Bosch, GmbH, in Hildesheim (Germany). In 2001, he joined Biometric Technologies, S.L., in Barcelona (Spain), where he was the R&D Manager. He founded Herta Security in 2009 and became the CEO of the company.

Herta Security is a world leader in the development of cutting-edge facial recognition solutions. Based in Barcelona, Spain, with offices in Madrid and London, the company offers fast, accurate, robust, end-customer-oriented solutions for video surveillance, access control, and marketing requirements.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:30 - 10:45
Location: Room 220B

S5221 - Tracking Objects Better, Faster, Longer

Alptekin Temizel Associate Professor, Middle East Technical University
Alptekin Temizel
Dr. Alptekin Temizel is an associate professor at the Informatics Institute, Middle East Technical University (METU). He received his BSc in Electrical and Electronic Engineering from METU, Ankara, Turkey (1999) and his PhD from the Centre for Vision, Speech and Signal Processing, University of Surrey, UK (2006). Between 1999 and 2001 he worked as a research assistant at the University of Hertfordshire, UK. He co-founded Visioprime Ltd., UK (a company developing intelligent video systems for security and surveillance applications) and worked there as a senior research engineer between 2001 and 2006. Since 2006 he has been a professor at the Graduate School of Informatics, Middle East Technical University (METU), Turkey, and a consultant to several R&D companies. He is the principal investigator of the Virtual Reality and Computer Vision Research Group (VRCV), an NVIDIA CUDA Teaching Center and CUDA Research Center. His main research interests are image and video processing, video surveillance, computer vision, parallel programming, and GPU programming.

In this talk, we demonstrate a real-time long-term tracker, Hybrid-TLD (H-TLD), based on the recently proposed Tracking-Learning-Detection (TLD) framework. TLD simultaneously tracks the object, learns its appearance, and detects when it re-appears. While it has shown promising results, its high computational cost prohibits running it at higher resolutions and frame rates. We present our analysis of the framework and the modifications we made to run it effectively in a hybrid CPU-GPU setting, with high utilization of both processing units using OpenMP and CUDA. Our results show that a 10.25x speedup at 1920x1080 resolution can be obtained. The source code of the developed H-TLD library has been made publicly available.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210B
View Recording
View PDF

S5633 - Robust Speech Recognition for Cars

Ian Lane Assistant Research Professor, Carnegie Mellon University
Ian Lane
Ian Lane is an Assistant Professor at Carnegie Mellon University. He leads the speech and language-processing group at CMU Silicon Valley and performs research in the areas of speech recognition, spoken language understanding and speech interaction. Ian and his group are developing methods to accelerate speech and language technologies using GPUs.

One aspect of speech recognition work at Carnegie Mellon University focuses specifically on noise-robust speech recognition for automotive environments. By combining state-of-the-art methods in deep learning with GPU-accelerated embedded hardware, we are able to significantly improve the performance of speech recognition, even in challenging noise conditions.

Level: All
Type: Talk
Tags: Automotive; Signal & Audio Processing; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room LL20D
View Recording

ECS5004 - CEO Show & Tell: Jibo

Cynthia Breazeal Chief Scientist, Jibo
Cynthia Breazeal
Dr. Cynthia Breazeal is Chief Scientist of Jibo, Inc. She is also an Associate Professor at the MIT Media Lab, where she directs the Personal Robots Group. Breazeal is recognized as a prominent innovator at the intersection of technology and society and as the pioneer of social robotics. Her research spans the creation of intelligent and socially responsive robots and the study of their impact on people's quality of life across education and learning, creativity, health, telecommunications, aging, entertainment, play, and more. Jibo, Inc. brings the technologies, design insights, and user experience of social robots to the home as the world's first family robot, helping busy families manage, care for, coordinate, and connect with loved ones with greater ease, engagement, efficacy, and fun. As an open platform, Jibo enables third-party developers to bring the engagement and emotional lift of social robots to their apps and services.

Described by the company as the "world's first family robot," Jibo looks straight out of Pixar, but the plans that founder and Chief Scientist Breazeal has for the in-home social robot are very real. Jibo first appeared on the scene last summer as an Indiegogo crowdfunding campaign, bringing in the tidy sum of $2.3 million, and recently announced it has raised $25.3 million in Series A funding. It has also taken almost 5,000 pre-orders to date, which are expected to start shipping at the end of this year. What will Jibo do? When fully realized, Jibo will engage as a helpful part of the family: a companion who knows the other members, provides them with personalized messages and reminders, serves as the family photographer, tells stories to the kids, and so on. As an open third-party developer platform, Jibo's skills will continue to expand, eventually providing services like ordering pizza and much more. The company's goal is for Jibo to "help families manage, care for, coordinate, and connect with greater ease, engagement, efficiency, and fun."

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 11:00 - 11:15
Location: Room 220B

S5147 - Faster Convolutional Neural Networks by Separable Filters

Che-Rung Lee Professor, National Tsing Hua University
Che-Rung Lee
Che-Rung Lee received his B.S. and M.S. degrees in Computer Science from National Tsing Hua University Taiwan in 1996 and 2000 respectively, and the Ph.D. degree in Computer Science from University of Maryland, College Park in 2007. He joined the Department of Computer Science at National Tsing Hua University as an assistant professor in 2008 and became an associate professor in 2013. His research interests include numerical algorithms, scientific computing, high-performance computation, and cloud computing. He is the chair of CCOE (CUDA Center Of Excellence) in NTHU (National Tsing Hua University). He is a member of IEEE and SIAM.

Learn how to accelerate the training of convolutional neural networks (CNNs) for image recognition on the GPU with separable filters. The method uses singular value decomposition (SVD) to approximate each 2D filter by the product of two 1D filters, then performs two 1D convolutions consecutively. The GPU implementation consists of two kernels. The first is a batched SVD routine that decomposes many small matrices simultaneously. The second computes the convolution, combining three methods that use different memory spaces for different filter sizes. Experimental results show that the implementation achieves a 1.35x~2.66x speedup in the forward and backward passes compared to state-of-the-art GPU implementations of CNNs.
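The separable-filter idea described in this abstract can be sketched in a few lines of NumPy. This is a simplified, CPU-only illustration under assumed details (a symmetric Gaussian test filter and "valid" boundary handling); the session's actual implementation uses batched SVD and convolution kernels on the GPU:

```python
import numpy as np

# Build an exactly separable 2D filter (outer product of two 1D Gaussians).
g = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
g /= g.sum()
K = np.outer(g, g)  # 5x5 2D filter, rank 1

# SVD: K = U @ diag(s) @ Vt. For a rank-1 filter, the first singular
# triplet recovers the two 1D factors exactly.
U, s, Vt = np.linalg.svd(K)
col = U[:, 0] * np.sqrt(s[0])   # vertical 1D filter
row = Vt[0, :] * np.sqrt(s[0])  # horizontal 1D filter
assert np.allclose(np.outer(col, row), K)

def conv2d_full(img, k):
    """Direct 'valid' 2D correlation (reference implementation)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def conv2d_separable(img, col, row):
    """Two consecutive 1D 'valid' passes: along rows, then along columns."""
    # np.convolve flips its kernel (true convolution), so pre-flip to
    # match the correlation-style conv2d_full above.
    tmp = np.apply_along_axis(np.convolve, 1, img, row[::-1], mode='valid')
    return np.apply_along_axis(np.convolve, 0, tmp, col[::-1], mode='valid')

img = np.random.rand(32, 32)
assert np.allclose(conv2d_full(img, K), conv2d_separable(img, col, row))
```

Because the rank-1 approximation turns one k×k convolution into two length-k ones, the arithmetic cost per output pixel drops from O(k²) to O(k), which is the source of the speedup the abstract reports.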

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room 210A
View Recording

S5295 - Next Generation Surround-View for Cars

Miguel Sainz Principal Engineer, Computer Vision, NVIDIA
Miguel Sainz is a Principal Engineer in Computer Vision at NVIDIA. His research interests include real-time 3D graphics, image-based modelling and rendering, camera calibration, 3D model reconstruction from images, tracking, and image processing. Prior to joining NVIDIA, Miguel received a degree in Electrical Engineering from the Polytechnic University of Catalonia (UPC), Spain, and a Ph.D. in Electrical and Computer Engineering from the University of California, Irvine.
Timo Stich Sr. Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Timo Stich
Timo Stich is a Senior Developer Technology Engineer at NVIDIA Corporation. His focus is on image processing applications of graphics processors. Prior to joining NVIDIA, he was research staff in Computer Graphics and Image Processing at the Max-Planck-Institute for Computer Science and at the Computer Graphics Lab of Brunswick University, Germany. He received a diploma degree in Computer Science from Mannheim University, Germany, and a Ph.D. from Brunswick University, Germany.

A robust proof-of-concept Surround-Vision and Top-View system for cars uses four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA for GPGPU work and OpenGL for rendering the final views. Topics covered include the placement and calibration of the cameras, color correction, and data preprocessing. A technical deep dive into the common pitfalls will highlight typical visual artefacts in Top-View visualizations and present the algorithmic building blocks to correct those errors.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room LL20D
View Recording
View PDF

S5478 - I Can't Believe It's Not Just Molecular Dynamics (It's Machine Learning Too).

Scott LeGrand Principal Engineer, Amazon
Highly-Rated Speaker
Scott Le Grand is currently a principal engineer at Amazon working on the personalization team. He developed the first molecular modeling system for home computers, Genesis, in 1987; Folderol, the distributed computing project targeted at the protein folding problem, in 2000; and BattleSphere, a networkable 3D space shooter for the Atari Jaguar, the same year. Surprisingly, all three of these efforts shared a common codebase. More recently, he ported the Folding@Home codebase to CUDA, achieving a 5x speedup over previous efforts; it currently accounts for ~2.6 petaFLOPs of the project's computational firepower. He is best known for his work porting the AMBER molecular dynamics package to CUDA, attaining record-breaking performance in the process. In a previous life, Scott picked up a B.S. in biology from Siena College and a Ph.D. in biochemistry from the Pennsylvania State University. In the current life, he is optimizing the performance of deep neural networks by day and continuing to work on AMBER by night.

There is a surprising algorithmic overlap between deep neural networks (DNNs) and molecular dynamics. This talk describes the bidirectional technology transfers between these two seemingly disparate fields that have resulted from applying the wisdom gained porting the AMBER molecular dynamics package to four generations of NVIDIA GPUs over the past six years to the development of a deep neural network system. Finally, I will present record-breaking AMBER performance numbers for Maxwell GPUs and GPU clusters.

Level: Intermediate
Type: Talk
Tags: Life & Material Science; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 14:00 - 14:50
Location: Room 212A
View Recording
View PDF

ECS5006 - Early Stage Challenge: GeekSys

Luiz Vitor Martinez Cardoso CEO, GeekSys
Luiz Vitor Martinez Cardoso
Luiz is a 26-year-old engineer and entrepreneur who was nominated as one of the most innovative professionals in both communication and marketing in South America. Luiz earned a dual engineering degree in computer and electronics engineering from a top Brazilian school and has prior experience in academia and at small, mid-sized, and multinational companies, including GE. A self-learner, Luiz has a very special way of seeing the world, being able to anticipate trends, architect technologies from scratch, and deliver them to market. Today Luiz is dedicating all his efforts to making GeekSys the leader in the Store Performance Management (SPM) field.

GeekSys is the most innovative and awarded brick-and-mortar analytics start-up in Brazil today. GeekSys was born from the curiosity of two engineering classmates who, back in 2010, were trying to develop a marketing tool to understand how customers behave in front of a shop window. After two years of intense R&D, GeekSys released its first product into the market. From 2012 to 2014, GeekSys released three more products and received 7 awards in technology, innovation, and business models. GeekSys was the first company in the world able to read and quantify purchase intention inside physical stores and translate it into a more natural language for retailers. GeekSys also created the Store Performance Management (SPM) concept, comparable to CRM and ERP. Today GeekSys has big customers in Brazil and is integrating all its previous complex technologies into a single, uniform platform, as never seen before. GeekSys is working hard to lead the retail analytics market, and our key advantage is the ability to use technology as a path to value.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:30 - 14:38
Location: Room 220B

S5255 - Power Efficient Visual Computing on Mobile Platforms

Brant ZHAO GPU Architect, NVIDIA
Brant is a GPU architect at NVIDIA Shanghai, focusing on GPU computing analysis and architecture investigation. His work targets performance- and power-optimized implementations of computing applications on the current generation of GPUs, as well as architecture improvements for the next generation of GPUs to help current applications achieve better GPU utilization and power efficiency.

Tegra K1 brings a desktop-class GPU to the mobile world, making it possible for mobile platforms to succeed at increasingly complex visual computing tasks. With more powerful Tegra family chips to come, many more compute applications are expected in the mobile world. Besides performance tuning, it is critical to make these applications power efficient, as they run on mobile devices with limited power budgets. In this work, we present a methodology for power analysis and optimization of mobile computing workloads. Three case studies illustrate the three elements of the methodology: (1) analyze the whole pipeline at the system level; (2) use energy-efficient features of the target platform; (3) reduce the total instruction count to save energy.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210B
View Recording
View PDF

S5629 - Reconstruction Networks for Efficient Face Detection and Landmark Localization

Bo Yu Visiting Researcher, Carnegie Mellon University
TBD
Ian Lane Assistant Research Professor, Carnegie Mellon University
Ian Lane
Ian Lane is an Assistant Professor at Carnegie Mellon University. He leads the speech and language-processing group at CMU Silicon Valley and performs research in the areas of Speech Recognition, spoken language understanding and speech interaction. Ian and his group are developing methods to accelerate speech and language technologies using General Purpose Graphics Processing Units (GPUs). His group has already obtained 1000x speedup for signal processing tasks, 100x speedup for Viterbi training and over 20x speedup for complex tasks such as graph search. These new technologies have enabled the group to explore novel interaction paradigms for human machine interaction.

In this talk we introduce Reconstruction Networks, a novel neural network-structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an image without explicitly performing image segmentation or generating key point descriptors. We show that Reconstruction Networks can learn the structure of face and facial landmarks automatically, even under various poses and illumination conditions and outperform state-of-the-art performance for Face Detection and Facial Landmark Localization while requiring only a fraction of the computational cost.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210A
View Recording

ECS5009 - Early Stage Challenge: Replica Labs

Jack Morrison CTO, Replica Labs
Jack Morrison
Jack got started programming during his undergrad at Bowdoin College, where he got hooked on robotics and computer vision as a member of the Northern Bites RoboCup team. Since then, he's spent his hours working on optical navigation for UAVs and researching distributed SLAM with cellphones. He resides in Boulder, Colorado with his fiance and their pets.

Replica Labs is a computer vision company focused on dense reconstruction from video feeds alone. Using the highly parallelizable power of NVIDIA's CUDA technology, we are able to translate single-lens video feeds, available from any smartphone, into dense and highly accurate 3D reconstructions. Replica Labs' current focus is to use this core technology to disrupt the way consumers take measurements of objects in their everyday lives. With our robust software solution, we are able to reconstruct objects in a consumer's home with sub-millimeter accuracy! Replica's first product, Rendor, will empower billions of phones to become 3D scanners, transforming the landscape of 3D rendering and the reach such information will have in e-commerce. With a small team of computer vision scientists and engineers, Rendor is currently under development and in an open beta phase of testing.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 15:00 - 15:08
Location: Room 220B

S5577 - Building State-of-Art Face Processing Pipeline with GPU

Shuchang Zhou Principal Scientist, Megvii Inc.
Shuchang Zhou
Shuchang Zhou received his bachelor's degree from Tsinghua University in 2004 and his master's degree from the National University of Singapore in 2005. Before joining Megvii in 2014, he worked as an assistant professor at the Chinese Academy of Sciences and as a software engineer at Google. He holds multiple U.S. and international patents.

Megvii Inc. has revisited face-related problems with deep learning techniques powered by GPUs. Substantial progress has been made, and performance keeps increasing as data flows in. This brings facial recognition closer to solving the identity problem, which is fundamental to the security, credibility, and accountability of the Internet. The availability and power efficiency of GPUs enable Megvii to explore deeper and more complex neural network topologies, handle higher-resolution images and videos, and extend to embedded devices with more limited power profiles. At the time of writing, Face++ by Megvii is a leading cloud face recognition service provider, having processed more than 40 billion images and running on 50 million devices.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: HPC & Science

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210B
View Recording
View PDF

ECS5012 - Early Stage Challenge: QM Scientific

Faris Alqadah CEO, QM Scientific
Faris Alqadah
Faris leads the overall vision of QM Scientific to be the smartest shopping intelligence platform that answers consumers everyday questions simply, accurately and in real-time. In addition, he leads data science development and holds a PhD in the field from the University of Cincinnati. Prior to QM Scientific, Faris built very large scale consumer propensity, segmentation, and recommender systems as a senior data scientist at PayPal. Previously, he served as a fellow at the Johns Hopkins School of Medicine where he applied data science research to solve challenging problems in genomics and proteomics. His data science research work has been published in leading peer-reviewed conferences and journals and was awarded a Best Doctoral Forum Poster Award and twice nominated for Best Paper Awards. For fun Faris bangs his head to epic heavy metal music and fights monsters with his two children.

QM Scientific (QMS) is a shopping intelligence company whose platform empowers consumers to make smart buying decisions in real time. Targeting the grocery retail vertical, our goal is to enable consumers to answer everyday questions such as "What is the best store to shop at right now for my list?", "Are there cheaper alternatives for the products I buy regularly?", and "How much do I spend on diapers and beer monthly?" simply, accurately, and in real time. The QMS platform utilizes proprietary data science, computer vision, and natural language processing technology to intelligently extract, connect, and organize millions of products and prices from thousands of sources, including the web, partner datasets, and receipt/product images. In December 2014, QMS launched PriceSwarm, a grocery price comparison app built on the QMS platform. With PriceSwarm, users create shopping lists in natural language and the platform recommends a store to shop at by optimizing price, quality, shopping behavior, and location. In addition, users contribute real-time prices and receive personalized cost-saving recommendations and analytics by simply snapping a picture of their receipt.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:30 - 15:38
Location: Room 220B

S5368 - Nonlinear Structured Prediction Using the GPU

Alexander Schwing Postdoctoral Fellow, University of Toronto
Alexander Schwing
Alex Schwing's research interests are optimization algorithms, statistical models, and the parallelization of implementations for high-performance computing environments. An interesting playground for all three fields is inference and structured prediction in machine learning and computer vision, in particular 3D scene understanding. He is currently working with Prof. Ruslan Salakhutdinov and Prof. Raquel Urtasun as a postdoc in the machine learning group of the Computer Science department at the University of Toronto. In 2013 he completed his PhD under the supervision of Prof. Marc Pollefeys, Prof. Tamir Hazan, and Prof. Raquel Urtasun in the Computer Vision and Geometry (CVG) group of the computer science department of ETH Zurich (ETHZ).

Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e.g., a single object within an image. We show how to enrich deep learning to jointly predict a set of random variables while leveraging learned variable correlations. To this end we present an efficient GPU driven algorithm based on neural networks that is able to jointly capture nonlinearities for multiple variables and their correlations.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210A
View Recording
View PDF

S5546 - GPU Accelerated Haze Removal on Tegra K1

Bin Zhou Adjunct Research Professor, University of Science and Technology of China
Bin Zhou
Dr. Bin Zhou is the director and chief scientist of the Marine Information Processing Laboratory (MIPL) at the Institute of Oceanography, Shandong Academy of Sciences. He serves as an Adjunct Research Professor in the School of Information Science and Technology at USTC and is an NVIDIA CUDA Fellow. He is the PI of the CUDA Research Center (CRC) at the Institute of Advanced Technology (IAT), USTC. At MIPL, he leads a team working on information processing systems for marine environmental pollution and natural hazard monitoring and for ocean-atmosphere simulation. At the CRC, he performs research on drone control, video processing, and computer vision algorithms on the NVIDIA GPU/CUDA platform.

This talk shows how the Tegra K1 GPU accelerates dehazing for outdoor computer vision systems. Toxic haze has become a major air pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method to the dehazing process, very good results are achieved; however, the heavy processing requirements pose big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Experiments show a 156x speedup compared to the ARM CPU. The results show that the Tegra K1 has great potential for embedded real-time computer vision processing.
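For background, the dark channel prior the abstract builds on (He et al.) can be sketched in plain NumPy. This is a hypothetical, single-threaded illustration with assumed parameter choices (`omega`, `t0`, patch size); the per-pixel minimum filters it relies on are exactly the kind of data-parallel work such a talk would map onto the Tegra K1 GPU:

```python
import numpy as np

def dark_channel(img, patch=15):
    """Min over the RGB channels, then a min-filter over a local patch."""
    mins = img.min(axis=2)
    pad = patch // 2
    padded = np.pad(mins, pad, mode='edge')
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def dehaze(img, omega=0.95, t0=0.1, patch=15):
    """Recover scene radiance J from hazy image I via I = J*t + A*(1-t)."""
    dc = dark_channel(img, patch)
    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, int(dc.size * 0.001))
    idx = np.unravel_index(np.argsort(dc, axis=None)[-n:], dc.shape)
    A = img[idx].mean(axis=0)
    # Transmission estimate, then inversion of the haze model.
    t = 1.0 - omega * dark_channel(img / A, patch)
    return (img - A) / np.maximum(t, t0)[..., None] + A
```

Every pixel's dark-channel value and transmission estimate is independent of the others, which is why a GPU port of these loops can yield the large speedups the abstract reports.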

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210B
View Recording

S5789 - The Fast Lane from Silicon Valley to Munich

Uwe Higgen BMW Group Technology Office USA, BMW Group
Uwe Higgen
Effective September 1, 2014, Uwe Higgen was appointed Head of the BMW Group Technology Office USA. In this position, he is responsible for accelerating the delivery of automotive innovation to customers through the evaluation, development and design of new technologies. Higgen oversees a team of highly talented engineers specializing in Connected Car, Electromobility, Powertrain, Autonomous Driving and User Experience/Interface Design. His Silicon Valley-based team produces work that enables BMW to be the future, see the future and reimagine the future of world-class automotive engineering for individual mobility. The BMW Group Technology Office USA focuses on human-machine interfaces, mechatronics, infotainment, telematics and creating new portals and opportunities for business communication. Prior to his arrival in the U.S., Higgen served as the Head of the BMW Group AppCenter in Munich. In this role, he was responsible for delivering the integration of information and entertainment smartphone applications into the vehicle. Higgen began his career at BMW Group in 2001. He holds a master's degree in Computer Science from the Carl von Ossietzky University of Oldenburg.

Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development, including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.

Level: Beginner
Type: Talk
Tags: Automotive; Embedded Systems; Computer Vision & Machine Vision; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL20D
View Recording
View PDF

S5457 - Maximizing Face Detection Performance on GPUs

Paulius Micikevicius Developer Technology Engineer, NVIDIA
Paulius Micikevicius is a developer technology engineer at NVIDIA, focusing on performance analysis and optimization. Prior to joining NVIDIA he was an assistant professor of Computer Science at Armstrong Atlantic State University. Paulius holds a PhD in Computer Science from the University of Central Florida and a BSc from Midwestern State University.

In this talk we look at GPU performance optimization for face detection using several techniques and features, including cascades with Haar-like features and multi-block local binary patterns (LBP). For each approach we examine the implementation tradeoffs and performance limiters, as well as how performance depends on the data. We also investigate optimizations that combine the approaches and prune additional work.
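As background for the multi-block LBP features mentioned above, here is a minimal NumPy sketch of computing one MB-LBP code. The block layout and bit ordering are illustrative assumptions; production detectors compute the block means from an integral image, so each feature costs only a few lookups regardless of block size:

```python
import numpy as np

def mb_lbp(img, y, x, bh, bw):
    """Multi-block LBP: average each of the 9 blocks in a 3x3 grid whose
    top-left corner is (y, x) and whose blocks are bh x bw pixels; compare
    the 8 outer block means with the center block mean to form an 8-bit code."""
    means = np.array([[img[y + r*bh:y + (r+1)*bh, x + c*bw:x + (c+1)*bw].mean()
                       for c in range(3)] for r in range(3)])
    center = means[1, 1]
    # Clockwise order starting at the top-left block.
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(order):
        code |= int(means[r, c] >= center) << bit
    return code

# On a uniform patch every block mean ties with the center, so all 8 bits set.
flat = np.full((9, 9), 7.0)
assert mb_lbp(flat, 0, 0, 3, 3) == 255
```

A cascade then tests many such codes at many positions and scales, which is why the memory access pattern of the block means dominates the GPU optimization story.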

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210B
View Recording
View PDF

S5637B - ZFAS - The Brain of Piloted Driving at Audi

Matthias Rudolph Head of Architecture Driver Assistance Systems, Audi AG
Matthias Rudolph
Dr. Rudolph studied Electrical Engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics, with a minor in mathematics, from Iowa State in 1999. After holding various positions at Audi, he took over the lead of the department "Architecture Driver Assistance Systems" in 2009. The zFAS project is one of the department's core developments. Dr. Rudolph is a member of the management at Audi.

During the last several years, Audi has developed, together with partners, a platform that enables piloted driving and piloted parking. At CES 2015, the system was shown driving piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of future potential leveraging deep learning algorithms.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Cars

Day: Wednesday, 03/18
Time: 16:00 - 16:25
Location: Room LL20D
View Recording

S5319 - CUDA in Urban Search and Rescue: Mission Planning Module for the ICARUS Project

Pawel Musialik Programmer and Young Researcher, Institute of Mathematical Machines
Pawel Musialik
Pawel is a graduate of the Warsaw University of Technology and currently a Ph.D. candidate at the Military University of Technology in Warsaw. Since February 2012, Pawel has been a young scientist and programmer at the Institute of Mathematical Machines. His current research topics are semantic maps, 3D point cloud analysis, and quantitative and qualitative reasoning. Pawel possesses over 5 years of C++ experience, 2 years as a CUDA programmer, and 4 years of experience in academic lectures.

This session will concentrate on mission planning for search and rescue personnel and how CUDA can help with this task. Urban search and rescue is a challenging and important activity in today's society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) concentrates on aiding these efforts by providing robotic components for rescue teams. Adding mobile robots to the mix raises the need for additional planning effort, which would consume a lot of time using a classical approach. We will present how this can be avoided by using CUDA-based mission planners to solve tasks like path planning, patrolling, communication relay placement, etc. A number of CUDA-implemented algorithms will be shown along with example results.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 210A
View Recording
View PDF

S5331 - GPUs and Machine Learning: A Look at cuDNN

Sharan Chetlur Software Engineer, NVIDIA
Highly-Rated Speaker
Sharan Chetlur
Sharan Chetlur is an engineer at NVIDIA working in the CUDA Libraries and Algorithms Group. He currently works in the fields of Deep Learning and Neural Networks, and is a developer of the cuDNN library. Previously, his focus was on applications in Linear Algebra, working as a developer on the cuBLAS and cuSparse libraries. Sharan holds a Master's Degree in Computer Engineering from the University of Florida.

We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, Neural Network framework developers had to implement these low-level routines for GPUs on an ad-hoc basis, optimizing individual computational kernels by hand and repeating this work as new parallel processors emerged. cuDNN alleviates this burden by providing tuned black box implementations of these functions. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage across GPU generations. We discuss supported functionality, algorithmic implementation details and performance achieved.
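cuDNN's routines are tuned GPU implementations of deep learning primitives, the most important of which is convolution. As a purely illustrative sketch (plain Python for clarity, not cuDNN's actual API), the core operation one of those routines computes looks like this:

```python
# Illustrative sketch of the kind of primitive cuDNN provides: a direct
# 2D cross-correlation (the "convolution" used in neural networks).
# Pure Python for readability; cuDNN supplies heavily tuned GPU kernels
# for the same mathematics.

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation of a 2D image with a 2D kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out

# A 3x3 box filter over a 4x4 image of ones yields a 2x2 output of 9s.
image = [[1.0] * 4 for _ in range(4)]
kernel = [[1.0] * 3 for _ in range(3)]
print(conv2d(image, kernel))  # [[9.0, 9.0], [9.0, 9.0]]
```

The value of a library like cuDNN is precisely that framework authors no longer hand-optimize loops like these per GPU generation; the library picks an efficient implementation behind a stable interface.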

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 210A
View Recording

S5612 - Fighting Malware With GPUs in Real Time

Peter Kovac Senior Researcher, Avast Software
Peter Kovac
Peter Kovac has been working for Avast for nearly five years, currently holding the position of Senior Researcher. He is one of the authors of the GPU database that powers the classifier discussed in this talk. Peter believes in simple solutions for complex problems and likes to read fantasy books.

Dive deep into the problem of protecting electronic devices such as PCs, smartphones, and tablets against malicious software. In this talk we will show you how we handle the ever-increasing number of malware samples produced by the malware ecosystem every day. To leverage similarities between samples for automated classification, we built a distributed database engine relying on GPUs. With query times of a fraction of a second, even with a compound distance function, this system is able to classify incoming samples in real time. Samples classified as malware are used directly to generate rules that identify similar samples on our customers' machines.
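The classification scheme the talk describes amounts to similarity search under a compound distance. A hypothetical plain-Python sketch of the idea (the features, weights, and distance terms below are made up for illustration; the real system runs such queries across a distributed GPU database):

```python
# Hypothetical sketch of sample classification by nearest neighbor under a
# compound distance mixing several per-feature distances. Features and
# weights are invented stand-ins, not Avast's actual model.

def compound_distance(a, b, weights=(0.5, 0.3, 0.2)):
    """Weighted mix of three simple per-feature distances in [0, 1]."""
    size_d = abs(a["size"] - b["size"]) / max(a["size"], b["size"], 1)
    entropy_d = abs(a["entropy"] - b["entropy"]) / 8.0
    import_d = 1.0 - (len(a["imports"] & b["imports"]) /
                      max(len(a["imports"] | b["imports"]), 1))
    w1, w2, w3 = weights
    return w1 * size_d + w2 * entropy_d + w3 * import_d

def classify(sample, labeled_db):
    """Label a sample by its nearest neighbor in the labeled database."""
    nearest = min(labeled_db, key=lambda s: compound_distance(sample, s))
    return nearest["label"]

db = [
    {"size": 4096, "entropy": 7.8,
     "imports": {"VirtualAlloc", "WriteProcessMemory"}, "label": "malware"},
    {"size": 90000, "entropy": 4.1,
     "imports": {"CreateWindowExW", "ShowWindow"}, "label": "clean"},
]
incoming = {"size": 5000, "entropy": 7.5,
            "imports": {"VirtualAlloc", "WriteProcessMemory"}}
print(classify(incoming, db))  # malware
```

The GPU's role in such a system is to evaluate the compound distance against millions of database entries in parallel, which is what makes sub-second query times feasible.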

Level: All
Type: Talk
Tags: Big Data Analytics; Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: HPC & Science

Day: Thursday, 03/19
Time: 10:00 - 10:25
Location: Room LL21C
View Recording
View PDF

S5665 - GPUs and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Olga Russakovsky Ph.D. Student , Computer Science, Stanford University
Olga Russakovsky
Olga Russakovsky (http://ai.stanford.edu/~olga) is a computer science PhD student at Stanford University advised by Professor Fei-Fei Li. Her main research interest is in computer vision, specifically focusing on large-scale object detection and recognition. For the past two years she has been the lead organizer of the international ImageNet Large Scale Visual Recognition Challenge which was featured in the New York Times, MIT Technology Review, and other international media venues. She has organized several workshops at top-tier computer vision conferences: the ImageNet challenge workshop at ICCV’13 and ECCV’14, the upcoming workshop on Large-Scale Visual Recognition and Retrieval at CVPR’15, and the new Women in Computer Vision workshop at CVPR’15. During her PhD she collaborated closely with NEC Laboratories America and with Yahoo! Research Labs. She was awarded the NSF Graduate Fellowship and the CRA undergraduate research award.
Alex Berg Assistant Professor, UNC Chapel Hill
Alex is interested in all aspects of computer vision and related problems in other fields. His thesis was on shape and object recognition in images using a new take on deformable templates. He also works on large scale machine learning algorithms for object recognition and detection, image retrieval, recognizing and synthesizing human action in video, recovering human body poses from photographs, detecting and identifying human faces in images, detecting vehicles in images, and more.

This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of large-scale image recognition, a history of the ILSVRC and an overview of current techniques and trends in image classification and object detection, as well as the role that GPUs have played in this challenge.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room 210A
View Recording

S5540 - Building a Life-Size Automultiscopic Display Using Consumer Hardware

Andrew Jones Research Programmer, USC Institute for Creative Technologies
Andrew Jones has been a researcher in the Graphics Lab at the USC Institute for Creative Technologies since 2002. His research has covered reconstructing the Parthenon in Athens, high dynamic range photography, and 3D scanning of human faces, bodies, and performances. Currently, Andrew is finishing up his PhD work on rendering for automultiscopic 3D displays.

Automultiscopic displays allow multiple users to experience 3D content without the hassle of special glasses or head gear. Such displays generate many simultaneous images with high angular density, so that each eye perceives a distinct and different view. This presents a unique challenge for content acquisition and rendering. In this talk, we explain how to build an automultiscopic display using off-the-shelf projectors, video splitters, and graphics cards. We also present a GPU-based algorithm for rendering a large number of views from a sparse array of video cameras.

Level: Intermediate
Type: Talk
Tags: Visualization - Large Scale & Multi-Display; Augmented Reality & Virtual Reality; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room LL21F
View Recording
View PDF

S5788 - Recent Advances in Deep Learning at Microsoft: A Selected Overview

Li Deng Partner Research Manager, Microsoft
Li Deng received the Ph.D. degree from the University of Wisconsin-Madison. He was an assistant professor, tenured associate and full professor at the University of Waterloo, Ontario, Canada during 1989-1999, and then joined Microsoft Research, Redmond, USA, where he currently leads R&D of application-focused deep learning and machine intelligence as Partner Research Manager of its Deep Learning Technology Center. He is Fellow of the IEEE, and Editor-in-Chief of IEEE Signal Processing Magazine and of IEEE/ACM Transactions on Audio, Speech, and Language Processing.

Since 2009, Microsoft has engaged with academic pioneers of deep learning and has created industry-scale successes in speech recognition as well as in speech translation, object recognition, automatic image captioning, natural language, multimodal processing, semantic modeling, web search, contextual entity search, ad selection, and big data analytics. Much of this success is attributable to the availability of big datasets for training deep models, powerful general-purpose GPU computing, and innovations in deep learning architectures and algorithms. In this talk, a selected overview will be given to highlight our work in some of these exciting applications, as well as the lessons we have learned along the way as to what tasks are best solved by deep learning methods.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Thursday, 03/19
Time: 14:30 - 14:55
Location: Room 210A
View Recording

S5646 - RGBD Occlusion Detection via Deep Convolutional Neural Networks

Vivek Venugopalan Senior Research Scientist, United Technologies Research Center
Vivek Venugopalan is a Senior Research Scientist with United Technologies Research Center (UTRC). He works in the areas of hardware acceleration and reconfigurable platforms at UTRC for aerospace and building system applications.

Occlusion edge detection is a very challenging task in robotics and unmanned autonomous vehicles. Occlusion edges in images correspond to range discontinuities in the scene from the observer's point of view, and extracting these edges from raw images and videos is very computationally intensive. Deep learning techniques have largely replaced existing methods for extracting information in similar applications by mapping the problem to large multi-layer neural networks. These techniques rely on Deep Convolutional Neural Networks (DCNNs) with multiple hidden layers to capture the local spatial correlations that help identify occlusion edges in images and videos.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Embedded Systems

Day: Thursday, 03/19
Time: 15:00 - 15:25
Location: Room 210A
View Recording
View PDF

S5590 - Deep Neural Networks for Visual Pattern Recognition Problems

Dan Ciresan Senior Researcher, IDSIA
Dan Ciresan
Dr. Dan Ciresan received his PhD from Universitatea Politehnica Timisoara, Romania. He first worked as a postdoc before becoming a senior researcher at IDSIA, Switzerland. Dr. Ciresan is one of the pioneers of using CUDA for Deep Neural Networks (DNNs). His methods have won five international competitions on topics such as classifying traffic signs, recognizing handwritten Chinese characters, segmenting neuronal membranes in electron microscopy images, and detecting mitosis in breast cancer histology images. Dr. Ciresan has published his results in top-ranked conference proceedings and journals. His DNNs have significantly improved the state of the art on several image classification tasks.

GPU-optimized Deep Neural Networks (DNNs) excel on visual pattern recognition tasks. They are successfully used for automotive problems like pedestrian and traffic sign detection. DNNs are fast and extremely accurate. They help the field of connectomics by making it possible to segment and reconstruct the neuronal connections in large sections of brain tissue for the first time. This will bring a new understanding of how biological brains work. DNNs power automatic navigation of a quadcopter in the forest.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Medical Imaging; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Thursday, 03/19
Time: 15:30 - 15:55
Location: Room 210A
View Recording

S5300 - High Quality Real Time Image Processing Framework on Mobile Platforms using Tegra K1

Eyal Hirsch Mobile GPU Leader , SagivTech Ltd.
Mr. Eyal Hirsch has 15 years' experience as a software developer. Prior to joining SagivTech, Eyal was a member of AMD's OpenCL team in Israel, developing and working on AMD's OpenCL driver. Before AMD, Eyal was a team leader at Geomage, a leading software company in the oil and gas field. Geomage deployed one of the very first commercial GPU clusters in Israel, consisting of many GPUs. Eyal developed all the GPU implementations and was responsible for all aspects of the GPU life cycle from development through production. Before Geomage, Eyal served as a team leader at Cyota, which was later acquired by RSA.

Real-time image processing involves computationally intensive tasks. It becomes extremely important for mobile platforms equipped with cameras, e.g. wearable devices. Image processing algorithms perfectly suit the GPU architecture, and their implementation on discrete GPUs is well established. Now that compute-enabled GPUs are available on mobile platforms, real-time image processing is easier to achieve. SagivTech is a partner in Google's Project Tango, where it implemented Mantis Vision's depth algorithms on the Tegra K1. Hear SagivTech experts on applying computer vision algorithms to the Tegra K1. We will share our experience, provide tips on mobile GPU computing, and demonstrate the advantages of implementing state-of-the-art computer vision algorithms such as FREAK, BRISK, and DoG.

Level: All
Type: Talk
Tags: Video & Image Processing; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 16:30 - 16:55
Location: Room LL21A
View PDF

S5585 - Multi-GPU Training for Large-Scale Visual Object Recognition

Wei Xia Research Scientist, Orbeus
Wei Xia
Wei Xia is a research scientist at Orbeus Inc., expected to receive his Ph.D. in Computer Vision and Machine Learning from the National University of Singapore in 2014. He has rich research experience in generic object classification, detection, and segmentation. He won both the segmentation and classification competitions in the PASCAL VOC Challenge 2012, the runner-up award in the ILSVRC 2013 challenge, and the winner award in the ILSVRC 2014 challenge, two of the most impactful competitions in this field. He visited Lund University, Sweden, as a visiting scholar in 2013. He has published many academic papers in top international computer vision conferences and journals, and was awarded the President Graduate Fellowship (1%) for his achievements in both research and coursework at the National University of Singapore. He has also served as a reviewer for many international conferences and journals, including ECCV, BMVC, ICASSP, ICPR, ICIP, ACM MM, TCSVT, and MVP. His industry experience includes research internships at Panasonic Singapore Laboratory (2012-2013) and Singapore 2359 Media Pte Ltd (2013).

Despite the great progress of deep learning models (Deep Convolutional Neural Networks) in visual recognition over the past few years, one of the greatest bottlenecks is the extremely long training time (from several weeks to months) needed to handle tens of millions of training images. The goal of this session is to share the results we achieved using multiple GPUs installed in one server to speed up the training process. By configuring 16 GPUs (8 Titan Zs) and optimizing the parallel implementation of CNN training, a speedup of up to 14x is achieved without compromising, and in some cases even improving, the model's accuracy. Comprehensive experimental results demonstrate the linear scalability of the proposed multi-GPU training process.
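The multi-GPU strategy described above is commonly realized as synchronous data parallelism: each GPU computes gradients on its shard of the mini-batch, and the shards' gradients are averaged (an all-reduce) before a single weight update. A minimal plain-Python sketch of that one step, using a toy 1-D linear model as a stand-in (this is an assumption-laden illustration, not Orbeus' implementation):

```python
# Sketch of one synchronous data-parallel SGD step. Each "worker" stands in
# for a GPU; the model is a toy 1-D linear fit y = w * x with squared error.

def shard(batch, n_workers):
    """Split a mini-batch into n_workers roughly equal shards."""
    k, r = divmod(len(batch), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        end = start + k + (1 if i < r else 0)
        shards.append(batch[start:end])
        start = end
    return shards

def local_gradient(w, examples):
    """Per-worker mean gradient of squared error for y = w * x."""
    g = 0.0
    for x, y in examples:
        g += 2.0 * (w * x - y) * x
    return g / len(examples)

def data_parallel_step(w, batch, n_workers, lr=0.1):
    shards = shard(batch, n_workers)
    grads = [local_gradient(w, s) for s in shards if s]
    avg = sum(grads) / len(grads)   # the all-reduce in a real system
    return w - lr * avg             # single synchronized weight update

batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
print(data_parallel_step(0.0, batch, n_workers=4))
```

In a real system the communication cost of the all-reduce is what limits scalability, which is why the data-partitioning and communication strategies the session mentions matter so much.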

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Video & Image Processing

Day: Thursday, 03/19
Time: 17:00 - 17:25
Location: Room 210A
View Recording
View PDF

S5280 - Deep Learning at Scale

Ren Wu Distinguished Scientist, Baidu
Ren Wu
Dr. Ren Wu is a distinguished scientist at Baidu Research. He leads the effort to push the frontier of deep learning and artificial intelligence (AI) via high-performance heterogeneous computing, aiming to make AI capable of doing anything, anywhere, at any time. Prior to joining Baidu, Ren served as chief software architect of the Heterogeneous System Architecture (HSA) at AMD. Earlier, he was the principal investigator of the CUDA Research Center at HP Labs, where he is widely known for his pioneering work in using GPUs to accelerate big data analytics, as well as his contributions to large-scale machine learning algorithms via the GPU. Ren is also known for his work in computational artificial intelligence and computer game theory. He was the author of a world-champion Xiangqi (Chinese chess) program and the first person to systematically study Xiangqi endgames computationally, making astonishing discoveries that corrected and improved human understanding and knowledge.

We present a state-of-the-art image recognition system, Deep Image, developed using end-to-end deep learning. The key components are a custom-built supercomputer dedicated to deep learning, a highly optimized parallel algorithm using new strategies for data partitioning and communication, larger deep neural network models, novel data augmentation approaches, and usage of multi-scale high-resolution images. On one of the most challenging computer vision benchmarks, the ImageNet classification challenge, our system has achieved the best result to date, with a top-5 error rate of 5.98% - a relative 10.2% improvement over the previous best result.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Supercomputing; Big Data Analytics; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Thursday, 03/19
Time: 17:30 - 17:55
Location: Room 210A
View Recording
View PDF

S5869 - SenDISA: Distributed Intelligent, Video, Sensor & Actuator Analytics Platform for Smart Cities (Presented by Sensen)

Dr. Subhash Challa CEO, Sensen Networks
With a focus on sales and strategy, I help my team close sales and manage major accounts in a variety of markets including transportation, security, gaming, and hospitality. Prior to taking up the full-time role of CEO of SenSen Networks in January 2012, I was a Senior Principal Scientist at NICTA, University of Melbourne, where I led a number of ICT-for-life-sciences projects. I started my professional career as a Research Fellow at the University of Melbourne in 1998, where I led a number of tracking and data fusion projects. With a deep and passionate interest in taking ideas to usable products, I spent over a decade of my career in R&D and product development. I was Professor of Computer Systems Engineering at the University of Technology Sydney from 2004-2007.

This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including Abu Dhabi Airport, the Singapore Police, Roads & Maritime Services Australia, the Westgate Bridge in Melbourne, Australia, the City of Trondheim, Norway, and the cities of Brisbane, Ipswich, and Manly. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable big data analytic applications by fusing data from video, sensor, and IoT devices and combining them with other transaction data to deliver smart city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific big data solutions serving different verticals.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Computer Vision & Machine Vision; Video & Image Processing; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Thursday, 03/19
Time: 17:30 - 17:55
Location: Room 210B
View Recording
View PDF

S5346 - 3 Engineers, 2 Months: The World's First Operating Room Enhanced by High Performance Computing

John Clarke CTO, Cydar Ltd
John Clarke
John has a long history of working at the extremes of computing. His career started with microcomputers deployed at the bottom of oil wells and has now reached the cloud. Along the way he has analyzed the risk of financial transactions, located cell phones using cellular time-of-flight, and led teams building world leading codec hardware IP. John holds a PhD in robotics and machine vision from the University of Oxford.

Learn how a tiny team built and deployed a 424TFlop/s supercomputer in only two months. This supercomputer provides real-time enhanced visualizations to endovascular surgeons during aortic aneurysm repair. Real-time machine vision demands not only massive parallel data processing but also massive dataflows and unavoidably serial processing. In this talk, we describe how three advanced machine vision algorithms were each taken from a single high-end GPU and moved to a cloud of GPU servers, where the price-performance sweet spot is far from the high end. We describe the design and performance of our work- and data-distribution systems, which solve the cloud-specific problems of slow intra-cloud networking and occasional cloud server hiatuses.

Level: All
Type: Talk
Tags: Medical Imaging; Computer Vision & Machine Vision; Supercomputing; Press-Suggested Sessions: Deep Learning & Computer Vision; Press-Suggested Sessions: HPC & Science

Day: Friday, 03/20
Time: 09:00 - 09:50
Location: Room LL21B
View Recording
View PDF

S5435 - Using the Power of the GPU to Connect the Web to the Real World

Rob Manson CEO & Co-founder, buildAR.com
Rob Manson
Rob Manson is CEO & co-founder of buildAR.com, the world's first cloud based Augmented Reality Authoring Platform launched in 2009. Rob is one of the editors of the Media Stream Depth Extensions Specification and an Invited Expert with the ISO, W3C and the Khronos Group. He's an active evangelist within the global AR and standards communities and is regularly invited to speak on the topics of the Augmented Web, Augmented Reality, WebRTC and the development of multi-device platforms.

This session will take a detailed look at the various media stream processing pipelines available on the Web Platform and how the optimization of these will be critical in the near future. We will look specifically at how you can use GPUs directly from Javascript for vision and sensor processing. One specific example will explore how Depth Cameras can now be used to extend the web and the influence this may have on the other pipelines too. These streams of sensor and image data now make it possible to connect the web to the real world. GPUs are a key asset for taming this growing flood of data.

Level: Intermediate
Type: Talk
Tags: Web Acceleration; Augmented Reality & Virtual Reality; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 09:00 - 09:25
Location: Room LL21C
View Recording

S5208 - Streaming FFTs on Large 3D Microscope Images

Peter Steinbach HPC Developer, Max Planck Institute of Molecular Cell Biology and Genetics
Peter Steinbach
I studied at DESY (Hamburg and Zeuthen), Humboldt University of Berlin, and the University of Leipzig, from which I received a Diploma in Physics. After that, I completed a PhD in particle physics by analysing data from the ATLAS experiment at the Large Hadron Collider (CERN, Switzerland). I am now a High Performance Computing (HPC) Developer at the Max Planck Institute of Molecular Cell Biology and Genetics, where I help scientific groups develop fast software that harnesses the capabilities of today's HPC installations.

Dive deep into efficient and fast memory transfers of multi-gigabyte image data to perform swift iterative deconvolutions of 3D microscope imagery. Through the creation of an open-source GPU deconvolution implementation (github.com/psteinb/libmultiviewnative), I studied various techniques to orchestrate memory copies of multi-dimensional images. I will present concepts, available options, and details of efficient memory transfers from host to device memory. I will showcase CUDA/C++ code and discuss my experiences with various CUDA versions on NVIDIA hardware that led to 2-3x greater performance than just performing the calculations on the device. This work will enable the scientific community to push the limits of processing and handling data gathered by imaging living tissue.
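At its core, the transfer orchestration described here is a double-buffered streaming pipeline: while chunk i is being processed on the device, chunk i+1 is already being copied in. A plain-Python sketch of the pattern follows; the `upload` and `process` functions are stand-ins for asynchronous host-to-device copies and kernels that, on real hardware, would run on separate CUDA streams:

```python
# Conceptual double-buffered streaming: overlap the copy of the next chunk
# with the processing of the current one. A single background thread plays
# the role of the copy engine.

from concurrent.futures import ThreadPoolExecutor

def upload(chunk):            # stand-in for a host-to-device copy
    return list(chunk)

def process(chunk):           # stand-in for the on-device computation
    return [x * 2 for x in chunk]

def stream_process(data, chunk_size):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    results = []
    with ThreadPoolExecutor(max_workers=1) as copier:
        pending = copier.submit(upload, chunks[0])
        for nxt in chunks[1:] + [None]:
            current = pending.result()               # wait for copy of chunk i
            if nxt is not None:
                pending = copier.submit(upload, nxt) # start copy of chunk i+1
            results.extend(process(current))         # compute on chunk i
    return results

print(stream_process([1, 2, 3, 4, 5], 2))  # [2, 4, 6, 8, 10]
```

When transfer time and compute time per chunk are comparable, this overlap hides most of the copy cost, which is how transfers of data larger than device memory can still beat a device-resident-only baseline.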

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Life & Material Science; Computer Vision & Machine Vision; Data Center, Cloud Computing & HPC

Day: Friday, 03/20
Time: 10:00 - 10:25
Location: Room LL21A
View Recording

S5305 - A 2D Convolution Framework for Extreme Performance Tuning

Alan Wang Compute Architect, NVIDIA
Alan is a GPU architect in the computer vision field at NVIDIA. He is experienced in parallelization, performance modeling, and architecture-specific tuning, and is currently working on 2D convolution projects. Before joining the computer architecture team, Alan worked on graphics tracing and FPGA architecture & EDA software.

We propose a 2D convolution framework that (1) maintains a unified abstraction incorporating a series of optimization techniques and (2) can auto-tune performance on different GPUs. We quantify and analyze the performance impact of each individual strategy, revealing its potential when applied to other applications. Experiments show that an algorithm tuned by our framework can reach nearly 80% GFLOPS utilization when targeting the GM107.
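Auto-tuning of the kind described typically times each candidate kernel configuration on the target device and keeps the fastest. A generic, hypothetical sketch of that loop in plain Python (the tile-size candidates and the workload are stand-ins, not the framework's actual search space):

```python
# Generic auto-tuning loop: benchmark each candidate configuration and
# return the fastest. The "kernel" below only mimics that different
# tilings cost different amounts of work per call.

import time

def run_convolution(data, tile_size):
    # Stand-in for launching a tiled 2D-convolution kernel with the given
    # tile size; loop granularity varies with tile_size.
    s = 0.0
    step = max(1, tile_size)
    for i in range(0, len(data), step):
        s += sum(data[i:i + step])
    return s

def autotune(data, candidates, repeats=3):
    """Time each candidate config `repeats` times; return the fastest."""
    best_cfg, best_t = None, float("inf")
    for cfg in candidates:
        t0 = time.perf_counter()
        for _ in range(repeats):
            run_convolution(data, cfg)
        elapsed = (time.perf_counter() - t0) / repeats
        if elapsed < best_t:
            best_cfg, best_t = cfg, elapsed
    return best_cfg

data = [1.0] * 10000
print("fastest tile size:", autotune(data, candidates=[1, 8, 32, 128]))
```

Because the timing is done on the target GPU itself, the same framework can pick different winning configurations on different architectures without manual retuning.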

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Developer - Performance Optimization; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 10:30 - 10:55
Location: Room LL21A
View Recording

S5734 - Khronos API Standards Update: Including Vulkan, OpenCL 2.1 and SPIR-V

Neil Trevett Vice President Mobile Ecosystem, NVIDIA
Neil has spent over thirty years in the 3D graphics industry and is currently responsible for driving the advanced apps ecosystem for NVIDIA Tegra. Neil is also the elected President of the Khronos Group industry standards consortium where he initiated the OpenGL ES standard, helped catalyze the WebGL project and chairs the OpenCL working group. Previously, as Vice President of 3Dlabs, Neil was at the forefront of the silicon revolution bringing interactive 3D to the PC, and he established the embedded graphics division of 3Dlabs to bring advanced visual processing to a wide-range of non-PC platforms. Neil was elected President for eight consecutive years of the Web3D Consortium dedicated to creating open standards for communicating real-time 3D on the Internet. Neil graduated from Birmingham University in the UK with a First Class Joint Honors B.Sc. in electronic engineering and computer science and holds several patents in the area of graphics technology.

Discover how over 120 companies cooperate at the Khronos Group to create open, royalty free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics and vision applications. This session includes the very latest updates, including the newly announced Vulkan, SPIR-V and OpenCL 2.1 specifications.

Level: All
Type: Talk
Tags: Real-Time Graphics; Computer Vision & Machine Vision; Web Acceleration

Day: Friday, 03/20
Time: 11:00 - 11:25
Location: Room LL21D
View Recording

S5720 - DeepFont: Large-Scale Real-World Font Recognition from Images

Jianchao Yang Research Scientist, Adobe
Jianchao Yang
Jianchao is a research scientist at Adobe Research in San Jose, California. His research interests are in the broad area of computer vision, machine learning, and image processing, with a focus on image categorization, object recognition and detection, image and video super-resolution, denoising and deblurring, metric learning, data embedding, and deep learning. He received his M.S. and Ph.D. degrees from the ECE Department of the University of Illinois at Urbana-Champaign (UIUC) under Prof. Thomas S. Huang. Prior to that, he obtained his B.S. degree from the EEIS Department of the University of Science and Technology of China (USTC).

This work addresses the problem of recognizing the font style of text in an image. Our algorithm is based on a carefully designed deep convolutional neural network. Since collecting real-world text images for font recognition is extremely difficult, we have to resort to synthetic training data, which unfortunately has a large domain mismatch with real-world test examples. Besides data augmentation techniques that add synthetic degradations, we also present a domain adaptation framework to bridge the gap between synthetic training and real-world testing. In particular, we introduce a convolutional neural network decomposition approach, based on stacked convolutional auto-encoders, to obtain effective features for classification. The model is trained on millions of images, which could not have been done without GPUs and CUDA. The proposed DeepFont system achieves top-5 accuracy of over 80% on a large labeled real-world test set we collected.

Level: Beginner
Type: Talk
Tags: Machine Learning & Deep Learning; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Friday, 03/20
Time: 14:30 - 14:55
Location: Room 210A
View Recording
View PDF

Talk
 

TUTORIAL

Presentation
Details

S5796 - Image Learning and Computer Vision in CUDA (Presented by ArrayFire)

Peter Andreas Entschev Software Engineer, ArrayFire
Peter Entschev is currently a Software Developer at ArrayFire, where he primarily works on concurrent computer vision problems. He received his Bachelor's degree in Telecommunication Systems and Master's degree in Computer Science from the Federal University of Technology - Paraná (UTFPR), Brazil. Before joining ArrayFire, he worked on real-time computer vision research at SEW-Eurodrive in Germany and on system administration and the development of Linux distributions for the Brazilian government.

Analyzing a massive data set? Need fast results? Need computer vision algorithms? Not sure when and where to start? The answer is here and now! In this tutorial we will give you the tools to bring your favorite computer vision algorithm to life. We will go over key challenges in implementing computer vision and machine learning algorithms on the GPU, walk you through several computer vision algorithms for the GPU (ORB, FAST, SIFT), and give you the hands-on experience to implement your own algorithms.

Level: All
Type: Tutorial
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 16:20
Location: Room 210F
View Recording
View PDF

Tutorial
 

PANEL

Presentation
Details

S5816 - Project Tango Tablet: Application Rapid Fire Presentations

Larry Yang Lead Product Manager, Project Tango, Google
Larry Yang
Larry Yang is the lead product manager of Project Tango, responsible for creating an ecosystem of devices, applications and services that use the Project Tango platform. Before Project Tango he led the product management team for Google Fiber, and before that he was the product manager for the Google TV platform and partnerships. Larry has 15 years of experience creating innovative consumer products and services, including leading product management for a new consumer video conferencing business at Cisco Systems and serving as general manager of the Xbox 360 console group at Microsoft. Before his work with consumer electronics, he spent 10 years in various microprocessor development and marketing roles at Sun Microsystems. Larry has a B.S. and M.S. in Electrical Engineering from Stanford University.
Eric Lee CTO & Partner, Left Field Labs
Eric Lee
Eric Lee is CTO and partner at Left Field Labs, a storytelling company based in Venice, CA that believes technology is pushing humanity towards a new era of art, culture, and commerce. Eric is driven by building interactive products and platforms that showcase the power of emerging technologies in enhancing the human experience. He has worked in design and engineering for over 15 years, and has led the creation of award-winning projects ranging from apps and websites to 3D-printed music players and virtual reality games.
Jeff Schmitz Senior Technical Artist & Graphics Programmer, NVYVE
Jeff  Schmitz
Jeff Schmitz is a senior technical artist and graphics programmer at NVYVE, a leading company in architectural and product visualization, and Unite 2014 award winner for Best Visual Simulation. With over 10 years of experience, Jeff has gained an unparalleled wealth of knowledge and experience in Unity, Unreal Engine and OpenGL. Always looking for challenges and passionate about what he does, Jeff has been working to bring the very latest in cutting edge 3D graphics techniques into the visualization and simulation industry.
Iman Mostafavi Co-Founder and COO, Limbic
Iman  Mostafavi
Iman Mostafavi is co-founder and COO of Limbic, a mobile-focused game studio best known for its hit games Zombie Gunship and Tower Madness. He left the Ph.D. program in Computer Science at U.C. San Diego to pursue his dream of making games. Before Limbic, Iman created visualization software to help neuroscientists study the brain.

Come hear from the first wave of application developers exploring the unique odometry and depth sensor capabilities of Google's Tango tablet, which uses the Tegra K1. Five leading-edge developers will showcase the applications they are developing for Tango, show how they are using Tango's spatial awareness, and share the lessons learned so far.

Level: All
Type: Panel
Tags: Computer Vision & Machine Vision; Press-Suggested Sessions: Deep Learning & Computer Vision

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21A
View Recording
View PDF

Panel
 

KEYNOTE

Presentation
Details

S5818 - Deep Learning: What's Next

Andrew Ng Chief Scientist, Baidu
Dr. Andrew Ng is Chief Scientist at Baidu. He leads Baidu Research, which includes three interrelated labs: the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, speech recognition, natural language processing and semantic intelligence. In addition to his role at Baidu, Dr. Ng is a faculty member in Stanford University's Computer Science Department, and Chairman of Coursera, an online education platform that he co-founded. Dr. Ng is the author or co-author of over 100 published papers in machine learning, robotics and related fields. He holds degrees from Carnegie Mellon University, MIT and the University of California, Berkeley.

Deep Learning has transformed many important tasks, including speech and image recognition. Deep Learning systems scale well by absorbing huge amounts of data to create accurate models. The computational resources afforded by GPUs have been instrumental to this scaling. However, as Deep Learning has become more mainstream, it has generated some hype, and has been linked to everything from world peace to evil killer robots. In this talk, Dr. Ng will help separate hype from reality, and discuss potential ways that Deep Learning technologies can benefit society in the short and long term.

Level: All
Type: Keynote
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Press-Suggested Sessions: General Interest

Day: Thursday, 03/19
Time: 11:00 - 12:00
Location: Hall 3
View Recording
View PDF

Keynote
 

HANDS-ON LAB

Presentation
Details

S5647 - Hands-on Lab: DIY Deep Learning for Vision with Caffe

Evan Shelhamer PhD Student / Lead Developer of Caffe, UC Berkeley
Evan Shelhamer
Evan Shelhamer is a PhD student at UC Berkeley advised by Trevor Darrell as a member of the Berkeley Vision and Learning Center. His research is on deep learning and end-to-end optimization for vision. He is the lead developer of the Caffe deep learning framework and takes his coffee black.
Yangqing Jia Research Scientist, Google
Yangqing Jia
Yangqing Jia finished his Ph.D. in computer vision at UC Berkeley supervised by Trevor Darrell in May 2014. He is now a research scientist at Google. His main interests lie in large-scale and cognitive science inspired vision systems. His work focuses on enabling efficient learning of state-of-the-art features and human-like concept generalization from perceptual inputs. He was in the GoogLeNet team that won several of the ILSVRC 2014 challenges. He was also the recipient of the best paper award at ECCV 2014. He is the original author and a core developer of Caffe.
Jon Long PhD Student, UC Berkeley
Jon Long
Jon is a PhD student with Trevor Darrell at UC Berkeley and a developer of the Caffe deep learning framework. Jon seeks powerful and efficient vision systems, building on our advancing understanding of deep learning. His recent research focuses on object detection and segmentation.

This tutorial is designed to equip researchers and developers with the tools and know-how needed to incorporate deep learning into their work. Both the ideas and implementation of state-of-the-art deep learning models will be presented. While deep learning and deep features have recently achieved strong results in many tasks, a common framework and shared models are needed to advance further research and applications and reduce the barrier to entry. To this end we present the Caffe framework that offers an open-source library, public reference models, and working examples for deep learning. Join our tour from the 1989 LeNet for digit recognition to today's top ILSVRC14 vision models. Follow along with do-it-yourself code notebooks. While focusing on vision, general techniques are covered.

Level: All
Type: Hands-on Lab
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 14:00 - 15:20
Location: Room 211A

S5574 - Hands-on Lab: Applied Deep Learning for Vision, Natural Language and Audio with Torch7

Soumith Chintala Research Engineer, Facebook
Soumith is a Research Engineer at Facebook AI Research. Prior to joining Facebook in August 2014, Soumith worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. In the past, Soumith worked on state-of-the-art deep learning models for pedestrian detection, natural-image OCR, and depth images, among others, driving his research heavily with CUDA and multiple GPUs.

This is a hands-on tutorial targeted at machine learning enthusiasts and researchers and covers applying deep learning techniques on classifying images, videos, audio and natural language data. The session is driven in Torch: a scientific computing platform that has great toolboxes for deep learning and optimization among others, and fast CUDA backends with multi-GPU support. Torch is supported by Facebook, Google, Twitter and a strong community who actively open-source their code and packages.

Level: Beginner
Type: Hands-on Lab
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Signal & Audio Processing

Day: Wednesday, 03/18
Time: 15:30 - 16:50
Location: Room 211A

S5574B - Hands-on Lab: Applied Deep Learning for Vision, Natural Language and Audio with Torch7

Soumith Chintala Research Engineer, Facebook
Soumith is a Research Engineer at Facebook AI Research. Prior to joining Facebook in August 2014, Soumith worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. In the past, Soumith worked on state-of-the-art deep learning models for pedestrian detection, natural-image OCR, and depth images, among others, driving his research heavily with CUDA and multiple GPUs.

This is a hands-on tutorial targeted at machine learning enthusiasts and researchers and covers applying deep learning techniques on classifying images, videos, audio and natural language data. The session is driven in Torch: a scientific computing platform that has great toolboxes for deep learning and optimization among others, and fast CUDA backends with multi-GPU support. Torch is supported by Facebook, Google, Twitter and a strong community who actively open-source their code and packages.

Level: Beginner
Type: Hands-on Lab
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Signal & Audio Processing

Day: Wednesday, 03/18
Time: 17:00 - 18:20
Location: Room 211A

S5895 - Hands-on Lab: Accelerating Computer Vision Algorithms on CUDA-Capable Embedded Platforms

Alexander Smorkalov Senior Software Engineer, Itseez
Alexander Smorkalov
Alexander Smorkalov is a Senior Software Engineer at Itseez, leading a team of developers maintaining the OpenCV library on mobile and embedded platforms. He received a master's degree from Nizhny Novgorod State University, in Russia. His professional interests include: system programming, computer vision, and performance acceleration of multimedia applications. Alexander is a contributor to OpenCV library development, and has worked on porting it to Android, Windows RT and Embedded Linux platforms.

Computer Vision is becoming a part of everyday life, and is an integral feature of modern smart devices. The OpenCV library (http://opencv.org) provides powerful tools for initial algorithm development, and enables deployment on a wide spectrum of target platforms, ranging from servers to embedded and mobile devices. Despite rapid growth in available computational power, application performance and responsiveness still remain a key challenge. In this hands-on lab with Jetson TK1 dev kits, we will study how an OpenCV-based application can be ported to the NVIDIA Jetson embedded platform, and then accelerated using CUDA technology. We will start with a computational photography application that uses only CPU cores for processing. Next, we will profile the application and see that the CPU is not powerful enough to process high resolution images with acceptable performance. After that, we will replace detected hotspots with CUDA implementations, enabling decent processing speed. Finally, some tips and best practices for cross-platform development will be given.

Level: Beginner
Type: Hands-on Lab
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 09:30 - 10:50
Location: Room 211A

S5850 - Hands-on Lab: Project Tango UnitySDK & C/C++ API Introduction

Jason Guo Developer Relations Engineer, Google
Jason Guo is a Developer Relations Engineer on the Google Project Tango team. He received his master's degree from Carnegie Mellon University's Entertainment Technology Center. Jason has a background in human-computer interaction, digital media, and computer graphics. With a great passion for exploring new interactive experiences, Jason has been involved with multiple AR/VR projects related to depth sensing and 6-DOF motion tracking. Jason is also one of the major contributors to Project Tango examples and demos.
Ravi Teja Kommineni Developer Relations Engineer, Google
Ravi Teja Kommineni
Ravi has been an ardent enthusiast of games and technology since childhood, which led him to study computer science engineering as an undergraduate and to learn game and graphics software such as OpenGL and Unity3D. To further develop his skills, he pursued a Master's in Entertainment Technology at Carnegie Mellon University (CMU), where he experimented with various game hardware prototypes; his main interest now lies in exploring how they fit into currently popular genres of games. He is currently working on the Google Project Tango team as a Developer Relations Engineer, exploring various gameplay mechanics and possibilities with the platform. He is also a main contributor to sample code and demos for Project Tango.

C/C++ API Session (40 min.): In Part I, developers will create a JNI-based motion tracking example from an Android Studio skeleton project. We will walk through the Tango API, including configuration, the Android lifecycle, and implementing basic motion tracking functionality. UnitySDK Session (40 min.): Part II walks the user through porting an existing game to the Tango platform, using motion tracking and demonstrating best developer practices. We will introduce developers to the UI/UX library and demonstrate the benefits it provides in terms of user feedback. Seating is limited. Please fill out the form at the following link to reserve your seat and a Project Tango device: Reserve Your Spot

Level: All
Type: Hands-on Lab
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Game Development

Day: Friday, 03/20
Time: 13:30 - 14:50
Location: Room 211A

Hands-on lab
 

POSTER

Presentation
Details

P5112 - CUDA Based Fog Removal : Machine Vision for Automotive/Defence Applications

Ratul Wasnik Sr. Software Engineer, KPIT Technologies Ltd., Pune
Ratul Wasnik is presently working as a Sr. Software Engineer at KPIT Technologies, Pune, with the defense team. His areas of work include high performance computing/CUDA and image processing algorithms.

A fully automated de-weathering system has been developed to improve visibility and stability during bad weather conditions, much needed for surveillance, automotive infotainment, and defense applications. Fog and haze during day and night are handled with real-time performance using acceleration from CUDA-implemented algorithms. Videos from fixed cameras are processed with no special hardware other than a CUDA-capable NVIDIA GPU.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5117 - GPU Accelerated Compressive Imaging System

Mohammad Azari Research assistant, UNC Charlotte
Mohammad Azari
Mohammad Azari is a PhD student in the Electrical and Computer Engineering Department of the University of North Carolina at Charlotte, working as a research assistant at the Center for Precision Meteorology under the supervision of Dr. Farahi. He received the MSc degree in electrical engineering in 2011 and the BSc degree in electrical engineering in 2008, both from Amirkabir University of Technology (Tehran Polytechnic). His research interests include computer vision, signal and image processing, computational imaging, and embedded system design.

A new GPU-accelerated compressive imaging system is introduced, based on the single-pixel camera architecture, which allows designing a high-resolution camera for scenarios where ordinary high-resolution sensors are costly or impractical, such as hyperspectral and SWIR imaging. One major obstacle to employing this technique is the very high computational requirement of the recovery algorithm. By parallelizing the recovery algorithm on the GPU, we achieve the speedup required to make our imaging system suitable for practical applications.
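For context, the recovery step the abstract refers to is a sparse inverse problem: the single-pixel camera measures y = Φx and the image x must be reconstructed from far fewer measurements than pixels. A minimal illustrative sketch of one common recovery algorithm (iterative soft-thresholding, ISTA); this is not the authors' implementation, and all parameter values are made up:

```python
import numpy as np

def ista(Phi, y, lam=0.1, step=None, iters=200):
    """Recover a sparse signal x from measurements y = Phi @ x
    by iterative soft-thresholding (ISTA)."""
    if step is None:
        # Safe step size: 1 / (largest singular value of Phi)^2
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        g = x - step * Phi.T @ (Phi @ x - y)                      # gradient step
        x = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # shrinkage
    return x

# Tiny demo: a 3-sparse signal of length 64, 32 random measurements.
rng = np.random.default_rng(0)
x_true = np.zeros(64)
x_true[[5, 20, 40]] = [1.0, -0.8, 0.5]
Phi = rng.standard_normal((32, 64)) / np.sqrt(32)
x_hat = ista(Phi, Phi @ x_true, lam=0.01, iters=2000)
```

The per-iteration work (matrix-vector products and elementwise shrinkage) is exactly the kind of computation that parallelizes well on a GPU, which is the speedup the poster targets.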

Level: All
Type: Poster
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5135 - Real-Time Incremental Principal Component Pursuit for Video Background Modeling on the TK1

Paul Rodriguez Full Professor, Pontificia Universidad Catolica del Peru
Paul Rodriguez
Paul Rodriguez received the BSc degree in electrical engineering from the "Pontificia Universidad Católica del Perú" (PUCP), Lima, Peru, in 1997, and the MSc and PhD degrees in electrical engineering from the University of New Mexico, USA, in 2003 and 2005 respectively. He spent two years (2005-2007) as a postdoctoral researcher at Los Alamos National Laboratory, and is currently a Full Professor with the Department of Electrical Engineering at PUCP. His research interests include AM-FM models, parallel algorithms, adaptive signal decompositions, and inverse problems in signal and image processing.

This work presents a real-time, GPU-enabled/CUDA-aware (unified memory model) implementation of a novel incremental Principal Component Pursuit (PCP) algorithm for video background modeling on the Jetson TK1 platform. Our implementation has an extremely low memory footprint and a computational complexity that allows (on the Jetson TK1) processing frame rates of 27.8 and 9.4 FPS for grayscale videos of 640x480 and 1920x1088, respectively.
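For readers new to the idea, PCP splits each frame into a low-rank background plus a sparse foreground. A much-simplified toy sketch of that decomposition (an exponential running mean stands in for the low-rank term; the actual incremental PCP update is considerably more sophisticated, and every name and threshold here is illustrative):

```python
import numpy as np

def background_subtract(frames, alpha=0.05, thresh=0.25):
    """Toy low-rank + sparse split: a running background estimate plays
    the 'low-rank' role; the thresholded residual is the sparse foreground."""
    bg = frames[0].astype(float)
    masks = []
    for f in frames[1:]:
        resid = f - bg
        masks.append(np.abs(resid) > thresh)  # sparse foreground support
        bg = (1 - alpha) * bg + alpha * f     # incremental background update
    return bg, masks

# Synthetic video: a static gradient background with a moving bright square.
h = w = 32
base = np.tile(np.linspace(0, 0.5, w), (h, 1))
frames = []
for t in range(20):
    f = base.copy()
    f[10:14, t:t + 4] = 1.0  # moving object
    frames.append(f)
bg, masks = background_subtract(frames)
```

The incremental formulation matters on an embedded board like the TK1: each frame is processed as it arrives, so memory stays constant instead of growing with the video length.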

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5139 - 3D Object Recognition Considering Light Conditions

Kenia Picos PhD Candidate, CITEDI-IPN
Kenia Picos is a Ph.D. candidate at the National Polytechnic Institute in the Center of Research and Development of Digital Technology at Tijuana, México. Her fields of interest are image processing, computer vision, and computer graphics.

The goal of this work is to solve object recognition under varying lighting conditions. Several issues present in real scenes compromise performance, such as noise, object distortion, and incident light sources. We consider global illumination by accurately approximating the real-world physics of light and matter interactions in a 3D scene. The system is able to adapt to input light conditions to improve recognition performance. The results of the proposed system are given in terms of recognition metrics and computational efficiency.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Rendering & Ray Tracing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5141 - Real-time FullHD Tracking-Learning-Detection on a 2-SMX GPU

Carlos S. Bederián CONICET Professional, Teaching Assistant, CONICET, Universidad Nacional de Córdoba
Carlos Bederián received an MSc in Computer Science from Universidad Nacional de Córdoba (UNC). He is an HPC professional at the Consejo Nacional de Investigaciones Científicas y Tecnológicas (CONICET). Carlos is a Teaching Assistant in Computer Science and a founding member of the university GPGPU Computing Group. He regularly teaches parallel programming for CPU and GPU platforms, has successfully ported many scientific applications to CUDA, and publishes in workshops and journals periodically. He deployed, installed, and currently maintains the most powerful operational cluster in Argentina (Mendieta).

We address the problem of real-time tracking from high-resolution video streams. We present a GPU implementation of the TLD algorithm of Kalal et al. (2012), one of the most robust tracking algorithms in the literature. Its high computational cost has restricted its use to low-resolution videos. We ported the algorithm from a CPU-optimized version to a 2-SMX NVIDIA Kepler architecture. We achieved an average speedup of 3x on different 1080p videos, obtaining a system capable of real-time tracking on FullHD streams using very low-cost hardware.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5156 - Multiple Camera Mapping

Rodrigo Silva VFX Researcher, Globo TV Network
Rodrigo Silva has a background in electrical engineering and computer science, with emphasis on graphic design and software engineering, working on the following topics: modeling, 2D games, 3D games, numerical simulation, pattern recognition, image and information processing, 3D technology, and education. He has worked for five years as a visual effects researcher at Globo TV Network and develops methods for optimizing the production process of visual effects.

This paper presents a new procedure to reconstruct and shade automatically, with artistic interference if needed, an area captured by a sequence of photos or high-definition video (e.g., 4K cameras). The first CPU version uses V-Ray; the render time for the race track took about 15 minutes per frame, reducing the total render time of classic scene modeling by a factor of 8. We reduce the render time significantly further when using OptiX as our ray-tracing engine.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Media & Entertainment

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5158 - Predicting ADAS Algorithms Performances on K1 Architecture

Romain Saussard PhD Student, Embedded Image Processing Algorithms for ADAS, Renault
Romain Saussard
Romain Saussard is a PhD student at Renault and IEF, Paris, France. His thesis is on defining metrics of embeddability for computer vision algorithms on heterogeneous architectures applied to ADAS.

Computer vision algorithms are widely used in the automotive field for ADAS. Many computing architectures can be used to embed those algorithms: ARM, DSP, GPU, and heterogeneous ones like the Tegra K1. But the choice of computing architecture remains a problem for car manufacturers. We propose a method to predict the performance of computer vision algorithms on multiple heterogeneous architectures in order to help choose the best algorithm-architecture pairing. The approach is illustrated with a lane detection algorithm embedded on the K1.

Level: All
Type: Poster
Tags: Automotive; Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5160 - Implementation Mechanisms Autostereograms in Scalable Architectures

Carlos Jaime Barrios Hernández Director | Assistant Professor, High Performance and Scientific Computing Center, Universidad Industrial de Santander (SC3UIS)
Carlos Jaime Barrios Hernandez is director of the High Performance and Scientific Computing Center of the Universidad Industrial de Santander in Bucaramanga, Colombia, and an Assistant Professor in the Informatics and Systems Engineering School at the same university. He holds a PhD in informatics and computer science from the Université de Nice-Sophia Antipolis in France. Today he promotes the use of scalable architectures with massively parallel processing, having created the first CUDA Teaching Center and the first CUDA Research Center in Colombia; he supports the creation of new CRCs and CTCs in Latin America and coordinates five different training programs in massively parallel processing, from tutorials to complete sessions in the supercomputing and distributed systems camp school.

There is a technique for presenting stereograms in which the full information for the two eyes is contained in a single image. These images, known as "autostereograms", may contain a wide variety of forms of depth, with some limitations. The images are generated in multiple planes, in front of or behind the physical plane. In order to perceive 3D shapes in autostereograms, it is necessary to separate the visual processes of focusing and convergence, which are linked under normal vision. This work uses a supercomputing platform with 128 GPUs.
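The generation mechanism itself is compact: a random pattern is repeated across each row, and the repetition period is locally shortened where the encoded surface is nearer, which the visual system interprets as depth. A minimal serial sketch of this classic loop (parameter names and values are illustrative, not the poster's implementation; the poster parallelizes generation across GPUs):

```python
import numpy as np

def autostereogram(depth, pattern_width=40, max_shift=10, seed=0):
    """Random-dot autostereogram: pixel (y, x) copies the pixel one
    (depth-modulated) period to its left."""
    rng = np.random.default_rng(seed)
    h, w = depth.shape
    img = rng.random((h, w))  # random dots seed the leftmost strip
    for y in range(h):
        for x in range(pattern_width, w):
            shift = int(depth[y, x] * max_shift)  # nearer -> larger shift
            img[y, x] = img[y, x - pattern_width + shift]
    return img

# A flat zero-depth map yields an image that repeats with the base period.
depth = np.zeros((16, 120))
img = autostereogram(depth)
```

Each row is independent, and within a row the dependency is only on pixels one period to the left, which is what makes the technique amenable to massive parallelization.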

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Visualization - In-Situ & Scientific

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5172 - A GPU Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

John Long Post-doctoral researcher, New York University Langone Medical Center
John is a postdoctoral researcher in the laboratory of Dr. György Buzsáki at the New York University Langone Medical Center. He received his PhD in neuroscience from the UC Berkeley Helen Wills Neuroscience Institute in 2011, in the Brain-Machine Interface laboratory of Dr. Jose Carmena. His current work in neuroscience leverages multiple camera photogrammetry and the power of GPUs to build 3D models of his neurophysiological subjects to study the relationships between memory formation in the brain, navigation, and action planning. He is also working within the clinical domain to develop a computer vision system for behaviorally diagnosing Parkinson's disease.

Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (≥ 100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Life & Material Science

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5176 - Galaxy Classification with Deep Convolutional Neural Networks

Honghui Shi Research Assistant, University of Illinois, Urbana-Champaign
Honghui Shi
Honghui Shi is a graduate research assistant at the Beckman Institute for Advanced Science and Technology, currently in the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign.

There are more than 170 billion galaxies in the observable universe, and we humans have captured image data covering more than a quarter of the whole sky with powerful telescopes and ambitious sky surveys like SDSS. This vast amount of information is not meant for humans to process, and CPUs and traditional algorithms both hit bottlenecks when processing it. With the help of recent deep learning technologies and powerful implementations on NVIDIA GPUs, the developed models can classify galaxies with competitive accuracy.

Level: All
Type: Poster
Tags: Astronomy & Astrophysics; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5178 - Adult-Content Detection in Video (with the Use of NVIDIA GPU)

Denis Timoshenko Leading Software developer, Kuznech Inc.
Denis Timoshenko
Denis Timoshenko holds a PhD in Applied Mathematics and Control Processes from Saint-Petersburg State University (expected 2014). He has participated in conferences on computer vision, visual search, image recognition, and convolutional neural networks, including the IPAM Summer School "Computer Vision" (Los Angeles, USA, 2013) and the ECSE Summer School "Advances in Probabilistic Modeling for Pattern Recognition" (Joensuu, Finland, 2012), and has published articles and papers on these topics. Since 2012 Denis has been the leading software developer and team leader at Kuznech Inc. LLC.

Pornography detection is a significant subtask of online content filtering. One of the biggest problems for many social networks and video sharing websites is preventing the distribution of pornography. In this work we describe a combined porn detector based on deep neural networks. Our detector works with several types of porn features, such as porn film studio logos, warning text, and sexually explicit scenes. In addition, we show the results of a speed comparison between CPU and GPU implementations of the neural networks applied to this task.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5183 - Fast Omni-image Unwarping on the Jetson TK1

Gustavo Manuel Silva Obregón Undergraduate student, Pontificia Universidad Católica del Perú
Gustavo Manuel Silva Obregón
Gustavo is a final-year undergraduate student and research assistant in the electronic engineering department at the Pontifical Catholic University of Peru. He is currently taking courses in digital signal and image processing and finishing his undergraduate thesis on the implementation of omnidirectional video unwarping on the Jetson TK1 platform.

In this poster, we present an efficient and highly parallelizable algorithm for unwarping omnidirectional images that benefits from the CUDA platform (NVIDIA Jetson TK1). The implementation achieves real-time performance, ranging from 1697 to 34 FPS for unwarping omni-images (512x512 to 4096x4096) into pano-images (128x512 to 1024x4096).
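Unwarping is a per-pixel remap, which is why it parallelizes so well: each panorama pixel independently pulls from a polar location in the donut-shaped omni-image. A CPU sketch of the mapping with NumPy (nearest-neighbor sampling; the radius/angle parameterization here is illustrative, not the poster's exact formulation):

```python
import numpy as np

def unwarp(omni, pano_h, pano_w, r_min, r_max):
    """Map a donut-shaped omni-image to a panorama: each output column
    is an angle around the center, each output row a radius."""
    h, w = omni.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    v, u = np.mgrid[0:pano_h, 0:pano_w]
    theta = 2 * np.pi * u / pano_w
    r = r_min + (r_max - r_min) * v / max(pano_h - 1, 1)
    ys = np.clip(np.round(cy + r * np.sin(theta)).astype(int), 0, h - 1)
    xs = np.clip(np.round(cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return omni[ys, xs]  # gather: one independent read per output pixel

# Synthetic omni-image whose value equals the distance from its center,
# so each unwrapped row should be roughly constant at its radius.
h = w = 128
yy, xx = np.mgrid[0:h, 0:w]
omni = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
pano = unwarp(omni, pano_h=32, pano_w=128, r_min=10, r_max=50)
```

On the GPU the same gather becomes one thread per output pixel, with the omni-image read through texture memory for cheap interpolation.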

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5195 - Efficient Image and Video Super-Resolution Using Deep Convolutional Neural Networks

Ding Liu Graduate Student, University of Illinois at Urbana-Champaign
Ding Liu
Ding Liu is a research assistant in the Image Formation and Processing (IFP) group led by Professor Thomas Huang at the University of Illinois at Urbana-Champaign. His interests include machine learning and computer vision.

We propose a GPU-parallelized algorithm using a deep convolutional neural network for single-image super-resolution (SR). Unlike traditional sparse-coding-based methods, our deep learning approach parallelizes all the computation steps from end to end, without sacrificing the performance of state-of-the-art SR methods either qualitatively or quantitatively. The GPU parallelization accelerates our algorithm significantly and enables us to build a real-time video SR system for practical applications.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5205 - High Speed Stabilization and Geo-Registration of Aerial Imagery based on GPU Optimization

Steve Suddarth Director, Transparent Sky, LLC
Dr. Steve Suddarth is the director of Transparent Sky, LLC, specializing in airborne Wide Area Motion Imaging (WAMI) technology for real-time surveillance. Steve has also served in key leadership positions in the U.S. military, including leading the development of WAMI systems that have been deployed to the Middle East. Dr. Suddarth is also the Chief Technical Officer and former Director of the Configurable Space Microsystems Innovation and Application Center (COSMIAC) at the University of New Mexico. Steve holds a Ph.D. in Electrical Engineering from the University of Washington and is a graduate of the U.S. Air Force Academy.

GPU optimizations have already improved the projection speed of Wide-Area Motion Imaging (WAMI) maps by 100x. An Air Force-led team developed novel GPU-optimized algorithms that merge projection with stabilization and automated real-time tracking of items such as vehicles and people. The resulting systems will ultimately be deployed on small on-board processors in low-cost drones to replace multi-million dollar systems deployed on turbine aircraft. The imagery has military and civil applications such as security, traffic management, and firefighting.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5213 - GPU Accelerated Haze Removal on Tegra K1

Bin Zhou Adjunct Research Professor, University of Science and Technology of China
Dr. Bin Zhou is the director and chief scientist of the Marine Information Processing Laboratory (MIPL) at the Institute of Oceanography, Shandong Academy of Sciences. He serves as an Adjunct Research Professor in the School of Information Science and Technology at USTC and is an NVIDIA CUDA Fellow. He is the PI of the CUDA Research Center (CRC) in the Institute of Advanced Technology (IAT), USTC. In MIPL, he leads a team working on information processing systems for marine environmental pollution and natural hazard monitoring and ocean-atmosphere simulation. In the CRC, he performs research on drone control, video processing, and computer vision algorithms on the NVIDIA GPU/CUDA platform.

Toxic haze has become a major air pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method to the dehazing process, very good results are achieved. However, the huge processing requirements bring big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Compared to the ARM CPU, experiments show a 156x speedup. The results show the Tegra K1 has great potential for embedded real-time computer vision processing.
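The dark channel prior the abstract builds on observes that in haze-free outdoor patches, at least one color channel is near zero somewhere in the patch; haze raises this "dark channel", which lets the transmission map be estimated directly. A minimal CPU sketch of the two core steps (patch size and ω are the commonly used textbook choices, not necessarily this poster's settings; the per-pixel min-filter is the hotspot the GPU version parallelizes):

```python
import numpy as np

def dark_channel(img, patch=15):
    """Per-pixel min over RGB, then a min-filter over a patch neighborhood."""
    mins = img.min(axis=2)
    h, w = mins.shape
    r = patch // 2
    padded = np.pad(mins, r, mode='edge')
    out = np.empty_like(mins)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + patch, x:x + patch].min()
    return out

def transmission(img, atmosphere, omega=0.95, patch=15):
    """Estimate transmission: t(x) = 1 - omega * dark_channel(I / A)."""
    return 1.0 - omega * dark_channel(img / atmosphere, patch)

# A uniformly hazy gray image has a high dark channel, hence low transmission.
hazy = np.full((32, 32, 3), 0.8)
t = transmission(hazy, atmosphere=np.array([1.0, 1.0, 1.0]))
```

Each output pixel's min-filter window is independent of the others, so the nested loop maps naturally onto one CUDA thread per pixel.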

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5216 - Visual Semantics from an Accelerated Model Extraction and Construction

Miguel Octavio Arias Estrada Researcher, Instituto Nacional de Astrofísica, Óptica y Electrónica
Miguel Arias-Estrada is a researcher in Computer Science at INAOE (National Institute of Astrophysics, Optics and Electronics, Puebla, Mexico), with a Ph.D. in electrical engineering (computer vision) from Laval University (Canada) and BEng and MEng degrees in electronic engineering from the University of Guanajuato (Mexico). Since 1997, Dr. Arias-Estrada has been exploring dedicated FPGA-based architectures to accelerate image analysis, later integrated into smart cameras based on FPGA technology. Recently, he has been involved in GPU-based algorithm acceleration for computer vision and novel FPGA-based controller systems for satellite control. Dr. Arias-Estrada has directed more than 35 M.Sc. and 8 Ph.D. student theses, has participated in industrial projects, and holds more than 5 patents.

Image representation is a critical step in computer vision. However, it remains one of the most challenging topics, partly because of the lack of sufficiently discriminative and robust representations, and the high computational cost of state-of-the-art methods. In our work, we propose an image representation method that integrates low- and middle-level features extracted from images with high-level cognitive representations, accelerated with the use of GPUs, and show how recognition can be improved by introducing high-level semantic knowledge representations.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5218 - CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

Clint Solomon Student, Virginia Tech

We are witnessing a proliferation of massive visual data. Unfortunately, scaling existing computer vision algorithms to large datasets leaves researchers repeatedly solving the same algorithmic and infrastructural problems. Our goal is to democratize computer vision; one should not have to be a computer vision, big data, and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms. We provide access to state-of-the-art distributed computer vision algorithms as a cloud service through a web interface and APIs.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5224 - GPU-Accelerated 3D Surface Reconstruction Using Gaussian Mixture Sampling and Sparse Voxel Lists

Benjamin Eckart PhD Student, Carnegie Mellon University
Benjamin Eckart is an NVIDIA Graduate Fellow and PhD student with the Robotics Institute at Carnegie Mellon University. His research focuses on the creation of real-time algorithms for robotic perception that can be accelerated by parallel algorithms and novel data structures. He is exploring ways to use many-core architectures such as the GPU to rapidly create compact models from 3D range data in order to facilitate and unify common low-level perceptive tasks.

As 3D depth sensors become smaller, cheaper, and more ubiquitous, it is becoming increasingly important to develop efficient and robust techniques to manage and process point cloud data. A common operation of particular importance is the ability to derive solid 3D geometry from unorganized sets of points. In this poster, we describe a parallel method to both process and compress 3D point data into a statistical parametric form in order to quickly construct a 3D triangle mesh using a modified form of the Marching Cubes algorithm.
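The poster's method is not published in this abstract; purely as a hedged sketch, compressing points into a "statistical parametric form" can be pictured as fitting a Gaussian mixture to the cloud with a few EM iterations. The component count, isotropic covariances, and toy data below are assumptions, and the per-point E-step is the naturally parallel part.

```python
# Illustrative EM fit of an isotropic Gaussian mixture to a toy 3D point
# cloud -- a sketch of the idea, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(1)
# Toy "point cloud": two tight clusters around (0,0,0) and (2,2,2).
pts = np.vstack([rng.normal(0.0, 0.2, (200, 3)),
                 rng.normal(2.0, 0.2, (200, 3))])

K = 2
mu = pts[[0, -1]].copy()          # init one mean in each cluster
var = np.ones(K)                  # isotropic variances
pi = np.full(K, 1.0 / K)          # mixing weights

for _ in range(20):
    # E-step: responsibility of each component for each point (N, K).
    d2 = ((pts[:, None, :] - mu[None]) ** 2).sum(-1)
    logp = -0.5 * d2 / var - 1.5 * np.log(var) + np.log(pi)
    r = np.exp(logp - logp.max(axis=1, keepdims=True))
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances.
    nk = r.sum(axis=0)
    mu = (r[:, :, None] * pts[:, None, :]).sum(axis=0) / nk[:, None]
    d2 = ((pts[:, None, :] - mu[None]) ** 2).sum(-1)
    var = (r * d2).sum(axis=0) / (3.0 * nk)
    pi = nk / len(pts)
```

After convergence, the mixture's means, variances, and weights are a compact stand-in for the raw points, which is the kind of model a meshing step could then consume.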

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5235 - Enhanced Human Computer Interaction Using Hand Gesture Analysis on GPU

Pragati Dharmale Graduate Student, SNHU, NH
Graduate student at Southern New Hampshire University, NH, specializing in information technology. His core interests are GPU and signal-processing application development.

This poster presents a very active research topic in human-computer interaction (HCI): automatic hand gesture recognition using NVIDIA GPUs. In this work, video gestures are processed with a neural network and finger counts are recognized. Due to real-time requirements, the algorithm needs to be optimized and computationally efficient. Our MATLAB implementation slows down once neural network processing starts; reimplementing it in a parallel programming model such as CUDA provided the necessary gain in processing speed.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5249 - Computer Vision and Visualization for Implanted Visual Prostheses Using Tegra K1

Wai Ho Li Chief Investigator (Signal Processing), Monash Vision Group
Wai Ho Li
Wai Ho is a Chief Investigator in Monash Vision Group (MVG), a research centre working on a cortical Implanted Visual Prosthesis ("bionic eye"). Within MVG, he works on wearable computer vision and simulated prosthetic vision. He is a core software developer for MVG, writing C++ code for ARM-based embedded platforms and MATLAB scripts for desktop-based simulations. Wai Ho's work at MVG has resulted in several patent applications and multiple peer-reviewed publications in top AR, robotics, and wearable IEEE/ACM conferences such as ISMAR, ICRA, and ISWC. Wai Ho is also a co-leader of the Robotic Vision lab at Monash University (with Professor Tom Drummond). Within the lab, he works with his students on RGB-D sensing, augmented reality, fast computer vision for mobile devices, and 3D sensing for robotic applications. Wai Ho often works with industry partners on computer vision and image processing problems, such as automated asset management and pedestrian detection.

This poster presents computer vision and visualization for Implanted Visual Prostheses, optimized to run on the Tegra K1. Implanted Visual Prostheses ("bionic eyes") produce prosthetic vision similar to a low-resolution dot pattern. Prosthetic vision is improved by computer vision methods that detect and show salient visual content. Simulated prosthetic vision evaluates computer vision methods by visualizing their output to users. The tiny 10W, 120-gram Jetson performs the computer vision and visualization at 30 FPS, outpacing a 65W+, 3 kg laptop.
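As a hedged illustration of simulated prosthetic vision (not MVG's pipeline, which the abstract says is C++ on the Jetson), a low-resolution dot pattern can be produced by block-averaging image intensities onto a coarse "phosphene" grid. The 16x16 grid size below is an assumption for the sketch.

```python
# Sketch: render a grayscale image as a coarse grid of dot brightnesses,
# the kind of low-resolution pattern the abstract describes.
import numpy as np

def phosphene_grid(gray, rows=16, cols=16):
    """Average a grayscale image into a rows x cols grid of brightnesses."""
    h, w = gray.shape
    bh, bw = h // rows, w // cols                 # block size per dot
    trimmed = gray[:rows * bh, :cols * bw]        # drop any remainder pixels
    return trimmed.reshape(rows, bh, cols, bw).mean(axis=(1, 3))

img = np.random.default_rng(0).random((480, 640))  # stand-in camera frame
dots = phosphene_grid(img)                         # 16 x 16 values in [0, 1]
```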

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Augmented Reality & Virtual Reality

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5270 - Input Resource Reduction in Development for Industrial Measurement Systems, Using GPUs

Takuya Yasuda Image Processing Engineer, SCREEN Holdings Co., Ltd.
Takuya Yasuda is an image processing engineer who has worked for SCREEN Holdings Co., Ltd. (former Dainippon Screen Mfg. Co., Ltd.) for eight years. He has been involved in the development of various machine vision systems for industrial inspection. For the last five years, he has focused on GPGPU acceleration for their systems.

We present a launched measurement system for transparent electrodes and the improved development-process efficiency achieved with GPUs. We developed several sets of algorithms for camera calibration, width measurement of transparent electrode line patterns, and original image processing filters. The rapid response to the needs of users and our team members enabled by the GPU reduced image-processing development costs by about 75% compared to conventional development on FPGA-based hardware.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Manufacturing

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5280 - A GPU Accelerated Cardiac Image Segmentation Approach Using Diffeomorphic Registration

Kumaradevan Punithakumar Operational and Computational Director, Servier Virtual Cardiac Centre
Dr. Kumaradevan Punithakumar is the Operational and Computational Director of the Servier Virtual Cardiac Centre at the Mazankowski Alberta Heart Institute, Edmonton, Canada. He is also an Assistant Professor in the Department of Radiology and Diagnostic Imaging at the University of Alberta, Edmonton, Canada. His past careers include an Imaging Research Scientist at GE Healthcare, Canada. He received the Industrial Research and Development Fellowship by the National Sciences and Engineering Research Council of Canada in 2008, and the GE Innovation award in 2009. Dr. Punithakumar published over thirty scientific papers in many international journals and conferences, and has filed six patent applications. His areas of interest include medical image analysis and visualization, information fusion, object tracking, parallel computing and nonlinear filtering.

This study presents a moving mesh correspondence algorithm for cardiac right ventricle (RV) segmentation using GPU computing. Automatic delineation of the RV is difficult because of its complex morphology and ill-defined borders. One solution to the problem is to use a non-rigid registration method to obtain the point correspondence in a sequence of cine MR images. In a previous study, we proposed GPU computing to accelerate the image registration algorithm. In this study, we further parallelize the problem by image concatenation.

Level: All
Type: Poster
Tags: Medical Imaging; Computer Vision & Machine Vision

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5314 - Concurrent Image Segmentation by Locally Specified Polygonal Markov Fields on the GPU

Michal Matuszak Assistant Professor, Faculty of Mathematics and Computer Science, Adam Mickiewicz University, Poznan, Poland
Michal Matuszak
Michal Matuszak is an Assistant Professor at the Adam Mickiewicz University in Poznan and PI/PD of a National Science Center grant at the Nicolaus Copernicus University in Torun. His main interests are probabilistic graphical models, parallel programming, and neuroscience.

We introduce a class of multicolored polygonal Markov fields driven by local activity functions. The local rather than global nature of the field specification ensures substantial additional flexibility for statistical applications in comparison to classical polygonal fields. Within this framework we develop a concurrent image segmentation algorithm based on Markovian optimization dynamics, combining simulated annealing ideas with those of Chen-style stochastic optimization.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5322 - GPU Implementation of Particle Filter Based Object Tracking Algorithm

Pinalkumar Engineer Research Scholar, Indian Institute of Technology Bombay, INDIA
Pinalkumar J. Engineer received his Bachelor of Engineering in Instrumentation and Control Engineering from L. D. College of Engineering, Ahmedabad, in 1999. He received his Master of Engineering, with specialization in microprocessor systems and applications, from M S University of Baroda in 2002. He is pursuing his PhD in the Electrical Engineering Department of the Indian Institute of Technology, Bombay, and is currently an assistant professor at the same institute. Pinalkumar Engineer is a member of the professional society IETE, India, and a student member of the IEEE. He has more than 20 publications in international journals and conferences in the fields of embedded systems and high-performance computing.

This poster presents a GPU implementation of a particle-filter-based object tracking algorithm for video. We compared MATLAB and OpenCV implementations against the GPU implementation. The CUDA implementation achieves more than a 100x speedup with 1024 particles over the pure MATLAB implementation, while maintaining a frame rate of ~56 fps. We also compared the pure OpenCV implementation with the GPU implementation, achieving a 20x speedup.
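As an illustrative sketch only (not the poster's CUDA or MATLAB code), one particle-filter tracking step is predict, weight, resample; each particle is independent in the predict and weight stages, which is why 1024 particles map so well to GPU threads. The random-walk motion model and Gaussian likelihood below are assumptions for the sketch.

```python
# Hedged sketch of one particle-filter update, with the poster's 1024 particles.
import numpy as np

rng = np.random.default_rng(0)
N = 1024
particles = rng.normal(0.0, 1.0, size=(N, 2))       # (x, y) state hypotheses

def step(particles, observation, motion_std=0.5):
    # Predict: diffuse each particle with a random-walk motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # Weight: Gaussian likelihood of the observation given each particle.
    d2 = ((particles - observation) ** 2).sum(axis=1)
    w = np.exp(-0.5 * d2)
    w /= w.sum()
    # Resample: draw N particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx]

obs = np.array([3.0, -1.0])                          # fixed target position
for _ in range(20):
    particles = step(particles, obs)
estimate = particles.mean(axis=0)                    # converges toward obs
```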

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Real-Time Graphics

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

P5332 - Embedded Real-time Obstacle Recognition

Ryan Beethe Student, Colorado School of Mines
Just a guy who loves building robots better, faster, and stronger. And smarter.

The project is an entry to the Colorado Robot Challenge, an obstacle course for autonomous robots. Key research in the project includes an embedded stereo vision application to identify obstacles on the course. A CUDA implementation of H. Hirschmuller's semi-global matching algorithm is being developed and will be contributed to the OpenCV library upon its completion.

Level: All
Type: Poster
Tags: Computer Vision & Machine Vision; Embedded Systems

Day: Monday, 03/16
Time: 17:00 - 20:00
Location: Grand Ballroom 220A
View PDF

Poster
 

HANGOUT

Presentation
Details

S5910 - Hangout: Computer Vision

Have burning questions about computer vision? Come to the GTC Hangouts! Hangouts are like "office hours" with your favorite professor, designed to connect you directly with NVIDIA engineers on a specific topic each hour. Pull up a chair and ask away – we're here to help!

Level: All
Type: Hangout
Tags: Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 09:00 - 10:00
Location: Pod A

S5911 - Hangout: Computer Vision

Have burning questions about computer vision? Come to this GTC Hangout! Hangouts are like "office hours" with your favorite professor, designed to connect you directly with NVIDIA engineers on a specific topic each hour. Pull up a chair and ask away – we're here to help!

Level: All
Type: Hangout
Tags: Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 17:00 - 18:00
Location: Pod B

Hangout