GPU Technology Conference
March 17-20, 2015 | San Jose, California

S5552 - Transparent Parallelization of Neural Network Training

Cyprien Noel Software Engineer, Flickr / Yahoo Inc.
Cyprien has worked on high performance distributed software in a variety of settings: a financial software startup, gaming, the Internet of Things, and an air traffic control simulator at NASA. He has lived in France and New York, and loves his new home, San Francisco. A couple of years ago he returned to his early interest in machine learning, combining expertise in high performance computing and neural networks.
Simon Osindero Senior Manager / Research Engineer, Flickr / Yahoo Inc.
Simon Osindero is currently a senior principal researcher and manager at Flickr, Yahoo Inc., where he leads efforts on applied machine learning. Prior to joining Yahoo, he was CTO and co-founder of LookFlow, a startup that combined state-of-the-art machine learning, computer vision, and information visualization methods to build a revolutionary search-and-discovery experience. (LookFlow was acquired by Yahoo at the end of 2013.) Before starting LookFlow he developed machine learning algorithms for natural language processing and semantic analysis as a researcher at Idilia Inc. He is perhaps best known for his contribution to the field of machine learning through his post-doctoral work on Deep Belief Networks, at the University of Toronto in collaboration with Geoff Hinton and Yee Whye Teh. His 2006 paper is widely credited as reigniting the current wave of interest in "deep learning". He holds: a PhD in Computational Neuroscience from the Gatsby Unit at UCL; an MSci in Experimental & Theoretical Physics along with BA/MA degrees in Physics, Molecular Biology, and Mathematics from the University of Cambridge; and diplomas in Photography and Design from Concordia University.

Deep learning has enjoyed tremendous success in recent years. Unfortunately, training large models can be very time consuming, even on GPU hardware. We describe a set of extensions to the state-of-the-art Caffe library [Jia 2013], allowing training on multiple threads and GPUs, and across multiple machines. Our focus is on architecture: implementing asynchronous SGD without increasing Caffe's complexity.
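
Below is a minimal, illustrative sketch of the asynchronous-SGD idea the abstract refers to (not the authors' Caffe extension): several worker threads compute gradients on shards of the data and update one shared parameter vector without locking. The toy least-squares problem, learning rate and thread count are assumptions for the example.

import numpy as np
import threading

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_w = rng.normal(size=10)
y = X @ true_w + 0.01 * rng.normal(size=1000)

w = np.zeros(10)   # shared parameters, updated asynchronously by all workers
lr = 0.01

def worker(shard):
    global w
    for i in shard:
        grad = (X[i] @ w - y[i]) * X[i]   # gradient of the squared error for one sample
        w -= lr * grad                    # unsynchronized ("hogwild"-style) update

shards = np.array_split(rng.permutation(1000), 4)
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("parameter error:", np.linalg.norm(w - true_w))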

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room 210A

S5637 - ZFAS - The Brain of Piloted Driving at Audi

Matthias Rudolph Head of Architecture Driver Assistance Systems, Audi AG
Dr. Rudolph studied Electrical Engineering at the University of Kassel and received his Ph.D. in Aerospace Engineering and Engineering Mechanics, with a minor in mathematics, from Iowa State University in 1999. After holding various positions at Audi, he took over the lead of the department "Architecture Driver Assistance Systems" in 2009. The zFAS project is one of the department's core developments. Dr. Rudolph is a member of the management at Audi.

During the last several years, Audi has developed, together with partners, a platform that enables piloted driving and piloted parking. At CES 2015, the system was shown driving piloted on the highway from Silicon Valley to Las Vegas. The computational platform, or brain, of this vehicle is called zFAS, with the core element being the NVIDIA Tegra K1. This talk will start with the history and motivation of piloted functions at Audi, followed by an overview of the current architecture and an outline of the future potential of leveraging deep learning algorithms.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 03/17
Time: 13:00 - 13:25
Location: Room LL21F

S5866 - Project Tango: Mobile 3D Tracking and Perception

Johnny Lee Lead, Project Tango, Google
Highly-Rated Speaker
Johnny Lee is the lead of Project Tango at Google, a focused effort to bring computer vision and advanced sensor fusion to mobile platforms. Previously, he helped Google X explore new projects as a Rapid Evaluator and was a core algorithms contributor to the original Xbox Kinect. His YouTube videos demonstrating Wii remote hacks have surpassed 15 million views, and his related TED talk became one of the most popular TED talk videos. In 2008, he received his PhD in Human-Computer Interaction from Carnegie Mellon University, and he has been recognized in MIT Technology Review's TR35.
James Fung Platform Software Lead, Project Tango, Google
James Fung has been applying GPUs to accelerate general-purpose parallel computing, with a focus on image processing and computer vision. He received his Ph.D. in Electrical and Computer Engineering from the University of Toronto. He worked in Developer Technology at NVIDIA, helping developers adopt GPU computer vision. He is currently the Platform Software Lead on Project Tango at Google.

Project Tango is a focused effort to accelerate the progress and adoption of 3D tracking and sensing technologies on mobile devices. It is a platform for developing advanced computer vision and sensor fusion algorithms that estimate the position and orientation of the device in real time while simultaneously generating a 3D map of the environment. In this talk we discuss some of the underlying technologies that make this possible, such as the hardware sensors and some of the software algorithms. We will also show demonstrations of the current state of development, and discuss the role of 3D sensing in mobile gaming, indoor navigation, virtual reality, augmented reality, and autonomous drones. We hope you will join us on this journey. We believe it will be one worth taking.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Augmented Reality & Virtual Reality

Day: Tuesday, 03/17
Time: 13:00 - 13:50
Location: Room 212B

S5108 - Vision-Based Driver Assistance: Seeing the Way Forward

Ian Riches Director, Global Automotive Practice, Strategy Analytics
Ian Riches is a Director in the Global Automotive Practice at Strategy Analytics. He heads a research team that covers all aspects of embedded automotive electronic systems, semiconductors and sensors on a worldwide basis. His areas of research include powertrain, chassis, safety, security and body applications – including high-growth areas such as hybrid and electric vehicles and advanced driver assistance systems. Before joining Strategy Analytics, Ian spent two years as assistant editor of Automotive Engineer, the UK magazine published by the IMechE. He has also held the position of Press Officer/Technical Author for MTL, a safety-related electronic equipment manufacturing company. With over eighteen years of experience, he is one of the foremost industry analysts in the automotive electronics sector. Ian holds an MA in engineering from Cambridge University, UK, where he specialized in fluid dynamics, turbo-machinery and internal combustion engines.

This market introduction to vision-based solutions in advanced driver assistance systems will highlight the regions, applications and vehicle sectors that are driving the growth. Current and likely future architectures will be explored, and the implications for both traditional and non-traditional automotive suppliers will be highlighted. Finally, the role and implications of automated driving will be investigated and analyzed.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21F

S5131 - Mobile Visual Search

Martin Peniak Parallel Computing Software Engineer, Cortexica
Martin works as a parallel computing software engineer at Cortexica, where he develops algorithms for discrete as well as mobile GPUs. Martin received his Ph.D. in GPU computing applied to cognitive robotics and previously collaborated with the international EU FP7 ITALK and Poeticon++ consortia, which aimed at developing biologically inspired artificial systems capable of progressively developing their cognitive capabilities through interaction with their environments. He also collaborated with ESA (the European Space Agency) on a project evolving neural network controllers for simulated Mars rover robots. In summer 2012, Martin worked at NVIDIA Research in Santa Clara, where he evaluated several machine learning algorithms on the next-generation GPU architecture. During his work at NVIDIA, he also developed a novel bio-inspired system for 3D object recognition. More recently, Martin gave a TEDx talk, the first to cover GPU computing and its implications for robotics.

Attendees will learn about Cortexica's FindSimilar™ technology. Its algorithms are based on the way the human visual cortex recognises images and objects, meaning that poor lighting conditions, rotated or skewed images and other 'imperfect' objects can all be recognised accurately. In this presentation, you will learn about the challenges in the field of visual search and how our company addresses them by leveraging the processing power of GPUs, including the latest NVIDIA Tegra K1 processor. This session will include several demonstrations of our technology and the latest mobile applications using Tegra K1 processors to speed up visual search performance.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Embedded Systems

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210B

S5182 - The Future of Human Vision: Preferential Augmentation Using GPUs

Muhammad Shamim Bioinformatics Programmer, Baylor College of Medicine
Muhammad Shamim is a bioinformatics programmer in Dr. Erez Lieberman Aiden's Lab at the Baylor College of Medicine, working on a variety of projects ranging from big data and genomics to augmented reality. Muhammad is a graduate of Rice University with a BS in Computer Science and a BA in Computational & Applied Mathematics and Cognitive Sciences.

Loss of vision can result from an enormous number of visual disorders, a small subset of which can be addressed using traditional corrective lenses, i.e. by transforming light in accordance with Snell's law of refraction. In principle, a more general class of transformations might help address a broader range of disorders. Discover how GPUs are being used in augmented reality applications to correct or alleviate vision deterioration in real-time, as well as personalize vision in novel ways.
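
As a simple worked example of the classical rule mentioned above, the snippet below computes the refraction angle given by Snell's law (n1*sin(theta1) = n2*sin(theta2)); the refractive indices are illustrative assumptions. The talk's point is that GPUs make it practical to apply far more general, per-pixel transformations than a fixed lens can.

import math

def refraction_angle(theta1_deg, n1=1.0, n2=1.5):
    # Angle of the refracted ray (degrees) for light passing from medium n1 into medium n2.
    s = n1 * math.sin(math.radians(theta1_deg)) / n2
    return math.degrees(math.asin(s))

print(refraction_angle(30.0))   # roughly 19.5 degrees for an air-to-glass interface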

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Video & Image Processing; Medical Imaging

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room LL21C

S5581 - Visual Object Recognition Using Deep Convolutional Neural Networks

Rob Fergus Research Scientist , Facebook
Highly-Rated Speaker
Rob Fergus is an Associate Professor of Computer Science at the Courant Institute of Mathematical Sciences, New York University. He is also a Research Scientist at Facebook, working in their AI Research Group. He received a Masters in Electrical Engineering with Prof. Pietro Perona at Caltech, before completing a PhD with Prof. Andrew Zisserman at the University of Oxford in 2005. Before coming to NYU, he spent two years as a post-doc in the Computer Science and Artificial Intelligence Lab (CSAIL) at MIT, working with Prof. William Freeman. He has received several awards including a CVPR best paper prize, a Sloan Fellowship & NSF Career award and the IEEE Longuet-Higgins prize.

This talk will describe recent progress in object recognition using deep convolutional networks. Over the last few years, these have demonstrated significant gains over traditional computer vision approaches and are now widely used in industry (e.g. Google, Facebook, Microsoft, Baidu). Rob Fergus will outline how these models work, and describe architectures that produce state-of-the-art results on the leading recognition benchmarks. GPUs are an essential component to training these models. The talk will conclude with a live demo.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 13:30 - 13:55
Location: Room 210A

S5201 - SMTool: A GPU based Satellite Image Analysis Tool

Dilip Patlolla R & D Staff, Oak Ridge National Laboratory
Highly-Rated Speaker
Dilip Patlolla is an R&D staff member in the Geographic Information Science and Technology (GIST) Group at Oak Ridge National Laboratory, which has been a pioneer in the development, implementation, and application of systems, science, and technology for geographic information. His primary responsibilities include opening up new domains of application for HPC, FPGAs and GPUs by researching and developing computing algorithms, and ensuring the best possible performance on current and next-generation architectures. He leads the development of mapping and characterizing global-scale human settlements using advanced computing methods, and received ORNL's 2013 Significant Event Achievement Award for the effort.

This session will demonstrate our advanced satellite image analysis tool, referred to as SMTool, built on the CUDA platform to process city-scale, sub-meter-resolution satellite imagery and to detect and discriminate man-made structures. Automated analysis of large-scale, high-resolution satellite imagery requires computationally efficient image representation techniques that characterize the distribution of structures in the scene. The structures of interest range from simple edges and lines to complex shapes of objects on the ground. Different representation techniques, and their careful implementation exploiting the GPU architecture, will be reviewed. We present results of SMTool from our ongoing work supporting global-scale population mapping and polio eradication and immunization efforts.
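
To make the kind of low-level image representation mentioned above concrete, here is a small CPU sketch that computes an edge-density measure over image tiles with SciPy; SMTool itself is a CUDA implementation, and the tile size, threshold and random stand-in image below are assumptions for illustration only.

import numpy as np
from scipy import ndimage

image = np.random.rand(512, 512)        # stand-in for a grayscale satellite tile
gx = ndimage.sobel(image, axis=0)       # vertical gradients
gy = ndimage.sobel(image, axis=1)       # horizontal gradients
edges = np.hypot(gx, gy) > 1.0          # crude edge mask

# Edge density per 64x64 block: a rough proxy for the presence of built-up structures.
block_density = edges.reshape(8, 64, 8, 64).mean(axis=(1, 3))
print(block_density)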

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Big Data Analytics; Supercomputing

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210B

S5626 - Accelerating Computer Vision and Augmented Reality via GPGPU Computing

Thomas Alt CEO and Co-Founder, Metaio
Thomas Alt founded Metaio GmbH in February 2003 together with Peter Meier and is currently serving as the CEO. Over the last decade, Thomas has grown Metaio into one of the most important names in Augmented Reality, serving customers around the world in diverse areas such as retail, manufacturing and aerospace. With over 120 employees worldwide and offices in Munich, San Francisco and New York, Metaio continues to bring to market the most innovative technology in the field of Augmented Reality and computer vision, both for device manufacturers and end users. Thomas received his Ph.D. in 2002 from Otto-von-Guericke-Universität Magdeburg, where he focused his dissertation on Augmented Reality.

It is no secret that augmented reality is a computationally intensive endeavor. While human sight is taken for granted by the average person, getting a computer to "see" in a way that remotely resembles our expectations is an extremely complex challenge. Tasks such as extracting significant features from a camera feed, estimating a camera pose, and finally rendering digital content appropriately demand a huge amount of processing power from the CPU of today's mobile devices. Compounding this problem is the increasing interest in 3D object tracking and depth-sensing cameras. Metaio CEO Dr. Thomas Alt will illustrate the current "CV bottlenecks" and how GPU-based solutions can significantly improve the increasingly important mobile computer vision and augmented reality apps coming to market.

Level: All
Type: Talk
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Machine Learning & Deep Learning

Day: Tuesday, 03/17
Time: 14:00 - 14:50
Location: Room LL21C

S5896 - A Performance, Energy and Accuracy Aware Benchmarking Methodology for Robot Vision

Luigi Nardi Research Associate, Imperial College London
Dr. Luigi Nardi is a post-doctoral research associate in the Software Performance Optimisation group at Imperial College London. Luigi's primary role is the co-design of high-performance computer vision systems, where performance, power and accuracy are part of the same optimisation space. Luigi earned his Ph.D. in computer science creating a new performance domain-specific language (DSL) in the context of automatic code generation for applied mathematics. He has almost 10 years of experience in parallel computing and more than 4 years of experience developing GPU-enabled codes using CUDA and OpenCL, from desktop to embedded. Prior to his current position, Luigi was a permanent researcher at Murex S.A.S., working on the acceleration of production-level computational finance codes for pricing evaluation and risk management on clusters of GPUs.

We introduce SLAMBench, a publicly available software framework for quantitative, comparable and validatable experimental research into trade-offs between performance, accuracy and energy consumption for real-time 3D scene understanding. 3D scene understanding offers great potential for a new level of scene modelling, localisation and real environmental interaction for many types of robot, but its high computational requirements mean that use on mass-market embedded platforms is challenging. SLAMBench provides a KinectFusion implementation in C++, OpenMP, OpenCL and CUDA, and a powerful mechanism for reliable accuracy comparison of different implementations and algorithms. We experimentally investigate SLAMBench execution time, energy and accuracy on a variety of multicore and GPU-accelerated platforms.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Embedded Systems; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 14:00 - 14:25
Location: Room 210G

S5251 - Accelerating Automated Image Processing Pipelines for Cameras with Novel CFAs on GPUs

Qiyuan Tian Ph.D. Candidate, Stanford University
Qiyuan Tian is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. He received his B.Eng. (2011) in Communication Science and Engineering from Fudan University, China, and his M.S. (2013) in Electrical Engineering from Stanford University. He studied as an undergraduate exchange student (2009) in the Department of Electronic and Computer Engineering at The Hong Kong University of Science and Technology. He is working on digital imaging, magnetic resonance imaging and neuroimaging.
Haomiao Jiang Ph.D. Candidate, Stanford University
Haomiao Jiang is a Ph.D. candidate in the Department of Electrical Engineering at Stanford University. He received B.A. (2011) in Information Security at Shanghai Jiao Tong University, China, and M.S. (2013) in Electrical Engineering at Stanford University. He is working with Professor Brian Wandell on color vision, display modeling and computational photography.

L3 (Local, Linear, Learned) is a new technology to automate and customize the design of image processing pipelines for cameras with novel architecture, such as unconventional color filter arrays. L3 classifies sensor image pixels into categories that are local in space and response and automatically learns linear operators that transform pixels to the calibrated output space using training data from camera simulation. The local and linear processing of individual pixels makes L3 ideal for parallelization. We accelerated the L3 pipeline on NVIDIA® Shield™ Tablets using GPUs for real time rendering of video captured by a multispectral camera prototype. The combination of L3 and GPUs delivers high performance with low power for image processing on mobile devices.
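
The following is a small illustrative sketch of the local-and-linear idea described above: each pixel is assigned a category from its local patch statistics and then transformed by a per-category linear operator. The 3x3 patch size, the two toy categories and the random operators are assumptions, not the authors' trained pipeline.

import numpy as np

rng = np.random.default_rng(1)
raw = rng.random((64, 64))                   # stand-in for a raw sensor image
ops = {0: rng.random(9), 1: rng.random(9)}   # one learned linear operator (3x3 patch) per class

out = np.zeros_like(raw)
for r in range(1, 63):
    for c in range(1, 63):
        patch = raw[r-1:r+2, c-1:c+2].ravel()
        cls = 0 if patch.mean() < 0.5 else 1   # "local in space and response" classification (toy rule)
        out[r, c] = ops[cls] @ patch           # linear transform into the output space
print(out[32, 32])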

Level: All
Type: Talk
Tags: Defense; Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210C

S5333 - SceneNet: 3D Reconstruction of Videos Taken by the Crowd on GPU

Chen Sagiv CEO, SagivTech Ltd.
Dr. Sagiv brings to SagivTech over 15 years of experience in the image processing industry both in Israel and the Netherlands. In addition to her activities with the company, she also collaborates with academic beneficiaries in Israel and Europe. Chen Sagiv holds a PhD from Tel Aviv University in Applied Mathematics, with specializations in texture analysis, filter banks and optimization problems.

If you have visited a rock concert recently, you probably noticed how many people are taking videos of the scene using their mobile phone cameras. The aim of SceneNet is to use these multiple video sources to create a high-quality 3D video scene that can be shared via social networks. The SceneNet pipeline starts at the mobile device, where the video streams are acquired, pre-processed and transmitted to the server, where the various video streams are registered and submitted to 3D reconstruction. We will share the compute challenges of SceneNet and the GPU-based acceleration on mobile devices and the server, from pre-processing on the mobile device to extremely computationally demanding algorithms such as bundle adjustment and 3D reconstruction. SceneNet is an FP7 European-funded project.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing

Day: Tuesday, 03/17
Time: 14:30 - 14:55
Location: Room 210B

S5123 - Lights, Cameras, Action: Interactive Visualization and Computer Vision in the Car

Gernot Ziegler Senior Developer Technology Engineer (Computer Vision), NVIDIA
Highly-Rated Speaker
Dr. Gernot Ziegler is an Austrian engineer with a PhD in Computer Science from the University of Saarbrücken, Germany, and an MSc degree in Computer Science and Engineering from Linköping University, Sweden. He pursued his PhD studies at the Max-Planck-Institute for Computer Science, where he specialized in GPU algorithms for computer vision and data-parallel algorithms for spatial data structures. After six years in NVIDIA's Developer Technology (Compute) team, working on high performance computing, Gernot has now returned to his original research domain and consults on the use of GPU algorithms for computer vision in the automotive industry as a senior member of NVIDIA's Developer Technology team for Computer Vision.

Learn how the GPU's real-time graphics capabilities can be used to interactively visualize and enhance the camera system of modern cars. The GPU simplifies design, interactive calibration and testing of the car's computer vision systems, and even allows for creating simulated environments where the behavior of the car's computer vision can be tested to pass standard safety tests or navigational street situations.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room LL21F

S5383 - Mobile 3D Mapping With Tegra K1

Karol Majek Researcher, Institute of Mathematical Machines
Karol is a PhD student and researcher at the CUDA Research Center at the Institute of Mathematical Machines, where he does research in robotics. For the last two years he has focused on using CUDA technology for 3D mapping on robotic platforms. Currently he is working on embedding CUDA-enabled algorithms to run on the Tegra K1.

This work presents a 3D mapping algorithm implemented on the Tegra K1. The data processing pipeline is implemented in parallel using CUDA. The performance and accuracy of the final model are compared to mobile and desktop GPU results. This work shows how to replace traditional CUDA-enabled laptops with the embedded Tegra K1. Attendees will learn about the problems and challenges of embedding a parallel 3D mapping algorithm and how to improve its speed.

Level: Intermediate
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210G

S5429 - Creating Dense Mixed GPU and FPGA Systems With Tegra K1 Using OpenCL

Lance Brown Director - Radar, EW and HPC, Colorado Engineering Inc
Lance Brown has been in the COTS hardware world since 1999, after spending 5 years as a software engineer at Nortel and Motorola. Lance has been a field application engineer for Curtiss-Wright and GE, supporting GPU, CPU and FPGA products for telecom, networking and defense. He is now the Director of Radar, EW and HPC at Colorado Engineering Inc. (CEI), focusing on high-TFLOP, low-CSWAP systems. CEI's product line is based on 3D computing architectures co-developed with the Missile Defense Agency and the Naval Research Laboratory. Lance is a graduate of the University of Texas at Arlington with a BS in Computer Science Engineering.

With the introduction of comprehensive OpenCL support and IEEE 754 hard floating point units for Altera FPGAs, and the availability of NVIDIA® Tegra® K1 GPUs, compact solutions that used to require many discrete boards can now be designed in small form factors for Distributed Aperture Systems (DAS), Situational Awareness 360 (SA360), Digital Signal Processing (DSP) and hundreds of other high performance embedded computing (HPEC) applications, from mil-aero to commercial, industrial, medical and consumer. Funded by the Missile Defense Agency, Lance Brown will discuss the challenges and benefits of using multiple Altera Arria 10 FPGAs and multiple NVIDIA® Tegra® K1 GPUs on a single card to speed up six-degrees-of-freedom simulations.

Level: Intermediate
Type: Talk
Tags: Defense; Embedded Systems; Signal & Audio Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210C

S5474 - CloudCV: Large-Scale Distributed Computer Vision as a Cloud Service

Dhruv Batra Assistant Professor, Virginia Tech
Dhruv Batra is an Assistant Professor in the Bradley Department of Electrical and Computer Engineering at Virginia Tech, where he leads the VT Machine Learning & Perception group. He is a member of the Virginia Center for Autonomous Systems (VaCAS) and the VT Discovery Analytics Center (DAC). Prior to joining VT, he was a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute located on the campus of the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010 respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at the Machine Learning Department at CMU and at CSAIL MIT. His research interests lie at the intersection of machine learning, computer vision and AI, with a focus on developing scalable algorithms for learning and inference in probabilistic models for holistic scene understanding. He has also worked on other topics such as interactive co-segmentation of large image collections, human body pose estimation, action recognition, depth estimation and distributed optimization for inference and learning in probabilistic graphical models. He was a recipient of the Carnegie Mellon Dean's Fellowship in 2007, the Google Faculty Research Award in 2013, the Virginia Tech Teacher of the Week in 2013, the Army Research Office (ARO) Young Investigator Program (YIP) award in 2014, and the National Science Foundation (NSF) CAREER award in 2014. His research is supported by NSF, ARO, ONR, Amazon, Google, Microsoft, and NVIDIA.

In this talk, attendees can expect to learn about CloudCV, an ambitious system that will provide access to state-of-the-art distributed computer vision algorithms as a cloud service. Our goal is to democratize computer vision: one should not have to be a computer vision, big data and distributed computing expert to have access to state-of-the-art distributed computer vision algorithms. As a first step, CloudCV is focused on object detection and localization in images. CloudCV provides APIs for detecting whether any of 200 different object categories, such as entities (person, dog, cat, horse, etc.), indoor objects (chair, table, sofa, etc.) and outdoor objects (car, bicycle, etc.), are present in the image.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Machine Learning & Deep Learning; Data Center, Cloud Computing & HPC

Day: Tuesday, 03/17
Time: 15:00 - 15:25
Location: Room 210B

S5362 - A GPU-Accelerated 3D Kinematic Modeling Platform for Behavioral Neuroscience

John Long Post-doctoral Researcher, New York University Langone Medical Center
John is a postdoctoral researcher in the laboratory of Dr. György Buzsáki at the New York University Langone Medical Center. He received his PhD in neuroscience from the UC Berkeley Helen Wills Neuroscience Institute in 2011, in the Brain-Machine Interface laboratory of Dr. Jose Carmena. His current work in neuroscience leverages multiple camera photogrammetry and the power of GPUs to build 3D models of his neurophysiological subjects to study the relationships between memory formation in the brain, navigation, and action planning. He is also working within the clinical domain to develop a computer vision system for behaviorally diagnosing Parkinson's disease.

Computer vision techniques for 3D reconstruction and kinematic modeling are positioned to bring about a major advance in the field of behavioral neuroscience. Integrating GPUs into the software pipeline has qualitatively improved our ability to fit, inspect, and refine complex kinematic models. Our custom markerless motion capture system, in conjunction with our use of high-density silicon neural implants (≥ 100 channels), provides an unprecedented glimpse into the relationship between the brain, memory, and behavior.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Life & Material Science; Developer - Algorithms

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210B

S5760 - Real-Time, Content-Driven Representations at Twitter

Clement Farabet Senior Software Engineer, Twitter
Clement Farabet is a senior software engineer at Twitter, where he leads the effort on representation learning for all things Twitter. Clement Farabet received a Master’s Degree in Electrical Engineering with honors from Institut National des Sciences Appliquées (INSA) de Lyon, France in 2008. His Master’s thesis work on reconfigurable hardware for deep neural networks was developed at the Courant Institute of Mathematical Sciences of New York University with Professor Yann LeCun, and led to a patent. He then joined Professor Yann LeCun’s laboratory in 2008, as a research scientist. In 2009, he started collaborating with Yale University’s e-Lab, led by Professor Eugenio Culurciello. This joint work later led to the creation of TeraDeep (www.teradeep.com). In 2010, he started the PhD program at Université Paris-Est, co-advised by Professors Laurent Najman and Yann LeCun. His thesis focused on real-time image understanding/parsing with deep convolutional networks. The main contributions of his thesis were multi-scale convolutional networks and graph-based techniques for efficient segmentations of class prediction maps. He graduated in 2013, and went on to cofound Madbits, a company that focused on representing, understanding and connecting images. Madbits got acquired by Twitter in 2014.

Twitter is a unique source of real-time information, offering amazing opportunities for automatic content understanding. The format of this content is diverse (tweets, photos, videos, music, hyperlinks, follow graph, ...), the distribution of topics ever-changing (on a weekly, daily, or sometimes hourly basis), and the volume ever-growing, making it very challenging to automatically and continuously expose relevant content. Manually defining features to represent this data is showing its limits. In this talk, I provide an overview of how automated, content-driven representations, enabled by modern deep-learning algorithms, allow us to build adaptive systems that capture the richness of this content. Specifically, the presentation focuses on deep representations for images and images+text.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room 210A

S5870 - Audi Piloted Driving: In the Fast Lane to the Future

Daniel Lipinski Senior Engineer, Audi of America
Daniel started working for Audi in 2008 as the lead developer for the European Traffic Sign Recognition system. In 2012 he joined the Volkswagen Electronics Research Lab (ERL) in Silicon Valley, where he led the application of several driver assistance systems to the U.S. market. Daniel is now the project lead with one of the most comprehensive Volkswagen Group and Audi research projects for piloted driving. One of his project cars is “Jack”, the Audi piloted driving concept car that successfully completed the 550 miles automated driving road test from Silicon Valley to Las Vegas. Lipinski studied Computer and Communications Systems Engineering at the Technical University in Braunschweig, Germany.

On the eve of CES 2015, Audi, the ERL and VW Group Research accomplished the most dynamic automated driving road test yet, with non-engineers behind the wheel for more than 550 miles on public freeways. With the advanced Highway Pilot technology built into a car nicknamed "Jack", Audi demonstrated how far automated driving technology has matured within the last decade. What enabled such complex technology is the massive growth in processing power, a field in which NVIDIA processors will be playing a central role in the future.

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 03/17
Time: 15:30 - 15:55
Location: Room LL21F

S5118 - Impressions: The Global Impact of Culture, Imagery and Visual Communication

Don Levy President & Cultivator, Smith Brook Farm
Don Levy has been at the forefront of the entertainment industry's digital transformation, developing "the intersection of entertainment and technology" throughout his career and at Sony Pictures Entertainment (Columbia Pictures/Sony Pictures Digital) from 1995-2012. He founded Smith Brook Farm in 2012 as a creative consultancy and is also the co-founder of Spud Media, LLC, a new entertainment venture serving the family market. Levy attended New York University, received his B.A. from the University of Denver and earned certificates from UCLA's Anderson School of Business. Don is a member of the Academy of Motion Picture Arts & Sciences, serving on its feature animation nominating committee and recently chaired a working group for the Science and Technology Council. He also is a member of The Television Academy's Interactive Peer Group, The Visual Effects Society, ASIFA Hollywood, the International Photographers Guild and METAL, the Media, Entertainment and Technology Alpha Leaders organization. Levy is a frequent speaker on the subjects of innovation, digital creativity, education and visual effects. His 2012 talk on the principles and evolution of visual effects at the TED Conference in Long Beach, CA was posted on TED.com in January 2013. He is active in local education issues and organizes TEDxConejo in association with the Conejo Valley (Thousand Oaks, Ca) Unified School District.

We are what we see. The question is: how does what we see influence our lives and the lives of future generations? We live in a visual world. This has brought us closer together and enabled people everywhere to share everything from the latest pop culture phenomenon to the most catastrophic news. Infographics and animation explain every subject. From an early age, I have appreciated the power of images to move people. Today, the line between fact and fiction is virtually gone. Many of the images that impressed me in my most formative years were of dreams and hope and aspiration. Others made me think. With a curiosity born of my Hollywood experience in the dream factory, and thinking back on how the pictures of my own youth continue to influence me, I'll share with you some thoughts and ideas.

Level: All
Type: Talk
Tags: Media & Entertainment; Computer Vision & Machine Vision; Augmented Reality & Virtual Reality; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21D

S5373 - GPU + Drones + 3D Imaging for Precision Farming and Construction

Bingcai Zhang Tech Fellow, BAE Systems
Dr. Zhang is a technical fellow at BAE Systems, the premier global defense and aerospace company. He joined BAE Systems in September 1995, right out of the University of Wisconsin-Madison, where he earned his Ph.D. in engineering and an MS in computer science. His research interests are: (1) geospatial information technology and 3D mapping; (2) robot vision and unmanned systems; and (3) 3D geoweb search. He has held positions as chief architect, chief photogrammetrist, R&D manager, and technical fellow with BAE Systems. Dr. Zhang has three inventions: (1) Embedded Photogrammetry, (2) Next Generation Automatic Terrain Extraction (NGATE), and (3) Automatic 3D Object Extraction. Embedded photogrammetry is a concept to embed a precise 3D measurement component, photogrammetry, into non-photogrammetry applications such as GIS and CAD. NGATE generates 3D terrain models from stereo images. AFE is a production-capable system that automatically extracts 3D objects such as houses, buildings and trees from a digital surface model or LiDAR point clouds.

Agriculture and construction are two of the largest industries in the world. The democratization of 3D imaging technology, combining drones, digital cameras, and GPUs, is applicable to precision farming and construction. Precision farming can increase crop yields, reduce pollution, save water, and increase productivity, and the demand for it keeps growing as more people live on a planet with fixed natural resources. Timely, precise 3D measurements are also important for construction, yet today most of these measurements are obtained manually. BAE Systems is developing GPU-accelerated 3D imaging technology based on drone images for precision farming and construction.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210B

S5599 - Gesture Recognition: Using a Multi Sensor Approach

Shalini Gupta Senior Research Scientist, NVIDIA Research
Shalini Gupta is a Senior Research Scientist at NVIDIA Research. Formerly, she was a Senior Mobile Computer Vision Engineer at NVIDIA, and an Imaging Scientist at Texas Instruments. Shalini received her Doctoral degree in Electrical and Computer Engineering from the University of Texas at Austin in 2008.

For accurate and power-efficient in-vehicle hand-gesture recognition, a novel multi-sensor system combines a short-range radar, a color camera, and a depth camera, which together make the system robust against variable lighting conditions. The radar and depth sensors are jointly calibrated, and a convolutional deep neural network fuses the data from the multiple sensors and classifies the gestures. This algorithm accurately recognizes 10 different gestures acquired indoors and outdoors in a car, during the day and at night, while consuming significantly less power than purely vision-based systems.
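
As a highly simplified sketch of the sensor-fusion step described above, the snippet below fuses per-sensor feature vectors by concatenation and trains a single multi-class classifier over 10 gesture classes; the feature sizes, random data and logistic-regression classifier are assumptions for illustration (the talk itself describes a convolutional deep neural network).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
radar = rng.random((n, 8))     # stand-in radar features
color = rng.random((n, 32))    # stand-in color-camera features
depth = rng.random((n, 16))    # stand-in depth-camera features
labels = rng.integers(0, 10, size=n)   # 10 gesture classes

fused = np.concatenate([radar, color, depth], axis=1)   # fusion by feature concatenation
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))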

Level: All
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room LL21F

S5811 - TK1-Based Solutions for Intelligent Video Analytic Applications

Hai Tao CEO, Beijing Vion Technology Inc. (BVT)
Hai Tao is the founder and CEO of Beijing Vion Technology Inc. He has 25 years of experience in image processing and computer vision. Prior to BVT, he was an associate professor at UC Santa Cruz. Dr. Tao received his Ph.D. degree from University of Illinois at Urbana Champaign.

This talk demonstrates how GPU-based embedded computer vision systems are transforming the world of video processing in several vertical markets, including ATM safety, intelligent transportation systems (ITS), business intelligence (BI), and smart video surveillance. By taking full advantage of the TK1's 300+ GFLOPS of computing power, BVT has built and deployed embedded systems for people counting, shopping-traffic gender and age analysis, perimeter monitoring, violence and chase detection, and ATM service area protection. These application systems require the development of custom computer vision algorithms and their efficient implementation on the GPU. In addition, we will demonstrate how the world's first TK1-based smart cameras are being developed for various applications, including license plate recognition, face recognition and crowd management. Compared to previous DSP-based smart camera solutions, the powerful embedded GPU-based solution is the first that can support imaging sensor resolutions up to 12 megapixels. The talk will also provide technical details on the CUDA implementation of several computer vision algorithms.

Level: All
Type: Talk
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210G

S5873 - Optimized GPU Kernels for Deep Learning

Amir Khosrowshahi CTO and Co-Founder, Nervana Systems
Amir is CTO of Nervana Systems, a startup bringing unprecedented performance and scale to deep learning. He has a Ph.D. from UC Berkeley, and MA and BA from Harvard.

Deep learning has recently achieved great success in domains such as images, speech, and text. These gains have been made possible by efficient GPU implementations such as cuDNN. We show optimizations at the assembly level that result in significant performance improvements over existing methods. In particular, we show how operations such as convolutions and dense matrix multiply can be efficiently implemented using a custom assembler to attain state-of-the-art performance on the NVIDIA Maxwell GPU architecture. Additionally, we can significantly reduce memory bandwidth and run much larger models by using limited precision with a minimal tradeoff in model accuracy.
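
To illustrate the limited-precision point in the abstract, here is a small NumPy comparison of a matrix multiply carried out with float32 versus float16 storage; halving the bytes per value roughly halves memory traffic, at a modest cost in accuracy. The matrix sizes and error metric are assumptions for illustration and say nothing about the hand-written Maxwell assembly the talk covers.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((256, 256)).astype(np.float32)
B = rng.standard_normal((256, 256)).astype(np.float32)

full = A @ B
half = (A.astype(np.float16) @ B.astype(np.float16)).astype(np.float32)

rel_err = np.abs(full - half).max() / np.abs(full).max()
print("bytes per matrix:", A.astype(np.float16).nbytes, "vs", A.nbytes)
print("max relative error:", rel_err)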

Level: Beginner
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:00 - 16:25
Location: Room 210A

S5740 - Clarifai: Scaling Deep Learning

Matthew Zeiler CEO, Clarifai
Matthew Zeiler, PhD, Founder and CEO of Clarifai Inc. studied machine learning and image recognition with several pioneers in the field of deep learning at University of Toronto and New York University. His insights into neural networks produced the top 5 results in the 2013 ImageNet classification competition. He founded Clarifai to push the limits of practical machine learning, which will power the next generation of intelligent applications and devices.

The field of artificial intelligence and machine learning is in a period of rapid, revolutionary improvement. Across many application areas, deep learning is proving to outperform techniques that were considered state of the art only a few years ago. Although the fundamental techniques were developed in the 1980s and 1990s, it was only recently that they were applied at large scale, due to the advent of general-purpose GPU computing and the availability of internet-scale datasets. The deep learning experts at Clarifai have spent years working alongside pioneers of the field and form a team who has vast experience developing new deep learning techniques and building state of the art systems that solve real problems. In this talk we will present some of the latest technologies we have developed and show how they can be applied to power a new generation of intelligent applications.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210A

S5761 - Achieving Real-Time Performances on Facial Motion Capture and Animation on Mobile GPUs

Emiliano Gambaretto Director of Research, Mixamo
Emiliano Gambaretto obtained a PhD degree in Bioengineering from Politecnico di Milano (Italy) in 2011 for his research on Markerless Motion Capture. Part of his research was carried out at Stanford Biomotion Lab in 2006. He also received a Master's Degree in Biomedical Engineering from Politecnico di Milano and a Diplome d'Ingenieur from Ecole Centrale de Lyon (France) in 2006. He was part of Mixamo's founding team in 2008. He's currently Director of Research at Mixamo. His job consists of designing and developing the technologies behind Mixamo's services. That includes motion models, auto-rigging, real-time face animation, integration with 3rd-party software and industry standards.

3D animation is one of the most prominent forms of contemporary art, with millions of people drawn to its emotional power in movie theaters and games every year. Mixamo developed a GPU-powered facial capture and animation technology to enable users to animate a character's face in real time. This technology, originally targeted at desktop and laptop GPUs, is now available on mobile thanks to the improved performance of new-generation hardware. This presentation will focus on the challenges faced and the strategies adopted to port this technology to Tegra K1-powered devices. We adopted two parallel approaches: one optimized our tracking algorithm and ported our code to CUDA (from OpenCL); the other completely changed the facial tracking paradigm, focusing on an intrinsically faster machine learning approach based on a cascade of simple regressors. We will compare the performance and strengths of both approaches.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Developer - Performance Optimization

Day: Tuesday, 03/17
Time: 16:30 - 16:55
Location: Room 210B

S5751 - Stereovision and the Future of Autonomous Machines

Edwin Azzam CTO, STEREOLABS
Edwin Azzam co-founded STEREOLABS in 2010. As STEREOLABS’s Chief Technical Officer, Edwin is responsible for leading the company’s product development and technology strategy in stereoscopic image processing. Prior to founding STEREOLABS, Edwin was a project manager at Airbus Defence and Space. Edwin holds a Master’s degree in Optics & Image Processing from Institut d’Optique, France, as well as a Master’s degree in Management from ESSEC Business School. He is a PhD supervisor and a National Technical Expert for the ANR (National Research Agency), where he uses his technical and market expertise for the assessment of national research projects in the field of computer vision and 3D image processing. Edwin was honored twice with the National Innovation Prize by the French Ministry of Research. Between 2010 and 2014, Edwin received 10 different distinctions for his achievements in the stereoscopic 3D field. In 2010, he won the European Innovation Award with STEREOLABS which recognizes the most promising and innovative technological companies in Europe.

Discover how stereovision and 3D depth sensing on mobile GPUs enable the development of future autonomous cars, drones and robots. We will discuss the benefits and challenges of using stereo cameras as depth sensing sensors, and how to leverage the power of embedded GPU to overcome these challenges. We will also show demonstrations of how the technology can be used to create 3D surrounding reconstruction, detect obstacles and navigate autonomously.
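
For reference, the basic geometric relation behind the stereo depth sensing discussed above: for a calibrated, rectified camera pair with focal length f (in pixels) and baseline B (in meters), depth is z = f * B / disparity. The focal length, baseline and disparities below are illustrative assumptions.

import numpy as np

f_px = 700.0                                # focal length in pixels
baseline_m = 0.12                           # distance between the two cameras in meters
disparity_px = np.array([70.0, 35.0, 7.0])  # pixel offsets of matched features

depth_m = f_px * baseline_m / disparity_px
print(depth_m)   # [ 1.2  2.4 12. ] -- smaller disparity means a farther object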

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Automotive; Video & Image Processing

Day: Wednesday, 03/18
Time: 09:00 - 09:25
Location: Room 210B

S5317 - Development of a GPU Accelerated Visual Tracking Framework

David Concha Researcher, Universidad Rey Juan Carlos
David received his B.Sc. degree in Computer Science from Universidad Rey Juan Carlos (URJC) in 2011 and is currently a Ph.D. student and grant holder at Universidad Rey Juan Carlos. His research interests focus on computer vision and GPU computing. Much of his recent work exploits graphics hardware to accelerate computer vision algorithms. In particular, David uses GPUs to accelerate methods related to 3D/2D motion tracking, medical image reconstruction, face recognition, high-definition depth map computation, image segmentation, etc.

This session presents the development of a visual tracking system whose ultimate goal is to track multiple articulated objects. Throughout the development, different technologies for GPU programming are used, such as OpenGL, Cg and CUDA; various types of sensors, such as cameras or Kinects; and different methodologies, such as particle filters, Kalman filters or the Variable Neighborhood Search (VNS) metaheuristic.
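
For reference, here is a minimal constant-velocity Kalman filter of the kind listed among the methodologies above; the state layout, noise levels and measurement sequence are illustrative assumptions, not the system described in the session.

import numpy as np

dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition for [position, velocity]
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 1e-3 * np.eye(2)                    # process noise covariance
R = np.array([[0.1]])                   # measurement noise covariance

x = np.zeros(2)                         # state estimate
P = np.eye(2)                           # state covariance

for z in [0.9, 2.1, 2.9, 4.2, 5.0]:     # noisy position measurements
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (np.array([z]) - H @ x)
    P = (np.eye(2) - K @ H) @ P

print("estimated position and velocity:", x)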

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 09:30 - 09:55
Location: Room 210B

S5205 - Real-Time and High Resolution Feature Tracking and Object Recognition

Peter Andreas Entschev Software Engineer, ArrayFire
Peter Entschev is currently a software engineer at ArrayFire, where he primarily works on concurrent computer vision problems. He has received his Bachelor's degree in Telecommunication Systems and Master's degree in Computer Science from the Federal University of Technology - Paraná (UTFPR), Brazil. Before joining ArrayFire, he worked on real-time computer vision research at SEW-Eurodrive in Germany and with system administration and development of Linux distributions for the Brazilian Government.

This session will cover real-time feature tracking and object recognition in high resolution videos using GPUs and productive software libraries including ArrayFire. Feature tracking and object recognition are computer vision problems that have challenged researchers for decades. Over the last 15 years, numerous approaches were proposed to solve these problems, some of the most important being SIFT, SURF and ORB. Traditionally, these approaches are so computationally complex that processing more than a few frames per second is impossible. Using an NVIDIA K20 GPU with ORB, we are able to process more than 30 frames per second on images in the order of 10000x10000 pixels. Multiple quality and timing benchmarks will be presented, covering some of the most robust feature tracking methods.
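
As a point of comparison for the methods named above, this is a plain CPU example of ORB feature detection and matching with OpenCV (not the GPU/ArrayFire implementation discussed in the session); the image file names are placeholders.

import cv2

img1 = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_0002.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=2000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with Hamming distance, keeping only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(len(matches), "matches; best distance:", matches[0].distance)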

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Algorithms; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:00 - 10:25
Location: Room 210B

ECS5001 - CEO Show & Tell: Paracosm

Amir Rubin CEO, Paracosm
Amir co-founded Paracosm in Jan 2013, and currently serves as CEO. Prior to Paracosm, Amir spent the past 10 years developing 3D-vision systems and 3D-simulation applications.

He co-founded his first company, Prioria Robotics, while he was still a computer-engineering student at the University of Florida. At Prioria he developed vision systems for small drones. Most recently, he was employee #1 at a successful Florida startup, Shadow Health, which develops interactive healthcare simulations. He also holds a patent for weighing cows based on 3D-imagery and photographs of them.

Paracosm enables robots and augmented reality applications to understand and interact with the world around them. Their core technology is a spatial intelligence platform that provides the tools to collaboratively capture interior spaces, generate 3D maps, and create immersive experiences. They are a team of engineers, artists, and dreamers based in Gainesville, FL and Cambridge, MA.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:15 - 10:30
Location: Room 220B

ECS5002 - CEO Show & Tell: Herta Security

Javier Rodríguez Saeta CEO, Herta Security
Javier Rodríguez Saeta received the B.S., M.S. and Ph.D. degrees in Electrical (Telecommunication) Engineering from the Technical University of Catalonia, UPC, Barcelona (Spain), in 2000 and 2005, respectively. He has also received the B.A. degree in Business Administration by the Open University of Catalonia (UOC), and the MBA by ESADE Business School. In 2000 he worked for Robert Bosch, GmbH, in Hildesheim (Germany). In 2001, he joined Biometric Technologies, S.L., in Barcelona (Spain), where he was the R&D Manager. He founded Herta Security in 2009 and became the CEO of the company.

Herta Security is a world leader in the development of cutting-edge facial recognition solutions. Based in Barcelona, Spain, with offices in Madrid and London, the company offers fast, accurate, robust, end-customer-oriented solutions for video surveillance, access control, and marketing requirements.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 10:30 - 10:45
Location: Room 220B

S5221 - Tracking Objects Better, Faster, Longer

Alptekin Temizel Associate Professor, Middle East Technical University
Dr. Alptekin Temizel is an associate professor at the Informatics Institute, Middle East Technical University (METU). He received his BSc in Electrical and Electronic Engineering from METU, Ankara, Turkey (1999) and his PhD from the Centre for Vision, Speech and Signal Processing, University of Surrey, UK (2006). Between 1999 and 2001 he worked as a research assistant at the University of Hertfordshire, UK. He co-founded Visioprime Ltd., UK, a company developing intelligent video systems for security and surveillance applications, and worked there as a senior research engineer between 2001 and 2006. Since 2006 he has been a professor in the Graduate School of Informatics, Middle East Technical University (METU), Turkey, and a consultant to several R&D companies. He is the principal investigator of the Virtual Reality and Computer Vision Research Group (VRCV), an NVIDIA CUDA Teaching Center and CUDA Research Center. His main research interest areas are image and video processing, video surveillance, computer vision, parallel programming and GPU programming.

In this talk, we demonstrate a real-time long-term tracker, Hybrid-TLD (H-TLD), which is based on the recently proposed Tracking-Learning-Detection (TLD) framework. TLD simultaneously tracks the object, learns its appearance and detects when it re-appears. While it has been shown to produce promising results, its high computational cost prohibits running it at higher resolutions and frame rates. We present our analysis of the framework and our modifications to make it work effectively in a hybrid CPU-GPU setting, with high utilization of both processing units using OpenMP and CUDA. Our results show that a 10.25x speedup at 1920x1080 resolution can be obtained. The source code of the developed H-TLD library has been made publicly available.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 10:30 - 10:55
Location: Room 210B

ECS5004 - CEO Show & Tell: Jibo

Cynthia Breazeal Chief Scientist, Jibo
Dr. Cynthia Breazeal is Chief Scientist of Jibo, Inc. She is also an Associate Professor at the MIT Media Lab where she directs the Personal Robots Group. Breazeal is recognized as a prominent innovator at the intersection of technology and society as the pioneer of Social Robotics. Her research spans both the creation of intelligent and socially responsive robots, as well as studying their impact on contributing to people's quality of life across education & learning, creativity, health, telecommunications, aging, entertainment, play, and more. Jibo, Inc. brings the technologies, design insights, and user experience of social robots to the home as the world's first family robot to help busy families to manage, care for, coordinate and connect with loved ones with greater ease, engagement, efficacy, and fun. As an open platform, Jibo enables 3rd party developers to bring the engagement and emotional lift of social robots to their apps and services.

Described by the company as the "world's first family robot," Jibo looks straight out of Pixar, but the plans that founder and Chief Scientist Breazeal has for the in-home social robot are very real. Jibo first appeared on the scene last summer as an Indiegogo crowd fund-raiser, bringing in the tidy sum of $2.3 million, and just recently announced it's raised $25.3 million in Series A funding. It's also taken almost 5,000 pre-orders to date, which are expected to start shipping at the end of this year. What will Jibo do? When fully realized, Jibo will engage as a helpful part of the family, a companion who knows the other members, and provides them with personalized messages and reminders, serves as the family photographer, tells stories to the kids, etc. As an open 3rd party developer platform, Jibo's skills will continue to expand, eventually providing services like ordering pizza and so much more. The company's goal is for Jibo to "help families manage, care for, coordinate, and connect with greater ease, engagement, efficiency, and fun."

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 11:00 - 11:15
Location: Room 220B

S5147 - Faster Convolutional Neural Networks by Separable Filters

Che-Rung Lee Professor, National Tsing Hua University
Che-Rung Lee
Che-Rung Lee received his B.S. and M.S. degrees in Computer Science from National Tsing Hua University, Taiwan, in 1996 and 2000 respectively, and his Ph.D. in Computer Science from the University of Maryland, College Park in 2007. He joined the Department of Computer Science at National Tsing Hua University as an assistant professor in 2008 and became an associate professor in 2013. His research interests include numerical algorithms, scientific computing, high-performance computation, and cloud computing. He is the chair of the CCOE (CUDA Center Of Excellence) at NTHU (National Tsing Hua University). He is a member of IEEE and SIAM.

Learn how to accelerate the training of convolutional neural networks (CNNs) for image recognition on the GPU with separable filters. The method uses Singular Value Decomposition (SVD) to approximate 2D filters by the product of two 1D filters, and then performs two 1D convolutions consecutively. The GPU implementation consists of two kernels. The first is a batched SVD routine that can decompose multiple small matrices simultaneously. The second is the convolution computation, which combines three methods using different memory spaces for various filter sizes. Experimental results show that the implementation achieves a 1.35x~2.66x speedup in the forward and backward passes compared to state-of-the-art GPU implementations of CNNs.
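The separable-filter idea itself is easy to reproduce on the CPU; the following NumPy/SciPy sketch shows the SVD-based rank-1 approximation and the two consecutive 1D convolutions (this is only an illustration of the math, not the talk's batched GPU kernels):

    import numpy as np
    from scipy.signal import convolve2d

    # K is approximated by sigma_0 * u_0 * v_0^T, so convolving with K is approximated
    # by convolving with the column filter u_0 and then the row filter v_0.
    K = np.random.randn(7, 7).astype(np.float32)        # a 2D filter
    U, S, Vt = np.linalg.svd(K)
    col = (np.sqrt(S[0]) * U[:, 0]).reshape(-1, 1)      # 1D column filter
    row = (np.sqrt(S[0]) * Vt[0, :]).reshape(1, -1)     # 1D row filter

    img = np.random.randn(64, 64).astype(np.float32)
    full_2d   = convolve2d(img, K, mode='same')         # one 2D convolution
    separable = convolve2d(convolve2d(img, col, mode='same'), row, mode='same')

    # The approximation error depends on how close K is to rank 1.
    print(np.abs(full_2d - separable).max())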

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room 210A

S5295 - Next Generation Surround-View for Cars

Miguel Sainz Principal Engineer, Computer Vision, NVIDIA
Miguel Sainz is a Principal Engineer in Computer Vision at NVIDIA. His research interests and focus include real-time 3D graphics, image-based modeling and rendering, camera calibration, 3D model reconstruction from images, tracking, and image processing. Prior to joining NVIDIA, Miguel received a degree in Electrical Engineering from the Polytechnic University of Catalonia (UPC), Spain, and a Ph.D. in Electrical and Computer Engineering from the University of California, Irvine.
Timo Stich Sr. Developer Technology Engineer, NVIDIA
Highly-Rated Speaker
Timo Stich
Timo Stich is a Senior Developer Technology Engineer for NVIDIA Corporation. His focus is on image processing applications of Graphics Processors. Prior to joining NVIDIA he was research staff in Computer Graphics and Image Processing at the Max-Planck-Institute for Computer Science and the Computer Graphics Lab of Brunswick University, Germany. He received a diploma degree in Computer Science from Mannheim University, Germany and a Ph.D. degree from the Brunswick University, Germany.

This talk presents a robust proof-of-concept surround-vision and top-view system for cars that uses four car-mounted cameras as inputs and the Jetson Pro platform as the computation and display unit, relying on CUDA and OpenGL for both GPGPU processing and rendering of the final views. Topics covered include camera placement and calibration, color correction, and data preprocessing. A technical deep dive into the common pitfalls will highlight typical visual artifacts in top-view visualizations and present the algorithmic building blocks to correct those errors.

Level: Intermediate
Type: Talk
Tags: Automotive; Computer Vision & Machine Vision; Real-Time Graphics

Day: Wednesday, 03/18
Time: 14:00 - 14:25
Location: Room LL21F

ECS5006 - Early Stage Challenge: GeekSys

Luiz Vitor Martinez Cardoso CEO, GeekSys
Luiz Vitor Martinez Cardoso
Luiz is a 26-year-old engineer and entrepreneur who was nominated as one of the most innovative professionals in both communication and marketing in South America. Luiz earned a dual engineering degree in computer and electronics engineering from a top Brazilian school and has prior experience in academia and in small, mid-sized, and multinational companies, including GE. A self-learner, Luiz has a very special way of seeing the world, being able to anticipate trends, architect technologies from scratch, and deliver them to market. Today Luiz is dedicating all his efforts to making GeekSys the leader in the Store Performance Management (SPM) field.

GeekSys is the most innovative and awarded brick-and-mortar analytics start-up in Brazil today. GeekSys was born in 2010 out of two engineering classmates' curiosity as they tried to develop a marketing tool to understand how customers behave in front of a shop window. After two years of intense R&D, GeekSys released its first product into the market. From 2012 to 2014, GeekSys released three more products and received 7 awards in technology, innovation, and business models. GeekSys was the first company in the world able to read and quantify purchase intention inside physical stores and translate it into language more natural to retailers. GeekSys also created the Store Performance Management (SPM) concept, comparable to CRM and ERP. Today GeekSys has major customers in Brazil and is integrating all of its previous technologies into a single, uniform platform. GeekSys is working hard to lead the retail analytics market, and our key advantage is the ability to use technology as a path to value.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 14:30 - 14:38
Location: Room 220B

S5255 - Power Efficient Visual Computing on Mobile Platforms

Brant ZHAO GPU Architect, NVIDIA
Brant is a GPU architect at NVIDIA Shanghai, focusing on GPU computing analysis and architecture investigation. His work targets performance- and power-optimized implementations of computing applications on current-generation GPUs, as well as architecture improvements for next-generation GPUs that help current applications achieve better GPU utilization and power efficiency.

Tegra K1 brings a desktop-class GPU to the mobile world, making it possible for mobile platforms to take on increasingly complex visual computing tasks. With future, more powerful Tegra family chips, many more compute applications are expected in the mobile world. Besides performance tuning, it is also critical to make these applications power efficient, as they run on mobile devices with limited power budgets. In this work, we present a methodology for power analysis and optimization of mobile computing workloads. Three case studies will illustrate the three elements of the methodology: (1) analyze the whole pipeline at the system level; (2) use the energy-efficient features of the target platform; (3) reduce the total instruction count to save energy.

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210B

S5629 - Reconstruction Networks for Efficient Face Detection and Landmark Localization

Bo Yu Visiting Researcher, Carnegie Mellon University
TBD
Ian Lane Assistant Research Professor, Carnegie Mellon University
Ian Lane
Ian Lane is an Assistant Professor at Carnegie Mellon University. He leads the speech and language-processing group at CMU Silicon Valley and performs research in the areas of Speech Recognition, spoken language understanding and speech interaction. Ian and his group are developing methods to accelerate speech and language technologies using General Purpose Graphics Processing Units (GPUs). His group has already obtained 1000x speedup for signal processing tasks, 100x speedup for Viterbi training and over 20x speedup for complex tasks such as graph search. These new technologies have enabled the group to explore novel interaction paradigms for human machine interaction.

In this talk we introduce Reconstruction Networks, a novel neural network-structure that enables extremely efficient object detection in images. Reconstruction Networks directly reconstruct the regions of interest of one or more objects within an image without explicitly performing image segmentation or generating key point descriptors. We show that Reconstruction Networks can learn the structure of face and facial landmarks automatically, even under various poses and illumination conditions and outperform state-of-the-art performance for Face Detection and Facial Landmark Localization while requiring only a fraction of the computational cost.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Automotive

Day: Wednesday, 03/18
Time: 14:30 - 14:55
Location: Room 210A

ECS5009 - Early Stage Challenge: Replica Labs

Jack Morrison CTO, Replica Labs
Jack Morrison
Jack got started programming during his undergrad at Bowdoin College. He got hooked on robotics and computer vision as a member of Northern Bites RoboCup team at Bowdoin. Since then, he's spent his hours working on optical navigation for UAVs and researching distributed SLAM with cellphones. He resides in Boulder, Colorado with his fiance and their pets.

Replica Labs is a computer vision company focused on dense reconstruction from video feeds alone. Using the highly parallel power of NVIDIA's CUDA technology, we are able to translate single-lens video feeds, available from any smartphone, into dense and highly accurate 3D reconstructions. Replica Labs' current focus is to use this core technology to disrupt the way consumers take measurements of objects in their everyday lives. With our robust software solution, we are able to reconstruct objects in a consumer's home with sub-millimeter accuracy! Replica's first product, Rendor, will empower billions of phones to become 3D scanners, transforming the landscape of 3D rendering and the reach such information will have in e-commerce. With a small team of computer vision scientists and engineers, Rendor is currently under development and in an open beta phase of testing.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:00 - 15:08
Location: Room 220B

S5577 - Building State-of-Art Face Processing Pipeline with GPU

Shuchang Zhou Principal Scientist, Megvii Inc.
Shuchang Zhou
Shuchang Zhou received his bachelor's degree from Tsinghua University in 2004 and his master's degree from the National University of Singapore in 2005. Before joining Megvii in 2014, he worked as an assistant professor at the Chinese Academy of Sciences and as a software engineer at Google. He holds multiple US and international patents.

Megvii Inc. has revisited face-related problems with deep learning techniques powered by GPUs. Substantial progress has been made, and performance keeps increasing as more data flows in. This brings facial recognition closer to solving the identity problem, which is fundamental to the security, credibility, and accountability of the Internet. The availability and power efficiency of GPUs enables Megvii to explore deeper and more complex neural network topologies, handle higher-resolution images and videos, and extend to embedded devices with more limited power profiles. At the time of writing, Megvii's Face++ is a leading cloud face recognition service, has processed more than 40 billion images, and runs on 50 million devices.

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Big Data Analytics; Machine Learning & Deep Learning

Day: Wednesday, 03/18
Time: 15:00 - 15:25
Location: Room 210B

ECS5012 - Early Stage Challenge: QM Scientific

Faris Alqadah CEO, QM Scientific
Faris Alqadah
Faris leads the overall vision of QM Scientific: to be the smartest shopping intelligence platform, answering consumers' everyday questions simply, accurately, and in real time. In addition, he leads data science development and holds a PhD in the field from the University of Cincinnati. Prior to QM Scientific, Faris built very large scale consumer propensity, segmentation, and recommender systems as a senior data scientist at PayPal. Previously, he served as a fellow at the Johns Hopkins School of Medicine, where he applied data science research to challenging problems in genomics and proteomics. His data science research has been published in leading peer-reviewed conferences and journals, was awarded a Best Doctoral Forum Poster Award, and was twice nominated for Best Paper Awards. For fun, Faris bangs his head to epic heavy metal music and fights monsters with his two children.

QM Scientific (QMS) is a shopping intelligence company whose platform empowers consumers to make smart buying decisions in real time. Targeting the grocery retail vertical, our goal is to enable consumers to answer everyday questions simply, accurately, and in real time, such as:
• "What is the best store to shop at right now for my list?"
• "Are there cheaper alternatives for the products I buy regularly?"
• "How much do I spend on diapers and beer monthly?"
The QMS platform uses proprietary Data Science, Computer Vision, and Natural Language Processing technology to intelligently extract, connect, and organize millions of products and prices from thousands of sources, including the web, partner datasets, and receipt/product images. In December 2014, QMS launched PriceSwarm, a grocery price comparison app built on the QMS platform. With PriceSwarm, users create shopping lists in natural language and the platform recommends a store to shop at by optimizing price, quality, shopping behavior, and location. In addition, users contribute real-time prices and receive personalized cost-saving recommendations and analytics by simply snapping a picture of their receipt.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Emerging Companies Summit

Day: Wednesday, 03/18
Time: 15:30 - 15:38
Location: Room 220B

S5368 - Nonlinear Structured Prediction Using the GPU

Alexander Schwing Postdoctoral Fellow, University of Toronto
Alexander Schwing
Alex Schwing's research interests are optimization algorithms, statistical models, and parallelization of implementations for high-performance computing environments. An interesting playground for all three fields is inference and structured prediction in machine learning and computer vision, in particular 3D scene understanding. He is currently working with Prof. Ruslan Salakhutdinov and Prof. Raquel Urtasun as a postdoc in the machine learning group of the Computer Science department at the University of Toronto. In 2013 he completed his PhD under the supervision of Prof. Marc Pollefeys, Prof. Tamir Hazan, and Prof. Raquel Urtasun in the Computer Vision and Geometry (CVG) group of the computer science department of ETH Zurich (ETHZ).

Learn how to combine deep neural networks with probabilistic models to build classification algorithms that jointly reason about multiple variables. Typically, deep neural networks reason about a single object of interest within the observed data, e.g., a single object within an image. We show how to enrich deep learning to jointly predict a set of random variables while leveraging learned variable correlations. To this end, we present an efficient GPU-driven algorithm based on neural networks that jointly captures nonlinearities for multiple variables and their correlations.
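To see why joint prediction differs from predicting each variable independently, consider a toy example with two output variables whose unary scores would come from a neural network and whose pairwise term encodes learned correlations (all numbers below are made up for illustration; this is not the talk's algorithm):

    import numpy as np

    u1 = np.array([2.0, 1.9, 0.1])            # unary scores for variable y1 (3 labels)
    u2 = np.array([0.1, 2.0, 1.9])            # unary scores for variable y2
    P  = np.array([[ 0.0, -3.0,  1.5],        # P[i, j]: compatibility of (y1=i, y2=j)
                   [ 1.5,  0.0, -3.0],
                   [-3.0,  1.5,  0.0]])

    independent = (u1.argmax(), u2.argmax())             # ignores correlations
    joint_scores = u1[:, None] + u2[None, :] + P         # score of every (y1, y2) pair
    joint = np.unravel_index(joint_scores.argmax(), P.shape)
    print("independent:", independent, "joint:", joint)  # the two predictions differ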

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210A

S5546 - GPU Accelerated Haze Removal on Tegra K1

Bin Zhou Adjunct Research Professor, University of Science and Technology of China
Bin Zhou
Dr. Bin Zhou is the director and chief scientist of the Marine Information Processing Laboratory (MIPL) at the Institute of Oceanography, Shandong Academy of Sciences. He serves as an Adjunct Research Professor in the School of Information Science and Technology at USTC and is an NVIDIA CUDA Fellow. He is the PI of the CUDA Research Center (CRC) at the Institute of Advanced Technology (IAT), USTC. In MIPL, he leads a team working on information processing systems for marine environmental pollution and natural hazard monitoring and for ocean-atmosphere simulation. In the CRC, he performs research on drone control, video processing, and computer vision algorithms on the NVIDIA GPU/CUDA platform.

This talk shows how the Tegra K1 GPU accelerates dehazing for outdoor computer vision systems. Toxic haze has become a major air pollution threat in China, affecting not only public health but also outdoor computer vision systems. By adapting the dark channel prior method to the dehazing process, very good results are achieved. However, the huge processing requirements bring big challenges. We refined the parallel algorithm and performed deep optimization on the Tegra K1 Jetson platform. Compared to the ARM CPU, experiments show a 156x speedup. The results show that Tegra K1 has great potential for embedded real-time computer vision processing.
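The dark channel prior step referenced above follows a well-known formulation from the literature; a simplified NumPy sketch (CPU only, with the usual patch size and omega values, and not the talk's optimized Tegra K1 implementation) looks like this:

    import numpy as np

    def dark_channel(img, patch=15):
        # Per-pixel minimum over color channels, then a minimum filter over a local patch.
        mins = img.min(axis=2)
        pad = patch // 2
        padded = np.pad(mins, pad, mode='edge')
        out = np.empty_like(mins)
        for y in range(mins.shape[0]):
            for x in range(mins.shape[1]):
                out[y, x] = padded[y:y + patch, x:x + patch].min()
        return out

    def estimate_transmission(img, airlight, omega=0.95, patch=15):
        # t(x) = 1 - omega * dark_channel(I(x) / A): the core of the dark channel prior.
        return 1.0 - omega * dark_channel(img / airlight[None, None, :], patch)

    img = np.random.rand(120, 160, 3).astype(np.float32)       # stand-in hazy image in [0, 1]
    airlight = np.array([0.9, 0.9, 0.9], dtype=np.float32)     # estimated atmospheric light
    t = estimate_transmission(img, airlight)
    dehazed = (img - airlight) / np.maximum(t, 0.1)[..., None] + airlight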

Level: All
Type: Talk
Tags: Computer Vision & Machine Vision; Video & Image Processing

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room 210B

S5789 - The Fast Lane from Silicon Valley to Munich

Uwe Higgen BMW Group Technology Office USA, BMW Group
Uwe Higgen
Effective September 1, 2014, Uwe Higgen was appointed Head of the BMW Group Technology Office USA. In this position, he is responsible for accelerating the delivery of automotive innovation to customers through the evaluation, development and design of new technologies. Higgen oversees a team of highly talented engineers specializing in Connected Car, Electromobility, Powertrain, Autonomous Driving and User Experience/Interface Design. His Silicon Valley-based team produces work that enables BMW to be the future, see the future and reimagine the future of world-class automotive engineering for individual mobility. The BMW Group Technology Office USA focuses on human-machine interfaces, mechatronics, infotainment, telematics and creating new portals and opportunities for business communication. Prior to his arrival in the U.S., Higgen served as the Head of BMW Group AppCenter in Munich. In this role, he was responsible for delivering the integration of information and entertainment smartphone applications into the vehicle. Higgen began his career at BMW Group in 2001. He holds a master's degree in Computer Science from the Carl von Ossietzky University of Oldenburg.

Learn how the BMW Group Technology Office in Silicon Valley integrates with the automaker's world-wide research and development departments, with a specific focus on an active safety system running on NVIDIA hardware, recently developed for the i3. As one of the first automakers to open a research and development office in Silicon Valley, BMW has a long history of innovation in the Bay Area. Projects range from series vehicles to Formula 1, and from research to pre-development – including the iDrive interface, Apps4Automotive, the all-electric Mini-E, and Head-Up Displays.

Level: Beginner
Type: Talk
Tags: Automotive; Embedded Systems; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 15:30 - 15:55
Location: Room LL21F

S5457 - Maximizing Face Detection Performance on GPUs

Paulius Micikevicius Developer Technology Engineer, NVIDIA
Paulius Micikevicius is a developer technology engineer at NVIDIA, focusing on performance analysis and optimization. Prior to joining NVIDIA he was an assistant professor of Computer Science at Armstrong Atlantic State University. Paulius holds a PhD in Computer Science from the University of Central Florida and a BSc from Midwestern State University.

In this talk we look at GPU performance optimization for face detection using various techniques and features, including cascades with Haar-like features and multi-block local binary patterns. For each approach we examine the implementation tradeoffs and performance limiters, as well as how performance depends on the data. We also investigate optimization by combining the approaches and by additional work pruning.
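For readers who want to experiment, both cascade types discussed in the talk ship with OpenCV; a minimal CPU example with the Haar cascade (the file path below relies on the cascade data bundled with the opencv-python package, and the input image name is hypothetical) is:

    import cv2

    # Load the bundled frontal-face Haar cascade; LBP cascades are loaded the same way.
    haar = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

    img = cv2.imread('people.jpg')                     # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Each cascade stage cheaply rejects most windows; scaleFactor and minNeighbors
    # trade detection rate against the amount of work done per image.
    faces = haar.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite('faces.jpg', img)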

Level: Intermediate
Type: Talk
Tags: Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Wednesday, 03/18
Time: 16:00 - 16:50
Location: Room 210B

S5319 - CUDA in Urban Search and Rescue: Mission Planning Module for the ICARUS Project

Pawel Musialik Programmer and Young Researcher, Institute of Mathematical Machines
Pawel Musialik
Pawel is a graduate of the Warsaw University of Technology and currently a Ph.D. candidate at the Military University of Technology in Warsaw. Since February 2012 Pawel has been a young scientist and programmer at the Institute of Mathematical Machines. His current research topics are semantic maps, 3D point cloud analysis, and quantitative and qualitative reasoning. Pawel possesses over 5 years of C++ experience, 2 years as a CUDA programmer, and 4 years of experience giving academic lectures.

This session concentrates on mission planning for search and rescue personnel and how CUDA can help with this task. Urban Search and Rescue is a challenging and important activity in today's society. The ICARUS project (Integrated Components for Assisted Rescue and Unmanned Search operations) aims to aid these efforts by providing robotic components for rescue teams. Adding mobile robots to the mix raises the need for additional planning effort, which would consume a lot of time using a classical approach. We will present how this can be avoided by using CUDA-based mission planners for tasks like path planning, patrolling, communication relay placement, etc. A number of CUDA-implemented algorithms will be shown along with example results.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 16:30 - 16:55
Location: Room 210A

S5331 - GPUs and Machine Learning: A Look at cuDNN

Sharan Chetlur Software Engineer, NVIDIA
Highly-Rated Speaker
Sharan Chetlur
Sharan Chetlur is an engineer at NVIDIA working in the CUDA Libraries and Algorithms Group. He currently works in the fields of Deep Learning and Neural Networks, and is a developer of the cuDNN library. Previously, his focus was on applications in Linear Algebra, working as a developer on the cuBLAS and cuSparse libraries. Sharan holds a Master's Degree in Computer Engineering from the University of Florida.

We describe cuDNN, NVIDIA's in-house CUDA library of deep learning primitives. Addressing the demand from engineers and data scientists, we created a library similar in intent to BLAS, with optimized routines for deep learning workloads. Previously, Neural Network framework developers had to implement these low-level routines for GPUs on an ad-hoc basis, optimizing individual computational kernels by hand and repeating this work as new parallel processors emerged. cuDNN alleviates this burden by providing tuned black box implementations of these functions. The library is easy to integrate into existing frameworks, and provides optimized performance and memory usage across GPU generations. We discuss supported functionality, algorithmic implementation details and performance achieved.
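For context on what such a primitive replaces: before cuDNN, frameworks like Caffe typically lowered convolution to an im2col transform followed by a single matrix multiply. The NumPy sketch below illustrates that hand-rolled pattern (single channel, no padding or stride, cross-correlation as is conventional in deep learning); cuDNN's actual kernels are tuned black boxes and are not shown here:

    import numpy as np

    def im2col(x, kh, kw):
        # Unroll every kh-by-kw sliding window of a single-channel image into a column.
        h, w = x.shape
        oh, ow = h - kh + 1, w - kw + 1
        cols = np.empty((kh * kw, oh * ow), dtype=x.dtype)
        idx = 0
        for i in range(oh):
            for j in range(ow):
                cols[:, idx] = x[i:i + kh, j:j + kw].ravel()
                idx += 1
        return cols, oh, ow

    def conv2d_gemm(x, filters):
        # filters: (num_filters, kh, kw). Convolution expressed as one GEMM.
        kh, kw = filters.shape[1:]
        cols, oh, ow = im2col(x, kh, kw)
        w = filters.reshape(filters.shape[0], -1)          # (num_filters, kh*kw)
        return (w @ cols).reshape(filters.shape[0], oh, ow)

    x = np.random.randn(32, 32).astype(np.float32)
    f = np.random.randn(8, 3, 3).astype(np.float32)
    y = conv2d_gemm(x, f)                                  # (8, 30, 30) output feature maps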

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Developer - Tools & Libraries

Day: Thursday, 03/19
Time: 09:00 - 09:50
Location: Room 210A

S5665 - GPUs and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)

Olga Russakovsky Ph.D. Student , Computer Science, Stanford University
Olga Russakovsky (http://ai.stanford.edu/~olga) is a computer science PhD student at Stanford University advised by Professor Fei-Fei Li. Her main research interest is in computer vision, specifically focusing on large-scale object detection and recognition. For the past two years she has been the lead organizer of the international ImageNet Large Scale Visual Recognition Challenge which was featured in the New York Times, MIT Technology Review, and other international media venues. She has organized several workshops at top-tier computer vision conferences: the ImageNet challenge workshop at ICCV’13 and ECCV’14, the upcoming workshop on Large-Scale Visual Recognition and Retrieval at CVPR’15, and the new Women in Computer Vision workshop at CVPR’15. During her PhD she collaborated closely with NEC Laboratories America and with Yahoo! Research Labs. She was awarded the NSF Graduate Fellowship and the CRA undergraduate research award.
Alex Berg Assistant Professor, UNC Chapel Hill
Alex is interested in all aspects of computer vision and related problems in other fields. His thesis was on shape and object recognition in images using a new take on deformable templates. He also works on large-scale machine learning algorithms for object recognition and detection, image retrieval, recognizing and synthesizing human action in video, recovering human body poses from photographs, detecting and identifying human faces in images, detecting vehicles in images, and more.

This session will provide an introduction to the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), analyze the results of the 2014 challenge and provide a glimpse into what the 2015 challenge has in store. Highlights include a discussion of large-scale image recognition, a history of the ILSVRC and an overview of current techniques and trends in image classification and object detection, as well as the role that GPUs have played in this challenge.

Level: All
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:00 - 10:50
Location: Room 210A

S5540 - Building a Life-Size Automultiscopic Display Using Consumer Hardware

Andrew Jones Research Programmer, USC Institute for Creative Technologies
Andrew Jones has been a researcher in the Graphics Lab at the USC Institute for Creative Technologies since 2002. His research has covered reconstructing the Parthenon in Athens, high dynamic range photography, and 3D scanning of human faces, bodies, and performances. Currently, Andrew is finishing up his PhD work on rendering for automultiscopic 3D displays.

Automultiscopic displays allow multiple users to experience 3D content without the hassle of special glasses or head gear. Such displays generate many simultaneous images with high angular density, so that each eye perceives a distinct and different view. This presents a unique challenge for content acquisition and rendering. In this talk, we explain how to build an automultiscopic display using off-the-shelf projectors, video splitters, and graphics cards. We also present a GPU-based algorithm for rendering a large number of views from a sparse array of video cameras.

Level: Intermediate
Type: Talk
Tags: Visualization - Large Scale & Multi-Display; Augmented Reality & Virtual Reality; Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 10:30 - 10:55
Location: Room LL21F

S5646 - RGBD Occlusion Detection via Deep Convolutional Neural Networks

Vivek Venugopalan Senior Research Scientist, United Technologies Research Center
Vivek Venugopalan is a Senior Research Scientist with United Technologies Research Center (UTRC). He works in the areas of hardware acceleration and reconfigurable platforms at UTRC for aerospace and building system applications.

Occlusion edge detection is a very challenging task in robotics and unmanned autonomous vehicles. Occlusion edges in images correspond to range discontinuities in the scene from the point of view of the observer, and extracting these edges from raw images and videos is computationally intensive. Deep learning techniques have largely replaced existing methods for extracting information in similar applications by mapping the problem to large multi-layer neural networks. These techniques rely on Deep Convolutional Neural Networks (DCNNs) with multiple hidden layers to capture the local spatial correlations that help identify occlusion edges in images and videos.

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Embedded Systems

Day: Thursday, 03/19
Time: 15:00 - 15:25
Location: Room 210A

S5300 - High Quality Real Time Image Processing Framework on Mobile Platforms using Tegra K1

Eyal Hirsch Mobile GPU Leader , SagivTech Ltd.
Mr. Eyal Hirsch has 15 years' experience as a software developer. Prior to joining SagivTech, Eyal was a member of AMD's OpenCL team in Israel, developing and working on AMD's OpenCL driver. Prior to AMD, Eyal was a team leader at Geomage, a leading software company in the oil & gas field. Geomage deployed one of the very first commercial GPU clusters in Israel, consisting of many GPUs. Eyal developed all the GPU implementations and was responsible for all aspects of the GPU life cycle, from development through production. Prior to Geomage, Eyal served as a team leader at Cyota, which was later sold to RSA.

Real-time image processing involves computationally intensive tasks. It is becoming extremely important for mobile platforms equipped with cameras, e.g., wearable devices. Image processing algorithms perfectly suit the GPU architecture, and their implementation on discrete GPUs is well established. Now that compute-enabled GPUs are available on mobile platforms, real-time image processing is easier to obtain. SagivTech is a partner in Google's Project Tango, where it implemented Mantis Vision's depth algorithms on Tegra K1. Hear SagivTech's experts on applying computer vision algorithms to the Tegra K1. We share our experience, provide tips on mobile GPU computing, and demonstrate the advantages of implementing state-of-the-art computer vision algorithms such as FREAK, BRISK, and DoG.
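Several of the named algorithms are also exposed by OpenCV on the CPU, which makes it easy to experiment before worrying about mobile GPU acceleration; a small example with BRISK (the input file name is hypothetical, and this is not SagivTech's Tegra K1 implementation) is:

    import cv2

    img = cv2.imread('frame.jpg', cv2.IMREAD_GRAYSCALE)    # hypothetical input frame

    brisk = cv2.BRISK_create()
    keypoints, descriptors = brisk.detectAndCompute(img, None)
    print(len(keypoints), 'keypoints, descriptor matrix shape:', descriptors.shape)

    # Binary descriptors from two frames can then be matched with a Hamming-distance matcher.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)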

Level: All
Type: Talk
Tags: Video & Image Processing; Computer Vision & Machine Vision; Developer - Performance Optimization

Day: Thursday, 03/19
Time: 16:30 - 16:55
Location: Room LL21A

S5585 - Multi-GPU Training for Large-Scale Visual Object Recognition

Wei Xia Research Scientist, Orbeus
Wei Xia
Wei Xia is a research scientist at Orbeus Inc. and is expected to receive his Ph.D. in Computer Vision and Machine Learning from the National University of Singapore in 2014. He has rich research experience in generic object classification, detection, and segmentation. He won both the segmentation and classification competitions of the PASCAL VOC Challenge 2012, received the runner-up award in the ILSVRC Challenge 2013 and the winner award in the ILSVRC Challenge 2014, both of which are among the most impactful competitions in this field. He visited Lund University, Sweden, as a visiting scholar in 2013. He has published many academic papers in top international computer vision conferences and journals, and was awarded the President's Graduate Fellowship (top 1%) for his achievements in both research and coursework at the National University of Singapore. He has also served as a reviewer for many international conferences and journals, such as ECCV, BMVC, ICASSP, ICPR, ICIP, ACM MM, TCSVT, and MVP. For industry experience, he was a research intern at Panasonic Singapore Laboratory (2012-2013) and at Singapore's 2359 Media Pte Ltd (2013).

Despite the great progress of deep learning models (deep convolutional neural networks) in visual recognition over the past few years, one of the greatest bottlenecks is the extremely long training time (several weeks to months) needed to handle tens of millions of training images. The goal of this session is to share the results we achieved using multiple GPUs installed in a single server to speed up the training process. By configuring 16 GPUs (8 Titan Zs) and optimizing the parallel implementation of CNN training, up to a 14x speed-up is achieved without compromising, and sometimes even boosting, the model's accuracy. Comprehensive experimental results demonstrate the linear scalability of the proposed multi-GPU training process.
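The abstract does not detail the parallelization scheme, but the common data-parallel pattern it alludes to (split each minibatch across devices, compute gradients locally, then average them before the weight update) can be sketched with NumPy, simulating the devices as array shards:

    import numpy as np

    K = 4                                                 # number of simulated GPUs
    w = np.zeros(10)                                      # shared model weights
    X = np.random.randn(256, 10)                          # one minibatch of features
    y = X @ np.arange(10) + 0.1 * np.random.randn(256)    # synthetic regression targets

    def local_gradient(w, Xs, ys):
        # Gradient of the mean squared error on one device's shard of the minibatch.
        return 2.0 * Xs.T @ (Xs @ w - ys) / len(ys)

    shards = zip(np.array_split(X, K), np.array_split(y, K))
    grads = [local_gradient(w, Xs, ys) for Xs, ys in shards]   # one gradient per device
    w -= 0.01 * np.mean(grads, axis=0)                         # update with the averaged gradient

This is a generic sketch of synchronous data parallelism, not Orbeus' actual multi-GPU implementation.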

Level: Intermediate
Type: Talk
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Video & Image Processing

Day: Thursday, 03/19
Time: 17:00 - 17:25
Location: Room 210A

S5869 - SenDISA: Distributed Intelligent, Video, Sensor & Actuator Analytics Platform for Smart Cities (Presented by Sensen)

Dr. Subhash Challa CEO, Sensen Networks
With a focus on sales & strategy, I help my team close sales and manage major accounts in a variety of markets including transportation, security, gaming and hospitality. Prior to taking up the full-time role as CEO of SenSen Networks in January 2012, I was a Senior Principal Scientist at NICTA, University of Melbourne, and led a number of ICT-for-life-sciences projects. I started my professional career as a Research Fellow at the University of Melbourne in 1998, where I led a number of tracking and data fusion projects. With a deep and passionate interest in taking ideas to usable products, I spent over a decade of my career in R&D and product development. I was Professor of Computer Systems Engineering at the University of Technology Sydney from 2004-2007.

This session will introduce SenSen's proprietary Video, Sensor and Actuator Analytics Platform (SenDISA), which is used by some of the world's most prestigious and trusted organizations, including Abu Dhabi Airport, the Singapore Police, Roads & Maritime Services Australia, the Westgate Bridge in Melbourne, Australia, the City of Trondheim, Norway, and the cities of Brisbane, Ipswich, and Manly. We will present how our innovative algorithms, powered by the GPGPU-based SenDISA platform, enable big data analytics applications by fusing data from video, sensor, and IoT devices and combining it with other transaction data to deliver smart-city solutions across the globe. We will provide insights into the architecture of SenDISA and the market-specific big data solutions serving different verticals.

Level: Intermediate
Type: Talk
Tags: Big Data Analytics; Computer Vision & Machine Vision; Video & Image Processing

Day: Thursday, 03/19
Time: 17:30 - 17:55
Location: Room 210B

S5346 - 3 Engineers, 2 Months: The World's First Operating Room Enhanced by High Performance Computing

John Clarke CTO, Cydar Ltd
John Clarke
John has a long history of working at the extremes of computing. His career started with microcomputers deployed at the bottom of oil wells and has now reached the cloud. Along the way he has analyzed the risk of financial transactions, located cell phones using cellular time-of-flight, and led teams building world leading codec hardware IP. John holds a PhD in robotics and machine vision from the University of Oxford.

Learn how a tiny team built and deployed a 424 TFlop/s supercomputer in only two months. This supercomputer is used to provide real-time enhanced visualizations to endovascular surgeons during aortic aneurysm repair. Real-time machine vision demands not only massive parallel data processing but also massive dataflows and unavoidably serial processing. In this talk, we describe how three advanced machine vision algorithms were each taken from a single high-end GPU and moved to a cloud of GPU servers, where the price-performance sweet spot is far from the high end. We describe the design and performance of our work- and data-distribution systems, which are solutions to the cloud-specific problems of slow intra-cloud networking and occasional cloud server hiatuses.

Level: All
Type: Talk
Tags: Medical Imaging; Computer Vision & Machine Vision; Supercomputing

Day: Friday, 03/20
Time: 09:00 - 09:50
Location: Room LL21B

S5435 - Using the Power of the GPU to Connect the Web to the Real World

Rob Manson CEO & Co-founder, buildAR.com
Rob Manson
Rob Manson is CEO & co-founder of buildAR.com, the world's first cloud based Augmented Reality Authoring Platform launched in 2009. Rob is one of the editors of the Media Stream Depth Extensions Specification and an Invited Expert with the ISO, W3C and the Khronos Group. He's an active evangelist within the global AR and standards communities and is regularly invited to speak on the topics of the Augmented Web, Augmented Reality, WebRTC and the development of multi-device platforms.

This session will take a detailed look at the various media stream processing pipelines available on the Web Platform and how the optimization of these will be critical in the near future. We will look specifically at how you can use GPUs directly from Javascript for vision and sensor processing. One specific example will explore how Depth Cameras can now be used to extend the web and the influence this may have on the other pipelines too. These streams of sensor and image data now make it possible to connect the web to the real world. GPUs are a key asset for taming this growing flood of data.

Level: Intermediate
Type: Talk
Tags: Web Acceleration; Augmented Reality & Virtual Reality; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 09:00 - 09:25
Location: Room LL21C

S5208 - Streaming FFTs on Large 3D Microscope Images

Peter Steinbach HPC Developer, Max Planck Institute of Molecular Cell Biology and Genetics
Peter Steinbach
I studied at DESY Hamburg and Zeuthen, Humboldt University of Berlin, and the University of Leipzig, from which I received a Diploma in Physics. After that, I completed a PhD thesis in particle physics by analysing data from the ATLAS experiment at the Large Hadron Collider (CERN, Switzerland). I am now a High Performance Computing (HPC) Developer at the Max Planck Institute of Molecular Cell Biology and Genetics, where I help scientific groups develop fast software that harnesses the capabilities of today's HPC installations.

Dive deep into efficient and fast memory transfers of multi-gigabyte image data to perform swift iterative deconvolutions of 3D microscope imagery. Through the creation of an open-source GPU deconvolution implementation (github.com/psteinb/libmultiviewnative), I studied various techniques to orchestrate memory copies of multi-dimensional images. I will present concepts, available options, and details of efficient memory transfers from host to device memory. I will showcase CUDA/C++ code and discuss my experiences with various CUDA versions on NVIDIA hardware that led to 2-3x greater performance than just performing the calculations on the device. This work will enable the scientific community to push the limits of processing and handling data gathered by imaging living tissue.
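The streaming idea at the heart of the talk (carve the volume into slabs that fit in device memory, then transfer and process one slab at a time) can be outlined on the CPU with NumPy; the real implementation's CUDA streams, pinned host memory, and asynchronous copies are not shown here:

    import numpy as np

    def filter_slab(slab):
        # Stand-in for one FFT-based filtering/deconvolution step on a slab.
        return np.fft.irfftn(np.fft.rfftn(slab), s=slab.shape)

    def process_volume_in_slabs(volume, slab_depth=32):
        # Process a large 3D volume slab by slab, as one would stream it through a GPU.
        out = np.empty_like(volume)
        for z in range(0, volume.shape[0], slab_depth):
            slab = volume[z:z + slab_depth]              # would be an async host-to-device copy
            out[z:z + slab_depth] = filter_slab(slab)    # would be a kernel on its own stream
        return out

    volume = np.random.rand(128, 256, 256).astype(np.float32)  # stand-in microscope stack
    result = process_volume_in_slabs(volume)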

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Life & Material Science; Computer Vision & Machine Vision; Data Center, Cloud Computing & HPC

Day: Friday, 03/20
Time: 10:00 - 10:25
Location: Room LL21A

S5305 - A 2D Convolution Framework for Extreme Performance Tuning

Alan Wang Compute Architect, NVIDIA
Alan is a GPU architect in the computer vision field at NVIDIA. He is experienced in parallelization, performance modeling, and architecture-specific tuning. Alan is currently working on 2D convolution projects. Before joining the computer architecture team, Alan worked on graphics tracing and FPGA architecture & EDA software.

We propose a 2D convolution framework that (1) maintains a unified abstraction incorporating a series of optimization techniques and (2) can auto-tune performance on different GPUs. We quantify and analyze the performance impact of each individual strategy, which reveals its potential when applied to other applications. Experiments show that the algorithm tuned by our framework achieves a 5x speed-up compared with a naive version and a 2x speed-up compared with the NPP library on GK20a and GM107.
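The auto-tuning part of the idea, picking whichever code path is fastest on the target device, can be illustrated with a few lines of Python; here SciPy's direct and FFT-based convolutions stand in for the framework's GPU strategies (this is an illustration of the tuning loop, not the framework itself):

    import time
    import numpy as np
    from scipy.signal import convolve2d, fftconvolve

    def autotune(img, kernel, candidates):
        # Time each candidate convolution strategy and return the fastest one.
        best, best_t = None, float('inf')
        for name, fn in candidates:
            t0 = time.perf_counter()
            fn(img, kernel)
            elapsed = time.perf_counter() - t0
            if elapsed < best_t:
                best, best_t = (name, fn), elapsed
        return best

    img = np.random.rand(512, 512).astype(np.float32)
    kernel = np.random.rand(31, 31).astype(np.float32)
    candidates = [
        ('direct', lambda a, k: convolve2d(a, k, mode='same')),
        ('fft',    lambda a, k: fftconvolve(a, k, mode='same')),
    ]
    name, fn = autotune(img, kernel, candidates)
    print('selected strategy:', name)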

Level: Intermediate
Type: Talk
Tags: Video & Image Processing; Developer - Performance Optimization; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 10:30 - 10:55
Location: Room LL21A

S5734 - Khronos API Standards for GPU Accelerated Graphics, Compute and Vision Processing

Neil Trevett Vice President Mobile Ecosystem, NVIDIA
Neil has spent over thirty years in the 3D graphics industry and is currently responsible for driving the advanced apps ecosystem for NVIDIA Tegra. Neil is also the elected President of the Khronos Group industry standards consortium where he initiated the OpenGL ES standard, helped catalyze the WebGL project and chairs the OpenCL working group. Previously, as Vice President of 3Dlabs, Neil was at the forefront of the silicon revolution bringing interactive 3D to the PC, and he established the embedded graphics division of 3Dlabs to bring advanced visual processing to a wide-range of non-PC platforms. Neil was elected President for eight consecutive years of the Web3D Consortium dedicated to creating open standards for communicating real-time 3D on the Internet. Neil graduated from Birmingham University in the UK with a First Class Joint Honors B.Sc. in electronic engineering and computer science and holds several patents in the area of graphics technology.

Discover how over 100 companies cooperate at the Khronos Group to create open, royalty-free standards that enable developers to access the power of the GPU to accelerate demanding compute, graphics, and vision applications. This session includes the very latest updates to popular API standards, including OpenGL, OpenCL, and OpenVX, as well as the newly announced next-generation OpenGL initiative, which together will accelerate the availability of cutting-edge applications such as augmented and virtual reality. Neil is the serving President of the Khronos Group.

Level: All
Type: Talk
Tags: Real-Time Graphics; Computer Vision & Machine Vision; Web Acceleration

Day: Friday, 03/20
Time: 11:00 - 11:25
Location: Room LL21D

Talk
 

TUTORIAL

Presentation
Details

S5796 - Image Learning and Computer Vision in CUDA (Presented by ArrayFire)

Peter Andreas Entschev Software Engineer, ArrayFire
Peter Entschev is currently a Software Developer at ArrayFire, where he primarily works on concurrent computer vision problems. He has received his Bachelor's degree in Telecommunication Systems and Master's degree in Computer Science from the Federal University of Technology - Paraná (UTFPR), Brazil. Before joining ArrayFire, he worked on real-time computer vision research at SEW-Eurodrive in Germany and with system administration and development of Linux distributions for the Brazilian Government.

Analyzing a massive data set? Need fast results? Need computer vision algorithms? Not sure where to start? The answer is here and now! In this tutorial we will give you the tools to bring your favorite computer vision algorithm to life. We will go over the key challenges of implementing computer vision and machine learning algorithms on the GPU, walk you through several computer vision algorithms for the GPU (ORB, FAST, SIFT), and give you the hands-on experience to implement your own algorithms.

Level: All
Type: Tutorial
Tags: Video & Image Processing; Computer Vision & Machine Vision

Day: Tuesday, 03/17
Time: 15:00 - 16:20
Location: Room 210F

Tutorial
 

PANEL

Presentation
Details

S5816 - Google Tango Tablet: Application Rapid Fire Presentations

Larry Yang Lead Product Manager, Project Tango, Google
Larry Yang
Larry Yang is the lead product manager of Project Tango, responsible for creating an ecosystem of devices, applications and services that use the Project Tango platform. Before Project Tango he led the product management team for Google Fiber, and before that he was the product manager for the Google TV platform and partnerships. Larry has 15 years of experience creating innovative consumer products and services, including leading product management for a new consumer video conferencing business at Cisco Systems and serving as general manager of the Xbox 360 console group at Microsoft. Before his work with consumer electronics, he spent 10 years in various microprocessor development and marketing roles at Sun Microsystems. Larry has a B.S. and M.S. in Electrical Engineering from Stanford University.
Eric Lee CTO & Partner, Left Field Labs
Eric Lee
Eric Lee is CTO and partner at Left Field Labs, a storytelling company based in Venice, CA, that believes technology is pushing humanity towards a new era of art, culture, and commerce. Eric is driven by building interactive products and platforms that showcase the power of emerging technologies in enhancing the human experience. He has worked in design and engineering for over 15 years, and has led the creation of award-winning projects ranging from apps and websites to 3D-printed music players and virtual reality games.
Jeff Schmitz Senior Technical Artist & Graphics Programmer, NVYVE
Jeff  Schmitz
Jeff Schmitz is a senior technical artist and graphics programmer at NVYVE, a leading company in architectural and product visualization, and Unite 2014 award winner for Best Visual Simulation. With over 10 years of experience, Jeff has gained an unparalleled wealth of knowledge and experience in Unity, Unreal Engine and OpenGL. Always looking for challenges and passionate about what he does, Jeff has been working to bring the very latest in cutting edge 3D graphics techniques into the visualization and simulation industry.
Iman Mostafavi Co-Founder and COO, Limbic
Iman  Mostafavi
Iman Mostafavi is co-founder and COO of Limbic, a mobile-focused game studio best known for its hit games Zombie Gunship and Tower Madness. He left the Ph.D. program in Computer Science at U.C. San Diego to pursue his dream of making games. Before Limbic, Iman created visualization software to help neuroscientists study the brain.

Come to hear from the first wave of application developers exploring the unique odometry and depth sensor capabilities of Google's Tango Tablet using Tegra K1. Five leading-edge developers will showcase the applications that they are developing for Tango, how they are using Tango's spatial awareness – and share the lessons learned so far.

Level: All
Type: Panel
Tags: Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 09:00 - 09:50
Location: Room LL21A

Panel
 

KEYNOTE

Presentation
Details

S5818 - Deep Learning: What's Next

Andrew Ng Chief Scientist, Baidu
Dr. Andrew Ng is Chief Scientist at Baidu. He leads Baidu Research, which includes three interrelated labs: the Silicon Valley AI Lab, the Institute of Deep Learning and the Big Data Lab. The organization brings together global research talent to work on fundamental technologies in areas such as image recognition and image-based search, speech recognition, natural language processing and semantic intelligence. In addition to his role at Baidu, Dr. Ng is a faculty member in Stanford University's Computer Science Department, and Chairman of Coursera, an online education platform that he co-founded. Dr. Ng is the author or co-author of over 100 published papers in machine learning, robotics and related fields. He holds degrees from Carnegie Mellon University, MIT and the University of California, Berkeley.

Deep Learning has transformed many important tasks, including speech and image recognition. Deep Learning systems scale well by absorbing huge amounts of data to create accurate models. The computational resources afforded by GPUs have been instrumental to this scaling. However, as Deep Learning has become more mainstream, it has generated some hype, and has been linked to everything from world peace to evil killer robots. In this talk, Dr. Ng will help separate hype from reality, and discuss potential ways that Deep Learning technologies can benefit society in the short and long term.

Level: All
Type: Keynote
Tags: Computer Vision & Machine Vision

Day: Thursday, 03/19
Time: 11:00 - 12:00
Location: Hall 3

Keynote
 

HANDS-ON LAB

Presentation
Details

S5647 - Hands-on Lab: DIY Deep Learning for Vision with Caffe

Evan Shelhamer PhD Student / Lead Developer of Caffe, UC Berkeley
Evan Shelhamer
Evan Shelhamer is a PhD student at UC Berkeley advised by Trevor Darrell as a member of the Berkeley Vision and Learning Center. His research is on deep learning and end-to-end optimization for vision. He is the lead developer of the Caffe deep learning framework and takes his coffee black.
Yangqing Jia Research Scientist, Google
Yangqing Jia
Yangqing Jia finished his Ph.D. in computer vision at UC Berkeley supervised by Trevor Darrell in May 2014. He is now a research scientist at Google. His main interests lie in large-scale and cognitive science inspired vision systems. His work focuses on enabling efficient learning of state-of-the-art features and human-like concept generalization from perceptual inputs. He was in the GoogLeNet team that won several of the ILSVRC 2014 challenges. He was also the recipient of the best paper award at ECCV 2014. He is the original author and a core developer of Caffe.

This tutorial is designed to equip researchers and developers with the tools and know-how needed to incorporate deep learning into their work. Both the ideas and implementation of state-of-the-art deep learning models will be presented. While deep learning and deep features have recently achieved strong results in many tasks, a common framework and shared models are needed to advance further research and applications and reduce the barrier to entry. To this end we present the Caffe framework that offers an open-source library, public reference models, and working examples for deep learning. Join our tour from the 1989 LeNet for digit recognition to today's top ILSVRC14 vision models. Follow along with do-it-yourself code notebooks. While focusing on vision, general techniques are covered.
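As a taste of the workflow the lab covers, a pretrained reference model can be loaded and run from Caffe's Python interface in a few lines; the file names below are the ones distributed with Caffe's model zoo and are assumptions about your local paths, and the random array merely stands in for a preprocessed image:

    import numpy as np
    import caffe

    caffe.set_mode_gpu()                        # or caffe.set_mode_cpu() without a GPU

    net = caffe.Net('deploy.prototxt',                         # network definition
                    'bvlc_reference_caffenet.caffemodel',      # pretrained weights
                    caffe.TEST)

    image = np.random.rand(1, 3, 227, 227).astype(np.float32)  # stand-in for a real image batch
    net.blobs['data'].reshape(*image.shape)
    net.blobs['data'].data[...] = image
    out = net.forward()
    print(out['prob'].argmax())                 # index of the most probable ImageNet class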

Level: All
Type: Hands-on Lab
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision

Day: Wednesday, 03/18
Time: 14:00 - 15:20
Location: Room 211A

S5574 - Hands-on Lab: Applied Deep Learning for Vision, Natural Language and Audio with Torch7

Soumith Chintala Research Engineer, Facebook
Soumith is a Research Engineer at Facebook AI Research. Prior to joining Facebook in August 2014, Soumith worked at MuseAmi, where he built deep learning models for music and vision targeted at mobile devices. In the past, Soumith worked on state-of-the-art deep learning models for pedestrian detection, natural image OCR, and depth images, among others, while driving his research heavily with CUDA and multiple GPUs.

This is a hands-on tutorial targeted at machine learning enthusiasts and researchers and covers applying deep learning techniques on classifying images, videos, audio and natural language data. The session is driven in Torch: a scientific computing platform that has great toolboxes for deep learning and optimization among others, and fast CUDA backends with multi-GPU support. Torch is supported by Facebook, Google, Twitter and a strong community who actively open-source their code and packages.

Level: Beginner
Type: Hands-on Lab
Tags: Machine Learning & Deep Learning; Computer Vision & Machine Vision; Signal & Audio Processing

Day: Wednesday, 03/18
Time: 15:30 - 16:50
Location: Room 211A

S5895 - Accelerating Computer Vision Algorithms on CUDA-Capable Embedded Platforms

Alexander Smorkalov Senior Software Engineer, Itseez
Alexander Smorkalov is a Senior Software Engineer at Itseez, leading a team of developers maintaining the OpenCV library on mobile and embedded platforms. He received a master's degree from Nizhny Novgorod State University, in Russia. His professional interests include: system programming, computer vision, and performance acceleration of multimedia applications. Alexander is a contributor to OpenCV library development, and has worked on porting it to Android, Windows RT and Embedded Linux platforms.

Computer Vision is becoming a part of everyday life, and is an integral feature of modern smart devices. The OpenCV library (http://opencv.org) provides powerful tools for initial algorithm development, and enables deployment on a wide spectrum of target platforms, ranging from servers to embedded and mobile devices. Despite rapid growth in available computational power, application performance and responsiveness still remain a key challenge. In this hands-on lab with Jetson TK1 dev kits, we will study how an OpenCV-based application can be ported to the NVIDIA Jetson embedded platform, and then accelerated using CUDA technology. We will start with a computational photography application that uses only CPU cores for processing. Next, we will profile the application and see that the CPU is not powerful enough to process high resolution images with acceptable performance. After that, we will replace detected hotspots with CUDA implementations, enabling decent processing speed. Finally, some tips and best practices for cross-platform development will be given.
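The CPU-to-CUDA replacement pattern described above looks roughly like this with OpenCV's Python bindings; it assumes an OpenCV build with CUDA support (as is typical on Jetson), uses OpenCV 4.x naming, and the input file name is hypothetical:

    import cv2

    img = cv2.imread('photo.jpg')                       # hypothetical high-resolution input

    # CPU path: the original hotspot.
    gray_cpu = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    blur_cpu = cv2.GaussianBlur(gray_cpu, (31, 31), 0)

    # GPU path: the same operations routed through the cuda module.
    gpu_img = cv2.cuda_GpuMat()
    gpu_img.upload(img)
    gpu_gray = cv2.cuda.cvtColor(gpu_img, cv2.COLOR_BGR2GRAY)
    gauss = cv2.cuda.createGaussianFilter(cv2.CV_8UC1, cv2.CV_8UC1, (31, 31), 0)
    blur_gpu = gauss.apply(gpu_gray).download()         # bring the result back to host memory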

Level: Beginner
Type: Hands-on Lab
Tags: Embedded Systems; Computer Vision & Machine Vision

Day: Friday, 03/20
Time: 09:30 - 10:50
Location: Room 211A

S5850 - Hands-on Lab: Project Tango UnitySDK & C/C++ API Introduction

Jason Guo Developer Relations Engineer, Google
Jason Guo is a Developer Relations Engineer on Google's Project Tango team. He received his master's degree from Carnegie Mellon University's Entertainment Technology Center. Jason has a background in human-computer interaction, digital media, and computer graphics. With a great passion for exploring new interactive experiences, Jason has been involved with multiple AR/VR projects related to depth sensing and 6-DOF motion tracking. Jason is also one of the major contributors to the Project Tango examples and demos.
Chase Cobb Software Engineer, Google
Chase Cobb is a Software Engineer on Google's Project Tango team. Chase has four years of experience in game development and has worked on everything from mobile titles to triple-A console games. His expertise is in mobile game development using the Unity game engine. Over the last year, Chase has been developing the Project Tango Unity SDK and has created many technical demos, as well as sample projects.

UnitySDK Session (40 min.): Part I of this session walks the user through porting an existing game to the Tango platform, using motion tracking, and demonstrating the best developer practices. We will introduce developers to the UI/UX library and demonstrate the benefits it provides in terms of user feedback. C/C++ API Session (40 min.): In Part II, developers will create a JNI based motion tracking example from an Android Studio skeleton project. We will walk through Tango API, including configuration, Android lifecycle, and implementing basic motion tracking functionalities. Seating is limited. Please fill out the form at the following link to reserve your seat and a Project Tango device here: Reserve Your Spot

Level: All
Type: Hands-on Lab
Tags: Augmented Reality & Virtual Reality; Computer Vision & Machine Vision; Game Development

Day: Friday, 03/20
Time: 13:30 - 14:50
Location: Room 211A

Hands-on lab