Title: "Butterfly Effects" in Perceptual Development
Abstract: A young child's difficulties with face recognition led us to explore how atypicalities in early perceptual development can impact later proficiencies. In the specific context of children who have been treated for congenital blindness, we and others have found some consistent deficits in post-operative face recognition performance. These deficits have typically been attributed to the pre-operative deprivation the children experienced during 'sensitive periods' of development. However, we suggest that there may be an additional factor at work: Differences in *post-operative* visual experiences of newly-sighted children relative to those of typically developing infants. These differences may skew early visual learning and lead to abnormalities in recognition processes. We call these 'butterfly effects' given that they involve potentially large consequences of small initial perturbations in visual experience. Such hypothesized butterfly effects have relevance for three goals: 1. accounting for observed deficits in newly-sighted children, 2. understanding why some aspects of normal visual development unfold in the way they do, and 3. formulating effective training regimens for computational vision systems.
Bio: Pawan Sinha is a tenured professor of vision and computational neuroscience in the Department of Brain and Cognitive Sciences at MIT. He received his undergraduate degree in computer science from the Indian Institute of Technology, New Delhi and his Masters and doctoral degrees in Artificial Intelligence from the Department of Computer Science at MIT. He has also had extended research stays at the University of California, Berkeley, Xerox Palo Alto Research Center, the Sarnoff Research Center in Princeton, and the
Max-Planck Institute for Biological Cybernetics in Tübingen, Germany.
Prof. Sinha’s research interests span neuroscience, artificial intelligence, machine learning, and public health. Using a combination of experimental and computational modeling techniques, research in his laboratory focuses on understanding how the human brain learns to interpret and recognize complex sensory signals, such as images and videos. Prof. Sinha's experimental work on these issues involves studying healthy individuals and also those with neurological disorders such as autism. The goal is not only to derive clues regarding the nature and development of human visual skills, but also to create more powerful and robust AI systems.
Prof. Sinha founded Project Prakash in 2005 with the twin objectives of providing treatment to children with severe visual impairments and also understanding mechanisms of learning and plasticity in the brain. This project has provided insights into several fundamental questions about brain function (even some that had remained open for the past three centuries) while also transforming the lives of many blind children by bringing them the gift of sight.
Prof. Sinha is a recipient of the Pisart Vision Award from the Lighthouse Guild, the inaugural Asia Game Changers Award, the PECASE – US Government’s highest award for young scientists, the Alfred P. Sloan Foundation Fellowship in Neuroscience, the John Merck Scholars Award for research on developmental disorders, the Jeptha and Emily Wade Award for creative research, the Troland Award from the National Academies, the Distinguished Alumnus Award from IIT Delhi, and the Oberdorfer Award from the ARVO Foundation. His laboratory’s research has appeared in several leading scientific journals including Nature, Science, Nature Neuroscience, Proceedings of the National Academy of Sciences and Proceedings of the Royal Society. This work has been profiled in several media channels including the New York Times, Washington Post, Wall Street Journal, New Yorker, ABC News, New Scientist, National Public Radio and TIME magazine.
Prof. Sinha’s teaching has consistently received high ratings from students. He has received the Dean’s Award for Advising and Teaching at MIT. To enhance scientific literacy on a broader scale, he has written a series of newspaper articles on various aspects of normal and abnormal brain function with the goal of bringing the latest findings in neuroscience to the attention of the general public.
Prof. Sinha has served on the program committees for prominent scientific conferences on object and face recognition and is currently a member of the editorial board of ACM's Journal of Applied Perception. He is a founder of Imagen Inc, a company that applies insights regarding human image processing to challenging real-world machine vision problems. Imagen was the winner of the MIT Entrepreneurship competition. Prof. Sinha was named a Global Indus Technovator, and was also inducted into the Guinness Book of World Records for creating the world’s smallest reproduction of a printed book.
Title: Using Data while Protecting Privacy in the Digital Era
Abstract: We live in a time when information about most of our movements and actions is collected and stored in real time. The availability of large-scale behavioral data dramatically increases our capacity to understand and potentially affect the behavior of individuals and collectives.
The use of this data, however, raises legitimate privacy concerns. Anonymization is meant to address these concerns: allowing data to be fully used while preserving individuals' privacy. In this talk, I will first discuss how traditional data protection mechanisms fail to protect people's privacy in the age of big data. More specifically, I will show how the mere absence of obvious identifiers such as name or phone number, or the addition of noise, is not enough to prevent re-identification. Second, I will describe what I see as a necessary evolution of the notion of data anonymization towards an anonymous use of data. I will then conclude by discussing some of the modern privacy engineering techniques currently being developed to allow large-scale behavioral data to be used while giving individuals strong privacy guarantees.
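The kind of failure alluded to above can be sketched with a toy linkage attack (all names and records below are hypothetical, invented purely for illustration): once obvious identifiers are dropped, the remaining quasi-identifiers can still be joined against public auxiliary data.

```python
# Toy linkage attack: all data below is made up for illustration.
# The "anonymized" dataset has no names, but keeps quasi-identifiers.
anonymized = [
    {"zip": "02139", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1990, "sex": "M", "diagnosis": "flu"},
    {"zip": "10001", "birth_year": 1985, "sex": "F", "diagnosis": "diabetes"},
]

# Public auxiliary data (e.g. a voter roll) carries the same attributes
# alongside names.
voter_roll = [
    {"name": "Alice", "zip": "02139", "birth_year": 1985, "sex": "F"},
    {"name": "Bob",   "zip": "02139", "birth_year": 1990, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def reidentify(record, auxiliary):
    """Names in the auxiliary data whose quasi-identifiers match the record."""
    return [p["name"] for p in auxiliary
            if all(p[k] == record[k] for k in QUASI_IDENTIFIERS)]

for rec in anonymized:
    matches = reidentify(rec, voter_roll)
    if len(matches) == 1:  # a unique match defeats the "anonymization"
        print(matches[0], "->", rec["diagnosis"])
```

Here the first two "anonymous" records match exactly one voter each, so their diagnoses are re-identified despite the absence of any name or phone number.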
Bio: Yves-Alexandre de Montjoye is an Associate Professor at Imperial College London. He currently is a Special Adviser on AI and Data Protection to EC Justice Commissioner Reynders and a Parliament-appointed expert to the Belgian Data Protection Agency (APD-GBA). In 2018-2019, he was a Special Adviser to EC Competition Commissioner Vestager co-authoring the Competition Policy for the Digital Era report. His research has been published in Science and Nature Communications and has enjoyed wide media coverage (BBC, CNN, New York Times, Wall Street Journal, Harvard Business Review, etc.). His work on the shortcomings of anonymization has appeared in reports of the World Economic Forum, FTC, European Commission, and the OECD. Yves-Alexandre worked for the Boston Consulting Group and acted as an expert for both the Bill and Melinda Gates Foundation and the United Nations. He received his PhD from MIT in 2015 and obtained, over a period of 6 years, an M.Sc. from UCLouvain in Applied Mathematics, an M.Sc. (Centralien) from École Centrale Paris, an M.Sc. from KULeuven in Mathematical Engineering as well as his B.Sc. in engineering from UCLouvain.
Title: Human-centric Vision in the Real World: Challenges and Opportunities
Abstract: Human-centric vision, or the ability of machines to perceive and analyze humans, is the gateway to human-computer interaction (HCI). It is foundational technology, which is an essential component of any AI system with wide-ranging applications across product categories. From our nascent successes of shipping face detection solutions in digital cameras some fifteen years ago, HCI has witnessed phenomenal growth. Today, many successful and robust HCI technologies are ubiquitous in everyday products from cell phones to cars, including those for face, gaze and body tracking, and face recognition. Much of this growth has been driven by the availability of large and diverse datasets, adequate compute power and the advent of deep learning. However, we are only just getting started and much exciting new work lies ahead. In this talk, I will describe several cutting-edge HCI technologies that we have invented at NVIDIA along with our journey of converting them into successful real-world shipped products. I will describe open challenges encountered along the way, which present rich opportunities for continued research and development in the field. Finally, I will close with thoughts on what lies ahead, both in terms of unexplored/underdeveloped fundamental HCI technologies and the exciting novel application areas that they can enable in the future.
Bio: Shalini De Mello is a Principal Research Scientist and Lead in the Learning and Perception research group at NVIDIA. She has been a researcher there since 2013. She received doctoral and master’s degrees in Electrical and Computer Engineering from the University of Texas at Austin in 2008 and 2004, respectively. Her primary research interests are in designing computer vision and machine learning algorithms for perceiving and analyzing humans -- particularly their face, gaze and gestures. Over her career, she has researched and invented technologies for face recognition and detection, 2D and 3D head pose estimation, gaze estimation, conditional 2D facial image synthesis, facial action unit estimation and gesture recognition, among others. In addition, she is interested in learning AI algorithms with limited supervision and data and has recently been pushing the boundaries of what is possible with this ideology in the context of human-centric vision. She holds over 30 patents, and her research has led to the development of, or has been incorporated into, several innovative human-centric NVIDIA products, including NVIDIA DriveIX for intelligent AI-based automotive interfaces and NVIDIA Maxine for smart video conferencing solutions.
Title: Customer Behavior (Demand) Prediction in Brick-and-Mortar Stores from Product Recognition
Abstract: Although e-commerce has accounted for a growing share of total retail sales over the past decades, traditional brick-and-mortar stores still account for over 70% of total sales. Certain categories of retail businesses, such as convenience stores, still offer a unique value that online retailers cannot compete with: the ability to quickly and easily walk in and out to obtain what you want in just a few minutes.
Meanwhile, increasing the operational efficiency of brick-and-mortar stores by leveraging digital transformation, akin to that of e-commerce, has caught the attention of industry, under a trend sometimes dubbed “new retail.” To achieve this, the first step is to transform the real-world physical information from brick-and-mortar stores into digital form, where computer vision plays an essential role. Then, we may build and apply mathematical models to infer key customer behaviors such as their demand for different products, which naturally leads to optimized decisions in key operational steps such as assortment planning.
In this talk, I will elaborate on and share how we may directly infer customer shopping behaviors from the product display map (combined with product sales records). Such product display maps represent how each product is presented on the shelf, and are obtained through a full digitization of the storefront, i.e., recognizing shelves and estimating their poses, and recognizing products along with their in-shelf positions using state-of-the-art computer vision technologies. The system built upon these technologies is running 24/7 in our storefronts at Convenience Bee, a new and fast-growing convenience store chain in mainland China. I will use real data from our stores to illustrate the technologies behind it.
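As a toy sketch of this kind of inference (not the actual Convenience Bee system; the display features and sales figures below are invented for illustration), one could regress sales records on display-map features such as eye-level placement and number of facings:

```python
import numpy as np

# Hypothetical display-map features per product: (eye_level 0/1, facings).
features = [(1, 3), (0, 2), (1, 1), (0, 4), (1, 2)]
# Matching units sold per day from the sales records (also hypothetical).
sales = [30.0, 12.0, 18.0, 20.0, 24.0]

# Least-squares fit of: sales ~ intercept + eye_level + facings.
X = np.array([[1.0, eye, faces] for eye, faces in features])
y = np.array(sales)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# coef[1]: estimated sales lift from eye-level placement;
# coef[2]: estimated lift per additional facing.
print(np.round(coef, 2))
```

Estimates like these, fitted at scale across stores and days, are the kind of signal that can then feed assortment-planning decisions.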
Bio: Gang Hua is the Chief Technology Officer of Convenience Bee, and Chief Scientist of its AI Research Branch, Wormpex AI Research. His research focuses on computer vision, pattern recognition, machine learning, and robotics, working towards general Artificial Intelligence, with primary applications in cloud and edge intelligence, and currently with a focus on new retail intelligence (see http://www.ganghua.org/).
Before that, he served in various roles at Microsoft (2015-18) as the Science/Technical Adviser to the CVP of the Computer Vision Group, Director of the Computer Vision Science Team in Redmond and Taipei ATL, and Senior Principal Researcher/Research Manager at Microsoft Research. He was an Associate Professor at Stevens Institute of Technology (2011-15). During 2014-15, he took a leave and worked on the Amazon Go project. He was a Visiting Researcher (2011-14) and a Research Staff Member (2010-11) at the IBM T. J. Watson Research Center, a Senior Researcher (2009-10) at Nokia Research Center Hollywood, and a Scientist (2006-09) at Microsoft Live Labs Research. He received his Ph.D. degree in ECE from Northwestern University in 2006.
He is an IEEE Fellow, an IAPR Fellow, and an ACM Distinguished Scientist for contributions to Computer Vision and Multimedia. He is the recipient of the 2015 IAPR Young Biometrics Investigator Award for contributions to Unconstrained Face Recognition from Images and Videos, and a recipient of the 2013 Google Research Faculty Award.
He is a General Chair of ICCV25, was a Program Chair for CVPR19&22, and has served as an Area Chair for CVPR21, ECCV20, CVPR17, ICCV17, ACM MM17&15&12&11, CVPR15, ICCV11, ICIP12&13&15, and ICASSP12&13. He is currently serving as an Associate Editor for T-PAMI, IJCV, CVIU and MVA. He was an Associate Editor in Chief for CVIU, an Associate Editor for IEEE T-IP (2012-15, 2017-19) and IEEE T-CSVT (2015-19), and Vision and View Department Editor for IEEE Multimedia Magazine (2011-16).
He has published more than 200 peer-reviewed papers in top conferences such as CVPR/ICCV/ECCV and top journals such as T-PAMI and IJCV. He holds 20 issued U.S. patents and has more than 20 additional U.S. patents pending.
Title: How Does Our Body Talk, Think, and Interact in Face-to-Face Language Use?
Abstract: Most research on language has focused on spoken and written language only. However, when we use language in face-to-face interactions, we use not only speech but also our bodily actions, such as gestures, in meaningful ways to communicate our messages, and in ways closely linked to the spoken aspects of our language. For example, we can enhance or complement our speech with a drinking gesture as we say “we stayed up late last night”. In this talk I will summarize research that investigates how such meaningful bodily actions are recruited in using language as a dynamic, adaptive and flexible system, and how gestures interact with speech during production and comprehension of language and in dialogic contexts at the behavioral, cognitive, and neural levels. I will also touch upon how novel kinematic analysis techniques using OpenPose and Kinect help us see the “invisible” in visible communicative expressions. Overall, I will claim that multimodal uses of language contribute significantly to how language can be recruited as a flexible and adaptive system for enhancing human cognition and communication, and that we need more technological advances to understand the complexities of this uniquely human capacity.
Bio: Prof. Dr. Asli Özyürek is a Professor at Radboud University Nijmegen and a Principal Investigator at the Donders Institute for Brain, Cognition and Behaviour and at the Max Planck Institute for Psycholinguistics. She is the Director of the Multimodal Language and Cognition lab. Özyürek completed her BA in Psychology at Boğaziçi University in Istanbul, Turkey and earned a double PhD degree in Psychology and Linguistics from the University of Chicago. She has received many career grants, such as an ERC Starting Grant and Dutch Science Foundation VIDI and VICI grants, and has hosted many Marie Curie Individual Fellowships. She is an elected member of Academia Europaea and has received an ASPASIA award from the Dutch Science Foundation and a Young Scientist award from the Turkish Science Foundation. She has publications in Science, PNAS, Cognition, Psychological Science, the Journal of Cognitive Neuroscience, NeuroImage, and Cerebral Cortex, among others.
Her research focuses on the role language plays, as a dynamic, adaptive and flexible system, in human cognition and communication. Focusing on cross-linguistic and multimodal uses of language, she investigates the neural, cognitive and social foundations of how language is used flexibly in embodied and situated contexts through our bodily actions, as in the gestures used during speaking by hearing people and the sign languages used by deaf communities.
Title: Building for a Billion: Learning from India's Population Scale Digital Infrastructure Journey
Abstract: India has been building a set of digital building blocks, from identity to payment to credentialing to data, across domains over the last decade. This loosely coupled, interoperable, federated set of digital building blocks, generally known as India Stack, has allowed India to leapfrog in the areas of social welfare and financial inclusion. Most of these take the form of protocols and open APIs, allowing market, society, and government actors to build diverse innovations in the form of apps and platforms on top. India is now extending these design principles to build additional blocks in education, health, and other domains to drive nonlinear transformation at a billion-person scale. In this talk, Pramod, who has been the Chief Architect for most of these digital building blocks, speaks about the big picture, learnings, and the impact.
Bio: Dr. Pramod Varma is the CTO of EkStep Foundation, a not-for-profit creating learner-centric, open source, digital public goods under Project Sunbird (sunbird.org). These digital building blocks are used to provide learning opportunities to 200 million children in India and also in other learning, skilling, and capacity building efforts in India and other countries.
In addition, he continues to be the Chief Architect of Aadhaar, India's digital identity program that has successfully covered more than 1.2 billion people in a short span of 7 years. He also helped architect various India Stack layers such as eSign, Digital Locker, Unified Payment Interface (UPI), and Data Empowerment and Protection Architecture (DEPA), all of which are now working at population scale in India. He is also a co-founder of the Beckn Foundation, and the genesis author and angel donor of the open source beckn protocol (becknprotocol.io) for the creation of decentralized open discovery, fulfillment, and commerce networks.
He is an advisor to Unique Identification Authority of India (UIDAI), National Payment Corporation (NPCI), Goods and Services Tax Network (GSTN), National Health Authority (NHA), Securities and Exchange Board of India (SEBI), Turing Institute Identity Initiative in the UK, and helps with many digital public infrastructure initiatives in India and across the globe.
Pramod holds a Master’s and Ph.D. degree in Computer Science along with a second Master’s in Applied Mathematics. His interests include Internet scale distributed architectures and intelligent systems. He is passionate about technology, science, society, and teaching.
Title: Masked Face Recognition: From Human Performance to Presentation Attacks
Abstract: The pandemic has led to the wide use of face masks, which are essential to keeping it under control. The effect of mask-wearing on face recognition in a collaborative environment is a sensitive issue with implications for many aspects of identity management processes. This talk will give insight into the effect of facial masks on automatic face recognition and will build on that to discuss the validity of evaluation protocols, possible efforts to reduce the effect of masks, human expert performance in masked face recognition, and the effect of masks on face presentation attack detection.
Bio: Naser Damer is a senior researcher at the competence center Smart Living & Biometric Technologies, Fraunhofer IGD. He received his PhD in computer science from Technische Universität Darmstadt (2018). He has been a researcher at Fraunhofer IGD since 2011, performing research management, applied research, scientific consulting, and system evaluation. His main research interests lie in the fields of biometrics, machine learning and information fusion. He has published more than 80 scientific papers in these fields. Dr. Damer is a Principal Investigator at the National Research Center for Applied Cybersecurity ATHENE in Darmstadt, Germany. He lectures on biometric recognition and security, as well as on ambient intelligence, at Technische Universität Darmstadt. Dr. Damer is a member of the organizing teams of a number of conferences, workshops, and special sessions, including serving as a program co-chair of BIOSIG and a publication co-chair of IWBF2020. He serves as a reviewer for a number of journals and conferences and as an associate editor for The Visual Computer journal. He represents the German Institute for Standardization (DIN) on the ISO/IEC SC37 international biometrics standardization committee.
Title: Why Regulate Facial Recognition and Surveillance Technologies?
Abstract: Face recognition as a biometric is widely used, and provides benefits of convenience, improved accessibility, and better law enforcement. However, it is open to numerous criticisms as a surveillance technology, including problems associated with its various biases, with its potential misuses by corporations and oppressive governments, and with the psychological effects it has on the population, when people are put under constant surveillance. While there are other technologies that are potentially much more ubiquitous and powerful for surveillance (such as mobile phones, or web browsers), face recognition technology represents an observing eye, and has an uncanny symbolic power. In the last few years, we have seen an increased pressure from the public to regulate the use of face recognition in various scenarios, across the globe, in parallel with an accelerated technology push for developing systems at massive scales. In this talk, I would like to continue the discussions that started in FG'20 about the role played by the researchers and developers of technology in this debate, in the light of recent developments.
Bio: Albert Ali Salah is professor and chair of Social and Affective Computing at the Information and Computing Sciences Department of Utrecht University, and adjunct professor at the Department of Computer Engineering of Boğaziçi University. He has co-authored over 200 publications on pattern recognition, multimodal interfaces, and computer analysis of human behavior. He serves as a Steering Board member of ACM ICMI, IEEE FG, and eNTERFACE, and as an associate editor of journals including IEEE Trans. on Cognitive and Developmental Systems, IEEE Trans. on Affective Computing, and the International Journal of Human-Computer Studies. Albert was the scientific coordinator of the Data for Refugees (D4R) Challenge that used large scale mobile data to improve the living conditions of millions of Syrian refugees in Turkey. He is a senior research affiliate of DataPop Alliance, a senior member of IEEE and ACM, and he is on Twitter (@SzassTam).