C. V. Jawahar
Professor at the International Institute of Information Technology (IIIT), India.
Towards Multimodality in Perception Tasks
Abstract: A number of perception tasks (especially in vision, language and speech) are solved today with very high accuracy using data-driven techniques. We are now seeing the emergence of more natural tasks that are inherently multimodal (e.g., visual question answering). These are closer to the way we interact with the world around us and perceive our sensory inputs. As a result, today's AI systems (i.e., deep learning architectures) are becoming increasingly capable of jointly processing inputs from different modalities, and indeed benefit from processing multiple modalities (e.g., text, speech and visual content) together to arrive at superior solutions. Such algorithms are now also discovering interesting correlations across the modalities. In this talk, we focus especially on the interplay between text, speech and visual content in talking-face videos. We present some recent results (including some from our own research) and discuss ongoing trends and the challenges facing the community. For example, to what extent can lip movements explain the speech that is produced, and vice versa? How does the multimodal nature of our inputs open up new avenues and innovative solutions? Can we substitute or supplement one modality with another? Initial results hint at new possibilities in education, healthcare and assistive technologies.
Professor of Affective Intelligence and Robotics (AFAR) and the Head of the AFAR Lab at the University of Cambridge’s Department of Computer Science and Technology, UK.
Artificial Emotional Intelligence: Quo Vadis?
Professor in the Department of Computer Science at the University of Texas at Austin and a Research Director in Facebook AI Research (FAIR).
Abstract: Perception systems that can both see and hear have great potential to unlock problems in video understanding, augmented reality, and embodied AI. I will present our recent work in audio-visual (AV) perception.
First, we explore how audio’s spatial signals can augment visual understanding of 3D environments. This includes ideas for self-supervised feature learning from echoes, AV floorplan reconstruction, and active source separation, where an agent intelligently moves to hear things better in a busy environment. Throughout this line of work, we leverage our open-source SoundSpaces platform, which allows state-of-the-art rendering of highly realistic audio in real-world scanned environments.
Next, building on these spatial AV ideas, we introduce new ways to enhance the audio stream – making it possible to transport a sound to a new physical environment observed in a photo, or to dereverberate speech so it is intelligible for machine and human ears alike. Finally, I will overview Ego4D, a massive new egocentric video dataset built via a multi-institution collaboration that supports an array of exciting multimodal tasks.
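The premise that audio's spatial signals carry geometric information can be illustrated with a toy active-sensing sketch. This is purely an illustration of the underlying physics (echo delay encodes distance), not the SoundSpaces pipeline or the speaker's learning method; the chirp signal and sample rate are arbitrary choices for the example.

```python
import numpy as np

def echo_distance(emitted, received, sample_rate, speed_of_sound=343.0):
    """Estimate the distance to a reflecting surface from the delay of an
    echo, found as the lag maximizing cross-correlation with the emitted
    signal. The round-trip distance is halved to get the one-way distance."""
    corr = np.correlate(received, emitted, mode="full")
    lag = corr.argmax() - (len(emitted) - 1)  # samples of delay
    return speed_of_sound * lag / sample_rate / 2.0

# Toy setup: a short chirp comes back delayed by 100 samples at 16 kHz.
fs = 16000
t = np.arange(256) / fs
chirp = np.sin(2 * np.pi * 2000 * t * (1 + 50 * t))
received = np.concatenate([np.zeros(100), chirp])

print(echo_distance(chirp, received, fs))  # ~1.07 m (343 * 100 / 16000 / 2)
```

Learning-based systems replace this hand-crafted peak-picking with features learned from the echo response, but the recoverable quantity, scene geometry, is the same.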
Marleen de Bruijne
Professor of AI in medical image analysis at Erasmus MC, The Netherlands
Learning with less in medical imaging
Abstract: Supervised learning approaches have had tremendous success in medical imaging in the past few years. Automated analysis using convolutional neural networks is now in many cases as accurate as the assessment of an expert observer. A major factor still hampering the adoption of these techniques in practice is that it can be very expensive, time-consuming, or even impossible to obtain sufficiently many representative and well-annotated training images to train reliable models. On the other hand, weaker labels are often readily available, for instance in the form of a radiologist's assessment of the presence or absence of certain abnormalities. In this talk, we will discuss various approaches to exploit such information and to make machine learning techniques work in real-life situations, where (annotated) training data is limited, available annotations may be wrong, data is highly heterogeneous, and training data may not be representative of the target data to be analyzed. I will present examples from several medical imaging applications.
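One common way to exploit such scan-level weak labels is multiple-instance learning, where a whole image (a "bag" of patches) is supervised without any patch-level annotation. The sketch below is an illustration of that general idea, not necessarily the speaker's method; the max-pooling assumption (a scan is abnormal if at least one patch is) is the classic MIL formulation.

```python
import numpy as np

def bag_probability(instance_probs):
    """Multiple-instance assumption: a scan (bag) is abnormal if at least
    one patch (instance) is, so the bag score is a max over patch scores."""
    return float(np.max(instance_probs))

def bag_loss(instance_probs, bag_label):
    """Binary cross-entropy on the bag-level prediction only: the
    radiologist's scan-level label supervises training even though no
    individual patch is annotated."""
    p = np.clip(bag_probability(instance_probs), 1e-7, 1 - 1e-7)
    return -(bag_label * np.log(p) + (1 - bag_label) * np.log(1 - p))

# Toy example: one suspicious patch drives the scan-level prediction.
probs = np.array([0.05, 0.10, 0.92, 0.07])
assert bag_probability(probs) == 0.92
print(bag_loss(probs, 1))  # small loss: the abnormal scan is flagged
```

In practice the instance probabilities come from a CNN over patches and the max is often replaced by a smoother pooling (mean, noisy-or, attention), but the weak-supervision principle is the same.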
Dr. Xian-Sheng Hua
Vice President of Alibaba Group, Head of City Brain Lab of DAMO Academy.
Scalable Real-World Visual Intelligence System - from Algorithm to Platform to Application
Institute of Automation, Chinese Academy of Sciences (CASIA), China
Iris Recognition: Progress and Challenges
King-Sun Fu Prize Lecture
Abstract: Iris recognition has proven to be one of the most reliable biometric solutions for personal identification and has received much attention from the pattern recognition community. However, it is far from a solved problem, as many open issues remain to be resolved to make iris recognition more user-friendly and robust. In this talk, I will present an overview of our decades of effort on iris recognition, including iris image acquisition, iris image pre-processing, iris feature extraction and the security of iris recognition systems. I will discuss our most recent work on light-field iris recognition and all-in-focus simultaneous iris recognition of multiple people at a distance. Examples will be given to demonstrate the successful routine use of our work in a wide range of fields such as mobile payment, banking, access control, welfare distribution, etc. I will also address some of the remaining challenges as well as promising future research directions before closing the talk.
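To make the matching step concrete: in the classic Daugman-style formulation, extracted iris features are binarized into an "iris code" and two codes are compared by a normalized Hamming distance over their jointly valid bits. This is the textbook baseline, offered for illustration; the speaker's own feature-extraction methods differ in the features themselves, not in the need for such a comparison.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a, mask_b):
    """Normalized Hamming distance between two binary iris codes,
    counting only bits marked valid in both occlusion masks (eyelids,
    eyelashes and reflections are masked out)."""
    valid = mask_a & mask_b
    n_valid = valid.sum()
    if n_valid == 0:
        return 1.0  # no comparable bits: treat as maximal distance
    disagreeing = ((code_a ^ code_b) & valid).sum()
    return disagreeing / n_valid

rng = np.random.default_rng(0)
code = rng.integers(0, 2, size=2048, dtype=np.uint8)
mask = np.ones(2048, dtype=np.uint8)
other = rng.integers(0, 2, size=2048, dtype=np.uint8)

# A code matches itself exactly; an unrelated code lands near 0.5,
# which is why a low distance is such strong evidence of identity.
assert hamming_distance(code, code, mask, mask) == 0.0
print(hamming_distance(code, other, mask, mask))
```

Real systems additionally shift one code over a range of rotations and keep the minimum distance, to tolerate head tilt at capture time.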
MSU Foundation Professor, Data Science and Engineering Lab, Michigan State University, USA
Graph Neural Networks: Models, Trustworthiness, and Applications
J. K. Aggarwal Prize Lecture
Abstract: Graph Neural Networks (GNNs) have shown their power in graph representation learning. They have advanced numerous recognition and learning tasks in many domains, such as biology and healthcare. In this talk, I will first introduce a novel perspective that helps understand and unify existing GNNs, paving a principled and innovative way to design new GNN models. As GNNs become more pervasive, there is an ever-growing concern over whether they can be trusted, so I will then discuss how to build trustworthy GNNs. Given that graphs are widely used to represent data in real-world systems, I will finally demonstrate representative applications of GNNs.
Prof. Yunhong Wang
School of Computer Science and Engineering, Beihang University, China
Towards Practical Biometrics: Face and Gait
Maria Petrou Prize Lecture
Abstract: Biometrics are unique physical or behavioural characteristics that can be used for identification. In the last few years, substantial advances have been made in this field with the development of deep learning theories and technologies, evidenced not only by strong results on large-scale benchmarks but also by efforts to account for soft biometrics, including gender, expression, age, etc. Meanwhile, recent studies reveal additional challenges in uncontrolled conditions, such as severe variations in scale, pose, illumination, occlusion and cluttered backgrounds, which must be handled well for real-world applications. This talk focuses on two representative modalities, face recognition and gait recognition, with deep learning based methodologies designed specifically for practical use. It covers tasks ranging from identity recognition to attribute analysis, and presents the latest progress on the interpretability and robustness of deep neural networks. Finally, some perspectives are discussed to facilitate future research.