The Vidal Lab conducts research on machine learning, computer vision, dynamical systems, robotics, and biomedical data science. Our machine learning research focuses on deep learning theory, trustworthy AI, deep generative models, parsimonious representation learning, continual learning, and optimization. In computer vision, we focus on vision–language models, 3D vision, video analysis, and human motion analysis. In robotics and dynamical systems, we focus on linear, hybrid, and multi-agent systems. These techniques drive medical applications in radiology, cardiology, hematology, and surgery. As part of the Center for Innovation in Data Engineering and Science (IDEAS), the Center for AI-Enabled Systems: Safe, Explainable & Trustworthy (ASSET), and the General Robotics, Automation, Sensing and Perception Lab (GRASP), we integrate theoretical breakthroughs with practical, impactful solutions.
Topics
Deep Learning Theory
We study the mathematical principles behind modern neural networks. Our lab focuses on understanding the optimization landscape and generalization properties of positively homogeneous networks, analyzing the learning dynamics and implicit bias of gradient-based methods, and providing principled explanations of deep learning phenomena such as neural collapse, low-rank adaptation, and learning on the edge of stability. Through this work, we aim to connect fundamental theory with the practical behavior of large-scale neural networks.
Trustworthy AI
We develop AI systems that are interpretable, robust, and reliable for high-stakes applications. Our information pursuit framework enables explainable-by-design models that make predictions from a sequence of informative question–answer pairs. Our parsimonious concept engineering method improves the safety of multimodal generative models through targeted interventions in their latent spaces. Our theoretical analysis of the conditions under which robust classifiers are guaranteed to exist enables the design of provably robust classifiers. Our algorithms for generating adversarial attacks stress test the reliability of visual classifiers and large language models. This research lays the foundation for trustworthy AI in high-stakes applications.
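To make the idea of predicting from a sequence of informative question–answer pairs concrete, here is a minimal toy sketch. It assumes a greedy rule that picks, at each step, the query with the largest empirical information gain about the label given the answers so far. The data, the query names, and the function names are hypothetical, and the count-based estimate is a stand-in for illustration only, not the lab's implementation.

```python
import math
from collections import Counter

# Hypothetical toy data: each row is (answers to binary queries, label).
QUERIES = ["has_wings", "has_fur", "lives_in_water"]
DATA = [
    ({"has_wings": 1, "has_fur": 0, "lives_in_water": 0}, "bird"),
    ({"has_wings": 1, "has_fur": 0, "lives_in_water": 1}, "bird"),
    ({"has_wings": 0, "has_fur": 1, "lives_in_water": 0}, "mammal"),
    ({"has_wings": 0, "has_fur": 1, "lives_in_water": 1}, "mammal"),
    ({"has_wings": 0, "has_fur": 0, "lives_in_water": 1}, "fish"),
]


def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(rows, query):
    """Empirical mutual information between a query's answer and the label."""
    gain = entropy([y for _, y in rows])
    for answer in (0, 1):
        subset = [y for x, y in rows if x[query] == answer]
        if subset:
            gain -= len(subset) / len(rows) * entropy(subset)
    return gain


def information_pursuit(sample, rows=DATA, queries=QUERIES):
    """Ask the most informative query until the label is determined."""
    history, remaining = [], list(queries)
    while remaining and entropy([y for _, y in rows]) > 0:
        best = max(remaining, key=lambda q: information_gain(rows, q))
        remaining.remove(best)
        # Keep only the training rows that agree with the observed answer.
        consistent = [(x, y) for x, y in rows if x[best] == sample[best]]
        if not consistent:
            break  # no training evidence for this answer; stop asking
        history.append((best, sample[best]))
        rows = consistent
    prediction = Counter(y for _, y in rows).most_common(1)[0][0]
    return prediction, history


if __name__ == "__main__":
    sample = {"has_wings": 0, "has_fur": 0, "lives_in_water": 1}
    prediction, history = information_pursuit(sample)
    print(prediction, history)
```

The returned history is the explanation: it lists exactly which questions were asked and which answers led to the prediction. On this toy data, the sample above is resolved after two questions.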
Deep Generative Models
We explore the theory and practice of deep generative models. Our contributions include convergence guarantees for generative model inversion and transformer-based diffusion models. Our work on flow matching aligns latent distributions with the multimodal structure of real data, improving optimization and convergence. We have also developed methods for accurate diffusion-based image restoration and for fast, high-quality text-to-image generation via retrieval-augmented generation with few-step diffusion models. These efforts provide principled methods for efficient and reliable generative modeling.
Parsimonious Representation Learning
We focus on discovering sparse and low-rank structure in high-dimensional data. Our lab pioneered methods for clustering data drawn from a union of subspaces, including Generalized Principal Component Analysis (GPCA), Sparse Subspace Clustering (SSC), and Dual Principal Component Pursuit (DPCP). We also develop nonconvex matrix and tensor factorization methods with guarantees of global optimality. This work provides scalable, robust, and geometrically principled tools for representation learning.
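As an illustration of how sparsity is used to cluster data in a union of subspaces, the basic noiseless form of the SSC program can be written as follows. The notation is an assumption introduced here: the columns of $X \in \mathbb{R}^{D \times N}$ are the data points and $C$ is the coefficient matrix. Practical variants add terms for noise and outliers.

```latex
% Express each point as a sparse combination of the other points
\[
\min_{C \in \mathbb{R}^{N \times N}} \; \|C\|_1
\quad \text{subject to} \quad X = XC, \qquad \operatorname{diag}(C) = 0 .
\]

% Build a symmetric affinity from the coefficients and apply spectral clustering to it
\[
W = |C| + |C|^{\top} .
\]
```

The constraint $\operatorname{diag}(C) = 0$ rules out the trivial solution in which each point represents itself. Under suitable conditions on the subspaces and the data, the sparsest representation of a point uses only points from its own subspace, so the affinity $W$ connects points within the same subspace.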
Continual Learning
We build models that learn tasks sequentially without forgetting. By linking continual learning with adaptive filtering, we leverage classical theory to inform new algorithms. Frameworks like the Ideal Continual Learner (ICL) and LoRanPAC unify existing methods, provide theoretical guarantees, and maintain stability over long task sequences. Our goal is to develop continual learning methods that are both theoretically sound and practically effective.
Optimization
We analyze the convergence and global optimality of optimization algorithms for machine learning, including problems such as subspace clustering, nonconvex matrix and tensor factorization, dictionary learning, generative model inversion, and deep learning. We also study distributed optimization on Riemannian manifolds and accelerated methods for both smooth and nonsmooth problems from a dynamical systems perspective. Through this work, we unify theory and practice to deliver principled, scalable optimization algorithms for machine learning.
Vision–Language Models
We develop vision–language models that connect visual data with semantic concepts to enable interpretable and controllable multimodal AI systems. Our work includes methods for visual grounding, image captioning, and multimodal generation that explicitly model the relationship between images, language, and structured knowledge. We have developed concept-based interpretable classification frameworks that explain model predictions through human-understandable visual attributes. We have also introduced approaches such as knowledge pursuit prompting for zero-shot multimodal synthesis and concept-based image editing methods that enable targeted manipulation of visual content. Our research further explores hierarchical concept representations to improve interpretability and robustness. This work advances multimodal AI systems that are interpretable, controllable, and robust.
Human Motion Analysis
We develop methods for understanding human motion from video and skeletal data. Our work spans action recognition, detection, and segmentation, as well as human pose estimation, combining geometric modeling, structured temporal representations, and deep learning. Early contributions to action recognition combined kernels on dynamical systems with optical-flow-based feature representations. Subsequent work established benchmarks such as the MHAD dataset and introduced methods based on informative skeletal joints and moving poselets. Later work developed spatio-temporal convolutional networks and temporal convolutional models for action detection and segmentation, as well as visual-symbolic representations for recognizing complex group activities. More recently, we have developed methods for human pose estimation, disentangled motion representation learning, and multimodal motion representations that support recognition, retrieval, and generation of human movements. This work provides principled and data-driven approaches for fine-grained human behavior understanding.
Video Analysis
We study methods for modeling dynamic visual phenomena in videos. Our work includes early contributions to modeling dynamic textures through dynamical systems formulations, enabling registration, recognition, and tracking of complex temporal patterns. We have also developed efficient approaches for semantic video segmentation that capture the spatiotemporal structure of visual processes. More recent work explores the robustness of video classification systems through adversarial training, and the generation of perpetual videos of dynamic scenes while enforcing 3D consistency. By combining geometric representations, dynamical systems, and deep learning, this research provides principled tools for understanding complex temporal patterns in visual data.
Image Analysis
We develop algorithms for image segmentation, restoration, and visual representation learning. Our work includes foundational contributions to image segmentation and alpha matting based on discrete optimization, semantic segmentation methods that combine structured output learning and conditional random fields, and visual representation learning methods that improve image categorization and clustering. We have also developed techniques for image deblurring and restoration based on both classical image priors and modern priors derived from diffusion models. This research provides robust and scalable tools for extracting meaningful structure from visual data and improving image quality for downstream tasks.
Geometric Vision
We develop geometric methods for recovering 3D structure and motion from images and video. Our early work introduced algebraic and geometric approaches for multibody motion segmentation, including formulations based on multibody epipolar geometry and higher-order tensor representations. This work also led to widely used benchmarks such as the Hopkins 155 dataset. Subsequent contributions developed robust methods for motion segmentation, 3D registration, and object pose estimation, as well as optimization-based approaches for geometric estimation using tools such as dual principal component pursuit and iteratively reweighted least squares. Our recent work continues to advance robust geometric estimation for rotation search and 3D reconstruction. This research establishes mathematically grounded approaches to geometric vision with strong robustness and optimality properties.
Learning and Analyzing Linear, Bilinear and Hybrid Dynamical Systems
We develop rigorous methods for analyzing and learning dynamical systems. Our work on hybrid systems established conditions for observability and algorithms for system identification, providing tools for modeling systems that exhibit both continuous and discrete behaviors. We have also developed realization theory for stochastic jump-Markov linear systems and stochastic bilinear systems, providing fundamental characterizations of when such systems admit minimal and identifiable representations. More recent work studies observability and identification of linear systems with sparse inputs, developing methods to recover system dynamics when inputs are high-dimensional but only sparsely active. These contributions provide new tools for identifying dynamical systems from limited data.
Geometry and Distances of Spaces of Dynamical Systems
We develop principled distances and similarity measures for comparing dynamical systems. Our early work introduced Binet–Cauchy kernels for comparing linear dynamical systems, reducing kernels on trajectories to kernels on the parameters. Building on this foundation, we developed group-action-induced distances for averaging and clustering dynamical systems, establishing a geometric framework that accounts for invariances in system representations. This work was further extended through the development of alignment distances on spaces of dynamical systems, together with efficient algorithms for their computation. These contributions have been particularly impactful in computer vision, where dynamical systems provide compact representations of temporal data, enabling robust methods for video comparison, dynamic scene understanding, and human action recognition.
Dynamical Systems Perspectives on Optimization Algorithms
We investigate optimization algorithms through the lens of dynamical systems. By modeling classical and accelerated gradient-based methods as continuous-time dynamical systems, we obtain new insights into convergence, stability, and robustness. This includes studies of accelerated gradient descent, ADMM, conformal symplectic and relativistic optimization, nonsmooth dynamical systems, and proximal splitting methods. Our approach unifies theory and practice, yielding both principled explanations of algorithmic behavior and novel, efficient optimization methods.
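The correspondence between algorithms and continuous-time systems can be illustrated with standard textbook examples. The symbols $f$, $h$, and $\gamma$ below are introduced here for illustration, and the formulations in the lab's papers may differ.

```latex
% Gradient descent with step size h and its continuous-time limit (gradient flow)
\[
x_{k+1} = x_k - h \nabla f(x_k)
\quad \xrightarrow{\; h \to 0 \;} \quad
\dot{x}(t) = -\nabla f(x(t)) .
\]

% Heavy-ball-type momentum corresponds to a damped second-order system
\[
\ddot{x}(t) + \gamma\, \dot{x}(t) + \nabla f(x(t)) = 0, \qquad \gamma > 0 .
\]

% Nesterov's accelerated gradient method for convex f has the limiting ODE
\[
\ddot{x}(t) + \frac{3}{t}\, \dot{x}(t) + \nabla f(x(t)) = 0 .
\]
```

Convergence questions then become stability questions. For gradient flow, for example, $V(x) = f(x) - f(x^\star)$ acts as a Lyapunov function, since $\dot{V} = -\|\nabla f(x)\|^2 \le 0$ along trajectories.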
Distributed Optimization and Consensus on Manifolds
We develop geometric methods for distributed optimization and consensus on manifolds. Our early work introduced Riemannian optimization methods for optimal motion estimation on the essential manifold, establishing geometric formulations for recovering camera motion from image correspondences. Building on this foundation, we developed convergence guarantees for gradient descent methods for computing Riemannian centers of mass, providing fundamental tools for averaging data on manifolds. This work was further extended to distributed consensus algorithms, including consensus on manifolds, together with theoretical guarantees and scalable algorithms for multi-agent coordination. These contributions have been particularly impactful in computer vision, robotics, and biomedical imaging. In vision and robotics, they have enabled distributed localization of camera sensor networks, coordinated estimation, and geometric averaging methods for multi-view perception and autonomous systems. In brain imaging, these methods have enabled principled averaging and clustering of probability density functions arising in high angular resolution diffusion MRI, supporting applications such as population analysis and morphometry.
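As a small illustration of what computing a Riemannian center of mass by gradient descent involves, the sketch below averages points on the unit sphere. It assumes a constant step size of 1 and points that lie close together. The function names are hypothetical, and this is a generic textbook-style iteration rather than the lab's algorithm or its guarantees.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def normalize(u):
    n = norm(u)
    return [a / n for a in u]


def log_map(p, q):
    """Tangent vector at p that points toward q, with length equal to
    the geodesic (great-circle) distance between p and q."""
    c = max(-1.0, min(1.0, dot(p, q)))
    theta = math.acos(c)
    v = [qi - c * pi for pi, qi in zip(p, q)]  # part of q orthogonal to p
    n = norm(v)
    if n < 1e-12:
        return [0.0] * len(p)
    return [theta * vi / n for vi in v]


def exp_map(p, v):
    """Follow the geodesic from p in tangent direction v for length |v|."""
    n = norm(v)
    if n < 1e-12:
        return list(p)
    moved = [math.cos(n) * pi + math.sin(n) * vi / n for pi, vi in zip(p, v)]
    return normalize(moved)  # guard against numerical drift off the sphere


def riemannian_mean(points, step=1.0, iters=100, tol=1e-10):
    """Gradient descent on the mean squared geodesic distance to the points."""
    mean = normalize(points[0])
    for _ in range(iters):
        logs = [log_map(mean, q) for q in points]
        # Average of the log maps = negative gradient of half the cost.
        direction = [sum(col) / len(points) for col in zip(*logs)]
        if norm(direction) < tol:
            break
        mean = exp_map(mean, [step * d for d in direction])
    return mean


if __name__ == "__main__":
    a = 0.3
    pts = [
        [math.sin(a), 0.0, math.cos(a)],
        [-math.sin(a), 0.0, math.cos(a)],
        [0.0, math.sin(a), math.cos(a)],
        [0.0, -math.sin(a), math.cos(a)],
    ]
    print(riemannian_mean(pts))
```

Each iteration lifts the points to the tangent space at the current estimate, averages them there, and maps the result back to the sphere. For the four symmetric points in the usage example, the iterates approach the north pole.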
Robotics and Autonomous Systems
We develop geometric and control-theoretic methods for autonomous robotic systems operating in complex environments. Our early work introduced vision-based control strategies for autonomous helicopter landing, demonstrating how visual feedback can be integrated with dynamical models to enable precise navigation and control. We also developed game-theoretic formulations for pursuit–evasion games, providing strategies for multi-agent coordination in adversarial settings. These contributions established principled connections between vision, control, and multi-agent systems, and helped advance the development of autonomous robots capable of perception-driven decision making and coordinated behavior.
Diffusion MRI Analysis
We develop computational methods for the analysis of high angular resolution diffusion imaging (HARDI) data to study brain structure and connectivity. Our work includes foundational contributions to HARDI reconstruction, restoration, registration, and segmentation based on information geometry, sparse representation, and non-convex optimization. A key contribution of our work is the ability to process diffusion signals while respecting the underlying geometric structure of the space of orientation distribution functions. These methods have enabled applications such as the study of structural brain asymmetries in twins and the identification of imaging biomarkers for neurological disorders such as mild cognitive impairment. This work provides principled tools for extracting clinically meaningful information from high-dimensional neuroimaging data.
Surgical Activity Recognition and Skill Assessment
We develop computer vision and machine learning methods for understanding surgical workflows and assessing surgical skill from robotic surgery data. Our work includes early contributions based on bag-of-spatiotemporal-features representations, sparse hidden Markov models, and conditional random fields for gesture segmentation and classification. We have also developed spatiotemporal deep learning architectures for surgical activity recognition and skill assessment. In addition, our lab contributed to the development of widely used benchmarks, such as the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), and helped shape the field through influential tutorials on data-driven methods for robotic surgery. This research provides quantitative and automated tools for objective surgical training and assessment.
Analysis of Stem Cell–Derived Cardiomyocytes
We develop machine learning methods for the analysis of stem cell–derived cardiomyocytes to support advances in regenerative medicine and drug discovery. Our work includes approaches for clustering and classification of cardiac cells based on cell morphology and contractile dynamics. We have developed shape analysis methods based on metamorphosis models to capture complex cell deformations, as well as recurrent neural network models to analyze temporal contraction patterns. These methods enable automated phenotyping of cardiac cells and provide scalable tools for understanding variability in stem cell–derived cardiac populations.
Computational Microscopy and Point-of-Care Diagnostics
We develop computational imaging and machine learning methods for cell detection, classification, and counting from holographic microscopy data. Our work includes sparse representation-based approaches for holographic reconstruction, hybrid physics-based and deep learning methods for phase retrieval, and methods that combine dictionary learning with graphical models for cell detection, classification, and counting. More recent work has explored encoder–decoder deep learning architectures for robust detection and counting of cells in holographic images. A major focus of this research is the development of computational tools for point-of-care diagnostics, including automated complete blood count with three-part differential and urinalysis using lensless imaging platforms. This work advances low-cost and scalable diagnostic technologies for resource-constrained healthcare settings.
Computer Vision for Neurological and Developmental Disorders
We develop computer vision methods for the analysis of human movement to support the diagnosis and treatment of neurological and developmental disorders. Our work includes methods for recognizing infant actions from multi-view video to support rehabilitation therapy. We have also developed deep learning models for detection and segmentation of motor tics from video for Tourette syndrome assessment. A major focus of our recent work is the Computerized Assessment of Motion Imitation (CAMI), which uses both 3D skeletal motion capture data and 2D video-based pose estimation to quantify motor imitation ability as a biomarker for autism spectrum disorder. These methods enable objective and quantitative assessment of motor function and have demonstrated promise for distinguishing typically developing children from children with autism. This research establishes computer vision as a tool for scalable and objective behavioral assessment in clinical settings.
Medical Vision–Language Models and Foundation Models for Healthcare
We develop multimodal AI systems that combine medical images and clinical text to enable interpretable and reliable clinical decision support. Our work includes methods for extracting structured medical facts from radiology reports through learned text representations and using these representations to build interpretable-by-design systems for report classification. Our recent work extends these ideas to jointly reason over medical images and reports by grounding extracted clinical concepts directly in chest X-ray images, enabling automated report generation and interpretable classification. More broadly, we are extending these methods to other imaging modalities such as echocardiography, with the goal of developing foundation models for radiology and cardiology. This research advances multimodal foundation models for healthcare that emphasize interpretability, reliability, and clinical utility.