Asim Unmesh
PhD Candidate, Electrical & Computer Engineering, Purdue University (USA)
Thesis Topic: Scalable and Compositional Human Activity Understanding (HAU) for Object-Centric Tasks
Thesis Motivation in one line: Scalability enables real-world deployment; compositionality unlocks the details of activities, enabling richer applications.
Motivation
Manufacturing and production are core human endeavors that generate material wealth and prosperity. Improving efficiencies
can raise living standards and reduce waste, benefiting both society and the environment.
Applications of HAU: better work-instruction design, worker training and feedback, AI task copilots and task assistants, and robotic imitation learning (learning from human demonstration).
Research Focus
- Scalability — performance that holds up across new activities, facilities, and visual conditions.
- Compositionality — structured understanding of activities, such as task hierarchies and interactions (human–object, object–object), among other compositional elements.
Challenges of Scalability for HAU
- Data Bottleneck - Sourcing relevant videos, pre-processing them, and annotating them for training and evaluation.
- Generalization Bottleneck - Performing well across changes in scenario, such as new tasks and new environments.
- Training and Evaluation Bottleneck - The compute, knowledge, and skill required to train and evaluate state-of-the-art HAU systems.
Challenges of Compositionality for HAU
- Challenges in defining and annotating compositional elements
- Fine-grained compositional understanding
- Learning relations (such as causal relations and part-whole relations) between compositional elements
Overview of Research for Scalable Human Activity Understanding
- [Under Review] Open-vocabulary, zero-shot temporal action segmentation
- [In Progress] Combined synthetic + real data pipelines
Overview of Research for Compositional Human Activity Understanding
- [Published] Novel Object-Object Interaction Dataset - IEEE RA-L 2023, presented at ICRA 2024
- [Under Review] Interaction-centric representations for Action Recognition
- [In Progress] Temporal Grounding in Assembly Activities using Video-Language Models
Selected Work
- Interacting Objects Dataset — 10k object–object interaction annotations for richer dynamic scene representations. (IEEE RA‑L, 2024)
- Open-Vocabulary Temporal Action Segmentation — training-free pipeline using VLMs and optimal transport for temporal consistency; a minimal sketch follows this list. (Under review)
- Assemblify: Generating Adaptive On-Demand 3D Animations for Context-Aware Mechanical Assembly Guidance — an agentic approach to assembly-activity guidance using an assembly-by-disassembly algorithm; a minimal sketch follows this list. (C&E accepted; others under review)
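
The open-vocabulary segmentation entry above mentions optimal transport for temporal consistency. The sketch below is only a minimal illustration of that general idea, not the system under review: the frame-by-step similarity matrix is assumed to come from a vision-language model, and the uniform step-coverage marginals and temporal-order prior are simplifying assumptions.

```python
# Minimal sketch (illustrative, not the reviewed pipeline): turn per-frame VLM
# similarities into a temporally consistent frame-to-step assignment using
# entropic optimal transport (Sinkhorn iterations).
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, eps=0.1, n_iters=100):
    """Entropic OT: transport plan with the given row/column marginals."""
    K = np.exp(-cost / eps)                      # Gibbs kernel
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (K.T @ u + 1e-9)      # rescale columns
        u = row_marginal / (K @ v + 1e-9)        # rescale rows
    return u[:, None] * K * v[None, :]           # plan: frames x steps

def segment(sim, order_weight=0.5):
    """Frame-to-step labels from VLM similarities plus a temporal-order prior."""
    T, S = sim.shape
    # Assumption: frame t loosely prefers steps whose index is proportional to t/T.
    t_pos = np.linspace(0, 1, T)[:, None]
    s_pos = np.linspace(0, 1, S)[None, :]
    cost = (1.0 - sim) + order_weight * np.abs(t_pos - s_pos)
    plan = sinkhorn(cost,
                    row_marginal=np.full(T, 1.0 / T),   # every frame is assigned
                    col_marginal=np.full(S, 1.0 / S))   # assumed: balanced step coverage
    return plan.argmax(axis=1)                          # hard step label per frame

# Toy usage: 8 frames scored against 3 hypothetical step descriptions.
sim = np.random.rand(8, 3)
print(segment(sim))   # array of step indices, one per frame
```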
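
Similarly, the Assemblify entry refers to an assembly-by-disassembly algorithm. The sketch below shows only the textbook version of that idea under a hypothetical blocking-constraint data model; the part names and structure are illustrative, not the actual implementation.

```python
# Minimal sketch of the classic assembly-by-disassembly idea (hypothetical data
# model): repeatedly remove any part that is not blocked by the remaining parts;
# reversing the removal order yields a feasible assembly sequence that can drive
# step-by-step guidance animations.

# blocked_by[p] = parts that must be removed before p can be taken out.
blocked_by = {
    "cover": set(),
    "gear":  {"cover"},
    "shaft": {"gear", "cover"},
    "base":  {"shaft"},
}

def assembly_sequence(blocked_by):
    remaining = set(blocked_by)
    removal_order = []
    while remaining:
        # A part is removable once none of its blockers are still assembled.
        removable = [p for p in remaining if not (blocked_by[p] & remaining)]
        if not removable:
            raise ValueError("No feasible disassembly order (circular blocking)")
        part = removable[0]
        remaining.remove(part)
        removal_order.append(part)
    return list(reversed(removal_order))   # assemble in reverse disassembly order

print(assembly_sequence(blocked_by))
# ['base', 'shaft', 'gear', 'cover']
```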
© 2025 Asim Unmesh