Asim Unmesh
PhD Candidate, Electrical & Computer Engineering, Purdue University (USA)
Thesis Topic: Scalable and Compositional Human Activity Understanding (HAU) for Object-Centric Tasks
Thesis Motivation in one line: Scalability enables real-world deployment; compositionality unlocks the details of activities, enabling richer applications.
Motivation
Manufacturing and production are core human endeavors that generate material wealth and prosperity. Improving efficiencies
can raise living standards and reduce waste, benefiting both society and the environment.
Applications of HAU: better work-instruction design, worker training and feedback, AI task copilots and task assistants, and robotic imitation learning (learning from human demonstration).
Research Focus
- Scalability — performance that holds up across new activities, facilities, and visual conditions.
- Compositionality — structured understanding of activities, such as task hierarchies and interactions (human–object, object–object), among other compositional elements.
Challenges of Scalability for HAU
- Data Bottleneck - Sourcing relevant videos, pre-processing them, and annotating them for training and evaluation.
- Generalization Bottleneck - Performing well across changes in scenario, such as new tasks and new environments.
- Training and Evaluation Bottleneck - The compute, knowledge, and skill required to train and evaluate state-of-the-art HAU systems.
Challenges of Compositionality for HAU
- Challenges in defining and annotating compositional elements
- Fine-grained compositional understanding
- Learning relations (such as causal relations and part-whole relations) between compositional elements
Overview of Research for Scalable Human Activity Understanding
- [Under Review] Open-vocabulary, zero-shot temporal action segmentation
- [In Progress] Combined synthetic + real data pipelines
Overview of Research for Compositional Human Activity Understanding
- [Published] Novel Object-Object Interaction Dataset - IEEE RA-L 2023, presented at ICRA 2024
- [Under Review] Interaction-centric representations for Action Recognition
- [In Progress] Temporal Grounding in Assembly Activities using Video-Language Models
Selected Work
- Interacting Objects Dataset — 10k object–object interaction annotations for richer dynamic scene representations. (IEEE RA‑L, 2024)
- Open-Vocabulary Temporal Action Segmentation — training-free pipeline using VLMs and optimal transport for temporal consistency; a minimal sketch follows this list. (Under review)
- Assemblify: Generating Adaptive On-Demand 3D Animations for Context-Aware Mechanical Assembly Guidance — an agentic approach to assembly-activity guidance using an assembly-by-disassembly algorithm; a minimal sketch follows this list. (C&E accepted; others under review)
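
The open-vocabulary segmentation entry above mentions optimal transport for temporal consistency. The sketch below is only a minimal illustration of that general idea, not the system under review: the frame-by-step similarity matrix is assumed to come from a vision-language model, and the uniform step-coverage marginals and temporal-order prior are simplifying assumptions.

```python
# Minimal sketch (illustrative, not the reviewed pipeline): turn per-frame VLM
# similarities into a temporally consistent frame-to-step assignment using
# entropic optimal transport (Sinkhorn iterations).
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, eps=0.1, n_iters=100):
    """Entropic OT: transport plan with the given row/column marginals."""
    K = np.exp(-cost / eps)                      # Gibbs kernel
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (K.T @ u + 1e-9)      # rescale columns
        u = row_marginal / (K @ v + 1e-9)        # rescale rows
    return u[:, None] * K * v[None, :]           # plan: frames x steps

def segment(sim, order_weight=0.5):
    """Frame-to-step labels from VLM similarities plus a temporal-order prior."""
    T, S = sim.shape
    # Assumption: frame t loosely prefers steps whose index is proportional to t/T.
    t_pos = np.linspace(0, 1, T)[:, None]
    s_pos = np.linspace(0, 1, S)[None, :]
    cost = (1.0 - sim) + order_weight * np.abs(t_pos - s_pos)
    plan = sinkhorn(cost,
                    row_marginal=np.full(T, 1.0 / T),   # every frame is assigned
                    col_marginal=np.full(S, 1.0 / S))   # assumed: balanced step coverage
    return plan.argmax(axis=1)                          # hard step label per frame

# Toy usage: 8 frames scored against 3 hypothetical step descriptions.
sim = np.random.rand(8, 3)
print(segment(sim))   # array of step indices, one per frame
```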
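
Similarly, the Assemblify entry refers to an assembly-by-disassembly algorithm. The sketch below shows only the textbook version of that idea under a hypothetical blocking-constraint data model; the part names and structure are illustrative, not the actual implementation.

```python
# Minimal sketch of the classic assembly-by-disassembly idea (hypothetical data
# model): repeatedly remove any part that is not blocked by the remaining parts;
# reversing the removal order yields a feasible assembly sequence that can drive
# step-by-step guidance animations.

# blocked_by[p] = parts that must be removed before p can be taken out.
blocked_by = {
    "cover": set(),
    "gear":  {"cover"},
    "shaft": {"gear", "cover"},
    "base":  {"shaft"},
}

def assembly_sequence(blocked_by):
    remaining = set(blocked_by)
    removal_order = []
    while remaining:
        # A part is removable once none of its blockers are still assembled.
        removable = [p for p in remaining if not (blocked_by[p] & remaining)]
        if not removable:
            raise ValueError("No feasible disassembly order (circular blocking)")
        part = removable[0]
        remaining.remove(part)
        removal_order.append(part)
    return list(reversed(removal_order))   # assemble in reverse disassembly order

print(assembly_sequence(blocked_by))
# ['base', 'shaft', 'gear', 'cover']
```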
© 2025 Asim Unmesh