Asim Unmesh

PhD Candidate @ Purdue — Human Activity Understanding


PhD Candidate, Electrical & Computer Engineering, Purdue University (USA)
Thesis Topic: Scalable and Compositional Human Activity Understanding (HAU) for Object-Centric Tasks

Thesis Motivation in one line: Scalability enables real-world deployment; compositionality unlocks the fine-grained details of activities, enabling richer applications.


Motivation

Manufacturing and production are core human endeavors that generate material wealth and prosperity. Improving their efficiency can raise living standards and reduce waste, benefiting both society and the environment.

Applications of HAU: better work-instruction design, worker training and feedback, AI task copilots and assistants, and robotic imitation learning (learning from human demonstration).

Research Focus

  1. Scalability - Maintaining performance across new activities, facilities, and visual conditions.
  2. Compositionality - In the context of HAU, a structured understanding of activities, for example through task hierarchies and interactions (human–object, object–object), among other compositional elements.

Challenges of Scalability for HAU

  1. Data Bottleneck - Sourcing relevant videos, then pre-processing and annotating them for training and evaluation.
  2. Generalization Bottleneck - Maintaining performance across different scenarios, such as changes in task or environment.
  3. Training and Evaluation Bottleneck - The compute, knowledge, and skills required to train and evaluate state-of-the-art HAU systems.

Challenges of Compositionality for HAU

  1. Challenges in defining and annotating compositional elements
  2. Fine-grained compositional understanding
  3. Learning relations (such as causal and part-whole relations) between different compositional elements

Overview of Research for Scalable Human Activity Understanding

Overview of Research for Compositional Human Activity Understanding

  1. [Published] Novel Object-Object Interaction Dataset - Published in IEEE RA-L 2023, presented at ICRA 2024
    • Interaction-centric representations for Action Recognition
  2. [Under Review]
  3. [In Progress] Temporal Grounding in Assembly Activities using Video-Language Models

Selected Work

Contact

© 2025 Asim Unmesh