Publications

Peer-reviewed research contributions in multimodal reasoning, large language models, and instructional video understanding.

DynaStride: Dynamic Stride Windowing with Multimodal Chain-of-Thought

NeurIPS 2025 — Oral Presentation (7HVU Workshop)
AAAI 2026 — AI4EDU Workshop

  • Developed a hierarchical scene-captioning pipeline integrating dynamic stride window selection with multimodal chain-of-thought reasoning.
  • Leveraged Qwen2.5, Qwen3, and MiniLM embeddings for temporally coherent subcaption aggregation.
  • Achieved CIDEr gains of +17% over GPT-4o and +14% over VLLaMA-3 on the YouCook2 dataset.
arXiv | Instructional Multi-Scene Captioning