Publications
Peer-reviewed research contributions in multimodal reasoning, large language models, and instructional video understanding.
DynaStride: Dynamic Stride Windowing with Multimodal Chain-of-Thought
NeurIPS 2025 — Oral Presentation (7HVU Workshop)
AAAI 2026 — AI4EDU Workshop
- Developed a hierarchical scene-captioning pipeline integrating dynamic stride window selection with multimodal chain-of-thought reasoning.
- Leveraged Qwen2.5, Qwen3, and MiniLM embeddings for temporally coherent subcaption aggregation.
- Achieved +17% CIDEr over GPT-4o and +14% over VLLaMA-3 on the YouCook2 dataset.
arXiv | Instructional Multi-Scene Captioning
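For readers unfamiliar with the idea of dynamic stride windowing mentioned in the bullets above, here is a minimal illustrative sketch. It is not the DynaStride implementation: the function name `dynamic_stride_windows`, its parameters, and the similarity-based stride rule are all assumptions made for this example. It assumes each frame is already represented by a feature vector, and it moves the window further ahead when consecutive frames look alike and more slowly when they change.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dynamic_stride_windows(
    features: Sequence[Sequence[float]],
    window: int = 4,
    min_stride: int = 1,
    max_stride: int = 4,
) -> List[Tuple[int, int]]:
    """Hypothetical stride rule (an assumption, not the paper's method).

    Returns half-open (start, end) frame windows. The stride to the next
    window grows with the mean similarity of consecutive frames inside
    the current window.
    """
    n = len(features)
    windows: List[Tuple[int, int]] = []
    start = 0
    while start < n:
        end = min(start + window, n)
        windows.append((start, end))
        if end == n:
            break
        # Mean similarity between neighbouring frames in this window.
        sims = [cosine(features[i], features[i + 1]) for i in range(start, end - 1)]
        mean_sim = sum(sims) / len(sims) if sims else 0.0
        mean_sim = max(0.0, min(1.0, mean_sim))  # clamp to [0, 1]
        # Static content -> large stride; changing content -> small stride.
        start += min_stride + round(mean_sim * (max_stride - min_stride))
    return windows


static_clip = [[1.0, 0.0]] * 8                    # eight identical frames
changing_clip = [[1.0, 0.0], [0.0, 1.0]] * 4      # frames alternate every step
print(dynamic_stride_windows(static_clip))
print(dynamic_stride_windows(changing_clip))
```

With these toy inputs the static clip is covered by two non-overlapping windows, while the changing clip gets five heavily overlapping ones. In a captioning pipeline, each window would then be passed to a captioning model and the resulting subcaptions aggregated. That step is omitted here.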