Research

My research focuses on multi-agent reasoning systems, multimodal ML, interpretable ML, and large-scale experimental evaluation on real-world datasets. Across these areas I have designed and executed large-scale experiments, built reproducible pipelines, and led research teams.

Lead ML Research Extern — DIMACS / Rutgers MBS Exchange

Interpretable ML & Model Multiplicity

Leading an empirical study under Dr. Linda Ness comparing interpretable decision tree methods (SPLIT, GOSDT, LicketySPLIT, LicketyRESPLIT) against 3 boosting models (XGBoost, LightGBM, CatBoost) across 6 real-world datasets, examining the conditions under which simple models achieve near-identical accuracy to complex models and how dataset properties influence model structure and multiplicity.
Exploring the Rashomon effect in decision trees, demonstrating that many structurally distinct models can achieve near-identical accuracy while differing significantly in interpretability.
Quantifying performance–interpretability tradeoffs using accuracy, class-specific recall, macro F1, tree depth, leaf count (log-scale), and Rashomon set size.
Analyzing how preprocessing (SMOTE, TGB) reshapes Rashomon set size and model interpretability.
Empirically demonstrating that large Rashomon sets correspond to multiple equally performant but structurally diverse decision trees, highlighting the non-uniqueness of interpretable models.
Designing reproducible experimental pipelines and executing large-scale experiments on Rutgers’ Amarel HPC cluster.
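
As a minimal illustration of the Rashomon set idea referenced above (a sketch under assumptions, not the study's actual pipeline): given the accuracies of a pool of candidate trees, the Rashomon set can be taken as every model whose accuracy falls within a tolerance `epsilon` of the best one. The function name, the `epsilon` value, and the example accuracies are all hypothetical.

```python
def rashomon_set(accuracies, epsilon=0.01):
    """Return indices of models whose accuracy is within epsilon of the best.

    This is one common way to define a Rashomon set; the tolerance and the
    use of accuracy (rather than a regularized loss) are assumptions here.
    """
    best = max(accuracies)
    return [i for i, acc in enumerate(accuracies) if acc >= best - epsilon]


# Hypothetical accuracies for five candidate decision trees.
candidate_accuracies = [0.91, 0.905, 0.88, 0.909, 0.85]
members = rashomon_set(candidate_accuracies, epsilon=0.01)
rashomon_size = len(members)
```

A larger `rashomon_size` under the same tolerance indicates more near-optimal models to choose among, which is the sense of "model multiplicity" used in this section.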

AI Researcher — Algoverse

Multi-Agent Reasoning & Multimodal Machine Learning

Multi-Agent Deliberation & Consensus Dynamics (First Author, Solo)

Developed a multi-agent LLM framework to investigate how consensus emerges across subjective and objective tasks, identifying model-to-model deference, rather than independent reasoning, as a primary driver of convergence.
Designed and executed large-scale 20-round deliberation experiments across GlobalOpinionsQA, Anthropic Persona-Written Evals, and Humanity’s Last Exam using named and anonymized agents from the GPT-4.1 family (GPT-4.1, GPT-4.1-nano, GPT-4.1-mini).
Introduced a rotation-based experimental paradigm to disentangle the effects of model identity and answer quality, demonstrating that response quality plays a stronger role than model identity in deference dynamics.
Formalized quantitative metrics for multi-agent interaction, including inter-round disagreement, pairwise disagreement, and directional model deference.
Showed that system-level interventions (e.g., prompting strategies) can significantly alter or destabilize consensus formation.
Paper in preparation for submission to ICML workshops (Pluralistic Alignment, AI4Good, Epistemic Intelligence in ML).
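
The interaction metrics named above could be computed along the following lines. This is a hedged sketch: the exact definitions used in the paper are not given here, so the formulas, function names, and agent labels below are assumptions chosen for illustration. Each round is represented as a mapping from agent name to that agent's answer.

```python
from itertools import combinations


def pairwise_disagreement(answers):
    """Fraction of agent pairs whose answers differ within a single round."""
    pairs = list(combinations(answers.values(), 2))
    if not pairs:
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)


def inter_round_disagreement(prev, curr):
    """Fraction of agents whose answer changed between consecutive rounds."""
    return sum(prev[agent] != curr[agent] for agent in prev) / len(prev)


def directional_deference(prev, curr):
    """Count (follower, leader) events: the follower switched to the answer
    the leader held in the previous round."""
    counts = {}
    for follower in prev:
        if curr[follower] == prev[follower]:
            continue  # no switch, so no deference event
        for leader in prev:
            if leader != follower and prev[leader] == curr[follower]:
                key = (follower, leader)
                counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical two-round example with three agents.
round_1 = {"agent_a": "A", "agent_b": "B", "agent_c": "B"}
round_2 = {"agent_a": "B", "agent_b": "B", "agent_c": "B"}

d1 = pairwise_disagreement(round_1)
d2 = pairwise_disagreement(round_2)
changed = inter_round_disagreement(round_1, round_2)
deference = directional_deference(round_1, round_2)
```

In this toy case the disagreement drops to zero in the second round because `agent_a` adopts the answer previously held by the other two agents, which the deference count records as two directed events.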

DynaStride: Dynamic Stride Windowing with MMCoT (Second Author)

Contributed to a hierarchical video captioning pipeline combining dynamic stride window selection with multimodal chain-of-thought reasoning (MMCoT) for temporally coherent scene understanding.
Implemented and integrated Qwen2.5, Qwen3, and MiniLM models with subcaption aggregation to improve long-range temporal consistency.
Designed a comprehensive evaluation framework spanning BLEU-4, METEOR, CIDEr, BERTScore, SBERT similarity, and temporal alignment metrics.
Achieved +17% CIDEr over GPT-4o and +14% over VideoLLaMA-3 on the YouCook2 dataset.
Accepted to NeurIPS 2025 (7HVU Workshop, Oral) and AAAI 2026 (AI4EDU Workshop).

Algorithms Research Shadow — The College of New Jersey

Sparse Dynamic Programming for RNA Folding

Investigated classical and modern RNA secondary structure prediction algorithms including Nussinov, Zuker, and LinearFold.
Implemented sparse dynamic programming strategies to reduce computational complexity in large-sequence folding tasks.
Deployed large-scale experiments on a SLURM-managed HPC cluster using the ViennaRNA package.
Automated batch processing pipelines to benchmark folding accuracy, energy scores, and runtime across thousands of RNA sequences.
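
For context, the Nussinov algorithm mentioned above can be sketched as a base-pair maximization dynamic program. This is the classic dense O(n^3) formulation, not the sparse variant investigated in this project; the `min_loop` parameter and the inclusion of G-U wobble pairs are assumptions made for illustration.

```python
def nussinov_max_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs in an RNA sequence.

    min_loop is the minimum number of unpaired bases required between
    two paired positions (an illustrative parameter).
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] holds the best pair count for the subsequence seq[i..j].
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, leaving a long enough loop
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    inner = dp[k + 1][j - 1] if k + 1 <= j - 1 else 0
                    best = max(best, left + 1 + inner)
            dp[i][j] = best
    return dp[0][n - 1]


hairpin_pairs = nussinov_max_pairs("GGGAAACCC", min_loop=3)
```

Sparse dynamic programming approaches reduce the work in the inner loop over `k` by skipping split points that cannot improve the optimum, which is where the savings on long sequences come from.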
RNA Folding Research Summary