Research

My research focuses on multi-agent reasoning systems, multimodal ML, interpretable ML, and large-scale experimental evaluation across real-world datasets. In these areas I have designed and executed large-scale experiments, built reproducible pipelines, and led research teams.

Lead ML Researcher — DIMACS / Rutgers MBS Exchange

Interpretable ML & Model Multiplicity

  • Lead an empirical study under Dr. Linda Ness and Dr. Lesia Semenova comparing sparse decision trees (SPLIT, GOSDT, LicketySPLIT, LicketyRESPLIT) against three boosting models (XGBoost, LightGBM, CatBoost) across six real-world datasets, examining the conditions under which simple models achieve near-identical accuracy to complex models and how dataset properties influence model structure and multiplicity.
  • Demonstrated that binarizing variables via ThresholdGuessBinarizer eliminates the performance gap between decision trees and boosting models while preserving interpretability.
  • Quantified performance–interpretability tradeoffs using accuracy, class-specific recall, macro F1, tree depth, leaf count (log-scale), and Rashomon set size.
  • Showed that preprocessing techniques such as SMOTE reshape the Rashomon set but do not improve prediction performance, indicating that Rashomon set size is not correlated with prediction performance.
  • Empirically demonstrated that large Rashomon sets correspond to multiple equally performant but structurally diverse decision trees, highlighting the non-uniqueness of interpretable models.
  • Conducted a parameter sweep on ThresholdGuessBinarizer, showing that MAX_DEPTH is the dominant driver of Rashomon set growth but that, across all settings, performance does not improve with larger Rashomon sets.

AI Researcher — Algoverse

Multi-Agent Reasoning & Multimodal Machine Learning

Multi-Agent Deliberation & Consensus Dynamics (First Author, Solo)

  • Developed a multi-agent LLM framework to study consensus formation, showing that convergence is driven by directional model deference rather than purely independent reasoning.
  • Designed and executed large-scale 20-round deliberation experiments across subjective and objective benchmarks (GlobalOpinionsQA, Anthropic Written-Evals, Humanity’s Last Exam) using GPT-4.1, Mistral, and cross-family model systems (Phi, LLaMA, and Mistral).
  • Demonstrated that model deference is not a fixed hierarchical property, but varies with dataset characteristics, model composition, and interaction setting.
  • Introduced a rotation-based experimental framework that disentangles model identity from response content, revealing that identity alone does not explain inter-agent influence.
  • Formalized quantitative metrics for multi-agent dynamics, including inter-round disagreement (IDR), directional model deference (MDR), and accuracy-aware deference measures (MDAR).
  • Showed that system-level interventions (e.g., adversarial and independent prompting) can significantly alter or destabilize consensus formation by reducing or reshaping model deference.
  • Accepted to the ICML 2026 Pluralistic Alignment workshop and currently under review at the ICML 2026 AI4GOOD workshop.

DynaStride: Dynamic Stride Windowing with MMCoT (Second Author)

  • Contributed to a hierarchical video captioning pipeline combining dynamic stride window selection with multimodal chain-of-thought reasoning (MMCoT) for temporally coherent scene understanding.
  • Implemented and integrated Qwen2.5, Qwen3, and MiniLM models with subcaption aggregation to improve long-range temporal consistency.
  • Designed a comprehensive evaluation framework spanning BLEU-4, METEOR, CIDEr, BERTScore, SBERT similarity, and temporal alignment metrics.
  • Achieved +17% CIDEr over GPT-4o and +14% over VideoLLaMA-3 on the YouCook2 dataset.
  • Accepted to NeurIPS 2025 (7HVU Workshop, Oral) and AAAI 2026 (AI4EDU Workshop).

Algorithms Research Shadow — The College of New Jersey

Sparse Dynamic Programming for RNA Folding

  • Investigated classical and modern RNA secondary structure prediction algorithms including Nussinov, Zuker, and LinearFold.
  • Implemented sparse dynamic programming strategies to reduce computational complexity in large-sequence folding tasks.
  • Deployed large-scale experiments on a SLURM-managed HPC cluster using the ViennaRNA package.
  • Automated batch processing pipelines to benchmark folding accuracy, energy scores, and runtime across thousands of RNA sequences.
  • RNA Folding Research Summary