Research
My research focuses on multi-agent reasoning systems, multimodal ML, interpretable ML, and large-scale experimental evaluation on real-world datasets. Across these areas, I have designed and executed large-scale experiments, built reproducible pipelines, and led research teams.
Lead ML Researcher — DIMACS / Rutgers MBS Exchange
Interpretable ML & Model Multiplicity
- Lead an empirical study under Dr. Linda Ness and Dr. Lesia Semenova comparing sparse decision trees (SPLIT, GOSDT, LicketySPLIT, LicketyRESPLIT) against three boosting models (XGBoost, LightGBM, CatBoost) across six real-world datasets, examining the conditions under which simple models achieve near-identical accuracy to complex models and how dataset properties influence model structure and multiplicity.
- Demonstrated that binarizing variables via ThresholdGuessBinarizer eliminates the performance gap between decision trees and boosting models while preserving interpretability.
- Quantified performance–interpretability tradeoffs using accuracy, class-specific recall, macro F1, tree depth, leaf count (log-scale), and Rashomon set size.
- Showed that preprocessing techniques such as SMOTE reshape the Rashomon set without improving predictive performance, indicating that Rashomon set size and predictive performance are not correlated.
- Empirically demonstrated that large Rashomon sets correspond to multiple equally performant but structurally diverse decision trees, highlighting the non-uniqueness of interpretable models.
- Conducted a parameter sweep on ThresholdGuessBinarizer showing that MAX_DEPTH is the dominant driver of Rashomon set growth, and that performance does not improve with larger Rashomon sets under any setting.
AI Researcher — Algoverse
Multi-Agent Reasoning & Multimodal Machine Learning
Multi-Agent Deliberation & Consensus Dynamics (First Author, Solo)
- Developed a multi-agent LLM framework to study consensus formation, showing that convergence is driven by directional model deference rather than purely independent reasoning.
- Designed and executed large-scale 20-round deliberation experiments across subjective and objective benchmarks (GlobalOpinionsQA, Anthropic Written-Evals, Humanity’s Last Exam) using GPT-4.1, Mistral, and cross-family model systems (Phi, LLaMA, and Mistral).
- Demonstrated that model deference is not a fixed hierarchical property, but varies with dataset characteristics, model composition, and interaction setting.
- Introduced a rotation-based experimental framework that disentangles model identity from response content, revealing that identity alone does not explain inter-agent influence.
- Formalized quantitative metrics for multi-agent dynamics, including inter-round disagreement (IDR), directional model deference (MDR), and accuracy-aware deference measures (MDAR).
- Showed that system-level interventions (e.g., adversarial and independent prompting) can significantly alter or destabilize consensus formation by reducing or reshaping model deference.
- Accepted to the ICML 2026 Pluralistic Alignment workshop and currently under review at the ICML 2026 AI4GOOD workshop.
DynaStride: Dynamic Stride Windowing with MMCoT (Second Author)
- Contributed to a hierarchical video captioning pipeline combining dynamic stride window selection with multimodal chain-of-thought reasoning (MMCoT) for temporally coherent scene understanding.
- Implemented and integrated Qwen2.5, Qwen3, and MiniLM models with subcaption aggregation to improve long-range temporal consistency.
- Designed a comprehensive evaluation framework spanning BLEU-4, METEOR, CIDEr, BERTScore, SBERT similarity, and temporal alignment metrics.
- Achieved +17% CIDEr over GPT-4o and +14% over VideoLLaMA-3 on the YouCook2 dataset.
- Accepted to NeurIPS 2025 (7HVU Workshop, Oral) and AAAI 2026 (AI4EDU Workshop).
Algorithms Research Shadow — The College of New Jersey
Sparse Dynamic Programming for RNA Folding
- Investigated classical and modern RNA secondary structure prediction algorithms including Nussinov, Zuker, and LinearFold.
- Implemented sparse dynamic programming strategies to reduce computational complexity in large-sequence folding tasks.
- Deployed large-scale experiments on a SLURM-managed HPC cluster using the ViennaRNA package.
- Automated batch processing pipelines to benchmark folding accuracy, energy scores, and runtime across thousands of RNA sequences.
- RNA Folding Research Summary