Amey Pasarkar

I am a PhD student at Princeton University, working in Vertaix on efficient data curation and evaluation algorithms for AI models. I am fortunate to be advised by Professor Adji Bousso Dieng and am generously supported by an NSF Graduate Research Fellowship.

Prior to joining Princeton, I graduated from Columbia University with a degree in Operations Research, where I had the privilege of working under Professor Itsik Pe'er on designing probabilistic graphical models for microbial dynamics. I have also recently interned at Amazon Science on the Fire TV team.

You can contact me at amey [dot] pasarkar [at] princeton [dot] edu.

CV  /  Google Scholar  /  GitHub  /  LinkedIn


Research

Datasets have exploded in size, yet our ability to measure what they actually contain remains limited. I develop scalable methods for analyzing and curating large datasets, with a focus on generative modeling and efficient training across image, text, and scientific domains.

Vendi Novelty Scores for Out-of-Distribution Detection

Amey P. Pasarkar, Adji Bousso Dieng
arXiv pre-print, 2026

We introduce an efficient, diversity-based Out-of-Distribution detection algorithm that achieves state-of-the-art performance on numerous image classification datasets and model architectures.

The Vendiscope: An Algorithmic Microscope For Data Collections

Amey P. Pasarkar, Adji Bousso Dieng
arXiv pre-print, 2025

We introduce the Vendiscope, a scalable algorithmic microscope for analyzing the contents of datasets and behaviors of models in any domain where similarity can be defined. The Vendiscope can detect outliers and duplicates, diagnose model failure modes, and reveal memorization patterns across modalities, all at scale.

Cousins Of The Vendi Score: A Family Of Similarity-Based Diversity Metrics For Science And Machine Learning

Amey P. Pasarkar, Adji Bousso Dieng
AISTATS, 2024

We generalize the Vendi Score to an entire family of metrics called the "Vendi scores". We demonstrate the utility of these metrics in evaluating image generative models, uncovering patterns of model memorization.

Vendi Sampling For Molecular Simulations: Diversity As A Force For Faster Convergence And Better Exploration

Amey P. Pasarkar, Gianluca M. Bencomo, Simon Olsson, Adji Bousso Dieng
Journal of Chemical Physics, 2023

We propose Vendi Sampling, a method for fast sampling from complex Boltzmann distributions that encourages diversity throughout the sampling process. We demonstrate on various molecular systems that Vendi Sampling outperforms standard Langevin Dynamics.

Directional Gaussian Mixture Models of the Gut Microbiome Elucidate Microbial Spatial Structure

Amey P. Pasarkar, Tyler A. Joseph, Itsik Pe'er
mSystems, 2021

We develop a class of directional Gaussian Mixture models to elucidate the spatial arrangement of the gut microbiome. With no prior information, these models are able to recapitulate known spatial dynamics and propose new ones as well. This research was also presented at Probabilistic Modeling in Genomics, 2021.

Social Distancing and the Internet: What Can Network Performance Measurements Tell Us?

Jessica De Oliveira Moreira, Amey P. Pasarkar, Wenjun Chen, Kerry Hu, Jan Janak, Henning Schulzrinne
TPRC, 2021

We analyzed fixed broadband usage across the United States in the months following COVID-19 lockdowns to determine the effect of lockdowns on internet usage patterns and network performance. We found signs of increased congestion following significant changes in user behavior.

Efficient and Accurate Inference of Mixed Microbial Population Trajectories from Longitudinal Count Data

Tyler A. Joseph, Amey P. Pasarkar, Itsik Pe'er
Cell Systems, 2020

We develop a multi-layer probabilistic graphical model that can deconvolve microbiome time-series data into its sources of noise. We show our model can be efficiently fit with variational inference while providing more accurate results than existing methods. Also presented at RECOMB 2020.