Negin Raoof

I’m a third-year PhD student at UC Berkeley, advised by Alex Dimakis. My research focuses on data curation for large language models. I currently work as a research scientist intern at Bespoke Labs.

Before starting the PhD program, I was a software engineer at Microsoft, where I worked on Siphon (streaming data ingestion with Apache Kafka) and PyTorch-ONNX.

LinkedIn / CV / Google Scholar / GitHub

News

Publications

Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models

Negin Raoof, Litu Rout, Giannis Daras, Sujay Sanghavi, Constantine Caramanis, Sanjay Shakkottai, Alex Dimakis
ICLR 2025 Paper

Modeling Bilingual Disfluencies with Large Language Models

Negin Raoof, Yating Wu, Carlos Bonilla, Junyi Jessy Li, Stephanie M Grasso, Alex Dimakis, Zoi Gkalitsiou
ICML 2024 Workshop on LLMs and Cognition Paper

Solving Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models

Litu Rout, Negin Raoof, Giannis Daras, Constantine Caramanis, Alexandros G. Dimakis, Sanjay Shakkottai
NeurIPS 2023 Paper

Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve

Giannis Daras (*), Negin Raoof (*), Zoi Gkalitsiou, Alexandros G. Dimakis
NeurIPS 2022 Paper

Blog Posts

Processing trillions of events per day with Apache Kafka on Azure