Nikhil Sardana
Machine Learning Research
- Natural Language Processing
- Accounting for Inference in LLM Scaling Laws
- Modifying the Chinchilla scaling laws to find the optimal model size and training duration, adjusting for total (training + inference) costs over the model lifetime.
- Read the paper (ICML 2024). An earlier version appeared at the NeurIPS 2023 Workshop on Efficient Natural Language and Speech Processing.
- Upcycling
- Sparse upcycling converts a dense LLM into a Mixture-of-Experts model. We find that the model quality improvements from upcycling are largely outweighed by its significant inference costs.
- Read the paper (presented at the NeurIPS 2024 Workshop on Efficient Natural Language and Speech Processing).
- Optimizing BERT pretraining
- Developing and synthesizing a collection of architecture and training optimizations to reduce the cost of training BERT models to $20. Conducted an ablation study to determine the contributions and interactions of individual methods.
- Read the paper (NeurIPS 2023).
- Reinforcement Learning
- Autonomous Reinforcement Learning: Formalism and Benchmarking
- Formalizing the problem statement and creating a benchmark suite of environments for autonomous reinforcement learning, a setting where agents must learn without human interventions.
- Read the paper (ICLR 2022) or view the website.
- Meta-Learning
- Bayesian Meta-Learning with Gaussian Processes
- Developed a model combining a Gaussian process with a deep network to produce better, more expressive uncertainty estimates on few-shot regression tasks.
- Final project for CS 330.
- Read the paper or view the code.
Stanford University
Coursework
- Machine Learning
- CS 224n: Natural Language Processing
- CS 234: Reinforcement Learning
- CS 330: Multi-Task and Meta Learning
- CS 236: Deep Generative Models
- CS 228: Probabilistic Graphical Models
- Computer Science Theory
- CS 154: Automata and Complexity Theory
- CS 161, 168, 261: Algorithms
- CS 259Q: Quantum Computing
- CS 255, 355: Cryptography
- Computer Systems
- CS 149: Parallel Computing
- CS 143: Compilers
- Analysis
- Math 116: Complex Analysis
- Math 155: Analytic Number Theory
- Math 61CM: Real Analysis & Linear Algebra
- Algebra
- Math 120: Group Theory
- Math 121: Galois Theory
Projects
- Volunteer work: Taught after-school computer science at Palo Alto High School (2019-2020). Read the lectures.
- Dynamic Optimality and Tango Trees
- Exploring the field of dynamic optimality, analyzing the tango tree, and running a few experiments.
- Final project for CS 166 (Data Structures).
- Read the paper.
- MIT Battlecode
Ancient History