Nikhil Sardana
Machine Learning Research
- Natural Language Processing
- Accounting for Inference in LLM Scaling Laws
- Modifying the Chinchilla scaling laws to find the optimal model size and training duration, adjusting for total (training + inference) costs over the model's lifetime.
- Read the paper (ICML 2024). An earlier version appeared at the NeurIPS 2023 Workshop on Efficient Natural Language and Speech Processing.
- Optimizing BERT pretraining
- Developing and synthesizing a collection of architecture and training optimizations to reduce the cost of training BERT models to $20. Conducting an ablation study to determine the contributions and interactions of individual methods.
- Read the paper (NeurIPS 2023).
- Reinforcement Learning
- Autonomous Reinforcement Learning: Formalism and Benchmarking
- Formalizing the problem statement and creating a benchmark suite of environments for autonomous reinforcement learning, a setting where agents must learn without human intervention.
- Read the paper (ICLR 2022) or view the website.
- Meta-Learning
- Bayesian Meta-Learning with Gaussian Processes
- Developing a Gaussian process + deep network model to produce better, more expressive uncertainty estimates on few-shot regression tasks.
- Final project for CS 330.
- Read the paper or view the code.
Stanford University
Coursework
- Machine Learning
- CS 224n: Natural Language Processing
- CS 234: Reinforcement Learning
- CS 330: Multi-Task and Meta Learning
- CS 236: Deep Generative Models
- CS 228: Probabilistic Graphical Models
- Computer Science Theory
- CS 154: Automata and Complexity Theory
- CS 161, 168, 261: Algorithms
- CS 259Q: Quantum Computing
- CS 255, 355: Cryptography
- Computer Systems
- CS 149: Parallel Computing
- CS 143: Compilers
- Analysis
- Math 116: Complex Analysis
- Math 155: Analytic Number Theory
- Math 61CM: Real Analysis & Linear Algebra
- Algebra
- Math 120: Group Theory
- Math 121: Galois Theory
Projects
- Volunteer work: Taught an after-school computer science class at Palo Alto High School (2019-2020). Read the lectures.
- Dynamic Optimality and Tango Trees
- Exploring the field of dynamic optimality, analyzing tango trees, and running a few experiments.
- Final project for CS 166 (Data Structures).
- Read the paper.
- MIT Battlecode
Ancient History