Apoorva Sharma


Apoorva Sharma is a graduate student in the Aeronautics and Astronautics department. Prior to studying at Stanford, he received a BS in Engineering from Harvey Mudd College in 2016. At Harvey Mudd College, he worked on trajectory planning for autonomous underwater vehicles under the supervision of Professor Chris Clark in the Lab for Autonomous and Intelligent Robotics.

Apoorva’s research interests are at the intersection of machine learning, control theory, and planning. His current work focuses on robust and adaptive methods for planning under uncertainty.


ASL Publications

  1. T. Lew, A. Sharma, J. Harrison, A. Bylard, and M. Pavone, “Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework,” 2021. (Submitted)

    Abstract: To safely deploy learning-based systems in highly uncertain environments, one must ensure that they always satisfy constraints. In this work, we propose a practical and theoretically justified approach to maintaining safety in the presence of dynamics uncertainty. Our approach leverages Bayesian meta-learning with last-layer adaptation: the expressiveness of neural-network features trained offline, paired with efficient last-layer online adaptation, enables the derivation of tight confidence sets which contract around the true dynamics as the model adapts online. We exploit these confidence sets to plan trajectories that guarantee the safety of the system. Our approach handles problems with high dynamics uncertainty where reaching the goal safely is initially infeasible by first exploring to gather data and reduce uncertainty, before autonomously exploiting the acquired information to safely perform the task. Under reasonable assumptions, we prove that our framework provides safety guarantees in the form of a single joint chance constraint. Furthermore, we use this theoretical analysis to motivate regularization of the model to improve performance. We extensively demonstrate our approach in simulation and on hardware.

    @inproceedings{LewEtAl2021,
      author = {Lew, T. and Sharma, A. and Harrison, J. and Bylard, A. and Pavone, M.},
      title = {Safe Active Dynamics Learning and Control: A Sequential Exploration-Exploitation Framework},
      year = {2021},
      note = {Submitted},
      month = mar,
      url = {https://arxiv.org/pdf/2008.11700.pdf},
      keywords = {sub},
      owner = {lewt},
      timestamp = {2021-03-05}
    }
    
  2. A. Sharma, N. Azizan, and M. Pavone, “Sketching Curvature for Efficient Out-of-Distribution Detection for Deep Neural Networks,” 2021. (Submitted)

    Abstract: In order to safely deploy Deep Neural Networks (DNNs) within the perception pipelines of real-time decision making systems, there is a need for safeguards that can detect out-of-training-distribution (OoD) inputs both efficiently and accurately. Building on recent work leveraging the local curvature of DNNs to reason about epistemic uncertainty, we propose Sketching Curvature for OoD Detection (SCOD), an architecture-agnostic framework for equipping any trained DNN with a task-relevant epistemic uncertainty estimate. Offline, given a trained model and its training data, SCOD employs tools from matrix sketching to tractably compute a low-rank approximation of the Fisher information matrix, which characterizes which directions in the weight space are most influential on the predictions over the training data. Online, we estimate uncertainty by measuring how much perturbations orthogonal to these directions can alter predictions at a new test input. We apply SCOD to pre-trained networks of varying architectures on several tasks, ranging from regression to classification. We demonstrate that SCOD achieves comparable or better OoD detection performance with lower computational burden relative to existing baselines.

    @inproceedings{SharmaAzizanEtAl2021,
      author = {Sharma, A. and Azizan, N. and Pavone, M.},
      title = {Sketching Curvature for Efficient Out-of-Distribution Detection for Deep Neural Networks},
      year = {2021},
      note = {Submitted},
      month = mar,
      url = {https://arxiv.org/abs/2102.12567},
      keywords = {sub},
      owner = {apoorva},
      timestamp = {2021-03-09}
    }
    
  3. R. Dyro, J. Harrison, A. Sharma, and M. Pavone, “Particle MPC for Uncertain and Learning-Based Control,” 2021. (Submitted)

    Abstract: Autonomous decision-making in novel or changing environments requires quantification and consideration of uncertainties in the system or environment dynamics that impact downstream control performance. Thus, as robotic systems move from highly structured environments to open worlds, incorporating uncertainty in learning or estimation into the control pipeline is essential for robust and efficient performance. In this paper we present a nonlinear particle model predictive control (PMPC) approach to control under uncertainty. This approach, due to the particle representation of uncertainty, is capable of handling arbitrary uncertainty specifications. We implement our nonlinear PMPC scheme with a sequential convex programming non-convex optimization scheme, and we discuss practical implementation of such a framework. We investigate our approach for two robotic systems across three problem settings: time-varying, partially observed dynamics; sensing uncertainty; and model-based reinforcement learning, and show that our approach improves performance over baselines in all settings.

    @inproceedings{DyroHarrisonEtAl2021,
      author = {Dyro, R. and Harrison, J. and Sharma, A. and Pavone, M.},
      title = {Particle MPC for Uncertain and Learning-Based Control},
      year = {2021},
      note = {Submitted},
      keywords = {sub},
      owner = {jh2},
      timestamp = {2021-03-23}
    }
    
  4. J. Harrison, A. Sharma, C. Finn, and M. Pavone, “Continuous Meta-Learning without Tasks,” in Conf. on Neural Information Processing Systems, 2020.

    Abstract: Meta-learning is a promising strategy for learning to efficiently learn within new tasks, using data gathered from a distribution of tasks. However, the meta-learning literature thus far has focused on the task-segmented setting, where at train-time, offline data is assumed to be split according to the underlying task, and at test-time, the algorithms are optimized to learn in a single task. In this work, we enable the application of generic meta-learning algorithms to settings where this task segmentation is unavailable, such as continual online learning with a time-varying task. We present meta-learning via online changepoint analysis (MOCA), an approach which augments a meta-learning algorithm with a differentiable Bayesian changepoint detection scheme. The framework allows both training and testing directly on time series data without segmenting it into discrete tasks. We demonstrate the utility of this approach on a nonlinear meta-regression benchmark as well as two meta-image-classification benchmarks.

    @inproceedings{HarrisonSharmaEtAl2020,
      author = {Harrison, J. and Sharma, A. and Finn, C. and Pavone, M.},
      booktitle = {{Conf. on Neural Information Processing Systems}},
      title = {Continuous Meta-Learning without Tasks},
      year = {2020},
      month = dec,
      url = {https://arxiv.org/abs/1912.08866},
      owner = {apoorva},
      timestamp = {2020-05-05}
    }
    
  5. A. Sharma, J. Harrison, M. Tsao, and M. Pavone, “Robust and Adaptive Planning under Model Uncertainty,” in Int. Conf. on Automated Planning and Scheduling, Berkeley, California, 2019.

    Abstract: Planning under model uncertainty is a fundamental problem across many applications of decision making and learning. In this paper, we propose the Robust Adaptive Monte Carlo Planning (RAMCP) algorithm, which allows computation of risk-sensitive Bayes-adaptive policies that optimally trade off exploration, exploitation, and robustness. RAMCP formulates the risk-sensitive planning problem as a two-player zero-sum game, in which an adversary perturbs the agent’s belief over the models. We introduce two versions of the RAMCP algorithm. The first, RAMCP-F, converges to an optimal risk-sensitive policy without having to rebuild the search tree as the underlying belief over models is perturbed. The second version, RAMCP-I, improves computational efficiency at the cost of losing theoretical guarantees, but is shown to yield empirical results comparable to RAMCP-F. RAMCP is demonstrated on an n-pull multi-armed bandit problem, as well as a patient treatment scenario.

    @inproceedings{SharmaHarrisonEtAl2019,
      author = {Sharma, A. and Harrison, J. and Tsao, M. and Pavone, M.},
      title = {Robust and Adaptive Planning under Model Uncertainty},
      booktitle = {{Int. Conf. on Automated Planning and Scheduling}},
      year = {2019},
      note = {In Press},
      address = {Berkeley, California},
      month = jul,
      url = {https://arxiv.org/pdf/1901.02577.pdf},
      owner = {apoorva},
      timestamp = {2019-04-10}
    }
    
  6. S. Chinchali, A. Sharma, J. Harrison, A. Elhafsi, D. Kang, E. Pergament, E. Cidon, S. Katti, and M. Pavone, “Network Offloading Policies for Cloud Robotics: a Learning-based Approach,” in Robotics: Science and Systems, Freiburg im Breisgau, Germany, 2019.

    Abstract: Today’s robotic systems are increasingly turning to computationally expensive models such as deep neural networks (DNNs) for tasks like localization, perception, planning, and object detection. However, resource-constrained robots, like low-power drones, often have insufficient on-board compute resources or power reserves to scalably run the most accurate, state-of-the-art neural network compute models. Cloud robotics allows mobile robots the benefit of offloading compute to centralized servers if they are uncertain locally or want to run more accurate, compute-intensive models. However, cloud robotics comes with a key, often understated cost: communicating with the cloud over congested wireless networks may result in latency or loss of data. In fact, sending high data-rate video or LIDAR from multiple robots over congested networks can lead to prohibitive delay for real-time applications, which we measure experimentally. In this paper, we formulate a novel Robot Offloading Problem: how and when should robots offload sensing tasks, especially if they are uncertain, to improve accuracy while minimizing the cost of cloud communication? We formulate offloading as a sequential decision making problem for robots, and propose a solution using deep reinforcement learning. In both simulations and hardware experiments using state-of-the-art vision DNNs, our offloading strategy improves vision task performance by between 1.3x and 2.6x relative to benchmark offloading strategies, allowing robots the potential to significantly transcend their on-board sensing accuracy but with limited cost of cloud communication.

    @inproceedings{ChinchaliSharmaEtAl2019,
      author = {Chinchali, S. and Sharma, A. and Harrison, J. and Elhafsi, A. and Kang, D. and Pergament, E. and Cidon, E. and Katti, S. and Pavone, M.},
      title = {Network Offloading Policies for Cloud Robotics: a Learning-based Approach},
      booktitle = {{Robotics: Science and Systems}},
      year = {2019},
      address = {Freiburg im Breisgau, Germany},
      month = jun,
      url = {https://arxiv.org/pdf/1902.05703.pdf},
      owner = {apoorva},
      timestamp = {2019-02-07}
    }
    
  7. B. Ivanovic, J. Harrison, A. Sharma, M. Chen, and M. Pavone, “BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning,” in Proc. IEEE Conf. on Robotics and Automation, Montreal, Canada, 2019.

    Abstract: Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naïve exploration strategies.

    @inproceedings{IvanovicHarrisonEtAl2019,
      author = {Ivanovic, B. and Harrison, J. and Sharma, A. and Chen, M. and Pavone, M.},
      title = {{BaRC:} Backward Reachability Curriculum for Robotic Reinforcement Learning},
      booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
      year = {2019},
      address = {Montreal, Canada},
      month = may,
      url = {https://arxiv.org/pdf/1806.06161.pdf},
      owner = {borisi},
      timestamp = {2018-09-05}
    }
    
  8. J. Harrison, A. Sharma, and M. Pavone, “Meta-Learning Priors for Efficient Online Bayesian Regression,” in Workshop on Algorithmic Foundations of Robotics, Merida, Mexico, 2018. (In Press)

    Abstract: Gaussian Process (GP) regression has seen widespread use in robotics due to its generality, simplicity of use, and the utility of Bayesian predictions. In particular, the predominant implementation of GP regression is kernel-based, as it enables fitting of arbitrary nonlinear functions by leveraging kernel functions as infinite-dimensional features. While incorporating prior information has the potential to drastically improve data efficiency of kernel-based GP regression, expressing complex priors through the choice of kernel function and associated hyperparameters is often challenging and unintuitive. Furthermore, the computational complexity of kernel-based GP regression scales poorly with the number of samples, limiting its application in regimes where a large amount of data is available. In this work, we propose ALPaCA, an algorithm for efficient Bayesian regression which addresses these issues. ALPaCA uses a dataset of sample functions to learn a domain-specific, finite-dimensional feature encoding, as well as a prior over the associated weights, such that Bayesian linear regression in this feature space yields accurate online predictions of the posterior density. These features are neural networks, which are trained via a meta-learning approach. ALPaCA extracts all prior information from the dataset, rather than relying on the choice of arbitrary, restrictive kernel hyperparameters. Furthermore, it substantially reduces sample complexity, and allows scaling to large systems. We investigate the performance of ALPaCA on two simple regression problems, two simulated robotic systems, and on a lane-change driving task performed by humans. We find our approach outperforms kernel-based GP regression, as well as state-of-the-art meta-learning approaches, thereby providing a promising plug-in tool for many regression tasks in robotics where scalability and data-efficiency are important.

    @inproceedings{HarrisonSharmaEtAl2018,
      author = {Harrison, J. and Sharma, A. and Pavone, M.},
      title = {Meta-Learning Priors for Efficient Online Bayesian Regression},
      booktitle = {{Workshop on Algorithmic Foundations of Robotics}},
      year = {2018},
      address = {Merida, Mexico},
      month = oct,
      url = {https://arxiv.org/pdf/1807.08912.pdf},
      keywords = {press},
      owner = {apoorva},
      timestamp = {2018-10-07}
    }