Boris Ivanovic

ASL Publications

R. Luo, S. Zhao, J. Kuck, B. Ivanovic, S. Savarese, E. Schmerling, and M. Pavone, “Sample-Efficient Safety Assurances using Conformal Prediction,” Int. Journal of Robotics Research, 2023.

Abstract: When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than epsilon will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an epsilon false negative rate using as few as 1/epsilon data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate guaranteed false negative rate and low false detection (positive) rate using very little data.
```
@article{LuoZhaoEtAl2023,
  author = {Luo, R. and Zhao, S. and Kuck, J. and Ivanovic, B. and Savarese, S. and Schmerling, E. and Pavone, M.},
  title = {Sample-Efficient Safety Assurances using Conformal Prediction},
  journal = {{Int. Journal of Robotics Research}},
  year = {2023},
  owner = {rsluo},
  timestamp = {2023-02-10},
  url = {https://arxiv.org/abs/2109.14082}
}
```
R. Luo, S. Zhao, J. Kuck, B. Ivanovic, S. Savarese, E. Schmerling, and M. Pavone, “Sample-Efficient Safety Assurances using Conformal Prediction,” in Workshop on Algorithmic Foundations of Robotics, 2022.

Abstract: When deploying machine learning models in high-stakes robotics applications, the ability to detect unsafe situations is crucial. Early warning systems can provide alerts when an unsafe situation is imminent (in the absence of corrective action). To reliably improve safety, these warning systems should have a provable false negative rate; i.e., of the situations that are unsafe, fewer than epsilon will occur without an alert. In this work, we present a framework that combines a statistical inference technique known as conformal prediction with a simulator of robot/environment dynamics, in order to tune warning systems to provably achieve an epsilon false negative rate using as few as 1/epsilon data points. We apply our framework to a driver warning system and a robotic grasping application, and empirically demonstrate guaranteed false negative rate and low false detection (positive) rate using very little data.
```
@inproceedings{LuoZhaoEtAl2022,
  author = {Luo, R. and Zhao, S. and Kuck, J. and Ivanovic, B. and Savarese, S. and Schmerling, E. and Pavone, M.},
  title = {Sample-Efficient Safety Assurances using Conformal Prediction},
  booktitle = {{Workshop on Algorithmic Foundations of Robotics}},
  year = {2022},
  month = may,
  owner = {rsluo},
  timestamp = {2021-09-20},
  url = {https://arxiv.org/abs/2109.14082}
}
```
B. Ivanovic, Y. Lin, S. Shrivastava, P. Chakravarty, and M. Pavone, “Propagating State Uncertainty Through Trajectory Forecasting,” in Proc. IEEE Conf. on Robotics and Automation, 2022.

Abstract: Uncertainty pervades through the modern robotic autonomy stack, with nearly every component (e.g., sensors, detection, classification, tracking, behavior prediction) producing continuous or discrete probabilistic distributions. Trajectory forecasting, in particular, is surrounded by uncertainty as its inputs are produced by (noisy) upstream perception and its outputs are predictions that are often probabilistic for use in downstream planning. However, most trajectory forecasting methods do not account for upstream uncertainty, instead taking only the most-likely values. As a result, perceptual uncertainties are not propagated through forecasting and predictions are frequently overconfident. To address this, we present a novel method for incorporating perceptual state uncertainty in trajectory forecasting, a key component of which is a new statistical distance-based loss function which encourages predicting uncertainties that better match upstream perception. We evaluate our approach both in illustrative simulations and on large-scale, real-world data, demonstrating its efficacy in propagating perceptual state uncertainty through prediction and producing more calibrated predictions.
```
@inproceedings{IvanovicLinEtAl2022,
  author = {Ivanovic, B. and Lin, Y. and Shrivastava, S. and Chakravarty, P. and Pavone, M.},
  title = {Propagating State Uncertainty Through Trajectory Forecasting},
  booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
  year = {2022},
  month = may,
  keywords = {pub},
  owner = {borisi},
  timestamp = {2022-02-01}
}
```
B. Ivanovic, K.-H. Lee, P. Tokmakov, B. Wulfe, R. McAllister, A. Gaidon, and M. Pavone, “Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2022.

Abstract: Reasoning about the future behavior of other agents is critical to safe robot navigation. The multiplicity of plausible futures is further amplified by the uncertainty inherent to agent state estimation from data, including positions, velocities, and semantic class. Forecasting methods, however, typically neglect class uncertainty, conditioning instead only on the agent’s most likely class, even though perception models often return full class distributions. To exploit this information, we present HAICU, a method for heterogeneous-agent trajectory forecasting that explicitly incorporates agents’ class probabilities. We additionally present PUP, a new challenging real-world autonomous driving dataset, to investigate the impact of Perceptual Uncertainty in Prediction. It contains challenging crowded scenes with unfiltered agent class probabilities that reflect the long-tail of current state-of-the-art perception systems. We demonstrate that incorporating class probabilities in trajectory forecasting significantly improves performance in the face of uncertainty, and enables new forecasting capabilities such as counterfactual predictions.
```
@inproceedings{IvanovicLeeEtAl2021,
  author = {Ivanovic, B. and Lee, K.-H. and Tokmakov, P. and Wulfe, B. and McAllister, R. and Gaidon, A. and Pavone, M.},
  title = {Heterogeneous-Agent Trajectory Forecasting Incorporating Class Uncertainty},
  booktitle = {{IEEE/RSJ Int. Conf. on Intelligent Robots \& Systems}},
  month = oct,
  year = {2022},
  keywords = {pub},
  owner = {borisi},
  timestamp = {2021-09-29},
  url = {https://arxiv.org/abs/2104.12446}
}
```
B. Ivanovic, “Trajectory Forecasting in the Modern Robotic Autonomy Stack,” PhD thesis, Stanford University, Dept. of Aeronautics and Astronautics, Stanford, California, 2021.

Abstract: Autonomous systems are increasingly nearing widespread adoption, with new robotic platforms constantly being tested and deployed alongside humans in domains such as autonomous driving, service robotics, and surveillance. Accordingly, human-robot interaction will soon be present in many everyday scenarios. However, there are still many challenges preventing autonomous systems from safely and smoothly navigating interactions with humans. For example, while merging into traffic is one of the most common day-to-day maneuvers we perform as drivers, it poses a major problem for state-of-the-art self-driving vehicles. The reason humans can naturally navigate through many social interaction scenarios, such as merging in traffic, is that humans have an intrinsic capacity to reason about other people’s intents, beliefs, and desires, applying this reasoning to predict what might happen in the future and make corresponding decisions. As a result, imbuing autonomous systems with the ability to reason about other agents’ potential future actions is critical to enabling informed decision making and proactive actions to be taken in human-robot interaction scenarios. Indeed, the ability to predict other agents’ behaviors (also known as "trajectory forecasting") has already become a core component of modern robotic systems, especially so in safety-critical applications such as autonomous vehicles. Towards this end, this dissertation tackles the development of trajectory forecasting methods, their effective integration within the robotic autonomy stack, and the injection of task-awareness in their performance evaluation.
```
@phdthesis{Ivanovic2021,
  author = {Ivanovic, B.},
  title = {Trajectory Forecasting in the Modern Robotic Autonomy Stack},
  school = {{Stanford University, Dept. of Aeronautics and Astronautics}},
  year = {2021},
  address = {Stanford, California},
  month = dec,
  url = {https://stacks.stanford.edu/file/druid:nw436bv8593/IvanovicPhD-augmented.pdf},
  owner = {bylard},
  timestamp = {2021-12-06}
}
```
S. Schaefer, K. Leung, B. Ivanovic, and M. Pavone, “Leveraging Neural Network Gradients within Trajectory Optimization for Proactive Human-Robot Interactions,” in Proc. IEEE Conf. on Robotics and Automation, Xi’an, China, 2021.

Abstract: To achieve seamless human-robot interactions, robots need to intimately reason about complex interaction dynamics and future human behaviors within their motion planning process. However, there is a disconnect between state-of-the-art neural network-based human behavior models and robot motion planners—either the behavior models are limited in their consideration of downstream planning or a simplified behavior model is used to ensure tractability of the planning problem. In this work, we present a framework that fuses together the interpretability and flexibility of trajectory optimization (TO) with the predictive power of state-of-the-art human trajectory prediction models. In particular, we leverage gradient information from data-driven prediction models to explicitly reason about human-robot interaction dynamics within a gradient-based TO problem. We demonstrate the efficacy of our approach in a multi-agent scenario whereby a robot is required to safely and efficiently navigate through a crowd of up to ten pedestrians. We compare against a variety of planning methods, and show that by explicitly accounting for interaction dynamics within the planner, our method offers safer and more efficient behaviors, even yielding proactive and nuanced behaviors such as waiting for a pedestrian to pass before moving.
```
@inproceedings{SchaeferLeungEtAl2021,
  author = {Schaefer, S. and Leung, K. and Ivanovic, B. and Pavone, M.},
  title = {Leveraging Neural Network Gradients within Trajectory Optimization for Proactive Human-Robot Interactions},
  booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
  year = {2021},
  address = {Xi'an, China},
  month = may,
  url = {https://arxiv.org/abs/2012.01027},
  owner = {borisi},
  timestamp = {2020-11-01}
}
```
B. Ivanovic, K. Leung, E. Schmerling, and M. Pavone, “Multimodal Deep Generative Models for Trajectory Prediction: A Conditional Variational Autoencoder Approach,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 295–302, Apr. 2021.

Abstract: Human behavior prediction models enable robots to anticipate how humans may react to their actions, and hence are instrumental to devising safe and proactive robot planning algorithms. However, modeling complex interaction dynamics and capturing the possibility of many possible outcomes in such interactive settings is very challenging, which has recently prompted the study of several different approaches. In this work, we provide a self-contained tutorial on a conditional variational autoencoder (CVAE) approach to human behavior prediction which, at its core, can produce a multimodal probability distribution over future human trajectories conditioned on past interactions and candidate robot future actions. Specifically, the goals of this tutorial paper are to review and build a taxonomy of state-of-the-art methods in human behavior prediction, from physics-based to purely data-driven methods, provide a rigorous yet easily accessible description of a data-driven, CVAE-based approach, highlight important design characteristics that make this an attractive model to use in the context of model-based planning for human-robot interactions, and provide important design considerations when using this class of models.
```
@article{IvanovicLeungEtAl2020,
  author = {Ivanovic, B. and Leung, K. and Schmerling, E. and Pavone, M.},
  title = {Multimodal Deep Generative Models for Trajectory Prediction: A Conditional Variational Autoencoder Approach},
  journal = {{IEEE Robotics and Automation Letters}},
  volume = {6},
  number = {2},
  pages = {295--302},
  year = {2021},
  month = apr,
  url = {https://arxiv.org/abs/2008.03880},
  owner = {borisi},
  timestamp = {2020-12-23}
}
```
M. Itkina, B. Ivanovic, R. Senanayake, M. J. Kochenderfer, and M. Pavone, “Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders,” in Conf. on Neural Information Processing Systems, 2020.

Abstract: Discrete latent spaces in variational autoencoders have been shown to effectively capture the data distribution for many real-world problems such as natural language understanding, human intent prediction, and visual scene representation. However, discrete latent spaces need to be sufficiently large to capture the complexities of real-world data, rendering downstream tasks computationally challenging. For instance, performing motion planning in a high-dimensional latent representation of the environment could be intractable. We consider the problem of sparsifying the discrete latent space of a trained conditional variational autoencoder, while preserving its learned multimodality. As a post hoc latent space reduction technique, we use evidential theory to identify the latent classes that receive direct evidence from a particular input condition and filter out those that do not. Experiments on diverse tasks, such as image generation and human behavior prediction, demonstrate the effectiveness of our proposed technique at reducing the discrete latent sample space size of a model while maintaining its learned multimodality.
```
@inproceedings{ItkinaIvanovicEtAl2019,
  author = {Itkina, M. and Ivanovic, B. and Senanayake, R. and Kochenderfer, M. J. and Pavone, M.},
  title = {Evidential Sparsification of Multimodal Latent Spaces in Conditional Variational Autoencoders},
  booktitle = {{Conf. on Neural Information Processing Systems}},
  year = {2020},
  month = dec,
  owner = {borisi},
  timestamp = {2020-09-27},
  url = {https://arxiv.org/abs/2010.09164}
}
```
B. Ivanovic, A. Elhafsi, G. Rosman, A. Gaidon, and M. Pavone, “MATS: An Interpretable Trajectory Forecasting Representation for Planning and Control,” in Conf. on Robot Learning, 2020.

Abstract: Reasoning about human motion is a core component of modern human-robot interactive systems. In particular, one of the main uses of behavior prediction in autonomous systems is to inform ego-robot motion planning and control. However, a majority of planning and control algorithms reason about system dynamics rather than the predicted agent tracklets that are commonly output by trajectory forecasting methods, which can hinder their integration. Towards this end, we propose Mixtures of Affine Time-varying Systems (MATS) as an output representation for trajectory forecasting that is more amenable to downstream planning and control use. Our approach leverages successful ideas from probabilistic trajectory forecasting works to learn dynamical system representations that are well-studied in the planning and control literature. We integrate our predictions with a proposed multimodal planning methodology and demonstrate significant computational efficiency improvements on a large-scale autonomous driving dataset.
```
@inproceedings{IvanovicElhafsiEtAl2020,
  author = {Ivanovic, B. and Elhafsi, A. and Rosman, G. and Gaidon, A. and Pavone, M.},
  title = {{MATS}: An Interpretable Trajectory Forecasting Representation for Planning and Control},
  booktitle = {{Conf. on Robot Learning}},
  year = {2020},
  month = nov,
  owner = {borisi},
  timestamp = {2020-10-14},
  url = {https://arxiv.org/abs/2009.07517}
}
```
H. Nishimura, B. Ivanovic, A. Gaidon, M. Pavone, and M. Schwager, “Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, 2020.

Abstract: This paper presents a novel online framework for safe crowd-robot interaction based on risk-sensitive stochastic optimal control, wherein the risk is modeled by the entropic risk measure. The control algorithm relies on mode insertion gradient optimization for this risk measure as well as Monte Carlo sampling from Trajectron++, a state-of-the-art generative model that produces multimodal probabilistic trajectory forecasts for multiple interacting agents. Our modular approach decouples the crowd-robot interaction into learning-based prediction and model-based control, which is advantageous compared to end-to-end policy learning methods in that it allows the robot’s desired behavior to be specified at run time. In particular, we show that the robot exhibits diverse interaction behavior by varying the risk sensitivity parameter. A simulation study and a real-world experiment show that the proposed online framework can accomplish safe and efficient navigation while avoiding collisions with more than 50 humans in the scene.
```
@inproceedings{NishimuraIvanovicEtAl2020,
  author = {Nishimura, H. and Ivanovic, B. and Gaidon, A. and Pavone, M. and Schwager, M.},
  title = {Risk-Sensitive Sequential Action Control with Multi-Modal Human Trajectory Forecasting for Safe Crowd-Robot Interaction},
  booktitle = {{IEEE/RSJ Int. Conf. on Intelligent Robots \& Systems}},
  year = {2020},
  month = oct,
  owner = {borisi},
  timestamp = {2020-07-03},
  url = {https://arxiv.org/abs/2009.05702}
}
```
T. Salzmann, B. Ivanovic, P. Chakravarty, and M. Pavone, “Trajectron++: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data,” in European Conf. on Computer Vision, 2020.

Abstract: Reasoning about human motion is an important prerequisite to safe and socially-aware robotic navigation. As a result, multi-agent behavior prediction has become a core component of modern human-robot interactive systems, such as self-driving cars. While there exist many methods for trajectory forecasting, most do not enforce dynamic constraints and do not account for environmental information (e.g., maps). Towards this end, we present Trajectron++, a modular, graph-structured recurrent model that forecasts the trajectories of a general number of diverse agents while incorporating agent dynamics and heterogeneous data (e.g., semantic maps). Trajectron++ is designed to be tightly integrated with robotic planning and control frameworks; for example, it can produce predictions that are optionally conditioned on ego-agent motion plans. We demonstrate its performance on several challenging real-world trajectory forecasting datasets, outperforming a wide array of state-of-the-art deterministic and generative methods.
```
@inproceedings{SalzmannIvanovicEtAl2020,
  author = {Salzmann, T. and Ivanovic, B. and Chakravarty, P. and Pavone, M.},
  title = {{Trajectron++}: Dynamically-Feasible Trajectory Forecasting With Heterogeneous Data},
  booktitle = {{European Conf. on Computer Vision}},
  year = {2020},
  month = aug,
  owner = {borisi},
  timestamp = {2020-09-14},
  url = {https://arxiv.org/abs/2001.03093}
}
```
A. Elhafsi, B. Ivanovic, L. Janson, and M. Pavone, “Map-Predictive Motion Planning in Unknown Environments,” in Proc. IEEE Conf. on Robotics and Automation, Paris, France, 2020.

Abstract: Algorithms for motion planning in unknown environments are generally limited in their ability to reason about the structure of the unobserved environment. As such, current methods generally navigate unknown environments by relying on heuristic methods to choose intermediate objectives along frontiers. We present a unified method that combines map prediction and motion planning for safe, time-efficient autonomous navigation of unknown environments by dynamically-constrained robots. We propose a data-driven method for predicting the map of the unobserved environment, using the robot’s observations of its surroundings as context. These map predictions are then used to plan trajectories from the robot’s position to the goal without requiring frontier selection. We demonstrate that our map-predictive motion planning strategy yields a substantial improvement in trajectory time over a naive frontier pursuit method and demonstrates similar performance to methods using more sophisticated frontier selection heuristics with significantly shorter computation time.
```
@inproceedings{ElhafsiIvanovicEtAl2020,
  author = {Elhafsi, A. and Ivanovic, B. and Janson, L. and Pavone, M.},
  title = {Map-Predictive Motion Planning in Unknown Environments},
  booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
  year = {2020},
  address = {Paris, France},
  month = jun,
  url = {https://arxiv.org/abs/1910.08184},
  owner = {borisi},
  timestamp = {2019-10-21}
}
```
B. Ivanovic and M. Pavone, “The Trajectron: Probabilistic Multi-Agent Trajectory Modeling with Dynamic Spatiotemporal Graphs,” in IEEE Int. Conf. on Computer Vision, Seoul, South Korea, 2019.

Abstract: Developing safe human-robot interaction systems is a necessary step towards the widespread integration of autonomous agents in society. A key component of such systems is the ability to reason about the many potential futures (e.g. trajectories) of other agents in the scene. Towards this end, we present the Trajectron, a graph-structured model that predicts many potential future trajectories of multiple agents simultaneously in both highly dynamic and multimodal scenarios (i.e. where the number of agents in the scene is time-varying and there are many possible highly-distinct futures for each agent). It combines tools from recurrent sequence modeling and variational deep generative modeling to produce a distribution of future trajectories for each agent in a scene. We demonstrate the performance of our model on several datasets, obtaining state-of-the-art results on standard trajectory prediction metrics as well as introducing a new metric for comparing models that output distributions.
```
@inproceedings{IvanovicPavone2019,
  author = {Ivanovic, B. and Pavone, M.},
  title = {The {Trajectron}: Probabilistic Multi-Agent Trajectory Modeling with Dynamic Spatiotemporal Graphs},
  booktitle = {{IEEE Int. Conf. on Computer Vision}},
  year = {2019},
  address = {Seoul, South Korea},
  month = oct,
  url = {https://arxiv.org/abs/1810.05993},
  owner = {borisi},
  timestamp = {2019-07-22}
}
```
B. Ivanovic, J. Harrison, A. Sharma, M. Chen, and M. Pavone, “BaRC: Backward Reachability Curriculum for Robotic Reinforcement Learning,” in Proc. IEEE Conf. on Robotics and Automation, Montreal, Canada, 2019.

Abstract: Model-free Reinforcement Learning (RL) offers an attractive approach to learn control policies for high-dimensional systems, but its relatively poor sample complexity often forces training in simulated environments. Even in simulation, goal-directed tasks whose natural reward function is sparse remain intractable for state-of-the-art model-free algorithms for continuous control. The bottleneck in these tasks is the prohibitive amount of exploration required to obtain a learning signal from the initial state of the system. In this work, we leverage physical priors in the form of an approximate system dynamics model to design a curriculum scheme for a model-free policy optimization algorithm. Our Backward Reachability Curriculum (BaRC) begins policy training from states that require a small number of actions to accomplish the task, and expands the initial state distribution backwards in a dynamically-consistent manner once the policy optimization algorithm demonstrates sufficient performance. BaRC is general, in that it can accelerate training of any model-free RL algorithm on a broad class of goal-directed continuous control MDPs. Its curriculum strategy is physically intuitive, easy-to-tune, and allows incorporating physical priors to accelerate training without hindering the performance, flexibility, and applicability of the model-free RL algorithm. We evaluate our approach on two representative dynamic robotic learning problems and find substantial performance improvement relative to previous curriculum generation techniques and naïve exploration strategies.
```
@inproceedings{IvanovicHarrisonEtAl2019,
  author = {Ivanovic, B. and Harrison, J. and Sharma, A. and Chen, M. and Pavone, M.},
  title = {{BaRC}: Backward Reachability Curriculum for Robotic Reinforcement Learning},
  booktitle = {{Proc. IEEE Conf. on Robotics and Automation}},
  year = {2019},
  address = {Montreal, Canada},
  month = may,
  url = {https://arxiv.org/pdf/1806.06161.pdf},
  owner = {borisi},
  timestamp = {2018-09-05}
}
```
B. Ivanovic, E. Schmerling, K. Leung, and M. Pavone, “Generative Modeling of Multimodal Multi-Human Behavior,” in IEEE/RSJ Int. Conf. on Intelligent Robots & Systems, Madrid, Spain, 2018.

Abstract: This work presents a methodology for modeling and predicting human behavior in settings with N humans interacting in highly multimodal scenarios (i.e. where there are many possible highly-distinct futures). A motivating example includes robots interacting with humans in crowded environments, such as self-driving cars operating alongside human-driven vehicles or human-robot collaborative bin packing in a warehouse. Our approach to model human behavior in such uncertain environments is to model humans in the scene as nodes in a graphical model, with edges encoding relationships between them. For each human, we learn a multimodal probability distribution over future actions from a dataset of multi-human interactions. Learning such distributions is made possible by recent advances in the theory of conditional variational autoencoders and deep learning approximations of probabilistic graphical models. Specifically, we learn action distributions conditioned on interaction history, neighboring human behavior, and candidate future agent behavior in order to take into account response dynamics. We demonstrate the performance of such a modeling approach in modeling basketball player trajectories, a highly multimodal, multi-human scenario which serves as a proxy for many robotic applications.
```
@inproceedings{IvanovicSchmerlingEtAl2018,
  author = {Ivanovic, B. and Schmerling, E. and Leung, K. and Pavone, M.},
  title = {Generative Modeling of Multimodal Multi-Human Behavior},
  booktitle = {{IEEE/RSJ Int. Conf. on Intelligent Robots \& Systems}},
  year = {2018},
  address = {Madrid, Spain},
  month = oct,
  url = {https://arxiv.org/pdf/1803.02015.pdf},
  owner = {borisi},
  timestamp = {2018-10-14}
}
```
J. Harrison, A. Garg, B. Ivanovic, Y. Zhu, S. Savarese, F.-F. Li, and M. Pavone, “ADAPT: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems,” in Int. Symp. on Robotics Research, Puerto Varas, Chile, 2017.

Abstract: Model-free policy learning has enabled robust performance of complex tasks with relatively simple algorithms. However, this simplicity comes at the cost of requiring an Oracle and arguably very poor sample complexity. This renders such methods unsuitable for physical systems. Variants of model-based methods address this problem through the use of simulators; however, this gives rise to the problem of policy transfer from the simulated to the physical system. Model mismatch due to systematic parameter shift and unmodelled dynamics error may cause suboptimal or unsafe behavior upon direct transfer. We introduce the Adaptive Policy Transfer for Stochastic Dynamics (ADAPT) algorithm that achieves provably safe and robust, dynamically-feasible zero-shot transfer of RL-policies to new domains with dynamics error. ADAPT combines the strengths of offline policy learning in a black-box source simulator with online tube-based MPC to attenuate bounded model mismatch between the source and target dynamics. ADAPT allows online transfer of policy, trained solely in a simulation offline, to a family of unknown targets without fine-tuning. We also formally show that (i) ADAPT guarantees state and control safety through state-action tubes under the assumption of Lipschitz continuity of the divergence in dynamics and (ii) ADAPT results in a bounded loss of reward accumulation in case of direct transfer with ADAPT as compared to a policy trained only on target. We evaluate ADAPT on 2 continuous, non-holonomic simulated dynamical systems with 4 different disturbance models, and find that ADAPT performs between 50% and 300% better on mean reward accrual than direct policy transfer.
```
@inproceedings{HarrisonGargEtAl2017,
  author = {Harrison, J. and Garg, A. and Ivanovic, B. and Zhu, Y. and Savarese, S. and Li, F.-F. and Pavone, M.},
  title = {{ADAPT}: Zero-Shot Adaptive Policy Transfer for Stochastic Dynamical Systems},
  booktitle = {{Int. Symp. on Robotics Research}},
  year = {2017},
  address = {Puerto Varas, Chile},
  month = dec,
  url = {https://arxiv.org/pdf/1707.04674.pdf},
  owner = {pavone},
  timestamp = {2018-01-16}
}
```
