About Me

I graduated from UC Berkeley in 2019 with a B.S. in Electrical Engineering and Computer Science. At Berkeley, I had the privilege to do research in Anca Dragan's InterACT Lab, focusing on human-robot interaction and game-theoretic hierarchical planning for autonomous vehicles.
In addition to this experience, I enjoyed learning about diverse areas of robotics, completing projects on motion planning, controllers, grasping, and soft robotics, among other topics.

After graduating, I joined Symbio Robotics, where I designed controllers and trajectory generation methods, created an endurance testing framework, and worked on simulation.

Now, I'm on a team in Waymo Research focused on the intersection of machine learning, reinforcement learning, and traditional motion planning.



Hierarchical Game-Theoretic Planning for Autonomous Vehicles

J.F. Fisac*, E. Bronstein*, E. Stefansson, D. Sadigh, S.S. Sastry, A.D. Dragan

International Conference on Robotics and Automation (ICRA), 2019
pdf poster

The actions of an autonomous vehicle on the road affect and are affected by those of other drivers, whether overtaking, negotiating a merge, or avoiding an accident. This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology. Unfortunately, dynamic games are too computationally demanding to meet the real-time constraints of autonomous driving in its continuous state and action space. In this paper, we introduce a novel game-theoretic trajectory planning algorithm for autonomous driving, that enables real-time performance by hierarchically decomposing the underlying dynamic game into a long-horizon "strategic" game with simplified dynamics and full information structure, and a short-horizon "tactical" game with full dynamics and a simplified information structure. The value of the strategic game is used to guide the tactical planning, implicitly extending the planning horizon, pushing the local trajectory optimization closer to global solutions, and, most importantly, quantitatively accounting for the autonomous vehicle and the human driver's ability and incentives to influence each other. In addition, our approach admits non-deterministic models of human decision-making, rather than relying on perfectly rational predictions. Our results showcase richer, safer, and more effective autonomous behavior in comparison to existing techniques.

An Efficient Reachability-Based Framework for Provably Safe Autonomous Navigation in Unknown Environments

A. Bajcsy*, S. Bansal*, E. Bronstein, V. Tolani, C.J. Tomlin

Conference on Decision and Control (CDC), 2019

Real-world autonomous vehicles often operate in a priori unknown environments. Since most of these systems are safety-critical, it is important to ensure they operate safely in the face of environment uncertainty, such as unseen obstacles. Current safety analysis tools enable autonomous systems to reason about safety given full information about the state of the environment a priori. However, these tools do not scale well to scenarios where the environment is being sensed in real time, such as during navigation tasks. In this work, we propose a novel, real-time safety analysis method based on Hamilton-Jacobi reachability that provides strong safety guarantees despite environment uncertainty. Our safety method is planner-agnostic and provides guarantees for a variety of mapping sensors. We demonstrate our approach in simulation and in hardware to provide safety guarantees around a state-of-the-art vision-based, learning-based planner.

Generating Highly Predictive Probabilistic Models of Task Durations

I.K. Isukapati, C. Igoe, E. Bronstein, V. Parimi, S.F. Smith

IEEE Transactions on Intelligent Transportation Systems, March 2020
pdf poster

In many applications, uncertainty in the durations of tasks complicates the development of plans and schedules. This has given rise to a range of resilient planning and scheduling techniques that in some way rely on probabilistic models of task durations. In this paper, we consider the problem of using historical data to develop probabilistic task models for such planning and scheduling techniques. We describe a novel, Bayesian hierarchical approach for constructing task duration distributions from past data, and demonstrate its effectiveness in constructing predictive probabilistic distribution models. Unlike traditional statistical learning techniques, the proposed approach relies on minimal data, is inherently adaptive to time-varying task duration distributions, and provides a rich description of confidence for decision making. These ideas are demonstrated using historical data provided by a local transit authority on bus dwell times at urban bus stops. Our results show that the task distributions generated by our approach yield significantly more accurate predictions than those generated by standard regression techniques.


Grasp Transfer by Parts

Class project for EECS 106B: Robotic Manipulation and Interaction

Grasping, which focuses on enabling robots to manipulate objects, is challenging because of the large space of possible grasps and object poses that must be considered. We seek to decrease the complexity of planning grasps on objects by 1) segmenting a query object into parts, and 2) transferring precomputed good grasps to these parts from parts of previously seen objects, with the novel consideration that the query object and the previously seen object need not be from the same semantic class.

Extending Single-Task Policy Distillation in Reinforcement Learning

Class project for CS 294-112: Deep Reinforcement Learning

Deep learning models with a large number of parameters are often unnecessarily large and time-consuming during both training and prediction. Policy distillation seeks to address this concern by distilling the policy from a larger teacher network to a smaller student network. We aim to improve the student network’s training sample complexity by considering 1) how a student network’s exploration strategy affects its learning behavior, and 2) how a student network can efficiently learn from multiple teachers of varying expertise. To address the first question, we conduct experiments with teacher and student Deep Q-Networks in the Pong environment and test several student exploration strategies (greedy, epsilon-greedy, Boltzmann, and Bayesian exploration with dropout), finding that greedy and Bayesian strategies result in minimal sample complexity. To explore the second question, we pose the problem of multi-teacher single-task policy distillation as a multi-armed bandit problem, where the teachers are the arms and the payoffs are the rewards the student receives after learning from the teachers. We show that non-contextual bandit algorithms such as random, epsilon-greedy, and UCB1 perform well when learning from multiple teachers, and UCB1 learns efficiently even when the teachers are of varying expertise. We also suggest how contextual bandit algorithms can use the state observation to decide which teacher to learn from, thus learning a holistic policy over the entire state space from teachers that are experts in subparts of the state space.
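To make the bandit framing concrete, here is a minimal sketch of UCB1 choosing among teachers. It is an illustration only, not the project's code: the function names `ucb1_select` and `run_ucb1` are hypothetical, and each teacher's payoff is replaced by an assumed Bernoulli reward that stands in for the reward the student receives after learning from that teacher.

```python
import math
import random


def ucb1_select(counts, totals, t):
    """Pick a teacher (arm) index at round t (1-indexed) using UCB1."""
    # Query every teacher once before relying on confidence bounds.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean payoff + exploration bonus.
    scores = [
        totals[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_ucb1(payoff_fns, rounds, seed=0):
    """Run UCB1 for `rounds` steps; return how often each teacher was chosen."""
    rng = random.Random(seed)
    counts = [0] * len(payoff_fns)
    totals = [0.0] * len(payoff_fns)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        # Assumption: payoff in [0, 1] standing in for the student's reward.
        reward = payoff_fns[arm](rng)
        counts[arm] += 1
        totals[arm] += reward
    return counts


def bernoulli(p):
    """Toy teacher whose payoff is 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    # Three hypothetical teachers of varying expertise.
    teachers = [bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)]
    print(run_ucb1(teachers, rounds=5000))
```

In this toy setup, the selection counts concentrate on the teacher with the highest expected payoff, while the exploration bonus ensures the weaker teachers are still sampled occasionally.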