We test our algorithms both in simulation and in a user study. In both settings, we present the simulated or real users with queries that consist of comparative trajectory snapshots. The snapshots are generated in a driving simulator that uses a simple point-mass model of the ego-car’s dynamics. We define the physical state of
the system as $\mathbf{x} = [x, y, \psi, v]^\top$, where $x$ and $y$ are the coordinates of the vehicle, $\psi$ is the
heading, and $v$ is the speed. We let $\mathbf{u} = [u_1, u_2]^\top$ be the control input, where $u_1$ is the
steering input and $u_2$ is the acceleration. We denote the friction coefficient by $\mu$. The
dynamics model of the ego-car is then:
$$[\dot{x},\ \dot{y},\ \dot{\psi},\ \dot{v}] = [v \cos(\psi),\ v \sin(\psi),\ v\,u_1,\ u_2 - \mu v] \tag{4.5}$$
The simulator provides a top-down view of the environment.
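The point-mass dynamics of Eq. 4.5 can be illustrated with a minimal sketch. The forward-Euler integrator, the time step `dt`, the default value of `mu`, and the function names `dynamics`, `euler_step`, and `rollout` are assumptions made for illustration; the text does not specify how the simulator integrates the model.

```python
import math


def dynamics(state, control, mu):
    """Time derivative of the state [x, y, psi, v] as written in Eq. 4.5."""
    x, y, psi, v = state
    u1, u2 = control  # u1: steering input, u2: acceleration
    return (v * math.cos(psi), v * math.sin(psi), v * u1, u2 - mu * v)


def euler_step(state, control, mu, dt):
    """One forward-Euler step (an assumed integrator, not the thesis's)."""
    deriv = dynamics(state, control, mu)
    return tuple(s + dt * d for s, d in zip(state, deriv))


def rollout(state, controls, mu=0.1, dt=0.1):
    """Apply a sequence of controls and return the list of visited states."""
    trajectory = [state]
    for control in controls:
        state = euler_step(state, control, mu, dt)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    # Drive with a constant small steering input and mild acceleration.
    start = (0.0, 0.0, 0.0, 1.0)
    for x, y, psi, v in rollout(start, [(0.2, 0.5)] * 5):
        print(f"x={x:.3f} y={y:.3f} psi={psi:.3f} v={v:.3f}")
```

A trajectory snapshot shown to a user would then correspond to one such rollout from a given starting state.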
Given this general preference model, we developed a unified algorithmic framework that can learn both static and dynamic preferences using rich human guidance that does not involve demonstrations. What do we mean by rich human guidance? In this chapter we showed how we treat learning preferences from rich guidance as a human-robot interaction game. In the following chapter we formally define rich guidance and present our algorithmic framework.
Chapter 5. Learning Preference from Richer Queries
In Chapter 4 we presented a rich preference model that can represent both static and dynamic preferences for autonomous driving behavior. Our learning problem in this thesis is to learn a distribution over the parameters of this model, (w, γ), where w is the weight vector of the reward function representing a specific preference mode and γ is a parameter vector governing mode transitions as a function of prior driving experience. We proposed that the robot can learn these parameters by seeking richer guidance from the human. Since comparison-based learning has emerged as a useful form of human guidance for learning preferred driving styles (see Chapter 2) (5; 76; 73; 7; 30; 45; 74; 66; 43; 48), we build on this approach and develop richer comparison queries.
5.1 Learning Preference from Rich Human Guidance
Comparison Queries. In comparison queries, the robot iteratively shows users two
possible trajectories (often in the same environment and from the same starting state) and asks which one they prefer. For example, in Fig. 5.1b the autonomous car, shown in orange, cuts in front of the white traffic car in option A, whereas in option B it continues straight without interacting with the white car. The car asks the user to pick between these two trajectories and then uses the answer to update its understanding of the reward parameters. In comparison-based learning, each answer tells the robot only that one trajectory is better than the other. This binary feedback carries very little information compared to a demonstration, which provides the optimal trajectory directly. In this thesis, we propose a middle ground between comparisons and demonstrations. We leverage several variants of comparison queries to collect richer information per query than the fact that one trajectory is better than the other. We refer to these queries as
Rich Queries, as opposed to Comparison-only queries.
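To make concrete how a single comparison answer can update the robot's understanding of the reward parameters, the following is a minimal sketch. It assumes a reward that is linear in trajectory features, a logistic (Boltzmann-style) model of the user's choice, and a belief represented by weighted samples of w. These modelling choices, and the names `reward`, `prob_prefers_a`, and `update_belief`, are illustrative assumptions and not necessarily the update used in this thesis.

```python
import math


def reward(w, features):
    """Linear reward w . phi(xi) (assumed form) for one trajectory."""
    return sum(wi * fi for wi, fi in zip(w, features))


def prob_prefers_a(w, feat_a, feat_b):
    """Assumed logistic choice model: P(user picks A | w).

    Written with tanh, which equals the sigmoid of the reward difference
    and avoids overflow for large differences.
    """
    diff = reward(w, feat_a) - reward(w, feat_b)
    return 0.5 * (1.0 + math.tanh(diff / 2.0))


def update_belief(samples, weights, feat_a, feat_b, chose_a):
    """Bayesian reweighting of sampled w's after one comparison answer."""
    updated = []
    for w, p in zip(samples, weights):
        likelihood = prob_prefers_a(w, feat_a, feat_b)
        if not chose_a:
            likelihood = 1.0 - likelihood
        updated.append(p * likelihood)
    total = sum(updated)
    return [p / total for p in updated]


if __name__ == "__main__":
    # Two hypothetical candidate weight vectors with a uniform prior.
    samples = [(1.0, 0.0), (-1.0, 0.0)]
    prior = [0.5, 0.5]
    # Hypothetical feature vectors of trajectories A and B.
    posterior = update_belief(samples, prior, (1.0, 0.0), (0.0, 0.0), chose_a=True)
    print(posterior)
```

Each answer only shifts probability mass between hypotheses according to one binary outcome, which illustrates why a comparison-only query carries limited information per interaction.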
[Figure 5.1 image. a. Comparisons are much less informative than demonstrations. b. The middle ground is augmented comparison queries.]
Figure 5.1: a. We argue that comparisons are far less informative than demonstrations and can lead to slower learning of a continuous, high-dimensional reward function. Instead, we look for a middle ground between comparisons and demonstrations. b. Our key insight in this thesis is that we can ask people for richer information in the form of rich queries: a series of hierarchical sub-queries, at least one of which is a
comparison query. In this example, $q_i$ is a usual comparison query in which the user is
asked to pick between $\xi_A$ and $\xi_B$, and $q_{i+1}$ is a follow-up query such as "Which feature of
the reward function was most responsible for your choice?"
Rich Queries. Our insight is that we can extract much richer information from
a single comparison query by asking follow-up queries such as "Why did you
select A?" or "Would this choice change had the white car behaved differently?" (see $q_{i+1}$ in
Fig. 5.1). We define a rich query $q$ as a sequence of connected sub-queries $q_i$, where
the current sub-query $q_i$ forms the context for the next sub-query $q_{i+1}$. A rich query
contains at least one comparison-only sub-query of the form $q_i = (x_0, \xi_A, \xi_B)$, where
we ask the user "Which of the two trajectories do you prefer?". Our framework is
designed to accommodate several sub-queries of different forms. For example, $q_{i+1}$
could be a follow-up question seeking to understand the user's choice in $q_i$, or $q_{i+1}$ could
be another comparison-only query that uses $q_i$ as context. The user’s response to