Robot Learning from Human Teachers - Sonia Chernova
CHAPTER 2
Human Social Learning
When a machine learner is in the presence of a human who is motivated to help, social interaction can be a key element in the success of the learning process. Although robots can also learn, albeit less efficiently, from observing demonstrations not directed at them, the scenario we address here is primarily one in which a person is explicitly trying to teach the robot something in particular.
In this chapter, we review some key insights from human psychology that can influence the design of learning robots. We focus our discussion on findings in situated learning, a field of study that looks at the social world of a child and how it contributes to their development. In a situated learning interaction, a good instructor maintains a mental model of the learner’s understanding and structures the learning task appropriately with timely feedback and guidance. The learner contributes to the process by expressing their internal state via communicative acts (e.g., expressing understanding, confusion, attention, etc.). This reciprocal and tightly coupled interaction enables the learner to leverage instruction to build the appropriate representations and associations.
The situated learning process stands in contrast to typical machine learning scenarios, which are often neither interactive nor intuitive for a non-expert human partner. Since the social learning mechanisms used by humans are both proven to be effective and naturally occurring across society, enabling robots to engage in social interaction with the user can lead to more flexible, efficient, personable, and teachable machines whose behavior more closely matches the user’s expectations.
It is worth noting that, despite its reliance on human teachers, the field of Learning from Demonstration (LfD) has not focused much attention on the interactivity of the learning system. As we will see in Chapters 4 and 5, it is quite typical to first collect demonstrations in batch and then have a learning algorithm use this data to model a skill or task later. The work highlighted in this chapter underscores the distinction between such a batch process and the interactivity of a social learning process. We will return to this topic in Chapter 6, where we consider how to make an LfD process interactive through online learning, high-level critiques of the robot’s exploration, and the incorporation of Active Learning.
Figure 2.1: In this chapter we start with a look at the Human Teacher component of the LfD pipeline. A survey of human social learning provides insight into biases and expectations that a human may bring to the LfD process.
Figure 2.2: Starting at an early age, children use the information around them to learn from observation, experience, and instruction, striving to imitate the adults around them.
In this chapter, we highlight characteristics of human social learning in the first three sections. We look at human motivation for learning, how human teachers scaffold the learning process, and what feedback human learners provide. All of these topics have implications for the technical design of robot learners, which are the focus of the remaining chapters of this book (Figure 2.1).
2.1 LEARNING IS A PART OF ALL ACTIVITY
In most Machine Learning scenarios, learning is an explicit activity. The system is designed to learn a particular thing at a particular time. With humans, on the other hand, there is an ever-present motivation for learning, a drive to improve oneself, and an ability to seek out the expertise of others. Some inspiring characteristics of a motivated learner include: a curiosity about new environments and experiences; the ability to recognize and exploit good sources of information, and to adopt such an information source as a role model; the desire to “be more like” that role model, which underlies all activity; and a sense of one’s level of mastery with acquired skills, further driving the motivation to explore and learn about the world at opportune times.
Self-Determination Theory seeks to understand the mechanisms behind both intrinsic and extrinsic motivation in human behavior in general [224]. Here our focus is on situated learning interactions rather than self-motivated learning. We summarize two types of human motivation that lay the foundation for social learning interactions.
Motivated to Interact
A critical part of learning is gaining the ability to exploit the expertise of others [203]. Children put themselves in a good position to learn new things by being able to recognize, seek proximity to, and interact with their caregivers. They assume that the caregiver has their best interest in mind and even very young infants use this to their advantage when faced with an unknown situation [219].
The ability and desire to engage, communicate, and interact with others is seen from an early age. By the time infants are two months old, they can actively engage in communicative interactions or turn-taking routines with adults. Studies have shown that infants can start and stop communication with their mother through gesture and gaze, and that it is the infants who control the pace of the turn-taking interaction [130, 257]. This turn-taking capability is the foundation of many situated learning activities and is a precursor to more sophisticated interactions, such as imitation. For example, Arbib characterizes learning as assisted imitation, a dynamic turn-taking activity [274]. Bruner characterizes social scaffolding interactions in general as asymmetric cooperation that becomes symmetric over time [99]. Thus, turn-taking engagements are an underlying framework in which learning takes place.
Turn-taking abilities are characteristically based on causal assumptions about the world. There is an expectation that the world, and particularly other actors in the world, will have some contingent response to one’s activity. Thus, the ability to take advantage of these social interactions requires a robot to have models of engagement, turn taking, and other fundamental social skills. A growing body of research within the HRI field has focused on models for engagement and turn-taking. The work of [218] and [110] identifies and generates “connection events” in order for a robot to maintain engagement with a human interaction partner. Other systems have been developed to control multimodal dialog for social robots, such as the work of [128] that controls dynamic switching of behaviors in the speech and gesture modalities, and the framework of [185] that controls task-based dialog using parallelized processes with interruption handling. The work of [62] and [63] centers on building autonomous robot controllers for successfully engaging in human-like turn-taking interactions, with a computational model for regulating the speaking floor that explicitly represents and reasons about all four components of the behavior regulation problem: seizing the speaking floor, yielding the floor, holding the floor, and auditing the owner of the floor.
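The four floor-regulation behaviors listed above (seizing, yielding, holding, and auditing the floor) can be made concrete with a toy state machine. This is a hypothetical sketch, not the controller of [62, 63]: the boolean cues `human_speaking` and `robot_has_turn_content`, the `yield_timeout` silence threshold, and the policy of always deferring to a human who starts talking are assumptions chosen purely for illustration.

```python
from enum import Enum, auto


class Floor(Enum):
    HUMAN = auto()
    ROBOT = auto()
    OPEN = auto()  # nobody currently holds the speaking floor


class FloorRegulator:
    """Toy turn-taking controller: seize, yield, hold, and audit the floor.

    Hypothetical sketch; the cues and the silence timeout are assumptions.
    """

    def __init__(self, yield_timeout: int = 3):
        self.owner = Floor.OPEN
        self.yield_timeout = yield_timeout  # silent steps before the human's floor counts as open
        self.silent_steps = 0

    def audit(self, human_speaking: bool) -> Floor:
        """Auditing: update the belief about who owns the floor from perceptual cues."""
        if human_speaking:
            # Assumed policy: a human who starts talking takes the floor.
            self.owner = Floor.HUMAN
            self.silent_steps = 0
        elif self.owner == Floor.HUMAN:
            self.silent_steps += 1
            if self.silent_steps >= self.yield_timeout:
                self.owner = Floor.OPEN
        return self.owner

    def step(self, human_speaking: bool, robot_has_turn_content: bool) -> str:
        owner = self.audit(human_speaking)
        if owner == Floor.HUMAN:
            return "wait"
        if owner == Floor.ROBOT:
            if robot_has_turn_content:
                return "hold"   # keep the floor while there is more to say
            self.owner = Floor.OPEN
            return "yield"      # hand the floor back when finished
        if robot_has_turn_content:
            self.owner = Floor.ROBOT
            return "seize"      # take an open floor
        return "wait"


reg = FloorRegulator(yield_timeout=2)
cues = [(True, True), (False, True), (False, True), (False, True), (False, False)]
print([reg.step(h, r) for h, r in cues])  # ['wait', 'wait', 'seize', 'hold', 'yield']
```

The robot waits while the human speaks, seizes the floor only after two silent steps, holds it while it has content, and yields when done. A real system would replace the boolean cues with perception of speech, gaze, and gesture.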
Motivated to Learn
Another important influence on human learning is the “like-me” bias: the propensity and ability to map between actions seen in others and actions performed by oneself, which is present from a very early age [174]. As children grow older and interact with adults, they come to understand that the adult is “like-me” and is therefore a source of information about actions and skills [274]. For example, both Bruner and Leontiev indicate that play is intrinsically motivated and that the object of play is the desire to be like adults and participate in the adult world [107]. Lave and Wenger make a similar argument for the motivation of learning altogether [155]. They develop a theory of “Legitimate Peripheral Participation,” in which the driving force for learning a new practice is the learner’s motivation to form their identity and become a full participant in the practice. On a large scale, this is the motivation for all learning: children “wanting to become full participants in the adult world.”
Litowitz offers a similar explanation: the child wishes to be like the adult and is thus motivated to imitate and be led through activities by the adult. Litowitz goes one step further, however, and poses an elegant theory of why the process stops. The child leaves the subordinate learner role and becomes capable on their own through the very same mechanism. The desire to be like the adult extends to the meta-activity level: the child comes to want the adult role of structuring activity (wanting to choose the clothes they wear, resisting being told what to do, etc.) [163].
Given this motivation to imitate, there are several ways in which an adult’s behavior can influence a child’s exploration or learning process. The following four social learning mechanisms have been identified in both human and animal learners [56, 254].
• Stimulus (local) enhancement is a mechanism through which an observer (child, novice) is drawn to objects others interact with. This facilitates learning by focusing the observer’s exploration on interesting objects—ones useful to other social group members.
• Emulation is a process where the observer witnesses someone produce a particular result on an object, but then employs their own action repertoire to produce the result. Learning is facilitated both by attention direction to an object of interest and by observing the goal.
• Mimicking corresponds to the observer copying the actions of others without an appreciation of their purpose. The observer later comes to discover the effects of the action in various situations. Mimicking suggests, to the observer, actions that can produce useful results.
• Imitation refers to reproducing the actions of others to obtain the same results with the same goal.
Cakmak et al. [46] present an implementation of these four social learning mechanisms and articulate the distinct computational benefits of each. Their results show that all four social strategies provide learning benefits over self-exploration, particularly when the target goal of learning is a rare occurrence in the environment. The work characterizes the differences between strategies, showing that the “best” one depends on both the nature of the problem space and the current behavior of the social partner.
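The intuition that social strategies help most when the goal is rare can be illustrated with a toy setting in which a learner must discover which (object, action) pair produces a rare effect. This is a simplified sketch, not the implementation in [46]: each strategy is modeled only as a filter on the candidate pairs the learner tries, given a hypothetical observed demonstration `(object, action, effect)`. Because every learner here is scored against the same target effect, the sketch treats stimulus enhancement and emulation identically and does not capture the extra benefit emulation gets from knowing the goal.

```python
from itertools import product


def candidate_pairs(objects, actions, demo=None, strategy="self"):
    """Return the (object, action) pairs a learner considers, in trial order.

    demo is a hypothetical (object, action, effect) triple observed from a
    social partner; each strategy is modeled as a filter on what to try.
    """
    everything = list(product(objects, actions))
    if strategy == "self" or demo is None:
        return everything                      # unguided self-exploration
    demo_obj, demo_act, _effect = demo
    if strategy in ("stimulus", "emulation"):
        # Both focus on the object the partner used; emulation would also
        # adopt the observed effect as its goal (not modeled separately here).
        return [(o, a) for o, a in everything if o == demo_obj]
    if strategy == "mimicking":
        # Copy the action, without regard to which object it was applied to.
        return [(o, a) for o, a in everything if a == demo_act]
    if strategy == "imitation":
        return [(demo_obj, demo_act)]          # same action on the same object
    raise ValueError(f"unknown strategy: {strategy}")


def trials_until(candidates, world, target_effect):
    """Try candidates in order; return how many trials it takes to see the target effect."""
    for n, pair in enumerate(candidates, start=1):
        if world.get(pair) == target_effect:
            return n
    return None


objects = ["ball", "box", "cup", "block", "toy"]
actions = ["push", "pull", "lift", "shake"]
world = {("cup", "shake"): "rattle"}   # the only pair that produces the rare effect
demo = ("cup", "shake", "rattle")
for s in ["self", "stimulus", "emulation", "mimicking", "imitation"]:
    print(s, trials_until(candidate_pairs(objects, actions, demo, s), world, "rattle"))
```

In this fixed trial order, the learner needs 12 trials on its own but 4, 4, 3, and 1 trials with stimulus enhancement, emulation, mimicking, and imitation, respectively. Which strategy is cheapest in general depends on how the world and the demonstration are structured.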
The general concept of motivation has also been studied in the context of reinforcement learning (RL). Intrinsically motivated RL has been proposed as a framework within which agents exploit “internal reinforcement” that rewards novel situations or experiences [65, 233]. A number of other techniques for integrating self-motivation and curiosity have also been studied within the context of developmental learning [121, 200, 229]; however, these methodologies have not yet been applied in the context of interactive learning agents or LfD.
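One common way to realize an “internal reinforcement” signal for novelty is a count-based bonus, which adds to the task reward an intrinsic term that shrinks as a state is revisited, for example r_int(s) = beta / sqrt(N(s)), where N(s) counts visits to state s. The sketch below assumes discrete, hashable states and a hand-picked `beta`; it illustrates the general idea rather than the specific formulations of [65, 233].

```python
import math
from collections import Counter


class NoveltyBonus:
    """Count-based intrinsic reward r_int(s) = beta / sqrt(N(s)).

    N(s) is the number of times state s has been visited (including now),
    so the bonus is largest for novel states and decays with repetition.
    """

    def __init__(self, beta: float = 1.0):
        self.beta = beta
        self.visits = Counter()

    def __call__(self, state) -> float:
        self.visits[state] += 1
        return self.beta / math.sqrt(self.visits[state])


def shaped_reward(extrinsic: float, state, bonus: NoveltyBonus) -> float:
    """Total reward seen by the agent: task reward plus intrinsic novelty bonus."""
    return extrinsic + bonus(state)


bonus = NoveltyBonus(beta=0.5)
print([round(shaped_reward(0.0, s, bonus), 3) for s in ["a", "a", "a", "b"]])
# [0.5, 0.354, 0.289, 0.5]
```

Even with zero task reward, the agent receives a signal that favors the unvisited state `"b"` over the thrice-visited `"a"`, which is what drives exploration toward novel experiences.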
Figure 2.3: Examples of scaffolding the learning process through attention direction and simplification of the task or environment.
2.2 TEACHERS SCAFFOLD THE LEARNING PROCESS
An important characteristic of a good learner is the ability to learn both on one’s own and by interacting with another. Children are capable of exploring and learning on their own, but in the presence of a teacher they can take advantage of the social cues and communicative acts provided to accomplish more. For instance, the teacher often guides the child’s search process by providing timely feedback, luring the child to perform desired behaviors, and controlling the environment so the appropriate cues are easy to attend to, thereby allowing the child to learn more effectively, appropriately, and flexibly. Scaffolding is the process by which an adult organizes a new skill into manageable steps and provides support such that a child can achieve something they would not be able to accomplish independently [99, 265]. A good teacher will scale instruction appropriately and create a good environment for learning the task at hand. In robotics, the human may be able to help the robot with hard problems like “what to learn,” “when to learn,” “what action to try,” and “how to measure success” [35].
2.2.1 ATTENTION DIRECTION
Attention direction is one of the essential mechanisms that contributes to the learning process [268, 274]. Analyzing parent-child tutoring sessions reveals a number of ways that adults provide structure and guide attention to let children succeed: placing important objects close to the child’s face, arranging the physical environment such that the desired action is within reach, or doing a demonstration in the infant’s line of sight to introduce object affordances.
The adult is also implicitly directing the child’s attention with their gaze direction. The tendency to follow eye gaze is seen very early on and is a first step toward reference and joint attention. It has also been shown that in order to hold joint attention and direct the infant’s attention, a communicative situation must first be established. This can be done through a period of eye contact or through verbal or behavioral contingent responses [76].
Within HRI research, a growing body of work has focused on social gaze behavior [117, 127, 153, 181, 182, 230, 256, 270], for example in the use of gaze for regulating turn-taking in two-party [153, 270] and multi-party conversations [24, 171, 182, 256]. These studies provide strong evidence that gaze cues from a robot support conversational functions and result in a more natural interaction with a human. As an example of applying this to the context of learning, [183] showed how using human-like visual saliency detection may help a robot learner segment a teaching demonstration into steps and determine the right aspects of the state to pay attention to during the demonstration.
Another way of directing attention is to emphasize or exaggerate parts of the desired movement. This form of instruction is challenging to adapt to LfD because the goal is not to reproduce the exaggeration itself, but instead to direct the focus of attention during learning.
2.2.2 DYNAMIC SCAFFOLDING
Dynamic scaffolding is the notion that adults create a learning situation that is at the right level of complexity for the learner. The adult adjusts dynamically to make sure the child is working within the Zone of Proximal Development, defined as the gap between what a learner has already mastered and what he or she can achieve with the aid of a teacher. In a way, the teacher creates “microworlds” in which the learner masters parts of the task in isolation before moving on, providing safety and attainable intermediate goals [42]. For example, with language, parents at first treat anything as conversational speech, but eventually they raise their expectations, scaffolding the child’s conversational abilities [257]. In book reading, the parent will at first ask and answer their own questions, and later will expect the child to participate in the question/answer game.
Closely related to this idea is Lave and Wenger’s theory of legitimate peripheral participation, which states that the best way to learn is by starting on the sidelines and gradually gaining responsibility. This limits the opportunity for failure while still letting the newcomer play a legitimate part in the community. The level of scaffolding provided is an important factor in learning: instructors who always intervene to prevent problems may actually inhibit learning and the development of the abilities to detect and prevent errors [219].
The idea of scaffolding has been adapted to machine learning, and to LfD specifically. Several LfD techniques have leveraged the human teacher for spatial scaffolding, in which the teacher restructures the learning environment to direct or focus the attention of the learner on the most relevant aspects of the task being learned [26, 227, 228]. In other techniques, scaffolding is used as a means to build complex behaviors by combining or adapting simpler, previously taught skills [13, 14, 129].
2.2.3 EXTERNALIZING AND MODELING METACOGNITION
When working with children, adults often externalize the thinking process [23, 57]. In problem solving, a common simplification is to switch from an open-ended “wh” question (where, who, why, etc.) to yes/no questions when the child is having trouble. For example, when asking “do you know where X is?” and the child says “no” or has trouble, the adult will switch to yes/no questions like “is it …?” to frame the search space. Often the yes/no questions are deliberately absurd, defining the extremes of the space and thereby exemplifying the process that the child should be using to come up with the answer.
Greenfield also observes that if a child turns to an adult during a task, the adult may ask a question or give a gesture hint. The questions asked are meant to elicit the thinking process. Additionally, an important role that the adult plays in a child’s learning process is linking new information to old, showing or suggesting to the child similarities between new problems and old ones [219]. A good teacher makes the information in a new problem compatible with what is known, guiding the generalization process, helping the child apply skills across various contexts.
Importantly, in humans, the key element that enables the above techniques to be successful is meta-learning. Children can go from being directed in a task through leading questions and hints to internalizing that process and being able to achieve the task on their own. Thus, in robots, it is important not only to follow instructions and model the specific activity, but also to learn task strategies (e.g., what questions to ask, what to pay attention to, etc.) from these interactions.
2.3 ROLE OF COMMUNICATION IN SOCIAL LEARNING
2.3.1 EXPRESSION PROVIDES FEEDBACK TO GUIDE A TEACHER
To be a good instructor, one must maintain a mental model of the learner’s state (e.g., what is understood so far, what remains confusing or unknown) in order to appropriately structure the learning task with timely feedback and guidance. The learner helps the instructor by expressing their internal state via communicative acts (e.g., expressions, gestures, or vocalizations that reveal understanding, confusion, attention, etc.). Through reciprocal and tightly coupled interaction, the learner and instructor cooperate to aid both the instructor’s ability to maintain a good mental model of the learner and the learner’s ability to leverage instruction to build the appropriate models, representations, and associations.
With this view of learning as a tightly coupled collaboration, theories of human cooperative and collaborative activity help inform the design of robot learners. Cohen et al. analyzed task dialogs in which an expert instructed a novice assembling a physical device, and found that much of task dialog can be viewed in terms of joint intentions [72]. Their study identified key discourse functions including: organizational markers that synchronize the start of new joint actions (“now,” “next,” etc.), elaborations and clarifications for when the expert believes the apprentice does not understand, and confirmations establishing the mutual belief that a step was accomplished. Another important work is that of Bratman, in which he defines prerequisites for an activity to be considered shared and cooperative, stressing the importance of mutual responsiveness, commitment to the joint activity and commitment to mutual support [34]. Cohen et al. support these guidelines and also predict that an efficient and robust collaboration scheme in a changing environment needs an open channel of communication.
These theories argue for the importance of sharing information through communication in order to maintain a successful collaborative activity. Thus, a robot learner that people will find collaborative and cooperative must take into account nonverbal communication, such as gestures and gaze, to facilitate the interaction and maintain an understandable, transparent interface between the human and the machine.
2.3.2 ASKING QUESTIONS
In developmental psychology, the role of curiosity and inquiry is highlighted time and again as a crucial component of the learning process. Early in development this is characterized by self-learning, in which there is an active process of effectively asking questions of the environment. Piagetian self-regulatory reflexes (e.g., sucking, grasping, circular reactions) are crucial to early learning, helping infants and children obtain developmentally appropriate experiences for learning [207]. The work of Gopnik has additionally shown that children (and adults) are highly efficient in this process. In one study, Gopnik and colleagues demonstrated to children a “blicket machine” that made a sound when certain objects were put near it but not others. When the children were asked to figure out how to make it go, the researchers observed that 2-, 3-, and 4-year-olds would efficiently explore the environment with actions (interventions) to uncover the pattern of conditional dependence between objects and the sound, inferring the causal structure of the machine [97].
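The blicket task can be framed as hypothesis elimination through interventions. The sketch below is a deliberate simplification, not the model used in [97]: it assumes the machine follows a deterministic rule (it sounds if and only if at least one blicket is placed on it), that at least one object is a blicket, and it uses a greedy heuristic that picks the placement whose predicted outcome splits the remaining hypotheses most evenly, one simple way to choose informative interventions.

```python
from itertools import combinations


def nonempty_subsets(objects):
    """All non-empty subsets of objects, smallest first."""
    return [frozenset(c) for r in range(1, len(objects) + 1)
            for c in combinations(objects, r)]


def machine_activates(placed, blickets):
    # Assumed deterministic OR rule: sound iff at least one placed object is a blicket.
    return bool(set(placed) & blickets)


def choose_intervention(objects, hypotheses):
    # Pick the set of objects to place whose predicted outcome splits the
    # remaining hypotheses most evenly (a greedy information-seeking heuristic).
    def imbalance(placed):
        n_on = sum(machine_activates(placed, h) for h in hypotheses)
        return abs(2 * n_on - len(hypotheses))
    return min(nonempty_subsets(objects), key=imbalance)


def infer_blickets(objects, true_blickets):
    """Run interventions until only one hypothesis about the blicket set remains."""
    hypotheses = nonempty_subsets(objects)  # assume at least one blicket exists
    log = []
    while len(hypotheses) > 1:
        placed = choose_intervention(objects, hypotheses)
        outcome = machine_activates(placed, true_blickets)  # run the experiment
        log.append((sorted(placed), outcome))
        hypotheses = [h for h in hypotheses
                      if machine_activates(placed, h) == outcome]
    return hypotheses[0], log


found, log = infer_blickets(["A", "B", "C"], {"B"})
print(sorted(found), len(log))  # ['B'] 3
```

With three objects there are seven candidate blicket sets, and this greedy choice identifies any of them in at most three interventions, far fewer than trying every possible placement.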
Later, children become experts at actively seeking knowledge from their social environment, first becoming proficient at deciding to whom to pay attention. Movellan showed that children are highly efficient in their behavior: when deciding whether or not someone or something is reacting contingently to them, they optimize their actions to gain the most information [178]. Thus, even pre-verbal children who cannot “ask questions” in the traditional sense of the term are not passive observers but active learners in their world.
Educational psychology gives another view, looking at questions in a pedagogical context. Graesser and Person studied tutoring sessions with both grade school and college students, classifying questions into a variety of categories under two main groups: those requiring short answers and those requiring long answers. They then studied the frequency and intent of various questions in real tutorial settings. They found that the frequency of different types of questions was similar across the two settings, and that students primarily ask questions because of a knowledge deficit and to maintain common ground (e.g., confirming knowledge) [98]. In other research, they have shown that the quality of a student’s questions and the completeness of their answers are the best predictors of final exam performance. In contrast, performance was not correlated with the answers students gave to confirming questions like “did you get that?” [204]. Thus, a good teacher must do more than ask for knowledge confirmations to maintain a good mental model of the learner’s current knowledge.
Figure 2.4: Simon, at Georgia Tech, is one example of a robot designed with both learning and social interaction in mind. Techniques for making use of scaffolding, attention direction, transparency, and question asking are central to the development of this system.
These experiments quantifying question usage are closely related to HRI goals, and techniques integrating some of these principles into LfD will be discussed in Chapter 6.
2.4 IMPLICATIONS FOR THE DESIGN OF ROBOT LEARNERS
The human learning process serves as an inspiration in the design of social learning robots. By studying human learning we gain insights into the design of advanced learning systems. Furthermore, because learning from demonstration inherently requires interaction between the robot and the user, designing the interaction to conform to the user’s expectations leads to a more natural and effective learning process. The extent to which social elements need to be integrated into LfD often depends on the application. In some circumstances, the robot may benefit from the full range of social interactions, taking into account social cues such as gestures, gaze, direction of attention, and possibly even extending to affect. In other applications, minimal or no social understanding may be required, with the interaction instead limited to a human-computer interface. In all cases, the designers of the robot strive for the most natural, flexible, and efficient learning system for the given task. The following design elements are some that should be considered in the design of robots that learn from demonstration.
• Social interaction. Should the robot leverage the social aspect of the interaction? Would learning be aided if the robot understood the social cues of the user? Would learning be aided if the robot could exhibit social cues? Which social cues are most effective for LfD interactions? Which social cues, whether from the robot or teacher, are most informative for task learning, and which social cues are most preferred by users?
• Motivation for learning. Does the robot require intrinsic motivation for learning, or will all learning be initiated and directed by the human user?
• Transparency. To be effective, a teacher must be able to maintain as accurate a mental model of the learner’s knowledge as possible. How can the robot externalize what it has learned and make elements of the internal model transparent to the user? What techniques for communicating the learner’s knowledge should be used to aid the learning process? Is it necessary that the communication techniques mimic the way humans communicate, or is it equally (or more) effective to leverage interfaces that are not part of natural human communication, such as screen-based devices?
• Question asking. Asking questions is a critical part of the human learning process. How does the robot effectively communicate the limits of its knowledge or pose a question? How can the user frame the answer in a way that the robot can understand, and how should the gained information be used to improve the underlying model? Many different types of questions can be designed, such as “what should I do now?” or “what is the intended goal?” Given multiple possible questions, how can the robot determine which questions to ask?
• Scaffolding. Just as for humans, complex tasks can be easier for machines to learn if they are broken down into simpler components. Organization of knowledge or skills into simpler parts also often allows for greater efficiency through reuse. How can the robot leverage scaffolding in its learning and interaction with the user? How can previously learned policies be built upon and reused in new settings? Note that in addition to simply saving learned policies, this could involve parameterizing the action space of the robot, allowing a previously learned skill (e.g., pick up box) to generalize to new objects or scenarios.
• Directing attention. Humans use a number of techniques to control the direction and scope of attention within a conversation. In the context of learning, both in the role of a teacher and a student asking a question, this skill is often used to focus learning, akin to feature selection in machine learning. How can control of attention be leveraged to simplify learning in complex domains? How can the robot direct the attention of the user, and vice versa? How does the learning algorithm respond to shifts in attention?
• Online vs. batch learning. The majority of traditional machine learning techniques make use of a batch learning process, examining all the training data at once and producing a model. Learning from demonstration can be cast as a batch learning process that occurs at the end of a training session, or once enough new demonstrations are acquired. However, it can also be viewed as an online learning process in which training data is acquired incrementally, similar to active learning. The choice between online and batch learning is important in the design of an interactive learning system as it will determine the flow of interaction and how new training data is acquired and integrated into the model.
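For the question-asking element in the list above, one widely used criterion from active learning is uncertainty sampling: among the candidate questions, ask the one whose answer the learner is least sure of. The sketch below assumes the learner can produce a predicted probability distribution over possible answers for each candidate query (for example, a classifier's output for a state it could ask the teacher to label); the query names are hypothetical, and this is one option among many rather than a method prescribed here.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def choose_query(predictions):
    """Uncertainty sampling: pick the query whose predicted answer distribution has maximal entropy.

    predictions maps each candidate query to the learner's predicted
    probabilities over the possible answers (each list sums to 1).
    """
    return max(predictions, key=lambda q: entropy(predictions[q]))


predictions = {
    "label state_a?": [0.95, 0.05],  # learner is fairly confident
    "label state_b?": [0.50, 0.50],  # learner is maximally unsure
    "label state_c?": [0.70, 0.30],
}
print(choose_query(predictions))  # label state_b?
```

Entropy is only one possible measure; a learner could also weigh how costly or natural a question is for the teacher to answer.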
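The analogy between directing attention and feature selection, from the list above, can be sketched with a nearest-neighbor learner that compares situations only along the features the teacher has drawn attention to. Everything here is hypothetical: the three features (hue, size, horizontal position), the bin labels, and the assumption that a teacher's pointing or gaze can be translated into a set of attended feature indices.

```python
import math


def nearest_label(query, examples, attended):
    """1-nearest-neighbor classification using only the attended feature indices."""
    def distance(features):
        return math.sqrt(sum((features[i] - query[i]) ** 2 for i in attended))

    _, label = min(examples, key=lambda ex: distance(ex[0]))
    return label


# Each example is ((hue, size, x_position), label); hue 0 = red, 1 = blue.
examples = [((0.0, 1.0, 5.0), "red_bin"), ((1.0, 1.0, 0.0), "blue_bin")]
query = (0.1, 1.0, 0.2)  # a reddish object that happens to sit near the blue example

print(nearest_label(query, examples, attended=[0, 1, 2]))  # blue_bin: distracted by position
print(nearest_label(query, examples, attended=[0]))        # red_bin: attending to hue only
```

Without guidance, the irrelevant position feature dominates the distance; restricting attention to the feature the teacher emphasized changes the prediction, which is the kind of simplification attention direction can provide.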
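To illustrate the online vs. batch distinction in the list above, consider a hypothetical learner whose “model” is simply the per-timestep mean of demonstrated trajectories, assuming every demonstration has already been resampled to the same length. A batch learner recomputes the mean over all stored demonstrations at once; an online learner folds each new demonstration in as it arrives with the incremental update m_n = m_{n-1} + (x_n - m_{n-1}) / n, so a usable model exists after every demonstration.

```python
def batch_mean(demos):
    """Batch: average all stored demonstrations at once, timestep by timestep."""
    n = len(demos)
    return [sum(d[t] for d in demos) / n for t in range(len(demos[0]))]


class OnlineMean:
    """Online: fold in one demonstration at a time without storing past data."""

    def __init__(self):
        self.n = 0
        self.mean = None

    def update(self, demo):
        self.n += 1
        if self.mean is None:
            self.mean = [float(x) for x in demo]
        else:
            # Incremental mean: m_n = m_{n-1} + (x_n - m_{n-1}) / n
            self.mean = [m + (x - m) / self.n for m, x in zip(self.mean, demo)]
        return self.mean


demos = [[0.0, 1.0, 2.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]  # hypothetical 1-D trajectories
online = OnlineMean()
for d in demos:
    print(online.update(d))   # a usable model exists after every demonstration
print(batch_mean(demos))      # [2.0, 3.0, 4.0], same as the final online estimate
```

Both approaches reach the same final estimate here, but only the online version lets the robot act on, and the teacher react to, an intermediate model, which is what shapes the flow of an interactive teaching session.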
As can be seen from this discussion, social learning mechanisms have the potential to play an important role in every part of the LfD process. In the next chapter, and the ones that follow, we switch to looking at LfD from a computational perspective, studying the Machine Learning techniques that can be applied to this problem. However, human involvement remains a critical factor in the discussed methods, and we return to this topic in Chapter 6, where we consider interactive techniques for policy refinement.