Abstract: Infants exploit the perception that others are ‘like me’ to bootstrap social cognition (Meltzoff, 2007a). This paper demonstrates how that theory can be instantiated in a social robot that uses itself as a model to recognize structural similarities with other robots, thereby enabling a student robot to distinguish appropriate from inappropriate teachers. The student robot first performs self-discovery, a phase in which it uses actuation–perception relationships to infer its own structure. Second, the student models a candidate teacher using a vision-based active learning approach, creating an approximate physical simulation of the teacher. Third, the student judges the teacher to be structurally similar (though not necessarily visually similar) to itself if it can find a neural controller that allows its self-model (created in the first phase) to reproduce the perceived motion of the teacher model (created in the second phase). Fourth, the student uses that neural controller to move, thereby imitating the teacher. Results with a physical student robot and two physical robot teachers demonstrate the effectiveness of this approach. The generalizability of the proposed model allows it to tolerate variations in the demonstrator: the student robot can still imitate teachers of different sizes, at different distances from itself, and at different positions in its field of view, because imitation is driven by changes in the interrelations of the teacher’s body parts rather than by absolute geometric properties.
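The four phases described above can be sketched in highly simplified form. Everything concrete below (the one-joint body, the linear actuation gain, all function and variable names) is a hypothetical stand-in for illustration; the actual system uses a physical self-simulation and an optimized neural controller rather than the closed-form step shown here.

```python
import random

random.seed(1)

# Phase 1 (self-discovery): the student infers its actuation->perception
# relationship. Hypothetical stand-in: a one-joint body whose tip angle is
# a fixed gain times the motor command; the student estimates that gain
# by probing its own body with a few test commands.
TRUE_GAIN = 2.0

def student_act(command):
    """What the student's physical body actually does for a motor command."""
    return TRUE_GAIN * command

probes = [0.1, 0.5, 1.0]
est_gain = sum(student_act(c) / c for c in probes) / len(probes)

# Phase 2 (teacher modeling): vision-based active learning would yield an
# approximate simulation of the teacher; here we take its output directly
# as a sequence of relative joint angles (not absolute geometry).
teacher_motion = [0.0, 0.4, 0.8, 0.4, 0.0]

# Phase 3 (controller search): find motor commands whose predicted
# self-motion reproduces the teacher model's motion. With a linear
# self-model this reduces to a division; in general a neural controller
# would be optimized against the self-model here.
controller = [angle / est_gain for angle in teacher_motion]

# Phase 4 (imitation): execute the controller on the real body and
# compare the resulting motion with the teacher's.
executed = [student_act(c) for c in controller]
error = max(abs(e - t) for e, t in zip(executed, teacher_motion))
print(error)  # near zero: the student reproduces the teacher's motion
```

Because the controller is found against the self-model rather than against the teacher's absolute pose, the same search would succeed for a teacher of a different size or at a different distance, which is the generalization property the abstract claims.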
Abstract: We present a novel stereo vision modeling framework that generates approximate yet physically plausible representations of objects, rather than accurate models that are computationally expensive to generate. Our approach models target scenes by carefully selecting a small subset of the total pixels available for visual processing. To achieve this, we use the estimation-exploration algorithm (EEA) to create the visual models: a population of three-dimensional models is optimized against a growing set of training pixels, and periodically a new pixel that causes disagreement among the models is selected from the observed stereo images of the scene and added to the training set. We show that, using only 5% of the available pixels, the algorithm can generate approximate models of compound objects in a scene. Our algorithm serves the dual goals of extracting the 3D structure and the relative motion of objects of interest: it models the target objects in terms of their physical parameters (e.g., position, orientation, and shape) and tracks how these parameters vary over time. We support our claims with results from simulation as well as from a real robot lifting a compound object.
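The estimation-exploration loop described above can be sketched as follows. The scene representation, the per-pixel depth models, and all names here are hypothetical stand-ins chosen to keep the sketch runnable; the real system optimizes three-dimensional physical models against actual stereo images rather than flat depth vectors.

```python
import random

random.seed(0)

# Hypothetical stand-in for the scene: a "true" depth value at each pixel.
NUM_PIXELS = 100
true_depth = [random.random() for _ in range(NUM_PIXELS)]

def observe(pixel):
    """Query the stereo images for one pixel's depth (noise omitted)."""
    return true_depth[pixel]

def model_error(model, training):
    """How badly a candidate model explains the training pixels seen so far."""
    return sum((model[p] - d) ** 2 for p, d in training.items())

def refine(model, training):
    """Crude 'estimation' step: jitter the model (a stand-in for stochastic
    optimization) and pin it to the known training pixels."""
    new = [v + random.uniform(-0.05, 0.05) for v in model]
    for p, d in training.items():
        new[p] = d
    return new

def disagreement(models, pixel):
    """Variance of the models' predictions at an unseen pixel."""
    vals = [m[pixel] for m in models]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)

# Estimation-exploration loop over a population of candidate models.
models = [[random.random() for _ in range(NUM_PIXELS)] for _ in range(5)]
training = {}
budget = NUM_PIXELS // 20  # stop after touching only 5% of the pixels
while len(training) < budget:
    # Estimation: keep each model's better variant under the current data.
    models = [min((refine(m, training), m),
                  key=lambda c: model_error(c, training))
              for m in models]
    # Exploration: observe the unseen pixel the models disagree on most.
    unseen = [p for p in range(NUM_PIXELS) if p not in training]
    target = max(unseen, key=lambda p: disagreement(models, p))
    training[target] = observe(target)
```

The key design choice is in the exploration step: each new pixel is chosen to maximally disambiguate the competing models, which is why so few pixels suffice compared with processing the full images.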