KR20070121601A

KR20070121601A - Evaluating visual proto-objects for robot interaction

Info

Publication number: KR20070121601A
Application number: KR1020070061790A
Authority: KR
Inventors: Herbert Janssen; Bram Bolder
Original assignee: Honda Research Institute Europe GmbH
Priority date: 2006-06-22
Filing date: 2007-06-22
Publication date: 2007-12-27
Also published as: KR100904805B1

Abstract

A technique for evaluating visual proto-objects for robot interaction is provided, improving the interaction between a robot and its environment on the basis of visual information about that environment. An interactive robot includes visual sensing means, manipulation means, and computing means. The computing means processes output signals from the visual sensing means to generate proto-objects, which are stored in a memory and represent blobs of interest in the input field of the visual sensing means by at least a 3D position label. The computing means forms object hypotheses about the category of an object by evaluating the proto-objects against different behavior-specific constraints. Based on the hypotheses, and with at least one proto-object as the target of the action, the computing means decides on at least one of a visual tracking movement of the visual sensing means, a movement of the robot's body, and a movement of the manipulation means.

Description

Evaluating visual proto-objects for robot interaction

FIG. 1 is a schematic diagram of the communication paths and the distribution of tasks in a system according to the present invention.

FIG. 2 shows how the coordinates of a stereo image acquisition system are transformed into axes arranged in parallel.

Research on humanoid robots increasingly focuses on interaction in complex environments, including autonomous decision making and complex coordinated behavior.

Robots that evaluate visual information, in particular stereo information, are known to use the information thus obtained about their environment to control their behavior.

The present invention relates to robots with enough degrees of freedom to perform different simple or complex movements.

Accordingly, the invention proposes a technique for improving a robot's interaction with its environment on the basis of visual information about that environment.

This object is achieved by the features of the independent claims. The dependent claims further develop the central idea of the invention.

The invention can be implemented, for example, as a robot that can interact with its environment, driven by visual perception, using simple decision making and coordinated whole-body movement.

One aspect proposed by the invention is to build a system that uses a definition of visual target objects, for example any elongated colored object, in order to implement the basic components of an architecture that can easily be extended to handle longer-term targets.

Aspects of embodiments of the invention are as follows.

● Storing perceptual information as proto-objects in a short-term sensory memory, so that the proto-objects can be used in raw form, for example for visual tracking, or to form stable object hypotheses where these are needed for reaching and grasping them in 3D.

● Decision mechanisms that evaluate behavior and movement alternatives based on sensory information and internal prediction.

● A motion control system that can be driven by a wide range of possible target descriptions and that ensures smooth, well-coordinated whole-body movements by using a set of cost functions as null-space criteria.

The perception system uses color and stereo-based 3D information to detect relevant visual stimuli and keeps this information as proto-objects in a short-term sensory memory.

This sensory memory is then used to derive targets for visual tracking and to form stable object hypotheses from which movement targets for reaching movements can be derived. A prediction-based decision system selects the best movement strategy and executes it in real time. Both the internal prediction and the executed movements use an integrated control system that achieves well-coordinated, smooth whole-body movement by using a flexible target description in task space together with cost functions in null space.

Summary of the Invention

One aspect of the invention relates to an interactive robot comprising visual sensing means, manipulation means, and computing means. The computing means is designed to:

- process output signals from the visual sensing means to generate proto-objects, which are stored in a memory and represent blobs of interest in the input field of the visual sensing means by at least a 3D position label,

- form object hypotheses about the category of an object based on an evaluation of the proto-objects against different behavior-specific constraints, and

- decide, based on the hypotheses and with at least one proto-object as the target of the action, on at least one of a visual tracking movement of the visual sensing means, a movement of the robot's body, and/or a movement of the manipulation means.

The blobs may additionally be represented by at least one of a size, an orientation, a time of sensing, and an accuracy label.

The movement of the manipulation means may comprise at least one of a grasping movement and a poking movement.

The computing means may be designed to consider only proto-objects generated in the recent past.

At least one evaluation criterion for the proto-objects may be their elongation.

At least one evaluation criterion for the proto-objects may be their distance to a behavior-specific reference point.

At least one evaluation criterion for the proto-objects is their stability over time.

Another aspect of the invention relates to a method of controlling an interactive robot comprising visual sensing means, grasping means, and computing means. The method comprises the steps of:

processing output signals from the visual sensing means to generate proto-objects, the proto-objects representing blobs of interest in the input field of the visual sensing means by at least a 3D position label,

forming hypotheses about the category of an object by evaluating the proto-objects, and

deciding, based on the hypotheses and with at least one proto-object as the target of the action, on at least one of a visual tracking movement of the visual sensing means, a movement of the robot's body, and/or a movement of the manipulation means.

Further objects, features, and advantages of the invention will become apparent from the following detailed description of a non-limiting embodiment of the invention, taken together with the accompanying drawings.

In general, the invention proposes building a system that uses a definition of visual target objects, for example any elongated colored object, in order to implement the basic components of an architecture that can easily be extended to handle longer-term targets.

The components of one embodiment of the invention, shown in FIG. 1, are as follows:

● Storing perceptual information as proto-objects in a short-term sensory memory, so that the proto-objects can be used in raw form, for example for visual tracking, or to form stable object hypotheses where these are needed for reaching and grasping them in 3D.

The perception system uses color and stereo-based 3D information to detect relevant visual stimuli and keeps this information as proto-objects in a short-term sensory memory.

This sensory memory is then used to derive targets for visual tracking and to form stable object hypotheses from which movement targets for reaching movements can be derived. A prediction-based decision system selects the best movement strategy and executes it in real time. Both the internal prediction and the executed movements use an integrated control system that achieves well-coordinated, smooth whole-body movement by using a flexible target description in task space together with cost functions in null space.

A. System Overview

FIG. 1 shows the communication paths and the distribution of processing in a computing unit designed to control an interactive autonomous robot.

Stereo color images (see FIG. 2) are acquired continuously by an image acquisition unit and then processed in two parallel pathways. The first pathway is a color segmentation unit designed to extract regions of interest (hereinafter called "blobs").

FIG. 2 shows how the initially unaligned coordinates of the left camera (index "l") and the right camera (index "r") are transformed so that they are aligned in parallel.

The second pathway comprises a 3D information extraction unit, which may also be called a stereo computation block, and which computes for each pixel the visual distance to the image acquisition unit.

The results of the two pathways are recombined to form proto-objects in the form of 3D blobs, which are stabilized over time in a short-term sensory memory.

Object hypotheses are generated by evaluating the current proto-objects stored in the sensory memory against defined criteria. These hypotheses can then be used as targets by different behaviors. The behaviors are, for example, one or more of the following:

- a search or tracking behavior of the head

- a walking or resting behavior of the robot's legs

- a reaching, grasping, or poking behavior of the robot's arms.

Head targets, i.e. targets for the search or tracking behavior of the head, are selected using a fitness function. The fitness function evaluates the "fitness" of different behaviors in terms of defined robot-related objects and multiple objects. Applying the fitness function yields a corresponding scalar fitness value.

Arm and body targets can be generated by selecting the most suitable internal prediction.

The targets for controlling gaze direction, hand position and orientation, and leg movement are fed to a whole-body motion system that generates the motor commands.

The whole-body motion system can be supported by a kinematics-based collision detection system that prevents the robot from damaging itself.

To integrate visual targets into behavior, the visual data and the robot postures are time-labeled. Mechanisms are provided to access the robot posture for a given time. This also requires time synchronization between the image acquisition and motion subsystems.

Several components of the system are described in more detail below.

Vision and control processing is divided into several smaller modules that interact in a data-driven fashion in a real-time environment.

B. Vision

Overview

The processing chain starts with newly acquired color stereo images. These images are fed into two independent parallel pathways, one using the color images and one using gray-scale images. Color processing consists of a color segmentation of the color image, which builds a pixel-wise mask of color similarity that is in turn segmented into compact regions. The gray-scale images can be used to compute the image disparities between the left and right images in order to extract 3D information.

To build a three-dimensional "proto-object" from the acquired vision data, each color segment is converted into a blob (for example an oriented ellipse), for instance using a two-dimensional principal component analysis (PCA) of the pixel positions to estimate the principal orientation and the respective sizes. Using the median of the disparities from the stereo computation, this blob is converted into a 3D representation. Blobs that are too small or whose depth lies outside a given range are ignored.

To stabilize it in time and space, the 3D blob is transformed into world coordinates, because the robot may have moved since the time of acquisition. To access the robot's posture at the time of image acquisition, the system uses the time stamp of the vision data together with a posture buffer organized as a ring buffer that is continuously updated with the latest postures.
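The posture lookup described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `PoseBuffer`, the nearest-timestamp lookup, and the integer millisecond timestamps are all assumptions made for the example.

```python
from collections import deque


class PoseBuffer:
    """Fixed-size ring buffer of (timestamp, posture) pairs (hypothetical helper)."""

    def __init__(self, capacity=100):
        # A deque with maxlen silently drops the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def push(self, timestamp, posture):
        self._entries.append((timestamp, posture))

    def posture_at(self, timestamp):
        """Return the stored posture whose timestamp is closest to `timestamp`."""
        if not self._entries:
            raise LookupError("posture buffer is empty")
        return min(self._entries, key=lambda e: abs(e[0] - timestamp))[1]


buf = PoseBuffer(capacity=3)
for t_ms in (0, 10, 20, 30):          # the entry at t=0 is overwritten
    buf.push(t_ms, {"pan_deg": t_ms})
posture = buf.posture_at(12)          # image time stamp of 12 ms
```

A real system would more likely interpolate between the two neighboring postures; nearest-neighbor lookup is used here only to keep the sketch short.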

The 3D blobs in world coordinates are called proto-objects because they are a preliminary, coarse representation of what might be physical objects. For stabilization over time, the sensory memory compares the current list of proto-object measurements with the predictions of the existing proto-objects for the new time step. Using a metric on blob position, size, and orientation, the sensory memory updates existing proto-objects, instantiates new proto-objects, or deletes proto-objects that have not been confirmed for a certain time. The latter ensures that outliers, as well as occluded objects, do not remain in memory for too long.

Detailed description of the generation of blob-based proto-objects

A detailed description of how object hypotheses are generated from visual proto-object data is given below.

A proto-object as described above can remain in an unspecific state and therefore allows several ways of evaluating it. Each object hypothesis, however, is based on only one specific way of evaluating this data. In a conceivable robot interaction application, the system according to the invention evaluates the visual proto-object data with respect to several evaluation methods at the same time, for example to track any kind of segmentable object or region by head movement while grasping for one specific elongated object.

It is proposed to use pairs of color images labeled with the time of acquisition. As described above and shown in FIG. 1, these images are processed in two parallel pathways:

1. Stereo disparity computation:

Intensity image pairs are generated from the color image pairs. The images are rectified, i.e. transformed so that the result corresponds to images captured with two pinhole cameras that have collinear image rows (see FIG. 2). Horizontal disparities between corresponding features in the two images are computed wherever the features are sufficiently prominent.

2. Color segmentation:

The color image of one of the pair is rectified as above and converted to, for example, the HLS (hue, luminance, saturation) color space. All pixels are evaluated for whether they lie within a specific volume of HLS space, and the result undergoes morphological operations that remove small regions of one pixel class. The resulting pixels within the HLS volume are grouped into contiguous regions in the image plane. The largest resulting groups that exceed a minimum size are selected for further processing.
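The color segmentation step can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: the HLS volume is taken to be an axis-aligned box, grouping uses 4-connectivity, and the morphological clean-up is replaced by the minimum-size filter alone. The function names and the threshold values are hypothetical.

```python
import colorsys
from collections import deque


def in_hls_volume(rgb, h_range, l_range, s_range):
    """True if an RGB pixel (0..255 per channel) lies inside the HLS box."""
    r, g, b = (c / 255.0 for c in rgb)
    h, l, s = colorsys.rgb_to_hls(r, g, b)
    return (h_range[0] <= h <= h_range[1]
            and l_range[0] <= l <= l_range[1]
            and s_range[0] <= s <= s_range[1])


def segment(image, h_range, l_range, s_range, min_size):
    """Group matching pixels into 4-connected regions, largest first."""
    rows, cols = len(image), len(image[0])
    mask = [[in_hls_volume(image[y][x], h_range, l_range, s_range)
             for x in range(cols)] for y in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill over the matching neighbors.
            queue, region = deque([(x, y)]), []
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < cols and 0 <= ny < rows
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) >= min_size:   # drop regions below the minimum size
                groups.append(region)
    return sorted(groups, key=len, reverse=True)


RED, GREY = (255, 0, 0), (128, 128, 128)
image = [[RED, RED, GREY, GREY],
         [RED, GREY, GREY, RED]]
groups = segment(image, (0.0, 0.05), (0.2, 0.8), (0.5, 1.0), min_size=2)
```

In this toy image the three connected red pixels form one group, while the isolated red pixel in the corner is discarded by the size filter.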

For each of the groups from the color segmentation, the center x_p, y_p of the region in the image plane and the median of the disparities d of all its pixels are computed. It is also detected whether the group region touches the image boundaries; if so, the data is labeled as inaccurate, because parts of the real object corresponding to that region are probably outside the field of view.

Using a principal component analysis (PCA) of the correlation matrix of the pixel positions, the standard deviations σ_p1 and σ_p2 of the pixels in the image plane and the orientation ω of the principal axis are computed for each group.
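The per-group statistics can be sketched with a closed-form 2x2 PCA. This is an illustrative assumption about the details: the example uses the covariance matrix of the mean-centered pixel positions, and the function name `blob_from_pixels` is hypothetical.

```python
import math


def blob_from_pixels(pixels):
    """Return center (x_p, y_p), standard deviations (sigma_p1, sigma_p2),
    and principal-axis orientation omega (radians) of a pixel group."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    cxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    cyy = sum((y - my) ** 2 for _, y in pixels) / n
    cxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    mean = (cxx + cyy) / 2.0
    diff = math.hypot((cxx - cyy) / 2.0, cxy)
    sigma_p1 = math.sqrt(mean + diff)
    sigma_p2 = math.sqrt(max(mean - diff, 0.0))
    # Angle of the eigenvector that belongs to the larger eigenvalue.
    omega = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (mx, my), (sigma_p1, sigma_p2), omega


# A 9 x 3 pixel rectangle: elongated along the image x axis.
pixels = [(x, y) for x in range(9) for y in range(3)]
center, (s1, s2), omega = blob_from_pixels(pixels)
```

For this horizontal rectangle the principal axis comes out along x (omega close to zero) and s1 is clearly larger than s2, which is the elongation information used later when forming object hypotheses.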

Using the geometry of the camera system, the coordinates (x_p, y_p, d) and the standard deviations σ_p1 and σ_p2 are transformed into metric coordinates (x_c, y_c, z_c) and metric standard deviations σ_c1 and σ_c2.
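One common way to perform this transformation, assuming an ideal rectified pinhole stereo pair, is shown below. The text does not specify the camera model, so the focal length `f` (in pixels), the `baseline` (in meters), and the principal point `(cx, cy)` are assumptions of this sketch, as is the function name.

```python
def pixel_to_camera(x_p, y_p, d, sigma_p1, sigma_p2, f, baseline, cx, cy):
    """Convert image coordinates plus disparity into metric camera coordinates.

    Assumes a rectified pinhole stereo pair: depth z_c = f * baseline / d.
    """
    if d <= 0:
        raise ValueError("disparity must be positive")
    z_c = f * baseline / d
    scale = z_c / f                       # meters per pixel at depth z_c
    position = ((x_p - cx) * scale, (y_p - cy) * scale, z_c)
    sigmas = (sigma_p1 * scale, sigma_p2 * scale)
    return position, sigmas


pos, sig = pixel_to_camera(x_p=420.0, y_p=240.0, d=25.0,
                           sigma_p1=40.0, sigma_p2=10.0,
                           f=500.0, baseline=0.1, cx=320.0, cy=240.0)
```

With these example numbers the blob lies about 2 m in front of the camera and 0.4 m to the side, and its pixel standard deviations scale by the same meters-per-pixel factor.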

Using the time label of the images, the robot posture and position at the time of image capture are derived and used to transform the position into world coordinates and the principal axis orientation ω into an orientation vector in world coordinates.

A blob can thus be defined as a set of data comprising, as described above, the time label, the position, the orientation, the standard deviations σ_c1 and σ_c2, and a label indicating whether the data is accurate.

Storage of proto-objects in the sensory memory:

Proto-objects are derived from the blob data by taking the incoming blob data and comparing it with the contents of the sensory memory.

If the memory is empty, a proto-object is created from the blob data by assigning a unique identifier to a new proto-object and inserting the incoming blob data into it.

If the sensory memory already contains one or more proto-objects, a prediction is generated for each proto-object in the form of blob data. This predicted blob data is based on all the blob data contained in the proto-object and is generated for the current time.

Each incoming blob is inserted either into an existing proto-object or into a newly created one, based on the minimum distance between the incoming blob and the predicted blobs, so that every incoming blob is assigned to a unique identifier.

The metric for the distance computation is based on both the Euclidean distance and the relative rotation angle.

The inserted blob data is also corrected so that the orientation difference of the new blob is always 90 degrees or less. This is possible because the blob orientation description is ambiguous with respect to 180-degree flips.
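The assignment and orientation correction described above might look as follows. The text names the ingredients of the metric but not how they are combined, so the weighted sum, the `angle_weight` and `max_distance` parameters, and the dictionary layout of a blob are assumptions; orientation vectors are assumed to have unit length. `math.dist` requires Python 3.8 or later.

```python
import math


def blob_distance(a, b, angle_weight=0.5):
    """Euclidean distance between positions plus a weighted relative rotation angle."""
    d_pos = math.dist(a["pos"], b["pos"])
    # abs() makes the angle independent of the sign of the axis (180-degree ambiguity).
    cos_angle = abs(sum(p * q for p, q in zip(a["axis"], b["axis"])))
    d_ang = math.acos(min(1.0, cos_angle))
    return d_pos + angle_weight * d_ang


def align_axis(new_axis, reference_axis):
    """Flip new_axis if it points more than 90 degrees away from the reference."""
    dot = sum(p * q for p, q in zip(new_axis, reference_axis))
    return tuple(-c for c in new_axis) if dot < 0 else tuple(new_axis)


def assign(incoming, predictions, max_distance, next_id):
    """Return the id of the closest predicted proto-object, or next_id for a new one."""
    best_id, best_d = None, max_distance
    for pid, predicted in predictions.items():
        d = blob_distance(incoming, predicted)
        if d < best_d:
            best_id, best_d = pid, d
    return best_id if best_id is not None else next_id


predictions = {7: {"pos": (1.0, 0.0, 0.5), "axis": (1.0, 0.0, 0.0)}}
near = {"pos": (1.05, 0.0, 0.5), "axis": (-1.0, 0.0, 0.0)}
far = {"pos": (3.0, 0.0, 0.5), "axis": (0.0, 1.0, 0.0)}

near_id = assign(near, predictions, max_distance=0.3, next_id=8)
near["axis"] = align_axis(near["axis"], predictions[near_id]["axis"])
far_id = assign(far, predictions, max_distance=0.3, next_id=8)
```

The nearby blob joins proto-object 7 even though its axis was reported flipped by 180 degrees, and the flip is undone before insertion; the distant blob starts a new proto-object.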

Whenever the processing produces new incoming blob data, its time label is compared with the time labels of the blob data in all proto-objects, and all blob data older than a given threshold is deleted. This is done even when the image processing finds no blobs in an image pair. A proto-object that no longer contains any blob data is likewise deleted from the sensory memory.
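A minimal sketch of this clean-up step, assuming the sensory memory is a dictionary that maps a proto-object identifier to a list of blob records with a `"time"` field (this layout and the function name are hypothetical):

```python
def prune(memory, now, max_age):
    """Drop blob data older than max_age, then drop empty proto-objects."""
    for pid in list(memory):              # copy the keys: we delete while iterating
        memory[pid] = [b for b in memory[pid] if now - b["time"] <= max_age]
        if not memory[pid]:
            del memory[pid]
    return memory


memory = {1: [{"time": 0.0}, {"time": 0.9}],
          2: [{"time": 0.1}]}
prune(memory, now=1.0, max_age=0.5)
```

Because `now` comes from the time label of the current image pair, the function can be called on every frame, including frames in which no blob was found.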

The prediction needed for the above comparison is derived by a low-pass filter from the blob data in the proto-object for the position, the orientation, and the standard deviations σ_c1 and σ_c2.
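As an illustration, the low-pass filter can be approximated by first-order exponential smoothing over the blob history. The filter type, the smoothing factor `alpha`, and the record layout are assumptions; the orientation is left out here because it would first need the 180-degree flip handling described earlier.

```python
def low_pass(values, alpha=0.5):
    """First-order low-pass (exponential smoothing) over a time-ordered sequence."""
    estimate = values[0]
    for v in values[1:]:
        estimate = alpha * v + (1.0 - alpha) * estimate
    return estimate


def predict_blob(blobs, alpha=0.5):
    """Predict position and standard deviations from a proto-object's blob history."""
    position = tuple(low_pass([b["pos"][i] for b in blobs], alpha) for i in range(3))
    sigma = tuple(low_pass([b["sigma"][i] for b in blobs], alpha) for i in range(2))
    return {"pos": position, "sigma": sigma}


history = [{"pos": (0.0, 0.0, 1.0), "sigma": (0.2, 0.1)},
           {"pos": (1.0, 0.0, 1.0), "sigma": (0.2, 0.1)}]
prediction = predict_blob(history)
```

The prediction lags behind the newest measurement, which is what makes it robust against single outliers.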

C. Behavior Selection

Object hypotheses are generated by evaluating the proto-objects stored in the sensory memory against specific criteria. For example, the elongation of the proto-objects can be chosen as an evaluation criterion: based on the radii of their ellipsoids, elongated proto-objects are evaluated as behavior targets, while more spherical proto-objects are ignored. The existence of these elongated object hypotheses is the main criterion in the behavior selection mechanisms.

For example, the following two selection mechanisms can be used to control two main groups of behaviors:

- Search and track behaviors can be selected based on a fitness function, for example as described in T. Bergener, C. Bruckhoff, P. Dahm, H. Janssen, F. Joublin, R. Menzner, A. Steinhage, and W. von Seelen, "Complex behaviour by means of dynamical systems for an anthropomorphic robot", Neural Networks, 1999.

The output of the sensory memory can be used, for example, to drive two different head behaviors:

1) searching for objects, and

2) gazing at or tracking objects or blobs.

Separate from these behaviors, a decision instance or "arbiter" is provided that decides which behavior should be active at any time. The arbiter's decision is based solely on a scalar value supplied by the stimulation of each behavior, the "fitness value". This fitness value indicates how well the behavior can be executed at any given time. In this concrete case, tracking needs at least an inaccurate blob position to point the gaze at, but it can of course also use a full object hypothesis. The tracking behavior therefore outputs a fitness of 1 if any blob or object is present and a fitness of 0 otherwise. The search behavior has no prerequisites at all, so its fitness is fixed at 1. For this very simple behavior setup, arbitration is of course trivial. For the sake of extensibility, however, the invention proposes using a competitive dynamical system similar to the one described in Bergener et al., 1999 (see above). The arbiter thus takes a vector made up of the scalar fitness values of all behaviors as input to a competitive dynamics that computes an activation value for each behavior. The competitive dynamics uses a pre-specified inhibition matrix that can encode directed inhibition (behavior A inhibits behavior B but not vice versa) and can thereby be used to specify behavior priorities and even behavior cycles. In this case, tracking can be prioritized over search by such a directed inhibition.
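The idea of directed inhibition can be illustrated with a toy dynamical system. The update rule below is an assumption chosen for simplicity and is not the dynamics from Bergener et al.; `arbitrate`, `steps`, and `dt` are hypothetical names. Entry `inhibition[i][j]` states how strongly behavior `j` suppresses behavior `i`.

```python
def arbitrate(fitness, inhibition, steps=200, dt=0.1):
    """Integrate a simple competitive dynamics and return the final activations."""
    n = len(fitness)
    activation = [0.0] * n
    for _ in range(steps):
        updated = []
        for i in range(n):
            suppression = sum(inhibition[i][j] * activation[j] for j in range(n))
            drive = fitness[i] * max(0.0, 1.0 - suppression)
            # Euler step: the activation relaxes toward the (inhibited) drive.
            updated.append(activation[i] + dt * (drive - activation[i]))
        activation = updated
    return activation


SEARCH, TRACK = 0, 1
inhibition = [[0.0, 1.0],    # search is inhibited by tracking
              [0.0, 0.0]]    # tracking is not inhibited by search

with_blob = arbitrate([1.0, 1.0], inhibition)   # a blob is visible
no_blob = arbitrate([1.0, 0.0], inhibition)     # nothing to track
```

With a blob present both fitness values are 1, but the one-way inhibition lets tracking win; without a blob, tracking has zero fitness and search becomes active.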

The search behavior is realized using a very low-resolution (5 by 7) inhibition-of-return map with a simple relaxation dynamics. When the search behavior is active and new vision data becomes available, it increases the value of the current gaze direction in the map and selects the lowest value in the map as the new gaze target. In addition, the whole map is subject to relaxation toward zero and a small amount of additive noise.

This produces a visual search pattern with a random sequence of fixations that immediately takes all visual information into account and results in an efficient and fast search for relevant objects. The size of the inhibition-of-return map is derived from the camera's field of view relative to the pan/tilt movement range. A higher resolution would not change the search significantly. The relaxation time constant is set in the range of seconds, so that robot movement, which would effectively invalidate the inhibition map, is not a problem.
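A compact sketch of such an inhibition-of-return map follows. The class name and the parameters `gain`, `decay`, and `noise` are assumptions; only the 5 by 7 size, the increase at the current gaze cell, the choice of the lowest cell, and the relaxation with additive noise come from the description.

```python
import random


class InhibitionOfReturnMap:
    """5 x 7 grid over the pan/tilt range; high values mean 'recently looked at'."""

    def __init__(self, rows=5, cols=7, gain=1.0, decay=0.9, noise=0.01, seed=0):
        self.values = [[0.0] * cols for _ in range(rows)]
        self.gain, self.decay, self.noise = gain, decay, noise
        self.rng = random.Random(seed)

    def next_target(self, current):
        """Inhibit the current gaze cell, relax the map, return the lowest cell."""
        r, c = current
        self.values[r][c] += self.gain
        for row in self.values:
            for i in range(len(row)):
                # Relaxation toward zero plus a small amount of additive noise.
                row[i] = self.decay * row[i] + self.rng.uniform(0.0, self.noise)
        cells = [(v, (ri, ci))
                 for ri, row in enumerate(self.values)
                 for ci, v in enumerate(row)]
        return min(cells)[1]


ior = InhibitionOfReturnMap()
gaze = (2, 3)
visited = [gaze]
for _ in range(10):
    gaze = ior.next_target(gaze)
    visited.append(gaze)
```

With these parameters a visited cell stays more inhibited than any unvisited one for many steps, so the first fixations all land on different cells, and the noise randomizes their order.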

The tracking behavior can be implemented as multi-tracking of three-dimensional points. The behavior takes all relevant proto-objects and object hypotheses and computes the pan/tilt angles that would center each of them in the field of view. A cost function with a trapezoidal shape in pan/tilt coordinates is then used to find the pan/tilt angle that keeps the maximum number of objects within the effective field of view of the cameras, and this angle is sent as the pan/tilt command. Because the tracking behavior always uses the stabilized output of the sensory memory, the robot will keep gazing at a particular position even if the blob disappears for a short time. This considerably improves the performance of the overall system.
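The gaze selection can be sketched as a search over candidate pan/tilt angles scored with a trapezoidal weighting. The product of a pan and a tilt trapezoid, the grid of candidates, and the `flat` and `edge` half-widths (in degrees) are assumptions of this example; the conversion from 3D points to pan/tilt angles is omitted.

```python
def trapezoid(offset, flat, edge):
    """1 inside +-flat, falling linearly to 0 at +-edge."""
    a = abs(offset)
    if a <= flat:
        return 1.0
    if a >= edge:
        return 0.0
    return (edge - a) / (edge - flat)


def best_gaze(targets, candidates, flat=(10.0, 8.0), edge=(20.0, 15.0)):
    """Pick the candidate (pan, tilt) that keeps the most targets in view.

    targets and candidates are (pan, tilt) angles in degrees.
    """
    def score(gaze):
        return sum(trapezoid(p - gaze[0], flat[0], edge[0])
                   * trapezoid(t - gaze[1], flat[1], edge[1])
                   for p, t in targets)
    return max(candidates, key=score)


targets = [(0.0, 0.0), (12.0, 2.0), (40.0, 0.0)]
candidates = [(float(pan), 0.0) for pan in range(-30, 31, 5)]
gaze = best_gaze(targets, candidates)
```

The flat top of the trapezoid means that several gaze directions can be equally good; here the two nearby targets are both kept fully in view, while the third, far to the side, is given up.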

Other behaviors that use hard-coded criteria based on internal predictions are discussed in the next section.

Using these selection mechanisms, the system can search for elongated objects or track one or more of them. At the same time, the robot reaches for an elongated object using the most appropriate arm, with the palm aligned to the principal axis of the object, and additionally selects an appropriate walking motion if the object is too close or too far away. If no target is available, the robot stops walking and moves its arms to a rest position.

The evaluations are also based on the blob data predictions of all proto-objects. The label of such a prediction is set to "memorised" if the most recent blob data in the proto-object is older than the prediction time. Otherwise, it is set to the label of the most recent blob data in the proto-object.
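The labelling rule can be written as a short function. The `Blob` record and its field names are hypothetical, and the label strings other than "memorised" are placeholders for whatever accuracy labels the vision system assigns.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    timestamp: float  # time the blob was measured, in seconds (assumed unit)
    label: str        # accuracy label assigned by the vision system


def prediction_label(latest_blob: Blob, prediction_time: float) -> str:
    """Label a prediction made for prediction_time from the newest blob."""
    if latest_blob.timestamp < prediction_time:
        # The newest measurement predates the prediction: rely on memory.
        return "memorised"
    # Otherwise the prediction inherits the label of the newest measurement.
    return latest_blob.label
```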

The minimum criterion, which is already sufficient for the fixation and tracking behaviors, is a blob labeled as inaccurate. Any blob labeled as inaccurate may also be used for the approach behavior. Stable object hypotheses are extracted by applying stricter criteria, such as stable values of σ_c1 and σ_c2 and a maximum distance that avoids relying on poor vision data. To execute manipulation behaviors such as "poke balloon", additional constraints can be placed on the stable object hypotheses, such as an approximately spherical shape ((σ_c1 − σ_c2)/σ_c1 < threshold) and the easiest execution of the behavior (for example, a minimum distance to a behavior-specific reference point for poking in front of the body). A behavior such as "power grasp object" would require a minimum elongation for grasp stability ((σ_c1 − σ_c2)/σ_c1 > threshold) and a suitable diameter (lower threshold < σ_c2 < upper threshold).
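The shape criteria for the two manipulation behaviors can be expressed as simple predicates on a stable object hypothesis. In this sketch the `ObjectHypothesis` record, the function names and every numeric threshold are invented for illustration. The source only states that such thresholds exist.

```python
from dataclasses import dataclass


@dataclass
class ObjectHypothesis:
    sigma_c1: float               # extent along the first principal axis
    sigma_c2: float               # extent along the second principal axis
    distance_to_reference: float  # distance to the behavior-specific reference point


def elongation(h: ObjectHypothesis) -> float:
    """Relative elongation (sigma_c1 - sigma_c2) / sigma_c1."""
    return (h.sigma_c1 - h.sigma_c2) / h.sigma_c1


def suitable_for_poke(h, max_elongation=0.2):
    # Approximately spherical shape; the threshold is an assumed value.
    return elongation(h) < max_elongation


def suitable_for_power_grasp(h, min_elongation=0.5, min_diameter=0.03, max_diameter=0.08):
    # Elongated enough for a stable grasp and of graspable diameter (assumed values).
    return elongation(h) > min_elongation and min_diameter < h.sigma_c2 < max_diameter


def pick_poke_target(hypotheses):
    """Among pokeable hypotheses, prefer the one closest to the reference point."""
    candidates = [h for h in hypotheses if suitable_for_poke(h)]
    return min(candidates, key=lambda h: h.distance_to_reference, default=None)
```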

E. Whole Body Motion and Prediction

Using the targets for the head, arms, and legs, the system can generate motor commands by means of a whole body controller, the basics of which are described in M. Gienger, H. Janssen, and C. Goerick, "Task oriented whole body motion for humanoid robots", Humanoids 2005.

The principle of the whole body motion is to use a flexible description of the task space and to use the null space to satisfy several optimization criteria, such as compensating for center-of-mass movement and avoiding joint limits.
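The source gives no formula for how the null space is used. A common textbook formulation of redundancy resolution, shown here purely as an assumed sketch, maps the task velocity through the pseudoinverse of the task Jacobian and projects the gradient of a secondary cost into the null space so that it cannot disturb the task. The function names and the quadratic joint-limit cost are illustrative choices.

```python
import numpy as np


def redundant_joint_velocity(jacobian, task_velocity, secondary_gradient, gain=1.0):
    """Joint velocity that tracks the task and descends a secondary cost in the null space."""
    J = np.asarray(jacobian, dtype=float)
    J_pinv = np.linalg.pinv(J)
    # Projector onto the null space of J: motion here leaves the task unchanged.
    null_projector = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ np.asarray(task_velocity, dtype=float) \
        - gain * null_projector @ np.asarray(secondary_gradient, dtype=float)


def joint_limit_gradient(q, q_min, q_max):
    """Gradient of 0.5 * sum(((q - mid) / range)**2), which grows near the joint limits."""
    q, q_min, q_max = (np.asarray(v, dtype=float) for v in (q, q_min, q_max))
    mid = 0.5 * (q_min + q_max)
    return (q - mid) / (q_max - q_min) ** 2
```

Because the secondary term is projected into the null space, multiplying the result by the Jacobian returns the requested task velocity whenever the Jacobian has full row rank.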

Because the computational cost of the whole body motion is sufficiently low, it can be used not only to generate the robot motion directly but also, owing to its fast convergence, to simulate different behaviors on a time scale faster than real time.

It is therefore used to support the behavior selection for walking and arm movements. Four internal simulations continuously attempt to reach the target object from the current posture, using either the left or the right arm while either standing or walking. A metric is then used to select the optimal behavior, which is executed in real time.
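The selection among the four internal simulations can be sketched as follows. The sketch assumes that the four simulations are the combinations of arm and gait, and that the metric is a scalar cost where lower is better. The source does not specify the metric, so `toy_simulate` is a made-up stand-in for a real whole body simulation, with an assumed arm reach of 0.6 m and an assumed fixed walking penalty.

```python
from itertools import product


def select_behaviour(simulate, target, posture):
    """Run all four arm/gait simulations and return the cheapest (arm, gait) pair."""
    options = list(product(("left", "right"), ("standing", "walking")))
    costs = {option: simulate(option[0], option[1], target, posture) for option in options}
    return min(costs, key=costs.get)


def toy_simulate(arm, gait, target, posture):
    """Hypothetical stand-in for an internal simulation; returns a made-up cost."""
    shoulder_y = 0.2 if arm == "left" else -0.2
    dx = target[0] - posture[0]
    dy = target[1] - (posture[1] + shoulder_y)
    if gait == "walking":
        # Walking can always bring the target into reach, at a fixed effort penalty.
        return abs(dy) + 0.5
    # Standing: penalize any distance beyond the assumed 0.6 m arm reach.
    reach_error = max(0.0, (dx ** 2 + dy ** 2) ** 0.5 - 0.6)
    return abs(dy) + 10.0 * reach_error
```

With this toy metric, a nearby target on the left is reached with the left arm while standing, and a distant target on the right triggers walking with the right arm.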

F. Collision Detection

To ensure the safety of the robot during operation, a real-time collision detection algorithm is used. The collision detection uses an internal hierarchical description of the robot body in terms of spheres and sphere-swept lines, which is used together with kinematic information to compute the distances between segments (limbs and body parts) of the robot. If any of these distances falls below a threshold, high-level motion control is disabled, so that only the dynamic stabilization of the bipedal walking remains active.
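A distance check between spheres and sphere-swept lines can be sketched by treating both as capsules, with a sphere being a capsule whose two end points coincide. The closest-point computation below follows the standard clamped segment-to-segment method. The function names, the tuple layout `(start, end, radius)` and the 2 cm threshold are assumptions for illustration, and the hierarchical body description is not modelled.

```python
import numpy as np


def segment_distance(p1, q1, p2, q2, eps=1e-12):
    """Shortest distance between segments p1-q1 and p2-q2 (a point is a degenerate segment)."""
    p1, q1, p2, q2 = (np.asarray(v, dtype=float) for v in (p1, q1, p2, q2))
    d1, d2, r = q1 - p1, q2 - p2, p1 - p2
    a, e, f = d1 @ d1, d2 @ d2, d2 @ r
    if a <= eps and e <= eps:            # both segments are points
        s = t = 0.0
    elif a <= eps:                       # first segment is a point
        s, t = 0.0, float(np.clip(f / e, 0.0, 1.0))
    else:
        c = d1 @ r
        if e <= eps:                     # second segment is a point
            s, t = float(np.clip(-c / a, 0.0, 1.0)), 0.0
        else:
            b = d1 @ d2
            denom = a * e - b * b        # zero when the segments are parallel
            s = float(np.clip((b * f - c * e) / denom, 0.0, 1.0)) if denom > eps else 0.0
            t = (b * s + f) / e
            if t < 0.0:
                s, t = float(np.clip(-c / a, 0.0, 1.0)), 0.0
            elif t > 1.0:
                s, t = float(np.clip((b - c) / a, 0.0, 1.0)), 1.0
    return float(np.linalg.norm((p1 + s * d1) - (p2 + t * d2)))


def capsule_clearance(capsule_a, capsule_b):
    """Surface-to-surface distance between two (start, end, radius) capsules."""
    (p1, q1, r1), (p2, q2, r2) = capsule_a, capsule_b
    return segment_distance(p1, q1, p2, q2) - r1 - r2


def high_level_control_enabled(capsules, pairs, threshold=0.02):
    """False as soon as any checked pair of body segments is closer than threshold."""
    return all(capsule_clearance(capsules[i], capsules[j]) >= threshold for i, j in pairs)
```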

The collision detection serves as a last safety measure and is not triggered during normal operation of the robot.

In addition, a simple collision avoidance limits the positions of all movement targets, so that, for example, wrist target positions inside or very close to the body are never generated.

V. Summary

The design and implementation of a system has been presented that enables a humanoid robot to interact with its visual environment using both its legs and its arms.

Targets for interaction are based on visually extracted proto-objects. The control system allows the robot to increase its range of interaction, to pursue multiple targets simultaneously, and to avoid undesired postures. Several different selection mechanisms are used to switch between different kinds of behaviors at any time and in any posture.

According to the present invention, the interaction of a robot with its environment can be improved on the basis of visual information about that environment.

Claims

An interactive robot comprising visual sensing means, manipulation means and computing means, the computing means being designed to:

process output signals from the visual sensing means to generate proto-objects stored in a memory, the proto-objects representing blobs of interest in the input field of the visual sensing means by at least a 3D position label;

form object hypotheses about the category of the object based on the evaluation of the proto-objects against different behavior-specific constraints; and

decide, based on the hypotheses and with at least one proto-object as a target for the movement, at least one of a visual tracking movement of the visual sensing means, a movement of the body of the robot and/or a movement of the manipulation means.

The interactive robot of claim 1, wherein the blobs are further represented by at least one of size, orientation, time of detection and an accuracy label.

The interactive robot of claim 1 or 2, wherein the movement of the manipulation means comprises at least one of a grasping and a poking movement.

The interactive robot of claim 1, wherein the computing means is designed to consider only proto-objects generated in the recent past.

The interactive robot of claim 1, wherein at least one evaluation criterion for the proto-objects is their elongation.

The interactive robot of claim 1, wherein at least one evaluation criterion for the proto-objects is their distance relative to a behavior-specific reference point.

The interactive robot of claim 1, wherein at least one evaluation criterion for the proto-objects is their stability over time.

A method of controlling an interactive robot comprising visual sensing means, manipulation means and computing means, the method comprising:

processing output signals from the visual sensing means to generate proto-objects, the proto-objects representing blobs of interest in the input field of the visual sensing means by at least a 3D position label;

forming hypotheses about the category of the object by evaluating the proto-objects; and

deciding, based on the hypotheses and with at least one proto-object as a target for the movement, at least one of a visual tracking movement of the visual sensing means, a movement of the body of the robot and/or a movement of the manipulation means.

The method of claim 7, wherein the proto-objects are discarded after a predetermined time period has elapsed or when the movement of the body of the robot reaches a predetermined threshold.

The method according to claim 8 or 9, wherein the blobs are further represented by at least one of size, orientation, time of detection and an accuracy label.

The method according to any one of claims 8 to 10, wherein the movement of the manipulation means comprises at least one of a grasping and a poking movement.

The method according to any one of claims 8 to 11, wherein the computing means is designed to consider only proto-objects generated in the recent past.

The method according to any one of claims 8 to 12, wherein at least one evaluation criterion for the proto-objects is their elongation.

The method according to any one of claims 8 to 13, wherein at least one evaluation criterion for the proto-objects is their distance relative to a behavior-specific reference point.

The method according to any one of claims 8 to 14, wherein at least one evaluation criterion for the proto-objects is their stability over time.

A computer software program product implementing the method of any one of claims 8 to 15 when executed on a computing device of an interactive robot.