KR20090113553A

KR20090113553A - Semantic active object recognition

Info

Publication number: KR20090113553A
Application number: KR1020080039337A
Authority: KR
Inventors: 서일홍; 류광근
Original assignee: 한양대학교 산학협력단
Priority date: 2008-04-28
Filing date: 2008-04-28
Publication date: 2009-11-02
Also published as: KR101031542B1

Abstract

PURPOSE: A semantic active object recognition algorithm is provided to increase the recognition rate of a target object even when salient features of the object are difficult to recognize. CONSTITUTION: An object recognition method using an input image from a camera comprises the steps of: preparing multi-sided images of each object; calculating and storing in advance the entropy of each of the multi-sided images; determining a target object; determining, from the input image, an estimation region presumed to be the target object; determining whether the estimation region is the target object; if not, determining which side of the target object the estimation region corresponds to; and moving the viewpoint, based on that side, to a position from which a multi-sided image having lower entropy can be obtained.

Description

Active Object Recognition Algorithm Combining Knowledge {SEMANTIC ACTIVE OBJECT RECOGNITION}

The present invention relates to an object recognition algorithm using images. More specifically, it relates to a method in which a top-down/bottom-up visual attention technique and an information-based next-viewpoint determination technique are developed and applied to an object recognition algorithm in order to increase recognition accuracy in complex environments.

The present invention can be applied to object recognition by mobile robots, person recognition by mobile robots, surveillance cameras, mobile phone cameras, video cameras, computer games, video chat, USB cameras for computers, digital cameras, cameras of mobile robots, and interactive games using digital cameras.

Conventional object recognition work has mainly focused on developing algorithms that extract robust visual features. A robust visual feature is one that, having been extracted when the object is standing upright, is still extracted at the same location even when the object is subjected to external influences such as rotation, scale change, and brightness change. Objects are then recognized on the basis of these robust visual features.

Conventionally, object recognition has been performed by obtaining visual features from a single image and comparing them with the visual features of previously stored objects. However, as shown in FIG. 1, recognition must sometimes be performed when there are several objects in the scene, when the object to be found is partially occluded, or when the background is cluttered. In such situations the reliability of a result obtained from a single picture inevitably drops. Therefore, by taking several consecutive pictures and performing recognition on this sequential data, recognition can succeed and the reliability of the result can be increased.

An object of the present invention is to provide a method that can increase the recognition rate of a target object even when the salient features of the object are difficult to recognize, and a robot or the like that uses the method.

To increase the accuracy of object recognition, more data must be obtained by taking several pictures. This is the same principle as a person looking closely at an object to confirm what it is. To make this behavior possible algorithmically, two techniques must be added to conventional object recognition. One is a top-down/bottom-up visual attention technique that locates the object to be found in the input image by considering both knowledge about the object and the object's own visual features. The other is a technique that, in order to move to a viewpoint from which the object can be recognized better, recommends the next action on the basis of entropy so that the camera moves to where the amount of information is highest. Adding these techniques can increase the accuracy of object recognition.

The present invention proposes an improved object recognition algorithm that applies a visual attention technique and a next-viewpoint determination technique in order to increase recognition accuracy. The invention consists of a top-down/bottom-up visual attention technique for obtaining the part of the visual information from a digital camera that corresponds to the object to be recognized, a technique for moving the viewpoint on the basis of the amount of information in order to raise the recognition rate of that object, and an object recognition algorithm. The top-down/bottom-up visual attention technique is characterized by combining top-down information, such as the characteristic surface information of the object and the place where it was located, with bottom-up information based on the saliency of the visual information obtained from images captured by a camera or the like. The technique for moving the viewpoint calculates the amount of information of the object and moves toward the face of the object that carries more information. In addition, the change in the amount of information is associated with actions so that the action most suitable for raising the recognition rate is recommended.

The invention can be applied to existing object recognition algorithms and can improve their recognition performance.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

The description below assumes the basic configuration of a conventional robot, namely that the robot is equipped with an imaging device for capturing images, a drive unit for movement, a memory for storing images, a processor for analyzing image data, and the like. It focuses on the algorithm that is installed on such a robot to increase the recognition rate of objects.

FIG. 2 is a flowchart schematically showing object recognition according to an embodiment of the present invention.

First, image data captured by the robot's imaging device is received.

Next, bottom-up and top-down visual attention is used to focus on the part of the image data suspected of being the target object.

The bottom-up and top-down visual attention scheme is described here with reference to FIG. 3.

To this end, the robot first goes through a process of finding the object by performing visual attention that relies on the visual features inherent to the object being sought. These visual features include the object's own color, contrast, dominant color, contour, orientation, and the like.

Once the object has been found, the robot recognizes the place where the object was located and the surrounding objects, and stores this information together with the object.

When the object has to be found again, attention is directed by combining two methods: top-down visual attention, which predicts where the object is located using the stored information about the place where it was previously located and about its surrounding objects, and a bottom-up method, which uses the object's inherent visual features together with the salient visual information obtained from the current scene.

That is, as shown in FIG. 3, the object context (OBJECT CONTEXT) can be stored as meaningful knowledge. It comprises place information, recognition information for surrounding objects, and distance and orientation information relative to the surrounding objects.
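As an illustration of how such an object context might be held in memory, the following is a minimal Python sketch. The class names, field names, units, and the example values are all hypothetical; the source only states which kinds of information are stored (place, recognized surrounding objects, and their distance and orientation).

```python
from dataclasses import dataclass, field


@dataclass
class NeighborObservation:
    """One surrounding object seen near the target (hypothetical record)."""
    label: str          # recognition result for the neighbor, e.g. "kettle"
    distance_m: float   # distance from the target to the neighbor (assumed metres)
    bearing_deg: float  # orientation of the neighbor relative to the target


@dataclass
class ObjectContext:
    """Context stored once the target object has been found."""
    target: str
    place: str
    neighbors: list[NeighborObservation] = field(default_factory=list)

    def neighbor_labels(self) -> set[str]:
        """Labels of all stored surrounding objects."""
        return {n.label for n in self.neighbors}


# Example values are invented for illustration only.
cup_context = ObjectContext(
    target="cup",
    place="kitchen table",
    neighbors=[NeighborObservation("kettle", 0.3, 90.0)],
)
```

A record of this kind could later be compared against what is visible in a new scene to decide where the target is likely to be.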

The top-down method provides very useful information when the object is occluded by another object, or when so many objects are mixed together nearby that the target is difficult to distinguish in the image. It is therefore preferable to direct attention using the bottom-up method, which relies on the visual features inherent to the object, in parallel with the top-down method.
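The combination of the two attention cues can be sketched as follows. The source does not say how the bottom-up and top-down information are merged, so the weighted sum, the `weight` parameter, the function names, and the example maps below are assumptions made purely for illustration.

```python
def combine_attention(bottom_up, top_down, weight=0.5):
    """Blend two same-sized 2-D maps whose values lie in [0, 1].

    bottom_up: saliency from the current image (color, contrast, ...).
    top_down:  prior from stored context (where the object is expected).
    weight:    share given to the top-down prior (assumed parameter).
    """
    return [
        [(1.0 - weight) * b + weight * t for b, t in zip(b_row, t_row)]
        for b_row, t_row in zip(bottom_up, top_down)
    ]


def most_salient_cell(attention):
    """Return (row, col) of the largest value in the combined map."""
    best = max(
        (value, r, c)
        for r, row in enumerate(attention)
        for c, value in enumerate(row)
    )
    return best[1], best[2]


# Invented 2x2 maps: the image alone favours cell (0, 0),
# while the stored context favours cell (1, 1).
bottom_up = [[0.9, 0.2], [0.1, 0.6]]
top_down = [[0.0, 0.1], [0.2, 1.0]]
focus = most_salient_cell(combine_attention(bottom_up, top_down))
```

With equal weighting, the context prior shifts the focus away from the cell that is most salient in the image alone, which is the behavior the text describes for occluded or cluttered scenes.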

The object is recognized from the image region extracted and attended to in this way. If it cannot be confirmed that the attended region is the target object, a new image that allows confirmation must be acquired.

In particular, the new image must be data that can lead to a correct decision. The viewpoint from which the image data is acquired must therefore be chosen rationally.

To this end, the present invention adopts an entropy-based viewpoint determination scheme, which is described with reference to the schematic diagram in FIG. 4.

Beforehand, the robot stores front, back, left, right, top, and bottom images of the object, and likewise for every object of interest. For each of these images, the entropy is calculated from the probability that the image provides a decisive clue for identifying the object set against the probability that it leads to misidentification as another object, and is stored in advance.
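One way to read this pre-computation is as a Shannon entropy over the identities a recognizer might assign when shown a given face. The sketch below follows that reading. The source does not give the formula, so the use of Shannon entropy, the function names, and the probability values are assumptions for illustration.

```python
import math


def shannon_entropy(probabilities):
    """Entropy in bits of a discrete distribution; zero terms contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def view_entropies(confusion):
    """Map each view name to the entropy of its identity distribution.

    confusion: {view: {object_label: probability}}, where each inner dict
    says how likely the view is to be taken for each known object.
    """
    return {view: shannon_entropy(dist.values()) for view, dist in confusion.items()}


# Invented numbers: the front of the cup is easily confused with a mug,
# while the left side (say, with a handle) identifies it unambiguously.
cup_views = {
    "front": {"cup": 0.5, "mug": 0.5},
    "left": {"cup": 1.0, "mug": 0.0},
}
entropies = view_entropies(cup_views)
```

Under this reading, a face that is often mistaken for another object has high entropy, and a face that identifies the object decisively has entropy close to zero.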

The robot determines which of the six faces the object in the currently captured image data corresponds to, and then either moves to a viewpoint from which an image with lower entropy can be obtained or decides on such an action.

That is, based on the attended region of the currently captured image data, the best direction of travel is determined for placing the robot at a viewpoint from which reliable image data can be obtained.
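The viewpoint choice described above can be sketched as a lookup in the stored per-face entropies. This is a minimal sketch under assumptions: the face names, the entropy values, and the rule of always picking the globally lowest-entropy face (ignoring travel cost or reachability) are not taken from the source.

```python
def choose_next_view(current_view, entropies):
    """Return the face with the lowest stored entropy, or None.

    None means the current face already has the lowest entropy,
    so no move is recommended.
    """
    best_view = min(entropies, key=entropies.get)
    if entropies[best_view] >= entropies[current_view]:
        return None
    return best_view


# Invented pre-stored entropies (in bits) for the six faces of one object.
stored = {
    "front": 1.0, "back": 0.9, "left": 0.2,
    "right": 0.7, "top": 1.3, "bottom": 1.5,
}
next_view = choose_next_view("front", stored)
```

A real system would still have to translate the chosen face into a robot motion, for example by moving around the object until that face is visible.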

FIG. 4 illustrates this process schematically. Referring to FIG. 4, when the image data is judged, by means of SIFT, CCH (Color Co-occurrence Histogram), DC (Dominant Color) algorithms and the like, to show the front of the object, the next viewpoint is chosen in the direction that lowers the entropy, and the action is decided accordingly.
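To give a feel for one of the cues named here, the following sketch computes a crude dominant color by quantizing RGB values into coarse bins and taking the most populated bin. It is a simplified stand-in, not the DC algorithm used in the source, and the function name, the `levels` parameter, and the sample pixels are hypothetical.

```python
from collections import Counter


def dominant_color(pixels, levels=4):
    """Return the centre of the most populated RGB bin.

    pixels: iterable of (r, g, b) tuples with 0-255 channels.
    levels: bins per channel (assumed default; should divide 256).
    """
    step = 256 // levels
    bins = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    (rb, gb, bb), _count = bins.most_common(1)[0]
    half = step // 2
    return (rb * step + half, gb * step + half, bb * step + half)


# Invented patch: two reddish pixels and one bluish pixel.
patch = [(250, 10, 10), (240, 20, 5), (10, 10, 200)]
color = dominant_color(patch)
```

Comparing such a descriptor against the ones stored for each face is one conceivable way to judge which face is currently visible.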

After the move, as shown in FIG. 2, image data is acquired again and the above steps are repeated.

Finally, the loop terminates when the target object is judged to match the image data.
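The overall cycle of capturing, attending, recognizing, and moving can be sketched as a loop over caller-supplied functions. Every callback name and signature below is hypothetical, and the `max_steps` limit is an added safeguard that the source does not mention.

```python
def active_recognition(capture, attend, recognize, plan_move, move, max_steps=10):
    """Repeat capture -> attend -> recognize -> move until confirmed.

    capture():         returns the current camera image.
    attend(image):     returns the region suspected to be the target.
    recognize(region): returns (confirmed, visible_face).
    plan_move(face):   returns an action toward a lower-entropy view, or None.
    move(action):      executes the action on the robot.
    Returns the number of images used, or None if never confirmed.
    """
    for step in range(1, max_steps + 1):
        region = attend(capture())
        confirmed, visible_face = recognize(region)
        if confirmed:
            return step
        action = plan_move(visible_face)
        if action is None:
            break
        move(action)
    return None


# Toy stand-ins: the "image" is just the name of the face in view, and the
# target can only be confirmed from the left.
state = {"view": "front"}
images_used = active_recognition(
    capture=lambda: state["view"],
    attend=lambda image: image,
    recognize=lambda region: (region == "left", region),
    plan_move=lambda face: "left" if face != "left" else None,
    move=lambda action: state.update(view=action),
)
```

In this toy run the first image is inconclusive, the robot moves to the left view, and the second image confirms the target.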

FIG. 5 is a screen in which the images used when an image is captured and the algorithm of the present invention is applied are each displayed in their own window.

As shown in FIG. 5, dividing the "camera input" image by contour and color yields the "segmentation" image. When a cup is selected as the "object model", the "visual attention" image is obtained by the top-down/bottom-up method. The path along which the robot should move to recognize the cup more reliably, that is, the "recommended action", is then rendered graphically as a direction.

FIG. 6 is a user interface screen for selecting, among other things, the bottom-up/top-down functions used in visual attention processing. Here the features of the cup to be used for bottom-up attention, such as color, brightness, and orientation, can also be selected.

FIG. 7 is a user interface window for selecting the algorithm to be applied to an object image. Referring to FIG. 7, the panel to the right of the object image shows the dominant colors, and "feature 2" below it shows the SIFT result. The panel to the left shows an entropy map, which maps the entropy of the colors used to distinguish the objects in the object list.

Although several embodiments of the present invention have been shown and described, those of ordinary skill in the art to which the invention pertains will appreciate that these embodiments may be modified without departing from the principles or spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

FIG. 1 is a photograph showing a situation in which the object to be found is mixed among several other objects.

FIG. 3 is a schematic diagram for explaining the bottom-up/top-down scheme.

FIG. 4 is a diagram showing how the action for obtaining the next viewpoint is decided on the basis of the appearance of the object in the currently captured image data and the corresponding entropy.

FIG. 6 is a user interface screen for selecting, among other things, the bottom-up/top-down functions used in visual attention processing.

FIG. 7 is a user interface window for selecting the algorithm to be applied to an object image.

Claims

An object recognition method using an input image from a camera, the method comprising:

(a) preparing, for each object of interest, multi-sided images of the object viewed from several angles, and calculating and storing in advance the entropy of each of the multi-sided images;

(b) determining a target object;

(c) determining, from the input image, an estimation region presumed to be the target object;

(d) determining whether or not the estimation region represents the target object;

(e) if the estimation region is not determined to be the target object, determining which side of the target object the estimation region corresponds to; and

(f) moving the viewpoint, based on the side of the target object indicated by the estimation region, to a position from which one of the multi-sided images of the target object having a lower entropy can be obtained.

The method of claim 1,

wherein the entropy is determined based on the probability that the multi-sided image of the object of interest is misjudged as another object.

The method of claim 1,

further comprising (g) if it is determined in step (d) that the estimation region represents the target object, storing information on the surroundings of the estimation region as contextual information associated with the target object.

The method of claim 1,

wherein step (c) of determining the estimation region presumed to be the target object from the image input from the camera comprises evaluating the degree of agreement between the contextual information associated with the target object and the surrounding information of the estimation region, and reflecting the result in the determination.

The method according to claim 3 or 4,

wherein the contextual information includes place information, identification information of surrounding objects, and distance and orientation information relative to the surrounding objects.