KR20190044950A

KR20190044950A - An apparatus and method for recognizing an object based on a single inference multiple label generation

Info

Publication number: KR20190044950A
Application number: KR1020170137596A
Authority: KR
Inventors: 조충상; 이영한; 정혜동
Original assignee: 전자부품연구원
Priority date: 2017-10-23
Filing date: 2017-10-23
Publication date: 2019-05-02
Also published as: KR102041987B1

Abstract

The present invention relates to an apparatus for recognizing an object based on single-inference multi-label generation, comprising: an object recognition unit that recognizes an object and a background to be inferred from an input image; a database that stores and manages single label information, which takes a different form depending on the space, place, and destination; a context information providing unit that provides context information of the image so that the recognized object information can be matched to label information of the object; and an object label processing unit that selects the single label information stored in the database based on the context information input from the context information providing unit, matches the object information recognized by the object recognition unit to the selected single label information, and outputs the resulting object label information.

Description

[0001] The present invention relates to an apparatus and method for recognizing an object based on single-inference multi-label generation.

The present invention relates to an apparatus and method for object recognition based on single-inference multi-label generation, and more particularly to an apparatus and method that extract objects from an input image using deep learning and, rather than recognizing objects separately for each domain, label the objects by matching them against labels that encompass higher-level concepts.

A conventional object recognition apparatus performs inference based on a model trained for the domain to be searched and uses the result for object inference.

A conventional object recognition apparatus with this structure must first carry out the work of creating a model suited to the desired domain.

Because the conventional apparatus generates a model fitted to one domain, it cannot produce optimal results when the domain changes.

Meanwhile, Patent Application No. 10-2015-0132625 (Object recognition apparatus and method, and object recognition model learning apparatus and method) was filed on September 18, 2015. That application describes a conventional object recognition apparatus that performs a setting for each pixel in an image frame, labels each pixel using a deep-neural-network-based model, and recognizes an object based on the labeled first pixels.

The present invention has been devised to solve the problems described above. Its object is to provide an apparatus and method for object recognition based on single-inference multi-label generation that can perform object recognition regardless of domain conditions by training object recognition on label data that includes higher-level concepts.

The objects of the present invention are not limited to the one mentioned above; other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, an apparatus for object recognition based on single-inference multi-label generation according to an embodiment of the present invention includes: an object recognition unit that recognizes object information to be inferred from an input image; a database that stores and manages single label information, which contains recognizable object information and the object label information corresponding to it, so that the object information recognized by the object recognition unit can be converted into object label information corresponding to space and location information; a context information providing unit that provides context information of the image so that the recognized object information can be matched to the object label information; and an object label processing unit that selects single label information stored in the database based on the context information input from the context information providing unit, matches the object information recognized by the object recognition unit against the selected single label information, and outputs the label information of the matched object.

The context information providing unit includes: a speech extraction unit that extracts speech information from the input image; and a context information analysis unit that provides context information of the image, analyzed from the speech extracted by the speech extraction unit.

The context information providing unit may also include: a text conversion unit that converts the speech extracted by the speech extraction unit into text; and a context information analysis unit that provides context information about the space and situation from the text converted by the text conversion unit.

The context information providing unit further includes an input interface for receiving a file containing text.

It further includes a user input interface so that context information can be entered directly by a user.

The context information providing unit also includes a space recognition unit that analyzes and recognizes context information from the input image by means of an image-based space classification/recognition engine.

A method for object recognition based on single-inference multi-label generation according to an embodiment of the present invention includes: recognizing, by an object recognition unit, object information to be inferred from an input image; providing, by a context information providing unit, context information of the image so that the recognized object information can be matched to object label information; selecting, by an object label processing unit, based on the context information input from the context information providing unit, single label information from the database that includes object label information matching the recognized object information; and matching, by the object label processing unit, the object information recognized by the object recognition unit against the selected single label information and outputting the label information of the object.

Here, the step of providing context information includes: extracting, by a speech extraction unit, speech information from the input image; and providing, by a context information analysis unit, context information of the image analyzed from the speech extracted by the speech extraction unit.

The step of providing context information also includes: converting, by a text conversion unit, the speech extracted by the speech extraction unit into text; and providing, by the context information analysis unit, context information about the space and situation from the text converted by the text conversion unit.

The step of providing context information may further include receiving, by an input interface, a file containing text.

The step of providing context information may further include receiving, by a user input interface, context information directly from a user.

The step of providing context information may include analyzing and recognizing, by a space recognition unit, context information from the input image by means of an image-based space classification/recognition engine.

According to an embodiment of the present invention, even when the domain for object recognition changes, inference for that domain can be performed with a single trained model, without retraining a model for the new domain.

As a further effect, a single trained model can provide object recognition results suited to a variety of domains.

FIG. 1 is a functional block diagram of an object recognition apparatus based on single-inference multi-label generation according to an embodiment of the present invention.
FIG. 2 is a reference diagram illustrating object information recognized by the object recognition unit in an embodiment of the present invention.
FIG. 3 is a functional block diagram showing the detailed configuration of the context information providing unit employed in an embodiment of the present invention.
FIG. 4 is a functional block diagram showing the detailed configuration of another context information providing unit employed in an embodiment of the present invention.
FIG. 5 is a functional block diagram of the context information providing unit employed in another embodiment of the present invention.
FIG. 6 is a flowchart of an object recognition method based on single-inference multi-label generation according to an embodiment of the present invention.
FIG. 7 is a flowchart showing the detailed procedure of the context providing method in an embodiment of the present invention.

The advantages and features of the present invention, and the means of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in a variety of different forms. The embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains; the invention is defined solely by the scope of the claims. The terminology used in this specification is for describing the embodiments and is not intended to limit the invention. In this specification, singular forms include plural forms unless the text specifically states otherwise. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a functional block diagram of an object recognition apparatus based on single-inference multi-label generation according to an embodiment of the present invention. As shown in FIG. 1, the apparatus includes an object recognition unit (100), a database (200), a context information providing unit (300), and an object label processing unit (400).

The object recognition unit (100) recognizes the objects and background to be inferred from the input image.

The database (200) stores and manages single label information, which includes the object information recognized by the object recognition unit and the object label information matched to that object information according to space and location. For example, the database (200) stores and manages single label information in a different form for each space, such as home label information, office label information, and kitchen label information.

The context information providing unit (300) provides context information of the image so that the recognized object information can be matched to object label information. The context information it provides is used to determine which single label information applies.

The object label processing unit (400) selects single label information stored in the database (200) based on the context information input from the context information providing unit (300), matches the object information recognized by the object recognition unit (100) against the selected single label information, and outputs the label information of the matched object.

Thus, according to an embodiment of the present invention, when an object is recognized in an input image, the domain, that is, the object, is not compared directly with data stored in a database as in the prior art. Instead, it is matched against single label information, a learning model of higher-level concepts, so that a single inference can provide labels suited to multiple domains.

FIG. 2 is a reference diagram illustrating object information recognized by the object recognition unit in an embodiment of the present invention. As shown in FIG. 2, when the object recognition unit (100) performs object recognition on an image, it can recognize a variety of objects in that image, such as three people, one desk, two chairs, and one empty chair. In this embodiment the object recognition unit (100) recognizes only the objects listed, but it is not limited to them and can recognize other objects as well.

At this point, the context information providing unit (300) provides context information for the image in which objects are to be recognized. If the context information provided is "kitchen", the object label processing unit (400) selects single label information stored in the database (200) on the basis of that context information, "kitchen".

Here, the database (200) stores and manages single label information, which includes the object information recognized by the object recognition unit and the object label information matched to that object information according to space and location. For example, the database (200) stores and manages single label information in a different form for each space, such as home label information, office label information, and kitchen label information.

For example, the kitchen label information stored in the database (200) may store and manage the object recognition information "desk" paired with its matching information, "dining table".

Accordingly, when the object recognition unit (100) supplies object information such as three people, one desk, two chairs, and one empty chair, and the context information providing unit (300) supplies the context information "kitchen", the object label processing unit (400) selects the single label information for the kitchen stored in the database (200), matches the recognized object information against it, and outputs the label information for each object.

The object label processing unit (400) thus converts "desk" in the recognized object information to "dining table" and can output object label information such as "three people, one dining table, two chairs, and one empty chair".

If the context information provided by the context information providing unit (300) is not "kitchen", the object label processing unit (400) may leave "desk" unchanged rather than converting it to "dining table" as in the example above, and output the label information as "desk", exactly as recognized by the object recognition unit (100).

If other single label information is selected, the object label processing unit (400) may output label information in which "desk" is replaced with a different term corresponding to that single label information.
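The kitchen example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `LABEL_TABLES`, `RECOGNIZED`, and `relabel` are hypothetical, the "office" entry is invented for illustration, and a plain dictionary stands in for the database (200).

```python
# Hypothetical per-space label tables standing in for the database (200).
LABEL_TABLES = {
    "kitchen": {"desk": "dining table"},
    "office": {"desk": "office desk"},  # illustrative entry, not from the source
}

# Object information as recognized in the FIG. 2 example (name -> count).
RECOGNIZED = {"person": 3, "desk": 1, "chair": 2, "empty chair": 1}


def relabel(recognized, context):
    """Map recognized object names to the labels of the table selected by context.

    Names without an entry, and contexts without a table, pass through unchanged.
    """
    table = LABEL_TABLES.get(context, {})
    relabeled = {}
    for name, count in recognized.items():
        label = table.get(name, name)
        relabeled[label] = relabeled.get(label, 0) + count
    return relabeled


kitchen = relabel(RECOGNIZED, "kitchen")  # "desk" becomes "dining table"
unknown = relabel(RECOGNIZED, "street")   # no table for this context
```

Here `kitchen` maps "desk" to "dining table" and leaves the other objects untouched, while `unknown` equals the recognized input because no table exists for "street". The recognition result is computed once and only the lookup table changes per context.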

FIG. 3 is a functional block diagram showing the detailed configuration of the context information providing unit (300) employed in an embodiment of the present invention. As shown in FIG. 3, the context information providing unit (300) includes a speech extraction unit (310) and a context information analysis unit (320).

The speech extraction unit (310) extracts speech information from the input image. Extracting speech from the same image in which objects are to be recognized makes it possible to analyze the context of the image using information that is synchronized with it.

The context information analysis unit (320) provides context information of the image, analyzed from the speech extracted by the speech extraction unit (310).

That is, according to an embodiment of the present invention, the context of an image can be analyzed from the speech synchronized with the image in which objects are to be recognized.

FIG. 4 is a functional block diagram showing the detailed configuration of another context information providing unit employed in an embodiment of the present invention. As shown in FIG. 4, the context information providing unit (300) may further include a text conversion unit (330).

The text conversion unit (330) converts the speech extracted by the speech extraction unit (310) into text.

The context information analysis unit (320) can then provide context information about the space and situation from the text converted by the text conversion unit (330).
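One way the context information analysis unit (320) could derive a space from converted text is simple keyword scoring. The sketch below is an assumption made for illustration only: the source does not specify the analysis technique, and `CONTEXT_KEYWORDS`, its entries, and `analyze_context` are hypothetical names. A real analyzer might use a trained text classifier instead.

```python
import re

# Hypothetical keyword lists per context; not taken from the source.
CONTEXT_KEYWORDS = {
    "kitchen": {"cook", "recipe", "stove", "dinner"},
    "office": {"meeting", "report", "deadline", "printer"},
}


def analyze_context(text):
    """Return the context whose keywords occur most often in text, or None if none occur.

    Ties are resolved in favor of the context listed first in CONTEXT_KEYWORDS.
    """
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        context: sum(word in keywords for word in words)
        for context, keywords in CONTEXT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


transcript = "Could you check the stove? Dinner is almost ready."
context = analyze_context(transcript)  # two kitchen keywords, no office keywords
```

Returning `None` when no keyword matches lets the caller fall back to the unmodified recognition labels rather than forcing a guess.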

The context information providing unit (300) employed in an embodiment of the present invention also includes a space recognition unit (340) that analyzes and recognizes context information from the input image by means of an image-based space classification/recognition engine.

Because the space recognition unit (340) derives context information by analyzing the image itself, even when no speech or text is available, object recognition results can be inferred through a single inference even on devices such as CCTV cameras.

FIG. 5 is a functional block diagram of the context information providing unit (300) employed in another embodiment of the present invention. As shown in FIG. 5, the context information providing unit (300) employed in another embodiment of the present invention may include a file input interface (350).

The file input interface (350) receives a file containing text.

Thus, according to another embodiment of the present invention, a file containing text is received from an external terminal in various ways through the file input interface, and the context information analysis unit (320) analyzes the text in that file, so that context information can be obtained easily.

The context information providing unit (300) employed in another embodiment of the present invention may also include a user input interface (360).

The user input interface (360) receives context information directly from the user, for which a key input means may be used. Because the user enters the context information of the image directly, the time needed to detect context information can be shortened.

The context information providing unit (300) may also receive position information from a position sensor (370), such as a GPS receiver or a navigation system, and derive context information from it.

When the context information providing unit (300) is connected to a position sensor (370) such as GPS and mounted on a means of transport such as an automobile, object label information suited to each location can be output without any additional operation.
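Deriving context from a position sensor could look like the following sketch, which checks whether a reported coordinate falls within a radius of a known place. Everything here is an assumption for illustration: the source does not say how position is mapped to context, and the `PLACES` table, its coordinates and context names, and both function names are hypothetical.

```python
import math

# Hypothetical places of interest: (context, latitude, longitude, radius in meters).
PLACES = [
    ("school zone", 37.5665, 126.9780, 200.0),
    ("parking lot", 37.5700, 126.9830, 100.0),
]


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def context_from_position(lat, lon):
    """Return the context of the first place whose radius contains the position, else None."""
    for context, place_lat, place_lon, radius in PLACES:
        if haversine_m(lat, lon, place_lat, place_lon) <= radius:
            return context
    return None
```

Called on each position update from a vehicle-mounted sensor, a lookup like this would switch the selected label table as the vehicle moves, with no user action.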

An object recognition method based on single-inference multi-label generation according to an embodiment of the present invention is described below with reference to FIG. 6.

First, the object recognition unit (100) recognizes object information to be inferred from the input image (S100).

Next, the context information providing unit (300) provides context information of the image so that the recognized object information can be matched to object label information (S200).

Then, based on the context information input from the context information providing unit (300), the object label processing unit (400) selects from the database (200) the single label information that includes the object information recognized by the object recognition unit and the object label information matched to that object information according to space and location (S300).

The object label processing unit (400) then matches the object information recognized by the object recognition unit (100) against the selected single label information to identify the recognized object, and outputs the identified object information (S400).
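The order of steps S100 to S400 can be sketched as a small pipeline with pluggable components. This is a hedged illustration, not the patent's implementation: the class name `RecognitionPipeline`, its fields, and the stub lambdas are hypothetical, and a real system would supply a trained model and one of the context sources described earlier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class RecognitionPipeline:
    """Wires the components together; every field is a hypothetical stand-in."""

    recognize: Callable[[object], List[str]]            # object recognition unit (100)
    label_db: Dict[str, Dict[str, str]]                 # database (200)
    provide_context: Callable[[object], Optional[str]]  # context information providing unit (300)

    def run(self, image) -> List[str]:
        objects = self.recognize(image)         # S100: recognize object information
        context = self.provide_context(image)   # S200: provide context information
        table = self.label_db.get(context, {})  # S300: select single label information
        # S400: match each object against the selected table and output its label.
        return [table.get(name, name) for name in objects]


pipeline = RecognitionPipeline(
    recognize=lambda image: ["person", "desk", "chair"],  # stub model output
    label_db={"kitchen": {"desk": "dining table"}},
    provide_context=lambda image: "kitchen",              # stub context source
)
labels = pipeline.run(image=None)
```

Keeping the context source as an injected callable mirrors the description, where speech, text files, user input, image-based space recognition, or a position sensor can each supply the context without changing the recognition step.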

FIG. 7 is a flowchart showing the detailed procedure of the context providing method in an embodiment of the present invention. As shown in FIG. 7, in the step of providing context information (S200), the speech extraction unit (310) first extracts speech information from the input image (S210).

Next, the context information analysis unit (320) provides context information of the image, analyzed from the speech extracted by the speech extraction unit (310) (S220).

Thus, according to an embodiment of the present invention, the context of an image can be analyzed from the speech synchronized with the image in which objects are to be recognized.

The step of providing context information (S200) may further include converting, by the text conversion unit (330), the speech extracted by the speech extraction unit (310) into text.

The context information analysis unit (320) can then provide context information about the space and situation from the text converted by the text conversion unit (330).

In the step of providing context information (S200), an input interface may also receive a file containing text.

Thus, according to another embodiment of the present invention, a file containing text is received through the file input interface (350), and the context information analysis unit (320) analyzes the text in that file, so that context information can be obtained easily.

In the step of providing context information (S200), the user input interface (360) may also receive context information directly from the user. Because the user enters the context information of the image directly, the time needed to detect context information can be shortened.

In the step of providing context information (S200), the space recognition unit (340) may also analyze and recognize context information from the input image by means of an image-based space classification/recognition engine.

The configuration of the present invention has been described in detail above with reference to the accompanying drawings, but this is only an example, and those of ordinary skill in the art to which the present invention pertains can make various modifications and changes within the scope of its technical idea. The scope of protection of the present invention should therefore not be limited to the embodiments described above but should be determined by the claims that follow.

100: object recognition unit 200: database
300: context information providing unit 310: voice extraction unit
320: context information analysis unit 330: text conversion unit
340: space recognition unit 350: file input interface
360: user input interface 370: position sensor
400: object label processing unit

Claims

1. An apparatus for recognizing an object based on single inference multiple label generation, comprising:
an object recognition unit for recognizing object information to be inferred from an input image;
a database that stores and manages single label information, which includes recognizable object information and label information of the object corresponding to that object information, so that the object information recognized by the object recognition unit can be converted into label information of an object corresponding to space and position information;
a context information providing unit for providing context information of the image so that the recognized object information can be matched with the label information of the object; and
an object label processing unit for selecting the single label information stored in the database based on the context information input from the context information providing unit, matching the object information recognized through the object recognition unit with the selected single label information, and outputting the label information of the object.

2. The apparatus according to claim 1,
wherein the context information providing unit comprises:
a voice extraction unit for extracting voice information from the input image; and
a context information analysis unit for providing context information of the image analyzed from the voice extracted through the voice extraction unit.

3. The apparatus according to claim 1,
wherein the context information providing unit
further comprises an input interface for receiving a file containing text that includes the context information.

4. The apparatus according to claim 1,
wherein the context information providing unit
further comprises a user input interface for receiving the context information directly from a user.

5. The apparatus according to claim 1,
wherein the context information providing unit
further comprises a space recognition unit for analyzing and recognizing the context information from the input image through an image-based space classification/recognition engine.

6. The apparatus according to claim 1,
wherein the context information providing unit
further comprises a position sensor for receiving position information,
and analyzes and recognizes the context information from the position information received through the position sensor.

7. A method for recognizing an object based on single inference multiple label generation, comprising:
recognizing object information to be inferred from an input image;
providing, by a context information providing unit, context information of the image so that the recognized object information can be matched with label information of the object;
selecting, by an object label processing unit, from a database, single label information including object label information matched with the recognized object information, based on the context information input from the context information providing unit; and
outputting, by the object label processing unit, label information of the object by matching the recognized object information with the selected single label information.

8. The method of claim 7,
wherein the providing of the context information comprises:
extracting, by a voice extraction unit, voice information from the input image; and
providing, by a context information analysis unit, context information of the image analyzed from the voice extracted through the voice extraction unit.

9. The method of claim 8,
wherein the providing of the context information comprises:
converting, by a text conversion unit, the voice extracted through the voice extraction unit into text; and
providing, by the context information analysis unit, context information regarding the space and situation from the text converted through the text conversion unit.

10. The method of claim 9,
wherein the providing of the context information
further comprises receiving, by an input interface, a file containing text.

11. The method of claim 7,
wherein the providing of the context information
further comprises receiving, by a user input interface, the context information directly from a user.

12. The method of claim 7,
wherein the providing of the context information
further comprises analyzing and recognizing, by a space recognition unit, the context information from the input image through an image-based space classification/recognition engine.