KR102041987B1

KR102041987B1 - An apparatus and method for recognizing an object based on a single inference multiple label generation

Info

Publication number: KR102041987B1
Application number: KR1020170137596A
Authority: KR
Inventors: 조충상; 이영한; 정혜동
Original assignee: 전자부품연구원 (Korea Electronics Technology Institute)
Priority date: 2017-10-23
Filing date: 2017-10-23
Publication date: 2019-11-07
Also published as: KR20190044950A

Abstract

The present invention relates to an object recognition apparatus based on single-inference multi-label generation, comprising: an object recognition unit that recognizes an object and a background to be inferred from an input image; a database that stores and manages single label information used in different formats according to space, location, and purpose; a context information providing unit that provides context information of the image so that the recognized object information can be matched with label information of the object; and an object label processing unit that selects single label information stored in the database based on the context information input from the context information providing unit, matches the object information recognized through the object recognition unit with the selected single label information to identify the recognized object, and outputs the identified object information.

Description

An apparatus and method for recognizing an object based on a single inference multiple label generation

The present invention relates to an object recognition apparatus and method based on single-inference multi-label generation that extract objects from an input image using deep learning and, instead of recognizing objects separately for each domain, label them by matching them to labels that cover higher-level concepts.

A conventional object recognition apparatus performs inference with a model trained for the domain to be searched and uses the result for object inference.

A conventional apparatus with this structure must therefore build a model suited to each desired domain.

Because the conventional apparatus generates a model tailored to one domain, it cannot produce optimal results when the domain changes.

Meanwhile, Korean Patent Application No. 10-2015-0132625 (Object recognition apparatus and method, and object recognition model learning apparatus and method) was filed on September 18, 2015. It discloses an object recognition apparatus that operates on each pixel in an image frame, labels each pixel using a deep neural network-based model, and recognizes an object based on the labeled first pixels.

The present invention has been made to solve the above problem. An object of the present invention is to provide an object recognition apparatus and method based on single-inference multi-label generation that train object recognition on data labeled with higher-level concepts, so that object recognition can be performed regardless of the conditions of the domain.

The objects of the present invention are not limited to those mentioned above; other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, an object recognition apparatus based on single-inference multi-label generation according to an embodiment of the present invention includes: an object recognition unit that recognizes object information to be inferred from an input image; a database that stores and manages single label information, which includes recognizable object information and the label information of the object corresponding to that object information, so that object information recognized by the object recognition unit can be converted into object label information corresponding to space and position information; a context information providing unit that provides context information of the image so that the recognized object information can be matched with the label information of the object; and an object label processing unit that selects single label information stored in the database based on the context information input from the context information providing unit, matches the object information recognized through the object recognition unit with the selected single label information, and outputs the label information of the matched object.

The context information providing unit includes a voice extraction unit that extracts voice information from the input image, and a context information analysis unit that provides context information of the image analyzed from the voice extracted by the voice extraction unit.

The context information providing unit may also include a text conversion unit that converts the voice extracted by the voice extraction unit into text, and a context information analysis unit that provides context information about the space and situation from the text converted by the text conversion unit.

The context information providing unit further includes an input interface for receiving a file containing text.

It further includes a user input interface through which a user can enter context information directly.

The context information providing unit also includes a space recognition unit that analyzes and recognizes context information from the input image through an image-based space classification/recognition engine.

An object recognition method based on single-inference multi-label generation according to an embodiment of the present invention includes: recognizing, by an object recognition unit, object information to be inferred from an input image; providing, by a context information providing unit, context information of the image so that the recognized object information can be matched with label information of the object; selecting, by an object label processing unit, from the database and based on the context information input from the context information providing unit, single label information that includes object label information matching the recognized object information; and outputting, by the object label processing unit, the label information of the object by matching the object information recognized through the object recognition unit with the selected single label information.

Here, providing the context information includes: extracting, by a voice extraction unit, voice information from the input image; and providing, by a context information analysis unit, context information of the image analyzed from the voice extracted by the voice extraction unit.

Providing the context information also includes: converting, by a text conversion unit, the voice extracted by the voice extraction unit into text; and providing, by the context information analysis unit, context information about the space and situation from the text converted by the text conversion unit.

Providing the context information may further include receiving, by an input interface, a file containing text.

Providing the context information may further include receiving, by a user input interface, context information directly from a user.

Providing the context information may include analyzing and recognizing, by a space recognition unit, context information from the input image through an image-based space classification/recognition engine.

According to an embodiment of the present invention, even when the domain for object recognition changes, the new domain can be handled by inference with a single trained model, without retraining a model for that domain.

As a further effect, a single trained model can provide object recognition results suited to a variety of domains.

FIG. 1 is a functional block diagram illustrating an object recognition apparatus based on single-inference multi-label generation according to an embodiment of the present invention.
FIG. 2 is a reference diagram illustrating object information recognized by the object recognition unit in an embodiment of the present invention.
FIG. 3 is a functional block diagram illustrating the detailed configuration of the context information providing unit employed in an embodiment of the present invention.
FIG. 4 is a functional block diagram illustrating the detailed configuration of another context information providing unit employed in an embodiment of the present invention.
FIG. 5 is a functional block diagram illustrating a context information providing unit employed in another embodiment of the present invention.
FIG. 6 is a flowchart illustrating an object recognition method based on single-inference multi-label generation according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating the detailed process of providing context information in an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains, and the invention is defined only by the scope of the claims. The terminology used herein is for describing the embodiments and is not intended to limit the invention. In this specification, the singular includes the plural unless the context specifically indicates otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those mentioned.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a functional block diagram illustrating an object recognition apparatus based on single-inference multi-label generation according to an embodiment of the present invention. As shown in FIG. 1, the apparatus includes an object recognition unit 100, a database 200, a context information providing unit 300, and an object label processing unit 400.

The object recognition unit 100 recognizes the objects and background to be inferred from the input image.

The database 200 stores and manages single label information, which includes the object information recognized through the object recognition unit and the label information of the objects matched to that object information according to space and position. For example, the single label information stored in the database 200 includes label sets in different formats for different spaces, such as home label information, office label information, and kitchen label information.
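The database 200 can be pictured as a two-level lookup: first by context (space), then by recognized object class. The following is a minimal Python sketch, assuming an in-memory dictionary; only the kitchen entry (desk mapped to dining table) comes from the example in this description, while the office entry and the name `select_label_set` are hypothetical.

```python
from typing import Dict

# Hypothetical in-memory stand-in for the database 200.
# Outer key: context (space); inner mapping: recognized class -> label used there.
SINGLE_LABEL_DB: Dict[str, Dict[str, str]] = {
    "kitchen": {"desk": "dining table"},   # example given in the description
    "office": {"chair": "office chair"},   # hypothetical entry
}

def select_label_set(context: str) -> Dict[str, str]:
    """Return the single label information for a context.

    An unknown context yields an empty mapping, so recognized
    classes are later passed through unchanged.
    """
    return SINGLE_LABEL_DB.get(context, {})
```

Returning an empty mapping for an unknown context is a design assumption; it lets the recognizer's generic label stand when no space-specific label set exists.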

The context information providing unit 300 provides context information of the image so that the recognized object information can be matched with the label information of the object. This context information is used to determine which single label information to apply.

The object label processing unit 400 selects single label information stored in the database 200 based on the context information input from the context information providing unit 300, matches the object information recognized through the object recognition unit 100 with the selected single label information, and outputs the label information of the matched object.
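The select-then-match behavior of the object label processing unit 400 reduces to a dictionary lookup per recognized object. A sketch under stated assumptions: recognized objects are plain class-name strings, the label database is a nested dictionary, and `process_labels` and `demo_db` are hypothetical names, not the patent's implementation.

```python
from typing import Dict, List

def process_labels(recognized: List[str], context: str,
                   label_db: Dict[str, Dict[str, str]]) -> List[str]:
    """Sketch of the object label processing unit 400.

    Selects the single label information for `context`, then maps each
    recognized class to its context-specific label. Classes with no entry
    keep the label produced by the recognizer.
    """
    label_set = label_db.get(context, {})
    return [label_set.get(obj, obj) for obj in recognized]

# Hypothetical label database; the recognizer itself is not modeled here.
demo_db = {"kitchen": {"desk": "dining table"}}
print(process_labels(["person", "desk", "chair"], "kitchen", demo_db))
# ['person', 'dining table', 'chair']
```

The recognizer runs once; only this lookup changes when the context changes, which is one way to read the claim that a single inference yields labels for several domains.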

Accordingly, in an embodiment of the present invention, an object in the input image is not recognized by comparing the domain-specific object directly with data stored in a database, as in the prior art. Instead, it is matched to single label information tied to a model trained on higher-level concepts, so that a single inference can provide labels suited to multiple domains.

FIG. 2 is a reference diagram illustrating object information recognized by the object recognition unit in an embodiment of the present invention. As shown in FIG. 2, when the object recognition unit 100 performs object recognition on an image, it can recognize various objects in it, such as three people, one desk, two chairs, and one empty chair. In this embodiment the object recognition unit 100 recognizes only the objects listed above, but it is not limited to them and may recognize other objects as well.

The context information providing unit 300 then provides context information for the image in which objects are to be recognized. If that context information is "kitchen", the object label processing unit 400 selects the single label information stored in the database 200 based on the provided context, that is, "kitchen".

As described above, the database 200 stores and manages single label information that associates recognized object information with the label information matched to it for each space and position, for example home, office, and kitchen label information.

For example, the kitchen label information stored in the database 200 may map the recognized object "desk" to the matching label "dining table".

Accordingly, when the object recognition unit 100 provides object information such as three people, one desk, two chairs, and one empty chair, and the context information providing unit 300 provides the context "kitchen", the object label processing unit 400 selects the kitchen single label information stored in the database 200, matches the recognized object information against it, and outputs label information for the corresponding objects.

The object label processing unit 400 thus converts "desk" in the recognized object information into "dining table" and can output label information such as "three people, one dining table, two chairs, and one empty chair".

If the context information provided by the context information providing unit 300 is not "kitchen", the object label processing unit 400 may output the label "desk" exactly as recognized by the object recognition unit 100, without changing it to "dining table" as in the above embodiment.

If different single label information is selected, the object label processing unit 400 may instead output label information in which "desk" is replaced by another term corresponding to that single label information.
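The FIG. 2 walkthrough, including the pass-through case, can be reproduced as a small worked example. This sketch assumes the recognizer output is a list of class-name strings (the strings "person" and "empty chair" are illustrative stand-ins) and uses a hypothetical label database holding only the kitchen mapping from the description.

```python
from collections import Counter
from typing import Dict, Iterable

# Objects recognized in the FIG. 2 example, one entry per instance.
recognized = ["person"] * 3 + ["desk"] + ["chair"] * 2 + ["empty chair"]

# Hypothetical label database; only the kitchen entry comes from the text.
label_db: Dict[str, Dict[str, str]] = {"kitchen": {"desk": "dining table"}}

def relabel_and_count(objects: Iterable[str], context: str) -> Counter:
    """Relabel objects for a context and count instances per label."""
    label_set = label_db.get(context, {})
    return Counter(label_set.get(obj, obj) for obj in objects)

kitchen = relabel_and_count(recognized, "kitchen")
other = relabel_and_count(recognized, "office")   # no office entry: labels pass through
print(dict(kitchen))  # {'person': 3, 'dining table': 1, 'chair': 2, 'empty chair': 1}
print(dict(other))    # {'person': 3, 'desk': 1, 'chair': 2, 'empty chair': 1}
```

The same recognition result yields "dining table" under the kitchen context and "desk" otherwise, matching the two cases described above.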

FIG. 3 is a functional block diagram illustrating the detailed configuration of the context information providing unit 300 employed in an embodiment of the present invention. As shown in FIG. 3, the context information providing unit 300 includes a voice extraction unit 310 and a context information analysis unit 320.

The voice extraction unit 310 extracts voice information from the input image. Because the voice is taken from the same image in which objects are to be recognized, the image's context can be analyzed from information synchronized with it.

The context information analysis unit 320 provides context information of the image analyzed from the voice extracted by the voice extraction unit 310.

In other words, according to an embodiment of the present invention, the context of the image can be analyzed from voice that is synchronized with the image in which objects are to be recognized.

FIG. 4 is a functional block diagram illustrating the detailed configuration of another context information providing unit employed in an embodiment of the present invention. As shown in FIG. 4, the context information providing unit 300 may further include a text conversion unit 330.

The text conversion unit 330 converts the voice extracted by the voice extraction unit 310 into text.

The context information analysis unit 320 can then provide context information about the space and situation from the text converted by the text conversion unit 330.
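The description does not say how the context information analysis unit 320 turns text into a space. One simple stand-in, assumed here purely for illustration, is keyword voting: count how many words of the transcript fall in each context's keyword list and pick the highest-scoring context. The keyword lists and the name `infer_context` are hypothetical; a real system might use a trained text classifier instead.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical keyword lists; the description does not specify how the
# context information analysis unit 320 maps text to a space.
CONTEXT_KEYWORDS = {
    "kitchen": {"cook", "cooking", "dinner", "fridge", "stove", "recipe"},
    "office": {"meeting", "report", "deadline", "printer", "email"},
    "home": {"sofa", "bedroom", "living", "tv"},
}

def infer_context(text: str) -> Optional[str]:
    """Return the context whose keywords occur most often, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = Counter({ctx: sum(w in kws for w in words)
                      for ctx, kws in CONTEXT_KEYWORDS.items()})
    best, hits = scores.most_common(1)[0]
    return best if hits > 0 else None

print(infer_context("Dinner is almost ready, please turn off the stove."))  # kitchen
```

Returning `None` when no keyword matches leaves the decision to another context source, or to the recognizer's generic labels.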

The context information providing unit 300 employed in an embodiment of the present invention also includes a space recognition unit 340 that analyzes and recognizes context information from the input image through an image-based space classification/recognition engine.

Because the space recognition unit 340 derives context information by analyzing the image itself, it works even when no voice or text is available, so that devices such as CCTV cameras can also infer object recognition results through a single inference.
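The space classification/recognition engine is not specified. Assuming it is a scene classifier that emits one raw score (logit) per space class, its output could be turned into context information as sketched below: apply a softmax, then accept the top class only when its probability clears a threshold. The class list, the 0.5 threshold, and `context_from_scene_logits` are hypothetical.

```python
import math
from typing import Optional, Sequence

SCENE_CLASSES = ["home", "office", "kitchen"]  # hypothetical class list

def context_from_scene_logits(logits: Sequence[float],
                              threshold: float = 0.5) -> Optional[str]:
    """Map raw scene-classifier scores to a context label, or None if unsure."""
    m = max(logits)                              # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]            # softmax probabilities
    best = max(range(len(probs)), key=probs.__getitem__)
    return SCENE_CLASSES[best] if probs[best] >= threshold else None

print(context_from_scene_logits([0.1, 0.2, 3.0]))  # kitchen
```

The threshold keeps a low-confidence scene guess from overriding the generic labels; its value would need tuning for any real engine.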

FIG. 5 is a functional block diagram illustrating the context information providing unit 300 employed in another embodiment of the present invention. As shown in FIG. 5, the context information providing unit 300 in this embodiment may include a file input interface 350.

The file input interface 350 receives a file containing text.

According to this embodiment, a file containing text can be received from an external terminal through the file input interface in various ways, and the context information analysis unit 320 analyzes the text in that file, so that context information can be obtained easily.

The context information providing unit 300 employed in another embodiment of the present invention may also include a user input interface 360.

The user input interface 360 receives context information directly from the user; a key input means may be used for this. Because the user enters the context of the image directly, the time needed to detect the context information can be reduced.

The context information providing unit 300 may also receive position information from a position sensor 370, such as a GPS receiver or navigation device, and derive context information from it.

When the context information providing unit 300 is connected to a position sensor 370 such as GPS and mounted in a vehicle or other means of transportation, it can output object label information suited to each location without any additional action.
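How a position fix becomes context information is not specified. A minimal sketch, assuming rectangular geofences that each carry a context label: the first zone containing the fix supplies the context. The `Zone` class, `context_from_position`, and the coordinates are hypothetical and illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Zone:
    """Hypothetical rectangular geofence tagged with a context label."""
    context: str
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, lat: float, lon: float) -> bool:
        return (self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)

def context_from_position(lat: float, lon: float,
                          zones: List[Zone]) -> Optional[str]:
    """Return the context of the first zone containing the fix, or None."""
    for zone in zones:
        if zone.contains(lat, lon):
            return zone.context
    return None

# Illustrative coordinates only.
ZONES = [Zone("office", 37.40, 37.41, 127.10, 127.11),
         Zone("home", 37.50, 37.51, 127.00, 127.01)]
print(context_from_position(37.405, 127.105, ZONES))  # office
```

As the vehicle moves, each new fix can select a different label set with no user action, which is the benefit the paragraph above describes.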

An object recognition method based on single-inference multi-label generation according to an embodiment of the present invention is described below with reference to FIG. 6.

First, the object recognition unit 100 recognizes the object information to be inferred from the input image (S100).

Next, the context information providing unit 300 provides context information of the image so that the recognized object information can be matched with the label information of the object (S200).

Then, based on the context information input from the context information providing unit 300, the object label processing unit 400 selects from the database 200 the single label information that includes the object information recognized through the object recognition unit and the label information of objects matched to that object information according to space and position (S300).

The object label processing unit 400 then matches the object information recognized through the object recognition unit 100 with the selected single label information to identify the recognized object, and outputs the identified object information (S400).
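Steps S100 to S400 can be strung together as one function. This is a sketch under stated assumptions: the trained recognizer and the context source are passed in as plain callables (hypothetical stubs here, since the patent does not fix their interfaces), and the label database is a nested dictionary.

```python
from typing import Callable, Dict, List, Optional

Detector = Callable[[object], List[str]]            # S100 stand-in
ContextProvider = Callable[[object], Optional[str]]  # S200 stand-in

def recognize_with_context(image: object, detect: Detector,
                           provide_context: ContextProvider,
                           label_db: Dict[str, Dict[str, str]]) -> List[str]:
    objects = detect(image)                 # S100: one inference, generic classes
    context = provide_context(image)        # S200: voice/text/image/GPS/user input
    label_set = label_db.get(context, {}) if context else {}  # S300: select label set
    return [label_set.get(obj, obj) for obj in objects]       # S400: match and output

# Hypothetical stubs standing in for the trained model and a context source.
labels = recognize_with_context(
    image=None,
    detect=lambda img: ["person", "desk"],
    provide_context=lambda img: "kitchen",
    label_db={"kitchen": {"desk": "dining table"}},
)
print(labels)  # ['person', 'dining table']
```

Keeping the context source as an injected callable lets any of the providers described above (voice, text file, scene image, position sensor, or user input) be swapped in without touching the recognizer.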

FIG. 7 is a flowchart illustrating the detailed process of providing context information in an embodiment of the present invention. As shown in FIG. 7, in step S200 the voice extraction unit 310 first extracts voice information from the input image (S210).

Next, the context information analysis unit 320 provides context information of the image analyzed from the voice extracted by the voice extraction unit 310 (S220).

In this way, the context of the image can be analyzed from voice that is synchronized with the image in which objects are to be recognized.

Step S200 may further include converting, by the text conversion unit 330, the voice extracted by the voice extraction unit 310 into text.

The context information analysis unit 320 can then provide context information about the space and situation from the text converted by the text conversion unit 330.

In step S200, an input interface may also receive a file containing text.

According to another embodiment of the present invention, a file containing text is received through the file input interface 350 and the context information analysis unit 320 analyzes the text in the file, so that context information can be obtained easily.

In step S200, the user input interface 360 may also receive context information directly from the user. Because the user enters the context of the image directly, the time needed to detect the context information can be reduced.

Alternatively, in step S200 the space recognition unit 340 may analyze and recognize context information from the input image through an image-based space classification/recognition engine.

The configuration of the present invention has been described above in detail with reference to the accompanying drawings. This description is merely illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that various modifications and changes are possible within the scope of the technical idea of the present invention. Therefore, the scope of protection of the present invention should not be limited to the embodiments described above but should be defined by the following claims.

100: object recognition unit
200: database
300: context information provider
310: voice extraction unit
320: context information analysis unit
330: text conversion unit
340: space recognition unit
350: file input interface
360: user input interface
370: position sensor
400: object label processing unit

Claims

A single-inference multi-label generation based object recognition apparatus, comprising:
an object recognition unit that recognizes object information to be inferred from an input image, using a model trained on label data that includes higher-level concepts regardless of domain conditions;
a database that stores and manages single label information, which includes recognizable object information and the object label information corresponding to that object information, so that the object information recognized by the object recognition unit can be converted into object label information appropriate to the space and location;
a context information provider that provides context information of the image so that the recognized object information can be matched with label information of the object; and
an object label processing unit that selects single label information stored in the database based on the context information input from the context information provider, matches the object information recognized by the object recognition unit to the selected single label information, and outputs the label information of the matched object, thereby providing labels for multiple domains from a single inference;
wherein the context information provider comprises:
a position sensor for receiving location information; a file input interface for receiving a file containing text that includes context information; and a user input interface for receiving context information directly from a user;
a voice extraction unit for extracting voice information from the input image, and a space recognition unit for analyzing context information from the input image through an image-based space classification/recognition engine; and
a text conversion unit for converting the voice extracted by the voice extraction unit into text, and a context information analysis unit that analyzes and recognizes context information from the location information received through the position sensor, analyzes and recognizes the text included in the file, and provides context information of the image from the converted text.
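The apparatus above runs one recognition pass that yields generic, domain-independent classes, and the context then selects which single label table translates those classes into domain-specific labels. A minimal sketch of that selection and matching step, assuming the database is an in-memory mapping from context to a table of generic-class-to-label entries. SINGLE_LABEL_DB, its contents, and label_objects are hypothetical illustrations, not the patent's data structures.

```python
# Hypothetical single label database: for each context (space/purpose), a table
# mapping the generic class produced by the one shared recognition model to the
# label expected in that domain.
SINGLE_LABEL_DB: dict = {
    "hospital": {"person": "patient", "bed": "hospital bed", "cart": "medical cart"},
    "retail": {"person": "customer", "cart": "shopping cart"},
    "street": {"person": "pedestrian", "cart": "handcart"},
}


def label_objects(detected: list, context: str) -> list:
    """Map one set of inference results to the labels of the given context.

    Classes with no entry in the selected table (or an unknown context)
    keep their generic name.
    """
    table = SINGLE_LABEL_DB.get(context, {})  # select single label information
    return [table.get(cls, cls) for cls in detected]  # match each recognized object


detections = ["person", "cart", "bed"]  # output of a single inference
print(label_objects(detections, "hospital"))  # ['patient', 'medical cart', 'hospital bed']
print(label_objects(detections, "retail"))    # ['customer', 'shopping cart', 'bed']
```

Only the lookup table changes between domains, so the same detections can be relabeled for every domain without running the recognition model again.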

(Deleted)

An object recognition method based on single inference and multiple label generation, comprising:
recognizing, by an object recognition unit, object information to be inferred from an input image, using a model trained on label data that includes higher-level concepts regardless of domain conditions;
providing, by a context information provider, context information of the image so that the recognized object information can be matched with label information of the object;
selecting, by an object label processing unit, single label information from a database based on the context information input from the context information provider, the single label information including object label information that matches the recognized object information; and
matching, by the object label processing unit, the object information recognized by the object recognition unit to the selected single label information, and outputting label information of the object corresponding to various domains from a single inference;
wherein providing the context information comprises:
extracting, by a voice extraction unit, voice information from the input image;
receiving, by a user input interface, context information directly from a user;
providing, by a context information analysis unit, context information of the image analyzed from the voice extracted by the voice extraction unit;
receiving, by an input interface, a file containing text; and
analyzing and recognizing, by a space recognition unit, context information from the input image through an image-based space classification/recognition engine;
and wherein the context information analysis unit analyzes and recognizes context information from location information received through the sensor, analyzes and recognizes the text included in the file, converts the voice extracted by the voice extraction unit into text, and provides context information of the image from the converted text.
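Both claims have the context information analysis unit derive context from the location information received through the position sensor (370), but neither says how. One possible approach, sketched here under that assumption, is a geofence lookup: each registered place has a center coordinate and a radius, and a location fix that falls inside a place's radius takes on that place's context. The PLACES entries, coordinates, and radii are made-up values for illustration; distances use the haversine great-circle formula.

```python
import math
from typing import Optional

# Hypothetical registered places: (context, latitude, longitude, radius in meters).
PLACES = [
    ("hospital", 37.5665, 126.9780, 150.0),
    ("retail", 37.5700, 126.9920, 100.0),
]

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def location_context(lat: float, lon: float) -> Optional[str]:
    """Return the context of the nearest registered place containing the fix."""
    best, best_dist = None, math.inf
    for context, plat, plon, radius in PLACES:
        d = haversine_m(lat, lon, plat, plon)
        if d <= radius and d < best_dist:
            best, best_dist = context, d
    return best


print(location_context(37.5666, 126.9781))  # hospital (about 14 m from its center)
print(location_context(37.6000, 127.1000))  # None (outside every geofence)
```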

(Deleted)