KR20210038349A

KR20210038349A - A Method and Apparatus for Object Recognition using Machine Learning

Info

Publication number: KR20210038349A
Application number: KR1020200124537A
Authority: KR
Inventors: 윤정하; 김재현; 조희연
Original assignee: 주식회사 작당모의
Priority date: 2019-09-29
Filing date: 2020-09-25
Publication date: 2021-04-07
Also published as: KR20210038280A; KR102539072B1

Abstract

The present invention relates to a method and device for recognizing an object in an image using machine learning. According to one embodiment of the present invention, the method for recognizing an object includes the steps of: (a) obtaining an object-related image; (b) extracting object text information from the object-related image; and (c) inputting the object text information into an object recognition deep learning model, and recognizing at least one of an object, an object identifier, and an object display time from the obtained object-related image.

Description

A Method and Apparatus for Object Recognition using Machine Learning

The present invention relates to a method and apparatus for recognizing an object in an image using machine learning, and more particularly, to a method and apparatus for recognizing an object and an object display time using machine learning.

Recently, the way individuals share know-how has been shifting from text to video. If the objects used in such videos can be identified, various business models can be built on that information, and it can serve as a basis for enriching the content. Tagging the objects by hand, however, takes considerable time, money, and labor, and makes it difficult to maintain consistent quality. Automated identification would provide useful information both to those who produce videos and to those who receive know-how through them.

However, enabling objects to be recognized in a video requires collecting and tagging a large amount of image training data, and this initial data collection effort is excessive.

[Patent Document 1] Korean Laid-Open Patent Publication No. 10-2020-0011728

The present invention was devised to solve the above-described problem, and an object thereof is to provide a method and apparatus for recognizing an object in an image using machine learning.

In addition, the present invention seeks to improve on the conventional situation in which an artificial intelligence model for finding objects in a video can be trained only after a large amount of manual human work.

In addition, an object of the present invention is to provide an apparatus and method that introduce a spiral learning model, which can begin product learning with as few as several hundred initial samples, so that objects can be recognized in a video, based on their characteristics, within a short time.

The objects of the present invention are not limited to those mentioned above, and other objects that are not mentioned will be clearly understood from the following description.

In order to achieve the above objects, an object recognition method according to an embodiment of the present invention may include: (a) obtaining an object-related image; (b) extracting object text information from the object-related image; and (c) inputting the object text information into an object recognition deep learning model to recognize at least one of an object, an object identifier, and an object display time from the obtained object-related image.

In an embodiment, step (b) may include extracting at least one of object image information, object sound information, and object text information from the object-related image.

In an embodiment, step (c) may include inputting at least one of the object image information, the object sound information, and the object text information into the object recognition deep learning model to recognize at least one of the object, the object identifier, and the object display time from the object-related image.

In an embodiment, step (c) may include: when the object image information does not match training image information generated by the object recognition deep learning model, determining whether the object sound information matches training sound information generated by the object recognition deep learning model; when the object sound information does not match the training sound information, determining whether the object text information matches training text information generated by the object recognition deep learning model; and when the object text information does not match the training text information, storing the object image information, the object sound information, and the object text information in a training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object image information does not match the training image information generated by the object recognition deep learning model, determining whether the object text information matches the training text information generated by the object recognition deep learning model; when the object text information does not match the training text information, determining whether the object sound information matches the training sound information generated by the object recognition deep learning model; and when the object sound information does not match the training sound information, storing the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object sound information does not match the training sound information generated by the object recognition deep learning model, determining whether the object image information matches the training image information generated by the object recognition deep learning model; when the object image information does not match the training image information, determining whether the object text information matches the training text information generated by the object recognition deep learning model; and when the object text information does not match the training text information, storing the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object sound information does not match the training sound information generated by the object recognition deep learning model, determining whether the object text information matches the training text information generated by the object recognition deep learning model; when the object text information does not match the training text information, determining whether the object image information matches the training image information generated by the object recognition deep learning model; and when the object image information does not match the training image information, storing the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object text information does not match the training text information generated by the object recognition deep learning model, determining whether the object image information matches the training image information generated by the object recognition deep learning model; when the object image information does not match the training image information, determining whether the object sound information matches the training sound information generated by the object recognition deep learning model; and when the object sound information does not match the training sound information, storing the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object text information does not match the training text information generated by the object recognition deep learning model, determining whether the object sound information matches the training sound information generated by the object recognition deep learning model; when the object sound information does not match the training sound information, determining whether the object image information matches the training image information generated by the object recognition deep learning model; and when the object image information does not match the training image information, storing the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, step (c) may include: when the object image information matches the training image information generated by the object recognition deep learning model, the object sound information matches the training sound information generated by the object recognition deep learning model, or the object text information matches the training text information generated by the object recognition deep learning model, recognizing the object, the object identifier, and the object display time corresponding to at least one of the object image information, the object sound information, and the object text information.
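The six check orders and the match case described above share one control flow: try the modalities in a chosen order, stop at the first match, and store all three pieces of information for training when nothing matches. The following Python sketch illustrates that flow only. The names `recognize_with_fallback`, `Matcher`, and the list-based training file are hypothetical, and each matcher stands in for whatever comparison the object recognition deep learning model actually performs, which the text does not specify.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical matcher: returns an object identifier on a match, else None.
Matcher = Callable[[object], Optional[str]]


def recognize_with_fallback(
    extracted: Dict[str, object],
    matchers: Dict[str, Matcher],
    order: Sequence[str],
    training_file: List[Dict[str, object]],
) -> Optional[str]:
    """Check each modality in `order`; return the first matching object identifier."""
    for modality in order:
        object_id = matchers[modality](extracted[modality])
        if object_id is not None:
            return object_id
    # No modality matched: keep image, sound, and text information together
    # so that the sample can be used for later training.
    training_file.append(dict(extracted))
    return None
```

Passing `order=("image", "sound", "text")` corresponds to the first of the six orders; the other five are obtained by permuting the tuple.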

In an embodiment, an object recognition apparatus may include: a communication unit that obtains an object-related image; and a controller that extracts object text information from the object-related image and inputs the object text information into an object recognition deep learning model to recognize at least one of an object, an object identifier, and an object display time from the obtained object-related image.

In an embodiment, the controller may extract at least one of object image information, object sound information, and object text information from the object-related image.

In an embodiment, the controller may input at least one of the object image information, the object sound information, and the object text information into the object recognition deep learning model to recognize at least one of the object, the object identifier, and the object display time from the object-related image.

In an embodiment, when the object image information does not match training image information generated by the object recognition deep learning model, the controller may determine whether the object sound information matches training sound information generated by the object recognition deep learning model; when the object sound information does not match the training sound information, the controller may determine whether the object text information matches training text information generated by the object recognition deep learning model; and when the object text information does not match the training text information, the controller may store the object image information, the object sound information, and the object text information in a training file of the object recognition deep learning model.

In an embodiment, when the object image information does not match the training image information generated by the object recognition deep learning model, the controller may determine whether the object text information matches the training text information generated by the object recognition deep learning model; when the object text information does not match the training text information, the controller may determine whether the object sound information matches the training sound information generated by the object recognition deep learning model; and when the object sound information does not match the training sound information, the controller may store the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, when the object sound information does not match the training sound information generated by the object recognition deep learning model, the controller may determine whether the object image information matches the training image information generated by the object recognition deep learning model; when the object image information does not match the training image information, the controller may determine whether the object text information matches the training text information generated by the object recognition deep learning model; and when the object text information does not match the training text information, the controller may store the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, when the object sound information does not match the training sound information generated by the object recognition deep learning model, the controller may determine whether the object text information matches the training text information generated by the object recognition deep learning model; when the object text information does not match the training text information, the controller may determine whether the object image information matches the training image information generated by the object recognition deep learning model; and when the object image information does not match the training image information, the controller may store the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, when the object text information does not match the training text information generated by the object recognition deep learning model, the controller may determine whether the object image information matches the training image information generated by the object recognition deep learning model; when the object image information does not match the training image information, the controller may determine whether the object sound information matches the training sound information generated by the object recognition deep learning model; and when the object sound information does not match the training sound information, the controller may store the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, when the object text information does not match the training text information generated by the object recognition deep learning model, the controller may determine whether the object sound information matches the training sound information generated by the object recognition deep learning model; when the object sound information does not match the training sound information, the controller may determine whether the object image information matches the training image information generated by the object recognition deep learning model; and when the object image information does not match the training image information, the controller may store the object image information, the object sound information, and the object text information in the training file of the object recognition deep learning model.

In an embodiment, when the object image information matches the training image information generated by the object recognition deep learning model, the object sound information matches the training sound information generated by the object recognition deep learning model, or the object text information matches the training text information generated by the object recognition deep learning model, the controller may recognize the object, the object identifier, and the object display time corresponding to at least one of the object image information, the object sound information, and the object text information.

Specific details for achieving the above objects will become apparent with reference to the embodiments described in detail below together with the accompanying drawings.

However, the present invention is not limited to the embodiments disclosed below and may be configured in various different forms. The embodiments are provided so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the technical field to which the present invention pertains (hereinafter, "a person skilled in the art").

According to an embodiment of the present invention, objects in a video are detected through machine learning and put to use, so that richer and more useful services can be provided when delivering video content.

In addition, according to an embodiment of the present invention, it is possible to identify how various products are used in a video and to determine how much of the video's running time a specific brand or product occupies.

In addition, according to an embodiment of the present invention, it is possible to answer customers' questions and to provide a service that jumps directly to the point in a long video where a specific product appears.

The effects of the present invention are not limited to those described above, and the potential effects expected from the technical features of the present invention will be clearly understood from the following description.

FIG. 1 is a diagram illustrating an object recognition method according to an embodiment of the present invention.
FIG. 2A is a diagram illustrating an example of image collection according to an embodiment of the present invention.
FIG. 2B is a diagram illustrating an example of training an object recognition deep learning model according to an embodiment of the present invention.
FIGS. 2C and 2D are diagrams illustrating an example of object recognition according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a pre-learning operation method for object recognition according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a recognition extraction operation method for object recognition according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating another object recognition method according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating another pre-learning operation method for object recognition according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a recognition extraction operation method for object recognition according to a first embodiment of the present invention.
FIG. 8 is a diagram illustrating a recognition extraction operation method for object recognition according to a second embodiment of the present invention.
FIG. 9 is a diagram illustrating a recognition extraction operation method for object recognition according to a third embodiment of the present invention.
FIG. 10 is a diagram illustrating a recognition extraction operation method for object recognition according to a fourth embodiment of the present invention.
FIG. 11 is a diagram illustrating a recognition extraction operation method for object recognition according to a fifth embodiment of the present invention.
FIG. 12 is a diagram illustrating a recognition extraction operation method for object recognition according to a sixth embodiment of the present invention.
FIG. 13 is a diagram illustrating a functional configuration of an object recognition apparatus according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

The various features of the invention disclosed in the claims may be better understood in view of the drawings and the detailed description. The apparatuses, methods, processes, and various embodiments disclosed in the specification are provided for illustration. The disclosed structural and functional features are intended to enable a person skilled in the art to practice the various embodiments, and are not intended to limit the scope of the invention. The terms and sentences used are intended to describe the various features of the disclosed invention in an easily understandable manner, and are not intended to limit the scope of the invention.

In describing the present invention, a detailed description of related known technology is omitted where it is judged that it may unnecessarily obscure the subject matter of the present invention.

Hereinafter, a method and apparatus for recognizing an object in an image using machine learning according to an embodiment of the present invention will be described.

FIG. 1 is a diagram illustrating an object recognition method according to an embodiment of the present invention. FIG. 2A is a diagram illustrating an example of image collection according to an embodiment of the present invention. FIG. 2B is a diagram illustrating an example of training an object recognition deep learning model according to an embodiment of the present invention. FIGS. 2C and 2D are diagrams illustrating an example of object recognition according to an embodiment of the present invention.

Referring to FIG. 1, step S101 is a step of obtaining an object-related image. In an embodiment, referring to FIG. 2A, an object-related image 201 may be obtained, the object-related image 201 may be divided into a plurality of frames, and a frame 203 containing an object may be determined from among the plurality of frames.

For example, the plurality of frames may be generated by dividing the object-related image 201 at 1-second intervals.
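As a rough illustration of the 1-second division, the sketch below computes which frame indices would be picked from a video with a known frame count and frame rate. The function name `sample_frame_indices` and its parameters are hypothetical; decoding the video itself is left out, and the text does not say how the frame at each second is chosen, so taking the frame shown at each interval boundary is an assumption.

```python
from typing import List


def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> List[int]:
    """Return the index of the frame shown at each multiple of `interval_s` seconds."""
    if total_frames <= 0 or fps <= 0 or interval_s <= 0:
        return []
    # Never step by less than one frame, so that indices are strictly increasing.
    step = max(1.0, fps * interval_s)
    indices: List[int] = []
    k = 0
    while int(k * step) < total_frames:
        indices.append(int(k * step))
        k += 1
    return indices
```

For a 3-second clip at 30 frames per second, this yields one frame per second, starting with the first frame of the clip.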

Step S103 is a step of recognizing an object and an object display time from the object-related image using an object recognition deep learning model.

In an embodiment, referring to FIG. 2B, the object recognition deep learning model 210 may be trained from training images of pre-tagged objects. For example, a feature may be determined from a training image of a pre-tagged object, and the determined feature may be converted into a vector value.
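To make the idea of turning an image feature into a vector concrete, here is a toy sketch. It is not the deep learning model the text describes: a per-channel color histogram is swapped in as a deliberately simple stand-in feature, and `histogram_vector` and `cosine_similarity` are hypothetical helper names. It only shows that once images are vectors, two images can be compared numerically.

```python
import math
from typing import List, Sequence, Tuple


def histogram_vector(pixels: Sequence[Tuple[int, int, int]], bins: int = 4) -> List[float]:
    """Map RGB pixels (0-255 per channel) to a normalized per-channel histogram vector."""
    vec = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            bucket = min(value * bins // 256, bins - 1)
            vec[channel * bins + bucket] += 1.0
    total = len(pixels)
    return [v / total for v in vec] if total else vec


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

A trained network would produce far more discriminative vectors, but the comparison step would look much the same.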

In an embodiment, referring to FIGS. 2C and 2D, an object identifier 220 and an object display time for the screen on which the corresponding object is displayed may be determined.
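One plausible way to derive an object display time from per-frame recognition results is to merge consecutive sampled timestamps at which the same object identifier was detected. The sketch below assumes detections arrive as `(timestamp_in_seconds, object_id)` pairs sampled once per second; the function name `display_intervals` and this data layout are hypothetical and not taken from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def display_intervals(
    detections: Iterable[Tuple[int, str]], interval_s: int = 1
) -> Dict[str, List[Tuple[int, int]]]:
    """Merge consecutive detection timestamps into (start, end) intervals per object."""
    times: Dict[str, List[int]] = defaultdict(list)
    for timestamp, object_id in detections:
        times[object_id].append(timestamp)

    result: Dict[str, List[Tuple[int, int]]] = {}
    for object_id, stamps in times.items():
        ordered = sorted(set(stamps))
        spans: List[Tuple[int, int]] = []
        start = prev = ordered[0]
        for t in ordered[1:]:
            if t - prev > interval_s:  # gap found: close the current interval
                spans.append((start, prev + interval_s))
                start = t
            prev = t
        spans.append((start, prev + interval_s))
        result[object_id] = spans
    return result
```

Each interval ends one sampling interval after the last detection, since a frame sampled at second t is taken to represent the span from t to t + 1.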

In one embodiment, the object-related image may be displayed based on the object and the object display time.

In one embodiment, an input for an object display time may be obtained, and the frame that includes the object corresponding to that object display time may be displayed from among the plurality of frames.

In one embodiment, when the number of inputs for an object display time by a user is greater than or equal to a threshold value, a list of at least one object-related image that includes the object corresponding to the object display time may be displayed.

That is, when the number of time warps to the object display time of an object is greater than or equal to a certain number, the user's preference for the object is determined to be high, and a list of various images related to the object is provided to the user, which makes object search more useful to the user.
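
The threshold rule for time warps can be sketched as follows. The class name, the in-memory counter, and the mapping from object identifiers to related images are assumptions for illustration; the description does not specify how the counts or the lists are stored.

```python
from collections import Counter
from typing import Dict, List


class TimeWarpTracker:
    """Counts time warps per object and suggests related images at a threshold."""

    def __init__(self, threshold: int, related_images: Dict[str, List[str]]):
        self.threshold = threshold
        self.related_images = related_images  # hypothetical lookup table
        self.counts: Counter = Counter()

    def record_warp(self, object_id: str) -> List[str]:
        """Record one time warp; return related images once the threshold is met."""
        self.counts[object_id] += 1
        if self.counts[object_id] >= self.threshold:
            return list(self.related_images.get(object_id, []))
        return []


tracker = TimeWarpTracker(3, {"lipstick": ["video_a", "video_b"]})
for _ in range(3):
    suggestions = tracker.record_warp("lipstick")
print(suggestions)
```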

For example, the object may include various products such as cosmetics, accessories, and fashion goods, but is not limited thereto.

FIG. 3 is a diagram illustrating a pre-training operation method for object recognition according to an embodiment of the present invention.

Referring to FIG. 3, step S301 is a step of collecting training videos using a proprietary algorithm. Here, the training videos may include videos for training the object recognition deep learning model.

In one embodiment, keywords present in a training video may be identified, and the proprietary algorithm may use those keywords to distinguish videos that can be used as training videos from videos that cannot.

Step S303 is a step of extracting object image information from the training video. For example, to minimize problems caused by blur and smearing, the training video may be subdivided by extracting object image information at one-second intervals.

Step S305 is a step of training the object recognition deep learning model 210 on the object image information. In this case, the object image information may include training images of objects.

In this case, the objects in the training images may be tagged in advance by a user. That is, the minimum quantity of tagged images that keeps the initial manual tagging by the user to a minimum may be determined and adopted.

Thereafter, features may be identified in the image of the object and computed in vector form. For example, the object recognition deep learning model 210 may use the YOLO algorithm, the SSD (Single Shot MultiBox Detector) algorithm, a CNN algorithm, or the like, but the application of other algorithms is not excluded.

Step S307 is a step of storing the learning file computed through the training of the object recognition deep learning model 210. In this case, the learning file may be moved to the server that performs extraction, where the adequacy of the extraction may be measured.

Step S309 is a step of automatically tagging objects in an object-related image using the learning file. That is, it is an automatic tagging step that allows objects in a newly received object-related image to flow automatically into the data available for training.

In one embodiment, because the recognition rate rises as more high-quality training images are obtained and used for training, steps S305 to S309 may be repeated until the desired recognition rate is reached.
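
The repetition of steps S305 to S309 can be outlined as a loop that stops once a target recognition rate is reached. In this sketch, `train_round` and `evaluate` are hypothetical callables standing in for the training, storing, and auto-tagging steps and for a recognition-rate measurement. The `max_rounds` safety limit is also an assumption and does not appear in the description.

```python
from typing import Callable


def train_until(
    target_rate: float,
    train_round: Callable[[], None],
    evaluate: Callable[[], float],
    max_rounds: int = 100,
) -> int:
    """Repeat training rounds until the recognition rate reaches the target.

    Returns the number of rounds that were run.
    """
    for round_number in range(1, max_rounds + 1):
        train_round()  # stands in for steps S305 to S309
        if evaluate() >= target_rate:
            return round_number
    raise RuntimeError("target recognition rate not reached within max_rounds")


# Simulated recognition rates that improve with each round.
rates = iter([0.50, 0.70, 0.92])
print(train_until(0.90, lambda: None, lambda: next(rates)))
```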

FIG. 4 is a diagram illustrating a recognition and extraction operation method for object recognition according to an embodiment of the present invention.

Referring to FIG. 4, step S401 is a step of obtaining an object-related image. That is, a new video may be input. In one embodiment, the new video may be obtained in the same manner as in step S301 of FIG. 3.

Step S403 is a step of extracting object image information from the object-related image. That is, frames that include an object may be extracted from the object-related image. For example, images may be extracted at one-second intervals so that the object image information can be input.

Step S405 is a step of determining whether the object image information matches the learning file generated by the object recognition deep learning model. That is, the type of the object may be found using the object image information and the learning file. Here, the learning file may include an existing object database (DB).

Step S407 is a step of extracting, when the object image information matches the learning file generated by the object recognition deep learning model, the identifier of the object corresponding to the object image information and the object display time.

Step S409 is a step of storing the object image information, when the object image information does not match the learning file generated by the object recognition deep learning model, so that a new object can be registered.

That is, the system may be configured so that data that cannot be matched is manually tagged and used to train the object recognition deep learning model, creating a smoothly running virtuous cycle in which that data can be matched against the object DB in the next recognition and extraction step.
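
The flow of steps S403 to S409 can be sketched as a single pass over one-second frames. Here `recognize` is a hypothetical callable that stands in for the comparison against the learning file and returns an object identifier or `None`; how the actual model performs that comparison is not specified here.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def recognize_video(
    frames: Sequence[object],
    recognize: Callable[[object], Optional[str]],
) -> Tuple[List[Tuple[str, int]], List[Tuple[int, object]]]:
    """Split frames into recognized objects and frames awaiting manual tagging.

    Assumption: frames[i] is the frame sampled at second i.
    """
    recognized: List[Tuple[str, int]] = []  # (object identifier, display time)
    unmatched: List[Tuple[int, object]] = []  # stored for new-object registration
    for second, frame in enumerate(frames):
        object_id = recognize(frame)
        if object_id is not None:
            recognized.append((object_id, second))  # corresponds to S407
        else:
            unmatched.append((second, frame))  # corresponds to S409
    return recognized, unmatched


# A dictionary lookup stands in for the trained model in this example.
known = {"frame_a": "lipstick"}
print(recognize_video(["frame_a", "frame_x", "frame_a"], known.get))
```

The `unmatched` list is what would be handed to manual tagging so that the next recognition pass can match those objects.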

FIG. 5 is a diagram illustrating another object recognition method according to an embodiment of the present invention.

Referring to FIG. 5, step S501 is a step of obtaining an object-related image.

Step S503 is a step of extracting object text information from the object-related image. For example, the object text information may include optical character recognition (OCR) information on an object included in the object-related image. In one embodiment, at least one of object image information, object sound information, and object text information may be extracted from the object-related image.

Step S505 is a step of inputting the object text information into the object recognition deep learning model to recognize at least one of an object, an object identifier, and an object display time from the object-related image. In one embodiment, at least one of the object image information, the object sound information, and the object text information may be input into the object recognition deep learning model to recognize at least one of the object, the object identifier, and the object display time from the object-related image. For example, the object identifier may include an identification (ID) of the object.

FIG. 6 is a diagram illustrating another pre-training operation method for object recognition according to an embodiment of the present invention.

Referring to FIG. 6, step S601 is a step of collecting training videos using a proprietary algorithm. Here, the training videos may include videos for training the object recognition deep learning model.

Step S603 is a step of extracting object image information, object sound information, and object text information from the training video. For example, to minimize problems caused by blur and smearing, the training video may be subdivided by extracting the object image information, object sound information, and object text information at one-second intervals.

In one embodiment, the object image information may include the image region corresponding to the object. The object sound information may include sound information in which the object is spoken. The object text information may include subtitles or descriptive text of the video.
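
One possible way to hold the three kinds of information for a one-second segment is sketched below. The field names, the bounding-box layout, and the representation of sound as a transcribed string are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ObjectInfo:
    """Information extracted for one second of an object-related image."""

    second: int
    image_region: Optional[Tuple[int, int, int, int]] = None  # x, y, width, height
    sound: Optional[str] = None  # assumed: transcript of the spoken object name
    text: Optional[str] = None  # subtitle or descriptive text

    def available_modalities(self) -> List[str]:
        """List which of the three kinds of information are present."""
        present: List[str] = []
        if self.image_region is not None:
            present.append("image")
        if self.sound is not None:
            present.append("sound")
        if self.text is not None:
            present.append("text")
        return present


info = ObjectInfo(second=12, text="matte lipstick")
print(info.available_modalities())
```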

Step S605 is a step of training the object recognition deep learning model 210 using the object image information, the object sound information, and the object text information. In this case, the object image information may include training images of objects.

In this case, the object image information, object sound information, and object text information of the training images may be tagged in advance by a user. That is, the minimum quantity of tagged data that keeps the initial manual tagging by the user to a minimum may be determined and adopted.

Step S607 is a step of storing the learning file computed through the training of the object recognition deep learning model 210. In this case, the learning file may be moved to the server that performs extraction, where the adequacy of the extraction may be measured.

In one embodiment, because the recognition rate rises as more high-quality training images are obtained and used for training, steps S605 and S607 may be repeated until the desired recognition rate is reached.

FIG. 7 is a diagram illustrating a recognition and extraction operation method for object recognition according to a first embodiment of the present invention.

Referring to FIG. 7, step S701 is a step of obtaining an object-related image. That is, a new video may be input. In one embodiment, the new video may be obtained in the same manner as in step S601 of FIG. 6.

Step S703 is a step of extracting object image information, object sound information, and object text information from the object-related image. That is, frames that include an object may be extracted from the object-related image. For example, extraction may be performed at one-second intervals so that the object image information, object sound information, and object text information can be input.

Step S705 is a step of determining whether the object image information matches the training image information generated by the object recognition deep learning model. That is, the type of the object may be found using the object image information and the training image information for the object in the learning file. Here, the learning file may include an existing object database (DB).

Step S707 is a step of determining, when the object image information does not match the training image information generated by the object recognition deep learning model, whether the object sound information matches the training sound information for the object in the learning file generated by the object recognition deep learning model. That is, the type of the object may be found using the object sound information and the training sound information for the object in the learning file.

Step S709 is a step of determining, when the object sound information does not match the training sound information generated by the object recognition deep learning model, whether the object text information matches the training text information for the object in the learning file generated by the object recognition deep learning model. That is, the type of the object may be found using the object text information and the training text information for the object in the learning file.

Step S711 is a step of storing, when the object text information does not match the training text information of the learning file generated by the object recognition deep learning model, the object image information, object sound information, and object text information in the learning file of the object recognition deep learning model so that a new object can be registered.

That is, the system may be configured so that data that cannot be matched is manually tagged and used to train the object recognition deep learning model, creating a smoothly running virtuous cycle in which that data can be matched against the object DB in the next recognition and extraction step.

Step S713 is a step of extracting, when the object image information matches the training image information, the object sound information matches the training sound information, or the object text information matches the training text information, the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information.

Step S715 is a step of adding the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information to the learning file.
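
The fallback order of FIG. 7 (image, then sound, then text) can be sketched as a cascade over per-modality matchers. The matcher callables are hypothetical stand-ins for the comparison against the training information in the learning file, and a return value of `None` corresponds to the case in which the data is stored for new-object registration.

```python
from typing import Callable, Dict, Optional, Sequence

Matcher = Callable[[object], Optional[str]]


def cascade_match(
    info: Dict[str, object],
    matchers: Dict[str, Matcher],
    order: Sequence[str] = ("image", "sound", "text"),
) -> Optional[str]:
    """Try each kind of information in order; return the first matched identifier."""
    for modality in order:
        object_id = matchers[modality](info.get(modality))
        if object_id is not None:
            return object_id  # matched: identifier and display time are extracted
    return None  # nothing matched: store the data so a new object can be registered


# Dictionary lookups stand in for the trained model in this example.
matchers = {
    "image": {}.get,
    "sound": {"spoken lipstick": "obj-1"}.get,
    "text": {"caption": "obj-2"}.get,
}
info = {"image": "blurry frame", "sound": "spoken lipstick", "text": "caption"}
print(cascade_match(info, matchers))
```

Passing a different `order` changes which kind of information is tried first without altering the rest of the flow.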

FIG. 8 is a diagram illustrating a recognition and extraction operation method for object recognition according to a second embodiment of the present invention.

Referring to FIG. 8, step S801 is a step of obtaining an object-related image.

Step S803 is a step of extracting object image information, object sound information, and object text information from the object-related image.

Step S805 is a step of determining whether the object image information matches the training image information generated by the object recognition deep learning model.

Step S807 is a step of determining, when the object image information does not match the training image information generated by the object recognition deep learning model, whether the object text information matches the training text information for the object in the learning file generated by the object recognition deep learning model.

Step S809 is a step of determining, when the object text information does not match the training text information generated by the object recognition deep learning model, whether the object sound information matches the training sound information for the object in the learning file generated by the object recognition deep learning model.

Step S811 is a step of storing, when the object sound information does not match the training sound information of the learning file generated by the object recognition deep learning model, the object image information, object sound information, and object text information in the learning file of the object recognition deep learning model so that a new object can be registered.

Step S813 is a step of extracting, when the object image information matches the training image information, the object sound information matches the training sound information, or the object text information matches the training text information, the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information.

Step S815 is a step of adding the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information to the learning file.

FIG. 9 is a diagram illustrating a recognition and extraction operation method for object recognition according to a third embodiment of the present invention.

Referring to FIG. 9, step S901 is a step of obtaining an object-related image.

Step S903 is a step of extracting object image information, object sound information, and object text information from the object-related image.

Step S905 is a step of determining whether the object sound information matches the training sound information generated by the object recognition deep learning model.

Step S907 is a step of determining, when the object sound information does not match the training sound information generated by the object recognition deep learning model, whether the object image information matches the training image information for the object in the learning file generated by the object recognition deep learning model.

Step S909 is a step of determining, when the object image information does not match the training image information generated by the object recognition deep learning model, whether the object text information matches the training text information for the object in the learning file generated by the object recognition deep learning model.

Step S911 is a step of storing, when the object text information does not match the training text information of the learning file generated by the object recognition deep learning model, the object image information, object sound information, and object text information in the learning file of the object recognition deep learning model so that a new object can be registered.

Step S913 is a step of extracting, when the object image information matches the training image information, the object sound information matches the training sound information, or the object text information matches the training text information, the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information.

Step S915 is a step of adding the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information to the learning file.

FIG. 10 is a diagram illustrating a recognition and extraction operation method for object recognition according to a fourth embodiment of the present invention.

Referring to FIG. 10, step S1001 is a step of obtaining an object-related image.

Step S1003 is a step of extracting object image information, object sound information, and object text information from the object-related image.

Step S1005 is a step of determining whether the object sound information matches the training sound information generated by the object recognition deep learning model.

Step S1007 is a step of determining, when the object sound information does not match the training sound information generated by the object recognition deep learning model, whether the object text information matches the training text information for the object in the learning file generated by the object recognition deep learning model.

Step S1009 is a step of determining, when the object text information does not match the training text information generated by the object recognition deep learning model, whether the object image information matches the training image information for the object in the learning file generated by the object recognition deep learning model.

Step S1011 is a step of storing, when the object image information does not match the training image information of the learning file generated by the object recognition deep learning model, the object image information, object sound information, and object text information in the learning file of the object recognition deep learning model so that a new object can be registered.

Step S1013 is a step of extracting, when the object image information matches the training image information, the object sound information matches the training sound information, or the object text information matches the training text information, the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information.

Step S1015 is a step of adding the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information to the learning file.

FIG. 11 is a diagram illustrating a recognition and extraction operation method for object recognition according to a fifth embodiment of the present invention.

Referring to FIG. 11, step S1101 is a step of obtaining an object-related image.

Step S1103 is a step of extracting object image information, object sound information, and object text information from the object-related image.

Step S1105 is a step of determining whether the object text information matches the training text information generated by the object recognition deep learning model.

Step S1107 is a step of determining, when the object text information does not match the training text information generated by the object recognition deep learning model, whether the object image information matches the training image information for the object in the learning file generated by the object recognition deep learning model.

Step S1109 is a step of determining, when the object image information does not match the training image information generated by the object recognition deep learning model, whether the object sound information matches the training sound information for the object in the learning file generated by the object recognition deep learning model.

Step S1111 is a step of storing, when the object sound information does not match the training sound information of the learning file generated by the object recognition deep learning model, the object image information, object sound information, and object text information in the learning file of the object recognition deep learning model so that a new object can be registered.

Step S1113 is a step of extracting, when the object image information matches the training image information, the object sound information matches the training sound information, or the object text information matches the training text information, the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information.

Step S1115 is a step of adding the object identifier and the object display time corresponding to at least one of the object image information, object sound information, and object text information to the learning file.

FIG. 12 is a diagram illustrating a recognition and extraction operation method for object recognition according to a sixth embodiment of the present invention.

Referring to FIG. 12, step S1201 is a step of obtaining an object-related image.

Step S1203 is a step of extracting object image information, object sound information, and object text information from the object-related image.

Step S1205 is a step of determining whether the object text information matches the training text information generated by the object recognition deep learning model.

In step S1207, when the object text information does not match the learning text information generated by the object recognition deep learning model, it is determined whether the object sound information matches the learning sound information for the object in the learning file generated by the object recognition deep learning model.

In step S1209, when the object sound information does not match the learning sound information generated by the object recognition deep learning model, it is determined whether the object image information matches the learning image information for the object in the learning file generated by the object recognition deep learning model.

In step S1211, when the object image information does not match the learning image information of the learning file generated by the object recognition deep learning model, the object image information, the object sound information, and the object text information are stored in the learning file of the object recognition deep learning model so that a new object can be registered.

In step S1213, when the object image information matches the learning image information, the object sound information matches the learning sound information, or the object text information matches the learning text information, an object identifier and an object display time corresponding to at least one of the object image information, the object sound information, and the object text information are extracted.

In step S1215, the object identifier and the object display time corresponding to at least one of the object image information, the object sound information, and the object text information are added to the learning file.
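The flow of steps S1207 to S1215 can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the invention: the `LearnedObject` layout, the `find_match` and `recognize` names, the generated identifier format, and the use of simple equality in place of the comparison performed by the object recognition deep learning model are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class LearnedObject:
    """One entry of the learning file (hypothetical layout)."""
    identifier: str
    text: Optional[str] = None
    sound: Optional[str] = None
    image: Optional[str] = None
    display_times: List[float] = field(default_factory=list)


def find_match(learning_file, modality, value):
    """Return the first learned object whose `modality` field equals `value`.

    Equality is a placeholder for the model-based comparison.
    """
    if value is None:
        return None
    for obj in learning_file:
        if getattr(obj, modality) == value:
            return obj
    return None


def recognize(learning_file, text, sound, image, display_time) -> Tuple[str, float]:
    """Check text, then sound, then image; register a new object if all fail."""
    for modality, value in (("text", text), ("sound", sound), ("image", image)):
        matched = find_match(learning_file, modality, value)
        if matched is not None:
            # S1213 / S1215: extract the identifier and record the display time.
            matched.display_times.append(display_time)
            return matched.identifier, display_time
    # S1211: nothing matched, so store all three features as a new object.
    # The identifier format below is an assumption for this sketch.
    new_obj = LearnedObject(
        identifier=f"object-{len(learning_file) + 1}",
        text=text, sound=sound, image=image,
        display_times=[display_time],
    )
    learning_file.append(new_obj)
    return new_obj.identifier, display_time
```

In this sketch the text mismatch falls through to the sound check (S1207), the sound mismatch falls through to the image check (S1209), and only a mismatch on all three leads to registration (S1211).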

In the various embodiments described above, the object image information may be input to the object recognition deep learning model to recognize a first object corresponding to the object image information, the object sound information may be input to the object recognition deep learning model to recognize a second object corresponding to the object sound information, and the object text information may be input to the object recognition deep learning model to recognize a third object corresponding to the object text information.

In this case, when only one of the first object corresponding to the object image information, the second object corresponding to the object sound information, and the third object corresponding to the object text information is recognized as a different object, the object on which the other two agree may be determined as the finally recognized object.

For example, when the first object and the second object are recognized as the same object and the third object is recognized as an object different from the first object and the second object, the identically recognized object, that is, the object recognized as both the first object and the second object, may be determined as the finally recognized object.
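The two-out-of-three decision described above can be illustrated with a short sketch. The function name `majority_object` and the use of plain strings as object labels are assumptions made for this example only.

```python
from collections import Counter
from typing import Optional


def majority_object(first: str, second: str, third: str) -> Optional[str]:
    """Return the object that at least two modalities agree on, else None."""
    label, votes = Counter([first, second, third]).most_common(1)[0]
    return label if votes >= 2 else None


# Image and sound agree on "lipstick", text disagrees.
print(majority_object("lipstick", "lipstick", "toner"))
```

When all three recognized objects differ, the sketch returns `None`, leaving that case to a separate rule.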

Further, when the first object corresponding to the object image information, the second object corresponding to the object sound information, and the third object corresponding to the object text information are all different from one another, the object display times corresponding to the first object, the second object, and the third object are extracted, and the object for which the largest number of object display times is extracted from the object-related image may be determined as the finally recognized object.

For example, when the number of object display times corresponding to the first object (e.g., 6) is greater than the number of object display times corresponding to the second object (e.g., 3) and the number of object display times corresponding to the third object (e.g., 4), the first object may be determined as the finally recognized object.
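The count-based rule for the case where all three objects differ can be sketched as below. The function name `most_displayed_object` and the dictionary layout are hypothetical. The description does not say how ties are resolved, so this sketch simply keeps the first candidate with the highest count, which is an assumption.

```python
from typing import Dict, List


def most_displayed_object(display_times: Dict[str, List[float]]) -> str:
    """Pick the object with the largest number of extracted display times."""
    return max(display_times, key=lambda obj: len(display_times[obj]))


# Mirrors the example above: 6, 3, and 4 display times respectively.
candidates = {
    "first object": [1.0, 4.0, 9.0, 15.0, 22.0, 30.0],
    "second object": [2.0, 8.0, 19.0],
    "third object": [3.0, 7.0, 11.0, 25.0],
}
print(most_displayed_object(candidates))
```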

FIG. 13 is a diagram illustrating a functional configuration of an object recognition device 1300 according to an embodiment of the present invention.

Referring to FIG. 13, the object recognition device 1300 may include a communication unit 1310, a control unit 1320, a display unit 1330, an input unit 1340, and a storage unit 1350.

The communication unit 1310 may obtain an object-related image.

In an embodiment, the communication unit 1310 may include at least one of a wired communication module and a wireless communication module. All or part of the communication unit 1310 may be referred to as a 'transmitter', a 'receiver', or a 'transceiver'.

The control unit 1320 may extract object text information from the object-related image and input the object text information into the object recognition deep learning model to recognize at least one of an object, an object identifier, and an object display time from the obtained object-related image.

In an embodiment, the control unit 1320 may include an image collection unit 1322 that collects beauty-related creators and related images; an object learning unit 1324 that performs deep learning on the collected images and uses the previously learned learning data to automatically tag and learn new products; an object extraction unit 1326 that, when a specific image is presented, identifies which of the learned products it shows; and a reuse unit 1328 that stores object image information, object sound information, and object text information so that a new object can be registered.

In an embodiment, the control unit 1320 may include at least one processor or microprocessor, or may be part of a processor. The control unit 1320 may also be referred to as a communication processor (CP). The control unit 1320 may control the operation of the object recognition device 1300 according to various embodiments of the present invention.

The display unit 1330 may display the object-related image based on the object, the object identifier, and the object display time. In an embodiment, the display unit 1330 may display, among a plurality of frames, a frame that contains the object corresponding to the object display time.

In an embodiment, the display unit 1330 may display information processed by the object recognition device 1300. For example, the display unit 1330 may include at least one of a liquid crystal display (LCD), a light emitting diode (LED) display, an organic light emitting diode (OLED) display, a micro electro mechanical systems (MEMS) display, and an electronic paper display.

The input unit 1340 may obtain an input for the object display time. In an embodiment, the input unit 1340 may obtain a user's input for the object display time.

The storage unit 1350 may store the learning file of the object recognition deep learning model 210, the object-related image, the object, the object identifier, and the object display time.

In an embodiment, the storage unit 1350 may be composed of volatile memory, nonvolatile memory, or a combination of volatile memory and nonvolatile memory. The storage unit 1350 may provide stored data at the request of the control unit 1320.

In various embodiments of the present invention, the components described in FIG. 13 are not essential, so the object recognition device 1300 may be implemented with more components or fewer components than those described in FIG. 13.
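As a rough illustration of how the units of FIG. 13 relate to one another, the sketch below models a subset of them as fields and methods of a single class. All class, field, and method names are hypothetical, the dictionary lookup merely stands in for the object recognition deep learning model, and the input unit 1340 is omitted, which is consistent with the statement that the device may have fewer components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (display time in seconds, text extracted from the frame); an assumed format.
Frame = Tuple[float, str]


@dataclass
class StorageUnit:
    """Stands in for the storage unit 1350."""
    learning_file: Dict[str, str] = field(default_factory=dict)  # text -> identifier
    recognized: List[Tuple[str, float]] = field(default_factory=list)


@dataclass
class ObjectRecognitionDevice:
    """Stands in for the device 1300."""
    communication_unit: Callable[[], List[Frame]]  # 1310: obtains the image
    storage_unit: StorageUnit                      # 1350

    def control(self) -> List[Tuple[str, float]]:
        """Role of the control unit 1320: recognize identifiers and display times."""
        for shown_at, text in self.communication_unit():
            identifier = self.storage_unit.learning_file.get(text)
            if identifier is not None:
                self.storage_unit.recognized.append((identifier, shown_at))
        return self.storage_unit.recognized

    def display(self, identifier: str) -> List[float]:
        """Role of the display unit 1330: times at which the object is shown."""
        return [t for obj, t in self.storage_unit.recognized if obj == identifier]
```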

According to the present invention, a system is built that is first trained manually on several hundred images and then uses the learned data to automatically extract other images.

Further, according to the present invention, when object image information is input, whatever can be tagged automatically is tagged automatically, and items that are not automatically tagged are collected separately for tagging, so that manual human work can be minimized.

Further, according to the present invention, to minimize initial data collection, learning first uses a small amount of data, the learned data is used to automatically extract the shape of images for creating further learning data, and this process is repeated so that high-quality learning data can be learned.
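The repeated cycle of automatic tagging followed by manual tagging of the remainder can be sketched as follows. The names `auto_tag`, `training_round`, and `manual_tagger` are hypothetical, and the dictionary lookup is only a stand-in for the learned model, so this shows the control flow rather than the actual learning procedure.

```python
from typing import Callable, Dict, List, Tuple


def auto_tag(learned: Dict[str, str], items: List[str]):
    """Split items into automatically tagged pairs and a manual-tagging queue."""
    tagged, manual_queue = [], []
    for item in items:
        if item in learned:
            tagged.append((item, learned[item]))
        else:
            manual_queue.append(item)
    return tagged, manual_queue


def training_round(
    learned: Dict[str, str],
    items: List[str],
    manual_tagger: Callable[[str], str],
) -> Tuple[int, int]:
    """Run one round and return (auto-tagged count, manually tagged count)."""
    tagged, manual_queue = auto_tag(learned, items)
    for item in manual_queue:
        # A person tags only what could not be tagged automatically,
        # and the result is added to the learning data for later rounds.
        learned[item] = manual_tagger(item)
    return len(tagged), len(manual_queue)
```

Running `training_round` repeatedly on new batches shrinks the manual queue as the learning data grows, which is the effect the paragraphs above describe.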

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art will be able to make various changes and modifications without departing from the essential characteristics of the present invention.

The various embodiments disclosed in this specification may be performed in any order, and may be performed simultaneously or separately.

Accordingly, the embodiments disclosed in this specification are intended to describe, not to limit, the technical idea of the present invention, and the scope of the present invention is not limited by these embodiments.

The scope of protection of the present invention should be interpreted according to the claims, and all technical ideas within a scope equivalent thereto should be understood as falling within the scope of rights of the present invention.

201: object-related image
203: frame
210: object recognition deep learning model
220: object identifier
1300: object recognition device
1310: communication unit
1320: control unit
1322: image collection unit
1324: object learning unit
1326: object extraction unit
1328: reuse unit
1330: display unit
1340: input unit
1350: storage unit

Claims

An object recognition method comprising:
(a) obtaining an object-related image;
(b) extracting object text information from the object-related image; and
(c) recognizing at least one of an object, an object identifier, and an object display time from the obtained object-related image by inputting the object text information into an object recognition deep learning model.

The method of claim 1,
wherein step (b) comprises
extracting at least one of object image information, object sound information, and object text information from the object-related image.

The method of claim 2,
wherein step (c) comprises
inputting at least one of the object image information, the object sound information, and the object text information into the object recognition deep learning model to recognize at least one of the object, the object identifier, and the object display time from the object-related image.

The method of claim 2,
wherein step (c) comprises:
when the object image information does not match learning image information generated by the object recognition deep learning model, determining whether the object sound information matches learning sound information generated by the object recognition deep learning model;
when the object sound information does not match the learning sound information, determining whether the object text information matches learning text information generated by the object recognition deep learning model; and
when the object text information does not match the learning text information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises:
when the object image information does not match learning image information generated by the object recognition deep learning model, determining whether the object text information matches learning text information generated by the object recognition deep learning model;
when the object text information does not match the learning text information, determining whether the object sound information matches learning sound information generated by the object recognition deep learning model; and
when the object sound information does not match the learning sound information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises:
when the object sound information does not match learning sound information generated by the object recognition deep learning model, determining whether the object image information matches learning image information generated by the object recognition deep learning model;
when the object image information does not match the learning image information, determining whether the object text information matches learning text information generated by the object recognition deep learning model; and
when the object text information does not match the learning text information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises:
when the object sound information does not match learning sound information generated by the object recognition deep learning model, determining whether the object text information matches learning text information generated by the object recognition deep learning model;
when the object text information does not match the learning text information, determining whether the object image information matches learning image information generated by the object recognition deep learning model; and
when the object image information does not match the learning image information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises:
when the object text information does not match learning text information generated by the object recognition deep learning model, determining whether the object image information matches learning image information generated by the object recognition deep learning model;
when the object image information does not match the learning image information, determining whether the object sound information matches learning sound information generated by the object recognition deep learning model; and
when the object sound information does not match the learning sound information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises:
when the object text information does not match learning text information generated by the object recognition deep learning model, determining whether the object sound information matches learning sound information generated by the object recognition deep learning model;
when the object sound information does not match the learning sound information, determining whether the object image information matches learning image information generated by the object recognition deep learning model; and
when the object image information does not match the learning image information, storing the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The method of claim 2,
wherein step (c) comprises,
when the object image information matches learning image information generated by the object recognition deep learning model, the object sound information matches learning sound information generated by the object recognition deep learning model, or the object text information matches learning text information generated by the object recognition deep learning model, recognizing the object, the object identifier, and the object display time corresponding to at least one of the object image information, the object sound information, and the object text information.

An object recognition device comprising:
a communication unit configured to obtain an object-related image; and
a control unit configured to extract object text information from the object-related image and to recognize at least one of an object, an object identifier, and an object display time from the obtained object-related image by inputting the object text information into an object recognition deep learning model.

The device of claim 11,
wherein the control unit is configured to
extract at least one of object image information, object sound information, and object text information from the object-related image.

The device of claim 12,
wherein the control unit is configured to
input at least one of the object image information, the object sound information, and the object text information into the object recognition deep learning model to recognize at least one of the object, the object identifier, and the object display time from the object-related image.

The device of claim 12,
wherein the control unit is configured to:
when the object image information does not match learning image information generated by the object recognition deep learning model, determine whether the object sound information matches learning sound information generated by the object recognition deep learning model;
when the object sound information does not match the learning sound information, determine whether the object text information matches learning text information generated by the object recognition deep learning model; and
when the object text information does not match the learning text information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to:
when the object image information does not match learning image information generated by the object recognition deep learning model, determine whether the object text information matches learning text information generated by the object recognition deep learning model;
when the object text information does not match the learning text information, determine whether the object sound information matches learning sound information generated by the object recognition deep learning model; and
when the object sound information does not match the learning sound information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to:
when the object sound information does not match learning sound information generated by the object recognition deep learning model, determine whether the object image information matches learning image information generated by the object recognition deep learning model;
when the object image information does not match the learning image information, determine whether the object text information matches learning text information generated by the object recognition deep learning model; and
when the object text information does not match the learning text information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to:
when the object sound information does not match learning sound information generated by the object recognition deep learning model, determine whether the object text information matches learning text information generated by the object recognition deep learning model;
when the object text information does not match the learning text information, determine whether the object image information matches learning image information generated by the object recognition deep learning model; and
when the object image information does not match the learning image information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to:
when the object text information does not match learning text information generated by the object recognition deep learning model, determine whether the object image information matches learning image information generated by the object recognition deep learning model;
when the object image information does not match the learning image information, determine whether the object sound information matches learning sound information generated by the object recognition deep learning model; and
when the object sound information does not match the learning sound information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to:
when the object text information does not match learning text information generated by the object recognition deep learning model, determine whether the object sound information matches learning sound information generated by the object recognition deep learning model;
when the object sound information does not match the learning sound information, determine whether the object image information matches learning image information generated by the object recognition deep learning model; and
when the object image information does not match the learning image information, store the object image information, the object sound information, and the object text information in a learning file of the object recognition deep learning model.

The device of claim 12,
wherein the control unit is configured to,
when the object image information matches learning image information generated by the object recognition deep learning model, the object sound information matches learning sound information generated by the object recognition deep learning model, or the object text information matches learning text information generated by the object recognition deep learning model, recognize the object, the object identifier, and the object display time corresponding to at least one of the object image information, the object sound information, and the object text information.