KR101945487B1

KR101945487B1 - Apparatus and method for recognizing object included in video data

Info

Publication number: KR101945487B1
Application number: KR1020170098066A
Authority: KR
Inventors: 김봉모; 임재균; 조영관
Original assignee: 에스케이텔레콤 주식회사 (SK Telecom Co., Ltd.)
Priority date: 2017-08-02
Filing date: 2017-08-02
Publication date: 2019-02-07

Abstract

According to an embodiment of the present invention, a method for recognizing an object included in a video comprises the steps of: extracting a plurality of still images from a video composed of the plurality of still images captured in chronological order; determining, for each of the plurality of still images, the type of object included in that still image; determining, as an unidentified image, a still image among the plurality of still images for which the type of the included object is not determined; and estimating the type of the object included in the unidentified image based on the type of object included in each of a first number of still images selected in order of temporal proximity to the unidentified image.

Description

APPARATUS AND METHOD FOR RECOGNIZING OBJECT INCLUDED IN VIDEO DATA

The present invention relates to an apparatus and a method for estimating, with high probability, the type of object contained in a still image of a video when that type cannot be clearly determined.

Image recognition technology for recognizing objects included in image data has advanced steadily, driven by high demand and the continuous research it has prompted. Such technology can be applied in various fields, such as security systems using face recognition and input interfaces that recognize a user's gestures.

The most important requirement in image recognition technology is securing recognition accuracy above a certain level. Techniques for recognizing the objects in a single still image currently perform well in general. In a video composed of a plurality of still images, however, some still images may make it difficult to recognize the included object accurately, owing to motion blur caused by the object's movement or to conditions around the object such as illumination. It is therefore hard to guarantee sufficiently reliable recognition for every still image constituting the video.

To achieve the purpose of image recognition in real situations, objects must be recognized accurately in every still image constituting a video. It is therefore necessary to identify, with high accuracy, the objects contained even in still images that are difficult to recognize.

Korean Unexamined Patent Publication No. 10-2008-0085182 (published September 23, 2008)

An object of the present invention is to provide an apparatus and a method for identifying, with high accuracy, the type of object contained in those still images of a video that are difficult to recognize accurately by conventional methods.

However, the problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned here will be clearly understood by those skilled in the art from the following description.

A method of recognizing an object included in a video according to an embodiment of the present invention may include: extracting a plurality of still images from a video composed of the plurality of still images captured in chronological order; determining, for each still image, the type of object included in that still image; determining, as an unidentified image, a still image among the plurality of still images for which the type of the included object is not determined; and estimating the type of the object included in the unidentified image based on the type of object included in each of a first number of still images selected in order of temporal proximity to the unidentified image.

An apparatus for recognizing an object included in a video according to an embodiment of the present invention may include: a still image extraction unit that extracts a plurality of still images from a video composed of the plurality of still images captured in chronological order; an object determination unit that determines, for each still image, the type of object included in that still image and determines, as an unidentified image, a still image among the plurality of still images for which the type of the included object is not determined; and an estimation unit that estimates the type of the object included in the unidentified image based on the type of object included in each of a first number of still images selected in order of temporal proximity to the unidentified image.

According to an embodiment of the present invention, for an unidentified image, that is, a still image of the video in which the type of the included object is not determined, the type of object the unidentified image is believed to contain can be estimated with high accuracy based on information from still images temporally close to it. This can increase the reliability of recognition for the entire video and, ultimately, allow various services based on image recognition to be provided to users at high quality.

FIG. 1 is a block diagram illustrating the configuration of an apparatus for recognizing an object included in a video according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating each step of a method for recognizing an object included in a video according to an embodiment of the present invention.
FIG. 3 is a diagram explaining in detail a method for recognizing an object included in a video according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. The present invention may, however, be embodied in many different forms and is not limited to the embodiments disclosed below; these embodiments are provided so that this disclosure will be complete and will fully convey the scope of the invention to those skilled in the art. The invention is defined only by the scope of the claims.

In describing the embodiments of the present invention, detailed descriptions of known functions or configurations will be omitted when they may unnecessarily obscure the subject matter of the present invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention of a user or operator, or according to custom. Their definitions should therefore be based on the contents of this entire specification.

FIG. 1 is a block diagram illustrating the configuration of an apparatus for recognizing an object included in a video according to an embodiment of the present invention. The object recognition apparatus 100 of FIG. 1 may include an input unit 110, an image extraction unit 120, an object determination unit 130, an estimation unit 140, a correction unit 150, an output unit 160, and a database 170. However, since the components of the object recognition apparatus 100 shown in FIG. 1 represent only one embodiment of the present invention, FIG. 1 should not be construed as limiting the concept of the present invention.

The input unit 110 may receive a video to be recognized by the object recognition apparatus 100. To perform this input function, the input unit 110 may be implemented as an input device such as a keyboard or mouse, a data bus for receiving data, or a wired/wireless communication module.

The image extraction unit 120 may extract, from the video input through the input unit 110, each of the plurality of still images constituting the video. That is, each still image corresponds to one of the frames of the video, and the still images are typically captured at regular time intervals. While extracting the still images, the image extraction unit 120 may also extract information on the order in which each still image appears in the video. The image extraction unit 120, together with the object determination unit 130, the estimation unit 140, the correction unit 150, and the output unit 160 described below, may be implemented by a computing device including a microprocessor.

The object determination unit 130 may determine the type of object included in each still image and the position of that object within the still image. Some still images, however, may contain objects whose type cannot easily be determined; the object determination unit 130 may classify such still images as unidentified images.

The estimation unit 140 may estimate the type of object included in an unidentified image classified by the object determination unit 130. This estimation may be performed using information such as the types and positions of the objects included in other still images temporally close to the unidentified image; the estimation method is described in detail below.

The correction unit 150 may correct the results of determination or estimation both for still images whose included object type has been determined by the object determination unit 130 (hereinafter, for convenience, called "confirmed images" to distinguish them from unidentified images) and for unidentified images whose included object type has been estimated by the estimation unit 140. That is, the correction unit 150 may further improve the accuracy of the recognition results produced by the object determination unit 130 and the estimation unit 140.

The output unit 160 may output the result of determining or estimating the type and position of the object included in each still image of the video. The output may be visual, or it may be electronic output to a destination outside the object recognition apparatus 100. To this end, the output unit 160 may include a visual output device such as a display or an auditory output device such as a speaker and, in some cases, a data bus or a wired/wireless communication module for transmitting data outside the object recognition apparatus 100.

The database 170 may store information required by the object recognition apparatus 100. For example, the database 170 may store the still images extracted by the image extraction unit 120 and their temporal order, the determination results of the object determination unit 130, the estimation results of the estimation unit 140, and the final object recognition results after correction by the correction unit 150. The learning model that the object determination unit 130 uses to determine the type of object included in a still image may also be stored in the database 170. The information stored in the database 170 is not limited to these examples and may be any information needed for the operation of the object recognition apparatus 100.

Specifically, the database 170 may be implemented as a computer-readable recording medium. Examples of such media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as flash memory.

FIG. 2 is a diagram illustrating each step of a method for recognizing an object included in a video according to an embodiment of the present invention. The object recognition method of FIG. 2 may be performed by the object recognition apparatus 100 described with reference to FIG. 1. However, since the method shown in FIG. 2 is only one embodiment of the present invention, FIG. 2 should not be construed as limiting the concept of the present invention, and the steps of the method may, depending on the case, be performed in an order different from that shown in the drawing.

First, the input unit 110 may acquire a video that is the target of object recognition (S110). The image extraction unit 120 may then extract still images from the video (S120), and the object determination unit 130 may determine the type and position of the object included in each still image (S130).

The method by which the object determination unit 130 determines the type of object included in a still image is described in more detail below. Various methods may be applied by the object determination unit 130 to determine the type of object included in a given still image; one example is to use a learning model built through machine learning.

A learning model produced by machine learning may be generated using a plurality of training images. Each training image may be a static image and may contain an image of a specific object. The objects contained in these training images are referred to as sample objects.

The learning model may define the relationship between feature information extracted from each training image and the sample object contained in that training image. Such a learning model may be generated through various machine learning or deep learning techniques, such as a convolutional neural network (CNN) or a deep neural network (DNN), but is not limited to being generated by these example methods. The object determination unit 130 may generate the learning model directly from training images prepared in advance, or it may use a previously generated learning model stored in the database 170.

Using the learning model, the object determination unit 130 may determine the type of object included in each still image extracted from the video to be recognized. For example, the object determination unit 130 may calculate the similarity between the shape displayed in a particular still image under determination and the shape of each sample object stored in the learning model. The object determination unit 130 may then determine that the sample object with the largest similarity value is the object included in that still image, and may also determine the position of the region in the still image where the object's shape appears.

However, even the sample object with the largest similarity value can hardly be regarded as the object included in that still image if the similarity value itself falls below a certain level. Accordingly, for each still image, the object determination unit 130 may classify as an "unidentified image" any still image in which the similarity between the shape displayed in the still image and the shape of the object determined to be included in it is less than a predetermined threshold similarity.
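The similarity test and threshold described above can be sketched as follows. This is a minimal illustration, assuming that each still image and each sample object has already been reduced to a feature vector and that cosine similarity serves as the similarity measure; neither is specified in the text, and the function name `classify_frame` is hypothetical.

```python
import numpy as np


def classify_frame(frame_feature, sample_features, threshold):
    """Return (label, similarity); label is None for an 'unidentified image'.

    frame_feature: 1-D feature vector of the still image (assumed input).
    sample_features: dict mapping sample-object label -> 1-D feature vector.
    threshold: the predetermined threshold similarity.
    """
    best_label, best_sim = None, -np.inf
    v = frame_feature / np.linalg.norm(frame_feature)
    for label, feat in sample_features.items():
        # Cosine similarity is an assumed stand-in for the model's similarity.
        sim = float(np.dot(v, feat / np.linalg.norm(feat)))
        if sim > best_sim:
            best_label, best_sim = label, sim
    # Even the most similar sample object is rejected if it is not similar enough.
    if best_sim < threshold:
        return None, best_sim
    return best_label, best_sim
```

A `None` label marks the still image as an unidentified image; the similarity is returned as well so that later steps can compare it against the same threshold.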

For confirmed images, that is, still images other than the unidentified images, a process of obtaining a blur value may be performed. The blur value may be a value indicating how blurrily the object in the still image is rendered. An object may appear blurred in a still image because, for example, it moved during capture, and object recognition results for still images containing such blurred objects may be inaccurate. For example, a still image captured at the moment a person's finger shape changes from the "V" shape meaning victory to the "O" shape meaning OK is likely to contain fingers blurred by the transition, and the finger shape is also likely to be somewhere between "V" and "O". Even if the object determination unit 130 determines the object in that still image to be either "V"-shaped fingers or "O"-shaped fingers, it is difficult to guarantee that the result is correct.

Accordingly, the object determination unit 130 may obtain a blur value for each confirmed image and reclassify any confirmed image whose blur value is equal to or greater than a predetermined threshold as an unidentified image. The blur value of a still image may be calculated from the result of a frequency analysis of the region in which the object's shape was determined to exist. The more sharply the object's shape is rendered in a still image, the larger the share of high-frequency components in the frequency spectrum of that image; the object determination unit 130 may use this principle to obtain the blur value of each confirmed image.
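The frequency-analysis principle above can be illustrated with a short NumPy sketch. It assumes, purely for illustration, that the blur value is one minus the fraction of spectral energy lying above a normalized radial frequency cutoff of 0.25 cycles per pixel; the cutoff, the functions `high_freq_ratio` and `blur_value_fft`, and this particular measure are hypothetical choices, not the measure specified in the text.

```python
import numpy as np


def high_freq_ratio(region, cutoff=0.25):
    """Fraction of spectral energy above a normalized radial frequency cutoff."""
    region = np.asarray(region, dtype=float)
    # Remove the mean so the DC term does not dominate the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(region - region.mean()))
    power = np.abs(spectrum) ** 2
    h, w = region.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))  # cycles/pixel in [-0.5, 0.5)
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    total = power.sum()
    if total == 0:
        return 0.0  # perfectly flat region: no detail at all
    return float(power[radius > cutoff].sum() / total)


def blur_value_fft(region, cutoff=0.25):
    # Larger when high-frequency content is scarce, i.e. when the region is blurrier.
    return 1.0 - high_freq_ratio(region, cutoff)
```

A sharply rendered object keeps a large share of its energy above the cutoff and so gets a small blur value; smoothing the same region moves energy toward low frequencies and raises it.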

One concrete example of a method for obtaining the blur value is as follows. In this example, the blur value of a still image is calculated using the Laplacian. For all pixels, or for selected pixels, in the region of the still image where the object's shape is determined to exist, the object determination unit 130 may calculate a Laplacian value using a block of a predetermined size. Equation 1 below calculates the Laplacian value using a block of size 3x3.

In Equation 1, G(x,y) denotes the Laplacian value of the pixel at coordinates (x,y), and I(x,y) denotes the value of the pixel at (x,y). That is, the Laplacian value obtained by Equation 1 indicates how much the value of the pixel at (x,y) differs from the values of its neighboring pixels.
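Equation 1 itself is not reproduced in this text. One common 3x3 form consistent with the definitions of G(x,y) and I(x,y) above is the four-neighbour discrete Laplacian below, equivalent to convolving I with the kernel K; the exact coefficients are an assumption (an eight-neighbour kernel with a centre weight of -8 is an equally common choice).

```latex
G(x,y) = I(x+1,y) + I(x-1,y) + I(x,y+1) + I(x,y-1) - 4\,I(x,y),
\qquad
K = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -4 & 1 \\ 0 & 1 & 0 \end{bmatrix}
```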

The object determination unit 130 may calculate this Laplacian value for each pixel in the region of the still image where the object's shape is determined to exist and then calculate the variance of these Laplacian values. The blur value may be defined to be inversely proportional to this variance. Accordingly, a large variance means a small blur value, and a small variance means a large blur value.
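The variance-of-Laplacian rule can be written in a few lines of NumPy. This sketch assumes the four-neighbour 3x3 Laplacian (the patent's exact kernel is not given in this text), evaluates it only on interior pixels of the object region, and takes the blur value as the reciprocal of the variance, which is one way of making it inversely proportional; the names `laplacian_3x3` and `blur_value` and the small constant `eps` are hypothetical.

```python
import numpy as np


def laplacian_3x3(img):
    """Four-neighbour Laplacian G(x, y) for interior pixels (assumed kernel)."""
    img = np.asarray(img, dtype=float)
    return (img[1:-1, 2:] + img[1:-1, :-2] + img[2:, 1:-1] + img[:-2, 1:-1]
            - 4.0 * img[1:-1, 1:-1])


def blur_value(region, eps=1e-12):
    """Blur value inversely proportional to the variance of the Laplacian.

    `region` is the part of the still image where the object was detected;
    eps avoids division by zero on perfectly flat regions.
    """
    g = laplacian_3x3(region)
    return 1.0 / (np.var(g) + eps)
```

A sharp region has strong pixel-to-neighbour differences, hence a large variance and a small blur value; on a flat region the variance is zero and `eps` keeps the result finite but very large.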

As described above, based on the blur value of each confirmed image, the object determination unit 130 may reclassify confirmed images whose blur value is equal to or greater than the threshold as unidentified images. The threshold may be set empirically through experiments or determined from the distribution of the blur values of the confirmed images. Alternatively, the threshold may be determined from the result of analyzing the above-described training images through machine learning. The object determination unit 130 may also skip the separate blur computation and classify still images with a high degree of blur directly as unidentified images during the object determination process that uses the learning model. In other words, judging the degree of blur of a still image can itself be left to machine learning; for this purpose, the learning model may also include information on the relationship between the feature information of each training image and the degree of blur of that training image.
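A threshold derived from the distribution of the confirmed images' blur values might look like the sketch below. The rule used here (mean plus k standard deviations, with k = 2.0) is an assumption chosen for illustration; the text leaves the exact rule open, and `reclassify_by_blur` is a hypothetical name.

```python
import numpy as np


def reclassify_by_blur(blur_values, k=2.0):
    """Return (threshold, indices of confirmed images to demote to unidentified).

    The threshold is taken from the distribution of the confirmed images'
    blur values as mean + k * std (an assumed rule).
    """
    b = np.asarray(blur_values, dtype=float)
    threshold = b.mean() + k * b.std()
    # Images whose blur value is at or above the threshold become unidentified.
    demoted = np.flatnonzero(b >= threshold).tolist()
    return float(threshold), demoted
```

A distribution-based rule adapts to each video (for example, to its overall sharpness), whereas an empirically fixed threshold would have to be tuned once for all inputs.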

For each unidentified image determined through the above process, the position of the included object may be predicted using a tracking technique, and the learning-model-based object type determination may then be performed on that position only. The tracking technique uses preceding images, that is, images that precede a given unidentified image in time, to find the region of the unidentified image where the object's shape is expected to exist.

For the tracking, the object determination unit 130 may select a predetermined second number (e.g., five) of preceding images, in order of temporal proximity to the given unidentified image. Next, the object determination unit 130 may detect, in each selected preceding image, the position at which the object's shape exists. Through this process, the object determination unit 130 can obtain information on the object's movement and, based on that information, determine the region of the given unidentified image where the object's shape is expected to exist.
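The motion information gathered from the preceding images can be turned into a predicted region in several ways. The sketch below assumes a constant-velocity model: it fits a straight line to the box centres of the preceding images by least squares, extrapolates it to the unidentified image, and enlarges the mean box size by a margin. The tracking method, the (x, y, w, h) box format, the 20% margin, and the name `predict_region` are all assumptions.

```python
import numpy as np


def predict_region(prev_boxes, prev_times, target_time, margin=0.2):
    """Predict the box (x, y, w, h) of the object in an unidentified image.

    prev_boxes: boxes detected in the selected preceding images (e.g. 5 of them).
    prev_times: their frame indices.
    target_time: frame index of the unidentified image.
    """
    boxes = np.asarray(prev_boxes, dtype=float)
    t = np.asarray(prev_times, dtype=float)
    cx = boxes[:, 0] + boxes[:, 2] / 2
    cy = boxes[:, 1] + boxes[:, 3] / 2
    if len(t) >= 2:
        # Least-squares fit center = a * t + b per axis, then extrapolate.
        pcx = np.polyval(np.polyfit(t, cx, 1), target_time)
        pcy = np.polyval(np.polyfit(t, cy, 1), target_time)
    else:
        pcx, pcy = cx[-1], cy[-1]  # a single detection: assume no motion
    # Reuse the mean box size, enlarged by a safety margin.
    w = boxes[:, 2].mean() * (1 + margin)
    h = boxes[:, 3].mean() * (1 + margin)
    return (float(pcx - w / 2), float(pcy - h / 2), float(w), float(h))
```

Fitting over several preceding images, rather than using only the last two, makes the extrapolation less sensitive to a single noisy detection.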

Next, the object determination unit 130 may apply the learning-model-based object type determination only to the region where the object's shape is expected to exist, rather than to the entire area of the given unidentified image. This exploits the fact that learning-model-based object type determination is more accurate when applied to the region where the object's shape actually exists than when applied to the entire area of the still image.

The object determination unit 130 may perform this tracking-based object type determination on each unidentified image, and may reclassify as confirmed images those unidentified images whose object type can now be determined (for example, images in which the similarity between the shape displayed in the still image and the shape of the object determined to be included in it is equal to or greater than the threshold similarity).
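Restricting the learning-model-based determination to the predicted region, and promoting the image to a confirmed image when the similarity clears the threshold, can be sketched as follows. `classify` stands in for the learning model and is a hypothetical callable returning a (label, similarity) pair; the (x, y, w, h) box format and the clipping behaviour are likewise assumptions.

```python
import numpy as np


def crop(image, box):
    """Crop box (x, y, w, h) from image, clipped to the image bounds."""
    h_img, w_img = image.shape[:2]
    x, y, w, h = box
    x0, y0 = max(int(round(x)), 0), max(int(round(y)), 0)
    x1, y1 = min(int(round(x + w)), w_img), min(int(round(y + h)), h_img)
    return image[y0:y1, x0:x1]


def recheck_unidentified(image, predicted_box, classify, threshold):
    """Re-run classification on the predicted region only.

    Returns the label if the image can now be treated as a confirmed image,
    otherwise None (the image stays unidentified).
    """
    patch = crop(image, predicted_box)
    if patch.size == 0:
        return None  # predicted region lies entirely outside the image
    label, similarity = classify(patch)
    return label if similarity >= threshold else None
```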

The object determination unit 130 may finalize as unidentified images only those images that remain unidentified after all of the above processes (S140). For each finalized unidentified image, the estimation unit 140 may estimate the type of object included in it (S150). A voting technique may be used for this estimation by the estimation unit 140.

For a given unidentified image, the estimation unit 140 may select a first number of still images (hereinafter, "reference images") in order of temporal proximity to the unidentified image. In the following, the first number is assumed to be 10; in this case, a total of 10 still images may be selected as reference images: the 5 still images immediately preceding the given unidentified image and the 5 still images immediately following it.
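Selecting the reference images by temporal proximity can be sketched as below. With a first number of 10 and enough frames on both sides, it returns exactly the 5 preceding and 5 following frames described above. The text does not say how ties or the beginning and end of the video are handled; the tie-breaking and back-filling used here, and the name `select_reference_frames`, are assumptions.

```python
def select_reference_frames(target, num_frames, first_number=10):
    """Pick the `first_number` frame indices temporally closest to `target`.

    Ties in distance are broken in favour of the earlier frame, and near the
    ends of the video the remaining slots are filled from the other side.
    """
    candidates = [i for i in range(num_frames) if i != target]
    # Sort by temporal distance, then by frame index for a stable tie-break.
    candidates.sort(key=lambda i: (abs(i - target), i))
    return sorted(candidates[:first_number])
```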

Next, the estimation unit 140 may estimate the object included in the specific unidentified image based on how frequently each object appears across the reference images. For example, assume that there are four object types, each a shape formed by a person's fingers: a V shape, an O shape, an open palm, and a raised thumb (thumbs-up). If, among the 10 reference images, the V shape appears 5 times, the O shape 3 times, and the palm 2 times, while the thumbs-up never appears, the estimation unit 140 may estimate the V-shaped fingers, which appear most often in the reference images, as the object included in the specific unidentified image. In other words, the estimation unit 140 may estimate the object that appears with the highest frequency in the reference images as the object included in the specific unidentified image.
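A minimal sketch of this frequency-based (majority) vote, assuming that reference images whose own object type is undetermined simply cast no vote; the source does not say how such images are handled, and the function name is hypothetical:

```python
from collections import Counter
from typing import Optional, Sequence


def majority_vote(reference_labels: Sequence[Optional[str]]) -> Optional[str]:
    """Return the object type that appears most often among the reference images.

    Reference images without a determined type (None) cast no vote. On a tie,
    Counter.most_common returns the type that was encountered first.
    """
    counts = Counter(label for label in reference_labels if label is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# The example from the text: V x5, O x3, palm x2, thumbs-up x0
refs = ["V"] * 5 + ["O"] * 3 + ["palm"] * 2
print(majority_vote(refs))  # V
```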

Alternatively, instead of simply counting how many times each object appears in the reference images, the estimation unit 140 may use a weighted sum. The weights may be assigned so that reference images temporally closer to the specific unidentified image receive larger values. In the example above, the O shape appears only 3 times, less often than the V shape. However, if the O shape appears in reference images temporally close to the specific unidentified image while the V shape appears in temporally distant ones, the weighted sum for the O shape may exceed that for the V shape. In this case, the estimation unit 140 may estimate the O-shaped fingers as the object included in the specific unidentified image. Such weights may be determined, for example, based on a Gaussian distribution.
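The text names a Gaussian distribution but gives no parameters, so the width `sigma` (in frames) and the frame layout in the example below are illustrative assumptions, and the function name is hypothetical. The sketch weights each reference image's vote by exp(-d^2 / (2 * sigma^2)), where d is its distance in frames from the unidentified image:

```python
import math
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def weighted_vote(target: int, references: Sequence[Tuple[int, Optional[str]]],
                  sigma: float = 1.5) -> Optional[str]:
    """Vote in which each reference frame's weight is a Gaussian of its
    temporal distance from the unidentified frame `target`.

    `references` holds (frame_index, object_type) pairs; `sigma` is an
    illustrative choice, not a value from the source.
    """
    scores: Dict[str, float] = defaultdict(float)
    for index, label in references:
        if label is None:
            continue  # undetermined reference frames cast no vote
        distance = index - target
        scores[label] += math.exp(-distance ** 2 / (2 * sigma ** 2))
    if not scores:
        return None
    return max(scores, key=scores.get)


# Illustrative layout around unidentified frame 50 (offsets -5..-1, +1..+5):
# O is nearest, V is the most frequent but farthest.
refs = [(45, "V"), (46, "V"), (47, "V"), (48, "palm"), (49, "O"),
        (51, "O"), (52, "O"), (53, "palm"), (54, "V"), (55, "V")]
print(weighted_vote(50, refs))  # O
```

With sigma = 1.5 frames the O shape scores about 2.01 against about 0.20 for V, reversing the plain count (V: 5, O: 3). With a very large sigma the weights become nearly equal and the result falls back to V.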

The estimation unit 140 may estimate, in the manner described above, the type of the object included in every unidentified image. By using a voting technique over reference images temporally close to each unidentified image, the type of the included object can be estimated with high accuracy even for still images whose object type cannot be identified by conventional methods.

FIG. 3 is a diagram for explaining in detail a method for recognizing an object included in a video according to an embodiment of the present invention. That is, FIG. 3 describes steps S130 to S150 of FIG. 2 in more detail.

Referring to FIG. 3, still images 210 can be extracted from the video 200. Considering three temporally adjacent still images 211, 212, and 213 among the still images 210, each of them may undergo an object recognition process using the learning model (S131). If object recognition using the learning model succeeds, a blur value may then be calculated for each of the still images 211, 212, and 213.
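This passage does not define the blur value; claim 4 states only that it is calculated from a frequency analysis of the region where the object's shape exists. A minimal sketch of one such measure, assuming the blur value is one minus the share of spectral energy above a hypothetical `cutoff` frequency (so a larger value means a blurrier region). The metric, the cutoff, and the function name are illustrative, not taken from the source.

```python
import cmath
import math
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1/y1 exclusive


def blur_value(gray: List[List[float]], box: Box, cutoff: float = 0.25) -> float:
    """Return 1 - (share of spectral energy above `cutoff` cycles/pixel) in `box`.

    Sharp edges put energy at high spatial frequencies, so a higher value
    means a blurrier region. A naive O((h*w)^2) 2-D DFT keeps this sketch
    dependency-free; numpy.fft.fft2 would normally be used instead.
    """
    x0, y0, x1, y1 = box
    patch = [row[x0:x1] for row in gray[y0:y1]]
    h, w = len(patch), len(patch[0])
    mean = sum(map(sum, patch)) / (h * w)
    centered = [[p - mean for p in row] for row in patch]  # remove the DC term

    total = high = 0.0
    for u in range(h):
        for v in range(w):
            coeff = sum(
                centered[y][x] * cmath.exp(-2j * math.pi * (u * y / h + v * x / w))
                for y in range(h) for x in range(w)
            )
            power = abs(coeff) ** 2
            # Unsigned spatial frequency of bin (u, v), in cycles per pixel
            radius = math.hypot(min(u, h - u) / h, min(v, w - v) / w)
            total += power
            if radius > cutoff:
                high += power
    if total == 0.0:
        return 1.0  # a perfectly flat region carries no detail at all
    return 1.0 - high / total
```

A pixel-level checkerboard (all energy at the highest frequency) scores close to 0, a smooth horizontal ramp scores about 0.84, and a flat patch scores 1.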

A still image 210 in which object recognition succeeds and whose blur value is less than the threshold is finalized as a confirmed image, for which object determination is complete. A still image 210 in which object recognition fails, or in which recognition succeeds but the blur value is equal to or greater than the threshold, instead undergoes a process in which the object's position is predicted using a tracking technique and learning-model-based object recognition is then applied only to the region where the object's shape is expected to exist (S134). A still image 210 in which recognition succeeds at this stage may become a confirmed image, but a still image 210 in which recognition fails here as well is classified as an unidentified image and undergoes object type estimation by voting (S150). If the still image 211 is classified as an unidentified image, the adjacent still images 212 and 213 can supply data for the voting-based estimation, as described above.
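The per-image decision flow of S131 and S134 can be summarized in a short Python sketch. The three callables (`recognize`, `blur_of`, `recognize_in_region`) are hypothetical stand-ins for the learning model, the blur calculation, and the tracking-plus-region recognition; only the branching order is taken from the text.

```python
from typing import Callable, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")


def triage_frame(frame: Frame,
                 recognize: Callable[[Frame], Optional[str]],
                 blur_of: Callable[[Frame], float],
                 recognize_in_region: Callable[[Frame], Optional[str]],
                 blur_threshold: float) -> Tuple[Optional[str], str]:
    """Return (object_type, status) for one still image.

    status is "confirmed" when a type was determined, otherwise
    "unidentified" (such frames are handed to the voting step, S150).
    """
    # S131: learning-model recognition on the whole frame
    label = recognize(frame)
    # Accept the result only if the frame is sharp enough
    if label is not None and blur_of(frame) < blur_threshold:
        return label, "confirmed"
    # S134: predict the object's position by tracking, re-run on that region only
    label = recognize_in_region(frame)
    if label is not None:
        return label, "confirmed"
    return None, "unidentified"
```

Because `blur_of` is called only after a successful recognition, the blur value is computed just for frames that produced a label, mirroring the order in FIG. 3.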

After the steps up to S150, the type of the included object has been determined or estimated for every still image. These determinations and estimates may nevertheless contain errors. For example, a particular still image may have been determined to contain V-shaped fingers, while the two still images immediately before and after it were determined to contain O-shaped fingers. That would mean the finger shape changed from O to V and back to O across three consecutive still images. Since the interval between adjacent still images is typically extremely short, about 30 ms, such a change within so short a time is highly implausible. In this case, therefore, the object type determined for that particular still image can be corrected from V-shaped fingers to O-shaped fingers.

Stated generally: for a specific still image, when the two still images temporally adjacent to it, one immediately before and one immediately after, contain objects of the same type and that type differs from the type of the object in the specific still image, the correction unit 150 may change the object type of the specific still image to the type contained in those two still images (S160). This operation of the correction unit 150 may be performed only on confirmed images, but it may also be performed on unidentified images whose estimation has been completed.

In addition, the correction unit 150 may use only the two still images adjacent to a specific still image for the correction, or it may select more than two still images temporally close to the specific still image. When more than two still images are selected, the correction unit 150 may change the object type of the specific still image if all of the selected still images contain objects of the same type and that type differs from the type of the object in the specific still image.
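The correction rule of S160 can be sketched as follows, assuming (for illustration only) that the "more than two" variant uses k frames on each side of the specific still image and that all decisions are made on the uncorrected labels; the source specifies neither point, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def correct_isolated_labels(labels: Sequence[Optional[str]],
                            k: int = 1) -> List[Optional[str]]:
    """Replace a frame's object type when its k preceding and k following
    frames all agree on a different type.

    k=1 is the two-neighbor rule; k>1 is the "more than two" variant.
    Decisions read the original `labels`, so one correction cannot
    trigger another.
    """
    corrected = list(labels)
    for i in range(k, len(labels) - k):
        neighbors = list(labels[i - k:i]) + list(labels[i + 1:i + 1 + k])
        first = neighbors[0]
        if first is not None and all(n == first for n in neighbors) and labels[i] != first:
            corrected[i] = first
    return corrected
```

For the example in the text, `correct_isolated_labels(["O", "V", "O"])` returns `["O", "O", "O"]`, while two consecutive V frames between O frames are left untouched because neither V frame has agreeing neighbors on both sides.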

When the correction by the correction unit 150 is complete, the output unit 160 may output the object recognition result to the outside of the object recognition apparatus 100 (S170). The object recognition result may, of course, also be stored in the database 170.

With the object recognition method according to the embodiment of the present invention described above, highly reliable object recognition results can be provided even in the presence of external factors that lower the object recognition rate, such as motion blur or a sudden change in illumination.

Combinations of the blocks of the block diagrams and the steps of the flowcharts accompanying the present invention may be performed by computer program instructions. Because these computer program instructions may be loaded onto the encoding processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, the instructions executed through that encoding processor create means for performing the functions described in each block of the block diagrams or each step of the flowcharts. These computer program instructions may also be stored in a computer-usable or computer-readable memory capable of directing a computer or other programmable data processing equipment to implement functions in a particular manner, so that the instructions stored in that memory can produce an article of manufacture containing instruction means that perform the functions described in each block of the block diagrams or each step of the flowcharts. The computer program instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps is performed on it to produce a computer-implemented process, and the instructions executed on the computer or other programmable data processing equipment can provide steps for executing the functions described in each block of the block diagrams and each step of the flowcharts.

In addition, each block or step may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments, the functions mentioned in the blocks or steps may occur out of order. For example, two blocks or steps shown in succession may in fact be performed substantially concurrently, or they may sometimes be performed in reverse order, depending on the functionality involved.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains may make various modifications and changes without departing from its essential qualities. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be interpreted according to the claims below, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of rights of the present invention.

According to an embodiment of the present invention, the type of the object included in an unidentified image, that is, a still image of the video that is difficult to recognize, can be estimated with high accuracy. This raises the reliability of recognition over the entire video and, ultimately, makes it possible to provide users with high-quality services based on image recognition.

100: Object recognition apparatus
110: Input unit
120: Image extraction unit
130: Object determination unit
140: Estimation unit
150: Correction unit
160: Output unit
170: Database

Claims

A method for recognizing an object included in a video, the method comprising:
extracting a plurality of still images from a video composed of the plurality of still images captured in chronological order;
determining, for each of the plurality of still images, a type of an object included in that still image;
determining, as an unidentified image, a still image among the plurality of still images for which the type of the included object is not determined; and
estimating a type of an object included in the unidentified image based on the type of the object included in each of a first number of still images selected from among the plurality of still images in order of temporal proximity to the unidentified image,
wherein the determining of the unidentified image comprises:
sequentially selecting temporally preceding images and temporally following images, relative to a still image for which the type of the object is not determined, such that their total number is a second number;
detecting, for each of the selected second number of images, a location where the shape of the included object exists;
determining, based on a result of the detecting, a region in which the object is expected to exist in the still image for which the type of the object is not determined;
estimating a type of an object existing in the expected region; and
determining, as the unidentified image, the still image for which the type of the object is not determined when the type of the object included in the expected region is not estimated,
and wherein the determining of the type of the object comprises:
determining the type of the object included in each of the input still images and the blur value of each of the input still images by using a learning model trained on learning data in which a plurality of learning images are the input and the type of a sample object included in each of the plurality of learning images and the blur value of each of the plurality of learning images are the ground truth.

delete

The method according to claim 1, wherein the determining, as an unidentified image, of a still image for which the type of the included object is not determined comprises:
obtaining, for each of the plurality of still images, a similarity between a shape displayed in that still image and the shape of the object determined, based on the learning model, to be included in it; and
determining, as the unidentified image, a still image among the plurality of still images whose similarity is less than a predetermined threshold similarity.

The method according to claim 3, wherein the determining, as an unidentified image, of a still image for which the type of the included object is not determined further comprises:
determining, as the unidentified image, a still image whose blur value is equal to or greater than a predetermined threshold from among the still images whose similarity is equal to or greater than the threshold similarity,
wherein the blur value of a specific still image is calculated based on a result of frequency analysis of the region of that still image in which the shape of the object included in it exists.

delete

The method according to claim 1, wherein the estimating of the type of the object included in the unidentified image comprises:
estimating the type of the object included in the unidentified image based on the frequency with which each of the objects included in the first number of still images appears in the first number of still images.

The method according to claim 6, wherein the type of the object included in the unidentified image is estimated based on the frequency and on a weight assigned to each of the first number of still images, the weight being assigned a larger value the temporally closer the still image is to the unidentified image.

The method according to claim 1, further comprising:
when, for a specific still image among the plurality of still images, the objects included in the two still images temporally adjacent to it, one immediately before and one immediately after, are of the same type and that type differs from the type of the object included in the specific still image, modifying the type of the object included in the specific still image to the type of the object included in the two still images.

An apparatus for recognizing an object included in a video, the apparatus comprising:
a still image extraction unit configured to extract a plurality of still images from a video composed of the plurality of still images captured in chronological order;
an object determination unit configured to determine, for each of the plurality of still images, a type of an object included in that still image, and to determine, as an unidentified image, a still image for which the type of the included object is not determined; and
an estimation unit configured to estimate a type of an object included in the unidentified image based on the type of the object included in each of a first number of still images selected from among the plurality of still images in order of temporal proximity to the unidentified image,
wherein the object determination unit
sequentially selects temporally preceding images and temporally following images, relative to a still image for which the type of the object is not determined, such that their total number is a second number; detects, for each of the selected second number of images, a location where the shape of the included object exists; determines, based on a result of the detection, a region in which the object is expected to exist in the still image for which the type of the object is not determined; estimates a type of an object existing in the expected region; and determines, as the unidentified image, the still image for which the type of the object is not determined when the type of the object included in the expected region is not estimated,
and wherein the object determination unit
determines the type of the object included in each of the input still images and the blur value of each of the input still images by using a learning model trained on learning data in which a plurality of learning images are the input and the type of a sample object included in each of the plurality of learning images and the blur value of each of the plurality of learning images are the ground truth.

A program stored in a computer-readable medium for performing the method according to any one of claims 1, 3, 4 and 6 to 8.

A computer-readable recording medium having recorded thereon a program comprising instructions for performing each step of the method according to any one of claims 1, 3, 4 and 6 to 8.