KR20190120489A

KR20190120489A - Apparatus for Video Recognition and Method thereof

Info

Publication number: KR20190120489A
Application number: KR1020180043730A
Authority: KR
Inventors: 변혜란; 조보라; 홍기범; 홍종광; 김호성; 황선희; 기민송; 김태형
Original assignee: 연세대학교 산학협력단
Priority date: 2018-04-16
Filing date: 2018-04-16
Publication date: 2019-10-24
Also published as: KR102138680B1

Abstract

Disclosed is an image recognition apparatus that recognizes the behavior of objects in an image using a convolutional neural network capable of video-level analysis. The image recognition apparatus comprises: a stream generation unit that generates, from an input image containing one or more objects, an action stream including motion information about the behavior of each object; and a recognition unit that recognizes the behavior of the objects using a first recognizer, which receives the generated action stream or position information indicating the positional relationship of the action streams in the input image and outputs one or more class vectors as an indicator for classifying the behavior of the objects.

Description

Apparatus for Video Recognition and Method Thereof

The present invention relates to an image recognition apparatus, and more particularly to an image recognition apparatus that uses a neural network to recognize the behavior of objects in an image.

Behavior recognition can be used to classify human behavior in video, which contains both temporal and spatial information. The relationships between a subject in the video and the people around it, between the subject and the object its action is directed at, and between one object and another can serve as cues for classifying the behavior of a specific object in the video.

An artificial neural network is a statistical learning algorithm in machine learning and cognitive science, inspired by biological neural networks. Techniques for machine recognition of images have recently been studied actively in fields such as computer vision. A convolutional neural network (CNN) is a neural network composed of one or more convolutional layers with artificial neural network layers stacked on top of them, and is mainly used to analyze two-dimensional input data such as images and audio.

Conventional image recognition techniques based on artificial neural networks have mostly been applied to still images only, and have been limited to analyzing subject-object relationships within a still image. They also do not take the passage of time into account when analyzing relationships between objects, including people and things.

Accordingly, there is a need for a technique, based on an artificial neural network capable of video-level analysis, that recognizes a video by considering the objects in it and the contextual relationships among those objects.

Korean Laid-Open Patent Publication No. 10-2017-0034226 (Notice)

The present invention has been devised to solve the problems described above and discloses an image recognition apparatus, in particular one that can recognize the behavior of objects in a video using a neural network.

To achieve the above object, the image recognition apparatus of the present invention includes: a stream generation unit that generates, from an input image containing at least one object, an action stream including motion information about the behavior of each object; and a recognition unit that recognizes the behavior of the objects using a first recognizer, which receives the generated action stream or position information indicating the positional relationship of the action streams in the input image and outputs at least one class vector as an indicator for classifying the behavior of the objects.

The stream generation unit may further include a region-of-interest (ROI) detection unit that divides the input image over time to generate a plurality of frame images and detects, for each object, an ROI containing at least part of that object in the generated frame images. The stream generation unit may generate the action stream using the detected ROIs.

The recognition unit may further include a summing unit that sums, according to a preset method, the class vectors that the first recognizer produces for the action streams generated for each object. The recognition unit may recognize the behavior of the objects using the summed class vector.

The stream generation unit may further include an ROI connection unit that calculates a connection score between ROIs detected in adjacent frame images and connects the ROIs detected in those frame images in consideration of the calculated connection score. The stream generation unit may generate the action stream for each object using the connected ROIs.

The stream generation unit may further include a position information calculation unit that rearranges the ROIs detected in the frame images on a per-frame basis and calculates, from a combined stream generated by connecting the rearranged ROIs, position information about the positions of the action streams relative to one another.

The ROI connection unit may further include a connection score calculation unit that calculates the connection score in consideration of at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs. The ROI connection unit may connect the ROIs using the calculated connection score.
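The three terms named above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the specification: ROIs are represented as dictionaries with hypothetical keys `box`, `feat`, and `cls`, cosine similarity stands in for the unspecified similarity function, the overlap ratio is computed as intersection over union, and the terms are combined as a weighted sum.

```python
import math


def iou(box_a, box_b):
    """Overlap ratio of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(u, v):
    """Assumed stand-in for the similarity function; the source does not name one."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def connection_score(roi_a, roi_b, w_feat=1.0, w_iou=1.0, w_cls=1.0):
    """Weighted sum of the three terms (the weighting scheme is an assumption)."""
    feat = cosine_similarity(roi_a["feat"], roi_b["feat"])
    overlap = iou(roi_a["box"], roi_b["box"])
    cls = cosine_similarity(roi_a["cls"], roi_b["cls"])
    return w_feat * feat + w_iou * overlap + w_cls * cls


def connect(rois_t, rois_t1):
    """For each ROI in frame t, return the index of the best-scoring ROI in frame t+1.

    Assumes rois_t1 is non-empty; greedy matching is a simplification.
    """
    return [
        max(range(len(rois_t1)), key=lambda j: connection_score(r, rois_t1[j]))
        for r in rois_t
    ]
```

Greedy per-ROI matching is the simplest possible choice here; a real tracker might instead solve the assignment jointly over all ROIs in the two frames.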

The ROI connection unit may further include a feature information estimation unit that, when there is a frame image in which no ROI was detected, estimates the feature information of the ROI in that frame image based on the feature information of the ROIs detected in the adjacent frame images. The ROI connection unit may connect the ROIs using the estimated feature information.
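One simple way to estimate the missing feature information is to interpolate between the neighboring frames. The sketch below assumes, for illustration only, that the feature information is a bounding box `(x1, y1, x2, y2)` and that linear interpolation is acceptable; the specification states only that the estimate is based on the adjacent frames, and the function names are hypothetical.

```python
def estimate_missing_box(prev_box, next_box, alpha=0.5):
    """Linearly interpolate a box between the previous and next frames."""
    return tuple((1 - alpha) * p + alpha * n for p, n in zip(prev_box, next_box))


def fill_gaps(track):
    """Fill single-frame gaps (None entries) in a per-object list of boxes."""
    filled = list(track)
    for i, box in enumerate(filled):
        if box is None and 0 < i < len(filled) - 1:
            prev_box, next_box = filled[i - 1], track[i + 1]
            if prev_box is not None and next_box is not None:
                filled[i] = estimate_missing_box(prev_box, next_box)
    return filled
```

This version only repairs gaps that are one frame long; longer gaps are left as `None`.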

The ROI detection unit may further include a feature information calculation unit that calculates feature information indicating the coordinates at which the ROI is located in the generated frame image. The ROI detection unit may detect the ROI using the calculated feature information.

The position information calculation unit may further include: a region separation unit that separates, in the frame images, the ROIs detected for each object from the portions in which no ROI was detected; and a rearrangement unit that rearranges the separated ROIs by a predetermined combination method. The position information may be calculated based on the combined stream generated by connecting the rearranged ROIs.

The feature information calculation unit may further include: a preprocessing unit that divides each of the generated frame images to generate grid cells of a predetermined size; and a calculation unit that calculates the center coordinates of each boundary cell and the probability that the object exists in that boundary cell using a second recognizer (a neural network). The second recognizer takes the generated frame images as input and outputs the center coordinates of boundary cells, each of which has its center inside a grid cell and indicates the probability that the object exists, or the probability that the object exists in each boundary cell. The feature information may be calculated using the boundary cells for which the probability and the center coordinates have been calculated.

The ROI detection unit may further include a boundary cell removal unit that, among the boundary cells in the frame images that redundantly contain the same object, removes some of them in consideration of whether the probability that the object exists in each boundary cell is greater than or equal to a preset threshold. The ROI detection unit may detect the ROI using the boundary cells that remain after the removal.
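The removal step can be sketched as a probability threshold followed by suppression of overlapping cells. The threshold test follows the text above; the overlap-based suppression (in the style of non-maximum suppression), the default threshold values, and the dictionary keys `box` and `prob` are assumptions added for illustration.

```python
def iou(box_a, box_b):
    """Overlap ratio of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_redundant_cells(cells, prob_threshold=0.5, iou_threshold=0.5):
    """Keep confident boundary cells and drop ones that duplicate a kept cell."""
    # Step 1: discard cells whose object probability is below the threshold.
    candidates = [c for c in cells if c["prob"] >= prob_threshold]
    # Step 2: visit the rest from most to least confident, keeping a cell only
    # if it does not overlap heavily with a cell that was already kept.
    candidates.sort(key=lambda c: c["prob"], reverse=True)
    kept = []
    for cell in candidates:
        if all(iou(cell["box"], k["box"]) < iou_threshold for k in kept):
            kept.append(cell)
    return kept
```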

The class vector includes a behavior list that classifies the behavior of the objects and probability information indicating the probability that the behavior of the objects in the input image corresponds to each entry in the behavior list. The recognition unit may recognize the behavior of the objects using the probability information for each entry in the behavior list of the summed class vector.
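As a minimal sketch of this step, the class vectors of several action streams can be summed element-wise and the behavior with the largest summed probability selected. The specification says only that the vectors are summed by a preset method, so the plain unweighted sum, the argmax decision, and the function name `recognize` are assumptions.

```python
def recognize(class_vectors, behavior_list):
    """Sum per-stream class vectors and return the most probable behavior."""
    # zip(*...) groups the probabilities that belong to the same behavior.
    summed = [sum(column) for column in zip(*class_vectors)]
    best = max(range(len(summed)), key=summed.__getitem__)
    return behavior_list[best], summed
```

For example, with the hypothetical behavior list `["walk", "run", "sit"]` and two streams scoring `[0.2, 0.7, 0.1]` and `[0.5, 0.4, 0.1]`, the summed vector favors `"run"` even though the second stream alone would favor `"walk"`.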

To achieve the above object, the image recognition method of the present invention includes: generating, from an input image containing at least one object, an action stream including motion information about the behavior of each object; and recognizing the behavior of the objects using a first recognizer, which receives the generated action stream or position information indicating the positional relationship of the action streams in the input image and outputs at least one class vector as an indicator for classifying the behavior of the objects.

The generating step may further include dividing the input image over time to generate a plurality of frame images and detecting, for each object, an ROI containing at least part of that object in the generated frame images. The action stream may be generated using the detected ROIs.

The recognizing step may further include summing, according to a preset method, the class vectors that the first recognizer produces for the action streams generated for each object. The behavior of the objects may be recognized using the summed class vector.

The generating step may further include calculating a connection score between ROIs detected in adjacent frame images and connecting the ROIs detected in those frame images in consideration of the calculated connection score. The action stream may be generated for each object using the connected ROIs.

The generating step may further include rearranging the ROIs detected in the frame images on a per-frame basis and calculating, from a combined stream generated by connecting the rearranged ROIs, position information about the positions of the action streams relative to one another.

The connecting step may further include calculating the connection score in consideration of at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs. The ROIs may be connected using the calculated connection score.

The connecting step may further include, when there is a frame image in which no ROI was detected, estimating the feature information of the ROI in that frame image based on the feature information of the ROIs detected in the adjacent frame images. The ROIs may be connected using the estimated feature information.

The detecting step may further include calculating feature information indicating the coordinates at which the ROI is located in the generated frame image. The ROI may be detected using the calculated feature information.

The step of calculating the position information may further include: separating, in the frame images, the ROIs detected for each object from the portions in which no ROI was detected; and rearranging the separated ROIs by a predetermined combination method. The position information may be calculated based on the combined stream generated by connecting the rearranged ROIs.

The present invention also discloses a computer program stored on a computer-readable recording medium for executing the above image recognition method on a computer.

According to the present invention, the behavior of objects in an image can be recognized.

In particular, a video can be recognized using a convolutional neural network capable of video-level analysis.

FIG. 1 is a block diagram of an image recognition apparatus according to an embodiment of the present invention.
FIG. 2 is an enlarged block diagram of the stream generation unit in the embodiment of FIG. 1.
FIG. 3 is an enlarged block diagram of the ROI detection unit in the embodiment of FIG. 2.
FIG. 4 is an enlarged block diagram of the feature information calculation unit in the embodiment of FIG. 3.
FIG. 5 is a reference diagram illustrating the process by which the ROI detection unit detects ROIs.
FIG. 6 is an exemplary diagram showing the ROIs detected by the ROI detection unit.
FIG. 7 is an enlarged block diagram of the ROI connection unit in the embodiment of FIG. 2.
FIG. 8 is a reference diagram illustrating the process by which the ROI connection unit connects ROIs.
FIG. 9 is an exemplary diagram illustrating the process by which the feature information estimation unit estimates the feature information of an ROI.
FIG. 10 is an enlarged block diagram of the position information calculation unit in the embodiment of FIG. 2.
FIG. 11 illustrates the image recognition process performed by the image recognition apparatus of the present invention.
FIG. 12 is an enlarged block diagram of the recognition unit in the embodiment of FIG. 1.
FIG. 13 is a flowchart of an image recognition method according to an embodiment of the present invention.
FIG. 14 is an enlarged flowchart of the generating step in the embodiment of FIG. 13.
FIG. 15 is an enlarged flowchart of the detecting step in the embodiment of FIG. 14.
FIG. 16 is an enlarged flowchart of the connecting step in the embodiment of FIG. 14.
FIG. 17 is an enlarged flowchart of the step of calculating position information in the embodiment of FIG. 14.
FIG. 18 is an enlarged flowchart of the recognizing step in the embodiment of FIG. 13.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

In the description with reference to the accompanying drawings, identical or corresponding components are given the same reference numerals, and redundant description of them is omitted.

In describing the present invention, detailed descriptions of related well-known structures or functions may be omitted where they are judged likely to obscure the gist of the invention.

The terminology used in this application is used only to describe particular embodiments and is not intended to be limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. Each step described below may be provided as one or more software modules, may be implemented in hardware responsible for each function, or may be implemented as a combination of software and hardware.

The specific meaning of each term, with examples, is described below in the order of the drawings.

The configuration of the image recognition apparatus 10 according to an embodiment of the present invention is described in detail below with reference to the related drawings.

FIG. 1 is a block diagram of the image recognition apparatus 10 according to an embodiment of the present invention.

The image recognition apparatus 10 includes a stream generation unit 100 and a recognition unit 200. For example, the image recognition apparatus 10 receives an image containing at least one object, generates an action stream from the input image, and recognizes the behavior of the objects in the image using a first recognizer that takes the generated action stream as input.

The images recognized by the image recognition apparatus 10 include not only still images but also video, in which the image information changes over time. Unlike conventional image recognition techniques, the image recognition apparatus 10 can recognize an image by performing video-level analysis. More specifically, the image recognition apparatus 10 recognizes an image by classifying the behavior of the objects in it.

The image recognition apparatus 10 may be used in an autonomous device such as an autonomous robot to control autonomous driving, or in a surveillance device such as CCTV to analyze input video automatically. It may also be provided as a controller for a home Internet of Things (IoT) system and used to control the operation of home appliances by automatically recognizing human behavior.

The first recognizer and the second recognizer used by the image recognition apparatus 10 may be neural networks trained by supervised learning based on a deep learning algorithm with a deep, multi-layer neural network structure. Preferably, they are convolutional neural networks trained by supervised learning that include at least one convolutional layer and at least one pooling layer. Research on artificial neural networks built on the backpropagation algorithm stagnated for some time once it became clear that, as the number of layers grows, problems arise such as training delays caused by excessive depth. However, once the overfitting problem was addressed by methods such as dropout, the performance of these algorithms improved dramatically.

The convolutional neural network, the representative deep learning architecture used by the image recognition apparatus 10, was designed to mimic human visual processing and is therefore regarded as a deep learning algorithm well suited to image processing. It achieves high performance in image recognition by extracting features that represent an image in abstracted form.

Specifically, a convolutional neural network extracts features hierarchically from an input image and forms feature maps from the extracted features. The features distributed on a feature map contain rearranged image information, which allows the network to classify images effectively. The network applies an activation function to the features extracted by a convolutional layer and then repeatedly places pooling layers, which resize the feature values extracted by the convolutional layers.
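The convolution, activation, and pooling sequence described above can be sketched in plain Python on a single-channel image. The choice of ReLU as the activation function and 2*2 max pooling is an assumption (the text names neither), and the function names are hypothetical.

```python
def conv2d(image, kernel):
    """Valid 2D convolution as used in CNNs (cross-correlation, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Activation function: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Halve the feature map size by taking the maximum of each 2x2 block."""
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, len(feature_map[0]) - 1, 2)
        ]
        for i in range(0, len(feature_map) - 1, 2)
    ]
```

Applying `max_pool2x2(relu(conv2d(image, kernel)))` to a 5*5 image with a 2*2 kernel yields a 4*4 feature map after convolution and a 2*2 map after pooling, which is the resizing effect the paragraph refers to.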

The convolutional neural network used by the image recognition apparatus 10 basically has a convolutional neural network structure, but its internal structure may vary with the shape, size, and pixel value information of the input image. That is, the numbers of input, hidden, and output layers, the number of nodes in each layer, the weights applied to the edges between nodes, the learning rate, and so on may be set differently depending on the purpose of image recognition.

The convolutional neural network used by the image recognition apparatus 10 may also adopt an architecture such as VGG-Net, GoogLeNet, or ResNet.

The neural networks used by the image recognition apparatus 10 have a structure in which fully connected layers are attached to a CNN structure comprising convolutional and pooling layers. Provided that overfitting does not occur, deeper networks can yield higher image recognition accuracy. The following description refers to FIG. 2.

The stream generation unit 100 includes an ROI detection unit 120, an ROI connection unit 140, and a position information calculation unit 160. For example, the stream generation unit 100 generates, from an input image containing at least one object, an action stream including motion information about the behavior of each object. The action stream of the present invention may be generated by preprocessing an action tube, which is generated for each object present in the image, into a form suitable for input to the neural network.

The action tube of the present invention is a set of three-dimensional (X, Y, Z) image data containing at least part of an object in the input image, and may be generated for each object. The action tube includes the changes in pixel values within the ROIs over time and can therefore carry motion information about changes in the object's behavior. The motion information of the present invention is a kind of change in the pixel values within the ROI over time, and may be provided as the per-pixel derivative of the pixel value change over unit time.
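A minimal sketch of such motion information is a finite difference between consecutive frames of a tube. It assumes, for illustration only, that the tube is a list of equally sized single-channel frames sampled at a fixed interval `dt`, and that a first-order difference is an acceptable approximation of the derivative; the function name is hypothetical.

```python
def motion_information(tube, dt=1.0):
    """Per-pixel rate of change between consecutive frames of an action tube."""
    return [
        [
            [(cur - prev) / dt for prev, cur in zip(prev_row, cur_row)]
            for prev_row, cur_row in zip(prev_frame, cur_frame)
        ]
        for prev_frame, cur_frame in zip(tube, tube[1:])
    ]
```

A tube of T frames yields T - 1 difference frames, each the same size as the input frames.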

For example, the stream generation unit 100 may divide the input image to generate frame images at a predetermined time interval, detect an ROI for each object in the generated frame images, connect the detected ROIs to generate an action tube, and preprocess the generated action tube to generate an action stream.

Because the action stream is generated by preprocessing an action tube that connects ROIs detected in frame images at a predetermined time interval, and that tube contains the changes in pixel values over time, the action stream can represent how an object's behavior changes over time.

The action stream generated for each object by the stream generation unit 100 may have a size of 56*56*3*16. That is, the action stream is generated by connecting ROIs with a width and height of 56*56, contains pixel values in the RGB color system, and may contain 16 image frames per stream. The size of the action stream may vary depending on how the first recognizer is configured. The following description refers to FIG. 3.
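The preprocessing that turns a variable-size action tube into a fixed 56*56*3*16 action stream is not detailed at this point in the text. The sketch below is one possible reading, under stated assumptions: frames are nested lists of `(R, G, B)` tuples, 16 frames are sampled uniformly over the tube, and each ROI crop is resized by nearest-neighbor sampling. The result is indexed as `[frame][row][column][channel]`, which is an assumed axis order, and both function names are hypothetical.

```python
def resize_nearest(frame, out_h=56, out_w=56):
    """Resize a frame (rows of RGB tuples) by nearest-neighbor sampling."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def build_action_stream(tube, num_frames=16, size=56):
    """Sample num_frames frames uniformly and resize each to size x size."""
    indices = [i * len(tube) // num_frames for i in range(num_frames)]
    return [resize_nearest(tube[i], size, size) for i in indices]
```

If the tube holds fewer than 16 frames, uniform sampling repeats some frames rather than failing, which keeps the output shape fixed for the first recognizer.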

The ROI detector 120 includes a feature information calculator 122 and a boundary cell remover 126. For example, the ROI detector 120 divides the input image over time to generate a plurality of frame images, and may detect, in the generated frame images, ROIs that are distinguished per object and contain at least part of the object. This is described with reference to FIG. 4.

The feature information calculator 122 includes a preprocessor 123 and a calculator 124. For example, the feature information calculator 122 may calculate feature information indicating the coordinates at which an ROI is located in a generated frame image. In the present invention, the feature information is the coordinates of the position of the ROI to be detected in the frame image, and may be provided as the center coordinates of the ROI or as coordinates located on the boundary of the ROI. This is described with reference to FIG. 5.

The preprocessor 123 may divide each generated frame image to generate grid cells of a predetermined size, as shown in FIG. 5. For example, the preprocessor 123 may generate grid cells of size 9*6 or 7*7. The grid cells generated by the preprocessor 123 may include boundary cells that are subordinate to the respective grid cells.
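To make the grid division concrete, the sketch below maps a pixel coordinate to the grid cell that contains it. It assumes that "7*7" means seven columns by seven rows over the whole frame; the specification does not say whether the figures are cell counts or cell sizes, and the function name `grid_cell_of` and the 448*448 frame size are hypothetical.

```python
def grid_cell_of(x, y, frame_w, frame_h, cols=7, rows=7):
    """Return (row, col) of the grid cell containing pixel (x, y).

    Assumes the frame is split into `cols` x `rows` equally sized cells.
    """
    col = min(int(x * cols // frame_w), cols - 1)  # clamp points on the right edge
    row = min(int(y * rows // frame_h), rows - 1)  # clamp points on the bottom edge
    return row, col


# A boundary cell whose center falls at (224, 100) in a 448x448 frame
# would be subordinate to this grid cell:
cell = grid_cell_of(224, 100, 448, 448)
```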

The calculator 124 uses a second recognizer (a neural network) to calculate the center coordinates of each boundary cell and the probability that the object is present in each boundary cell. The second recognizer takes the generated frame images as input and outputs the center coordinates of boundary cells, each of which has its center inside a grid cell and indicates the probability that the object is present, or the probability that the object is present within the boundary cell.

The second recognizer of the present invention is a neural network trained by supervised learning, and may be provided as a convolutional neural network (CNN) including at least one convolutional layer and a pooling layer. Preferably, the second recognizer used by the calculator 124 may be provided as R-CNN (Regions with Convolutional Neural Network) or Faster R-CNN.

The feature information calculator 122 may calculate feature information using the boundary cells for which the center coordinates and the probability that the object is present have been calculated. For example, the second recognizer used by the calculator 124 may include at least one convolutional layer that extracts features, and at least one fully connected layer that is connected to one end of the convolutional layer and calculates the center coordinates of the boundary cells and the probability that the object is present in each boundary cell. In addition to the convolutional layer and the fully connected layer, the second recognizer used by the calculator 124 may further include pooling layers arranged alternately with the convolutional layers.

The process by which the feature information calculator 122 according to an embodiment of the present invention calculates feature information is as follows. The feature information calculator 122 divides an input image containing at least one object to generate frame images, and generates grid cells at a preset interval in each frame image. Using the second recognizer, the calculator 124 may generate an arbitrary number of boundary cells 127, 128, 129, 132, and 133 that have their center coordinates inside a grid cell and are subordinate to that grid cell, and at the same time calculate the probability that an object is present inside each boundary cell.

The boundary cell remover 126 removes some of the boundary cells that redundantly contain the same object in the frame images, taking into account whether the probability that the object is present in each boundary cell is greater than or equal to a preset threshold. For example, among the boundary cells 127, 128, 129, 132, 133, and 134 generated by the calculator 124, which all have different probabilities of containing an object, the boundary cell remover 126 may detect the boundary cells 132 and 133 with the highest probability of containing an object as ROIs.

To this end, among the boundary cells that redundantly contain an object in the frame image, the boundary cell remover 126 may keep only the boundary cells 132 and 133 whose probability of containing the object is greater than or equal to a preset threshold, and remove the remaining boundary cells 127, 128, 129, and 134. In addition, to remove boundary cells whose probability of containing an object is less than or equal to the preset threshold, the boundary cell remover 126 may use a non-maximum suppression (NMS) algorithm so that only a single boundary cell remains. This is described with reference to FIG. 6.
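The thresholding and suppression steps above can be sketched with a standard greedy non-maximum suppression. This is a generic textbook version rather than the patented implementation: the box format `(x1, y1, x2, y2)`, the helper names, and both threshold values are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_thresh=0.5, iou_thresh=0.5):
    """Return indices of the boxes kept after thresholding and greedy NMS."""
    # 1) Drop boxes whose object probability is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # 2) Visit the rest from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # 3) Keep a box only if it does not heavily overlap a box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


# Hypothetical boundary cells: two overlap on one object, one covers another
# object, and one has a low object probability.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 9, 9)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = non_max_suppression(boxes, scores)
```

In this toy input one box survives per object, which mirrors the way boundary cells 132 and 133 are retained in the description.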

As described above, the ROI detector 120 may use the feature information calculated by the feature information calculator 122 to detect, in the frame images generated by dividing the input image, ROIs that contain an image of at least one object, such as a person or a thing. This is described with reference to FIG. 7.

The ROI connector 140 includes a connection score calculator 142 and a feature information estimator 144. For example, the ROI connector 140 calculates a connection score between the ROIs detected in adjacent frame images, and connects the ROIs detected in the different frame images in consideration of the calculated connection score. This is described with reference to FIG. 8.

For example, from the plurality of frame images 302, 304, 306, and 308 generated by dividing the input image, the ROI connector 140 may connect the ROIs 331, 333, 335, 337, 339, 341, and 343 detected for a first object and the ROIs 332, 334, 336, 338, 340, 342, and 344 detected for a second object, respectively, to generate a first action tube 361 for the first object and a second action tube 362 for the second object.

In addition, the ROI connector 140 may preprocess the generated first action tube 361 for the first object and second action tube 362 for the second object to generate an action stream for each of the first and second objects. That is, as described above, the action streams of the present invention may include motion information on the behavior of each object over time. When connecting the ROIs detected in the frame images, the ROI connector 140 may connect them per object to generate action tubes, and preprocess the generated action tubes to generate a per-object action stream.

The connection score calculator 142 calculates the connection score in consideration of at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs.

Here, sim() is a similarity function that takes as input the feature information of the ROIs detected in each of the frame images and outputs a value between 0 and 1, and ov() is an intersection ratio function that outputs the overlap ratio of the ROIs detected in adjacent frame images. The remaining terms of Equation 1 denote, respectively, the similarity of the class information used to classify the behavior of the ROIs detected in adjacent frame images, the two index numbers of the adjacent frame images, an arbitrary scalar value, and the feature information of the ROI detected in the frame image of each of the two index numbers. As described above, the feature information of an ROI detected in a frame image includes the coordinates at which the ROI is located in that frame image. The class information refers to the categories (list) into which the behavior of the object detected in the ROI is classified.

In Equation 1, the ov() function is an intersection ratio function that outputs an IoU (Intersection over Union) value, and takes as input the intersection region and the union region of the ROIs detected in adjacent frame images. Specifically, the intersection ratio function may output the value obtained by dividing the intersection region by the union region.
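As a worked illustration of the intersection ratio, assume two hypothetical axis-aligned ROIs in adjacent frames: A spans x from 0 to 10 and y from 0 to 10, and B spans x from 5 to 15 and y from 0 to 10. These boxes are not taken from the specification. Their intersection is 5 wide and 10 high, and each ROI has an area of 100, so:

```latex
\[
\mathrm{ov}(A, B) \;=\; \frac{|A \cap B|}{|A \cup B|}
\;=\; \frac{5 \cdot 10}{100 + 100 - 5 \cdot 10}
\;=\; \frac{50}{150} \;=\; \frac{1}{3}
\]
```

Identical ROIs give a value of 1 and disjoint ROIs give 0, so a larger value indicates that the two ROIs are more likely to belong to the same object.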

The process by which the connection score calculator 142 according to an embodiment of the present invention calculates a connection score (link score) is described below using the image frames 302 and 304. Of the ROIs 331 and 332 detected in the first image frame 302, the connection score calculator 142 calculates the connection score between the first ROI 331, which relates to the first object, and the first ROI 333 in the second image frame 304.

The connection score calculator 142 also calculates the connection score between the first ROI 331 in the first image frame 302 and the second ROI 334 in the second image frame 304. That is, the connection score calculator 142 may calculate a connection score for every possible link between an ROI in one image frame and an ROI in an adjacent image frame. The ROI connector 140 may then connect the ROIs in the image frames so that the connection score is highest. This is described with reference to FIG. 9.
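One way to read "connect the ROIs so that the connection score is highest" is to score every possible link between two adjacent frames and pick the one-to-one pairing with the largest total. The sketch below does this by brute force. The score matrix is taken as given, and the function name `best_links`, the one-to-one constraint, and the exhaustive search are assumptions suitable only for a handful of objects.

```python
from itertools import permutations


def best_links(score):
    """Choose the one-to-one linking with the highest total link score.

    score[i][j]: link score between ROI i in frame t and ROI j in frame t+1
                 (square matrix, i.e. the same number of ROIs in both frames).
    Returns a list `link` where ROI i in frame t is connected to ROI link[i].
    """
    n = len(score)
    best = max(permutations(range(n)),
               key=lambda p: sum(score[i][p[i]] for i in range(n)))
    return list(best)


# Hypothetical link scores between two ROIs in each of two adjacent frames.
clear_case = best_links([[0.9, 0.2],
                         [0.1, 0.8]])
# A case where choosing each ROI's own best match would send both to ROI 1.
contested_case = best_links([[0.3, 0.70],
                             [0.6, 0.65]])
```

In the contested case the pairing with the largest total links ROI 0 to ROI 1 and ROI 1 to ROI 0, even though ROI 1 on its own scores slightly higher against ROI 1.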

When there is a frame image in which no ROI was detected, the feature information estimator 144 estimates the feature information of the ROI in that frame image 372 based on the feature information of the ROIs 335, 336, 337, 338, 341, 342, 343, and 344 detected in the frame images 375 adjacent to the frame image 372 in which no ROI was detected.

For example, when the ROI detector 120 divides the input image and detects an ROI in each frame image, it may fail to extract an ROI properly because of problems such as image quality. That is, as shown in FIG. 9, among the frame images 375 there may be a frame image 372 in which no ROI was detected.

In this case, the feature information estimator 144 may generate a linear function that takes as input the feature information of the ROIs in the frame images 371, 373, and 379 adjacent to the frame images 374, 376, and 378 in which no ROI was detected, and use the generated linear function to estimate the feature information of the ROI in the frame images 374, 376, and 378.

The feature information of the present invention refers to the coordinates of the ROI within a frame image. Based on the positions of the ROIs in the frame images 371, 373, and 379 adjacent to the frame images 374, 376, and 378 in which no ROI was detected, the feature information estimator 144 may estimate the position of the ROI in the frame images 374, 376, and 378.

That is, the feature information estimator 144 estimates the feature information, namely the coordinates of the ROI, for the frame images in which no ROI was detected by interpolating from the feature information of the adjacent frame images 371, 373, and 379 in which an ROI was detected. The ROI connector 140 may use the estimated feature information to generate an estimated ROI in each frame image in which no ROI was detected, and connect the estimated ROIs with the ROIs in the adjacent frame images.
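The interpolation step can be sketched as linear interpolation of box coordinates between the nearest frames that do have a detection. The box format `(x1, y1, x2, y2)`, the function name `interpolate_boxes`, and the choice to interpolate each coordinate independently are assumptions; the specification only states that a linear function of the neighbouring feature information is used.

```python
def interpolate_boxes(known):
    """Fill in ROI coordinates for frames that have no detection.

    known: dict mapping frame index -> (x1, y1, x2, y2) for detected frames.
    Returns a dict that also covers every frame between detected frames,
    using linear interpolation between the two nearest detections.
    """
    frames = sorted(known)
    filled = dict(known)
    for left, right in zip(frames, frames[1:]):
        for t in range(left + 1, right):
            w = (t - left) / (right - left)  # 0 at `left`, 1 at `right`
            filled[t] = tuple((1 - w) * a + w * b
                              for a, b in zip(known[left], known[right]))
    return filled


# Hypothetical case: detections exist in frames 0 and 4 but not in frames 1 to 3.
rois = interpolate_boxes({0: (0, 0, 10, 10), 4: (8, 4, 18, 14)})
```

The estimated ROI for the middle frame lies halfway between the two detections, and the filled-in boxes can then be linked like any detected ROI.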

The location information calculator 160 includes an area divider 162 and a rearrangement unit 164. For example, the location information calculator 160 rearranges the ROIs detected in the frame images on a per-frame basis, and calculates location information on the mutual positions of the action streams from a combination stream generated by connecting the rearranged ROIs.

For example, the location information calculator 160 rearranges the ROIs detected per object in the frame images using a predetermined combination method, generates a combination stream from the rearranged ROIs, and uses the combination stream to calculate location information on the mutual positions of the action streams generated per object. In the present invention, location information refers not to appearance information but to the relationship between the positions of the ROIs detected per object within a frame image. A specific method of calculating the location information is described later.

For example, for an input image containing a first object (a person) and a second object (a thing), the location information calculator 160 may generate a combination stream by arranging the ROIs of the first object and the ROIs of the second object in an interleaved or a grouped (dense) order, and use the combination stream to calculate location information on the mutual positions of the action stream of the first object and the action stream of the second object. The method by which the location information calculator 160 calculates the positional relationship of the per-object action streams is not limited to masking or binary processing, and includes other known techniques for inputting the positional relationship of ROIs. This is described with reference to FIG. 11.

The area divider 162 distinguishes, in the frame images, the ROI detected for each object from the portion in which no ROI was detected, and masks them accordingly. The following describes, as an example, the case in which the ROI detector 120 has detected ROIs for a first object (a person) and a second object (a thing) in an input image containing both.

In this case, the area divider 162 of the present embodiment may include a masking unit (not shown) that distinguishes and masks the portion in which no ROI was detected.

For example, the masking unit (not shown) may mask the frame images by assigning one of two pixel values to the portion in which the ROI of the first object was detected and the other to the portion that is not the ROI of the first object. Likewise, it may mask the portion in which the ROI of the second object was detected and the portion that is not the ROI of the second object using two pixel values.

In the present invention, the masking by the area divider 162 of the portions of the frame images in which ROIs were detected and the portions in which they were not may be implemented as binary processing of the frame image. Based on the masking performed by the area divider 162, the location information calculator 160 may calculate location information indicating the positional relationship between the ROIs detected in each frame image.
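The binary masking described above can be illustrated as follows. The choice of 1 for pixels inside the ROI and 0 elsewhere, the half-open box convention, and the function name `binary_mask` are assumptions; the specification only requires that the two regions be distinguished by two pixel values.

```python
def binary_mask(frame_h, frame_w, box):
    """Binary mask of one object's ROI within a frame.

    box: (x1, y1, x2, y2), with x2 and y2 exclusive.
    Returns frame_h rows of frame_w values: 1 inside the ROI, 0 outside.
    """
    x1, y1, x2, y2 = box
    return [[1 if (x1 <= x < x2 and y1 <= y < y2) else 0
             for x in range(frame_w)]
            for y in range(frame_h)]


# Hypothetical 4x6 frame in which the first object's ROI covers a 2x2 patch.
mask = binary_mask(4, 6, (1, 1, 3, 3))
```

Because the mask discards pixel appearance, a sequence of such masks carries only where each object is, which is the positional information the pairwise stream is meant to capture.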

The rearrangement unit 164 rearranges the distinguished ROIs using a predetermined combination method. For example, the rearrangement unit 164 may combine the masked ROIs 277 of the first object and the masked ROIs 279 of the second object using a predetermined combination method. One such method takes the per-frame information into account and interleaves the per-object frame images, in the order: masked ROI of the first object, masked ROI of the second object, masked ROI of the first object, and so on.

Another method by which the rearrangement unit 164 may combine the distinguished ROIs takes the per-object information into account, arranging all masked ROIs of the first object first and then all masked ROIs of the second object.
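The two combination orders described above can be sketched as simple list operations. The function names `interleave` and `group` and the string labels standing in for masked ROIs are hypothetical.

```python
def interleave(first, second):
    """Frame-wise order: first object, second object, first object, ..."""
    combined = []
    for a, b in zip(first, second):
        combined.extend([a, b])
    return combined


def group(first, second):
    """Object-wise order: all of the first object, then all of the second."""
    return list(first) + list(second)


# Labels stand in for the masked ROIs of a person (p) and a thing (o) per frame.
person = ["p0", "p1", "p2"]
thing = ["o0", "o1", "o2"]
interleaved = interleave(person, thing)
grouped = group(person, thing)
```

Both orders contain the same masked ROIs; they differ only in whether neighbouring entries of the combination stream come from the same frame or from the same object.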

The location information calculator 160 generates a combination stream from the ROIs rearranged by the predetermined combination method, and may use the combination stream to calculate location information on the mutual positions of the action streams generated per object. The combination stream generated by the location information calculator 160 may be provided as a pairwise stream, which contains a pair of streams: one connecting the masked ROIs of the first object and one connecting the masked ROIs of the second object.

The pairwise stream of the present invention contains location information indicating the positional relationship of the per-object action streams, excluding the appearance information contained in those action streams. That is, the location information calculator 160 may use the combination stream described above to calculate location information indicating the positional relationship between the action stream of the first object and the action stream of the second object, and the image recognition apparatus 10 can thereby recognize the behavior of an object based on the positional relationships between the objects in the image.

The recognition unit 200 includes an adder 220 and a comparator 240. For example, the recognition unit 200 may recognize the behavior of the objects using a first recognizer that takes the generated action stream or the location information of the action stream as input and outputs at least one class vector as an indicator for classifying the behavior of the object. The class vector used by the recognition unit 200 includes a list of actions into which the behavior of the object is classified, and probability information indicating the probability that the behavior of the objects in the input image corresponds to each action in the list. The recognition unit 200 may recognize the behavior of the objects using the per-action probability information of the summed class vector.

The adder 220 sums, according to a preset method, the class vectors that the first recognizer outputs for the action streams of the individual objects. For example, the adder 220 may sum, per action in the action list and according to a preset method, the outputs of the first recognizer for the per-object action streams generated by the stream generator 100 and for the combination stream containing the positional relationship of those action streams.

For example, the adder 220 may sum three class vectors: a first class vector, output when the action stream of the first object (a person) is input to the first recognizer, which is an indicator for classifying the behavior of the first object; a second class vector, output when the action stream of the second object (a thing) is input to the first recognizer, which is an indicator for classifying the behavior of the second object; and a third class vector, output when the combination stream containing the positional relationship of the action streams generated per person or thing is input to the first recognizer. The three vectors may be summed with different weights, or with the same weight, such as 1/3. Summing the first, second, and third class vectors with the same weight of 1/3 amounts to taking their average.
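The weighted summation of class vectors can be sketched as below. The three probability vectors and the three-entry action list are made-up values for illustration, and the function name `fuse_class_vectors` is hypothetical; equal weights reproduce the averaging case mentioned above.

```python
def fuse_class_vectors(vectors, weights=None):
    """Weighted sum of class vectors (one probability per action in the list).

    With weights=None every vector gets weight 1/len(vectors), i.e. the average.
    """
    if weights is None:
        weights = [1.0 / len(vectors)] * len(vectors)
    n_classes = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors))
            for k in range(n_classes)]


# Hypothetical action list and first-recognizer outputs for the person stream,
# the thing stream, and the pairwise (combination) stream.
actions = ["walk", "throw", "catch"]
person_vec = [0.6, 0.3, 0.1]
thing_vec = [0.2, 0.7, 0.1]
pair_vec = [0.1, 0.8, 0.1]

fused = fuse_class_vectors([person_vec, thing_vec, pair_vec])
predicted = actions[fused.index(max(fused))]
```

Passing explicit weights, for example `[0.25, 0.25, 0.5]`, corresponds to the case in which the three class vectors are summed with different weights.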

The comparator 240 compares the per-action probability information in the summed class vector with the per-action probability information of objects stored in a previously prepared ground-truth data set (UCF-101 Detection Data Set). For example, using a data set that stores the per-action probability information of known objects, the comparator 240 checks whether the object behavior indicated by the class vector summed by the adder 220 matches, and can thereby accurately recognize the behavior of the objects in the image.

FIG. 2 is an enlarged block diagram of the stream generator in the embodiment of FIG. 1.

The stream generator 100 includes an ROI detector 120, an ROI connector 140, and a location information calculator 160. For example, the stream generator 100 generates action streams that contain, separately for each object, motion information on the behavior of the objects in the input image. An action stream is generated by connecting the ROIs detected in the individual image frames over time.

The motion information on an object's behavior may include both the change in the position of the ROIs over time and the change in the pixel values within the ROI, or it may include only the change in the pixel values within the ROI, excluding the change in the position of the ROIs within the frame image.

For example, because the position and pixel values of the ROIs in the image frames change over time, the change in the position of the ROIs within the image frame and the change in the pixel values within the ROI can each constitute motion information on the change in an object's behavior. Preferably, however, as described above, the motion information of the present invention may include only the change in the pixel values within the ROI over time.

Nevertheless, because the image recognition apparatus 10 of the present invention uses the pairwise stream, it can recognize the behavior of objects in the image while also taking into account the change in the position of the ROIs within the frame images, as described later.

The ROI detector 120 divides the input image over time to generate a plurality of frame images, and may detect, in the generated frame images, ROIs that are distinguished per object and contain at least part of the object. The ROI detector 120 may divide the input image into units of 16 frames, and detect in each divided frame image the ROIs that mainly contain the image of an object.
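Dividing a video into 16-frame units can be sketched as simple list slicing. Using non-overlapping units and dropping a trailing partial unit are assumptions, since the specification does not say how leftover frames are handled; the function name `split_into_clips` is hypothetical.

```python
def split_into_clips(frames, clip_len=16):
    """Split a frame sequence into consecutive, non-overlapping units.

    Frames left over at the end that do not fill a whole unit are dropped.
    """
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]


# Frame indices stand in for decoded frames of a 40-frame video.
clips = split_into_clips(list(range(40)))
```

Here a 40-frame video yields two 16-frame units, and the last 8 frames are left out.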

The ROI connector 140 generates an action tube for each object by linking the ROIs detected in the frame images divided over time. As described above, the ROI connector 140 may generate action streams by preprocessing the generated action tubes to match the input format of the first recognizer.

The position information calculator 160 rearranges the ROIs detected in the frame images on a per-frame basis and calculates position information about the mutual positions of the action streams from the combined stream generated by linking the rearranged ROIs. Using the position information calculated by the position information calculator 160, the image recognition apparatus 10 of the present invention can recognize the behavior of objects in the image from the positional relationship between the action streams generated for each object.

FIG. 3 is an enlarged block diagram of the ROI detector in the embodiment of FIG. 2.

The ROI detector 120 includes a feature information calculator 122 and a boundary cell remover 126. The feature information calculator 122 calculates feature information, which is the coordinates of the position of an ROI to be detected in a frame image. To detect ROIs in the image, the feature information calculator 122 may use a vision recognition algorithm based on a convolutional neural network, such as YOLO (You Only Look Once) or Fast R-CNN.

The boundary cell remover 126 removes some of the boundary cells that redundantly contain the same object in the frame images, based on whether the probability that the object exists in each boundary cell is at or above a preset threshold. The method of detecting an ROI using the boundary cells that remain after the removal is as described above.

FIG. 4 is an enlarged block diagram of the feature information calculator in the embodiment of FIG. 3.

The feature information calculator 122 includes a preprocessor 123 and a calculation unit 124. As described above, the preprocessor 123 may divide each generated frame image to generate grid cells of a predetermined size, and each generated grid cell includes the boundary cells subordinate to it.

The second recognizer of the present invention may include at least one convolutional layer that extracts convolutional features and a fully connected layer, connected to one end of the convolutional layer, that calculates the center coordinates of each boundary cell and the probability that an object exists in the boundary cell.

FIG. 5 is a reference diagram illustrating the process by which the ROI detector detects ROIs.

The boundary cells (127, 128, 129, 132, 133, 134), which are subordinate to the grid cells generated by the preprocessor 123, each indicate the probability that part of an object is contained within them. As described above, the ROI detector 120 may remove every generated boundary cell whose object-existence probability is below the preset threshold and detect ROIs using the boundary cells that remain.

As shown in FIG. 5, the ROI detector 120 may, based on the probabilities indicated by the boundary cells, detect an ROI for a first object (a person) with a probability of 99% and an ROI for a second object (a thing) with a probability of 67%. The probability information that the ROI detector 120 uses to detect ROIs includes the IOU (Intersection Over Union) described above.
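A minimal sketch of this selection step is given below. It keeps boundary cells whose probability is at or above a threshold and then, among cells that overlap heavily, keeps only the most probable one. The greedy overlap suppression, the function names `iou` and `select_rois`, the `(x1, y1, x2, y2)` box layout, and both default thresholds are assumptions for illustration; the specification states only that a probability threshold and IOU are used.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout
Cell = Tuple[Box, float]                 # (box, object-existence probability)

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def select_rois(cells: List[Cell],
                prob_threshold: float = 0.5,
                iou_threshold: float = 0.5) -> List[Cell]:
    """Drop low-probability cells, then drop cells that duplicate a stronger one."""
    candidates = sorted((c for c in cells if c[1] >= prob_threshold),
                        key=lambda c: c[1], reverse=True)
    kept: List[Cell] = []
    for box, prob in candidates:
        # Keep the cell only if it does not overlap an already kept cell too much.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, prob))
    return kept
```

With one cell at 0.99 and an overlapping cell at 0.80 around the same person, plus a separate cell at 0.67 around a thing, the sketch keeps the 0.99 and 0.67 cells, which mirrors the two ROIs described for FIG. 5.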

FIG. 6 is an exemplary diagram illustrating the ROIs detected by the ROI detector.

As shown in FIG. 6, the ROI detector 120 may detect an ROI for each object contained in the image. For example, when the input image contains several people, it may detect ROIs for a first object (a person) and a second object (a person); when the input image contains a person and a thing, it may detect ROIs for a first object (the person) and a second object (the thing).

FIG. 7 is an enlarged block diagram of the ROI connector in the embodiment of FIG. 2.

The ROI connector 140 includes a connection score calculator 142 and a feature information estimator 144. For example, the connection score calculator 142 may calculate a connection score based on at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection-ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs. The specific method by which the connection score calculator 142 calculates the connection score using Equation 1 is as described above and is not repeated here.
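The three ingredients named above can be combined as in the following sketch. This is not Equation 1, which appears elsewhere in the specification; the weighted sum, the use of cosine similarity for the feature term, the exact-match class term, the dictionary keys `box`, `feature`, and `label`, and the unit default weights are all assumptions chosen for illustration.

```python
import math
from typing import Dict, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0

def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def connection_score(roi_a: Dict, roi_b: Dict,
                     w_feat: float = 1.0, w_iou: float = 1.0,
                     w_cls: float = 1.0) -> float:
    """Hypothetical weighted sum of feature, overlap, and class similarity."""
    feat = cosine_similarity(roi_a["feature"], roi_b["feature"])
    over = overlap_ratio(roi_a["box"], roi_b["box"])
    cls = 1.0 if roi_a["label"] == roi_b["label"] else 0.0
    return w_feat * feat + w_iou * over + w_cls * cls
```

Under these assumptions, two ROIs of the same class that overlap and have similar features in adjacent frames score higher than a pair that differs in class, position, and appearance.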

When there is a frame image in which no ROI is detected, the feature information estimator 144 estimates the feature information of the ROI in that frame image based on the feature information of the ROIs detected in the frame images adjacent to it. The specific method by which the feature information estimator 144 uses interpolation to estimate the position of the ROI in the frame image in which no ROI was detected, and links the ROIs using the estimated position, is as described above.
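The interpolation can be sketched as below, assuming linear interpolation of box coordinates between the nearest frames that do have a detection. The function name `interpolate_missing_boxes`, the use of `None` to mark a missed detection, and the fallback of copying the nearest box at the ends of the sequence are assumptions; the specification says only that an interpolation method is used.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout

def interpolate_missing_boxes(boxes: List[Optional[Box]]) -> List[Optional[Box]]:
    """Fill None entries by linear interpolation between neighbouring detections."""
    filled: List[Optional[Box]] = list(boxes)
    known = [i for i, b in enumerate(boxes) if b is not None]
    for i, box in enumerate(boxes):
        if box is not None:
            continue
        prev_i = max((k for k in known if k < i), default=None)
        next_i = min((k for k in known if k > i), default=None)
        if prev_i is not None and next_i is not None:
            t = (i - prev_i) / (next_i - prev_i)
            p, n = boxes[prev_i], boxes[next_i]
            filled[i] = tuple(pc + t * (nc - pc) for pc, nc in zip(p, n))
        elif prev_i is not None:
            filled[i] = boxes[prev_i]   # no later detection: reuse the previous box
        elif next_i is not None:
            filled[i] = boxes[next_i]   # no earlier detection: reuse the next box
    return filled
```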

FIG. 8 is a reference diagram illustrating the process by which the ROI connector links ROIs.

In addition, as described above, the ROI connector 140 may preprocess the generated first action tube 361 and the second action tube 362 for the second object to generate an action stream for each of the first object and the second object.

FIG. 9 is an exemplary diagram illustrating the process by which the feature information estimator estimates the feature information of an ROI.

When there is a frame image in which no ROI is detected, the feature information estimator 144 estimates the feature information of the ROI in that frame image based on the feature information of the ROIs detected in the frame images adjacent to it. The process by which the feature information estimator 144 uses interpolation to estimate the feature information in a frame image in which no ROI was detected is as described above and is not repeated here.

FIG. 10 is an enlarged block diagram of the position information calculator in the embodiment of FIG. 2.

The position information calculator 160 includes an area separator 162 and a rearrangement unit 164. For example, the position information calculator 160 rearranges the ROIs detected in the frame images on a per-frame basis and calculates position information about the mutual positions of the action streams from the combined stream generated by linking the rearranged ROIs. The specific process by which the position information calculator 160 calculates the positional relationship between the action streams generated for each object is as described above and is not repeated here.

FIG. 11 illustrates the image recognition process performed by the image recognition apparatus of the present invention.

The image recognition apparatus 10 of the present invention generates an action stream (272, 274) for each object from an input image containing at least one dynamic object and, at the same time, generates a combined stream 279 by combining the action streams generated for each object. As described above, the combined stream of the present invention represents the positional relationship between the action streams (272, 274) generated for each object.

The image recognition apparatus 10 inputs the per-object action streams (272, 274) and the combined stream (pair-wise stream, 279) into the first recognizer, sums the resulting class vectors according to a preset method, and recognizes the behavior of the objects in the image using the summed class vector.

FIG. 12 is an enlarged block diagram of the recognition unit in the embodiment of FIG. 1.

The recognition unit 200 includes an adder 220 and a comparator 240. For example, the recognition unit 200 may recognize the behavior of the objects by using a first recognizer that takes the generated action stream or the position information of the action streams as input and outputs at least one class vector as an indicator for classifying the behavior of the object.

FIG. 13 is a flowchart of an image recognition method according to an embodiment of the present invention.

The image recognition method (10) performed by the image recognition apparatus 10 includes the following steps, which are performed in time series.

In S100, the stream generator 100 generates an action stream including motion information about the behavior of each object in an input image containing at least one object. When the input image contains a plurality of objects, the stream generator 100 may generate an action stream for each object. In addition, as described above, the stream generator 100 generates a combined stream including position information that indicates the positional relationship between the per-object action streams, so that the behavior of objects in the image can be recognized using that positional relationship.

In S200, the recognition unit 200 may recognize the behavior of the objects by using a first recognizer that takes the generated action stream or the position information of the action streams as input and outputs at least one class vector as an indicator for classifying the behavior of the object. For example, the first recognizer receives as separate inputs the action stream generated for each object and the combined stream indicating the positional relationship between those action streams, and the recognition unit 200 sums its outputs to recognize the behavior of the objects in the image; the specific method is as described above.

FIG. 14 is an enlarged flowchart of the generating step in the embodiment of FIG. 13.

In S120, the ROI detector 120 divides the input image over time to generate a plurality of frame images and detects, in the generated frame images, ROIs that are distinguished per object and each include at least part of the object. The specific method by which the ROI detector 120 detects ROIs in the plurality of frame images is as described above and is not repeated here.

In S140, the ROI connector 140 calculates a connection score between the ROIs detected in different frame images and links the ROIs detected in the different frame images based on the calculated connection score. The ROI connector 140 may link ROIs by considering the connection scores between one ROI detected in a given frame image and all ROIs detected in another frame image.
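One way to turn such pairwise scores into per-object action tubes is a greedy link from frame to frame, sketched below. The function name `link_rois`, the greedy strategy, and starting one tube per ROI in the first frame are assumptions; the specification states only that the scores between one ROI and all ROIs of another frame are considered. The score function is passed in as a parameter so that any connection score can be used.

```python
from typing import Callable, List, TypeVar

R = TypeVar("R")

def link_rois(frames: List[List[R]],
              score_fn: Callable[[R, R], float]) -> List[List[R]]:
    """Greedily extend one tube per first-frame ROI with the best-scoring ROI
    of each following frame. Each ROI is assigned to at most one tube."""
    if not frames:
        return []
    tubes: List[List[R]] = [[roi] for roi in frames[0]]
    for detections in frames[1:]:
        remaining = list(detections)
        for tube in tubes:
            if not remaining:
                break  # fewer detections than tubes: this tube is not extended
            best = max(remaining, key=lambda r: score_fn(tube[-1], r))
            tube.append(best)
            remaining.remove(best)
    return tubes
```

A greedy pass depends on the order in which tubes are processed; a global assignment over all pairs would avoid that, at the cost of more computation.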

In S160, the position information calculator 160 may rearrange the per-object ROIs detected in the frame images on a per-frame basis and calculate, from the combined stream generated by linking the rearranged ROIs, position information indicating the mutual positional relationship of the action streams. The method by which the position information calculator 160 rearranges the ROIs detected in the frame images is as described above and is not repeated here.

FIG. 15 is an enlarged flowchart of the detecting step in the embodiment of FIG. 14.

In S122, the feature information calculator 122 may calculate feature information indicating the coordinates at which the ROI is located in the generated frame image. As described above, the feature information in the present invention is the coordinates of the position of the ROI to be detected in the frame image and may be provided as the center coordinates of the ROI in the frame image.

In S126, the boundary cell remover 126 removes some of the boundary cells that redundantly contain the same object in the frame images, based on whether the probability that the object exists in each boundary cell is at or above a preset threshold. The ROI detector 120 detects the ROIs using the boundary cells that remain after the removal by the boundary cell remover 126.

FIG. 16 is an enlarged flowchart of the connecting step in the embodiment of FIG. 14.

In S142, the connection score calculator 142 calculates a connection score based on at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection-ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs. The specific method by which the connection score calculator 142 calculates the connection scores of the ROIs detected in each of the adjacent frame images is as described above and is not repeated here.

In S144, when the frame images generated by dividing the input image include a frame image in which no ROI is detected, the feature information estimator 144 may estimate the position of the ROI in that frame image using the coordinates of the ROIs detected in the frame images adjacent to it.

FIG. 17 is an enlarged flowchart of the step of calculating position information in the embodiment of FIG. 14.

In S162, the area separator 162 distinguishes, in the frame images, the ROI detected for each object from the portion in which no ROI is detected. For example, to capture the position of the ROIs detected in a frame image, the area separator 162 may mask the frame using two pixel values, one for the ROI and one for the portion outside the ROI.
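The two-value masking can be sketched as follows. The function name `roi_mask`, the values 255 and 0, and the half-open `(x1, y1, x2, y2)` box convention are assumptions for this example; the specification says only that two pixel values are used.

```python
from typing import List, Tuple

def roi_mask(height: int, width: int,
             box: Tuple[int, int, int, int],
             inside: int = 255, outside: int = 0) -> List[List[int]]:
    """Return a height x width mask with `inside` in the ROI and `outside` elsewhere.

    The box is (x1, y1, x2, y2) with x2 and y2 exclusive (assumed convention).
    """
    x1, y1, x2, y2 = box
    return [[inside if (x1 <= x < x2 and y1 <= y < y2) else outside
             for x in range(width)]
            for y in range(height)]
```

Because the mask discards appearance and keeps only where the ROI lies in the frame, a sequence of such masks carries the position information that the per-object action streams omit.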

In S164, the rearrangement unit 164 rearranges the separated ROIs according to a predetermined combination method. For example, the rearrangement unit 164 may combine the masked ROIs 277 of the first object with the masked ROIs (279) of the second object according to a predetermined combination method. The methods by which the rearrangement unit 164 combines the masked ROIs may include alternating the ROIs of the first object with the ROIs of the second object, and placing all ROIs of the first object first followed by all ROIs of the second object.
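The two combination methods named above can be sketched as follows. The function names are hypothetical, and the alternating version assumes both objects have the same number of masked ROIs (one per frame).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def combine_alternating(first: Sequence[T], second: Sequence[T]) -> List[T]:
    """Alternate the ROIs of the first and second object, frame by frame."""
    if len(first) != len(second):
        raise ValueError("both objects need one masked ROI per frame")
    combined: List[T] = []
    for a, b in zip(first, second):
        combined.extend([a, b])
    return combined

def combine_sequential(first: Sequence[T], second: Sequence[T]) -> List[T]:
    """Place all ROIs of the first object first, then all of the second."""
    return list(first) + list(second)
```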

By separating the ROIs detected for each object and rearranging the separated ROIs to generate a combined stream, the image recognition apparatus 10 can recognize the behavior of the objects using the positional relationship between the objects in the image.

FIG. 18 is an enlarged flowchart of the recognizing step in the embodiment of FIG. 13.

In S220, the adder 220 sums, per entry of the action list and according to a preset method, the class vectors that the first recognizer outputs for each per-object action stream. For example, the adder 220 may apply weights to, and then sum, the class vectors that the first recognizer outputs when given the per-object action streams and the combined stream indicating the positional relationship between those action streams.
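The weighted summation can be sketched as below, where each class vector holds one probability per action in the action list. The function names, the example weights, and picking the action with the highest fused value are assumptions; the specification says only that the class vectors are summed with weights according to a preset method.

```python
from typing import List, Sequence

def fuse_class_vectors(vectors: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Weighted element-wise sum of class vectors (one vector per stream)."""
    if len(vectors) != len(weights):
        raise ValueError("one weight per class vector is required")
    if not vectors:
        return []
    n_classes = len(vectors[0])
    if any(len(v) != n_classes for v in vectors):
        raise ValueError("all class vectors must have the same length")
    return [sum(w * v[i] for v, w in zip(vectors, weights))
            for i in range(n_classes)]

def predict_action(vectors: Sequence[Sequence[float]],
                   weights: Sequence[float],
                   actions: Sequence[str]) -> str:
    """Return the action whose fused value is largest (assumed decision rule)."""
    fused = fuse_class_vectors(vectors, weights)
    return actions[max(range(len(fused)), key=fused.__getitem__)]
```

For instance, with two action streams and one combined stream, `vectors` would hold three class vectors, and raising the third weight would give the positional relationship more influence on the result.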

In S240, the comparator 240 compares the per-action probability information in the summed class vector, which classifies the behavior, with the per-action probability information of the objects stored in a previously prepared ground-truth data set (UCF-101 Detection Data Set).

All or part of the method of the embodiment of the present invention described above may be implemented in the form of a computer-executable recording medium (or a computer program product), such as a program module executed by a computer. Here, the computer-readable medium may include a computer storage medium (for example, memory, a hard disk, magnetic/optical media, or an SSD (Solid-State Drive)). The computer-readable medium can be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media.

In addition, all or part of the method according to an embodiment of the present invention includes computer-executable instructions. The computer program includes programmable machine instructions processed by a processor and may be implemented in a high-level programming language, an object-oriented programming language, assembly language, or machine language.

In this specification, a unit (means) or module may refer to hardware capable of performing the functions and operations associated with its name as described herein, to computer program code capable of performing specific functions and operations, or to an electronic recording medium, for example a processor or microprocessor, loaded with computer program code capable of performing specific functions and operations. In other words, a unit (means) or module may refer to a functional and/or structural combination of hardware for carrying out the technical idea of the present invention and/or software for driving that hardware.

Accordingly, the method according to an embodiment of the present invention may be implemented by a computing device executing the computer program described above. The computing device may include at least some of: a processor, memory, a storage device, a high-speed interface connected to the memory and a high-speed expansion port, and a low-speed interface connected to a low-speed bus and the storage device. These components are interconnected using various buses and may be mounted on a common motherboard or in another suitable manner.

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications, changes, and substitutions without departing from its essential characteristics. Accordingly, the embodiments disclosed herein and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments or drawings. The scope of protection of the present invention should be construed according to the following claims, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present invention.

Claims

An image recognition apparatus comprising:
a stream generator configured to generate an action stream including motion information about the behavior of each object in an input image including at least one object; and
a recognition unit configured to recognize the behavior of the objects by using a first recognizer that receives, as input, the generated action stream or position information indicating the positional relationship of the action streams in the input image, and outputs at least one class vector as an indicator for classifying the behavior of the object.

The apparatus of claim 1, wherein the stream generator further comprises
an ROI detector configured to generate a plurality of frame images by dividing the input image over time, and to detect, in the generated frame images, an ROI that is distinguished per object and includes at least part of the object,
and generates the action stream by using the detected ROI.

The apparatus of claim 1, wherein the recognition unit further comprises
an adder configured to sum, according to a preset method, the class vectors obtained by using the first recognizer for the action streams generated for each object,
and recognizes the behavior of the objects by using the summed class vector.

The apparatus of claim 2, wherein the stream generator further comprises
an ROI connector configured to calculate a connection score between the ROIs detected in adjacent, different frame images, and to link the ROIs detected in the different frame images based on the calculated connection score,
and generates the action stream for each object by using the linked ROIs.

The apparatus of claim 4, wherein the stream generator further comprises
a position information calculator configured to rearrange the ROIs detected in the frame images on a per-frame basis, and to calculate position information about the mutual positions of the action streams from a combined stream generated by linking the rearranged ROIs.

The apparatus of claim 4, wherein the ROI connector further comprises
a connection score calculator configured to calculate the connection score based on at least one of: a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images; an intersection-ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images; and the similarity of the class information of the ROIs,
and links the ROIs by using the calculated connection score.

The apparatus of claim 6, wherein the ROI connector further comprises
a feature information estimator configured to, when there is a frame image in which no ROI is detected, estimate the feature information of the ROI in that frame image based on the feature information of the ROIs detected in the frame images adjacent to it,
and links the ROIs by using the estimated feature information.

The apparatus of claim 2, wherein the ROI detector further comprises
a feature information calculator configured to calculate feature information indicating the coordinates at which the ROI is located in the generated frame image,
and detects the ROI by using the calculated feature information.

The apparatus of claim 5, wherein the position information calculator further comprises:
an area separator configured to distinguish, in the frame images, the ROI detected for each object from the portion in which no ROI is detected; and
a rearrangement unit configured to rearrange the separated ROIs according to a predetermined combination method,
and calculates the position information from the combined stream generated by linking the rearranged ROIs.

The apparatus of claim 8, wherein the feature information calculator further comprises:
a preprocessor configured to divide the generated frame images to generate grid cells of a predetermined size; and
a calculation unit configured to calculate the center coordinates of each boundary cell and the probability that the object exists in that boundary cell by using a second recognizer, the second recognizer being a neural network that takes the generated frame images as input and outputs the center coordinates of boundary cells, which indicate the probability that the object exists in the grid cell, or the probability that the object exists in the boundary cell,
and calculates the feature information by using the boundary cells for which the probability and the center coordinates have been calculated.

The apparatus of claim 10, wherein the ROI detector further comprises:
a boundary cell removal unit configured to remove some of the boundary cells that overlap the object in the frame images, depending on whether the probability that the object exists in each boundary cell is greater than or equal to a preset threshold,
wherein the ROI is detected using the boundary cells that remain after the removal.
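One possible reading of the removal step is sketched below. The claim only states that overlapping boundary cells are removed with reference to a probability threshold; the sketch swaps in the widely used non-maximum suppression procedure as an assumption, and the names `remove_boundary_cells`, `prob_threshold`, and `overlap_threshold` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def remove_boundary_cells(cells, prob_threshold=0.5, overlap_threshold=0.5):
    """cells: list of (box, probability) pairs. Returns the cells that are kept.

    Step 1 drops cells whose probability is below prob_threshold.
    Step 2 (assumed, not stated in the claim) keeps the most probable cell
    among heavily overlapping ones, as in non-maximum suppression.
    """
    candidates = [c for c in cells if c[1] >= prob_threshold]
    candidates.sort(key=lambda c: c[1], reverse=True)
    kept = []
    for box, prob in candidates:
        if all(iou(box, k[0]) < overlap_threshold for k in kept):
            kept.append((box, prob))
    return kept
```

The cells returned by `remove_boundary_cells` would serve as the ROI candidates for each frame image.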

The apparatus of claim 3,
wherein the class vector includes a behavior list for classifying the behavior of the object and probability information indicating the probability that the behavior of the objects in the input image corresponds to each entry of the behavior list,
and the recognition unit recognizes the behaviors of the objects using the probability information, for each entry of the behavior list, of the summed class vector.
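As a minimal sketch of this recognition step, the class vectors can be summed element by element and the behavior with the largest summed probability selected. Picking the single largest entry is an assumption of the sketch (the claim only says the summed probability information is used), and `recognize_behavior` is a hypothetical name.

```python
def recognize_behavior(class_vectors, behavior_list):
    """Sum class vectors element-wise and return (best_behavior, summed_vector).

    class_vectors: list of equal-length probability lists, one per recognizer
    output; behavior_list: behavior labels in the same order as the entries.
    """
    if not class_vectors:
        raise ValueError("at least one class vector is required")
    summed = [sum(column) for column in zip(*class_vectors)]
    # Assumed decision rule: choose the behavior with the highest summed score.
    best = max(range(len(summed)), key=summed.__getitem__)
    return behavior_list[best], summed
```

For example, with behaviors `["walk", "run", "sit"]` and class vectors `[0.2, 0.7, 0.1]`, `[0.5, 0.3, 0.2]`, and `[0.1, 0.8, 0.1]`, the second entry has the largest sum, so the sketch selects "run".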

An image recognition method comprising:
generating an action stream including motion information about the behavior of each object in an input image including at least one object; and
recognizing the behaviors of the objects by using a first recognizer that receives the generated action stream, or position information indicating the positional relationship of the action streams in the input image, and outputs at least one class vector as an indicator for classifying the behaviors of the objects.

The method of claim 13, wherein the generating step further comprises:
dividing the input image over time to generate a plurality of frame images, and detecting, for each object, an ROI including at least a part of the object in the generated frame images,
wherein the action stream is generated using the detected ROI.

The method of claim 14, wherein the generating step further comprises:
calculating a connection score between the ROIs detected in different, adjacent frame images, and connecting the ROIs detected in the different frame images in consideration of the calculated connection scores,
wherein the action stream is generated for each object using the connected ROIs.
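The connection of ROIs across adjacent frames can be sketched as a greedy matching procedure. This is an illustration under stated assumptions, not the claimed method: the connection score defaults to plain box overlap (IoU), a stream ends when no ROI in the next frame reaches `min_score`, and the names `link_rois`, `score_fn`, and `min_score` are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_rois(frames, score_fn=iou, min_score=0.3):
    """frames: one list of boxes per frame image, in temporal order.

    Returns a list of streams; each stream is a list of (frame_index, box).
    """
    streams = []
    active = []  # indices of streams that were extended in the previous frame
    for t, boxes in enumerate(frames):
        unused = set(range(len(boxes)))
        next_active = []
        for s in active:
            last_box = streams[s][-1][1]
            best, best_score = None, min_score
            for j in unused:
                score = score_fn(last_box, boxes[j])
                if score >= best_score:
                    best, best_score = j, score
            if best is not None:
                streams[s].append((t, boxes[best]))
                unused.discard(best)
                next_active.append(s)
        # ROIs that were not connected to any stream start a new stream.
        for j in sorted(unused):
            streams.append([(t, boxes[j])])
            next_active.append(len(streams) - 1)
        active = next_active
    return streams
```

Each returned stream collects the ROIs of one object over time, which corresponds to the per-object action stream described in the claim.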

The method of claim 15, wherein the generating step further comprises:
rearranging the ROIs detected in the frame images on a per-frame-image basis, and calculating position information regarding the mutual positions of the action streams from a combined stream generated by concatenating the rearranged ROIs.

The method of claim 15, wherein the connecting step further comprises:
calculating the connection score in consideration of at least one of a similarity function that takes as input the feature information of the ROIs detected in each of the adjacent frame images, an intersection ratio function that outputs the overlap ratio of the ROIs detected in each of the adjacent frame images, and the similarity of the class information of the ROIs,
wherein the ROIs are connected using the calculated connection score.
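A connection score combining the three terms named above might look like the following sketch. The weighted sum, the use of cosine similarity for both the feature term and the class term, and all names (`connection_score`, `w_feat`, `w_iou`, `w_cls`, and the dictionary keys) are assumptions made for illustration; the claim does not specify how the terms are combined.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def iou(a, b):
    """Overlap ratio (intersection over union) of boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def connection_score(roi_a, roi_b, w_feat=1.0, w_iou=1.0, w_cls=1.0):
    """Score two ROIs from adjacent frames.

    Each ROI is a dict with hypothetical keys 'box', 'feature', 'class_probs'.
    Setting a weight to 0 disables that term, matching "at least one of".
    """
    feat_term = cosine_similarity(roi_a["feature"], roi_b["feature"])
    iou_term = iou(roi_a["box"], roi_b["box"])
    cls_term = cosine_similarity(roi_a["class_probs"], roi_b["class_probs"])
    return w_feat * feat_term + w_iou * iou_term + w_cls * cls_term
```

ROI pairs with a higher score are better candidates for being connected into the same action stream.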

The method of claim 17, wherein the connecting step further comprises:
if there is a frame image in which the ROI is not detected,
estimating feature information of the ROI in that frame image, based on the feature information of the ROIs detected in the frame images adjacent to the frame image in which the ROI is not detected,
wherein the ROIs are connected using the estimated feature information.
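The estimation step can be sketched by treating the feature information as box coordinates and interpolating linearly between the nearest frames in which the ROI was detected. Linear interpolation is an assumption of this sketch (the claim only requires that adjacent frames be used), and `fill_missing_rois` is a hypothetical name.

```python
def fill_missing_rois(track):
    """track: one entry per frame image, either a box (x0, y0, x1, y1) or None.

    Returns a copy in which each None is replaced by an estimate from the
    nearest detected frames: interpolated when both neighbors exist, copied
    from the single available neighbor otherwise.
    """
    filled = list(track)
    known = [i for i, box in enumerate(track) if box is not None]
    for i, box in enumerate(track):
        if box is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is not None and nxt is not None:
            w = (i - prev) / (nxt - prev)  # relative position between neighbors
            filled[i] = tuple((1 - w) * p + w * n
                              for p, n in zip(track[prev], track[nxt]))
        elif prev is not None:
            filled[i] = track[prev]
        elif nxt is not None:
            filled[i] = track[nxt]
    return filled
```

With the gaps filled, the ROIs of an object can be connected across every frame image even when detection fails in some of them.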

The method of claim 16, wherein the calculating of the position information further comprises:
separating, in the frame images, the ROI detected for each object from the portion in which no ROI is detected; and
rearranging the separated ROIs according to a predetermined combination method,
wherein the position information is calculated based on the combined stream generated by concatenating the rearranged ROIs.

A program stored on a computer-readable recording medium that, when executed by a processor, carries out the image recognition method according to any one of claims 13 to 19.