KR101898648B1

KR101898648B1 - Apparatus and Method for Detecting Interacting Groups Between Individuals in an Image

Info

Publication number: KR101898648B1
Application number: KR1020180000409A
Authority: KR
Inventors: 최상일; 유한주; 엄태규
Original assignee: 단국대학교 산학협력단
Priority date: 2018-01-02
Filing date: 2018-01-02
Publication date: 2018-09-13

Abstract

The present invention relates to an apparatus and a method for detecting interacting groups among the people appearing in an image, capable of determining interacting groups more accurately by considering not only the position information of each individual in the image but also the emotional relationships between individuals. The apparatus includes: a candidate group set generation unit that generates a candidate group set by inferring the relative distances between people in the image in three-dimensional space; a geometric relation measure calculation unit that calculates a geometric relation measure by inferring the distance between two faces in actual three-dimensional space within the candidate group set; a social relation measure calculation unit that calculates a social relation measure by analyzing social relation information from the face information and the position information of the people; and an interacting group determination unit that performs optimization to determine the interacting groups by selecting, from among the groups that can be generated, a set of mutually exclusive groups that maximizes the weighted sum of the geometric relation measure value and the social relation measure value.

Description

Apparatus and Method for Detecting Interacting Groups Between Individuals in an Image

The present invention relates to image analysis, and more specifically to an apparatus and method for detecting interacting groups among the people appearing in an image, which can determine the interacting groups more accurately by considering not only the position information of each individual in the image but also the emotional relationships between individuals.

With the recent growth of social networks, countless images capturing people's social activities are being generated. These images contain important information about interactions and social relationships among people.

Such information is very useful in important areas of modern society, such as customer analysis in market transactions and subject monitoring in public safety services. To obtain the relevant information, a technique for finding interacting groups in everyday images of social activities is essential.

Because of this need, several techniques for determining the interacting groups in an image have been proposed.

However, these techniques have mainly been studied on images acquired by surveillance cameras. Such images have a wide angle of view and are captured from a fixed, elevated position, so they carry a great deal of information about where people are.

By contrast, the everyday images now generated on social network services (SNS) are shot from many angles and of many subjects, and some subjects are occluded. Applying the earlier techniques directly to everyday images is therefore far from trivial.

Techniques for analyzing the interactions of people in an image fall broadly into two categories.

One is based on people's actions, and the other is based on the spatial formation of people in the image.

Action-based methods usually begin by detecting predefined actions in a video. They then use the detected actions to find the people who are interacting and determine the interacting groups from them.

Action-based techniques to date mainly use continuous information from video rather than a single image.

Moreover, action-based techniques cannot detect interactions between people whose actions were not defined in advance. These techniques are therefore limited in finding interacting groups in the everyday images generated on SNS.

Formation-based approaches can analyze a wider variety of interactions than action-based methods, including static situations.

Formation-based methods first determine the planar positions of people in a top-view image and infer the face direction of each individual. They then search for interacting groups among predefined formations of people.

However, these methods assume that people's positions can easily be obtained from an image captured in top view. This assumption holds only for surveillance camera images, not for images originating from SNS.

An alternative approach automatically infers people's top-view positions from the image. It has the advantage of finding interacting groups in a variety of situations, including static ones, but it is constrained in that the formation of the group to be found must be defined in advance.

In addition, because face direction plays an important role in determining the formation sought in the image, the final result is very sensitive to errors in the estimated face direction.

These methods are therefore limited when applied to everyday images that contain many people and are captured under varied conditions.

Korean Patent Application Publication No. 10-2009-0024969
Korean Patent No. 10-1782590
Korean Patent No. 10-1715708

The present invention is intended to solve these problems of prior-art image analysis and interaction detection. Its object is to provide an apparatus and method for detecting interacting groups among the people appearing in an image that can determine the interacting groups more accurately by considering not only the position information of each individual in the image but also the emotional relationships between individuals.

Another object of the present invention is to provide such an apparatus and method that detects interacting groups in everyday images by designing a new interaction measure.

Another object of the present invention is to provide such an apparatus and method that improves performance by building a new dataset for evaluating interacting-group detection in everyday images and using it to verify the effectiveness of the image analysis and interaction detection.

Another object of the present invention is to provide such an apparatus and method that detects interacting groups more accurately by using semantic cues such as social relations together with geometric information about where people are located.

Another object of the present invention is to provide such an apparatus and method that builds the MLPA Social Group dataset, a new dataset for detecting interacting groups, and that, through performance evaluation on that dataset, can be usefully applied to socioeconomic analysis such as strategy planning, camera security, and various sociological analyses.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve these objects, an apparatus for detecting interacting groups among the people appearing in an image according to the present invention comprises: a candidate group set generation unit that generates a candidate group set by inferring the relative distances between people in the image in three-dimensional space; a geometric relation measure calculation unit that calculates a geometric relation measure by inferring the distance between two faces in actual three-dimensional space within the candidate group set; a social relation measure calculation unit that calculates a social relation measure by analyzing social relation information, considering people's face information together with their position information; and an interacting group determination unit that performs optimization for determining the interacting groups by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation measure value and the social relation measure value is maximized.

To achieve another object, a method for detecting interacting groups among the people appearing in an image according to the present invention comprises: generating a candidate group set by inferring the relative distances between people in three-dimensional space using the face size information of the people in the image; calculating a geometric relation measure by inferring, from the image, the distance between two faces in actual three-dimensional space; calculating a social relation measure by analyzing social relation information, considering people's face information together with their position information; and performing optimization for determining the interacting groups by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation measure value and the social relation measure value is maximized.

The apparatus and method for detecting interacting groups among the people appearing in an image according to the present invention have the following effects.

First, interacting groups can be determined more accurately by considering not only the position information of each individual in the image but also the emotional relationships between individuals.

Second, by designing a new interaction measure, interacting groups can be detected effectively in everyday images.

Third, performance can be improved by building a new dataset for evaluating interacting-group detection in everyday images and using it to verify the effectiveness of the image analysis and interaction detection.

Fourth, interacting groups can be detected more accurately by using semantic cues such as social relations together with geometric information about where people are located.

Fifth, the MLPA Social Group dataset, a new dataset for detecting interacting groups, is built, and performance evaluation on that dataset allows the invention to be usefully applied to socioeconomic analysis such as strategy planning, camera security, and various sociological analyses.

FIG. 1 is a block diagram of an apparatus for detecting interacting groups among the people appearing in an image according to the present invention.
FIG. 2 is a flowchart of a method for detecting interacting groups among the people appearing in an image according to the present invention.
FIG. 3 is a diagram showing an example of candidate group set generation.
FIG. 4 is a detailed block diagram of the apparatus for detecting interacting groups among the people appearing in an image according to the present invention.
FIGS. 5A to 5C are diagrams showing an example of geometric relation measure calculation.
FIG. 6 is a diagram showing the optimization process for determining the interacting groups.
FIG. 7 is a diagram showing examples from the MLPA Social Group dataset.
FIG. 8 is a diagram showing comparison results on the MLPA Social Group dataset.
FIG. 9 is a diagram comparing performance by relation measure.

Hereinafter, preferred embodiments of the apparatus and method for detecting interacting groups among the people appearing in an image according to the present invention are described in detail.

The features and advantages of the apparatus and method will become apparent from the detailed description of each embodiment below.

FIG. 1 is a block diagram of an apparatus for detecting interacting groups among the people appearing in an image according to the present invention, and FIG. 2 is a flowchart of the corresponding method.

The apparatus and method according to the present invention detect interacting groups more accurately by using semantic cues such as social relations together with geometric information about where people are located.

To this end, the present invention may include a configuration that creates candidates for actually interacting groups using the geometric relations between people's faces in the image.

The present invention may include a configuration that quantitatively calculates the degree to which individuals interact within each generated group. For this purpose it considers not only the emotional relatedness inferred from facial expression, gender, and age but also physical information such as each individual's position and gaze.

For the analysis of geometric relations, the present invention may include a configuration that calculates the distance between people from the straight-line distance between the points where the individuals are located and their relative face sizes. For the analysis of social relations, it may classify the social relations between people into eight types based on the Interpersonal Circle of sociology and apply the output of an SRN (Social Relations Network) to the social relation measure.

As shown in FIG. 1, the apparatus for detecting interacting groups among the people appearing in an image according to the present invention includes: a candidate group set generation unit (10) that generates a candidate group set by inferring the relative distances between people in three-dimensional space using their face size information; a geometric relation measure calculation unit (20) that calculates a geometric relation measure by inferring, from an everyday image, the distance between two faces in actual three-dimensional space; a social relation measure calculation unit (30) that calculates a social relation measure by analyzing social relation information, considering people's face information together with their position information; and an interacting group determination unit (40) that performs optimization for determining the interacting groups by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation measure value and the social relation measure value is maximized.

The method for detecting interacting groups among the people appearing in an image according to the present invention is described in detail as follows.

As shown in FIG. 2, a candidate group set is first generated by inferring the relative distances between people in three-dimensional space using their face size information (S201).

Next, the geometric relation measure is calculated by inferring, from the everyday image, the distance between two faces in actual three-dimensional space (S202).

Then the social relation measure is calculated by analyzing social relation information, considering people's face information together with their position information (S203).

Finally, optimization for determining the interacting groups is performed by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation measure value and the social relation measure value is maximized (S204).

FIG. 3 is a diagram showing an example of candidate group set generation.

In the following description, 'interaction' means behavior in which two or more people directly influence one another.

Examples include talking, eating together, and looking at each other. The present invention groups the people who are interacting with one another in a given image of everyday social activity.

Here, a set is defined that holds the face information of every person present in the image. For each person, this set records the face information, the center coordinates of the face in the image, and the size of the face in the image. The goal is then to find mutually exclusive groups of people who are interacting.

A set of arbitrary groups formed from the face set is called a candidate group set, and it must satisfy a condition on its member groups. In that condition, each member is a group belonging to the candidate group set, and the number of groups ranges from 1 up to a fixed maximum.

An interaction measure is defined to quantify the degree of interaction among the members of a group, and the optimal group set is determined as the one that satisfies the objective function of Equation 2.
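
As a notational aid, the quantities defined above can be written out as follows. Every symbol here ($N$, $F$, $f_i$, $c_i$, $s_i$, $\mathcal{G}$, $g_k$, $K$, $\Phi$) is an assumed name chosen for illustration, not notation taken from the source. The disjointness condition and the summed objective are one plausible reading of "mutually exclusive groups" and of maximizing the total measure, so this is a sketch rather than the patent's own equations.

```latex
\begin{align}
F &= \{(f_i, c_i, s_i)\}_{i=1}^{N} \\
\mathcal{G} &= \{g_1, \dots, g_K\}, \quad g_k \subseteq F, \quad
  g_j \cap g_k = \emptyset \ (j \neq k), \quad 1 \le K \le N \\
\mathcal{G}^{*} &= \arg\max_{\mathcal{G}} \sum_{g_k \in \mathcal{G}} \Phi(g_k)
\end{align}
```

Under this reading, $f_i$, $c_i$, and $s_i$ stand for the face information, face center, and face size of person $i$, and $\Phi(g_k)$ is the interaction measure of group $g_k$.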

FIG. 3 shows an example of generating the candidate group sets from an input and finding the optimal group set among the candidates.

(a) shows an example of the input image and face information. (b) represents the input image and information as a graphical model. (c) shows the candidate group sets generated from (b); each group set consists of two or more groups.

Some group sets may have a group in common, as two of the illustrated sets do.

The present invention consists broadly of three stages: generating the candidate group sets, calculating the interaction measure for each group, and searching for the optimal group set.

In the candidate group set generation stage, a geometric constraint is defined, and the candidate group sets are generated from the face information set on that basis.

Next, the relation measures are used to calculate the degree of geometric relation and social relation for each group.

Finally, in the optimal group set search stage, the optimal group set satisfying the objective function of Equation 2 is determined through an optimization process.

FIG. 4 is a detailed block diagram of the apparatus for detecting interacting groups among the people appearing in an image according to the present invention.

A candidate group set satisfying the geometric constraint is generated, and an interaction measure based on geometric and social relations is calculated for each group.

The optimal set of interacting groups is then determined.

The interaction measure, which evaluates how well an arbitrary group gathers people who are actually interacting, is calculated from the geometric relation measure and the social relation measure. A design parameter with a value between 0 and 1 balances the two relations.
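
The balancing of the two measures can be illustrated with a short sketch. It assumes a convex combination, which is only one way to realize a weighted sum with a parameter between 0 and 1; the function name `interaction_measure` and the parameter name `balance` are hypothetical and do not appear in the source.

```python
def interaction_measure(geometric: float, social: float, balance: float) -> float:
    """Blend two relation measures with a balance parameter in [0, 1].

    Assumption: a convex combination. The source only states that a design
    parameter between 0 and 1 balances the two measures.
    """
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must lie in [0, 1]")
    # balance = 1 uses only the geometric measure, balance = 0 only the social one.
    return balance * geometric + (1.0 - balance) * social


print(interaction_measure(geometric=0.8, social=0.2, balance=0.5))
```

With this form, moving `balance` toward 1 makes the grouping depend more on where people stand, and moving it toward 0 makes it depend more on the inferred social relation.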

Generation of the candidate group set is described in detail below.

People who are far apart in an image are unlikely to be interacting, yet in theory every person in the image could be grouped together.

Therefore, when detecting interacting groups, it is more efficient in terms of both computation and performance to exclude people who are far apart from grouping.

To this end, the present invention first generates the pairs of people that satisfy the geometric constraint and then proceeds with grouping.

Measuring the distance between people requires information about each individual's position in three-dimensional space. However, inferring people's three-dimensional positions from images taken in everyday life is not easy.

To address this, face size information is used to infer the relative distance between people in three-dimensional space.

If the ratio of the face sizes of two arbitrary faces exceeds a certain value, the two people are assumed to be far apart in three-dimensional space. Based on this assumption, the ratio of the two faces is defined as in Equation 4 and used to judge whether the two faces are close enough to be grouped.

For example, two faces can be grouped only when their ratio is less than or equal to a design parameter. This condition is called the geometric constraint.

The pairs that satisfy this constraint form a pair set, in which each pair consists of two people who can be grouped. Groups are then generated by selecting elements of the pair set in a mutually exclusive manner, and the candidate group set is generated from these groups.
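
The geometric constraint can be sketched in a few lines. This is a minimal illustration, assuming the ratio is taken as the larger face size divided by the smaller one; the function names `size_ratio` and `groupable_pairs`, the parameter `max_ratio`, and the sample sizes are hypothetical and are not taken from the source.

```python
from itertools import combinations


def size_ratio(size_a: float, size_b: float) -> float:
    """Ratio of the larger face size to the smaller one (assumed form, always >= 1).

    Both sizes are expected to be positive.
    """
    return max(size_a, size_b) / min(size_a, size_b)


def groupable_pairs(face_sizes: list[float], max_ratio: float) -> list[tuple[int, int]]:
    """Return index pairs whose size ratio is at most max_ratio (the geometric constraint)."""
    return [
        (i, j)
        for i, j in combinations(range(len(face_sizes)), 2)
        if size_ratio(face_sizes[i], face_sizes[j]) <= max_ratio
    ]


# Two similar faces and one much smaller (presumably distant) face.
print(groupable_pairs([100.0, 90.0, 30.0], max_ratio=1.5))  # prints [(0, 1)]
```

Filtering pairs first keeps the later search small, because any group containing a rejected pair never has to be scored.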

Calculation of the geometric relation measure is described in detail below.

FIGS. 5A to 5C are diagrams showing an example of geometric relation measure calculation.

The position of each person in an image is an important cue for understanding the interactions between individuals. People who are interacting are generally located close together, and their positions form a particular structure depending on the type of interaction.

Methods that use position information mainly infer interactions from the formation of people in a top-view image. However, because top-view images are not easily obtained in everyday life, the kinds of images that can be analyzed are limited.

To overcome this limitation, the present invention infers, from an everyday image, the distance between two faces in actual three-dimensional space.

As in Equation 5, the distance between faces is calculated by correcting the straight-line distance between them by the face sizes.

The inferred distance is then used to define the geometric relation measure for an arbitrary group, computed over the set of all pairs in that group.
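
The idea of a size-corrected distance aggregated over all pairs of a group can be sketched as follows. Both formulas are assumptions made for illustration: the correction divides the pixel distance by the mean of the two face sizes, and the group measure averages a closeness score `1 / (1 + d)` over the pairs. The names `Face`, `face_distance`, and `geometric_measure` are hypothetical, and the source's Equation 5 and its group measure may differ.

```python
from itertools import combinations
from math import hypot

Face = tuple[float, float, float]  # (center_x, center_y, size), all in pixels


def face_distance(a: Face, b: Face) -> float:
    """Pixel distance between face centers divided by the mean face size (assumed correction)."""
    pixel_dist = hypot(a[0] - b[0], a[1] - b[1])
    return pixel_dist / ((a[2] + b[2]) / 2.0)


def geometric_measure(group: list[Face]) -> float:
    """Average closeness 1 / (1 + d) over all pairs in the group (assumed form)."""
    pairs = list(combinations(group, 2))
    if not pairs:
        return 0.0  # a single face has no pairwise relation
    return sum(1.0 / (1.0 + face_distance(a, b)) for a, b in pairs) / len(pairs)


near = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0)]
far = [(0.0, 0.0, 10.0), (500.0, 0.0, 10.0)]
print(geometric_measure(near) > geometric_measure(far))  # prints True
```

Dividing by face size makes the distance roughly scale-free: two large faces 100 pixels apart are treated as closer than two small faces 100 pixels apart.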

Calculation of the social relation measure is described in detail below.

The position information of individuals is an important cue for determining interacting groups. However, interaction can also occur in static situations such as emotional empathy, or between people who are some distance apart and perform no particular action.

To handle this, the present invention analyzes social relation information by considering people's face information together with their position information, so that various forms of interaction can be detected.

The present invention uses an SRN in the form of an extended Siamese neural network.

The SRN builds on a deep convolutional network (DCN) and uses additional features to determine the degree of social relation between two people.

The SRN defines social relations in eight categories based on the 'Interpersonal Circle', a representative sociological theory: 'trust', 'attachment', 'emotionally expressive', 'assurance', 'dominant', 'competitive', 'warm', and 'intimate'.

The SRN takes a pair of face images as input and, through its DCN sub-module, extracts a 20-dimensional feature vector containing age, gender, facial expression, and face direction information.

The extracted feature vectors pass through a network structured similarly to a Siamese neural network, which finally outputs an 8-dimensional vector representing the social relations; the values of this vector satisfy a specified condition.

Using the output vectors for the pairs in a group that satisfy the geometric constraint, the social relation measure for the group is defined.
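
One way to turn per-pair 8-dimensional relation vectors into a single group score is sketched below. The aggregation is an assumption made for illustration (the mean of each vector, averaged over the group's valid pairs), and the function name `social_measure` is hypothetical. The vectors themselves would come from a trained SRN, which is not shown here.

```python
def social_measure(pair_vectors: list[list[float]]) -> float:
    """Average, over a group's valid pairs, of the mean of each 8-dimensional relation vector.

    Assumption: both the inner mean and the outer average are illustrative
    choices, not the definition used in the source.
    """
    if not pair_vectors:
        return 0.0  # no valid pair, so no social evidence for the group
    for vec in pair_vectors:
        if len(vec) != 8:
            raise ValueError("each relation vector must have 8 entries")
    return sum(sum(vec) / 8.0 for vec in pair_vectors) / len(pair_vectors)


# One strongly related pair and one unrelated pair.
print(social_measure([[1.0] * 8, [0.0] * 8]))  # prints 0.5
```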

The optimization process for determining the interacting groups is described in detail below.

FIG. 6 is a diagram showing the optimization process for determining the interacting groups.

The objective function of Equation 2 selects, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation measure value and the social relation measure value is maximized.

Equation 2 is therefore a type of combinatorial optimization problem. To solve it, the problem is reformulated as a mathematically equivalent binary integer programming (BIP) problem.

First, the set of all groups that can be generated is defined as follows.

Here, each element of this set denotes a group made up of pairs of faces that can be grouped together, and the number of all possible valid groups has a maximum value of […].

A binary decision vector is defined that determines whether each particular group among the elements of this set is included in the optimal group set.

In addition, a vector is defined that holds the interaction scale value of the group corresponding to each entry of the binary decision vector.

Solving Equation 2 is therefore mathematically equivalent to finding the optimal binary decision vector in the BIP of Equation 9.
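The BIP view can be illustrated with a small Python sketch. Equation 9 is not reproduced in this text, so the constraint used here (each face is covered by at most one selected group, written as A x ≤ 1 with a face-by-group membership matrix A) is an assumption based on the mutual exclusivity requirement. The names `x`, `c`, and `A`, the helper functions, and the example scores are all hypothetical.

```python
from itertools import product


def membership_matrix(groups, n_faces):
    """Row i, column k is 1 when face i belongs to candidate group k."""
    return [[1 if i in g else 0 for g in groups] for i in range(n_faces)]


def is_feasible(A, x):
    """Assumed exclusivity constraint: every row of A x is at most 1."""
    return all(sum(a * xi for a, xi in zip(row, x)) <= 1 for row in A)


def objective(c, x):
    """Inner product of the score vector c and the decision vector x."""
    return sum(ci * xi for ci, xi in zip(c, x))


def decode(groups, x):
    """Recover the selected groups from a binary decision vector."""
    return [g for g, xi in zip(groups, x) if xi == 1]


def solve_by_enumeration(groups, c, n_faces):
    """Exhaustive search over all binary vectors; only viable for tiny inputs."""
    A = membership_matrix(groups, n_faces)
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(groups)):
        if is_feasible(A, x) and objective(c, x) > best_val:
            best_x, best_val = x, objective(c, x)
    return best_x, best_val


# Hypothetical candidate groups over four faces, with made-up scores.
groups = [{0, 1}, {1, 2}, {2, 3}, {0, 1, 2}]
c = [0.8, 0.6, 0.6, 1.1]
x_opt, value = solve_by_enumeration(groups, c, n_faces=4)
print(x_opt, decode(groups, x_opt))
```

Enumeration grows exponentially with the number of candidate groups, which is why a pruning search is preferable for anything beyond a toy input.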

From the optimal binary decision vector obtained through Equation 9, the optimal group set is obtained as follows:

To solve the problem, the branch and bound (B&B) method, which is widely used for combinatorial optimization problems, is applied.
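The following Python sketch shows one generic way a branch and bound search can be organized for this kind of selection problem. It is not the patent's implementation: the branching order (highest score first), the optimistic bound (sum of all remaining positive scores), and the example data are assumptions chosen for brevity.

```python
def branch_and_bound(groups, scores):
    """Select disjoint groups with maximum total score via depth-first B&B."""
    order = sorted(range(len(groups)), key=lambda k: scores[k], reverse=True)
    # suffix[i]: sum of positive scores in order[i:], an optimistic upper bound
    suffix = [0.0] * (len(order) + 1)
    for i in range(len(order) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + max(scores[order[i]], 0.0)
    best = {"value": float("-inf"), "chosen": []}

    def visit(i, used, value, chosen):
        if value + suffix[i] <= best["value"]:
            return  # bound: this subtree cannot beat the incumbent
        if i == len(order):
            best["value"], best["chosen"] = value, list(chosen)
            return
        k = order[i]
        if used.isdisjoint(groups[k]):  # branch 1: select group k
            chosen.append(k)
            visit(i + 1, used | groups[k], value + scores[k], chosen)
            chosen.pop()
        visit(i + 1, used, value, chosen)  # branch 2: skip group k

    visit(0, frozenset(), 0.0, [])
    return best["value"], sorted(best["chosen"])


# Hypothetical candidate groups (face indices) and interaction scores.
bb_groups = [frozenset({0, 1}), frozenset({1, 2}),
             frozenset({2, 3}), frozenset({0, 1, 2})]
bb_scores = [0.8, 0.6, 0.6, 1.1]
bb_value, bb_chosen = branch_and_bound(bb_groups, bb_scores)
print(bb_value, [sorted(bb_groups[k]) for k in bb_chosen])
```

The search returns the indices of the selected groups; a tighter bound, such as one from a linear programming relaxation, would prune more aggressively at the cost of extra computation per node.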

The MLPA Social Group dataset is described in detail below.

FIG. 7 is a diagram showing examples from the MLPA Social Group dataset, and FIG. 8 is a diagram showing comparison results on the MLPA Social Group dataset.

FIG. 9 is a diagram comparing performance by relation scale.

The MLPA Social Group dataset is a group dataset composed of images collected from the web. It consists of about 250 images covering a variety of situations, angles, and numbers of people, with the face information and the ground truth of the interaction groups annotated by hand.

The performance of the apparatus and method for detecting an interaction group between characters in an image according to the present invention, evaluated on this MLPA Social Group dataset, is as follows.

Table 1 shows that performance degrades when either of the two scale values is not used.

As shown in Table 1, when the evaluation criterion is 2/3, the social relation affects performance more than the geometric relation. When it is 1, however, there is no significant difference between the two relations.

This is because most interactions in everyday life depend on distance, so the geometric relation has a greater influence under the stricter evaluation criterion.

As shown in FIG. 9, both the geometric relation and the social relation play an important role in measuring interaction.

The apparatus and method for detecting an interaction group between characters in an image according to the present invention, as described above, make it possible to determine interacting groups more accurately by considering not only the position information of individuals in the image but also the emotional interrelation between individuals.

The present invention enables interacting groups to be detected more accurately by using semantic cues such as social relations together with geometric information about where people are located. A new dataset was also created to evaluate the performance of interacting-group detection in everyday-life images, so that performance can be improved by verifying the effectiveness of image analysis and interaction detection.

As described above, it will be understood that the present invention may be implemented in modified forms without departing from its essential characteristics.

Therefore, the described embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is indicated by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

10. Candidate group set generation unit 20. Geometric relation scale calculation unit
30. Social relation scale calculation unit 40. Interaction group determination unit

Claims

1. An apparatus for detecting an interaction group between characters in an image, comprising:
a candidate group set generation unit that generates a candidate group set by inferring relative distances in three-dimensional space between people in the image;
a geometric relation scale calculation unit that calculates a geometric relation scale by inferring the distance between two faces in actual three-dimensional space within the candidate group set;
a social relation scale calculation unit that calculates a social relation scale by analyzing social relation information, considering the face information of the people together with their position information; and
an interaction group determination unit that performs optimization to determine the interaction group by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation scale value and the social relation scale value is maximized,
wherein, to generate the candidate group set, the candidate group set generation unit defines a set of face information for the faces existing in the image, each element of which includes the coordinates of the face in the image and the size of the face in the image.

2. (Deleted)

3. The apparatus of claim 1, wherein, when a subset of the face information set is taken as an arbitrary group and the candidate group set is a set of such groups, the following condition is satisfied, where the number of groups ranges from 1 to […], an interaction scale is defined for each group, and the optimal group set satisfying the objective function of the above equation is determined as the interaction groups.

4. The apparatus of claim 3, wherein, to determine the optimal group set, the interaction scale, which measures how well an arbitrary group has been grouped into interacting people, is defined from the geometric relation scale and the social relation scale together with a design parameter that balances the two relations and has a value between 0 and 1.

5. The apparatus of claim 1, wherein the candidate group set generation unit considers whether the sizes of two arbitrary faces differ by more than a certain amount, and determines from the ratio of the two faces whether the two faces are to be grouped.

6. The apparatus of claim 5, wherein the two faces can be grouped only when the ratio is less than or equal to a threshold, a set of pairs is formed in which each pair consists of people who can be grouped, and the candidate group set is generated from the elements of this set of pairs.

7. The apparatus of claim 1, wherein the geometric relation scale calculation unit infers the distance between two faces in actual three-dimensional space by correcting the straight-line distance between the faces in the image according to face size, and defines the geometric relation scale of an arbitrary group from the inferred distances over the set of all pairs in that group.

8. The apparatus of claim 1, wherein the social relation scale calculation unit uses an SRN in the form of an extended Siamese neural network, the SRN taking a pair of face images as input, extracting through its sub-module DCN a 20-dimensional feature vector containing age, gender, facial expression, and face direction information, and passing the extracted feature vectors through a neural network structured similarly to a Siamese neural network to finally output an 8-dimensional vector representing the social relation, where the output satisfies the condition […].

9. The apparatus of claim 8, wherein the social relation scale for a group is defined using the SRN output obtained for each pair of faces that satisfies the geometric constraints.

10. The apparatus of claim 1, wherein the interaction group determination unit defines the set of all groups that can be generated, the number of all possible valid groups having a maximum value of […], defines a binary decision vector that determines whether each particular group among the elements of that set is included in the optimal group set, and defines a vector holding the interaction scale value of the group corresponding to each entry of the binary decision vector.

11. A method for detecting an interaction group between characters in an image, comprising:
generating a candidate group set by inferring relative distances in three-dimensional space between people using face size information of the people in the image;
calculating a geometric relation scale by inferring the distance between two faces in actual three-dimensional space in the image;
calculating a social relation scale by analyzing social relation information, considering the face information of the people together with their position information; and
performing optimization to determine the interaction group by selecting, from among the groups that can be generated, a set of mutually exclusive groups such that the weighted sum of the geometric relation scale value and the social relation scale value is maximized,
wherein, in the step of generating the candidate group set, a set of face information is defined for the faces existing in the image, each element of which includes the coordinates of the face in the image and the size of the face in the image.