KR102552398B1 - An area of interest analyzing system and method for attention in video data

Info

Publication number: KR102552398B1
Application number: KR1020220098566A
Authority: KR
Inventors: 김영호; 민윤정; 정형진
Original assignee: 주식회사 메이팜소프트
Priority date: 2022-08-08
Filing date: 2022-08-08
Publication date: 2023-07-07

Abstract

A system and method for analyzing an attention-required region in video data according to the present invention relate to a technique for automatically analyzing video data in order to identify in advance, before online learning video data is provided, the regions that require a learner's concentration.

Description

An area of interest analyzing system and method for attention in video data

The present invention relates to a system and method for analyzing an attention-required region in video data, and more particularly to a system and method capable of analyzing information on the region that requires the attention of a user who receives and views the video data, that is, the region on which the user should concentrate.

Regardless of the setting in which learning takes place (online learning, offline learning, etc.), the gaze of a learner (student, etc.) generally follows the movement path of the speaker (lecturer, etc.) who is conducting the lesson.

Some studies also report that the degree of agreement between the movement of the learner's gaze and the movement path of the speaker is proportional to academic performance.

Recently, however, online education itself has become widespread, and various environmental conditions increasingly require learners to study alone somewhere other than the place where instruction actually takes place (classroom, lecture hall, etc.).

In this situation, a learner's concentration on an online learning video differs greatly from the concentration achieved when sharing a space with the speaker, that is, when sitting in an actual lecture room with the lecturer, and in practice it is not easy for learners to bring themselves to concentrate.

Naturally, this tendency becomes more pronounced in the lower grades.

Accordingly, a coaching system that helps learners concentrate on their own, which is the foundation of self-directed learning, is needed. For such a coaching system to operate smoothly, the regions of an online learning video (video data, etc.) on which the learner should concentrate must first be identified in advance.

If a coaching system is operated without identifying in advance the regions on which the learner should concentrate, it may conclude that the learner is concentrating on the online lesson simply because the learner keeps looking at the output device (monitor, etc.) for the entire duration of the video. In that case, the learner may be looking at an irrelevant part of the online learning video, or may literally be staring blankly at the output device, yet accurate coaching for this behavior is impossible.

Therefore, the system and method for analyzing an attention-required region in video data according to an embodiment of the present invention provide a technique for determining, before an online learning video is provided to the learner, the region at which the learner should look attentively while watching the video (the attention-required region).

The determined region (attention-required region) can later be used to analyze how closely it matches the region at which the learner actually gazes, so that the extent to which the learner concentrated on the learning content while watching the online learning video can be determined and a coaching service can be provided accordingly.
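The matching described above can be illustrated with a minimal sketch. The source does not specify how agreement between the attention-required region and the learner's gaze is measured; the function below simply assumes that gaze samples are (x, y) points and that the region is an axis-aligned box, and reports the fraction of samples inside the box. The name `gaze_match_ratio` and the box layout are hypothetical.

```python
from typing import Iterable, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def gaze_match_ratio(gaze_points: Iterable[Tuple[float, float]], region: Box) -> float:
    """Return the fraction of gaze samples that fall inside the region.

    Assumption: a simple point-in-box count stands in for whatever
    matching measure a real coaching service would use.
    """
    x_min, y_min, x_max, y_max = region
    points = list(gaze_points)
    if not points:
        return 0.0
    hits = sum(1 for x, y in points if x_min <= x <= x_max and y_min <= y <= y_max)
    return hits / len(points)
```

A ratio near 1.0 would suggest that the learner looked mostly at the determined region; any threshold for triggering coaching is a design choice left open here.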

In this regard, Korean Patent Publication No. 10-2020-0144890 ("E-learning system and method for improving concentration") discloses a technique that removes distracting stimuli during a lecture and monitors sustained attention through eye tracking, thereby determining the actual level of concentration and providing feedback based on it.

Korean Patent Publication No. 10-2020-0144890 (published 2020-12-30)

Accordingly, the present invention has been devised to solve the problems described above. An object of the present invention is to provide a system and method for analyzing an attention-required region in video data that, depending on whether a speaker is present in the video data, either extracts the movement of body regions including the speaker's face, hands, and arms or extracts the amount of change in the background region, and analyzes the result to set information on the region requiring the attention of a user who receives and views the video data (attention-required region information).

To solve the problems described above, a system for analyzing an attention-required region in video data according to an embodiment of the present invention preferably includes: a data input unit that receives video data for which attention-required region information is to be analyzed; a preliminary analysis unit that recognizes a human body in each frame constituting the video data and, based on this, extracts preliminary information for setting the attention-required region information; and a final analysis unit that generates the attention-required region information for the video data using the per-frame preliminary information extracted by the preliminary analysis unit.

Furthermore, the preliminary analysis unit preferably includes a first basic analysis unit that applies a pre-stored first analysis algorithm to recognize a human body in the corresponding frame and to analyze feature point information of the recognized human body.

Furthermore, the preliminary analysis unit preferably further includes: a first determination unit that receives the analysis result of the first basic analysis unit and determines whether a human body has been recognized; a second determination unit that, when the first determination unit determines that a human body has been recognized, uses the analysis result of the first basic analysis unit to determine whether feature point information designated as the face has been analyzed; and a first detailed analysis unit that, when the second determination unit determines that feature point information designated as the face has been analyzed, extracts that feature point information.

Furthermore, the preliminary analysis unit preferably further includes a first setting unit that receives the extraction result from the first detailed analysis unit, analyzes the position information of the facial region given by the feature point information, and sets it as preliminary information.
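As a rough illustration of turning feature points into the position information of a region, the sketch below computes an axis-aligned bounding box around the feature points that carry face labels. The source does not state how the position is derived; the label names in `FACE_KEYS`, the `margin` parameter, and the function name are assumptions made for this example.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical layout

# Hypothetical feature point labels; a real pose model defines its own names.
FACE_KEYS = ("nose", "left_eye", "right_eye", "left_ear", "right_ear")


def region_from_keypoints(
    keypoints: Dict[str, Point],
    keys: Sequence[str] = FACE_KEYS,
    margin: float = 0.0,
) -> Optional[Box]:
    """Return the bounding box of the selected feature points, or None if none are present."""
    pts = [keypoints[k] for k in keys if k in keypoints]
    if not pts:
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (min(xs) - margin, min(ys) - margin, max(xs) + margin, max(ys) + margin)
```

The same helper could be reused for a hand region by passing a different set of labels in `keys`.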

Furthermore, the preliminary analysis unit preferably further includes: a third determination unit that receives the extraction result from the first detailed analysis unit and determines whether the facial region given by the feature point information corresponds to a frontal view of the face; and a second setting unit that, when the third determination unit determines that the facial region is a frontal view, receives the extraction result from the first detailed analysis unit, analyzes the position information of the facial region given by the feature point information, and sets it as preliminary information.
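The source does not say how the frontal-view determination is made. Purely as an illustrative stand-in, the sketch below treats a face as frontal when both eyes and the nose are detected and the nose lies horizontally close to the midpoint between the eyes. The label names, the threshold `max_offset_ratio`, and the heuristic itself are assumptions, not the method of the invention.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def is_frontal_face(keypoints: Dict[str, Point], max_offset_ratio: float = 0.25) -> bool:
    """Heuristic frontal-view check (assumed, not taken from the source).

    The horizontal offset of the nose from the eye midpoint is compared
    with the horizontal distance between the eyes.
    """
    try:
        nose = keypoints["nose"]
        left_eye = keypoints["left_eye"]
        right_eye = keypoints["right_eye"]
    except KeyError:
        return False  # a required feature point is missing
    eye_distance = abs(left_eye[0] - right_eye[0])
    if eye_distance == 0:
        return False  # eyes overlap horizontally, as in a full profile view
    mid_x = (left_eye[0] + right_eye[0]) / 2
    return abs(nose[0] - mid_x) / eye_distance <= max_offset_ratio
```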

Furthermore, the preliminary analysis unit preferably further includes: a fourth determination unit that, when the second determination unit determines that feature point information designated as the face has not been analyzed, uses the analysis result of the first basic analysis unit to determine whether feature point information designated as the hand has been analyzed; and a third setting unit that, when the fourth determination unit determines that feature point information designated as the hand has been analyzed, extracts that feature point information, analyzes the position information of the hand region given by the extracted feature point information, and sets it as preliminary information.

Furthermore, the preliminary analysis unit (200) preferably further includes: a second basic analysis unit that, when the first determination unit determines that no human body has been recognized, applies a pre-stored second analysis algorithm to analyze the region of change between the corresponding frame and the immediately preceding frame; and a fourth setting unit that receives the analysis result of the second basic analysis unit, analyzes the position information of the region of change, and sets it as preliminary information.
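The chain of determinations described above can be sketched as a single per-frame selection function. This is a minimal sketch under stated assumptions: feature points arrive as a label-to-point mapping, the frontal-view result and the change region are computed elsewhere and passed in, and all names are hypothetical. The source only states that the hand check follows when no facial feature points are analyzed; falling through to the hand check when the face is detected but not frontal is an additional assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical layout

# Hypothetical feature point labels.
FACE_KEYS = ("nose", "left_eye", "right_eye", "left_ear", "right_ear")
HAND_KEYS = ("left_wrist", "right_wrist")


def _bbox(points: List[Point]) -> Box:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def select_preliminary_info(
    keypoints: Optional[Dict[str, Point]],
    frontal: bool,
    change_box: Optional[Box],
) -> Optional[Tuple[str, Box]]:
    """Return (source, box) for one frame, or None if nothing can be set."""
    if not keypoints:
        # First determination: no human body recognized, use the change region.
        return ("change", change_box) if change_box is not None else None
    face = [keypoints[k] for k in FACE_KEYS if k in keypoints]
    if face and frontal:
        # Second and third determinations: facial feature points, frontal view.
        return ("face", _bbox(face))
    hand = [keypoints[k] for k in HAND_KEYS if k in keypoints]
    if hand:
        # Fourth determination: feature points designated as the hand.
        return ("hand", _bbox(hand))
    return None
```

Returning a source label together with the box keeps track of which analysis produced the preliminary information, which a later grouping step can use.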

Furthermore, the final analysis unit preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, two or more consecutive frames have preliminary information set through the same analysis result, the final analysis unit takes the first of those frames as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and superimposes the preliminary information from that first frame through that last frame to generate the coordinate information of the attention-required region information.

Alternatively, the final analysis unit preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, the frames spanning a preset period of time have preliminary information set through the same analysis result, the final analysis unit takes the first of those frames as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and superimposes the preliminary information from that first frame through that last frame to generate the coordinate information of the attention-required region information.

Alternatively, the final analysis unit preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, a preset number of frames or more have preliminary information set through the same analysis result, the final analysis unit takes the first of those frames as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and superimposes the preliminary information from that first frame through that last frame to generate the coordinate information of the attention-required region information.
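The three grouping variants above differ only in how long a run of frames must be. The following sketch groups consecutive frames under two assumptions that the source leaves open: "the same analysis result" is read as the same source label (for example face, hand, or change), and "superimposing" the preliminary information is read as taking the union bounding box of the run. The names `build_regions` and `frames_for_duration` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), hypothetical layout
Prelim = Optional[Tuple[str, Box]]       # per-frame (source label, box), or None


def frames_for_duration(seconds: float, fps: float) -> int:
    """Convert a preset duration into a minimum frame count (time-based variant)."""
    return max(1, math.ceil(seconds * fps))


def build_regions(prelims: List[Prelim], min_frames: int = 2) -> List[Tuple[int, int, str, Box]]:
    """Group consecutive frames with the same source label.

    Returns (start_frame, end_frame, source, union_box) for every run of
    at least min_frames frames; shorter runs are dropped.
    """
    regions = []
    start, n = 0, len(prelims)
    while start < n:
        if prelims[start] is None:
            start += 1
            continue
        source = prelims[start][0]
        end = start
        while end + 1 < n and prelims[end + 1] is not None and prelims[end + 1][0] == source:
            end += 1
        if end - start + 1 >= min_frames:
            boxes = [prelims[i][1] for i in range(start, end + 1)]
            union = (
                min(b[0] for b in boxes),
                min(b[1] for b in boxes),
                max(b[2] for b in boxes),
                max(b[3] for b in boxes),
            )
            regions.append((start, end, source, union))
        start = end + 1
    return regions
```

With `min_frames=2` this corresponds to the "two or more consecutive frames" variant; passing a preset count, or `frames_for_duration(seconds, fps)`, corresponds to the count-based and time-based variants respectively.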

A method for analyzing an attention-required region according to an embodiment of the present invention uses the system for analyzing an attention-required region in video data, with each step performed by an arithmetic processing means, and preferably includes: a data input step (S100) in which the data input unit receives video data for which attention-required region information is to be analyzed; a basic analysis step (S200) in which the preliminary analysis unit uses a pre-stored first analysis algorithm to recognize a human body in each frame constituting the video data received in the data input step (S100) and analyzes feature point information of the recognized human body; a first determination step (S300) in which the preliminary analysis unit uses the analysis result of the basic analysis step (S200) to determine whether a human body has been recognized in the frame; a second determination step (S400) in which, when the first determination step (S300) determines that a human body has been recognized, the preliminary analysis unit uses the analysis result of the basic analysis step (S200) to determine whether feature point information designated as the face has been analyzed; a first setting step (S500) in which, when the second determination step (S400) determines that feature point information designated as the face has been analyzed, the preliminary analysis unit extracts that feature point information and sets it as preliminary information; and a final analysis step (S600) in which the final analysis unit uses the preliminary information set for each frame to generate the attention-required region information for the video data.

Furthermore, in the first setting step (S500), when the second determination step (S400) determines that feature point information designated as the face has been analyzed, that feature point information is preferably extracted, and the position information of the facial region given by the feature point information is analyzed and set as preliminary information.

Furthermore, the first setting step (S500) preferably further includes: an additional determination step (S510) in which, when the second determination step (S400) determines that feature point information designated as the face has been analyzed, that feature point information is extracted and it is determined whether the facial region given by the feature point information corresponds to a frontal view of the face; and a second setting step (S520) in which, when the additional determination step (S510) determines that the facial region is a frontal view, the position information of the facial region given by the feature point information is analyzed and set as preliminary information.

Furthermore, the method for analyzing an attention-required region in video data preferably further includes: an additional analysis step (S700) in which, when the first determination step (S300) determines that no human body has been recognized, the preliminary analysis unit applies a pre-stored second analysis algorithm to analyze the region of change between the corresponding frame and the immediately preceding frame; and a third setting step (S800) in which the preliminary analysis unit receives the analysis result of the additional analysis step (S700), analyzes the position information of the region of change, and sets it as preliminary information.
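The source does not identify the second analysis algorithm. As a simple stand-in, the sketch below compares two grayscale frames pixel by pixel and returns the bounding box of the pixels whose intensity changed by more than a threshold. Representing frames as nested lists of integers, the `threshold` value, and the function name are assumptions for illustration; a production system would more likely use an image processing library.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale frame as rows of pixel intensities (0 to 255)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), hypothetical layout


def change_region(prev: Frame, curr: Frame, threshold: int = 25) -> Optional[Box]:
    """Return the bounding box of pixels that differ by more than threshold.

    Returns None when no pixel changed enough. Both frames are assumed
    to have the same dimensions.
    """
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(a - b) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

In a lecture video without a visible speaker, such a box would typically enclose newly appearing slide content or writing.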

Furthermore, the method for analyzing an attention-required region in video data preferably further includes: a third determination step (S900) in which, when the second determination step (S400) determines that feature point information designated as the face has not been analyzed, the preliminary analysis unit uses the analysis result of the basic analysis step (S200) to determine whether feature point information designated as the hand has been analyzed; and a fourth setting step (S1000) in which, when the third determination step (S900) determines that feature point information designated as the hand has been analyzed, the preliminary analysis unit extracts that feature point information, analyzes the position information of the hand region given by the feature point information, and sets it as preliminary information.

Furthermore, the final analysis step (S600) preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, two or more consecutive frames have preliminary information set through the same analysis result, the first of those frames is taken as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and the preliminary information from that first frame through that last frame is superimposed to generate the coordinate information of the attention-required region information.

Alternatively, the final analysis step (S600) preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, the frames spanning a preset period of time have preliminary information set through the same analysis result, the first of those frames is taken as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and the preliminary information from that first frame through that last frame is superimposed to generate the coordinate information of the attention-required region information.

Alternatively, the final analysis step (S600) preferably uses the preliminary information set for each frame as follows: when, in the order of the frames constituting the video data, a preset number of frames or more have preliminary information set through the same analysis result, the first of those frames is taken as the start frame and the last of those frames as the end frame to generate the time information of the attention-required region information, and the preliminary information from that first frame through that last frame is superimposed to generate the coordinate information of the attention-required region information.

The system and method for analyzing an attention-required region in video data according to an embodiment of the present invention relate to a technique that analyzes, for each frame constituting the video data, information on the region at which a user (viewer, etc.) receiving the video data should look with interest and a high level of concentration. By receiving video data on various subjects and of various types, and by analyzing and storing in advance the regions on which a viewing user should concentrate, the invention has the advantage that, when the user later watches that video data, the user's gaze position can be identified and the user's viewing concentration on the video data can be analyzed.

Subsequently, when viewing concentration on the video data is low, or when the user is gazing at a region other than the region identified as requiring concentration, the user can be coached so as to return their gaze to that region. The invention is therefore particularly well suited to video learning data that must be studied alone.

FIG. 1 is an exemplary configuration diagram of a system for analyzing an attention-required region in video data according to an embodiment of the present invention.
FIG. 2 is an exemplary view showing representative analysis results used for setting preliminary information by the system and method for analyzing an attention-required region in video data according to an embodiment of the present invention.
FIG. 3 is an exemplary flowchart of a method for analyzing an attention-required region in video data according to an embodiment of the present invention.

Hereinafter, the system and method for analyzing an attention-required region in video data according to the present invention are described in detail with reference to the accompanying drawings. The drawings introduced below are provided as examples so that the spirit of the present invention can be fully conveyed to those skilled in the art. Accordingly, the present invention is not limited to the drawings presented below and may be embodied in other forms. In addition, like reference numerals denote like elements throughout the specification.

Unless otherwise defined, the technical and scientific terms used herein have the meanings commonly understood by a person of ordinary skill in the art to which this invention belongs. In the following description and the accompanying drawings, descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the present invention are omitted.

In addition, a system means a set of components, including devices, mechanisms, and means, that are organized and interact regularly to perform a required function.

The system and method for analyzing an attention-required region in video data according to an embodiment of the present invention relate to a technique that analyzes, for each frame constituting the video data, information on the region at which a user (viewer, etc.) receiving the video data should look with interest and a high level of concentration.

For ease of explanation and understanding, the system and method according to an embodiment of the present invention are described below in terms of learning video data only. This is merely one embodiment of the present invention; in addition to video data for learning, the invention can also be applied to video data for shopping, such as live commerce.

The system and method for analyzing an attention-required region in video data according to an embodiment of the present invention receive video data on such various subjects and of such various types, and analyze and store in advance the regions on which a viewing user should concentrate. This has the advantage that, when the user later watches that video data, the user's gaze position can be identified and the user's viewing concentration on the video data can be analyzed.

Furthermore, when viewing concentration on the video data is low, or when the user is gazing at a region other than the region identified as requiring concentration, the user can be coached so as to return their gaze to that region. The invention is therefore particularly well suited to video learning data that must be studied alone.

In analyzing, for each frame constituting the video data, the region on which to concentrate (attention-required region information), the system and method according to an embodiment of the present invention operate on conditions drawn from research findings about users (students, etc.) watching video data: when the lecturer (teacher, etc.) lectures while looking straight ahead (toward the camera), attention is concentrated on the lecturer's face; when the lecturer does not look straight ahead and instead writes on the board while pointing at a blackboard or the like, attention is concentrated on the area where the writing is taking place.

FIG. 1 is a diagram showing the configuration of a system for analyzing an attention-required region in video data according to an embodiment of the present invention. The system is described in detail below with reference to FIG. 1.

As shown in FIG. 1, the system for analyzing an attention-required region in video data according to an embodiment of the present invention preferably includes a data input unit, a preliminary analysis unit, and a final analysis unit. Each component preferably operates by being included, individually or in combination, in at least one arithmetic processing means including a computer or the like. Naturally, the system most preferably operates at an online learning video service provider or on a service-providing server that analyzes online learning concentration and provides coaching accordingly.

Each component is described in detail below.

The data input unit receives, from an external source (an online learning video service provider, a service-providing server that analyzes online learning concentration and provides coaching accordingly, etc.), the video data for which attention-required region information is to be analyzed.

Here, the required region of interest information means information on the region on which a user watching the video data should concentrate, in other words, the region on which a learner watching learning video data should concentrate.

Because the video data being analyzed plays back in time-series order, the required region of interest information preferably includes the time information of the start frame at which concentration is required, the region coordinate information in the start frame, the time information of the last frame, and the region coordinate information in the last frame. This is described in detail later.
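The four items listed above could be held in a small record type. The following is a minimal sketch only: the class name `AttentionRegion`, its field names, the `(x_min, y_min, x_max, y_max)` box layout, and the use of seconds for time are all assumptions made for illustration and are not specified in the description.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass(frozen=True)
class AttentionRegion:
    """One required-region-of-interest record (illustrative field names)."""

    start_time: float  # time of the start frame, in seconds (assumed unit)
    start_box: Box     # region coordinates in the start frame
    end_time: float    # time of the last frame, in seconds (assumed unit)
    end_box: Box       # region coordinates in the last frame

    def duration(self) -> float:
        """Length of the interval during which concentration is required."""
        return self.end_time - self.start_time


region = AttentionRegion(12.0, (100, 80, 220, 200), 15.5, (110, 85, 230, 205))
```

Keeping the start and end coordinates separate mirrors the wording above; a record with a single merged box per interval would also be compatible with the later description of superimposing per-frame regions.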

The preliminary analysis unit preferably recognizes a human body in each frame of the video data received through the data input unit and, based on that recognition, extracts preliminary information for setting the required region of interest information.

To this end, as shown in FIG. 1, the preliminary analysis unit preferably includes a first basic analysis unit, a first determination unit, a second determination unit, a first detailed analysis unit, a first setting unit, a third determination unit, a second setting unit, a fourth determination unit, a third setting unit, a second basic analysis unit, and a fourth setting unit.

The first basic analysis unit applies a pre-stored first analysis algorithm to recognize a human body in each frame and analyzes the feature point information of the recognized human body.

Because the video is learning video data, the human body appearing in it is preferably assumed to be the lecturer (a teacher, etc.). For shopping video data, such as live commerce, the human body appearing in it is preferably assumed to be the shopping host (a show host, etc.).

The first analysis algorithm is most preferably a human pose estimation algorithm, but any algorithm capable of recognizing a human body in a frame may be used.

One example of a human pose estimation algorithm is OpenPose, a library based on deep-learning convolutional neural networks that can extract feature points of a person's body, hands, and face within a frame in real time.

The result of the first basic analysis unit serves as the analysis data for determining, for each frame, whether a speaker is present, in other words, whether the teacher appears in the learning video data.

The first determination unit receives the analysis result of the first basic analysis unit and determines whether a human body has been recognized. That is, as described above, it uses the result of the first analysis algorithm to determine, for each frame of the learning video data, whether the teacher is present or absent.

As described above, various studies show that while a video plays, viewers concentrate on the teacher's face when the teacher is facing forward, and on the teacher's hand movements (the movements associated with board writing) when the teacher is writing on or pointing at the blackboard.

For this reason, the system for analyzing a required region of interest in video data according to an embodiment of the present invention does not stop at establishing that the teacher is present in a frame. Through the second determination unit, it further analyzes whether the teacher, although present in the frame, is looking straight ahead (toward the camera) or doing something else.

Accordingly, when the first determination unit determines that a human body has been recognized (the teacher is present in the frame), the second determination unit uses the analysis result of the first basic analysis unit to determine whether feature point information labeled as a face has been obtained.

In other words, when a human body has been recognized according to the first determination unit, the second determination unit checks whether the analysis result of the first basic analysis unit includes feature point information labeled as a face.

That is, given that the first determination unit has found the teacher in the frame, it is analyzed whether the visible body parts of the teacher include the face.

As described above, the first basic analysis unit can extract feature points of a person's body, hands, and face within a frame, and each feature point is necessarily labeled for that purpose. The second determination unit therefore uses the analysis result of the first basic analysis unit to determine whether feature point information labeled as a face is included.

When the second determination unit determines that feature point information labeled as a face has been obtained, in other words, when the analysis result of the first basic analysis unit includes such feature point information, the first detailed analysis unit extracts that feature point information.

That is, the first determination unit determines from the analysis result of the first basic analysis unit that the teacher is present, and the second determination unit determines whether or not the visible body parts of the teacher include the face.

The reason is that, based on the research finding that viewers concentrate strongly on the teacher's face, the teacher's face region is taken as the highest-priority criterion in generating the required region of interest information.

The first setting unit then receives the extraction result from the first detailed analysis unit, analyzes the position information of the face region given by the feature point information (labeled as a face), and, as shown in a) of FIG. 2, sets it as preliminary information for generating the required region of interest information in the corresponding frame. Because the time information of each frame is already known, the preliminary information is preferably the coordinate information of the region.

However, the teacher's face region may show the side of the face (as seen when the teacher is not looking at the camera in front) rather than the front of the face (as seen when the teacher is looking at the camera in front). When the side of the face is visible, the teacher can be assumed to be writing on the board, pointing at a specific spot, or the like. Research also shows that when the front of the teacher's face is not visible and a hand movement occurs, concentration shifts to the hand movement. Determining whether the feature point information labeled as a face, as extracted by the first detailed analysis unit, corresponds to the front of the face therefore improves the accuracy of the generated required region of interest information.

Accordingly, the third determination unit preferably receives the extraction result from the first detailed analysis unit and determines whether the face region given by the feature point information (labeled as a face) corresponds to the front of the face.

As the criterion, the face region is preferably determined to be the front of the face when the received feature point information includes all of the following: two eyes, a nose, a mouth, and a facial contour including the chin.
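The front-of-face criterion above amounts to a set-inclusion check over the labels of the detected facial feature points. The sketch below assumes hypothetical label names (`left_eye`, `right_eye`, `nose`, `mouth`, `face_contour`); an actual pose estimator defines its own keypoint labels, which would have to be mapped onto these groups.

```python
from typing import Iterable

# Hypothetical label names for the parts the criterion requires.
FRONTAL_REQUIRED = frozenset(
    {"left_eye", "right_eye", "nose", "mouth", "face_contour"}
)


def is_frontal_face(face_labels: Iterable[str]) -> bool:
    """Return True when every required facial part is among the detected labels."""
    return FRONTAL_REQUIRED.issubset(set(face_labels))


# A profile view typically loses one eye, so the check fails.
frontal = is_frontal_face(["left_eye", "right_eye", "nose", "mouth", "face_contour"])
profile = is_frontal_face(["left_eye", "nose", "mouth", "face_contour"])
```

A stricter variant could also require a minimum detection confidence per part, but the description states only the presence of the parts, so the sketch checks presence alone.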

When the third determination unit determines that the feature point information received as the extraction result of the first detailed analysis unit corresponds to the front of the face, the second setting unit, like the first setting unit, analyzes the position information of the face region given by that feature point information and, as shown in a) of FIG. 2, sets it as preliminary information for generating the required region of interest information in the corresponding frame.

In addition, because the teacher's face region is the highest-priority criterion in generating the required region of interest information, once the preliminary information for a frame has been set from the 'position information of the face region' by the first setting unit or the second setting unit, the preliminary analysis unit performs no further analysis to set preliminary information under another criterion, and processing passes to the final analysis unit.

The fourth determination unit operates when the second determination unit determines that no feature point information labeled as a face has been obtained, in other words, when the analysis result of the first basic analysis unit includes no such feature point information, or when the third determination unit determines that the feature point information received as the extraction result of the first detailed analysis unit does not correspond to the front of the face.

That is, when the highest-priority criterion for generating the required region of interest information is not met (no face is included, or a face is included but shows the side rather than the front), the fourth determination unit uses the analysis result of the first basic analysis unit to determine whether feature point information labeled as a hand has been obtained.

In other words, when no feature point information labeled as a face has been obtained according to the second determination unit, or the face region does not correspond to the front of the face according to the third determination unit, the fourth determination unit checks whether the analysis result of the first basic analysis unit includes feature point information labeled as a hand.

That is, it is analyzed whether the body parts of the teacher visible in the frame include a hand.

As described above, the first basic analysis unit can extract feature points of a person's body, hands, and face within a frame, and each feature point is necessarily labeled for that purpose. The fourth determination unit therefore uses the analysis result of the first basic analysis unit to determine whether feature point information labeled as a hand is included.

In learning video data, when the teacher's face is visible but not from the front, the teacher is likely pointing at a table, a text passage, or the like in the background region, or writing on the board. When the teacher is not visible at all, the teacher is likely explaining a table, a text passage, or the like in the background region by voice alone. Therefore, based on the research findings, pointing or board writing through the teacher's hand movements, which holds concentration more readily than a voice alone with no teacher on screen, is preferably set as the next analysis criterion, and explanation of a table, a text passage, or the like with no teacher visible is preferably set as the last analysis criterion.

Accordingly, when the fourth determination unit determines that feature point information labeled as a hand has been obtained, in other words, when the analysis result of the first basic analysis unit includes such feature point information, the third setting unit extracts that feature point information.

The third setting unit analyzes the position information of the hand region given by the extracted feature point information (labeled as a hand) and, as shown in b) of FIG. 2, sets it as preliminary information for generating the required region of interest information in the corresponding frame. Because the time information of each frame is already known, the preliminary information is preferably the coordinate information of the region.

In addition, because the teacher's hand region is the second-highest criterion in generating the required region of interest information, once the preliminary information for a frame has been set from the 'position information of the hand region' by the third setting unit, the preliminary analysis unit performs no further analysis to set preliminary information under another criterion, and processing passes to the final analysis unit.

When the first determination unit determines that no human body has been recognized (the teacher is not present in the frame), the second basic analysis unit applies a pre-stored second analysis algorithm to analyze the region that changed between the corresponding frame and the immediately preceding frame.

That is, when no human body has been recognized according to the first determination unit, the second basic analysis unit determines that the teacher is not present in the frame and that a background region made up of a table, a text passage, or the like is being shown, as illustrated in c) of FIG. 2, and analyzes the region that changed within the frame.

To this end, a differential image analysis algorithm is used as the second analysis algorithm to analyze, for example, the region whose color changed between the current frame and the immediately preceding frame, and thereby extract the change region.
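The frame-differencing step above can be sketched in plain Python. This is an illustration under stated assumptions, not the stored second analysis algorithm: frames are taken to be grayscale grids (lists of rows of 0 to 255 values), and the function name `change_region` and the `threshold` of 30 are hypothetical choices.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]           # assumed grayscale frame: rows of 0..255 values
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive


def change_region(prev: Frame, curr: Frame, threshold: int = 30) -> Optional[Box]:
    """Bounding box of pixels whose value changed by more than `threshold`.

    Returns None when no pixel changed enough, i.e. no change region exists.
    """
    changed = [
        (x, y)
        for y, (prev_row, curr_row) in enumerate(zip(prev, curr))
        for x, (p, c) in enumerate(zip(prev_row, curr_row))
        if abs(c - p) > threshold
    ]
    if not changed:
        return None
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    return (min(xs), min(ys), max(xs), max(ys))


# A 5x4 blank frame, then the same frame with two newly drawn pixels.
previous = [[0] * 5 for _ in range(4)]
current = [row[:] for row in previous]
current[1][2] = 255
current[2][3] = 255
box = change_region(previous, current)
```

For color video the same idea applies per channel, and an array library would be the practical choice for full-resolution frames; the threshold guards against compression noise being reported as a change.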

Examples of change regions that may occur include a correct answer being entered in a situation where the answer must be worked out, and an emphasis mark being entered as writing proceeds with no hand visible.

The fourth setting unit receives the analysis result of the second basic analysis unit, analyzes the position information of the region in which the current frame changed relative to the immediately preceding frame, and sets the analyzed position information as the preliminary information.

The fourth setting unit analyzes the position information of the extracted change region and sets it as preliminary information for generating the required region of interest information in the corresponding frame. Because the time information of each frame is already known, the preliminary information is preferably the coordinate information of the region.

In this way, the preliminary analysis unit sets the preliminary information for each frame according to the various criteria. Not every frame necessarily has preliminary information; depending on the content of the video data, there may be frames for which no preliminary information is set.
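The per-frame priority order described above (front of the face first, then the hand, then the change region when no human body is recognized) can be summarized as a small decision cascade. The sketch below is a simplification under assumptions: the function `preliminary_info`, the keypoint dictionary with `"face"` and `"hand"` keys, and the criterion strings are hypothetical, and the pose estimation and frame differencing are taken as already computed inputs.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def bounding_box(points: List[Point]) -> Box:
    """Smallest axis-aligned box containing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def preliminary_info(
    keypoints: Optional[Dict[str, List[Point]]],
    face_is_frontal: bool,
    change_box: Optional[Box],
) -> Optional[Tuple[str, Box]]:
    """Return (criterion, box) for one frame, or None when nothing is set."""
    if not keypoints:
        # First determination: no human body recognized, fall back to the change region.
        return ("change", change_box) if change_box is not None else None
    face = keypoints.get("face", [])
    if face and face_is_frontal:
        # Second and third determinations: face points exist and show the front.
        return ("face", bounding_box(face))
    hand = keypoints.get("hand", [])
    if hand:
        # Fourth determination: hand points exist.
        return ("hand", bounding_box(hand))
    return None


example = preliminary_info({"face": [(10, 10), (30, 40)], "hand": [(50, 50)]}, True, None)
```

Returning `None` corresponds to the frames for which no preliminary information is set, for instance a frame that shows only the teacher's back and no hand.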

The final analysis unit preferably generates the required region of interest information for the video data using the per-frame preliminary information extracted by the preliminary analysis unit.

Given that video data plays back in time-series order, the per-frame preliminary information is analyzed in an integrated manner rather than frame by frame. For example, suppose that in one selected frame, with no human body recognized, a 'circle mark' is added to the table region shown as the background region relative to the immediately preceding frame, so that the circle region is set as preliminary information; in the next frame a 'star mark' is added, so that the star region is set as preliminary information; and in the frame after that the numeral '3' is written, so that the '3' region is set as preliminary information. In this case, required region of interest information is not generated by labeling each piece of preliminary information separately. Instead, the pieces of preliminary information are integrated and analyzed in view of their shared criterion (analysis of frame change when the teacher is not present in the frame).

As an example, when the hand moves, required region of interest information is not generated from the preliminary information set at each individual position. Instead, the position coordinates given by the preliminary information are integrated and analyzed over the period from the time the hand movement began (the start of that criterion) to the time it ended (the end of that criterion or the start of another criterion).

As a further example, when the teacher writes on the board while looking sideways and then looks straight ahead, the hand is still moving but the front of the face is now extracted. One piece of required region of interest information is therefore generated from the hand positions, and once the front of the face appears, another piece is generated from the position of the front of the face. That is, when the criterion for setting preliminary information switches to the front of the face, the end point of the analysis of the preliminary information under the previous criterion is fixed.

As another example, when the teacher writes on the board while looking sideways and then disappears from the screen, and no change appears between frames thereafter, the piece of required region of interest information based on the hand positions ends at the time of the first frame in which no inter-frame change exists.

Accordingly, using the preliminary information set for each frame and following the order of the frames that make up the video data, when two or more consecutive frames have preliminary information set through the same analysis result, the final analysis unit preferably takes the first of those frames as the start frame and the last as the end frame to generate the time information of the required region of interest information, and superimposes the preliminary information from the first frame through the last frame to generate the coordinate information of the required region of interest information.

Of course, given the nature of video data, it is practically almost impossible for different pieces of preliminary information to be set for a single frame.

However, the required region of interest information is generated not for each individual frame but for a given coordinate region over a given period of the video data. Therefore, when two or more consecutive frames have preliminary information set through the same analysis result (face region (front), hand region, or change region), those frames are grouped by analysis result, and the corresponding time information and position region coordinate information are analyzed for each group.

Alternatively, using the preliminary information set for each frame and following the order of the frames that make up the video data, when the frames spanning a preset period (for example, the time at which a student's concentration peaks) have preliminary information set through the same analysis result, the final analysis unit preferably takes the first of those frames as the start frame and the last as the end frame to generate the time information of the required region of interest information, and superimposes the preliminary information from the first frame through the last frame to generate the coordinate information of the required region of interest information.

Alternatively, using the preliminary information set for each frame and following the order of the frames that make up the video data, when at least a preset number of frames (for example, the number of frames that corresponds, at the fps of the video data, to the time at which a student's concentration peaks) have preliminary information set through the same analysis result, the final analysis unit preferably takes the first of those frames as the start frame and the last as the end frame to generate the time information of the required region of interest information, and superimposes the preliminary information from the first frame through the last frame to generate the coordinate information of the required region of interest information.
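The grouping of consecutive frames that share the same analysis result can be sketched as a run-length pass over the per-frame preliminary information. The sketch rests on assumptions: each frame carries either `None` or a hypothetical `(criterion, box)` pair, frame times are derived as `index / fps`, superimposition is taken to be the union of the boxes, and `group_frames` and `min_frames` are illustrative names.

```python
from itertools import groupby
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]               # (x_min, y_min, x_max, y_max)
FrameInfo = Optional[Tuple[str, Box]]         # (criterion, box) or None
Segment = Tuple[str, float, float, Box]       # (criterion, start_s, end_s, merged box)


def group_frames(per_frame: List[FrameInfo], fps: float, min_frames: int = 2) -> List[Segment]:
    """Merge runs of consecutive frames that share one criterion into segments."""
    segments: List[Segment] = []
    index = 0
    for criterion, run_iter in groupby(per_frame, key=lambda item: item[0] if item else None):
        run = list(run_iter)
        start, end = index, index + len(run) - 1
        index += len(run)
        if criterion is None or len(run) < min_frames:
            continue  # frames without preliminary info, or runs that are too short
        boxes = [box for _, box in run]
        merged = (
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        )
        segments.append((criterion, start / fps, end / fps, merged))
    return segments


frames = [
    ("hand", (10, 10, 20, 20)),
    ("hand", (15, 12, 30, 25)),
    None,
    ("face", (0, 0, 5, 5)),
    ("face", (1, 1, 6, 6)),
    ("face", (0, 2, 5, 7)),
]
segments = group_frames(frames, fps=2.0)
```

The default `min_frames=2` follows the "two or more consecutive frames" wording; the variant based on a preset number of frames could be obtained by passing a larger value, for example `round(fps * seconds)` for an assumed duration `seconds`.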

Here, to superimpose the preliminary information of multiple frames, the final analysis unit either simply analyzes the coordinate information of the position region that results from superimposing the preliminary information set for all the corresponding frames, or superimposes that preliminary information and then analyzes the coordinate information of a preset range around the center point. The preset range is preferably set based on the analysis result underlying the corresponding preliminary information.

For example, when pieces of preliminary information set from the face region are superimposed, the face region alone is sufficient, so no adjustment of the range is needed.

For preliminary information set from the hand region, however, the region to concentrate on is not the hand region itself but the region that changes because of the hand, so the range is preferably widened around the hand region.

For preliminary information set from the change region, the change region is itself the region to concentrate on, so no adjustment of the range is needed.
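The superimposition and the criterion-dependent range adjustment can be sketched as follows. This is one possible reading under assumptions: superimposition is modeled as the union of the per-frame boxes, the widening for the hand criterion is modeled as scaling the merged box about its center, and the names `merged_region` and `hand_scale` as well as the factor 2.0 are hypothetical, since the description gives no numeric range.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def union_box(boxes: List[Box]) -> Box:
    """Smallest box covering every per-frame box (the superimposed region)."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def expand_about_center(box: Box, scale: float, frame_size: Tuple[int, int]) -> Box:
    """Scale a box about its center point and clamp it to the frame."""
    x0, y0, x1, y1 = box
    width, height = frame_size
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * scale / 2, (y1 - y0) * scale / 2
    return (
        int(max(0, cx - half_w)),
        int(max(0, cy - half_h)),
        int(min(width, cx + half_w)),
        int(min(height, cy + half_h)),
    )


def merged_region(
    criterion: str,
    boxes: List[Box],
    frame_size: Tuple[int, int],
    hand_scale: float = 2.0,  # hypothetical widening factor
) -> Box:
    """Superimpose boxes; widen only for the hand criterion."""
    box = union_box(boxes)
    if criterion == "hand":
        return expand_about_center(box, hand_scale, frame_size)
    return box  # face and change regions are used as they are


hand_boxes = [(100, 100, 140, 140), (120, 110, 160, 150)]
hand_region = merged_region("hand", hand_boxes, (640, 360))
face_region = merged_region("face", hand_boxes, (640, 360))
```

Clamping to the frame keeps a widened hand region valid when the hand is near an edge of the picture.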

FIG. 3 is an exemplary flowchart showing a method for analyzing a required region of interest in video data according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 3.

As shown in FIG. 3, the method for analyzing a required region of interest in video data according to an embodiment of the present invention preferably includes a data input step (S100), a basic analysis step (S200), a first determination step (S300), a second determination step (S400), a first setting step (S500), and a final analysis step (S600). The steps are carried out using the system for analyzing a required region of interest in video data, in which each step is performed by an arithmetic processing means.

Each step is described in detail below.

In the data input step (S100), the data input unit receives, from an external source (an online learning video service provider, a service-providing server that analyzes online learning concentration and offers coaching based on it, or the like), the video data for which required region of interest information is to be analyzed.

In the basic analysis step (S200), the preliminary analysis unit applies a pre-stored first analysis algorithm to recognize a human body in each frame and analyzes feature point information of the recognized human body.

In the first determination step (S300), the preliminary analysis unit uses the analysis result of the basic analysis step (S200) to determine whether a human body has been recognized. That is, as described above, the result of the first analysis algorithm is used to determine, frame by frame, whether the teacher is present or absent in the learning video data.

Therefore, the analysis does not stop at establishing that the teacher is present in the frame. Through the second determination step (S400), an additional analysis determines whether the teacher who is present in the frame is facing forward (toward the camera) or performing some other action.

In the second determination step (S400), when a human body has been recognized according to the result of the first determination step (S300) (that is, when the teacher is present in the frame), the preliminary analysis unit uses the analysis result of the basic analysis step (S200) to determine whether feature point information labeled as a face has been analyzed.

In other words, when a human body is recognized, it is determined whether the result includes feature point information labeled as a face. This establishes whether the visible body parts of the teacher present in the frame include the face.

Here, the basic analysis step (S200) can extract feature points of a person's body, hands, and face within a frame, and each feature point is necessarily labeled for this purpose. The second determination step (S400) can therefore use the analysis result of the basic analysis step (S200) to determine whether feature point information labeled as a face is included.

In the first setting step (S500), when feature point information labeled as a face has been analyzed according to the result of the second determination step (S400), in other words when the analysis result of the basic analysis step (S200) includes such feature point information, the preliminary analysis unit extracts the corresponding feature point information.

The first setting step (S500) then sets the preliminary information using the extracted feature point information.

Specifically, when the first determination step (S300) determines that the teacher is present, the second determination step (S400) determines whether or not the visible body parts of the teacher include the face.

Accordingly, the first setting step (S500) extracts the feature point information labeled as a face from the analysis result of the basic analysis step (S200), analyzes the position information of the facial region given by the extracted feature points, and, as shown in a) of FIG. 2, sets it as the preliminary information for generating the required region of interest information for that frame. Because the time information of each frame is already known, the preliminary information is preferably coordinate information for the region.

However, the facial region handled in the first setting step (S500) may include not only the front of the face (the face as seen when the teacher looks at the camera) but also its side (the face as seen when the teacher is not looking at the camera). When the side of the teacher's face is visible, it can be assumed that the teacher is writing on the board, pointing at a specific spot, or performing a similar action. In addition, research shows that when the front of the teacher's face is not visible and a hand motion occurs, the viewer's attention shifts to the hand motion. Additionally determining whether the feature point information labeled as a face corresponds to the front of the face can therefore improve the accuracy of the generated required region of interest information.

To this end, as shown in FIG. 3, an additional determination step (S510) and a second setting step (S520) are included in place of the first setting step (S500).

In the additional determination step (S510), when feature point information labeled as a face has been analyzed according to the result of the second determination step (S400), the preliminary analysis unit extracts the corresponding feature point information from the analysis result of the basic analysis step (S200) and determines whether the facial region given by that feature point information corresponds to the front of the face.

In the second setting step (S520), when the additional determination step (S510) determines that the facial region corresponds to the front of the face, the position information of the facial region given by the corresponding feature point information is analyzed and, as shown in a) of FIG. 2, set as the preliminary information for generating the required region of interest information for that frame.

Furthermore, because the teacher's facial region is the highest-priority criterion in generating the required region of interest information, once the preliminary information for a frame has been set from the 'position information of the facial region' through the first setting step (S500) or the second setting step (S520), no further analysis is performed to set preliminary information under another criterion, and the method proceeds to the final analysis step (S600).

As shown in FIG. 3, in the method for analyzing a required region of interest in video data according to an embodiment of the present invention, when a human body is not recognized according to the result of the first determination step (S300), an additional analysis step (S700) and a third setting step (S800) are performed.

Also, when feature point information labeled as a face is not analyzed according to the result of the second determination step (S400), a third determination step (S900) and a fourth setting step (S1000) are performed.

In the additional analysis step (S700), when a human body is not recognized according to the result of the first determination step (S300) (that is, when the teacher is not present in the frame), the preliminary analysis unit applies a pre-stored second analysis algorithm to analyze the change region between the frame in question and the immediately preceding frame.

That is, when a human body is not recognized according to the result of the first determination step (S300), it is judged that the teacher is not present in the frame and that background content such as a table or text passage is being presented, as shown in c) of FIG. 2, and the change region within the frame is analyzed.

In the third setting step (S800), the preliminary analysis unit receives the analysis result of the additional analysis step (S700), analyzes the position information of the region in which the current frame has changed relative to the immediately preceding frame, and sets the analyzed position information as the preliminary information.

The third setting step (S800) analyzes the position information of the extracted change region and sets it as the preliminary information for generating the required region of interest information for that frame. Because the time information of each frame is already known, the preliminary information is preferably coordinate information for the region.
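
The change-region analysis of steps S700 and S800 can be illustrated with a short sketch. The source does not disclose what the second analysis algorithm is, so plain absolute-difference thresholding between grayscale frames is swapped in here as a stand-in. The frame representation (nested sequences of integer intensities), the function name `change_region`, and the `threshold` value are assumptions made for illustration; a production pipeline would more likely use an array or image-processing library.

```python
from typing import Optional, Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), max edges exclusive.
Box = Tuple[int, int, int, int]
Gray = Sequence[Sequence[int]]  # rows of grayscale intensities (0-255)


def change_region(prev: Gray, curr: Gray, threshold: int = 25) -> Optional[Box]:
    """Bounding box of pixels that differ between two frames of equal size.

    Returns None when no pixel changes by more than `threshold`, i.e. the
    frame yields no preliminary information under this criterion.
    """
    xs, ys = [], []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (p, c) in enumerate(zip(row_prev, row_curr)):
            if abs(c - p) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs) + 1, max(ys) + 1)
```

The returned box plays the role of the coordinate information that S800 stores as preliminary information; the frame's time information is implied by its index.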

The third determination step (S900) is performed by the preliminary analysis unit when feature point information labeled as a face is not analyzed according to the result of the second determination step (S400), in other words when the analysis result of the basic analysis step (S200) does not include such feature point information, or when the additional determination step (S510) determines that the facial region does not correspond to the front of the face.

That is, in generating the required region of interest information on the basis of the research findings, when the highest-priority criterion is not met (the face is not included, or the face is included but shows its side rather than its front), it is determined whether feature point information labeled as a hand, the next-highest criterion, has been analyzed.

Specifically, when feature point information labeled as a face is not analyzed, or the facial region does not correspond to the front of the face, the third determination step (S900) uses the analysis result of the basic analysis step (S200) to determine whether that result includes feature point information labeled as a hand.

In learning video data, when the teacher's face is visible but not from the front, the teacher is likely pointing at a table, text passage, or similar item in the background, or writing on the board. When the teacher is not visible at all, the teacher is likely explaining such a table or text passage by voice alone. Therefore, based on the research findings, pointing or board-writing motions of the teacher's hand, which raise concentration more readily than a voice without a visible teacher, are preferably set as the next analysis criterion, and explanation of a table, text passage, or the like with no teacher visible is set as the last analysis criterion.

In the fourth setting step (S1000), when feature point information labeled as a hand has been analyzed according to the result of the third determination step (S900), in other words when the analysis result of the basic analysis step (S200) includes such feature point information, the preliminary analysis unit extracts the corresponding feature point information.

The position information of the hand region given by the extracted feature point information (labeled as a hand) is then analyzed and, as shown in b) of FIG. 2, set as the preliminary information for generating the required region of interest information for that frame. Because the time information of each frame is already known, the preliminary information is preferably coordinate information for the region.

The preliminary information is thus set for each frame according to these various criteria. Not every frame necessarily has preliminary information; depending on the content of the video data, there may be frames for which no preliminary information is set.
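
The per-frame priority order described above (front of the face first, then the hand, then the change region when no person is visible) can be summarized in a small sketch. It follows the variant of FIG. 3 that uses the additional determination step (S510) and the second setting step (S520). The `FrameAnalysis` container, its field names, and the criterion labels `"face"`, `"hand"`, and `"change"` are hypothetical; the source does not define a data structure for the results of S200 and S700.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class FrameAnalysis:
    """Hypothetical per-frame output of the analysis steps S200 and S700."""
    body_found: bool                   # S300: was a human body recognized?
    face_box: Optional[Box] = None     # region of face-labeled feature points
    face_is_frontal: bool = False      # S510: does the face show its front?
    hand_box: Optional[Box] = None     # region of hand-labeled feature points
    change_box: Optional[Box] = None   # S700: region changed since last frame


def select_preliminary(a: FrameAnalysis) -> Optional[Tuple[str, Box]]:
    """Return (criterion, box) for one frame, or None if nothing applies."""
    if not a.body_found:
        # S300 "no": fall back to the change region (S700, S800).
        return ("change", a.change_box) if a.change_box is not None else None
    if a.face_box is not None and a.face_is_frontal:
        # S400 and S510 "yes": highest-priority criterion (S520).
        return ("face", a.face_box)
    if a.hand_box is not None:
        # No face, or a side view only: next criterion (S900, S1000).
        return ("hand", a.hand_box)
    return None  # a frame may legitimately have no preliminary information
```

Note that a frame with a frontal face never reaches the hand check, which mirrors the statement that no further analysis is performed once the facial region has been used.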

Taking this into account, the final analysis step (S600) generates the required region of interest information for the video data using the per-frame preliminary information extracted through the steps described above.

Accordingly, using the preliminary information set for each frame and following the order of the frames constituting the video data, when two or more consecutive frames have preliminary information set through the same analysis result, the final analysis step (S600) preferably takes the first such frame as the start frame and the last such frame as the end frame to generate the time information of the required region of interest information, and overlaps the preliminary information from the first frame through the last frame to generate its coordinate information.
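
A minimal sketch of this grouping follows. It assumes that "the same analysis result" means the same criterion label (for example `"face"`, `"hand"`, or `"change"`), that overlapping means taking the union bounding box, and that time information is derived from the frame index and a known frame rate `fps`. None of these details is fixed by the source, and the function and key names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]
Prelim = Optional[Tuple[str, Box]]  # (criterion label, box), or None


def union_box(boxes: List[Box]) -> Box:
    """Smallest box that covers every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def build_regions(prelims: List[Prelim], fps: float,
                  min_frames: int = 2) -> List[Dict]:
    """Merge runs of consecutive frames that share a criterion into regions."""
    regions = []
    index = 0
    for kind, run_iter in groupby(prelims, key=lambda p: p[0] if p else None):
        run = list(run_iter)
        start, end = index, index + len(run) - 1
        index += len(run)
        if kind is None or len(run) < min_frames:
            continue  # frames without preliminary information, or run too short
        regions.append({
            "kind": kind,
            "start_frame": start,
            "end_frame": end,
            "start_time": start / fps,   # seconds
            "end_time": end / fps,
            "box": union_box([p[1] for p in run]),
        })
    return regions
```

With `min_frames=2` this matches the "two or more consecutive frames" condition; a single isolated frame produces no region.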

In addition, using the preliminary information set for each frame and following the order of the frames constituting the video data, when the frames spanning a preset predetermined time (for example, the time over which a student's concentration reaches its maximum) have preliminary information set through the same analysis result, the final analysis step (S600) preferably takes the first such frame as the start frame and the last such frame as the end frame to generate the time information of the required region of interest information, and overlaps the preliminary information from the first frame through the last frame to generate its coordinate information.

Furthermore, using the preliminary information set for each frame and following the order of the frames constituting the video data, when at least a preset predetermined number of frames (for example, the number of frames that corresponds, at the fps of the video data, to the time over which a student's concentration reaches its maximum) have preliminary information set through the same analysis result, the final analysis step (S600) preferably takes the first such frame as the start frame and the last such frame as the end frame to generate the time information of the required region of interest information, and overlaps the preliminary information from the first frame through the last frame to generate its coordinate information.
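
The relationship between the predetermined time and the predetermined number of frames is simple arithmetic on the frame rate, sketched below. The function names are hypothetical and the duration values used in the tests are placeholders; the source does not state a concrete duration.

```python
import math


def min_frames_for(duration_s: float, fps: float) -> int:
    """Number of frames needed to cover `duration_s` seconds at `fps`."""
    return max(1, math.ceil(duration_s * fps))


def run_qualifies(run_length: int, duration_s: float, fps: float) -> bool:
    """True when a run of same-criterion frames is long enough to count."""
    return run_length >= min_frames_for(duration_s, fps)
```

Rounding up ensures that a run is accepted only if it spans at least the full predetermined time, including at non-integer frame rates.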

Here, overlapping the preliminary information of multiple frames in the final analysis step (S600) means either simply analyzing the coordinate information of the position region obtained by overlapping the preliminary information set for all of the frames concerned, or overlapping that preliminary information and then analyzing the coordinate information of a preset predetermined range around the center point. The preset predetermined range is preferably set on the basis of the analysis result of the corresponding preliminary information.
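
The two overlap options can be sketched as follows. This is only one reading of the passage: it assumes that the overlapped position region is the union bounding box of the per-frame boxes and that the "center point" is the center of that union. The function names and the `half_w` and `half_h` parameters are hypothetical.

```python
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def overlap_union(boxes: List[Box]) -> Box:
    """Option 1: the region covered by all per-frame boxes taken together."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlap_centered(boxes: List[Box], half_w: int, half_h: int) -> Box:
    """Option 2: a fixed-size range around the centre of the overlapped region.

    `half_w` and `half_h` stand for the preset range; per the text they would
    be chosen from the analysis result of the preliminary information.
    """
    x0, y0, x1, y1 = overlap_union(boxes)
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

The union keeps every location the criterion touched during the run, whereas the centered variant yields a region of constant size, which may be easier to compare across segments.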

Although the present invention has been described above with reference to specific matters, such as particular components, and to limited embodiments and drawings, these are provided only to aid overall understanding of the invention. The invention is not limited to the embodiment above, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the described embodiment; not only the claims set out below but also everything equal or equivalent to those claims falls within the scope of the spirit of the invention.

Claims

a data input unit that receives video data for which analysis is desired of required region of interest information, the information including at least one of time information of a start frame, coordinate information in the start frame, time information of a last frame, and coordinate information in the last frame for which the attention of a user receiving the video data is required;
a preliminary analysis unit that, in order to set the required region of interest information, uses a pre-stored first analysis algorithm to recognize feature point information of a body region in each frame constituting the video data and extracts preliminary information, which is coordinate information for the recognized region; and
a final analysis unit generating information on a required region of interest for the video data using the preliminary information for each frame extracted by the preliminary analysis unit;
Including,
The preliminary analysis section
a first basic analysis unit for recognizing a human body within a corresponding frame by applying a pre-stored first analysis algorithm and analyzing information on the recognized human body feature points;
a first determination unit receiving an analysis result from the first basic analysis unit and determining whether the human body is recognized;
a second determination unit determining whether to analyze feature point information set as a face using an analysis result by the first basic analysis unit when a human body is recognized according to a determination result of the first determination unit; and
a first detailed analyzer configured to extract corresponding feature point information when the feature point information set as the face is analyzed according to the determination result of the second determiner;
Including, the system for analyzing the area of interest within the video data.

delete

According to claim 1,
The preliminary analysis section
a first setting unit that receives an extraction result from the first detailed analysis unit, analyzes location information of the facial region according to feature point information, and sets the analyzed location information as preliminary information;
Further comprising, the system for analyzing the area of interest within the video data.

According to claim 1,
The preliminary analysis section
a third determination unit receiving an extraction result from the first detailed analysis unit and determining whether the facial area according to the feature point information corresponds to the front portion of the facial area; and
a second setting unit that, when the third determination unit determines that the facial area corresponds to the front portion, receives the extraction result from the first detailed analysis unit, analyzes location information of the facial area according to the feature point information, and sets the analyzed location information as preliminary information;
Further comprising, the system for analyzing the area of interest within the video data.

According to claim 1,
The preliminary analysis section
a fourth determination unit that, when the feature point information set as the face is not analyzed according to the determination result of the second determination unit, uses the analysis result of the first basic analysis unit to determine whether feature point information set as a hand is analyzed; and
a third setting unit that, when the feature point information set as a hand is analyzed according to the determination result of the fourth determination unit, extracts the corresponding feature point information, analyzes location information of the hand area according to the extracted feature point information, and sets the analyzed location information as preliminary information;
Further comprising, the system for analyzing the area of interest within the video data.

According to claim 1,
The preliminary analysis section
a second basic analysis unit analyzing a region of change between a corresponding frame and a previous frame by applying a pre-stored second analysis algorithm when the human body is not recognized according to the determination result of the first determination unit; and
a fourth setting unit that receives the analysis result from the second basic analysis unit, analyzes location information of the change area, and sets the analyzed location information as preliminary information;
Further comprising, the system for analyzing the area of interest within the video data.

According to any one of claims 4 to 7,
The final analysis part
Using the preliminary information set for each frame,
When the preliminary information is set through the same analysis result in two or more consecutive frames based on the order of frames constituting the video data,
generating time information of the required region of interest information by using a corresponding first frame as a start frame and a corresponding last frame as an end frame;
A system for analyzing a required region of interest in video data, wherein the coordinate information of the required region of interest information is generated by overlapping corresponding preliminary information from a corresponding first frame to a corresponding last frame.

According to any one of claims 4 to 7,
The final analysis part
Using the preliminary information set for each frame,
When the preliminary information is set through the same analysis result of frames corresponding to a predetermined time based on the order of frames constituting the video data,
generating time information of the required region of interest information by using a corresponding first frame as a start frame and a corresponding last frame as an end frame;
A system for analyzing a required region of interest in video data, wherein the coordinate information of the required region of interest information is generated by overlapping corresponding preliminary information from a corresponding first frame to a corresponding last frame.

According to any one of claims 4 to 7,
The final analysis part
Using the preliminary information set for each frame,
When the preliminary information is set through the same analysis result in a predetermined number or more frames based on the order of frames constituting the video data,
generating time information of the required region of interest information by using a corresponding first frame as a start frame and a corresponding last frame as an end frame;
A system for analyzing a required region of interest in video data, wherein the coordinate information of the required region of interest information is generated by overlapping corresponding preliminary information from a corresponding first frame to a corresponding last frame.

As a method for analyzing a required region of interest using a system for analyzing a required region of interest in video data in which each step is performed by an arithmetic processing means,
A data input step (S100) of receiving, in the data input unit, video data for which analysis is desired of required region of interest information including at least one of time information of a start frame, coordinate information in the start frame, time information of a last frame, and coordinate information in the last frame for which the attention of a user receiving the video data is required;
A basic analysis step (S200) of, in the preliminary analysis unit, using a pre-stored first analysis algorithm to recognize a human body in each frame constituting the video data received in the data input step (S100) and analyzing feature point information of the recognized human body;
A first determination step (S300) of determining, in the preliminary analysis unit, whether the human body is recognized in the frame, using the analysis result of the basic analysis step (S200);
A second determination step (S400) of determining, in the preliminary analysis unit, when the human body is recognized according to the determination result of the first determination step (S300), whether feature point information set as a face is analyzed, using the analysis result of the basic analysis step (S200);
A first setting step (S500) of, in the preliminary analysis unit, when the feature point information set as the face is analyzed according to the determination result of the second determination step (S400), extracting the corresponding feature point information and setting coordinate information of the extracted feature point information as preliminary information; and
a final analysis step (S600) of generating, in a final analysis unit, information on a region requiring interest for the video data by using preliminary information set for each frame;
Including, a method for analyzing a region of interest in video data.

According to claim 11,
The first setting step (S500)
According to the determination result of the second determination step (S400), when the minutiae point information set as the face is analyzed, the corresponding minutiae point information is extracted, the location information of the facial area according to the minutiae point information is analyzed, and the analyzed location information is obtained. A method for analyzing the area of interest in video data, which is set as preliminary information.

According to claim 11,
The method of analyzing the region of interest in the video data
An additional determination step (S510) of, when the feature point information set as the face is analyzed according to the determination result of the second determination step (S400), extracting the corresponding feature point information and determining whether the facial region according to the feature point information corresponds to the front portion of the facial region; and
A second setting step (S520) of, when the additional determination step (S510) determines that the facial region corresponds to the front portion, analyzing location information of the facial region according to the corresponding feature point information and setting the analyzed location information as preliminary information;
Further comprising, a method for analyzing a region of interest in video data.

According to claim 12,
The method of analyzing the region of interest in the video data
An additional analysis step (S700) of, in the preliminary analysis unit, when the human body is not recognized according to the determination result of the first determination step (S300), applying a pre-stored second analysis algorithm to analyze the change area between the corresponding frame and the immediately preceding frame; and
A third setting step (S800) of receiving the analysis result of the additional analysis step (S700), analyzing the location information of the change area, and setting the analyzed location information as preliminary information in the preliminary analysis unit;
Further comprising, a method for analyzing a region of interest in video data.

According to claim 14,
The method of analyzing the region of interest in the video data
A third determination step (S900) of determining, in the preliminary analysis unit, when the feature point information set as the face is not analyzed according to the determination result of the second determination step (S400), whether feature point information set as a hand is analyzed, using the analysis result of the basic analysis step (S200); and
A fourth setting step (S1000) of, in the preliminary analysis unit, when the feature point information set as a hand is analyzed according to the determination result of the third determination step (S900), extracting the corresponding feature point information, analyzing location information of the hand region according to the feature point information, and setting the analyzed location information as preliminary information;
Further comprising, a method for analyzing a region of interest in video data.
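
The hand fallback described in steps S900 and S1000 can be sketched as follows. The sketch assumes the basic analysis step (S200) yields a dictionary from a body-part label to a list of feature points; that data layout, the labels `"face"` and `"hand"`, and the function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def bounding_box(points: List[Point]) -> Box:
    """Location information: the box enclosing the given feature points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def hand_fallback(feature_points: Dict[str, List[Point]]) -> Optional[Box]:
    """S900 and S1000: use the hand region only when no face points were analyzed."""
    if feature_points.get("face"):
        return None  # face points exist, so the face branch handles this frame
    hand_points = feature_points.get("hand")
    if not hand_points:
        return None  # S900: no hand feature points either
    return bounding_box(hand_points)  # S1000: hand region as preliminary information


hand_only = {"hand": [(100, 200), (120, 180), (110, 220)]}
print(hand_fallback(hand_only))  # (100, 180, 120, 220)
```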

According to claim 15,
in the final analysis step (S600),
when, using the preliminary information set for each frame and following the order of the frames constituting the video data, the preliminary information is set through the same analysis result in two or more consecutive frames,
time information of the required region of interest information is generated by using the first of those frames as a start frame and the last of those frames as an end frame, and
coordinate information of the required region of interest information is generated by overlapping the preliminary information of those frames, from the first frame to the last frame.
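
A minimal sketch of this grouping follows, under two loudly stated assumptions: "the same analysis result" is read as "preliminary information produced by the same kind of analysis" (recorded in a hypothetical `source` field), and "overlapping" is read as taking the union of the per-frame boxes. Neither reading is confirmed by the claim text, and all class and function names are illustrative.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


@dataclass
class Preliminary:
    source: str  # assumed label of the analysis, e.g. "face", "hand", "change"
    box: Box


@dataclass
class RequiredRegion:
    start_frame: int  # time information: first frame of the run
    end_frame: int    # time information: last frame of the run
    box: Box          # coordinate information: overlapped boxes


def union(boxes: List[Box]) -> Box:
    """Assumed meaning of 'overlapping': the smallest box covering all boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def final_analysis(per_frame: List[Optional[Preliminary]], min_run: int = 2) -> List[RequiredRegion]:
    """Group consecutive frames with the same source into required regions."""
    regions = []
    index = 0
    for source, group in groupby(per_frame, key=lambda p: p.source if p else None):
        run = list(group)
        start, index = index, index + len(run)
        if source is not None and len(run) >= min_run:
            regions.append(RequiredRegion(start, index - 1, union([p.box for p in run])))
    return regions


frames = [
    Preliminary("face", (10, 10, 50, 50)),
    Preliminary("face", (12, 10, 52, 50)),
    None,                                   # no preliminary information
    Preliminary("hand", (0, 0, 5, 5)),      # single frame: too short
    Preliminary("change", (1, 1, 9, 9)),
    Preliminary("change", (2, 2, 8, 12)),
]
for region in final_analysis(frames):
    print(region)
```

With the sample data, the two face frames and the two change frames each form a region, while the isolated hand frame is dropped because it does not span two consecutive frames.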

According to claim 15,
in the final analysis step (S600),
when, using the preliminary information set for each frame and following the order of the frames constituting the video data, the preliminary information is set through the same analysis result across the frames corresponding to a predetermined time,
time information of the required region of interest information is generated by using the first of those frames as a start frame and the last of those frames as an end frame, and
coordinate information of the required region of interest information is generated by overlapping the preliminary information of those frames, from the first frame to the last frame.
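
One way to apply a time-based condition to per-frame results is to convert the predetermined time into a frame count using the frame rate of the video. The sketch below shows that conversion and the reverse mapping from a frame span to seconds; the frame rate parameter `fps` and both function names are assumptions, since the claim does not say how the predetermined time is mapped onto frames.

```python
import math
from typing import Tuple


def frames_for_duration(seconds: float, fps: float) -> int:
    """Number of consecutive frames needed to cover the predetermined time."""
    # The small epsilon guards against floating-point noise such as
    # 0.1 * 30 evaluating to slightly more than 3.
    return max(1, math.ceil(seconds * fps - 1e-9))


def frame_span_to_seconds(start_frame: int, end_frame: int, fps: float) -> Tuple[float, float]:
    """Time information in seconds for a start frame and an end frame (inclusive)."""
    return (start_frame / fps, (end_frame + 1) / fps)


print(frames_for_duration(1.5, 30))        # 45
print(frame_span_to_seconds(30, 74, 30))   # (1.0, 2.5)
```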

According to claim 15,
in the final analysis step (S600),
when, using the preliminary information set for each frame and following the order of the frames constituting the video data, the preliminary information is set through the same analysis result in a predetermined number of frames or more,
time information of the required region of interest information is generated by using the first of those frames as a start frame and the last of those frames as an end frame, and
coordinate information of the required region of interest information is generated by overlapping the preliminary information of those frames, from the first frame to the last frame.