KR20230067445A

KR20230067445A - Method for analyzing user shot video, apparatus and program thereof

Info

Publication number: KR20230067445A
Application number: KR1020220019222A
Authority: KR
Inventors: 이상현; 양성훈; 오승진
Original assignee: 이상현; 양성훈; 오승진
Priority date: 2021-11-08
Filing date: 2022-02-15
Publication date: 2023-05-16

Abstract

Disclosed is a device for analyzing user-captured video based on artificial intelligence. The device includes a communication unit, an input unit, a display, one or more processors, and a memory that is electrically connected to the processor and stores at least one code executed by the processor. The device makes it possible to effectively recognize users' actions and facial expressions in the video.

Description

Method for analyzing user-photographed video based on artificial intelligence, device and program therefor

The present disclosure relates to a method and apparatus for analyzing a user-captured video based on artificial intelligence, and more particularly, to a method and apparatus for analyzing a video based on users' behavior and facial expressions.

Existing open-source AI technologies for computer vision broadly include object tracking, action recognition, and facial expression recognition. With these technologies, action recognition results and facial expression recognition results can be obtained for each person object appearing in a video, per fixed unit (k frames).

Such results show what action a specific object performed and what expression it had, but this is information a person could obtain simply by watching the video, and it is of no help to people who need to analyze the video in a specialized way (such as counselors).

In addition, with existing technology, when an object leaves the frame and later reappears in the same video, it is not recognized as the same object. Its subsequent actions are therefore interpreted as having been performed by a different object, which hinders video analysis.

What is needed is a way to link object tracking, action recognition, and facial expression recognition, to use re-identification (Re-ID) to track and collect all the actions performed and expressions shown by each object within the same video, and to present the results visually as a time series.
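The re-identification idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that each track fragment produced by a tracker comes with an appearance embedding and that fragments are merged when their cosine similarity exceeds a threshold. The function names, the embedding values, and the 0.9 threshold are hypothetical and are not taken from the disclosure.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_identities(
    track_embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> Dict[str, int]:
    """Map each track fragment to a persistent identity number.

    A fragment joins the first known identity whose reference embedding is
    similar enough; otherwise it starts a new identity. The threshold is an
    assumed value, not one given in the source.
    """
    references: List[Sequence[float]] = []
    identity_of: Dict[str, int] = {}
    for track_id, embedding in track_embeddings.items():
        for identity, reference in enumerate(references):
            if cosine(embedding, reference) >= threshold:
                identity_of[track_id] = identity
                break
        else:
            references.append(embedding)
            identity_of[track_id] = len(references) - 1
    return identity_of


# Hypothetical fragments: track_c is the person from track_a re-entering the frame.
tracks = {
    "track_a": [1.0, 0.0, 0.1],
    "track_b": [0.0, 1.0, 0.0],
    "track_c": [0.98, 0.05, 0.12],
}
identities = assign_identities(tracks)
```

With these made-up embeddings, track_a and track_c receive the same identity, so actions recognized after the re-entry would be attributed to the same object.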

Registered Patent Publication No. 10-2157313, 2020.09.11.

An object of the embodiments disclosed herein is to provide a data processing method and apparatus using computer vision AI technology related to recognizing objects in a video.

An object of another embodiment disclosed herein is to provide a method of determining specific user behaviors and facial expressions in advance and selectively extracting them from a video.

An object of another embodiment disclosed herein is to generate data related to user interest keywords that can be derived from a user's specific behaviors and facial expressions.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the description below.

To achieve the above technical object, a method of analyzing a user-captured video based on artificial intelligence according to an aspect of the present disclosure may include: setting the labels to be output by a recognition model that, when a user-captured video is input, recognizes the behavior and facial expressions of users in the video; setting user interest keywords that can be derived from at least one of a user's behavior and facial expressions; specifying a user when a user-captured video to be analyzed is input; recognizing, through the recognition model and based on the labels, the behavior and facial expressions of the specified user; generating data related to the user interest keywords based on the recognized behavior and facial expressions; and displaying the generated data.
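The flow of these steps can be sketched as follows. This is only an illustration under stated assumptions: the recognition model is replaced by a list of already recognized (time, user, label) records, the link between labels and interest keywords is a hand-written mapping, and every name (Recognition, analyze, render, the label and keyword strings) is hypothetical rather than part of the disclosure.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple, Set


class Recognition(NamedTuple):
    """One recognized behavior or facial expression (stand-in for model output)."""
    time_s: float
    user_id: int
    label: str


def analyze(
    recognitions: Iterable[Recognition],
    labels: Set[str],
    keyword_labels: Dict[str, Set[str]],
) -> Dict[int, Counter]:
    """Count, per user, how often each interest keyword is supported by a selected label."""
    per_user: Dict[int, Counter] = {}
    for rec in recognitions:
        if rec.label not in labels:  # keep only labels chosen in the label-setting step
            continue
        for keyword, supporting in keyword_labels.items():
            if rec.label in supporting:
                per_user.setdefault(rec.user_id, Counter())[keyword] += 1
    return per_user


def render(per_user: Dict[int, Counter]) -> List[str]:
    """Produce one display line per user (stand-in for the display step)."""
    return [
        f"user {uid}: " + ", ".join(f"{k}={n}" for k, n in sorted(counts.items()))
        for uid, counts in sorted(per_user.items())
    ]


# Hypothetical configuration and recognition results.
labels = {"hit", "cry", "smile"}
keyword_labels = {"conflict": {"hit", "cry"}, "positive": {"smile"}}
recognitions = [
    Recognition(1.0, 1, "hit"),
    Recognition(1.5, 2, "cry"),
    Recognition(2.0, 2, "walk"),  # not a selected label, so it is ignored
    Recognition(3.0, 1, "smile"),
]
report = render(analyze(recognitions, labels, keyword_labels))
```

Here the keyword data is a simple count per user; the disclosure leaves the form of the generated data open, so a count is just one possible choice.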

Setting the user interest keywords may include inputting the set labels into a pre-trained interest keyword recommendation model, which outputs the user interest keywords.

Specifying the user may include mapping unique identification information to each user recognized in the input user-captured video, and recognizing through the recognition model may include collecting each user's behaviors and facial expressions in chronological order based on the mapped unique identification information.
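The chronological collection keyed by unique identification information can be sketched as below. The event tuple layout and the function name collect_timelines are assumptions made for this example; the disclosure does not prescribe a data structure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (time in seconds, unique user ID, kind of result, recognized label)
Event = Tuple[float, int, str, str]


def collect_timelines(events: Iterable[Event]) -> Dict[int, List[Tuple[float, str, str]]]:
    """Group recognition events by user ID and order each group by time."""
    timelines: Dict[int, List[Tuple[float, str, str]]] = defaultdict(list)
    # Sorting by timestamp first means every per-user list ends up chronological.
    for time_s, user_id, kind, label in sorted(events, key=lambda e: e[0]):
        timelines[user_id].append((time_s, kind, label))
    return dict(timelines)


# Hypothetical, unordered recognition results for two users.
events = [
    (4.0, 2, "expression", "crying"),
    (1.0, 1, "action", "hitting"),
    (1.0, 2, "action", "being hit"),
    (2.5, 1, "expression", "angry"),
]
timelines = collect_timelines(events)
```

Each user's list can then be handed to a later step that generates keyword-related data or renders a per-user timeline.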

Generating the data may include generating data related to the user interest keywords based on the recognized behaviors and facial expressions of each user collected in chronological order.

Generating the data may include generating the data related to the user interest keywords using a pre-trained data generation model.

The displaying may include presenting, in chronological order, the recognized behaviors and facial expressions collected for each user.

The displaying may include presenting the data related to the interest keywords in chronological order for each user.

The displaying may include comparing and displaying users' behaviors or facial expressions from which a situation or relationship arising between the users can be predicted.

A device for analyzing a user-captured video based on artificial intelligence according to an aspect of the present disclosure may include a communication unit, an input unit, a display, one or more processors, and a memory that is electrically connected to the processor and stores at least one code executed by the processor.

The memory may store code that, when executed by the processor, causes the processor to: upon obtaining a user-captured video through the communication unit or the input unit, set the labels to be output by a recognition model that recognizes the behavior and facial expressions of users in the video; set user interest keywords that can be derived from at least one of a user's behavior and facial expressions; specify a user when a user-captured video to be analyzed is input; recognize, through the recognition model and based on the labels, the behavior and facial expressions of the specified user; generate data related to the user interest keywords based on the recognized behavior and facial expressions; and display the generated data on the display.

In addition, a computer program stored in a computer-readable recording medium and executed to implement the present disclosure may be further provided.

In addition, a computer-readable recording medium recording a computer program for executing a method for implementing the present disclosure may be further provided.

A data processing method using computer vision AI technology related to recognizing objects in a video, according to an aspect of the present invention for solving the above problems, includes: setting behavior labels and facial expression labels according to user input; configuring unique processing logic according to the user input; acquiring a video for analysis; tracking objects in the video and assigning each a unique ID; recognizing the behavior and facial expressions of the objects; linking the recognized results to the unique IDs; processing data based on the linked results and the unique processing logic; and visualizing the processed results.
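The linking and processing steps can be sketched as follows, assuming for illustration that behavior and facial expression results are both keyed by (unique ID, unit index) and that the user-configured processing logic is expressed as a predicate over one behavior label and one expression label. The names link_results and apply_logic and all label strings are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, int]  # (unique object ID, index of the k-frame unit)


def link_results(
    actions: Dict[Key, str], expressions: Dict[Key, str]
) -> Dict[int, List[Tuple[int, str, str]]]:
    """Join behavior and expression results that share the same ID and unit."""
    linked: Dict[int, List[Tuple[int, str, str]]] = {}
    for object_id, unit in sorted(actions.keys() & expressions.keys()):
        row = (unit, actions[(object_id, unit)], expressions[(object_id, unit)])
        linked.setdefault(object_id, []).append(row)
    return linked


def apply_logic(
    linked: Dict[int, List[Tuple[int, str, str]]],
    logic: Callable[[str, str], bool],
) -> Dict[int, List[int]]:
    """Return, per object ID, the units for which the user-defined logic holds."""
    return {
        object_id: [unit for unit, action, expression in rows if logic(action, expression)]
        for object_id, rows in linked.items()
    }


# Hypothetical recognition results for two tracked objects over two units.
actions = {(1, 0): "playing", (1, 1): "sitting", (2, 0): "sitting", (2, 1): "sitting"}
expressions = {(1, 0): "happy", (1, 1): "neutral", (2, 0): "sad", (2, 1): "sad"}
linked = link_results(actions, expressions)

# Example of user-configured logic: flag units where an object sits with a sad expression.
flagged = apply_logic(linked, lambda action, expression: action == "sitting" and expression == "sad")
```

Passing the logic in as a function keeps the recognition results unchanged and lets different analysts apply different rules to the same linked data.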

In addition, the recognizing may include recognizing changes over time in the behavior and facial expressions of each object in the video.

According to the above-described means of solving the problems, a user's specific behaviors and facial expressions can be determined in advance and extracted, and data related to user interest keywords derivable from those behaviors and facial expressions can be generated, thereby improving user convenience.

In addition, it is possible to analyze what actions an object performed and what facial expressions (emotions) it showed over time within the video. Compared with simply presenting per-frame results, this makes it easier to grasp the nature of the video and to understand and analyze the objects in it.

In addition, because the information needed for video analysis can be processed to suit one's own situation, summarizing the processed results and checking them visually yields the same effect as analyzing the video directly, without reviewing the entire video. This can greatly reduce the time required for video analysis and increase work efficiency.

In addition, the person performing the video analysis can select among the result labels output by the AI and combine specific results to set their temporal and causal order. In other words, rather than simply using the information the AI produces, a person can directly configure and combine the AI analysis logic to actively derive information suited to his or her situation.
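One way to picture a user-defined temporal order is an ordered label pattern with a maximum time gap between consecutive steps. The sketch below is an assumption about how such a rule might look; the function find_sequences, the max_gap_s parameter, and the example labels are hypothetical and do not come from the disclosure.

```python
from typing import Iterable, List, Sequence, Tuple


def find_sequences(
    events: Iterable[Tuple[float, str]],
    pattern: Sequence[str],
    max_gap_s: float,
) -> List[Tuple[float, ...]]:
    """Find occurrences of `pattern` as an ordered sequence of labels.

    Each step must be the next occurrence of its label and must follow the
    previous matched step by at most `max_gap_s` seconds. Returns the
    timestamps of every matched sequence. `pattern` must not be empty.
    """
    ordered = sorted(events)
    matches: List[Tuple[float, ...]] = []
    for i, (start_time, label) in enumerate(ordered):
        if label != pattern[0]:
            continue
        times = [start_time]
        j = i + 1
        for wanted in pattern[1:]:
            while j < len(ordered) and ordered[j][1] != wanted:
                j += 1
            if j == len(ordered) or ordered[j][0] - times[-1] > max_gap_s:
                break  # the next step is missing or comes too late
            times.append(ordered[j][0])
            j += 1
        else:
            matches.append(tuple(times))
    return matches


# Hypothetical merged timeline of recognized labels (time in seconds, label).
timeline = [(1.0, "hit"), (2.5, "cry"), (10.0, "hit"), (30.0, "cry")]
matches = find_sequences(timeline, pattern=["hit", "cry"], max_gap_s=5.0)
```

In this example only the first "hit" is followed by "cry" within five seconds, so a single sequence is reported; a temporal match of this kind does not by itself establish causation, which remains the analyst's judgment.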

The effects of the present disclosure are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the description below.

FIG. 1 is a diagram schematically illustrating a device for analyzing a user-captured video based on artificial intelligence according to an embodiment of the present invention;
FIG. 2 is a block diagram showing the configuration of a device for analyzing a user-captured video according to an embodiment of the present invention;
FIG. 3 is a sequence diagram illustrating a method of analyzing a user-captured video based on artificial intelligence according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating the process of a recognition model according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating the process of a keyword derivation model and the process of a data generation model according to an embodiment of the present invention; and
FIG. 6 is a diagram illustrating an analysis device that maps and outputs a unique identification number for each user according to an embodiment of the present invention.

Advantages and features of the present invention, and methods of achieving them, will become clear with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure of the present invention complete and to fully inform those skilled in the art to which the present invention belongs of the scope of the invention, which is defined only by the scope of the claims.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, singular forms also include plural forms unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more elements other than those recited. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the recited elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are of course not limited by these terms. These terms are used only to distinguish one element from another. Accordingly, a first element mentioned below may also be a second element within the technical spirit of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those skilled in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless explicitly and specifically defined.

The term "unit" or "module" used in the specification means software or a hardware component such as an FPGA or ASIC, and a "unit" or "module" performs certain roles. However, "unit" or "module" is not limited to software or hardware. A "unit" or "module" may be configured to reside in an addressable storage medium and may be configured to execute on one or more processors. Thus, as an example, a "unit" or "module" includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and "units" or "modules" may be combined into a smaller number of components and "units" or "modules", or further separated into additional components and "units" or "modules".

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram schematically illustrating a device 100 (hereinafter, "analysis device") for analyzing a user-captured video based on artificial intelligence according to an embodiment of the present invention.

The analysis device 100 may analyze user-captured videos recorded in various places using various artificial intelligence algorithms. For example, the analysis device 100 may analyze a captured video 10 recorded at a predetermined place PL (for example, a daycare center). The analysis device 100 may specify users U1 and U2 in the captured video, recognize the users' behavior and/or facial expressions, and estimate the users' emotional states based on that behavior and/or those facial expressions.

As shown in FIG. 1, the analysis device 100 may recognize the behaviors (hitting, being hit) of the specified first user U1 and second user U2, may recognize that the second user U2's face is contorted and in tears, and may further estimate the emotional state of each user U1, U2 based on the behaviors and facial expressions.

Here, the camera C1 may record the area of the predetermined place PL (for example, a daycare center) where recording is permitted, and may be included as a component of the analysis device 100, although embodiments are not limited thereto.

As described above, the analysis device 100 may specify a user in a user-captured video, recognize the behavior and/or facial expressions of the specified user, and estimate the user's emotional state.

The analysis device 100 described above may be any of various devices capable of performing computation and providing results to a user. For example, the analysis device 100 may be implemented to include all of a computer, a server device, and a portable terminal, or may be implemented as any one of them.

Here, the computer may include, for example, a notebook, desktop, laptop, tablet PC, or slate PC equipped with a web browser.

The server device is a server that processes information by communicating with external devices, and may include an application server, a computing server, a database server, a file server, a game server, a mail server, a proxy server, a web server, and the like.

The portable terminal is, for example, a wireless communication device that ensures portability and mobility. It may include all kinds of handheld wireless communication devices, such as PCS (Personal Communication System), GSM (Global System for Mobile communications), PDC (Personal Digital Cellular), PHS (Personal Handyphone System), PDA (Personal Digital Assistant), IMT (International Mobile Telecommunication)-2000, CDMA (Code Division Multiple Access)-2000, W-CDMA (W-Code Division Multiple Access), and WiBro (Wireless Broadband Internet) terminals and smartphones, as well as wearable devices such as watches, rings, bracelets, anklets, necklaces, glasses, contact lenses, and head-mounted devices (HMDs).

FIG. 2 is a block diagram showing the configuration of the analysis device 100 according to an embodiment of the present invention.

The analysis device 100 may include a communication unit 110, an input unit 120, a display 130, a memory 150, and a controller 190. The components shown in FIG. 2 are not essential to implementing the analysis device 100 according to the present disclosure, so the analysis device 100 described in this specification may have more or fewer components than those listed above.

The communication unit 110 may include one or more components that enable communication with external devices, for example, at least one of a broadcast reception module, a wired communication module, a wireless communication module, a short-range communication module, and a location information module.

The input unit 120 is for inputting video information (or signals), audio information (or signals), data, or information entered by a user, and may include at least one of at least one camera, at least one microphone, and a user input unit. Voice data or image data collected by the input unit 120 may be analyzed and processed as a control command from the user.

The camera processes image frames, such as still images or moving images, obtained by an image sensor in a capture mode. The processed image frames may be displayed on the display unit (or the screen of the analysis device of the present disclosure) or stored in the memory.

When there are a plurality of cameras, they may be arranged in a matrix structure, and a plurality of pieces of image information having various angles or focal points may be input through the cameras forming the matrix structure. The cameras may also be arranged in a stereo structure to obtain left and right images for producing a three-dimensional stereoscopic image.

The microphone processes external acoustic signals into electrical voice data. The processed voice data may be used in various ways according to the function being performed (or the application program being executed) on the device. Various noise removal algorithms may be implemented in the microphone to remove noise generated while receiving external acoustic signals.

The user input unit is for receiving information from a user. When information is input through the user input unit, the controller may control the operation of the device to correspond to the input information. The user input unit may include hardware physical keys (for example, buttons located on at least one of the front, rear, and side surfaces of the device, a dome switch, a jog wheel, a jog switch, and the like) and software touch keys. As an example, a touch key may be a virtual key, soft key, or visual key displayed on a touchscreen-type display unit through software processing, or a touch key disposed on a part other than the touchscreen. The virtual key or visual key may be displayed on the touchscreen in various forms, for example, as a graphic, text, an icon, a video, or a combination of these.

The display unit 130 displays (outputs) information processed by the device. For example, the display unit may display execution screen information of an application program running on the device, or UI (User Interface) and GUI (Graphic User Interface) information according to that execution screen information.

The memory 150 may store data supporting the various functions of the device and programs for the operation of the controller, may store input/output data (for example, music files, still images, and videos), and may store a plurality of application programs (applications) running on the device as well as data and instructions for the operation of the device. At least some of these application programs may be downloaded from an external server through wireless communication.

The memory 150 may include an artificial-intelligence-based recognition model M1, a keyword derivation model M2, a data generation model M3, and the like.

The memory 150 may include at least one type of storage medium among a flash memory type, a hard disk type, an SSD (Solid State Disk) type, an SDD (Silicon Disk Drive) type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk. In addition, the memory 150 may be a database that is separate from the device but connected to it by wire or wirelessly.

The controller 190 may be implemented with the memory 150, which stores data on an algorithm for controlling the operation of the components in the device or on a program implementing the algorithm, and at least one processor (not shown) that performs the above-described operations using the data stored in the memory 150. Even where the processor is described in the singular, it may be implemented as a plurality of processors. The memory 150 may be electrically connected to the processor and may store at least one code executed by the processor. The memory and the processor may be implemented as separate chips, or as a single chip.

또한, 제어부(190)는 이하의 도 3 내지 도 6에서 설명되는 본 개시에 따른 다양한 실시 예들을 본 장치 상에서 구현하기 위하여, 위에서 살펴본 구성요소들을 중 어느 하나 또는 복수를 조합하여 제어할 수 있다.In addition, the control unit 190 may control any one or a combination of the components described above in order to implement various embodiments according to the present disclosure described in FIGS. 3 to 6 below on the present device.

FIG. 3 is a sequence diagram illustrating a method of analyzing a user-captured image based on artificial intelligence according to an embodiment of the present invention. FIG. 4 is a diagram for explaining the process of the recognition model M1 according to an embodiment of the present invention, FIG. 5 is a diagram for explaining the process of the keyword derivation model M2 and the process of the data generation model M3 according to an embodiment of the present invention, and FIG. 6 is a diagram for explaining the analysis device 100, which maps and outputs a unique identification number for each user, according to an embodiment of the present invention. FIG. 3 is described below with reference to FIGS. 4 to 6.

First, the control unit 190 may set the labels to be output by the recognition model M1, which recognizes the behavior and facial expression of a user in a user-captured image when such an image is input (S310).

Here, a label is set so that only the desired results are output from among the result values produced by the artificial intelligence pipeline. Labels may be values actually output by the recognition model M1 and intermediate output values of the network, but embodiments are not limited thereto.

The recognition model M1 may have labels set so that it recognizes specific behaviors among the user's behaviors (for example, sitting, walking, running, standing up, lying down, a specific movement of a hand or foot, lowering the head, or changing direction), and so that it recognizes specific facial expressions among the user's expressions (for example, various facial expressions including smiling, crying, a cute expression, shedding tears, and blinking). The specific behaviors and specific expressions may vary by embodiment. In addition, the recognition model M1 may estimate the user's emotional state based on the recognized behavior and/or facial expression.

Referring to FIG. 4, the recognition model M1 may receive a user-captured image (410), recognize the user's behavior (420), recognize the user's facial expression (430), and estimate an emotion based on these (430). Before recognizing the behavior and/or facial expression, however, the recognition model M1 may first identify and recognize the user.

In an optional embodiment, the recognition model M1 may include a plurality of neural-network-based networks: a first network for recognizing the user's behavior, a second network for recognizing the user's facial expression, and a third network for estimating the user's emotional state based on information collected from the first network and the second network. All of these networks may be trained by supervised learning and may be trained repeatedly against ground truth until a predetermined goal is reached.

도 3의 S310 단계 이후, 제어부(190)는 사용자의 행동 및 표정 중 적어도 하나에 의해 도출 가능한 사용자 관심 키워드를 설정할 수 있다(S320).After step S310 of FIG. 3 , the controller 190 may set a keyword of user interest that can be derived by at least one of the user's behavior and expression (S320).

The user interest keyword may correspond to user-specific processing logic that can be derived from at least one of the user's behavior and facial expression. For example, when calmness (a specific expression or emotion) and sitting (a specific behavior) are repeated according to a predetermined sequence, the user interest keyword may be set to "focus". Embodiments are not limited to this example.

아울러, 사용자 관심 키워드는 분석 장치(100)의 키워드 도출 모델(M2)을 이용해서도 설정될 수 있으며, 도 5를 참고하여 설명하기로 한다.In addition, the keyword of interest to the user may be set using the keyword derivation model M2 of the analysis device 100, and will be described with reference to FIG. 5.

키워드 도출 모델(M2)은 인식된 행동(510) 및 인식된 표정(520)을 입력받아 사용자 관심 키워드를 도출할 수 있다. 키워드 도출 모델(M2)은 특정 행동의 반복, 특정 표정의 반복, 특정 행동 및/또는 특정 표정의 반복 시퀀스 등에 기초하여 사용자 관심 키워드를 설정할 수 있다.The keyword derivation model M2 may derive a user interest keyword by receiving the recognized behavior 510 and the recognized facial expression 520 . The keyword derivation model M2 may set a keyword of interest to the user based on repetition of a specific action, repetition of a specific facial expression, repetition sequence of a specific action and/or specific facial expression, and the like.

상기 사용자 관심 키워드는 어떤 행동에 어떤 표정이 이어지면 바로 도출될 수 있는 형태의 용어일 수 있으며, 사용자 별 고유 처리 논리에 해당될 수 있다.The user interest keyword may be a term that can be immediately derived when a certain action is followed by a certain facial expression, and may correspond to a unique processing logic for each user.

즉, 키워드 도출 모델(M2)은 미리 학습될 수 있으며, 설정된 라벨이 입력되면 사용자 관심 키워드를 출력할 수 있다.That is, the keyword derivation model M2 may be trained in advance, and may output a user interest keyword when a set label is input.

도 3의 S320 단계 이후, 제어부(190)는 분석 대상인 사용자 촬영 영상이 입력되면 사용자를 특정할 수 있다(S330).After step S320 of FIG. 3 , the controller 190 may specify a user when a user photographed image as an analysis target is input (S330).

이때, 제어부(190)는 상기 입력된 사용자 촬영 영상에서 인식된 사용자 별로 고유 식별 정보를 매핑할 수 있는데, 도 6을 참고하여 설명하기로 한다.At this time, the controller 190 may map unique identification information for each user recognized from the input user-captured image, which will be described with reference to FIG. 6 .

도 6에 도시된 바와 같이, 제어부(190)는 사용자 촬영 영상에서 사용자들(U5~U7)에 대응하는 고유 식별 번호(ID5~ID7)을 각각 매핑하여 출력할 수 있다.As shown in FIG. 6 , the controller 190 may map and output unique identification numbers ID5 to ID7 corresponding to the users U5 to U7 in the user-captured image.

The control unit 190 may also recognize the case in which a user leaves the frame of the captured image and later re-enters it, and may identify the user even if the user's appearance changes within a predetermined range.

도 3의 S330 단계 이후, 제어부(190)는 상술한 라벨에 기초하여, 특정된 사용자의 행동 및 표정을 인식 모델(M1)을 통해 인식할 수 있다(S340).After step S330 of FIG. 3 , the controller 190 may recognize the behavior and expression of the specified user through the recognition model M1 based on the above-described label (S340).

인식 모델(M1)은 매핑된 고유 식별 정보에 기초하여, 사용자 각각의 행동 및 표정을 시간순으로 수집할 수 있으며, 해당 영상을 시간순으로 저장할 수 있다.The recognition model M1 may collect each user's behavior and expression in chronological order based on the mapped unique identification information, and may store corresponding images in chronological order.
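
The chronological collection per unique ID described above can be illustrated with a minimal sketch. The `Observation` record, its field names, and the `collect_by_user` helper are hypothetical names introduced here for illustration only; the disclosure does not specify a data layout.

```python
from collections import defaultdict
from typing import NamedTuple


class Observation(NamedTuple):
    """One recognition result (hypothetical layout, not from the disclosure)."""
    time_s: float     # timestamp of the analysed unit, in seconds
    user_id: str      # unique identification information mapped to the user
    action: str       # recognized behavior label
    expression: str   # recognized facial expression label


def collect_by_user(observations):
    """Group observations by user ID and sort each group by time."""
    timeline = defaultdict(list)
    for obs in observations:
        timeline[obs.user_id].append(obs)
    for events in timeline.values():
        events.sort(key=lambda o: o.time_s)
    return dict(timeline)


sample = [
    Observation(2.0, "ID5", "walk", "calm"),
    Observation(1.0, "ID6", "sit", "happy"),
    Observation(1.0, "ID5", "sit", "calm"),
]
timelines = collect_by_user(sample)
```

Keying the timeline on the mapped ID keeps each user's sequence separate, which is what the later per-user processing logic would need as input.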

도 3의 S340 단계 이후에, 제어부(190)는 인식된 사용자의 행동 및 표정에 기초하여 사용자 관심 키워드와 관련된 데이터를 생성할 수 있다(S350).After step S340 of FIG. 3 , the controller 190 may generate data related to a keyword of interest to the user based on the recognized behavior and expression of the user (S350).

제어부(190)는 시간순으로 수집된 사용자 별 인식된 행동 및 표정에 기초하여, 사용자 관심 키워드와 관련된 데이터를 생성할 수 있다. 제어부(190)는 프레임 혹은 정해진 단위 별로 사용자가 설정한 고유 처리 논리에 따라 새로운 결과를 산출하여 메모리(150)에 저장할 수 있다.The controller 190 may generate data related to a keyword of interest to the user based on the recognized behavior and expression of each user collected in chronological order. The controller 190 may calculate and store a new result in the memory 150 according to the unique processing logic set by the user for each frame or predetermined unit.

또한, 제어부(190)는 미리 학습된 데이터 생성 모델(M3)을 이용하여 상기 사용자 관심 키워드와 관련된 데이터를 생성할 수 있으며, 도 5를 참고하여 설명하기로 한다.In addition, the controller 190 may generate data related to the keyword of interest to the user by using a pre-learned data generation model M3, which will be described with reference to FIG. 5 .

도 5를 참고하면, 데이터 생성 모델(M3)은 사용자 관심 키워드를 입력받아 데이터를 생성(530)할 수 있다.Referring to FIG. 5 , the data generation model M3 may receive a user interest keyword and generate data (530).

도 3의 S350 단계 이후, 제어부(190)는 디스플레이(130)를 통해 생성된 데이터를 디스플레이할 수 있다.After step S350 of FIG. 3 , the controller 190 may display the generated data through the display 130 .

제어부(190)는 시간순으로 수집된 사용자 별 인식된 행동 및 표정을 시간 순으로 제공할 수 있다.The controller 190 may provide the recognized behaviors and facial expressions of each user collected in chronological order in chronological order.

제어부(190)는 관심 키워드와 관련된 데이터를 사용자 별로 시간순으로 제공할 수 있다.The controller 190 may provide data related to the keyword of interest for each user in chronological order.

The control unit 190 may compare users' behaviors or facial expressions from which a situation or relationship arising between the users can be predicted, and output the comparison on the display 130.

Meanwhile, the analysis device 100 described above relates to an AI-assisted video analysis system that increases the efficiency of video analysis. In this system, AI analyzes a video in which people appear, aggregates the behaviors and facial expressions (emotions) that a specific object performs over time in the video, and visually presents the result of analyzing them as a time series. Alternatively, the user selects the behaviors and facial expressions to be recognized and combines them to design the user's own processing logic; when the AI analyzes the video, it summarizes the results according to the designed processing logic and presents them visually.

사용자는 웹 UI 등을 이용하여 분석을 수행하기 전에 결과로 받아보길 원하는 조건을 미리 설정할 수 있다. The user may set conditions in advance to receive results before performing the analysis using a web UI or the like.

제어부(190)는 상기 사용자 입력에 따라 행동 라벨 및 표정 라벨을 설정할 수 있다.The controller 190 may set an action label and a facial expression label according to the user input.

여기서, 행동 라벨은 객체의 행동과 관련한 것으로, 앉기, 걷기, 뛰기, 일어서기, 눕기 등을 포함할 수 있다. 표정 라벨은 객체의 표정과 관련한 것으로, 침착함, 슬픔, 기쁨 등을 포함할 수 있다.Here, the action label relates to the action of the object and may include sitting, walking, running, standing up, lying down, and the like. The facial expression label relates to the facial expression of the object and may include calmness, sadness, joy, and the like.

The control unit 190 may be configured to output only the desired results from among the result values produced by the AI pipeline. Once configured, the analysis result may be output after filtering for only the values that were set within the AI pipeline.

예를 들어, 행동 라벨이 "걷기"로 미리 설정되고, 표정 라벨이 "슬픔"으로 미리 설정된 경우, 이후 AI 파이프라인이 출력하는 결과값은 걷기와 관련된 행동과 슬픔과 관련된 표정이 될 수 있다.For example, if the action label is preset to “walking” and the expression label is preset to “sadness”, the result value output by the AI pipeline thereafter may be a behavior related to walking and a facial expression related to sadness.
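
The label filtering in this example can be sketched as follows. This is only an illustration under assumptions: the per-frame result dictionaries, their keys, and the `filter_results` function are hypothetical, and the English label strings stand in for whatever values the pipeline actually emits.

```python
def filter_results(results, action_labels, expression_labels):
    """Keep only the action/expression values that were configured in advance.

    A result is dropped when neither its action nor its expression matches;
    a non-matching field in a kept result is replaced with None.
    """
    filtered = []
    for r in results:
        action = r["action"] if r["action"] in action_labels else None
        expression = r["expression"] if r["expression"] in expression_labels else None
        if action is not None or expression is not None:
            filtered.append(
                {"frame": r["frame"], "id": r["id"],
                 "action": action, "expression": expression}
            )
    return filtered


# Hypothetical raw pipeline output for two tracked objects.
results = [
    {"frame": 0, "id": 5, "action": "walk", "expression": "sad"},
    {"frame": 0, "id": 6, "action": "sit", "expression": "happy"},
    {"frame": 1, "id": 5, "action": "run", "expression": "sad"},
]
kept = filter_results(results, action_labels={"walk"}, expression_labels={"sad"})
```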

제어부(190)는 상기 사용자 입력에 따라 고유 처리 논리를 구성할 수 있다.The control unit 190 may configure a unique processing logic according to the user input.

The user may configure settings to obtain the results the user wants to analyze. For example, the user may combine labels into the user's own unique processing logic so that, if the expression label "calm" and the action label "sitting" persist for a certain number of frames or more, the processed result is treated as "focus".
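
One way such a rule could be evaluated is sketched below. The function name `detect_sustained`, the per-frame `(expression, action)` tuples, and the `min_frames` threshold are assumptions made for illustration; the disclosure does not fix a threshold or a representation.

```python
def detect_sustained(frames, expression, action, min_frames):
    """Return inclusive (start, end) frame ranges in which both labels hold
    for at least `min_frames` consecutive frames."""
    segments = []
    start = None
    for i, (expr, act) in enumerate(frames):
        match = expr == expression and act == action
        if match and start is None:
            start = i                      # a candidate run begins
        elif not match and start is not None:
            if i - start >= min_frames:    # run was long enough
                segments.append((start, i - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(frames) - start >= min_frames:
        segments.append((start, len(frames) - 1))
    return segments


# Hypothetical per-frame labels for one object ID.
frames = [("calm", "sit")] * 3 + [("happy", "sit")] + [("calm", "sit")] * 2
focus = detect_sustained(frames, "calm", "sit", min_frames=3)
```

Here only the first run of three frames qualifies as "focus"; the trailing run of two frames is below the assumed threshold.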

제어부(190)는 분석을 위한 영상을 획득할 수 있다.The controller 190 may obtain an image for analysis.

실시예에 따라, 사용자가 자신의 단말을 이용하여 분석하고자 하는 영상을 웹 UI 등을 통해 업로드할 수 있다. 제어부(190)는 웹 UI 등을 통해 업로드된 영상을 획득할 수 있다.Depending on the embodiment, a user may upload an image to be analyzed using a user's terminal through a web UI or the like. The controller 190 may acquire the uploaded image through a web UI or the like.

제어부(190)는 상기 영상 내의 객체를 추적하여 고유 ID를 부여할 수 있다.The controller 190 may assign a unique ID by tracking the object in the image.

제어부(190)는 AI를 이용하여 영상 내에 등장하는 사람 객체들 각각을 추적한다. AI가 영상 내에서 등장하는 각 객체에 대해 고유한 ID를 부여함으로써, 각 프레임마다 각 ID의 객체 위치를 알 수 있고 단위 시간 별 행동 인식과 표정 인식 결과를 추출할 수 있다. 이러한 결과는 메모리에 저장될 수 있다.The controller 190 tracks each of the human objects appearing in the image using AI. By assigning a unique ID to each object appearing in the video, AI can know the object location of each ID for each frame and extract the action recognition and facial expression recognition results for each unit time. These results can be stored in memory.

When a rectangle enclosing an object (hereinafter, a bounding box) is drawn in a video frame, the control unit 190 may hold the pixel positions of the rectangle's top-left corner and bottom-right corner as a vector and associate the object ID with that vector, thereby determining the object's position.
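
A minimal sketch of this representation follows. The `TrackedBox` class and its field order (left, top, right, bottom) are assumptions chosen for illustration; the text only states that the corner pixel positions are held as a vector associated with an object ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackedBox:
    """Bounding box of one tracked object in one frame (hypothetical layout)."""
    object_id: int
    left: int
    top: int
    right: int
    bottom: int

    def as_vector(self):
        """Corner pixel positions as a 4-element vector."""
        return (self.left, self.top, self.right, self.bottom)

    def center(self):
        """Center of the box in pixel coordinates."""
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


# Per-frame lookup from object ID to its box.
frame_boxes = {
    box.object_id: box
    for box in [TrackedBox(5, 10, 20, 110, 220), TrackedBox(6, 300, 40, 380, 200)]
}
```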

제어부(190)는 상기 객체의 행동 및 표정을 인식할 수 있다.The controller 190 may recognize the object's behavior and expression.

The control unit 190 may crop the object and its surroundings from the frame using the bounding box obtained from object tracking, extract feature points and the like from the cropped frame using a series of algorithms, and have an AI model that predicts behavior from them extract a behavior recognition result per unit for each object ID. These results may be stored in the memory.

Facial expression recognition proceeds in the order of face recognition and then expression recognition. The face and its surroundings are cropped from the frame using the bounding box of the recognized face, feature points and the like are extracted from the cropped frame using a series of algorithms, and an AI model that predicts facial expressions from them extracts the expression recognition result. These results may be stored in the memory.

영상 끝까지 행동 및 표정 인식을 수행한 결과를 기존 영상에 합성하여 분석된 영상을 시각적으로 확인할 수 있도록 하며, 또한 텍스트 형태의 로그 파일로 저장될 수 있다.The result of performing action and facial expression recognition until the end of the video is synthesized with the existing video so that the analyzed video can be visually checked, and can also be saved as a text log file.

제어부(190)는 상기 인식된 결과를 상기 고유 ID와 연동할 수 있다.The controller 190 may link the recognized result with the unique ID.

제어부(190)는 수행한 표정 인식 결과를 객체 추적 시 산출해낸 객체 별 ID와 연동하는 작업을 수행한다. 제어부(190)는 객체 추적 결과 추적중이던 객체의 경계박스와 표정 인식 결과 인식한 얼굴의 경계 박스가 겹치는지 확인하여 연동한다.The control unit 190 performs a task of interlocking the facial expression recognition result with the ID for each object calculated during object tracking. As a result of object tracking, the controller 190 checks whether the bounding box of the object being tracked overlaps with the bounding box of the recognized face as a result of facial expression recognition, and interlocks them.
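
The overlap check can be sketched as below. The text only says that the two bounding boxes are checked for overlap; choosing the tracked object with the largest overlap area when several overlap is an assumption added here, and the function names are hypothetical.

```python
def intersection_area(a, b):
    """Overlap area of two (left, top, right, bottom) boxes; 0 if disjoint."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, width) * max(0, height)


def link_face_to_object(face_box, object_boxes):
    """Return the ID of the tracked object whose box overlaps the face box
    the most, or None when no tracked box overlaps it."""
    best_id, best_area = None, 0
    for object_id, box in object_boxes.items():
        area = intersection_area(face_box, box)
        if area > best_area:
            best_id, best_area = object_id, area
    return best_id


# Hypothetical tracked boxes keyed by object ID.
objects = {5: (10, 20, 110, 220), 6: (300, 40, 380, 200)}
linked = link_face_to_object((40, 30, 80, 70), objects)
```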

제어부(190)는 상기 연동된 결과 및 상기 고유 처리 논리에 기초하여 데이터를 가공할 수 있다.The controller 190 may process data based on the linked result and the unique processing logic.

From the linked analysis results and the log file, the control unit 190 calculates a new result for each frame or predetermined unit according to the unique processing logic set by the user, and stores it in a file system or DB.

제어부(190)는 상기 가공된 결과를 시각화할 수 있다.The controller 190 may visualize the processed result.

시간에 따라 특정 객체가 행한 행동 및 표정 인식 결과 값들을 그래프 등으로 시각화하여 보여줄 수 있다.Actions performed by a specific object over time and facial expression recognition result values can be visualized and displayed in a graph or the like.

상술한 본 발명의 방법은 다양한 공간에서 적용될 수 있다.The method of the present invention described above can be applied in various spaces.

일 실시예에서, 본 발명은 어린이집, 키즈카페 등 모니터링이 가능한 실내 공간에 적용할 수 있다.In one embodiment, the present invention can be applied to an indoor space where monitoring is possible, such as a daycare center and a kids cafe.

어린이집 혹은 키즈카페 등에 의무적으로 설치된 CCTV를 통해 저장된 영상을 본 시스템에서 분석할 때, 활동하는 아이들 객체 각각에 대하여 행동 및 표정 인식을 수행할 수 있다. 인식한 결과를 바탕으로 제시한 실내 공간 등에서 발생하는 사건 사고를 검출해낼 수 있다. When this system analyzes images stored through CCTVs obligatory installed in daycare centers or kids cafes, it is possible to perform action and facial expression recognition for each active child object. Based on the recognized results, it is possible to detect incidents and accidents that occur in the proposed indoor space.

가령 인식한 행동 및 표정에서 '사람을 때리다(hit sb)'와 '슬픔(sad)'이 영상 내 특정 시간대에 동시에 인식된 경우 다툼이 있었다고 판단할 수 있다. 사용자는 이와 같은 논리로 행동과 표정을 조합하여 고유 처리 논리를 구성하면 긴 시간 녹화된 영상에서도 사람이 일일이 영상을 볼 필요 없이 분석 결과 시간대만을 참조하여 능률을 극대화할 수 있다.For example, when 'hit sb' and 'sad' are recognized at the same time in a specific time period in the recognized action and facial expression, it can be determined that there is a quarrel. If a user configures a unique processing logic by combining actions and facial expressions with such logic, it is possible to maximize efficiency by referring only to the time period of the analysis result without the need for a person to view the video individually even in a video recorded for a long time.
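
A minimal sketch of this kind of co-occurrence rule is given below, assuming that "the same time period" means a fixed-length window. The event tuples, the `window_s` parameter, and the function name are hypothetical; the disclosure does not define how the time period is delimited.

```python
def co_occurrence_windows(events, action, expression, window_s):
    """Return the start times (seconds) of fixed-length windows in which both
    the given action and the given expression were recognized."""
    action_windows = set()
    expression_windows = set()
    for time_s, kind, label in events:
        window = int(time_s // window_s)   # index of the window the event falls in
        if kind == "action" and label == action:
            action_windows.add(window)
        elif kind == "expression" and label == expression:
            expression_windows.add(window)
    return [w * window_s for w in sorted(action_windows & expression_windows)]


# Hypothetical recognition log: (time in seconds, kind, label).
events = [
    (3.0, "action", "hit sb"),
    (4.5, "expression", "sad"),
    (12.0, "expression", "sad"),
    (27.0, "action", "fall down"),
]
quarrels = co_occurrence_windows(events, "hit sb", "sad", window_s=10)
```

A reviewer would then only need to watch the returned windows rather than the whole recording.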

뿐만 아니라, '넘어짐(fall down)'과 '슬픔(sad)'의 조합으로 안전 사고를 검출해내는 데에도 응용할 수 있다.In addition, it can be applied to detect safety accidents with a combination of 'fall down' and 'sad'.

다른 실시예에서, 본 발명은 학습 공간에 적용할 수 있다.In another embodiment, the present invention can be applied to a learning space.

To identify, for example, how long a user concentrates in a learning space, unique processing logic combining behaviors and emotions, such as 'sit', 'read', and 'calm', or 'sit', 'write', and 'calm', may be configured, and the time spent concentrating can be determined from the detected results. Alternatively, detecting the combination of 'sit' and 'use phone' shows how often the user uses a mobile phone while seated in the learning space.

When the user configures specific unique processing logic in this way, more accurate analysis results can be obtained than with existing behavior-recognition AI analysis techniques. For example, when a person seated in a learning space is not concentrating on studying but is talking on the phone with an acquaintance or laughing, an existing behavior-recognition technique that detects only 'sit' does not analyze the situation properly.

또 다른 실시예에서, 본 발명은 상담 센터에 적용할 수 있다.In another embodiment, the present invention is applicable to a counseling center.

유아, 청소년, 성인 대상 상담 센터에서는 상담사가 정해진 주제의 영상을 분석하여 상담을 진행하는 경우가 있다. 유아 대상 상담의 경우 영상 내 아이와 부모가 서로 간 쳐다봄, 발화 및 응답 등 상호작용 빈도 등이 중요한 지표로써 작용한다. In counseling centers for infants, teenagers, and adults, there are cases in which a counselor conducts counseling by analyzing an image of a predetermined subject. In the case of counseling for infants, the frequency of interaction between the child and parent in the video, such as looking at each other, speaking and responding, etc., acts as an important index.

Therefore, efficiency can be maximized by detecting the combination of 'watch somebody' and 'talk', or the combination of 'listen' and 'happy', and either visualizing it and providing it to the counselee as information meaningful to the counseling, or having the counselor analyze the video around the time periods in which the results were detected.

또한, 제어부(190)는 다양한 분석 요인들(F)과 사용될 수 있고, 해당 요인들을 이용해 추가적인 기능(U)이 구현될 수 있다.In addition, the controller 190 may be used with various analysis factors (F), and an additional function (U) may be implemented using the corresponding factors.

여기서, 분석 요인(F)은 영상 내에서 등장하는 각 객체의 픽셀 좌표, 인식한 행동 및 표정에 대한 신뢰도(확률 값) 및 얼굴-표정 인식 시 추정 나이대(추정 최소 나이 - 추정 최대 나이 쌍)을 포함할 수 있으며, 이에 제한되는 것은 아니다.Here, the analysis factor (F) is the pixel coordinates of each object appearing in the image, the reliability (probability value) of the recognized behavior and facial expression, and the estimated age range (estimated minimum age - estimated maximum age pair) at face-expression recognition. It may include, but is not limited to.

실시예에 따라, 본 발명은 분석 요인(F) 중 영상 내에서 등장하는 각 객체의 픽셀 좌표를 통해 영상 내 객체간 평균 거리 변화도 검출할 수 있다.Depending on the embodiment, the present invention may also detect a change in average distance between objects in the image through pixel coordinates of each object appearing in the image among the analysis factors (F).
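
As an illustration of how an average inter-object distance might be derived from pixel coordinates, the sketch below uses the Euclidean distance between bounding-box centers, averaged over all object pairs in a frame. Both that distance definition and the function names are assumptions; the disclosure does not state how the distance is computed.

```python
from itertools import combinations
from math import hypot


def box_center(box):
    """Center of a (left, top, right, bottom) box in pixel coordinates."""
    left, top, right, bottom = box
    return ((left + right) / 2, (top + bottom) / 2)


def mean_pairwise_distance(boxes):
    """Mean Euclidean distance between the centers of all pairs of boxes.
    Returns 0.0 when fewer than two boxes are present."""
    centers = [box_center(b) for b in boxes]
    pairs = list(combinations(centers, 2))
    if not pairs:
        return 0.0
    total = sum(hypot(a[0] - b[0], a[1] - b[1]) for a, b in pairs)
    return total / len(pairs)


# Hypothetical boxes of the same two objects in two different frames.
frame_1 = [(0, 0, 10, 10), (30, 40, 40, 50)]
frame_2 = [(0, 0, 10, 10), (6, 8, 16, 18)]
change = mean_pairwise_distance(frame_2) - mean_pairwise_distance(frame_1)
```

A negative `change` indicates that the objects moved closer together between the two frames.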

실시예에 따라, 본 발명은 분석 요인(F) 중 인식한 행동 및 표정에 대한 신뢰도(확률 값)를 통해 특정 객체의 시간대별 표정(감정) 변화도 검출할 수 있으며, 검출된 변화도는 그래프로 시각화 가능할 수 있다.Depending on the embodiment, the present invention can also detect a change in expression (emotion) of a specific object over time through the reliability (probability value) of the recognized behavior and facial expression among the analysis factors (F), and the detected change is a graph can be visualized with
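
One possible way to turn per-time-slot confidence values into a record of expression changes is sketched below. Taking the highest-confidence expression as the dominant one, and the names `dominant_expression` and `expression_changes`, are assumptions made for illustration.

```python
def dominant_expression(scores):
    """Expression label with the highest confidence value."""
    return max(scores, key=scores.get)


def expression_changes(series):
    """Given a chronological list of (time_s, {expression: confidence}),
    return (time_s, previous, current) for each change of dominant expression."""
    changes = []
    previous = None
    for time_s, scores in series:
        current = dominant_expression(scores)
        if previous is not None and current != previous:
            changes.append((time_s, previous, current))
        previous = current
    return changes


# Hypothetical confidence values for one object ID at three time slots.
series = [
    (0, {"calm": 0.7, "sad": 0.2, "happy": 0.1}),
    (10, {"calm": 0.6, "sad": 0.3, "happy": 0.1}),
    (20, {"calm": 0.2, "sad": 0.7, "happy": 0.1}),
]
changes = expression_changes(series)
```

The same `series` structure could be plotted directly, one line per expression, to obtain the graph mentioned above.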

100: Apparatus for analyzing user-captured images based on artificial intelligence
190: Control unit

Claims

A method performed by an apparatus for analyzing a user-captured image based on artificial intelligence, the method comprising:
setting a label to be output by a recognition model that recognizes a user's behavior and expression in the image when a user-captured image is input;
setting a user interest keyword that can be derived by at least one of a user's behavior and expression;
specifying a user when a user-captured image to be analyzed is input;
recognizing the behavior and facial expression of the specified user through the recognition model, based on the label;
generating data related to the user interest keyword based on the recognized behavior and facial expression of the user; and
displaying the generated data.

The method of claim 1,
wherein setting the user interest keyword comprises
inputting the set label to a pre-trained interest keyword recommendation model and outputting the user interest keyword.

The method of claim 2,
wherein specifying the user comprises mapping unique identification information to each user recognized in the input user-captured image, and
wherein recognizing through the recognition model comprises collecting each user's behavior and facial expression in chronological order based on the mapped unique identification information.

The method of claim 3,
wherein generating the data comprises
generating the data related to the user interest keyword based on the recognized behavior and facial expression of each user collected in chronological order.

The method of claim 4,
wherein generating the data comprises
generating the data related to the user interest keyword using a pre-trained data generation model.

The method of claim 5,
wherein the displaying comprises
providing, in chronological order, the recognized behaviors and facial expressions of each user collected in chronological order.

The method of claim 6,
wherein the displaying comprises
providing the data related to the interest keyword in chronological order for each user.

The method of claim 7,
wherein the displaying comprises
comparing and displaying users' behaviors or facial expressions from which a situation or relationship arising between users can be predicted.

An artificial-intelligence-based user-captured image analysis program that is combined with a computer, which is hardware, and stored in a medium in order to execute the method of any one of claims 1 to 8.

An apparatus for analyzing a user-captured image based on artificial intelligence, the apparatus comprising:
a communication unit;
an input unit;
a display;
one or more processors; and
a memory electrically connected to the processor and storing at least one code executed by the processor,
wherein the memory stores code that, when executed by the processor, causes the processor to: set a label to be output by a recognition model that recognizes a user's behavior and facial expression in an image when a user-captured image is obtained through the communication unit or the input unit; set a user interest keyword derived from at least one of the user's behavior and facial expression; specify a user when a user-captured image to be analyzed is input; recognize the behavior and facial expression of the specified user through the recognition model based on the label; generate data related to the user interest keyword based on the recognized behavior and facial expression of the user; and display the generated data through the display.