KR102399121B1

KR102399121B1 - Early screening system and method for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information

Info

Publication number: KR102399121B1
Application number: KR1020200128617A
Authority: KR
Inventors: 전승준; 구창민
Original assignee: 주식회사 인포쉐어
Priority date: 2020-10-06
Filing date: 2020-10-06
Publication date: 2022-05-18
Also published as: KR20220046023A

Abstract

The present invention relates to a system and method for early screening of groups at high risk of suicide using artificial intelligence (AI)-based video and audio information. The system comprises: a measurement device including an information acquisition device, which has a sub-computer equipped with a 3D camera that films the behavior of an interviewee in the high-risk group to acquire valid data, and a measurement server, which is connected to the information acquisition device by wire or wirelessly and stores the valid data derived from the input video; an analysis module, connected to the measurement device by wire or wirelessly, that separates the received valid data into face, facial-expression, and behavior data; and an analysis server that matches the separated images received from the analysis module against pre-stored standardized data to analyze the interviewee's psychological state and display the result. By applying commercially available vision-detection technology and linking it with existing motion recognition technology and motion-related algorithms, a system with a stronger evaluation capability can be built. By applying an algorithm based on a psychological and neuroscientific standard data table, the invention can serve as a basic assessment program for mental health problems that are emerging as social issues, and can be extended to IoT-linked technologies that use sensor devices in the fields of emotional and psychological therapy. The interviewee's behavior patterns are acquired in real time through the 3D camera and are analyzed and judged against standardized behavior-pattern data prepared by psychology and neuroscience experts, so that the psychological state can be evaluated objectively.

Description

Early screening system and method for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information

The present invention relates to a system and method for early screening of groups at high risk of suicide using artificial intelligence (AI)-based video and audio information. More specifically, it relates to a system and method in which the behavior patterns of an interviewee in the high-risk group are acquired in real time through a 3D camera and are analyzed and judged (or computed) against standardized behavior-pattern data prepared by psychology and neuroscience experts, so that the psychological state can be evaluated objectively.

In general, suicidal behavior, which includes suicidal ideation and suicide attempts, lacks clear objective and quantitative criteria. As a result, staff at local suicide prevention centers have no well-defined points to manage and find close case management difficult. In addition, local governments run various programs to improve their suicide rating in the Ministry of the Interior and Safety's regional safety index, but these have had little effect, and efforts are under way to shift counseling and management systems toward non-face-to-face delivery for the post-COVID-19 era.

People who engage in suicidal behavior have very limited contact with society, so communication with family, partners, or friends is a major source of support. However, people frequently hide feelings such as depression, loss, and frustration and then make an extreme choice, and such sudden emotional changes often lead to suicidal behavior.

The psychological state of a person who dies by suicide may thus be unstable, and because this state is not monitored accurately, many local governments face recurring problems.

As prior art for determining a psychological state using information technology, Korean Registered Patent No. 10-1689021 (published 2016.12.23), "System and method for determining psychological state using sensing equipment", discloses a method comprising: measuring, for a subject undergoing counseling or an interview, at least one of motion information, biosignal information, and voice information using at least one sensing device; filtering at least one item of state type information corresponding to the subject's current state from the measurement data of the sensing device, based on a plurality of items of state type information that can be classified from the measurement data for each measurement factor; and matching the filtered state type information against a psychological classification table to extract and provide at least one psychological state of the subject.

In the psychological classification table, each of a plurality of different psychological states is individually mapped to the corresponding state type information. The subject's motion information includes facial expression information extracted from the face region and gesture information extracted from the body region, and the biosignal information includes at least one of the subject's pulse, body temperature, blood pressure, and EEG signals. In the filtering step, for the motion measurement factor, a plurality of first motion state types classifiable from the facial expression information and a plurality of second motion state types classifiable from the gesture information are stored separately. The first motion state types are behavior types classified according to the facial muscle region in which a change in movement is detected.

In the extracting step, the filtered first and second motion state types are each matched against the psychological classification table, so that the psychological states corresponding to the facial expression and to the gesture are extracted separately. The emotion level of the psychological state matched to the first motion state type is compared with that of the state matched to the second motion state type. If the difference is at or above a threshold, the psychological state mapped to the second motion state type is discarded and only the state matched to the first motion state type is provided. The state type information for the voice information covers a plurality of types selected from: trembling of the subject's voice, an abrupt rise or fall in voice intensity, the time elapsed before answering a question, and repeated utterance of a preset specific word.

According to this prior-art system and method for determining psychological state using sensing equipment, the subject's behavior patterns are monitored and analyzed in real time during counseling or an interview, so the psychological state corresponding to those behavior patterns can be derived more objectively and the accuracy of psychological state determination can be improved.

However, the prior art cannot provide a psychologically and neuroscientifically objective evaluation of the interviewee's responses to each type of question.

Korean Registered Patent No. 10-1689021 (published 2016.12.23)

To address the above problem of the prior art, an object of the present invention is to provide a system and method for early screening of groups at high risk of suicide using AI-based video and audio information, in which the behavior patterns of a test subject are acquired in real time through a 3D camera and are analyzed and judged against standardized behavior-pattern data prepared by psychology and neuroscience experts, so that the psychological state can be evaluated objectively.

A further object of the present invention is to provide such a system and method in which, to offer an objective means of identification, a specific motion analysis technique enables a psychologically and neuroscientifically objective evaluation of the interviewee's responses to each type of question.

As a means of achieving the above objects, the present invention provides a system for early screening of groups at high risk of suicide using AI-based video and audio information, comprising: a measurement device including an information acquisition device, which has a sub-computer equipped with a 3D camera that films the interviewee's behavior to acquire valid data, and a measurement server, which is connected to the information acquisition device by wire or wirelessly and stores the valid data derived from the input video; an analysis module, connected to the measurement device by wire or wirelessly, that separates the received valid data into face, facial-expression, and behavior data; and an analysis server that matches the separated images received from the analysis module against pre-stored standardized data to analyze the interviewee's psychological state and display the result.

The analysis module of the present invention comprises: a face analysis module that identifies features by thresholding the difference in mean pixel value between black and white regions; a motion recognition module built on action units (Action Units) to analyze the activity state of specific muscles; and a voice recognition module that extracts features from the voice, specifies a minimum value (Minimum pitch (Hz)) and a maximum value (Maximum pitch (Hz)), and extracts 12 features from within the specified voice waveform.

In the face analysis module of the present invention, the feature value is calculated as the sum of the brightness of the pixels in the white region minus the sum of the brightness of the pixels in the black region. During video and image search, the region is presumed to be the target object depending on whether this value is greater or less than the threshold assigned to the feature.

The motion recognition module of the present invention recognizes two kinds of state. The presence or absence state covers brow lowerer, lip corner puller, lip corner depressor, lip tightener, lip suck, and blink. The intensity state covers inner brow raiser, outer brow raiser, brow lowerer, upper lid raiser, cheek raiser, nose wrinkler, upper lip raiser, lip corner puller, dimpler, lip corner depressor, chin raiser, lip stretcher, lips part, and jaw drop.

The 12 features extracted from within the voice waveform are: the pitch height of the waveform, its slope, the number of pitch changes in the waveform, syllables per measured second, CID ratio, pitches per utterance, words per utterance, content words per utterance, phonemic paraphasia, semantic paraphasia, onset time of the subsequent utterance, inversions, interjections, repetitions, and revisions.
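The first three waveform features named above (pitch height, slope, and number of pitch changes) can be illustrated with a minimal sketch. The patent does not say how these are computed, so everything here is an assumption: the input is taken to be a list of voiced pitch samples in Hz at a fixed frame step, the slope is a least-squares fit, and the function name `pitch_contour_features` is hypothetical.

```python
from typing import Dict, List


def pitch_contour_features(pitch_hz: List[float], frame_step_s: float = 0.01) -> Dict[str, float]:
    """Summarise a voiced pitch track: range, linear slope, direction changes.

    Assumption: pitch_hz holds one pitch estimate per frame, already limited
    to the chosen minimum/maximum pitch range.
    """
    if len(pitch_hz) < 2:
        raise ValueError("need at least two pitch samples")

    n = len(pitch_hz)
    times = [i * frame_step_s for i in range(n)]
    mean_t = sum(times) / n
    mean_p = sum(pitch_hz) / n

    # Least-squares slope of pitch over time (Hz per second).
    num = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, pitch_hz))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den

    # Count how often the contour switches between rising and falling.
    changes = 0
    prev_sign = 0
    for a, b in zip(pitch_hz, pitch_hz[1:]):
        diff = b - a
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                changes += 1
            prev_sign = sign

    return {
        "min_pitch_hz": min(pitch_hz),
        "max_pitch_hz": max(pitch_hz),
        "slope_hz_per_s": slope,
        "direction_changes": changes,
    }
```

The linguistic features in the list (paraphasias, interjections, revisions, and so on) would need a transcript and are outside the scope of this sketch.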

Based on the valid data supplied from the analysis module, the analysis server of the present invention provides a history of emotions over time in graph form, provides the video segments linked to each emotion in that history, and provides analysis data from behavior recognition analysis.

As another means of achieving the above objects, the present invention provides a method for early screening of groups at high risk of suicide using AI-based video and audio information, comprising: a motion detection and voice recognition step of detecting facial expressions and head and hand movements in the video or images input from the measurement device and acquiring the video or images; a specific-region and voice-feature extraction step of extracting specific regions from the valid data obtained in the motion detection and voice recognition step in order to identify the interviewee's emotional state; and an analysis step of matching the region data obtained in the specific-region and voice-feature extraction step against pre-stored standardized data to judge and analyze the interviewee's psychological state.

In the motion detection and voice recognition step of the present invention, the 3D camera of the measurement device is used to recognize, and store in the measurement server, two kinds of state. The presence or absence state covers brow lowerer, lip corner puller, lip corner depressor, lip tightener, lip suck, and blink. The intensity state covers inner brow raiser, outer brow raiser, brow lowerer, upper lid raiser, cheek raiser, nose wrinkler, upper lip raiser, lip corner puller, dimpler, lip corner depressor, chin raiser, lip stretcher, lips part, and jaw drop.

The specific-region and voice-feature extraction step of the present invention comprises: a geometric transformation and cropping step in which, to make the face frontal, eye detection is used to rotate the face so that the two eyes are exactly horizontal, the face is then scaled so that the distance between the eyes is always the same, and the background, hair, forehead, ears, and chin are cropped from the face image; a histogram equalization step in which, because lighting can make the left and right sides of the face look like entirely different faces, the left and right halves are equalized separately to obtain standardized brightness and contrast; a smoothing step in which a bilateral filter (Bilateral Filter) is used to smooth the image while keeping edges sharp, reducing the pixel noise that histogram equalization may amplify; a mask step in which an elliptical mask is applied to the smoothed image so that only the frontal face is extracted; a voice waveform recognition step of detecting the minimum-value and maximum-value voice waveform in the voice input through a microphone; and a waveform feature extraction step of extracting 12 features from the voice waveform obtained in the voice waveform recognition step (S25).
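Two of the image steps above can be sketched in plain Python: deriving the tilt angle and scale factor from the detected eye centres, and equalizing the left and right halves of the face separately. This is an illustrative sketch only. The patent gives no formulas, so the function names, the list-of-rows image representation, and the standard CDF-based equalization are assumptions, and a production system would more likely use an image library.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Image = List[List[int]]


def eye_alignment(left_eye: Point, right_eye: Point, target_dist: float) -> Tuple[float, float]:
    """Return (tilt of the eye line in degrees, scale factor).

    Rotating the image to cancel the tilt levels the eyes; scaling by the
    factor brings the inter-eye distance to target_dist.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    dist = math.hypot(dx, dy)
    if dist == 0:
        raise ValueError("eye centres coincide")
    return math.degrees(math.atan2(dy, dx)), target_dist / dist


def equalize(gray: Image) -> Image:
    """Histogram-equalise an 8-bit grayscale image given as rows of ints."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    if total == 0:
        return [row[:] for row in gray]
    cdf, running = [0] * 256, 0
    for i, h in enumerate(hist):
        running += h
        cdf[i] = running
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:  # single gray level: nothing to stretch
        return [row[:] for row in gray]
    return [[round((cdf[v] - cdf_min) * 255 / (total - cdf_min)) for v in row]
            for row in gray]


def equalize_halves(gray: Image) -> Image:
    """Equalise the left and right halves independently, then rejoin them."""
    mid = len(gray[0]) // 2
    left = equalize([row[:mid] for row in gray])
    right = equalize([row[mid:] for row in gray])
    return [l + r for l, r in zip(left, right)]
```

Equalizing each half on its own histogram is what lets an unevenly lit face end up with comparable contrast on both sides. A hard split can leave a visible seam down the middle, which the subsequent smoothing step would soften.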

In the waveform feature extraction step of the present invention, the extracted features are: the pitch height of the waveform, its slope, the number of pitch changes in the waveform, syllables per measured second, CID ratio, pitches per utterance, words per utterance, content words per utterance, phonemic paraphasia, semantic paraphasia, onset time of the subsequent utterance, inversions, interjections, repetitions, and revisions.

In the analysis step of the present invention, the valid data acquired in the motion detection and voice recognition step (S10) is processed through the geometric transformation and cropping step, histogram equalization step, smoothing step, and mask step of the specific-region and voice-feature extraction step. The resulting face image and the 12 features extracted from the voice waveform are compared with and matched against the standard data table, and the data for the matched image and voice-waveform features is compared with patient-group and normal-group data to extract and display objective information and interpretation data.
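The patent does not specify how the comparison with patient-group and normal-group data is carried out. Purely as an illustration of one way such a comparison could be structured, the sketch below measures how far a feature vector lies from each group's reference statistics using a mean absolute z-score. The function names, the dictionary layout, and the distance measure are all assumptions, and the output is not a clinical decision rule.

```python
from typing import Dict


def group_distance(sample: Dict[str, float],
                   group_mean: Dict[str, float],
                   group_std: Dict[str, float]) -> float:
    """Mean absolute z-score of the sample relative to one reference group.

    Assumption: every feature in `sample` has a mean and a non-zero standard
    deviation in the group's reference statistics.
    """
    zs = [abs(sample[k] - group_mean[k]) / group_std[k] for k in sample]
    return sum(zs) / len(zs)


def compare_with_groups(sample: Dict[str, float],
                        patient: Dict[str, Dict[str, float]],
                        normal: Dict[str, Dict[str, float]]) -> Dict[str, object]:
    """Report the distance to each reference group and which one is closer."""
    d_patient = group_distance(sample, patient["mean"], patient["std"])
    d_normal = group_distance(sample, normal["mean"], normal["std"])
    return {
        "distance_to_patient_group": d_patient,
        "distance_to_normal_group": d_normal,
        "closer_group": "patient" if d_patient < d_normal else "normal",
    }
```

Returning both distances rather than only a label keeps the result interpretable, which matches the stated goal of displaying objective information alongside interpretation data.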

By applying already commercialized vision-detection technology and linking it with existing motion recognition technology and motion-related algorithms, the present invention makes it possible to build a system with a stronger evaluation capability. By applying an algorithm based on a psychological and neuroscientific standard data table, it can be used as a basic assessment program for mental health problems that are emerging as social issues, and can be extended to IoT-linked technologies using sensor devices applicable to the fields of emotional and psychological therapy.

The present invention acquires the behavior patterns of a test subject in real time through a 3D camera and analyzes and judges them against standardized behavior-pattern data prepared by psychology and neuroscience experts, so that the psychological state can be evaluated objectively.

In addition, the present invention can be used effectively in a variety of fields. In particular, by providing an objective means of identification based on motion analysis technology, it enables a psychologically and neuroscientifically objective evaluation of the interviewee's responses to each type of question.

FIG. 1 is a schematic diagram of the system for early screening of groups at high risk of suicide using AI-based video and audio information according to the present invention.
FIG. 2 is a flowchart of the method for early screening of groups at high risk of suicide using AI-based video and audio information according to the present invention.
FIG. 3 shows the sub-steps of the specific-region and voice-feature extraction step of FIG. 2.
FIG. 4 shows how the position of the hand is determined for behavior analysis in the motion detection and voice recognition step of FIG. 2.
FIG. 5 shows, step by step, how the background, hair, forehead, and chin are cropped from the face in the geometric transformation step of the specific-region and voice-feature extraction step of FIG. 3.
FIG. 6 shows how the face is divided evenly to obtain an image in the histogram equalization step of the specific-region and voice-feature extraction step of FIG. 3.
FIG. 7 shows how a mask is applied to the face to extract it in the mask step of the specific-region and voice-feature extraction step of FIG. 3.

It should be noted that the technical terms used herein are used only to describe specific embodiments and are not intended to limit the present invention. Unless specifically defined otherwise herein, technical terms should be interpreted as they are generally understood by a person of ordinary skill in the art to which the present invention pertains, and should not be interpreted in an overly broad or overly narrow sense.

Where a technical term used herein is an incorrect term that fails to express the idea of the present invention accurately, it should be replaced with, and understood as, a technical term that a person skilled in the art can correctly understand. General terms used herein should be interpreted as defined in dictionaries or according to context, and should not be interpreted in an overly narrow sense.

Singular expressions used herein include plural expressions unless the context clearly indicates otherwise. Terms such as "consists of" or "comprises" should not be construed as necessarily including all of the components or steps described; some components or steps may be omitted, and additional components or steps may be included.

The system and method for early screening of groups at high risk of suicide using AI-based video and audio information according to the present invention are described below.

The system for early screening of groups at high risk of suicide using AI-based video and audio information according to the present invention comprises: a measurement device 100 including an information acquisition device 110, which has a sub-computer 111 equipped with a 3D camera 112 that films the interviewee's behavior to acquire valid data, and a measurement server 120, which is connected to the information acquisition device 110 by wire or wirelessly and stores the valid data derived from the input video; an analysis module 200 that separates the valid data received from the measurement device 100 into face, facial-expression, and behavior data; and an analysis server 300 that matches the images separated by the analysis module 200 against pre-stored data to analyze the interviewee's psychological state and display the result.

The system and method having the above features are described in detail below with reference to the accompanying drawings.

Referring to FIG. 1, the system for early screening of groups at high risk of suicide using AI-based video and audio information according to the present invention includes a measurement device 100, an analysis module 200, and an analysis server 300.

The measurement device 100 includes an information acquisition device 110, which has a sub-computer 111 equipped with a 3D camera that films the interviewee's behavior to acquire valid data, and a measurement server 120, which is connected to the information acquisition device 110 by wire or wirelessly and stores the valid data derived from the input video.

The analysis module 200 is connected to the measurement device 100 by wire or wirelessly. So that the received valid data can be classified and analyzed by face, facial expression, and behavior, it includes a face analysis module 210, a motion recognition module 220, and a voice recognition module, and it passes the image data classified and analyzed by these modules to the analysis server 300.

The face analysis module 210 calculates a value as the sum of the brightness of the pixels in the white region minus the sum of the brightness of the pixels in the black region. During video and image search, the region is presumed to be the target object depending on whether this value is greater or less than the threshold assigned to the feature.

Many image features are used for image recognition, including SIFT (Scale-Invariant Feature Transform), which uses feature points, HOG (Histogram of Oriented Gradient), which uses template matching, LBP (Local Binary Pattern), and MCT (Modified Census Transform). Among these, the Haar-like feature, which is mostly used together with AdaBoost (Adaptive Boosting), is based on the difference in brightness between regions of the image.

Basic features of different shapes are combined in large numbers to extract the characteristics of an object at various positions and scales.

The value of each basic feature is calculated as the sum of the brightness of the image pixels in the white part of the rectangle minus the sum of the brightness in the black part. When searching a video or image, a region is presumed to be the target object depending on whether this value exceeds the threshold assigned to the feature.
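The feature value described above can be sketched with a summed-area table (integral image), a common way to obtain rectangle sums quickly. This is a minimal illustration, not the patent's implementation: the function names, the two-rectangle layout (white on top, black below), and the toy patch are assumptions made for the example.

```python
def integral_image(img):
    """Return a summed-area table with one extra row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixel brightness in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """White (top half) brightness sum minus black (bottom half) brightness sum."""
    half = h // 2
    white = rect_sum(ii, x, y, w, half)
    black = rect_sum(ii, x, y + half, w, half)
    return white - black


def is_candidate(value, threshold):
    """Presume the region is the target object when the value exceeds the threshold."""
    return value > threshold


# Toy 4x4 patch (assumed values): bright top half, dark bottom half.
patch = [[200] * 4, [200] * 4, [50] * 4, [50] * 4]
ii = integral_image(patch)
feature_value = two_rect_feature(ii, 0, 0, 4, 4)
```

With the table built once, each rectangle sum costs four lookups regardless of rectangle size, which is why many features can be evaluated at many positions and scales.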

To be confident that a region is the target object, it is important to identify meaningful features, that is, features that yield similar values on the recognition target in a given region but essentially random values on anything else.

In face recognition using Haar-like features, the eyes are an example: the eye region is darker than its surroundings, so this property is a meaningful feature for detecting human faces.

These meaningful features are selected by a machine learning algorithm, namely a boosting algorithm such as AdaBoost. Because a Haar-like feature carries geometric information about the object through its basic features while using the brightness difference between unit regions, it is relatively well suited to objects such as human faces, which have characteristic brightness differences.

The motion recognition module 220 recognizes occurrence states (present or absent), including brow lowering, lip corner pulling, lip corner depression, lip pressing, lip sucking, and blinking, and intensity states, including inner brow raising, outer brow raising, brow lowering, upper eyelid raising, cheek raising, nose wrinkling, upper lip raising, lip corner pulling, dimpling, lip corner depression, chin raising, lip stretching, lips parting, and jaw dropping.

The voice recognition module 230 extracts features from the voice input through the microphone and designates the extracted minimum value (minimum pitch, Hz) and maximum value (maximum pitch, Hz).
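The text does not say how the minimum and maximum pitch are used. One plausible reading is that they bound the search range of a pitch estimator. The sketch below illustrates that reading with a plain autocorrelation search, a technique swapped in for the example; the function name, the 75 to 500 Hz range, and the synthetic 200 Hz tone are all assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, min_pitch_hz, max_pitch_hz):
    """Return the pitch (Hz) whose lag maximizes the autocorrelation,
    searching only lags that correspond to [min_pitch_hz, max_pitch_hz]."""
    min_lag = int(sample_rate / max_pitch_hz)   # shortest period considered
    max_lag = int(sample_rate / min_pitch_hz)   # longest period considered
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic test signal (assumed): 0.1 s of a 200 Hz sine sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch(tone, sample_rate, min_pitch_hz=75, max_pitch_hz=500)
```

Narrowing the range to the expected speaking voice reduces octave errors, since lags at multiples of the true period fall outside the search window.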

The 12 features that the voice recognition module 230 extracts from the voice waveform include the pitch of the voice waveform, its slope, the number of pitch changes in the waveform, the number of syllables per second, the CID ratio, the number of pitches per utterance, the number of words per utterance, the number of content words per utterance, phonemic paraphasias, semantic paraphasias, the onset time of the subsequent utterance, inverted words, interjections, repeated words, and revised words.

The analysis server 300 matches the separated images received from the analysis module 200 against pre-stored standardized data to analyze the interviewee's psychological state and display the results.

The analysis server 300 also compares and analyzes the valid data against the standard data table 310. Based on the analysis results, it automatically generates "expression analysis" and "behavior analysis" graphs, and the graph information for each item is displayed clearly in a line graph.

The analysis server 300 quantifies the 12 features extracted from the voice waveform designated by the voice recognition module 230, and compares and analyzes the quantified data against pre-stored data for a patient group and a normal group.
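The comparison against the patient group and normal group data is not specified further. As a minimal sketch of one possible approach, the example below standardizes a speaker's feature vector against each group and reports the nearer group. The z-score distance, the function names, the two-feature vectors, and all numeric values are assumptions for illustration only, not data from the patent.

```python
from statistics import mean, stdev


def z_scores(features, group):
    """Standardize each feature against a group's mean and standard deviation."""
    columns = list(zip(*group))
    return [(f - mean(col)) / stdev(col) for f, col in zip(features, columns)]


def closer_group(features, patient_group, normal_group):
    """Return the label of the group whose mean is nearer in squared z-score distance."""
    d_patient = sum(z * z for z in z_scores(features, patient_group))
    d_normal = sum(z * z for z in z_scores(features, normal_group))
    return "patient" if d_patient < d_normal else "normal"


# Hypothetical reference data: rows are speakers, columns are two quantified features.
patient_group = [[2.1, 0.80], [2.4, 0.90], [1.9, 0.85]]
normal_group = [[4.0, 0.30], [4.3, 0.35], [3.8, 0.25]]
label = closer_group([2.2, 0.82], patient_group, normal_group)
```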

The following describes the method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, which is performed in the early screening system of the present invention configured as above and judges the psychological state through emotion recognition analysis.

The method comprises: a motion detection and voice recognition step (S10) of detecting facial expressions and head and hand movements in the video input from the information acquisition device 110 and acquiring a video or image; a specific-region and voice feature extraction step (S20) of extracting a specific region from the valid data supplied by the motion detection and voice recognition step (S10) in order to identify the interviewee's emotional state; and an analysis step (S30) of matching the region data supplied by the specific-region and voice feature extraction step (S20) against pre-stored standardized data to judge and analyze the interviewee's psychological state.

The specific-region extraction step (S20) includes a geometric transformation and trimming step (S21), an equalization step (S22), a smoothing step (S23), and a mask step (S24).

The geometric transformation and trimming step (S21) transforms the face so that it faces forward, because a face recognition algorithm can find and recognize accurate feature points only when the face is well aligned. Face detection appears to return an aligned face image, but that process is not precise, so the geometric transformation and trimming step (S21) is performed to obtain an accurate face image.

For more precise alignment, eye detection is used to rotate the face so that the two eyes are exactly level. The face is also scaled down so that the distance between the two eyes is always the same. The background, hair, forehead, ears, and chin are then cropped from the face image to obtain an image of the specific region.
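The rotation and scaling described above follow directly from the two detected eye positions. The sketch below computes both and checks them on a pair of points; the coordinates, the target eye distance of 25 pixels, and the function names are assumptions for the example, and eye detection itself is taken as given.

```python
import math


def eye_alignment(left_eye, right_eye, desired_eye_distance):
    """Return (angle in degrees, scale) that levels the eyes and fixes their spacing."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle_deg = math.degrees(math.atan2(dy, dx))      # tilt of the eye line
    scale = desired_eye_distance / math.hypot(dx, dy)
    return angle_deg, scale


def transform_point(p, center, angle_deg, scale):
    """Rotate p about center by -angle_deg, then scale it about the same center."""
    a = math.radians(-angle_deg)
    x, y = p[0] - center[0], p[1] - center[1]
    xr = scale * (x * math.cos(a) - y * math.sin(a))
    yr = scale * (x * math.sin(a) + y * math.cos(a))
    return (center[0] + xr, center[1] + yr)


# Assumed eye positions in pixel coordinates (x, y).
left, right = (30.0, 40.0), (70.0, 70.0)
angle, scale = eye_alignment(left, right, desired_eye_distance=25.0)
center = ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2)
new_left = transform_point(left, center, angle, scale)
new_right = transform_point(right, center, angle, scale)
```

In practice the same angle, scale, and center would be applied to every pixel of the face image through an affine warp before cropping.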

The equalization step (S22) addresses the strong effect of lighting on face recognition. Depending on the lighting, the left and right halves of a face can look like entirely different faces. To obtain more standardized brightness and contrast values, histogram equalization is therefore applied separately to the left and right halves of the face.

Equalizing the left and right halves separately produces a visible boundary between them. The equalization is therefore applied from the left toward the center and from the right toward the center, and three images are combined: the left-equalized image, the right-equalized image, and the image in which the whole face is histogram-equalized.
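A minimal sketch of the three-image combination follows. It equalizes the left half, the right half, and the whole face, then blends so that the outer columns use the side-specific result and the center columns use the whole-face result. The linear blending weights and the function names are assumptions; the text does not give the exact weighting. The image is assumed to be an 8-bit grayscale grid at least two columns wide.

```python
def equalize(pixels):
    """Histogram-equalize a flat list of 8-bit values via the cumulative histogram."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for count in hist:
        total += count
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:                      # constant region: nothing to stretch
        return list(pixels)
    return [round((cdf[p] - cdf_min) * 255 / (n - cdf_min)) for p in pixels]


def equalize_region(img, x0, x1):
    """Equalize columns x0..x1-1 of a 2-D image and return them as rows."""
    flat = [row[x] for row in img for x in range(x0, x1)]
    eq = equalize(flat)
    width = x1 - x0
    return [eq[r * width:(r + 1) * width] for r in range(len(img))]


def center_weight(x, w):
    """Weight of the whole-face image: 0 at the outer columns, 1 at the center."""
    mid = w // 2
    if x < mid:
        return x / mid
    span = w - 1 - mid
    return (w - 1 - x) / span if span > 0 else 1.0


def equalize_left_right(img):
    """Blend left-half, right-half, and whole-face equalizations to hide the seam."""
    h, w = len(img), len(img[0])
    mid = w // 2
    whole = equalize_region(img, 0, w)
    left = equalize_region(img, 0, mid)
    right = equalize_region(img, mid, w)
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            t = center_weight(x, w)
            side = left[y][x] if x < mid else right[y][x - mid]
            row.append(round((1 - t) * side + t * whole[y][x]))
        out.append(row)
    return out


# Assumed toy image: dark left half, bright right half.
img = [[10, 20, 200, 220], [30, 40, 210, 250]]
out = equalize_left_right(img)
```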

Performing histogram equalization can increase the pixel noise in the image, so the smoothing step (S23) is needed to reduce it. This step uses a bilateral filter, which smooths the image while keeping boundary lines sharp.
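The edge-preserving behavior of a bilateral filter comes from weighting each neighbor by both its spatial distance and its brightness difference from the center pixel. The sketch below is a direct, unoptimized version for a small grayscale grid; the parameter names and values and the toy image are assumptions for illustration.

```python
import math


def bilateral_filter(img, radius, sigma_space, sigma_color):
    """Smooth a 2-D grayscale image while preserving strong edges."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        diff = img[yy][xx] - img[y][x]
                        # Nearby pixels with similar brightness get the largest weight.
                        weight = math.exp(
                            -(dx * dx + dy * dy) / (2 * sigma_space ** 2)
                            - diff * diff / (2 * sigma_color ** 2)
                        )
                        num += weight * img[yy][xx]
                        den += weight
            out[y][x] = num / den
    return out


# Assumed noisy step edge: left side near 10, right side near 200.
noisy = [[12, 8, 11, 201, 198, 202],
         [9, 11, 10, 199, 203, 200],
         [10, 12, 9, 202, 199, 197]]
smooth = bilateral_filter(noisy, radius=1, sigma_space=1.0, sigma_color=20.0)
```

Pixels across the edge differ by roughly 190 levels, so their weights are effectively zero and the two sides are averaged separately, which is why the boundary stays sharp.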

Although the background was already removed in the geometric transformation and trimming step (S21), the mask step (S24) applies an elliptical mask so that shadows in the corners of the image do not cause problems during recognition. An ellipse is drawn on the smoothed image so that only the front-facing face is extracted.
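An elliptical mask of this kind can be sketched by keeping only the pixels that satisfy the ellipse inequality. The ellipse here is the one inscribed in the image rectangle, and the fill value is arbitrary; both are assumptions, since the text does not give the ellipse's size or position.

```python
def apply_elliptical_mask(img, fill=128):
    """Keep pixels inside the ellipse inscribed in the image; replace the rest with fill."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2      # ellipse center
    ry, rx = h / 2, w / 2                  # vertical and horizontal radii
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            inside = ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0
            row.append(img[y][x] if inside else fill)
        out.append(row)
    return out


# Assumed 8x6 all-white image, masked with black outside the ellipse.
face = [[255] * 6 for _ in range(8)]
masked = apply_elliptical_mask(face, fill=0)
```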

For the face image extracted through the above steps, many image features can be used for recognition, including SIFT, which uses feature points, or a Naive Bayes classifier; HOG, which uses template matching; LBP; and MCT. Among these, the Haar-like feature, which is usually paired with a Naive Bayes classifier or AdaBoost (Adaptive Boosting), is based on the brightness difference between regions of an image.

The Naive Bayes classifier rests on the very strong assumption that all dimensions of a multidimensional feature vector are mutually independent. Each dimension is also generally assumed to follow a one-dimensional Gaussian probability distribution, so only the mean and variance are estimated from the training data as a whole.

Given an input vector 'x', the posterior probability (a posteriori probability) of a class 'c' is expressed as in Equation 1.

The likelihood of each dimension is assumed to follow a normal distribution and is calculated as in Equation 2.
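Equations 1 and 2 are not reproduced in this text. Under the assumptions stated above (independent dimensions, each Gaussian), the standard Gaussian Naive Bayes forms are as follows. The symbols d (number of dimensions) and the per-class, per-dimension mean and variance are notation introduced here, so the patent's own equations may differ in notation.

```latex
% Posterior of class c given x = (x_1, ..., x_d), assuming independent dimensions
P(c \mid \mathbf{x}) = \frac{P(c)\,\prod_{i=1}^{d} P(x_i \mid c)}{P(\mathbf{x})}

% Gaussian likelihood of dimension i under class c
P(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}}
  \exp\!\left(-\frac{(x_i - \mu_{c,i})^{2}}{2\sigma_{c,i}^{2}}\right)
```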

Training requires only the mean and variance of the given training data. In addition, because all dimensions are assumed to be independent in Equation 1, the implementation is very simple and few factors affect performance, so the classifier is used to obtain a baseline for emotion recognition performance.
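A Gaussian Naive Bayes baseline of the kind described can be sketched in a few lines: training stores a prior, a mean, and a variance per class and dimension, and prediction picks the class with the largest log posterior. The class labels, feature values, and the small variance floor are assumptions for the example.

```python
import math
from statistics import mean, pvariance


def train(samples, labels):
    """Estimate the prior and the per-dimension mean and variance for each class."""
    model = {}
    for c in set(labels):
        rows = [s for s, l in zip(samples, labels) if l == c]
        cols = list(zip(*rows))
        model[c] = {
            "prior": len(rows) / len(samples),
            "mean": [mean(col) for col in cols],
            # Small floor (an assumption) avoids division by zero for constant features.
            "var": [pvariance(col) + 1e-9 for col in cols],
        }
    return model


def log_posterior(model, c, x):
    """Unnormalized log posterior: log prior plus per-dimension Gaussian log likelihoods."""
    params = model[c]
    total = math.log(params["prior"])
    for xi, mu, var in zip(x, params["mean"], params["var"]):
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return total


def predict(model, x):
    """Return the class with the largest posterior."""
    return max(model, key=lambda c: log_posterior(model, c, x))


# Hypothetical two-dimensional training data with two emotion labels.
samples = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]]
labels = ["calm", "calm", "calm", "tense", "tense", "tense"]
model = train(samples, labels)
```

Working in log space turns the product over dimensions into a sum, which avoids numerical underflow when there are many dimensions.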

The AdaBoost algorithm builds a strong classifier by combining many weak classifiers, each based on a simple hypothesis. A weak classifier is one whose results are only slightly better than random guessing. Samples are classified with a succession of weak classifiers: the weights of correctly classified samples are decreased, the weights of misclassified samples are increased, and the updated weights are passed to the next weak classifier. At detection time, the filters with the largest weights are applied first so that as many non-face images as possible are rejected early, followed by the filters with the next largest weights. The final strong classifier is a combination of the weak classifiers produced at each step.
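The reweighting loop described above can be sketched with threshold "stumps" as the weak classifiers. This is the textbook discrete AdaBoost update, shown on a toy one-dimensional problem that no single stump can solve; the data, function names, and number of rounds are assumptions, and real face detection would use Haar-like feature values as inputs.

```python
import math


def stump_predict(x, feature, threshold, polarity):
    """Weak classifier: +1 or -1 from a single thresholded feature."""
    return polarity if x[feature] > threshold else -polarity


def best_stump(X, y, w):
    """Pick the stump with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost(X, y, rounds):
    """Return a list of (alpha, feature, threshold, polarity) weak classifiers."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        err, feature, threshold, polarity = best_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, threshold, polarity))
        # Correctly classified samples get lighter, misclassified ones heavier.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, threshold, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def strong_predict(ensemble, x):
    """Weighted vote of the weak classifiers."""
    score = sum(a * stump_predict(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score > 0 else -1


# Toy data (assumed): the middle values are negative, the outer values positive.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, rounds=3)
predictions = [strong_predict(ensemble, x) for x in X]
```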

The process of building one strong classifier is called boosting, and the strong classifiers are arranged in a cascade to produce the final AdaBoost face detector. The boosting algorithm consists of three steps, as shown in the figure below.

First, data is collected and divided into face images (positive samples) and non-face images (negative samples). Second, from the Haar features obtained from this data, the small number of features that can distinguish faces from non-faces is found by iterating until a satisfactory face detection rate is reached. Finally, the Haar features found in this way, that is, the weak classifiers, are collected into a single strong classifier.

The strong classifiers are then connected in a cascade to build a multi-stage AdaBoost high-speed face detector, in the following four steps.

First, the total number of stages and the face detection rate for each stage are fixed in advance. The early stages start with a low detection rate, and the rate is raised in later stages. The face detection rate of each stage affects the number of weak classifiers that make up that stage's strong classifier.

Second, one strong classifier is trained by adding Haar features, that is, weak classifiers, until the face detection rate specified for the stage is reached. Third, once a stage is complete, the same process is repeated to train the strong classifier for the next stage. The non-face images used to train the previous stage are excluded, and new non-face images are added for training.

For face images, only the data that the previous stage could not classify takes part in the training of the next stage. This process is repeated until the desired number of stages and face detection rate are reached. Fourth and finally, the result is a cascade-form AdaBoost classifier for face detection.
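At detection time, a cascade built this way rejects a window as soon as any stage's score falls below that stage's threshold, so most non-face windows are discarded by the first, cheapest stages. The sketch below shows only this evaluation logic; the stage contents, feature names, weights, and thresholds are hypothetical values chosen for the example, not trained results.

```python
def cascade_detect(window, stages):
    """Run a window through the stages in order; reject at the first failing stage.

    Returns (accepted, number of stages passed)."""
    for stage_index, (weak_classifiers, stage_threshold) in enumerate(stages):
        score = sum(alpha for alpha, test in weak_classifiers if test(window))
        if score < stage_threshold:
            return False, stage_index      # rejected early as a non-face
    return True, len(stages)               # passed every stage


# Hypothetical two-stage cascade over a window described by two feature values.
stages = [
    ([(1.0, lambda win: win["eye_band"] > 50)], 1.0),
    ([(0.6, lambda win: win["nose_bridge"] > 20),
      (0.7, lambda win: win["eye_band"] > 80)], 1.0),
]

rejected_early = cascade_detect({"eye_band": 10, "nose_bridge": 90}, stages)
rejected_late = cascade_detect({"eye_band": 60, "nose_bridge": 30}, stages)
accepted = cascade_detect({"eye_band": 90, "nose_bridge": 30}, stages)
```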

The value of each basic feature is calculated as the sum of the brightness of the image pixels in the white part of the rectangle minus the sum of the brightness in the black part. When searching a video or image, a region is presumed to be the target object depending on whether this value exceeds the threshold assigned to the feature.

To be confident that a region is the target object, it is important to identify meaningful features, that is, features that yield similar values on the recognition target in a given region but essentially random values on anything else.

In face recognition using Haar-like features, the eyes are an example: the eye region is darker than its surroundings, so this property is a meaningful feature for detecting human faces.

In addition, an SVM (Support Vector Machine) is used to analyze the facial expression feature points of the high-risk group for suicide.

The SVM is a technique that classifies data through supervised learning. When setting the decision boundary that separates groups of data whose classification results are known, it uses the concepts of the separating hyperplane and the margin, which is the distance between the decision boundary and the training data.
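The margin concept can be made concrete by measuring the distance from labeled training points to a candidate hyperplane. The sketch below compares two hand-picked hyperplanes on toy data and does not train an SVM; the points, labels, and hyperplane coefficients are assumptions. An SVM would search for the hyperplane that makes this smallest distance as large as possible.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-signed distance from the training points to the hyperplane."""
    return min(label * signed_distance(w, b, x) for x, label in zip(points, labels))


# Toy two-class data (assumed), labels are -1 and +1.
points = [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
labels = [-1, -1, 1, 1]
wide = margin((1.0, 1.0), -5.0, points, labels)     # diagonal boundary x + y = 5
narrow = margin((1.0, 0.0), -2.5, points, labels)   # vertical boundary x = 2.5
```

Both hyperplanes separate the two classes, but the diagonal one leaves more room on each side, so it is the better decision boundary in the SVM sense.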

The analysis server compares and analyzes the valid data against the standard data table 310 and, as shown in FIG. 8, automatically generates the "expression analysis" and "behavior analysis" graphs from the analysis results. When an item is selected in the pie chart, the graph information for that item is displayed more clearly in the line graph on the right.

By applying commercially available vision-detecting technology and linking it with existing motion recognition technology and motion-related algorithms, the present invention makes it possible to build a system with a more powerful evaluation function. By applying algorithms based on a psychological and neuroscientific standard data table, it can be used as a basic assessment program for the mental health problems that are emerging as a social issue, and it can be extended to the fields of emotion and psychotherapy and to IoT-linked technologies that use sensor devices.

In addition, the behavior patterns of an interviewee in the high-risk group for suicide are acquired in real time through the 3D camera and are analyzed and judged against standardized behavior pattern data prepared by experts in psychology and neuroscience, so that the psychological state can be evaluated objectively.

Moreover, the invention can be used effectively in a variety of fields. In particular, by providing an objective identification method based on motion analysis technology, it enables a psychologically and neuroscientifically objective evaluation of the interviewee's responses to each type of question.

Although this specification contains many specific implementation details, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of a particular invention.

The embodiments of the present invention disclosed in this specification and the drawings are merely specific examples presented to aid understanding and are not intended to limit the scope of the invention. It will be apparent to those of ordinary skill in the art to which the invention pertains that other modifications based on its technical idea can be carried out in addition to the embodiments disclosed herein.

100: measurement device 110: information acquisition device
111: subcomputer 112: 3D camera
120: measurement server (deleted) 200: analysis module
210: face analysis module 220: motion recognition module
230: voice recognition module 300: analysis server
310: standard data table S10: motion detection and voice recognition step
S20: specific-region and voice feature extraction step S21: geometric transformation and trimming step
S22: histogram equalization step S23: smoothing step
S24: mask step S25: voice waveform recognition step
S26: waveform feature extraction step S30: analysis step

Claims

A measurement device 100 comprising an information acquisition device 110, which includes a subcomputer equipped with a 3D camera for acquiring valid data by photographing the behavior of an interviewee in the high-risk group for suicide, and a measurement server 120, which is connected to the information acquisition device 110 wirelessly or by wire and stores the valid data from the input video;
An analysis module 200, connected to the measurement device 100 by wire or wirelessly so that the data for the face, facial expression, and behavior can be separated from the valid data it receives, comprising a face analysis module 210 that determines and identifies features by dividing a threshold by the average difference between the pixel values of the black and white regions, a motion recognition module 220 that is composed of action units to analyze the activity state of specific muscles, and a voice recognition module 230 that extracts features from the voice, designates the extracted minimum value (minimum pitch, Hz) and maximum value (maximum pitch, Hz), and extracts 12 features from the designated voice waveform;
An analysis server 300 that matches the separated images received from the analysis module 200 against pre-stored standardized data and compares and analyzes the valid data against the standard data table 310 so that the interviewee's psychological state can be analyzed and the results displayed, automatically generates "expression analysis" and "behavior analysis" graphs from the analysis results with the graph information for each item displayed clearly in a line graph, and quantifies the 12 features extracted from the voice waveform designated by the voice recognition module 230 and compares and analyzes the quantified data against pre-stored data for a patient group and a normal group;
An early screening system for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, characterized in that it comprises the above.

delete

According to claim 1,
The early screening system for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, characterized in that the face analysis module 210 calculates a feature value by subtracting the sum of the brightness of the black part from the sum of the brightness of the image pixels in the white part and, when searching videos and images, presumes that a region is the object to be identified depending on whether this value is greater or less than the threshold assigned to the feature.

According to claim 1,
The early screening system for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, characterized in that the motion recognition module 220 recognizes occurrence states, including brow lowering, lip corner pulling, lip corner depression, lip pressing, lip sucking, and blinking, and intensity states, including inner brow raising, outer brow raising, brow lowering, upper eyelid raising, cheek raising, nose wrinkling, upper lip raising, lip corner pulling, dimpling, lip corner depression, chin raising, lip stretching, lips parting, and jaw dropping.

According to claim 1,
The early screening system for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, characterized in that the 12 features extracted from the voice waveform are the pitch of the voice waveform, its slope, the number of pitch changes in the waveform, the number of syllables per second, the CID ratio, the number of pitches per utterance, the number of words per utterance, the number of content words per utterance, phonemic paraphasias, semantic paraphasias, the onset time of the subsequent utterance, inverted words, interjections, repeated words, and revised words.

According to claim 1,
The early screening system for high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, characterized in that the analysis server 300, based on the valid data supplied from the analysis module 200, provides a history of emotions over time in graph form, provides a link to the corresponding video for each emotion in the history, and provides analysis data from behavior recognition analysis.

A motion detection and voice recognition step (S10) of detecting facial expressions and head and hand movements in the video or image input from the measurement device 100 to acquire a video or image, and detecting the minimum and maximum values of the waveform from the voice in the analysis module;
A specific-region and voice feature extraction step (S20) of extracting a specific region from the valid data supplied by the motion detection and voice recognition step (S10) in order to identify the interviewee's emotional state, and extracting features from the voice waveform in the analysis module; and
An analysis step (S30) of judging and analyzing the interviewee's psychological state in the analysis server 300 by matching the region data supplied by the specific-region and voice feature extraction step (S20) against pre-stored standardized data;
A method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, performed in a system for early screening of high-risk groups of suicide, characterized in that it comprises the above steps.

8. The method of claim 7,
The motion detection and voice recognition step (S10),
Recognizes, through the 3D camera of the measurement device, occurrence states including brow lowering, lip corner pulling, lip corner depression, lip pressing, lip sucking, and blinking, and intensity states including inner brow raising, outer brow raising, brow lowering, upper eyelid raising, cheek raising, nose wrinkling, upper lip raising, lip corner pulling, dimpling, lip corner depression, chin raising, lip stretching, lips parting, and jaw dropping, and stores in the measurement server the voice waveform according to the minimum and maximum values from the input voice, in the method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information performed in a system for early screening of high-risk groups of suicide.

9. The method of claim 7,
The specific-region and voice feature extraction step (S20) comprises:
a geometric transformation and trimming step (S21) of using eye detection to rotate the face so that the two eyes are exactly level in order to make the face front-facing, scaling the face down so that the distance between the two eyes is always the same, and cropping the chin;
a histogram equalization step (S22) of dividing the left and right sides of the face to equalize the histogram in order to obtain standardized brightness and contrast values because the left and right faces may look like completely different faces depending on the lighting;
a smoothing step (S23) of smoothing the image with a bilateral filter, to reduce the pixel noise that may increase in the histogram equalization step (S22), while keeping boundary lines sharp;
A mask step (S24) of extracting only the face facing the front by covering the image smoothed in the smoothing step (S23) with an oval mask;
A voice waveform recognition step (S25) of detecting a voice waveform of a minimum value and a maximum value from the voice input through the microphone;
a waveform feature extraction step (S26) of extracting 12 features from the voice waveform extracted through the voice waveform recognition step (S25);
A method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, performed in a system for early screening of high-risk groups of suicide, characterized in that it comprises the above steps.

10. The method of claim 9,
The method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information, performed in a system for early screening of high-risk groups of suicide, characterized in that the features extracted in the waveform feature extraction step (S26) are the pitch of the voice waveform, its slope, the number of pitch changes in the waveform, the number of syllables per second, the CID ratio, the number of pitches per utterance, the number of words per utterance, the number of content words per utterance, phonemic paraphasias, semantic paraphasias, the onset time of the subsequent utterance, inverted words, interjections, repeated words, and revised words.

11. The method of claim 7,
The analysis step (S30)
compares and matches against the standard data table 310 the face image obtained from the valid data of the motion detection and voice recognition step (S10) through the geometric transformation and trimming step (S21), the histogram equalization step (S22), the smoothing step (S23), and the mask step (S24) of the specific-region and voice feature extraction step (S20), together with the data for the 12 features extracted from the voice waveform,
and compares the data for the features of the matched image and voice waveform with the data of the patient group and the normal group to extract and display objective information and interpretation data, in the method for early screening of high-risk groups of suicide using artificial intelligence (AI)-based video and audio information performed in a system for early screening of high-risk groups of suicide.