KR20240014941A - AI analysis system based on continuous but incomplete data

Info

Publication number: KR20240014941A
Application number: KR1020220092739A
Authority: KR
Inventors: 정태명; 김문현; 유주헌
Original assignee: 주식회사 히포티앤씨; 성균관대학교산학협력단
Priority date: 2022-07-26
Filing date: 2022-07-26
Publication date: 2024-02-02

Abstract

The present invention relates to a system including artificial intelligence that compensates for incompleteness caused by missing values when multi-modal data is acquired as a time series.
The artificial intelligence analysis system based on incomplete continuous data according to the present invention can analyze data even when values are missing in a multi-modal data environment.

Description

AI analysis system based on continuous but incomplete data

The present invention relates to an artificial intelligence analysis system based on incomplete continuous data.

Recently, technologies utilizing artificial intelligence (AI) have been actively developed, and securing data is an essential prerequisite for utilizing artificial intelligence.

In fields that require various types of data for analysis, for example psychiatry, a specific symptom can be diagnosed by combining information entered after observing interviews, behavior, and conversation with video data in which the behavior can be checked.

As prior art, techniques have been disclosed for substituting data that is acquired sequentially over time when part of it is missing. In this regard, Republic of Korea Patent Publication No. 10-2020-0030303 has been disclosed.

However, such prior art is limited in that it cannot analyze data in the combined situation where, in a continuous multi-modal data environment collected at time intervals, one or more entire modal data at a specific point in time are missing, or some attributes within modal data are missing or corrupted by noise.

Republic of Korea Patent Publication No. 10-2020-0030303

An object of the present invention is to provide an artificial intelligence analysis system that compensates for incomplete data in continuous data collected at time intervals and performs analysis based on the compensated data.

As a means of solving the above problem, there may be provided an artificial intelligence analysis system based on incomplete continuous data, including: a data collection unit that acquires continuous multi-modal data obtained order by order at time intervals; an input data generation unit that extracts features of the individual data included in the multi-modal data acquired at each order, converts them into individual vectors, and concatenates the individual vectors to generate an input vector containing a multi-modal vector; a missingness compensation artificial intelligence unit that, when the input vector contains missing values, compensates for them to generate a supplemented input vector; and an analysis artificial intelligence unit that receives and analyzes the supplemented input vector and outputs a result vector. The missingness compensation artificial intelligence unit is trained by unsupervised learning on training data generated by substituting arbitrary missing values into multi-modal vectors of a plurality of orders that have no missing values, and its weights are adjusted by applying an error backpropagation algorithm.

Meanwhile, the missingness compensation artificial intelligence unit may include an input layer, a hidden layer, and an output layer, where the input layer and the hidden layer are fully connected, the hidden layer and the output layer are fully connected, and the activation function of the nodes of the hidden layer and the output layer is a non-linear function.

Meanwhile, the training data for the missingness compensation artificial intelligence unit may be generated by selecting an arbitrary number of values in multi-modal vectors of a plurality of orders that have no missing values and replacing them with 0.

In this case, the error function of the missingness compensation artificial intelligence unit may be the mean squared error, defined in terms of n, the number of training data; m, the dimension of the input vector; the j-th attribute of the i-th input vector; and the j-th attribute value of the output vector obtained when the i-th training datum is fed in.

Alternatively, the error function of the missingness compensation artificial intelligence unit may be the cross entropy, defined in terms of the same quantities: n, the number of training data; m, the dimension of the input vector; the j-th attribute of the i-th input vector; and the j-th attribute value of the output vector obtained when the i-th training datum is fed in.

Meanwhile, the missingness compensation artificial intelligence unit may include a recurrent connection in the hidden layer so that the characteristics of the time series are reflected.

Meanwhile, in the missingness compensation artificial intelligence unit, any one node of the hidden layer may be recurrently connected to each of the other nodes.

In addition, when a node of the hidden layer is recurrently connected to each of the other nodes, each connection may have its own weight.
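The recurrent hidden layer described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name `recurrent_hidden_step`, the tanh activation, the weight values, and the choice of a zero diagonal in `w_rec` (no self-loop) are not taken from the source, which states only that each node-to-node recurrent connection may have its own weight.

```python
import math
from typing import List


def recurrent_hidden_step(
    x: List[float],
    h_prev: List[float],
    w_in: List[List[float]],
    w_rec: List[List[float]],
) -> List[float]:
    """One time step: h[k] = tanh(sum_i w_in[k][i]*x[i] + sum_l w_rec[k][l]*h_prev[l])."""
    h = []
    for k in range(len(h_prev)):
        z = sum(w * xi for w, xi in zip(w_in[k], x))          # feed-forward part
        z += sum(w * hl for w, hl in zip(w_rec[k], h_prev))   # recurrent part
        h.append(math.tanh(z))
    return h


# Hypothetical weights: w_rec[k][l] is the weight from hidden node l to hidden node k.
# Every off-diagonal entry is distinct; the zero diagonal (no self-loop) is an assumption.
w_in = [[0.5, -0.2], [0.1, 0.3]]
w_rec = [[0.0, 0.4], [-0.3, 0.0]]

h = [0.0, 0.0]
for x_t in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):  # three consecutive orders
    h = recurrent_hidden_step(x_t, h, w_in, w_rec)
print(h)
```

Because the hidden state from the previous order enters the current step, the value computed for one order depends on what was observed at earlier orders, which is how a recurrent connection can reflect time-series structure.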

Meanwhile, the multi-modal data may include at least two of image, voice, text, sensor values, game data, and video feature values.

Meanwhile, the multi-modal data may be medical data obtained during the diagnosis and treatment of a disease.

The artificial intelligence analysis system based on incomplete continuous data according to the present invention can analyze data even when values are missing in a multi-modal data environment.

FIG. 1 is a block diagram of an artificial intelligence analysis system based on incomplete continuous data according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the concept of data processing in an artificial intelligence analysis system based on incomplete continuous data according to an embodiment of the present invention.
FIG. 3 is a conceptual diagram showing the function of the input data generation unit.
FIG. 4 is a conceptual diagram illustrating how the input data generation unit derives an input vector from multi-modal data.
FIG. 5 is a conceptual diagram showing the function of the missingness compensation artificial intelligence unit.
FIG. 6 is a conceptual diagram illustrating the input vector that is input to the missingness compensation artificial intelligence unit.
FIG. 7 is a conceptual diagram of a data set showing examples of multi-modal data, an input vector, and a supplemented input vector.
FIG. 8 is a diagram showing the neural network of the missingness compensation artificial intelligence unit.
FIG. 9 is a diagram showing the neural network of the missingness compensation artificial intelligence unit in another embodiment of the present invention.

Hereinafter, an artificial intelligence analysis system based on incomplete continuous data according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings. In the description of the following embodiments, each component may be known by a different name in the art. However, where components are functionally similar or identical, they may be regarded as equivalent configurations even if a modified embodiment is adopted. Reference numerals attached to the components are given for convenience of description, and what is shown in the drawings bearing those numerals does not limit the components to the scope of the drawings. Likewise, even if an embodiment partially modifies a configuration shown in the drawings, it may be regarded as an equivalent configuration where it is functionally similar or identical. Description is omitted for components that a person of ordinary skill in the art would recognize as necessarily included.

FIG. 1 is a block diagram of an artificial intelligence analysis system based on incomplete continuous data according to an embodiment of the present invention.

Referring to FIG. 1, the artificial intelligence system 1 based on incomplete continuous data according to an embodiment of the present invention is configured so that, when data acquired in various modes contains missing values, the missing values are compensated for and the data can be analyzed.

In this disclosure, multi-modal data refers to data acquired in various ways. As an example, it may be data serving as various indicators for assessing a patient's symptoms. In this case, the multi-modal data may be image data, voice data, text data, sensor data, game data, video data, and the like. The multi-modal data may include information about the time at which it was acquired.

Image data may include information about an image acquired from a patient and information about the time at which the image was acquired. As an example, the image data may be an X-ray image or a CT image of the human body.

Voice data may include information such as the patient's vocalization, conversation content, and tone of voice.

Text data may include information entered by medical staff or a guardian after observing the patient. As an example, it may include identifying information such as the patient's age and name, and it may also include observation results for questionnaires already used in clinical practice. In this case, for example, the degree of the patient's inattention may be observed and entered as a number.

Sensor data may include information obtained from a patient using various types of sensors. As an example, sensor data may be the patient's body temperature, weight, blood pressure, electrocardiogram, and the like.

Game data may include content for exposing the patient to a specific situation and observing the reaction or requesting a specific input. It may also include data obtained from the user when the user is exposed to specific content while playing the game.

Video data is a sequence of images and may be, as an example, an observation video of a patient. An observation video may be, for example, a recording of the patient's behavior made in psychiatry to diagnose conditions such as ADHD or autism or to judge their severity. In this case, the video data may include information about the patient's behavior, facial expression, posture, and the like. Video data may also include video acquired from the human body, such as ultrasound diagnostic video or angiography.

In short, multi-modal data may be comprehensive information containing various clues for diagnosing the symptoms of a specific subject.

Although the above example concerns diagnosis related to mental health in the medical field, embodiments of the present invention are not limited thereto, and the multi-modal data may be various kinds of data for a specific analysis.

Referring again to FIG. 1, the artificial intelligence analysis system based on incomplete continuous data according to the present invention may include a data collection unit 10 and a server 20.

The data collection unit 10 is configured to receive various kinds of information. The data collection unit 10 may include a camera, a scanner, or the like for acquiring image and video data, and a microphone for acquiring voice data. The data collection unit 10 may also obtain information about content by extracting data from a game program. That is, the data collection unit 10 refers broadly to various means of acquiring at least one kind of data over time from devices such as cameras, microphones, and other sensors.

The server 20 may include a database unit 100, an input data generation unit 200, a missingness compensation artificial intelligence unit 300, and an analysis artificial intelligence unit 400. The server 20 may include a processor, memory, a communication module, and the like. The server 20 may receive data by communicating with the data collection unit 10 described above. Since the server 20 may be built from widely known components, further detailed description is omitted.

The database unit 100 is configured to store the multi-modal data transmitted from the data collection unit 10. The data is stored as multi-modal data and may be organized into a database after time synchronization of each piece of data. The data received by the database unit 100 is original data, which is rarely complete over the whole course of time. Inaccurate or missing values may arise from sensor malfunction, missed sensing, noise, and the like. The original data may therefore be incomplete and contain such missing values.

The input data generation unit 200 divides the original data into orders according to a predetermined time interval and extracts multi-modal data at each order. It then extracts features from each individual data item included in the multi-modal data at each order and converts them into individual vectors.

Individual vectorization may, for example, preprocess voice data, apply a scale conversion, extract the result in the form of a heat map, and then vectorize it.

The input data generation unit 200 generates a multi-modal vector by concatenating the individual vectors of each time order in a predetermined sequence. This function of the input data generation unit 200 can be understood as preprocessing performed before data is input to the missingness compensation artificial intelligence unit 300 and the analysis artificial intelligence unit 400. When the original data contains a missing value, the input data generation unit 200 replaces the corresponding vector value with 0 when generating the multi-modal vector.
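The concatenation and zero substitution described above might look like the following minimal sketch. The modality names, the feature widths in `MODALITY_DIMS`, and the function `build_multimodal_vector` are hypothetical and chosen only for illustration; the source does not specify vector sizes or an ordering.

```python
from typing import Dict, List, Optional

# Hypothetical fixed ordering and feature width per modality (assumed, not from the source).
MODALITY_DIMS: Dict[str, int] = {"image": 3, "audio": 2, "text": 2, "sensor": 2}


def build_multimodal_vector(
    features: Dict[str, Optional[List[Optional[float]]]]
) -> List[float]:
    """Concatenate per-modality vectors in a fixed order, writing 0 where data is missing."""
    vector: List[float] = []
    for name, dim in MODALITY_DIMS.items():
        values = features.get(name)
        if values is None:
            # The whole modality is missing at this order: fill its slot with zeros.
            vector.extend([0.0] * dim)
            continue
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        # A single attribute missing inside a modality is also replaced with 0.
        vector.extend(0.0 if v is None else float(v) for v in values)
    return vector


# One order where image and sensor data are absent and one text attribute is missing.
x_t = build_multimodal_vector(
    {"image": None, "audio": [0.4, 0.1], "text": [1.0, None], "sensor": None}
)
print(x_t)
```

Keeping the modality order and widths fixed means every order produces a vector of the same length, so a missing modality leaves a block of zeros in a known position.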

The missingness compensation artificial intelligence unit 300 is configured to remove the influence of missing values in the multi-modal vector and to compensate for them.

The missingness compensation artificial intelligence unit 300 compensates for missing values by combining the multi-modal vectors of a predetermined span of time into one; for example, with three orders, the multi-modal vectors for t-2, t-1, and t are combined. The amount of data input at once to the missingness compensation artificial intelligence unit 300, that is, the number of orders whose multi-modal vectors are selected, may be determined by windowing. A moving window slides along the orders: for example, the multi-modal data at t-3, t-2, and t-1 is input first, and the multi-modal data at t-2, t-1, and t is input next. In other words, when data is stored order by order over time, the missingness compensation artificial intelligence unit 300 can receive the data of a run of consecutive time intervals.

When multi-modal vectors are selected and input sequentially according to the moving window, the missingness compensation artificial intelligence unit 300 compensates for the missing values to generate a supplemented output vector, which becomes the input to the analysis artificial intelligence unit 400.

The missingness compensation artificial intelligence unit 300 may be trained in advance. It may be trained by unsupervised learning. The training data may include multi-modal vectors for a plurality of orders over time. The multi-modal vectors used for training are based on complete data with no missing values, and missing values are generated arbitrarily at a random number of random positions within them. As a result, the input data contains values of 0 at randomly chosen positions.
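Generating such training data can be sketched as follows. The function name `make_training_pair`, the example values, and the choice to zero between 1 and len-1 positions are assumptions for illustration; the source says only that a random number of values at random positions are replaced with 0. Using the complete vector as the reconstruction target resembles what is commonly called a denoising autoencoder setup, which is a characterization added here rather than a term used in the source.

```python
import random
from typing import List, Tuple


def make_training_pair(
    clean: List[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Return (corrupted, target): random positions zeroed, complete vector kept as target."""
    k = rng.randint(1, len(clean) - 1)            # how many values to blank out
    positions = rng.sample(range(len(clean)), k)  # which positions to blank out
    corrupted = list(clean)
    for p in positions:
        corrupted[p] = 0.0
    return corrupted, list(clean)


rng = random.Random(0)                       # seeded only to make the sketch repeatable
clean = [0.2, 0.7, 0.5, 0.9, 0.3, 0.6]       # a complete (window-concatenated) vector
corrupted, target = make_training_pair(clean, rng)
print(corrupted)
print(target)
```

No labels are needed because the target is the complete vector itself, which is consistent with the unsupervised training described above.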

The input data generated by the above process is input to the missingness compensation artificial intelligence unit 300 to obtain a supplemented input vector. The weights are then adjusted by applying the error backpropagation algorithm. The error function used here is the mean squared error (MSE).
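A weight update by error backpropagation for a network with one fully connected hidden layer can be sketched as below. This is a toy illustration, not the patented implementation: the class `TinyImputer`, the sigmoid activation, the layer sizes, the learning rate, the per-sample update schedule, and the data are all assumptions.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyImputer:
    """Input -> hidden -> output, fully connected, sigmoid on hidden and output nodes."""

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def forward(self, x: List[float]) -> Tuple[List[float], List[float]]:
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(self.w2, self.b2)]
        return h, y

    def train_step(self, x: List[float], target: List[float], lr: float) -> float:
        h, y = self.forward(x)
        # Output error signal for squared error through a sigmoid
        # (the constant factor 2/m of the MSE gradient is folded into lr).
        d_out = [(yj - tj) * yj * (1.0 - yj) for yj, tj in zip(y, target)]
        # Propagate the error back to the hidden layer before w2 is changed.
        d_hid = [hk * (1.0 - hk) * sum(d_out[j] * self.w2[j][k] for j in range(len(y)))
                 for k, hk in enumerate(h)]
        for j, dj in enumerate(d_out):
            for k, hk in enumerate(h):
                self.w2[j][k] -= lr * dj * hk
            self.b2[j] -= lr * dj
        for k, dk in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[k][i] -= lr * dk * xi
            self.b1[k] -= lr * dk
        return sum((yj - tj) ** 2 for yj, tj in zip(y, target)) / len(target)


# Toy data: two complete vectors, each paired with copies that have one value zeroed.
clean = [[0.9, 0.8, 0.1, 0.2], [0.1, 0.2, 0.9, 0.8]]
pairs = []
for vec in clean:
    for pos in range(len(vec)):
        masked = list(vec)
        masked[pos] = 0.0
        pairs.append((masked, vec))

model = TinyImputer(n_in=4, n_hidden=3, rng=random.Random(0))
history = []
for epoch in range(1000):
    losses = [model.train_step(x, t, lr=0.5) for x, t in pairs]
    history.append(sum(losses) / len(losses))

first_loss, last_loss = history[0], history[-1]
print(f"mean loss: first epoch {first_loss:.4f}, last epoch {last_loss:.4f}")
```

The network is asked to reproduce the complete vector from its zero-filled copy, so the error it backpropagates is measured against the original, unmasked values.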

The error function may also be the cross entropy.

Here, n is the number of input data, m is the dimension of the input vector X, and the error is computed from the j-th attribute of the i-th input data and the j-th attribute value output by the missingness compensation artificial intelligence unit when the i-th input data is fed in.
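The formulas themselves are not reproduced in this text. For orientation only, the conventional forms of the two error functions under the definitions above can be written as follows. The symbols x_{ij} and \hat{x}_{ij}, the 1/n normalization of the MSE, and the element-wise (binary) form of the cross entropy are assumptions introduced here and may differ from the expressions in the original publication.

```latex
% Assumed notation (not from the source):
%   x_{ij}       : j-th attribute of the i-th input vector
%   \hat{x}_{ij} : j-th attribute value output by the compensation network for the i-th input
E_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{m} \left( x_{ij} - \hat{x}_{ij} \right)^{2}

E_{\mathrm{CE}} = - \sum_{i=1}^{n} \sum_{j=1}^{m}
  \left[ x_{ij} \log \hat{x}_{ij} + \left( 1 - x_{ij} \right) \log \left( 1 - \hat{x}_{ij} \right) \right]
```

The cross-entropy form shown presupposes attribute values scaled to the interval [0, 1] and outputs strictly between 0 and 1, for example from a sigmoid output layer.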

The analysis artificial intelligence unit 400 may receive the supplemented output vector from the missingness compensation artificial intelligence unit 300 and output a result vector of the analysis.

The analysis artificial intelligence unit 400, trained by supervised learning, can derive meaningful result values that can be judged from the supplemented output vector. For example, when trained to produce results related to mental health, it may output an analysis result vector that classifies symptoms such as depression and autism when a supplemented output vector is input. Since various known types of neural networks may be used for the analysis artificial intelligence unit 400, detailed description is omitted.

FIG. 2 is a block diagram illustrating the concept of data processing in the artificial intelligence analysis system 1 based on incomplete continuous data according to an embodiment of the present invention.

Referring to FIG. 2, the multi-modal original data acquired by the data collection unit passes through the input data generation unit 200, where the individual data of each order are converted into feature vectors and assembled to generate an input vector (X) containing the multi-modal vector. When generating the multi-modal vector (X), the input data generation unit 200 substitutes the value ‘0’ wherever individual data is missing at an order or a variable is missing within one individual data item.

The missingness compensation artificial intelligence unit 300 then receives the input vector (X) and compensates for the missing values in the multi-modal vector that had been replaced with ‘0’, generating a supplemented input vector (X’).

The analysis artificial intelligence unit 400 receives and analyzes the supplemented input vector (X’) and outputs a result vector (Y), completing the analysis.

FIG. 3 is a conceptual diagram showing the function of the input data generation unit.

Referring to FIG. 3, the original data may include multi-modal data for each order (t-2, t-1, t) over time.

For each order, for example at time t-2, the input data generation unit 200 performs individual feature vectorization on each of the image features, voice features, text data, sensor data, game data, and video features, and assembles the results to generate the input vector at time t-2. It likewise generates the input vector (t-1) at time t-1 and the input vector (t) at time t.

Although the generation of input vectors by the input data generation unit 200 has been described for three orders, the same process may be performed at any number of points over time.

FIG. 4 is a conceptual diagram illustrating how the input data generation unit derives an input vector from multi-modal data.

Referring to FIG. 4, as described with reference to FIG. 3, the input data generation unit 200 uses the original data to generate an input vector containing the multi-modal vector; where a value is missing, it is replaced with ‘0’. As an example, if the image feature data and the sensor data are missing at time t, the values of the corresponding feature vectors are replaced with 0.

FIG. 5 is a conceptual diagram showing the function of the missingness compensation artificial intelligence unit, and FIG. 6 is a conceptual diagram illustrating the input vector that is input to the missingness compensation artificial intelligence unit.

Referring to FIG. 5, input vectors containing multi-modal vectors for each of the orders (t-4, t-3, t-2, t-1, t) may be input to the missingness compensation artificial intelligence unit 300. The missingness compensation artificial intelligence unit 300 may receive, through a moving window, the input vectors of a predetermined number of adjacent orders, for example three. As an example, referring to FIG. 6, the three input vectors selected by window 2 in FIG. 5 (the input vectors at times t-3, t-2, and t-1) may be concatenated in sequence and input to the missingness compensation artificial intelligence unit 300.

That is, in window 1, the input vectors converted at times t-4, t-3, and t-2 are received together and their missing values are compensated. The window then moves on by one order: window 2 receives the input vectors converted at times t-3, t-2, and t-1 together and compensates for any missing values. By repeating this process over the many time points acquired as time passes, the missingness compensation artificial intelligence unit 300 generates supplemented input vectors.
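The moving window over five orders can be sketched as follows. The function name `moving_windows` and the two-element stand-in vectors are illustrative assumptions; list indices 0 to 4 stand for the orders t-4 to t.

```python
from typing import List


def moving_windows(vectors: List[List[float]], window: int = 3) -> List[List[float]]:
    """Concatenate each run of `window` consecutive order vectors into one network input."""
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs: List[List[float]] = []
    for start in range(len(vectors) - window + 1):
        joined: List[float] = []
        for vec in vectors[start:start + window]:
            joined.extend(vec)
        inputs.append(joined)
    return inputs


# Five orders (stand-ins for t-4 .. t), each with a two-element input vector.
orders = [[float(t), float(t) + 0.5] for t in range(5)]
windows = moving_windows(orders, window=3)

for number, joined in enumerate(windows, start=1):
    print(f"window {number}: {joined}")
```

With five orders and a window of three there are three windows; the second one joins the vectors at indices 1, 2, and 3, which corresponds to window 2 covering t-3, t-2, and t-1.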

도 7은 멀티 모달 데이터, 입력 벡터 및 보완된 입력 벡터의 예를 도시한 데이터 셋의 개념도이다.Figure 7 is a conceptual diagram of a data set showing examples of multi-modal data, input vectors, and complemented input vectors.

도 7을 참조하면, 결측 보완 인공지능부에 의해 입력 벡터(X)내에 0으로 대체되었던 영상특징(t)에 대한 데이터와 센서데이터(t)의 결측치가 보완된 입력벡터(X’)에서는 보완이 되어 있다. 결국 결측 보완 인공지능부에 의해 결측되었던 데이터가 시계열적인 관계를 이용하여 보완되고 완전한 멀티 모달 벡터를 생성할 수 있게 된다. 이와 같이 보완된 멀티 모달 벡터를 포함하는 보완된 입력벡터는 분석 인공지능부에 입력될 수 있으며, 분석 인공지능부는 이를 근거로 결과벡터를 출력할 수 있게 된다.Referring to FIG. 7, the data for the image feature (t), which was replaced with 0 in the input vector (X) by the missingness compensation artificial intelligence unit, and the missing values of the sensor data (t) are supplemented in the input vector (X') This has been done. Ultimately, the missing data is complemented by the missingness correction artificial intelligence unit using time series relationships, and a complete multi-modal vector can be created. The supplemented input vector including the supplemented multi-modal vector can be input to the analysis artificial intelligence unit, and the analysis artificial intelligence unit can output a result vector based on this.

본 개시에 의한 일 실시예로서, 환자가 1주일 주기로 진단을 위한 검사를 진행한다고 가정할 때 시간 간격은 1주일이 될 수 있다. 이때 1차 검사시 CT 촬영, 몸무게, X-ray 촬영을 수행하고, 이후 2차 검사에서 1차 검사시와 동일하게 검사가 수행되지 않을 수 있다. 일 예로서 2차 검사에서 CT 촬영을 제외한 몸무게, X-ray 촬영을 수행할 수 있다. 이 경우 누락된 CT 촬영 데이터가 결측치 보완 인공지능부에 의하여 추정한 CT 촬영 데이터를 보완될 수 있다.As an example according to the present disclosure, assuming that a patient undergoes a diagnostic test every week, the time interval may be one week. At this time, CT scan, weight, and As an example, weight and X-ray imaging may be performed in the secondary examination, excluding CT scanning. In this case, the missing CT scan data can be supplemented with the CT scan data estimated by the missing value supplement artificial intelligence unit.

In addition, the present disclosure does not require the multi-modal data to be acquired at equal time intervals. That is, even if the time intervals between the first, second, and third rounds of multi-modal data acquisition differ, the missingness compensation artificial intelligence unit can still compensate for the missing values.

As an example, when data obtained from an ADHD patient over three rounds contains missing values, the present disclosure can compensate for them. The time interval may differ from round to round; for example, the interval between the first and second rounds may be 10 days and the interval between the second and third rounds 14 days. As a specific example, the patient may perform diagnostic game 1, diagnostic game 2, and diagnostic game 3 in the first round, diagnostic game 2 and diagnostic game 3 in the second round, and diagnostic game 1 and diagnostic game 3 in the third round. The missing game data is then the data for diagnostic game 1 in the second round and the data for diagnostic game 2 in the third round. According to the present disclosure, these missing data are estimated to complete the data set, and the patient's condition is finally analyzed to derive a result.
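The ADHD example can be restated as a small Python sketch that lists which diagnostic-game data are missing in each round and the intervals between rounds. The record layout and the day offsets 0, 10, and 24 are assumptions chosen to match the 10-day and 14-day intervals in the example.

```python
from typing import List, Tuple

# Each round records the day it took place and the diagnostic games performed.
rounds = [
    {"day": 0, "games": {1, 2, 3}},   # round 1
    {"day": 10, "games": {2, 3}},     # round 2: game 1 not performed
    {"day": 24, "games": {1, 3}},     # round 3: game 2 not performed
]
all_games = {1, 2, 3}

# (round number, game number) pairs whose data must be estimated.
missing: List[Tuple[int, int]] = [
    (index + 1, game)
    for index, record in enumerate(rounds)
    for game in sorted(all_games - record["games"])
]

# Intervals between consecutive rounds; they need not be equal.
intervals = [later["day"] - earlier["day"] for earlier, later in zip(rounds, rounds[1:])]
```

Here `missing` identifies the entries that the missingness compensation artificial intelligence unit would be asked to estimate.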

Hereinafter, the structure of the missingness compensation artificial intelligence unit described above will be described with reference to FIGS. 8 and 9.

FIG. 8 is a diagram showing the neural network of the missingness compensation artificial intelligence unit.

Referring to FIG. 8, the missingness compensation artificial intelligence unit (300) may include an input layer, a hidden layer, and an output layer. The input layer and the hidden layer are fully connected, and the hidden layer and the output layer may also be fully connected. The activation function of the nodes in the hidden layer and the output layer is a non-linear function.

Meanwhile, in the missingness compensation artificial intelligence unit (300), the input vector (X) and the complemented input vector (X’) have the same dimension. That is, the input layer and the output layer have the same number of nodes.

The complemented input vector (X’) output by the missingness compensation artificial intelligence unit (300) has its missing values imputed. For example, if the j-th element X(j) of the input vector (X) is a missing value, the j-th element X’(j) of the complemented input vector (X’) holds the imputed value.
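The layer structure described with reference to FIG. 8 can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions, not the disclosed implementation: the sigmoid activation, the hidden-layer size of 3, the random initial weights, and the helper names `dense` and `init_layer` are all hypothetical. The weights are untrained, so the output values are meaningless; the sketch only shows that the output (X’) has the same dimension as the input (X).

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Non-linear activation (assumed; the text only requires non-linearity)."""
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    """Create small random weights and zero biases for a fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer: every input node feeds every output node."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


rng = random.Random(0)
m, hidden_size = 5, 3                      # m: dimension of the input vector
w_hidden, b_hidden = init_layer(m, hidden_size, rng)
w_out, b_out = init_layer(hidden_size, m, rng)

x = [0.3, 0.7, 0.0, 0.0, 0.0]              # last three entries are zero-filled missing values
x_prime = dense(dense(x, w_hidden, b_hidden), w_out, b_out)
```

After training, the entries of `x_prime` at the positions that were missing in `x` would serve as the imputed values X’(j).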

FIG. 9 is a diagram showing the neural network of the missingness compensation artificial intelligence unit in another embodiment of the present invention.

Referring to FIG. 9, in another embodiment of the present invention, a recurrent connection may be added to the missingness compensation artificial intelligence unit (300), similar to a recurrent neural network (RNN).

The recurrent connection is provided so that the hidden layer of the missingness compensation artificial intelligence unit (300) reflects the characteristics of the time series.

For this purpose, a hidden layer s_t (state) and a hidden layer s_{t-1} are provided, and the hidden layer s_{t-1} is configured to have the same number of nodes as the hidden layer s_t.

Each node of the hidden layer s_t is connected to exactly one node of the hidden layer s_{t-1} (connections W1, W2, ..., Wn, collectively W), and each of these connections has a weight of 1. That is, the value of each node of the current hidden layer is passed unchanged to the corresponding node of the previous-state hidden layer. Each node of the hidden layer s_{t-1} is in turn connected to every node of the hidden layer s_t; that is, the connection (U) from the hidden layer s_{t-1} to the hidden layer s_t is fully connected. Each connection between a node of the hidden layer s_{t-1} and a node of the hidden layer s_t has its own weight.
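The recurrent structure can be sketched as a single update step in plain Python. The connection names U and W follow the description; the input-to-hidden weight matrix `V`, the tanh activation, and all numeric values are assumptions made for illustration only.

```python
import math
from typing import List

Matrix = List[List[float]]


def recurrent_step(x_t: List[float], context: List[float], V: Matrix, U: Matrix) -> List[float]:
    """Compute the hidden state s_t from the input x_t and the stored state s_{t-1}.

    V: input-to-hidden weights (hypothetical name).
    U: fully connected weights from s_{t-1} to s_t, one weight per node pair.
    """
    state = []
    for v_row, u_row in zip(V, U):
        z = sum(v * x for v, x in zip(v_row, x_t)) + sum(u * c for u, c in zip(u_row, context))
        state.append(math.tanh(z))
    return state


V = [[0.5, -0.2], [0.1, 0.4]]
U = [[0.3, 0.0], [-0.1, 0.2]]

context = [0.0, 0.0]          # s_{t-1}; same number of nodes as s_t
states = []
for x_t in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]:
    s_t = recurrent_step(x_t, context, V, U)
    states.append(s_t)
    # Connection W: one-to-one copy with weight 1 from s_t into s_{t-1}.
    context = list(s_t)
```

Because the copy uses a fixed weight of 1, only `U` (and `V`) would carry trainable weights in this sketch.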

As described above, the artificial intelligence analysis system based on incomplete continuous data according to the present invention can analyze data even when values are missing in a multi-modal data environment, which improves analysis accuracy.

20: Server
100: Database unit
200: Input data generation unit
300: Missingness compensation artificial intelligence unit
400: Analysis artificial intelligence unit

Claims

An artificial intelligence analysis system based on incomplete continuous data, comprising:
a data collection unit that acquires continuous multi-modal data obtained in successive rounds at time intervals;
an input data generation unit that extracts features of the individual data included in the multi-modal data obtained in each round, converts them into individual vectors, and concatenates the individual vectors to generate an input vector including a multi-modal vector;
a missingness compensation artificial intelligence unit that, when the input vector contains a missing value, compensates for the missing value to generate a complemented input vector; and
an analysis artificial intelligence unit that receives and analyzes the complemented input vector and outputs a result vector,
wherein the missingness compensation artificial intelligence unit is trained by unsupervised learning on training data generated by inserting random missing values into multi-modal vectors of a plurality of rounds that have no missing values,
and its weights are adjusted by applying an error backpropagation algorithm.

The artificial intelligence analysis system based on incomplete continuous data according to claim 1,
wherein the missingness compensation artificial intelligence unit
includes an input layer, a hidden layer, and an output layer,
the input layer and the hidden layer are fully connected,
the hidden layer and the output layer are fully connected, and
the activation function of the nodes of the hidden layer and the output layer is a non-linear function.

The artificial intelligence analysis system based on incomplete continuous data according to claim 2,
wherein the training data for training the missingness compensation artificial intelligence unit
is generated by selecting a random number of values in the multi-modal vectors of the plurality of rounds that have no missing values and replacing them with 0.

The artificial intelligence analysis system based on incomplete continuous data according to claim 3,
wherein the error function of the missingness compensation artificial intelligence unit is
[equation not reproduced in the source text]
where n is the number of training data, m is the dimension of the input vector,
[symbol not reproduced] is the j-th attribute of the i-th input vector, and
[symbol not reproduced] is the j-th attribute value of the output vector when the i-th training data is input.

The artificial intelligence analysis system based on incomplete continuous data according to claim 3,
wherein the missingness compensation artificial intelligence unit
includes a recurrent connection so that the hidden layer reflects the characteristics of the time series.

The artificial intelligence analysis system based on incomplete continuous data according to claim 6,
wherein, in the missingness compensation artificial intelligence unit,
any one node of the hidden layer is recurrently connected to each of the remaining nodes.

The artificial intelligence analysis system based on incomplete continuous data according to claim 7,
wherein each recurrent connection from any one node of the hidden layer to each of the remaining nodes has its own weight.

The artificial intelligence analysis system based on incomplete continuous data according to claim 1,
wherein the multi-modal data includes at least two of video, voice, text, sensor values, game data, and image feature values.

The artificial intelligence analysis system based on incomplete continuous data according to claim 1,
wherein the multi-modal data
is medical data obtained during the diagnosis and treatment of a disease.