KR102431205B1

KR102431205B1 - Apparatus for generating training data and System for symptom diagnosis to which the Artificial Intelligence data training is applied

Info

Publication number: KR102431205B1
Application number: KR1020220006098A
Authority: KR
Inventors: 임재현; 김원표
Original assignee: 루먼랩 주식회사
Priority date: 2022-01-14
Filing date: 2022-01-14
Publication date: 2022-08-11
Also published as: KR102431205B9

Abstract

Disclosed are an apparatus for generating training data to predict mental illnesses and symptoms and a symptom diagnosis system and method to which training data is applied. According to one embodiment of the present invention, the apparatus for generating training data may include: a data receiving unit receiving biological data from a wearable device worn by a user; a data analysis unit extracting at least one feature value from the biological data; and a learning data generation unit generating training data by performing labeling on analysis data generated based on the at least one feature value.

Description

Apparatus for generating training data and System for symptom diagnosis to which the Artificial Intelligence data training is applied

The present invention relates to an apparatus for generating training data for predicting mental disorders and symptoms by training machine learning and deep learning models on biometric data collected from a user's wearable device, and to a symptom diagnosis system and method to which artificial intelligence data training is applied.

Conventionally, children's mental disorders and symptoms were diagnosed through examination by a specialist, based on biometric data measured with an invasive apparatus such as polysomnography (PSG).

Recently, a user's basic biometric data has been collected non-invasively through wearable devices, and various diagnostic methods using such data are being studied. Patent Publication No. 10-2018-0029517, for example, discloses a method of using a wearable device to diagnose attention deficit disorder.

An object of the present invention is to provide an apparatus for generating training data for predicting mental disorders and symptoms, and a symptom diagnosis system and method to which the training data is applied.

According to one aspect, an apparatus for generating training data may include: a data receiving unit that receives biometric data from a wearable device worn by a user; a data analysis unit that extracts one or more feature values from the biometric data; and a training data generation unit that generates training data by labeling analysis data generated based on the one or more feature values.

The biometric data may include data on at least one of heart rate, step count, metabolic rate (METs, Metabolic rates), calorie consumption, and sleep state.

The data analysis unit may generate an estimation function y(t) by performing cosinor analysis on data on at least one of the heart rate and the step count included in the biometric data.

The data analysis unit may calculate a goodness of fit (GOF) based on the estimation function.

The training data generation unit may perform labeling by comparing a checklist containing feature value information for each predetermined symptom with the feature values included in the analysis data, thereby determining whether the analysis data corresponds to the symptom.

The training data generation unit may split the training data into training, validation, and test data according to a predetermined ratio.

According to one aspect, a method for generating training data may include: receiving biometric data from a wearable device worn by a user; a data analysis step of extracting one or more feature values from the biometric data; and a training data generation step of generating training data by labeling analysis data generated based on the one or more feature values.

In the data analysis step, an estimation function y(t) may be generated by performing cosinor analysis on data on at least one of the heart rate and the step count included in the biometric data.

In the data analysis step, a goodness of fit (GOF) may be calculated based on the estimation function.

In the training data generation step, labeling may be performed by comparing a checklist containing feature value information for each predetermined symptom with the feature values included in the analysis data, thereby determining whether the analysis data corresponds to the symptom.

In the training data generation step, the training data may be split into training, validation, and test data according to a predetermined ratio.

According to one aspect, a symptom diagnosis system may include a symptom diagnosis apparatus and a wearable device.

According to one aspect, the symptom diagnosis apparatus may include: a data receiving unit that receives biometric data from a wearable device worn by a user; a data analysis unit that extracts one or more feature values from the biometric data; and a diagnosis unit that determines the presence or absence of a predetermined symptom by analyzing the feature values with an analysis model trained on training data labeled for the predetermined symptom.

The analysis model may be a machine learning model or a deep learning model.

The biometric data may include data on at least one of heart rate, step count, metabolic rate (METs, Metabolic rates), calorie consumption, and sleep state.

According to one aspect, a symptom diagnosis method may include: receiving biometric data from a wearable device worn by a user; a data analysis step of extracting one or more feature values from the biometric data; and a diagnosis step of determining the presence or absence of a predetermined symptom by analyzing the feature values with an analysis model trained on training data labeled for the predetermined symptom.

Through circadian data collected from a user's wearable device, children's mental disorders and symptoms can be diagnosed and evaluated at an early stage.

FIG. 1 is a block diagram of an apparatus for generating training data according to an embodiment.
FIG. 2 is a flowchart illustrating a method for generating training data according to an embodiment.
FIG. 3 is a block diagram of a symptom diagnosis apparatus according to an embodiment.
FIG. 4 is an exemplary diagram analyzing the performance of an analysis model applied to the apparatus for generating training data according to an example.
FIG. 5 is a flowchart illustrating a symptom diagnosis method according to an embodiment.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings. In describing the present invention, a detailed description of a related well-known function or configuration is omitted where it could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

Hereinafter, embodiments of the apparatus for generating training data and of the symptom diagnosis system to which the training data is applied are described in detail with reference to the drawings.

FIG. 1 is a block diagram of an apparatus for generating training data according to an embodiment.

According to an embodiment, the training data generation apparatus 100 may include a data receiving unit 110 that receives biometric data from a wearable device worn by a user, a data analysis unit 120 that extracts one or more feature values from the biometric data, and a training data generation unit 130 that generates training data by labeling analysis data generated based on the one or more feature values.

According to an example, the data receiving unit 110 may receive the user's biometric data from the wearable device worn by the user, or from a user terminal, at intervals of 30 seconds to 1 minute. As an example, the biometric data may include at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

According to an example, the data analysis unit 120 may perform cosinor analysis, circadian information extraction, and data loading. The data analysis unit 120 may perform circadian analysis on the biometric data received through the data receiving unit 110 to extract predetermined feature points, and may generate a data set for each user based on the feature points.

According to an example, the data analysis unit 120 may verify the validity of the biometric data received through the data receiving unit 110, and may store the time at which each biometric data item was recorded split into year, month, day, hour, minute, and second.

According to an example, the data analysis unit 120 may divide the validated biometric data files into three major categories: heart rate, sleep information, and activity information.

As an example, the data analysis unit 120 may calculate the minimum, maximum, and arithmetic mean of the heart rate per hour.

As an example, the data analysis unit 120 may divide the sleep information into two sub-categories, 30-second records and 60-second records. For the 30-second sleep records, the data analysis unit 120 may calculate the duration (in minutes) per hour of each of four sleep stages: light, deep, rem, and wake. For the 60-second sleep records, the data analysis unit 120 may calculate the duration (in minutes) or the number of occurrences per hour of each of three sleep states: asleep, restless, and awake.

As an example, for the activity information, the data analysis unit 120 may calculate the minimum, maximum, and arithmetic mean of the physical intensity measure (Intensity, METs) per hour, the sum of steps per hour, and the sum of calories consumed per hour.
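The hourly aggregation described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names and the `(timestamp, value)` sample layout are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def hourly_heart_rate_stats(samples):
    """Group (timestamp, bpm) samples by clock hour; return min/max/mean per hour."""
    buckets = defaultdict(list)
    for ts, bpm in samples:
        # Truncate the timestamp to the start of its hour to form the bucket key.
        buckets[ts.replace(minute=0, second=0, microsecond=0)].append(bpm)
    return {
        hour: {"min": min(v), "max": max(v), "mean": mean(v)}
        for hour, v in sorted(buckets.items())
    }


def hourly_sum(samples):
    """Sum a per-sample quantity (e.g. steps or calories) within each clock hour."""
    totals = defaultdict(float)
    for ts, value in samples:
        totals[ts.replace(minute=0, second=0, microsecond=0)] += value
    return dict(sorted(totals.items()))
```

The same bucketing would apply to the METs intensity values, using the min/max/mean variant.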

According to an example, in performing cosinor analysis, the data analysis unit 120 may use only hours that contain at least a predetermined amount of data (e.g., 30 minutes), in order to ensure the validity of the one-hour-period biometric data. As an example, the data analysis unit 120 may treat as valid, and use, the biometric data of a user who has at least a specific number of hours (e.g., 5 hours) of data per day.

According to an example, the data analysis unit 120 may perform cosinor analysis on data on at least one of the heart rate and the step count included in the biometric data. For example, the data analysis unit 120 may use one-hour-period biometric data to perform the cosinor analysis, and may calculate the MESOR (Midline Estimating Statistic Of Rhythm), amplitude, acrophase, and goodness of fit and return them as circadian data.

According to an embodiment, the data analysis unit 120 may perform cosinor analysis to generate an estimation function y(t) as in Equation 1 below.

$$y(t) = M + A\cos\left(\frac{2\pi t}{T} + \phi\right)$$

Here, M is the MESOR (midline estimating statistic of rhythm), A is the amplitude, φ is the acrophase, t is time, and T is the period. As an example, the period T may be 24 hours. As an example, the data analysis unit 120 may estimate y(t) by arranging data from a predetermined period (e.g., 3 days) or longer in a continuous sequence.
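A cosinor fit with a known period reduces to ordinary least squares. The sketch below assumes the conventional single-component form y(t) = M + A·cos(2πt/T + φ), which matches the quantities named in the text (MESOR, amplitude, acrophase, period) but is not confirmed by it; `fit_cosinor` and `solve3` are hypothetical names, and the sign convention for φ is an assumption.

```python
import math


def solve3(a, b):
    """Solve a 3x3 linear system a*x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x


def fit_cosinor(times_h, values, period_h=24.0):
    """Least-squares fit of y(t) = M + A*cos(2*pi*t/T + phi); returns (M, A, phi)."""
    w = 2 * math.pi / period_h
    # Linearised model: y = M + beta*cos(w t) + gamma*sin(w t).
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve3(xtx, xty)
    # With beta = A*cos(phi) and gamma = -A*sin(phi):
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)
    return mesor, amplitude, acrophase
```

Passing hourly means from three or more consecutive days as `values`, with `times_h` counted in hours from the first sample, mirrors the "3 days or longer" arrangement mentioned in the text.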

According to an embodiment, the data analysis unit 120 may calculate the goodness of fit (GOF) from the estimation function, as in Equation 2 below.

$$\mathrm{GOF} = \frac{\mathrm{ESS}}{\mathrm{TSS}}$$

Here, ESS denotes the Explained Sum of Squares and is defined as

$$\mathrm{ESS} = \sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,$$

where y is the user's circadian data and ŷ is the predicted value estimated from the estimation function y(t).

As an example, TSS denotes the Total Sum of Squares and is defined as

$$\mathrm{TSS} = \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2,$$

where ȳ is the mean of the user's circadian data as defined above.

As an example, Equation 2 may be calculated as follows.

$$\mathrm{GOF} = \frac{\sum_{i=1}^{n} \left(\hat{y}_i - \bar{y}\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$
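As a rough sketch, and assuming the goodness of fit is the ratio of the Explained Sum of Squares to the Total Sum of Squares (the usual coefficient-of-determination form, which the text names but does not spell out here), the calculation could look like this. `goodness_of_fit` is a hypothetical helper name.

```python
def goodness_of_fit(observed, predicted):
    """Return ESS / TSS for observed values y and model predictions y_hat."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean_y = sum(observed) / len(observed)
    # Explained Sum of Squares: spread of the predictions around the observed mean.
    ess = sum((p - mean_y) ** 2 for p in predicted)
    # Total Sum of Squares: spread of the observations around their own mean.
    tss = sum((y - mean_y) ** 2 for y in observed)
    if tss == 0:
        raise ValueError("observed data has zero variance")
    return ess / tss
```

A value near 1 would indicate that the fitted rhythm accounts for most of the variation in the circadian data; a value near 0 would indicate a weak or absent 24-hour rhythm.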

According to an example, the data analysis unit 120 may calculate the L5 (Least 5 hours active), M10 (Most 10 hours active), and RA (Relative Amplitude) indices from at least one of the user's heart rate and step count, either per day or over a unit proportional to the entire data collection period. For example, L5 is obtained by applying a 5-hour sliding window to the one-hour-period data, taking the arithmetic mean of each window, and then taking the minimum of those means within the circadian data. M10 is obtained by applying a 10-hour sliding window to the one-hour-period data, taking the arithmetic mean of each window, and then taking the maximum of those means within the circadian data.
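The sliding-window definitions of L5 and M10 can be sketched as below. The text does not give a formula for RA; the example assumes the conventional definition RA = (M10 − L5) / (M10 + L5), and it assumes windows do not wrap around midnight. All names are illustrative.

```python
def window_means(hourly, width):
    """Arithmetic mean of every contiguous window of `width` hourly values."""
    return [
        sum(hourly[i:i + width]) / width
        for i in range(len(hourly) - width + 1)
    ]


def l5_m10_ra(hourly):
    """Return (L5, M10, RA) for one day's hourly values (e.g. 24 step counts)."""
    l5 = min(window_means(hourly, 5))     # least active 5-hour window
    m10 = max(window_means(hourly, 10))   # most active 10-hour window
    # Assumed conventional definition of relative amplitude.
    ra = (m10 - l5) / (m10 + l5) if (m10 + l5) else 0.0
    return l5, m10, ra
```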

According to an example, the data analysis unit 120 may calculate IS (Interdaily Stability) and IV (Intradaily Variability) from the user's step count, as in the following equations.

The quantities used in these equations are the mean of the biometric data per hour, the mean of all biometric data points, the index of the one-hour data currently being computed, and the number of biometric data points per day.
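The equations for IS and IV are not reproduced in this text. The sketch below uses the definitions commonly found in the actigraphy literature, which is an assumption about what the patent intends: IS compares the variance of the average 24-hour profile with the total variance, and IV compares the variance of hour-to-hour differences with the total variance. The input is assumed to be a sequence of hourly values whose length is a whole number of days.

```python
def interdaily_stability(x, p=24):
    """IS = n * sum_h (xbar_h - xbar)^2 / (p * sum_i (x_i - xbar)^2)."""
    n = len(x)
    grand_mean = sum(x) / n
    # Mean of each clock hour across all days (x[h], x[h+p], x[h+2p], ...).
    hourly_means = [sum(x[h::p]) / len(x[h::p]) for h in range(p)]
    numerator = n * sum((m - grand_mean) ** 2 for m in hourly_means)
    denominator = p * sum((v - grand_mean) ** 2 for v in x)
    return numerator / denominator


def intradaily_variability(x):
    """IV = n * sum_i (x_i - x_{i-1})^2 / ((n - 1) * sum_i (x_i - xbar)^2)."""
    n = len(x)
    grand_mean = sum(x) / n
    numerator = n * sum((x[i] - x[i - 1]) ** 2 for i in range(1, n))
    denominator = (n - 1) * sum((v - grand_mean) ** 2 for v in x)
    return numerator / denominator
```

Under these definitions, a profile that repeats identically every day gives IS = 1, and a signal that flips between two levels every hour gives a high IV.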

According to an example, the data analysis unit 120 may calculate the basal metabolic rate from the user's basic body information, and may then calculate the daily metabolic difference by adding it to or subtracting it from the daily calorie consumption. For example, the basal metabolic rate may be calculated using any of the following three equations.

- Mifflin-St Jeor equation

Here, BMR is the basal metabolic rate, and the sex-dependent constant is +5 for men and -161 for women.

- Katch-McArdle equation

- Harris-Benedict equation
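The three equations are not reproduced in this text. The sketch below uses their commonly published forms (weight in kg, height in cm, age in years), which is an assumption about the variants the patent intends; in particular, the Harris-Benedict coefficients shown are the original 1919 values rather than a later revision. Function and parameter names are illustrative.

```python
def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Mifflin-St Jeor: 10*W + 6.25*H - 5*A + s, with s = +5 (male) or -161 (female)."""
    s = 5 if sex == "male" else -161
    return 10 * weight_kg + 6.25 * height_cm - 5 * age_years + s


def bmr_katch_mcardle(lean_body_mass_kg):
    """Katch-McArdle: 370 + 21.6 * lean body mass (kg)."""
    return 370 + 21.6 * lean_body_mass_kg


def bmr_harris_benedict(weight_kg, height_cm, age_years, sex):
    """Harris-Benedict with the original 1919 coefficients (assumed variant)."""
    if sex == "male":
        return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_years
    return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_years


def daily_metabolic_difference(daily_calories, bmr):
    """Calories expended in a day beyond (or below) the basal metabolic rate."""
    return daily_calories - bmr
```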

According to an example, the data analysis unit 120 may count the number of steps taken during the user's sleep hours and may calculate the minimum, maximum, and arithmetic mean of the heart rate over that period. For example, the sleep hours may be set to 9 pm to 6 am, or to sunset to sunrise.

According to an example, the data analysis unit 120 may count the number of steps taken during the user's daytime hours and may calculate the minimum, maximum, and arithmetic mean of the heart rate over that period. For example, the daytime hours may be set to 7 am to 8 pm, or to sunrise to sunset.

According to an example, the data analysis unit 120 may classify the intensity of the user's physical activity into four levels: Sedentary, Light, Moderate, and Vigorous. For example, the levels correspond to METs values in the ranges <1.5, 1.5~3.0, 3.0~6.0, and >6.0, respectively, and may be calculated continuously on a daily basis. In addition, the data analysis unit 120 may calculate and record, from minute-resolution biometric data, the total time per day spent at each level.
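The four METs bands can be sketched as a simple threshold function. The example follows the conventional ordering in which the lowest band (<1.5 METs) is sedentary; the text leaves the shared boundary at 3.0 unspecified, so assigning exactly 3.0 to the moderate band is an assumption, as are the function names.

```python
LEVELS = ("sedentary", "light", "moderate", "vigorous")


def classify_intensity(mets):
    """Map a METs value to one of the four intensity levels."""
    if mets < 1.5:
        return "sedentary"
    if mets < 3.0:
        return "light"
    if mets <= 6.0:   # >6.0 is vigorous, so exactly 6.0 stays moderate
        return "moderate"
    return "vigorous"


def minutes_per_level(mets_per_minute):
    """Total minutes per day spent at each level, given one METs value per minute."""
    totals = dict.fromkeys(LEVELS, 0)
    for value in mets_per_minute:
        totals[classify_intensity(value)] += 1
    return totals
```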

According to an example, the data analysis unit 120 may calculate the standard deviation and variance of the user's daily heart rate. From the 30-second sleep records, the data analysis unit 120 may calculate the total time the user spent asleep during daytime hours, and may calculate the daily duration of each of the four sleep stages. The data analysis unit 120 may also calculate the time the user spent in bed per day, as in the following equation.

According to an example, the data analysis unit 120 may calculate the daily sleep efficiency from the user's sleep records, separately for the 30-second sleep data and the 60-second sleep data, as in the following equation.

According to an example, the data analysis unit 120 may calculate, from the 30-second sleep records, the daily proportion of each of the four sleep stages. As an example, the sleep ratio may be calculated as in the following equation.
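The equations for time in bed, sleep efficiency, and sleep-stage ratio are not reproduced in this text. The sketch below assumes commonly used definitions: time in bed is the sum of all recorded stage durations including wake, sleep efficiency is time asleep divided by time in bed, and each stage ratio is that stage's duration divided by time in bed. These definitions and the function name are assumptions, shown here for the four-stage 30-second records.

```python
def sleep_metrics(stage_minutes):
    """Compute daily sleep metrics from minutes per stage.

    `stage_minutes` maps the stage names light, deep, rem, and wake to minutes.
    """
    time_in_bed = sum(stage_minutes.values())
    if time_in_bed == 0:
        return {"time_in_bed": 0, "efficiency": 0.0, "ratios": {}}
    # Time asleep excludes the wake stage.
    asleep = time_in_bed - stage_minutes.get("wake", 0)
    return {
        "time_in_bed": time_in_bed,
        "efficiency": asleep / time_in_bed,
        "ratios": {stage: m / time_in_bed for stage, m in stage_minutes.items()},
    }
```

For the three-state 60-second records, the same function would apply with awake (and, depending on the definition chosen, restless) treated as the non-sleep state.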

According to an embodiment, the training data generation unit 130 may perform labeling by comparing a checklist containing feature value information for each predetermined symptom with the feature values included in the analysis data, thereby determining whether the analysis data corresponds to the symptom.

According to an example, the training data generation unit 130 may assign to the analysis data a label indicating whether the user has been diagnosed with a mental disorder and whether symptoms are present, based on the K-SADS-PL (Kiddie Schedule for Affective Disorders and Schizophrenia, Present and Lifetime version) and DSM-5 (Diagnostic and Statistical Manual of Mental Disorders) diagnostic criteria.

As an example, the mental disorders or symptoms may include ADHD (Attention-Deficit/Hyperactivity Disorder), OCD (Obsessive-Compulsive Disorder), and ODD (Oppositional Defiant Disorder), and may further include a diagnosis of MDD (Major Depressive Disorder) and symptoms of sleep problems.

According to an example, the training data generation unit 130 may receive K-SADS-PL-based checklist data prepared by an expert and may label the analysis data based on the received checklist. For example, the training data generation unit 130 may calculate a similarity by comparing the feature values in the checklist with the feature values in the analysis data, and may determine whether the analysis data corresponds to the symptom according to whether the similarity meets or exceeds a predetermined threshold.

As an example, the training data generation unit 130 may assign label 1 to analysis data that exhibits the disorder or symptom to be diagnosed, and label 0 otherwise.

As an example, the training data generation unit 130 may balance the ratio of the normal group to the disease group. For example, the ratio of the normal group to the disease group may be set to 1:1 or 2:1. To rebalance the data, the training data generation unit 130 may use random sampling or SMOTE (Synthetic Minority Oversampling Technique).

According to an example, the training data generation unit 130 may scale the training data set after sampling. For example, the training data generation unit 130 may scale the training data using a min-max scaling technique that converts all data to values between 0 and 1, a standard scaling technique that converts all data to values between -1 and 1, or the like.
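A min-max rescaling of a single feature column can be sketched as follows. The target range is a parameter, so the same hypothetical helper covers both the 0 to 1 range and a -1 to 1 range; using a range rescale for the latter is an assumption made for illustration, since z-score standardization by itself does not bound values to a fixed interval.

```python
def min_max_scale(column, low=0.0, high=1.0):
    """Linearly rescale the values of one feature column into [low, high]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:
        # A constant column carries no information; map it to the lower bound.
        return [low for _ in column]
    span = c_max - c_min
    return [low + (v - c_min) / span * (high - low) for v in column]
```

In practice the minimum and maximum would be taken from the training portion only and reused for the validation and test portions, so that no information leaks from the held-out data.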

According to an embodiment, the training data generation unit 130 may split the training data into training, validation, and test data according to a predetermined ratio. For example, the training data generation unit 130 may set the training:validation:test ratio to 6:2:2, 5:3:2, 7:2:1, or the like.
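A ratio-based split can be sketched as below. The shuffling step, the fixed seed, and the function name are assumptions added for the example; the text specifies only the ratios.

```python
import random


def split_dataset(rows, ratios=(6, 2, 2), seed=0):
    """Shuffle rows and split them into (training, validation, test) by ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(rows) * ratios[0] // total
    n_val = len(rows) * ratios[1] // total
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]      # remainder goes to the test portion
    return train, val, test
```

When several days of data come from the same user, splitting by user rather than by row would avoid the same person appearing in both the training and test portions.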

FIG. 2 is a flowchart illustrating a method for generating training data according to an embodiment.

According to an embodiment, the training data generation apparatus may receive biometric data from a wearable device worn by a user (210).

According to an example, the training data generation apparatus may receive the user's biometric data from the wearable device worn by the user, or from a user terminal, at intervals of 30 seconds to 1 minute. As an example, the biometric data may include at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

According to an embodiment, the training data generation apparatus may extract one or more feature values from the biometric data (220).

According to an example, the training data generation apparatus may generate an estimation function y(t) by performing cosinor analysis on data on at least one of the heart rate and the step count included in the biometric data. The training data generation apparatus may also calculate a goodness of fit (GOF) based on the estimation function.

According to an embodiment, the training data generation apparatus may generate training data by labeling analysis data generated based on the one or more feature values (230).

According to an example, the training data generation apparatus may perform labeling by comparing a checklist containing feature value information for each predetermined symptom with the feature values included in the analysis data, thereby determining whether the analysis data corresponds to the symptom. The training data generation apparatus may also split the training data into training, validation, and test data according to a predetermined ratio.

FIG. 3 is a block diagram of a symptom diagnosis apparatus according to an embodiment.

The symptom diagnosis system may include a symptom diagnosis apparatus 300 and a wearable device (not shown).

According to an embodiment, the symptom diagnosis apparatus 300 may include a data receiving unit 310 that receives biometric data from a wearable device worn by a user, a data analysis unit 320 that extracts one or more feature values from the biometric data, and a diagnosis unit 330 that determines the presence or absence of a predetermined symptom by analyzing the feature values with an analysis model trained on training data labeled for the predetermined symptom.

According to an example, the analysis model may predict a child's mental disorders and symptoms using machine learning and deep learning techniques based on the training data. As an example, the machine learning technique may use a Random Forest, XGBoost (eXtreme Gradient Boosting), or LightGBM (Light Gradient Boosting Machine) classifier. As an example, the deep learning model may use LSTM (Long Short-Term Memory).

According to an example, for the Random Forest classifier model, the number of training repetitions may be set to 50 or fewer, n_estimator to 2000 or fewer, max_depth to 50 or less, min_samples_leaf to 20 or less, and min_samples_split to 50 or less.

According to an example, for the XGBoost classifier model, the number of training repetitions may be set to 10 or fewer, objective to "binary:logistic", booster to "gbtree" and "dart", eval_metric to "auc" and "logloss", and the maximum depth to 20 or less.

According to an example, for the LightGBM classifier model, the number of training repetitions may be set to 10 or fewer, objective to "binary", boosting_type to "gbdt", metric to "auc", and max_depth to 20 or less.
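The limits listed above can be collected into a small configuration table with a validity check, as sketched below. The dictionary layout and the `is_allowed` helper are hypothetical; the Random Forest key is written `n_estimators`, following the scikit-learn spelling, which is an assumption about the library the text has in mind.

```python
# Upper bounds on numeric hyperparameters, taken from the text.
UPPER_BOUNDS = {
    "random_forest": {"n_estimators": 2000, "max_depth": 50,
                      "min_samples_leaf": 20, "min_samples_split": 50},
    "xgboost": {"max_depth": 20},
    "lightgbm": {"max_depth": 20},
}

# Permitted values for categorical hyperparameters, taken from the text.
ALLOWED_CHOICES = {
    "xgboost": {"objective": {"binary:logistic"},
                "booster": {"gbtree", "dart"},
                "eval_metric": {"auc", "logloss"}},
    "lightgbm": {"objective": {"binary"},
                 "boosting_type": {"gbdt"},
                 "metric": {"auc"}},
}

# Maximum number of training repetitions per model.
MAX_TRAINING_REPEATS = {"random_forest": 50, "xgboost": 10, "lightgbm": 10}


def is_allowed(model, params):
    """Check a candidate parameter set against the bounds and choices above."""
    bounds = UPPER_BOUNDS.get(model, {})
    choices = ALLOWED_CHOICES.get(model, {})
    for name, value in params.items():
        if name in bounds and value > bounds[name]:
            return False
        if name in choices and value not in choices[name]:
            return False
    return True
```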

According to one example, for the LSTM model, the learning data may be formed by grouping the user's circadian data into ranges of a specific number of days, the number being 3 or more. For example, to determine whether a user is diagnosed with a mental illness or symptom, 14 consecutive days of circadian data may be grouped and used as the input to the model. The number of cells in the LSTM model may be set to the specified number of days, and the model may be configured in a many-to-one or many-to-many structure. The output of each cell may be reduced to a value representing the corresponding day using global average pooling; global maximum pooling may also be used. In addition, all cell outputs may be passed to a fully connected layer to estimate the probability that the user is diagnosed with a mental illness or symptom. Here, a neural network may exist between the output of each cell and the final prediction layer.
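The grouping and pooling steps described above can be sketched without any deep learning library. The sketch below rests on assumptions: the names `group_consecutive_days` and `pool_cell_output` are hypothetical, windows are taken with a stride of one day (the text does not say whether windows overlap), and pooling is read as reducing one cell's output vector to a single value for that day.

```python
from datetime import date, timedelta
from typing import List, Sequence, Tuple

# One day of circadian features: (calendar day, feature vector).
DayRecord = Tuple[date, List[float]]


def group_consecutive_days(records: Sequence[DayRecord],
                           window: int = 14) -> List[List[List[float]]]:
    """Collect every run of `window` consecutive days as one model input."""
    if window < 3:
        raise ValueError("the description specifies groups of 3 or more days")
    ordered = sorted(records, key=lambda r: r[0])
    groups = []
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        # Keep the chunk only if no day is missing inside it.
        consecutive = all(
            chunk[i + 1][0] - chunk[i][0] == timedelta(days=1)
            for i in range(window - 1)
        )
        if consecutive:
            groups.append([features for _, features in chunk])
    return groups


def pool_cell_output(cell_output: Sequence[float], mode: str = "average") -> float:
    """Reduce one cell's output vector to a single value for its day."""
    if mode == "average":
        return sum(cell_output) / len(cell_output)
    if mode == "max":
        return max(cell_output)
    raise ValueError("mode must be 'average' or 'max'")
```

Each returned group has one entry per day, which matches the statement that the number of LSTM cells equals the number of days in the group.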

According to one example, the analysis model may be evaluated with the following six performance metrics: ROC/AUC, accuracy, sensitivity, specificity, F1 Score, and G-Mean. Performance may be evaluated on Hold-Out test samples that were not used for training. For example, performance may be evaluated using the test data generated by the learning data generating apparatus described above. The performance metrics may be defined as follows.

[Equations: not reproduced in this text]

Here, ROC stands for Receiver Operating Characteristic, TPR (True Positive Rate) is the same as sensitivity, and FPR (False Positive Rate) is the same as (1 - specificity). AUC (Area Under Curve) is the area under the curve when the ROC is plotted as a graph, and it is commonly used as an indicator of the performance of machine learning and deep learning models.
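As an illustration of how the area under the ROC curve follows from TPR and FPR, the sketch below sweeps a decision threshold over the predicted scores and integrates the resulting points with the trapezoidal rule. The function names `roc_points` and `auc` are hypothetical, and this is one common way to compute the value, not necessarily the procedure used for the reported results.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are needed to draw an ROC curve")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        # A sample is predicted positive when its score reaches the threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as (FPR, TPR) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

A model that ranks every positive sample above every negative sample reaches an area of 1.0, while random ranking gives about 0.5.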

In the expressions above, TP means True Positive, TN means True Negative, FP means False Positive, and FN means False Negative. TP is the case where the actual label of the data is positive and the model's prediction is also positive; TN is the case where the actual label is negative and the model's prediction is also negative; FP is the case where the actual label is negative but the model's prediction is positive; and FN is the case where the actual label is positive but the model's prediction is negative.
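For reference, the conventional definitions of these metrics in terms of TP, TN, FP, and FN are given below. They are assumed, not confirmed, to match the expressions of the original, which do not survive in this text. The G-Mean form is at least consistent with the values reported for FIG. 4(a), since the square root of 0.754 × 0.714 is approximately 0.734.

```latex
\begin{aligned}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Sensitivity} &= \text{TPR} = \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP}, \qquad
\text{FPR} = 1 - \text{Specificity} = \frac{FP}{FP + TN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
F_1                &= \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}
                           {\text{Precision} + \text{Sensitivity}} \\
\text{G-Mean}      &= \sqrt{\text{Sensitivity} \times \text{Specificity}}
\end{aligned}
```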

Sensitivity may be used interchangeably with recall; a high sensitivity indicates that the model correctly identifies the cases that are actually positive. For example, when only an ADHD diagnosis needs to be predicted for a patient A, a model with high sensitivity for ADHD may be selected.
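The threshold-based metrics and the sensitivity-driven model choice can be sketched from confusion counts alone. This is a minimal illustration using the conventional formulas; the names `Confusion`, `metrics`, and `most_sensitive` are hypothetical, and returning 0.0 for an empty denominator is an assumption made to keep the sketch short.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    tp: int
    tn: int
    fp: int
    fn: int


def _ratio(numerator: float, denominator: float) -> float:
    # Assumption: report 0.0 when the denominator is empty instead of failing.
    return numerator / denominator if denominator else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, F1 Score and G-Mean from counts."""
    sensitivity = _ratio(c.tp, c.tp + c.fn)
    specificity = _ratio(c.tn, c.tn + c.fp)
    precision = _ratio(c.tp, c.tp + c.fp)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "g_mean": math.sqrt(sensitivity * specificity),
    }


def most_sensitive(models: Dict[str, Confusion]) -> str:
    """Pick the model name with the highest sensitivity."""
    return max(models, key=lambda name: metrics(models[name])["sensitivity"])
```

Selecting on sensitivity alone ignores false positives, so in practice the other metrics would normally be inspected alongside it.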

The F1 Score is the harmonic mean of precision and recall. Depending on the label distribution, the constant "2" used in it may be written as a different expression [expression not reproduced in this text]. The F1 Score may serve as a metric for evaluating model performance accurately with a single value when the data distribution is imbalanced.

FIG. 4 shows an analysis of the performance of the analysis models.

FIG. 4(a) is a table showing, for ADHD, the size of the learning data used for the three models, the model performance on the entire sampled learning data set, and the model performance on the Hold-Out data samples. Because the correct way to measure each model's performance is to evaluate it on Hold-Out data samples that did not take part in model training, FIG. 4(a) shows that, for ADHD, the LightGBM model performed best, with ROC/AUC 0.801, accuracy 73.3% (0.733), sensitivity 0.754, specificity 0.714, F1 Score 0.724, and G-Mean 0.734.

FIG. 4(b) is a table showing, for OCD, the size of the learning data used for the three models, the model performance on the entire sampled learning data set, and the model performance on the Hold-Out data samples. Referring to FIG. 4(b), for OCD, the LightGBM model performed best, with ROC/AUC 0.799, accuracy 75.2% (0.752), sensitivity 0.785, specificity 0.707, F1 Score 0.785, and G-Mean 0.745.

FIG. 4(c) is a table showing, for ODD, the size of the learning data used for the three models, the model performance on the entire sampled learning data set, and the model performance on the Hold-Out data samples. Referring to FIG. 4(c), for ODD, the XGBoost model performed best, with ROC/AUC 0.844, accuracy 73.8% (0.738), sensitivity 0.769, specificity 0.711, F1 Score 0.733, and G-Mean 0.740.

In FIG. 4, the data size for the Random Forest model is smaller than for the other two models because the Random Forest model does not allow missing data, so records with missing values were discarded.

FIG. 5 is a flowchart illustrating a symptom diagnosis method according to one embodiment.

According to one embodiment, the symptom diagnosis apparatus may receive biometric data from a wearable device worn by a user (510).

According to one example, the symptom diagnosis apparatus may receive the user's biometric data at intervals of 30 seconds to 1 minute from the wearable device worn by the user or from a user terminal. For example, the biometric data may include at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

According to one embodiment, the symptom diagnosis apparatus may extract one or more feature values from the biometric data (520).

According to one example, the symptom diagnosis apparatus may generate an estimation function y(t) by performing a Cosinor analysis on data for at least one of the heart rate and the step count included in the biometric data. In addition, the learning data generating apparatus may calculate a goodness of fit (GOF) based on the estimation function.
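A single-component Cosinor fit can be sketched with ordinary least squares. The sketch below assumes the common model y(t) = M + A·cos(2πt/τ + φ) with a 24-hour period τ and uses the coefficient of determination R² as a stand-in for the goodness of fit; the form of y(t), the GOF definition, and the function names `solve_linear` and `cosinor_fit` are all assumptions, since this passage does not specify them.

```python
import math
from typing import Dict, List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a·x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few distinct sample times")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def cosinor_fit(times_h: Sequence[float], values: Sequence[float],
                period_h: float = 24.0) -> Dict[str, float]:
    """Fit y(t) = M + A*cos(w*t + phi) by linear least squares."""
    w = 2.0 * math.pi / period_h
    # Linearised form: y = M + beta*cos(w t) + gamma*sin(w t),
    # with beta = A*cos(phi) and gamma = -A*sin(phi).
    rows = [(1.0, math.cos(w * t), math.sin(w * t)) for t in times_h]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    mesor, beta, gamma = solve_linear(xtx, xty)
    fitted = [mesor + beta * r[1] + gamma * r[2] for r in rows]
    mean_y = sum(values) / len(values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    return {
        "mesor": mesor,                          # rhythm-adjusted mean M
        "amplitude": math.hypot(beta, gamma),    # A
        "acrophase": math.atan2(-gamma, beta),   # phi, in radians
        "r_squared": 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0,
    }
```

The mesor, amplitude, acrophase, and fit quality returned here are the kind of per-day values that could serve as feature values, although the text does not list which quantities are actually used.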

According to one embodiment, the symptom diagnosis apparatus may determine the presence or absence of a predetermined symptom by analyzing the feature values using an analysis model trained on learning data labeled for the predetermined symptom (530).
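Steps 510 to 530 form a simple receive, extract, decide pipeline, which can be sketched as follows. Everything named here is hypothetical: `diagnose`, the 0.5 decision threshold, and the toy extractor and model, which exist only to make the sketch runnable and carry no clinical meaning.

```python
from typing import Callable, Dict, List, Sequence

# One wearable reading, for example {"heart_rate": 72.0, "steps": 10.0}.
Sample = Dict[str, float]


def diagnose(samples: Sequence[Sample],
             extract_features: Callable[[Sequence[Sample]], List[float]],
             model: Callable[[List[float]], float],
             threshold: float = 0.5) -> bool:
    """Run steps 510 (receive), 520 (extract) and 530 (decide) in order."""
    if not samples:                          # step 510: nothing was received
        raise ValueError("no biometric data received")
    features = extract_features(samples)     # step 520: feature extraction
    probability = model(features)            # step 530: trained analysis model
    return probability >= threshold          # assumed decision rule


def mean_heart_rate(samples: Sequence[Sample]) -> List[float]:
    """Toy feature extractor: average heart rate over the received samples."""
    return [sum(s["heart_rate"] for s in samples) / len(samples)]


def toy_model(features: List[float]) -> float:
    """Toy stand-in for a trained model; returns a pseudo-probability."""
    return 1.0 if features[0] > 100.0 else 0.0
```

Passing the extractor and the model as callables keeps the pipeline independent of whether the analysis model is a tree ensemble or an LSTM.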

One aspect of the present invention may be implemented as computer-readable code on a computer-readable recording medium. The code and code segments implementing the program can be readily inferred by a computer programmer skilled in the art. The computer-readable recording medium may include any type of recording device in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical disks. In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that the computer-readable code is written and executed in a distributed manner.

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Accordingly, the scope of the present invention is not limited to the embodiments described above and should be construed to include various embodiments within a scope equivalent to what is described in the claims.

100: learning data generating apparatus
110: data receiving unit
120: data analysis unit
130: learning data generation unit
300: symptom diagnosis apparatus
310: data receiving unit
320: data analysis unit
330: diagnosis unit

Claims

a data receiver configured to receive biometric data from a wearable device worn by a user;
a data analysis unit configured to determine that the biometric data is valid when the biometric data includes data for a predetermined time or longer, to convert the biometric data determined to be valid into biometric data in units of one circadian cycle, and to extract one or more feature values; and
a learning data generation unit configured to generate learning data by labeling analysis data generated based on the one or more feature values,
wherein the learning data generation unit receives checklist data based on the Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime version (K-SADS-PL), the checklist data including feature value information for each predetermined symptom, and generates the learning data by assigning, to the biometric data in units of one circadian cycle, a label indicating whether the user has been diagnosed with a mental illness and whether the user has a symptom, based on the received checklist data.

The learning data generating apparatus of claim 1,
wherein the biometric data includes data on at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

(deleted)

The learning data generating apparatus of claim 1,
wherein the learning data generation unit
classifies the learning data into training data, validation data, and test data according to a predetermined ratio.

receiving biometric data from a wearable device worn by a user;
a data analysis step of determining that the biometric data is valid when the biometric data includes data for a predetermined time or longer, converting the biometric data determined to be valid into biometric data in units of one circadian cycle, and extracting one or more feature values; and
a learning data generation step of generating learning data by labeling analysis data generated based on the one or more feature values,
wherein the learning data generation step comprises
receiving checklist data based on K-SADS-PL (Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime version), the checklist data including feature value information for each predetermined symptom, and generating the learning data by assigning, to the biometric data, a label indicating whether the user has been diagnosed with a mental illness and whether the user has a symptom, based on the received checklist data.

The method of claim 7,
wherein the biometric data includes data on at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

(deleted)

The method of claim 7,
wherein the learning data generation step comprises
classifying the learning data into training data, validation data, and test data according to a predetermined ratio.

a data receiver configured to receive biometric data from a wearable device worn by a user;
a data analysis unit configured to determine that the biometric data is valid when the biometric data includes data for a predetermined time or longer, to convert the biometric data determined to be valid into biometric data in units of one circadian cycle, and to extract one or more feature values; and
a diagnosis unit configured to determine the presence or absence of a predetermined symptom by analyzing the feature values using an analysis model trained on learning data labeled with whether the user has been diagnosed with a mental illness and whether the user has a symptom,
wherein the learning data is generated by assigning the label, which indicates whether a mental illness has been diagnosed and whether a symptom is present, based on checklist data based on K-SADS-PL (Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime version), the checklist data including feature value information for each predetermined symptom.

The symptom diagnosis apparatus of claim 13,
wherein the analysis model is a machine learning model or a deep learning model.

The symptom diagnosis apparatus of claim 13,
wherein the biometric data includes data on at least one of heart rate, step count, METs (Metabolic rates), calorie consumption, and sleep state.

A method for diagnosing a symptom in a symptom diagnosis apparatus, the method comprising:
receiving biometric data from a wearable device worn by a user;
a data analysis step of determining that the biometric data is valid when the biometric data includes data for a predetermined time or longer, converting the biometric data determined to be valid into biometric data in units of one circadian cycle, and extracting one or more feature values; and
a determination step of outputting data for determining the presence or absence of a predetermined symptom by analyzing the feature values using an analysis model trained on learning data labeled for the predetermined symptom,
wherein the learning data is generated by assigning a label, which indicates whether a mental illness has been diagnosed and whether a symptom is present, based on checklist data based on K-SADS-PL (Kiddie Schedule for Affective Disorders and Schizophrenia Present and Lifetime version), the checklist data including feature value information for each predetermined symptom.