WO2024058585A1 - Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information - Google Patents


Info

Publication number
WO2024058585A1
Authority
WO
WIPO (PCT)
Prior art keywords
exercise
voice data
subject
clinical information
severity
Prior art date
Application number
PCT/KR2023/013863
Other languages
French (fr)
Korean (ko)
Inventor
김태영
이수정
정명진
김재호
박혜윤
조주희
강단비
공성아
방가람
신선혜
류혜인
Original Assignee
사회복지법인 삼성생명공익재단
Priority date
Filing date
Publication date
Application filed by 사회복지법인 삼성생명공익재단 filed Critical 사회복지법인 삼성생명공익재단
Priority claimed from KR1020230122823A external-priority patent/KR20240038622A/en
Publication of WO2024058585A1 publication Critical patent/WO2024058585A1/en


Classifications

    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the technology described below relates to a technique for predicting the degree of lung disease using the subject's voice.
  • COPD (chronic obstructive pulmonary disease) is a representative example of such a lung disease.
  • the technology described below seeks to provide a technique for predicting the degree of lung disease such as COPD based on the subject's voice and clinical information.
  • a method of classifying the severity of a subject's lung disease using voice data and clinical information includes the steps of: an analysis device receiving the subject's voice data and clinical information; the analysis device preprocessing the voice data and clinical information; the analysis device inputting the preprocessed voice data and clinical information into a pre-trained learning model; and the analysis device classifying the severity of the subject's lung disease based on the output value of the learning model.
  • the analysis device that classifies the severity of the subject's lung disease includes an interface device that receives the subject's voice data and clinical information; a storage device that stores a learning model that receives voice data and clinical information and classifies the severity of lung disease; and a computing device that preprocesses the input voice data and clinical information, inputs the preprocessed voice data and clinical information into the learning model, and classifies the severity of the subject's lung disease based on the output value of the learning model.
  • the technology described below can predict the degree of lung disease by analyzing the user's voice and clinical information that can be obtained relatively easily.
  • the technology described below can diagnose the severity of lung disease through voice recording and self-diagnosis without the patient having to visit a medical institution.
  • Figure 1 is an example of a lung disease severity classification system using voice and clinical information.
  • Figure 2 is an example of the learning process of a learning model for lung disease severity classification.
  • Figure 3 shows the results of verifying the performance of a learning model that classifies lung disease severity.
  • Figure 4 is an example of an analysis device that classifies lung disease severity.
  • terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; they are used only to distinguish one component from another. For example, a first component may be renamed a second component without departing from the scope of the technology described below, and similarly, the second component may be renamed the first component.
  • the term "and/or" includes any one of, or any combination of, a plurality of related listed items.
  • the components described below are divided according to the main function each is responsible for. That is, two or more of the components described below may be combined into one component, or one component may be divided into two or more components with more detailed functions.
  • each of the components described below may additionally perform some or all of the functions handled by other components, and some of the main functions handled by a component may, of course, be carried out exclusively by another component.
  • each process forming the method may occur in a different order from the specified order unless a specific order is clearly stated in the context. That is, each process may occur in the same order as specified, may be performed substantially simultaneously, or may be performed in the opposite order.
  • the technology described below is a technique for predicting or classifying the severity of lung diseases such as COPD based on the subject's voice and clinical information. For convenience of explanation, the following explanation will focus on COPD. However, the technology described below can be used to predict or classify the severity of various lung diseases other than COPD.
  • User data used for analysis includes the user's voice and clinical information.
  • user data is collected from a specific subject, and can be collected before and after exercise for a subject performing a certain exercise.
  • the user's voice is collected before and after exercise, and input variables include features extracted from the voice.
  • Some of the clinical information may be collected separately before and after exercise.
  • clinical information may include questionnaire information collected from the subject. A detailed description of user data will be provided later.
  • the analysis device classifies or predicts the degree of lung disease based on the user's voice and clinical information.
  • the analysis device can be implemented as a variety of devices capable of processing data.
  • an analysis device can be implemented as a PC, a server on a network, a smart device, a wearable device, or a chipset with a dedicated program embedded therein.
  • analysis devices may be built into various devices such as exercise equipment, vehicles, smart speakers, etc.
  • the analysis device can classify lung disease using a machine learning model.
  • Machine learning models include decision trees, random forests, K-nearest neighbors (KNN), naive Bayes, support vector machines (SVM), and artificial neural networks (ANN). The following description focuses on a deep neural network (DNN) as the learning model. However, the learning model for lung disease classification can be implemented as various types of models.
  • Figure 1 is an example of a lung disease severity classification system 100 using voice and clinical information.
  • in Figure 1, the analysis device may be a user terminal 130, a computer terminal 140, or a server 150.
  • Subject A performs a certain exercise for a certain amount of time.
  • Patients with lung disease may have different vocal characteristics before and after exercise. Accordingly, user data can be collected from subject A before and after exercise, respectively.
  • the user data may include the subject's voice data and clinical information.
  • Voice data consists of voice data before exercise and voice data after exercise.
  • the voice data before exercise and the voice data after exercise are composed of data in which the same subject A uttered the same words or sentences (text) before and after exercise, respectively.
  • Clinical information may consist of various items. Some of the items included in clinical information correspond to data collected before and after exercise.
  • the database may store the subject's voice data and clinical information.
  • the database 110 may be a device such as an Electronic Medical Record (EMR).
  • the user terminal 120 may receive user data from subject A.
  • the user terminal 120 illustrates a device such as a smart device.
  • the user terminal 120 corresponds to a device that can collect user voice through a microphone and receive clinical information through a certain interface device.
  • the user terminal 120 may be any one of various types of devices, such as a smart device, PC, wearable device, smart speaker, etc.
  • the user terminal 130 may receive user data from the database 110. Furthermore, the user terminal 120 and the user terminal 130 may be the same device. In this case, the user terminal 130 may be a device that collects and analyzes user data at the same time.
  • the user terminal 130 may perform certain preprocessing on the subject's user data. For example, the user terminal 130 may remove noise from the subject's voice data. Additionally, the user terminal 130 may convert voice data into one of the following representations: a chromagram, Mel-frequency cepstral coefficients (MFCC), or a Mel spectrogram. Additionally, the user terminal 130 may perform preprocessing to normalize clinical information of different categories to a certain range.
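As a concrete illustration of the conversion step, the sketch below computes a naive short-time magnitude spectrogram in pure Python. It is a stand-in for the chromagram, MFCC, or Mel-spectrogram representations mentioned above, which would normally be produced by an audio library such as librosa; the frame length, hop size, and Hann window here are illustrative assumptions.

```python
import cmath
import math

def magnitude_spectrogram(signal, frame_len=64, hop=32):
    """Naive short-time magnitude spectrum: frame the signal, apply a
    Hann window, and take the magnitude of a direct DFT per frame.
    (A simplified stand-in for the chromagram / MFCC / Mel-spectrogram
    conversions; real pipelines would use an audio library.)"""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * n / (frame_len - 1)))
                    for n, x in enumerate(frame)]
        spectrum = []
        for k in range(frame_len // 2 + 1):  # keep non-negative frequencies
            s = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n, x in enumerate(windowed))
            spectrum.append(abs(s))
        frames.append(spectrum)
    return frames  # shape: (num_frames, frame_len // 2 + 1)
```

For a pure tone whose frequency falls exactly on DFT bin 8, the per-frame spectrum peaks at index 8, as expected.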
  • the user terminal 130 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model. User A can check the degree of the subject's lung disease through the user terminal 130.
  • the computer terminal 140 receives user data from the database 110 or the user terminal 120.
  • the computer terminal 140 may perform certain preprocessing on the user data.
  • the computer terminal 140 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model.
  • User B can check the degree of the subject's lung disease through the computer terminal 140.
  • the server 150 receives user data from the database 110 or the user terminal 120.
  • the server 150 may perform certain preprocessing on the subject's user data.
  • the server 150 may classify the severity of the subject's lung disease by inputting user data into a pre-built learning model.
  • User A can access the server 150 through the user terminal to check the degree of the subject's lung disease.
  • Figure 2 is an example of a learning process 200 of a learning model for lung disease severity classification.
  • a learning model may be one of various types.
  • the learning model shows a deep learning model as an example.
  • a learning model that classifies lung disease severity can be named a classification model.
  • Classification models are built using training data.
  • the learning process of the classification model can be performed by a learning device.
  • a learning device refers to a computing device that controls digital data processing and the learning process of deep learning models.
  • the learning device constructs learning data (210).
  • Training data can be collected from various groups depending on the severity of lung disease. For example, learning data may be collected from the normal group, severity 1 group, ..., and severity n group, respectively.
  • Lung disease severity can be determined based on FEV1 (forced expiratory volume in one second).
  • FEV1 refers to the amount of air expelled from the lungs during the first second of forced exhalation. If the patient's FEV1 is lower than a threshold (e.g., the average of the entire population), the patient can be classified as a COPD patient. If a patient's FEV1 is above the threshold, the patient can be classified as a patient with low severity.
  • subjects can be classified into normal, low-severity lung disease patients, and high-severity lung disease patients.
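The grouping above can be sketched as a simple threshold rule. Since the text only specifies "a threshold" on FEV1, the concrete 80% and 50% cut-offs below (loosely mirroring common GOLD-style grading on FEV1 as a percentage of the predicted value) are illustrative assumptions, not the patent's actual criteria:

```python
def severity_group(fev1_percent_predicted):
    """Map FEV1 (% of predicted value) to a severity group.
    The three-way split mirrors the normal / low-severity / high-severity
    grouping described above; the 80% and 50% cut-offs are illustrative
    assumptions (the text itself only specifies 'a threshold')."""
    if fev1_percent_predicted >= 80:
        return "normal"
    if fev1_percent_predicted >= 50:
        return "low severity"
    return "high severity"
```

These labels would then serve as the label values attached to each subject's training data.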
  • Learning data includes clinical information and voice data for each group.
  • the training data also includes the label value of each training data.
  • Voice data is collected separately before and after performing certain exercises. Voice data can be collected as subjects utter the same sentence.
  • Voice data may consist of items as shown in Table 1 below.
  • the learning device can extract 32 features as shown in Table 1 below from voice signals.
  • voice data may consist of any number of items among the items in Table 1 below.
  • the learning device can extract silence sections and conversation sections from the entire file using a voice recognition tool.
  • the silent section is defined as a section in which a signal with an amplitude level of -36dBFS (decibel full scale) or less lasts for more than 200ms.
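The silence-section rule above can be sketched directly: convert the -36 dBFS threshold to a linear amplitude limit and look for runs lasting at least 200 ms. Samples are assumed to be floats normalized to [-1.0, 1.0] (so 0 dBFS corresponds to amplitude 1.0); the function name and signature are illustrative.

```python
def silent_sections(samples, sample_rate, thresh_dbfs=-36.0, min_ms=200):
    """Return (start, end) sample indices of sections where the absolute
    amplitude stays at or below thresh_dbfs for at least min_ms ms."""
    limit = 10 ** (thresh_dbfs / 20.0)          # -36 dBFS -> ~0.0158
    min_len = int(sample_rate * min_ms / 1000)  # minimum run in samples
    sections, run_start = [], None
    for i, x in enumerate(samples):
        if abs(x) <= limit:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                sections.append((run_start, i))
            run_start = None
    if run_start is not None and len(samples) - run_start >= min_len:
        sections.append((run_start, len(samples)))
    return sections
```

Runs shorter than 200 ms are discarded, so brief pauses between words are not counted as silence.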
  • Jitter is a value that indicates how constant the period of vocal-fold vibration is; the more irregular the period, the larger the value.
  • Shimmer is a value that indicates how constant the amplitude of vibration is; the more irregular the amplitude, the larger the value.
  • Formant is a resonance that occurs in the vocal tract (the space that extends from the pharynx and oral cavity to the nasal cavity and lips).
  • HNR (harmonic-to-noise ratio) is the ratio of the periodic (harmonic) component of the voice signal to its noise component.
  • Speech rate refers to the number of words per minute in speech.
  • f0 (fundamental frequency) is the frequency of vocal cord vibration and perceptually corresponds to pitch.
  • Articulation rate is the number of syllables per second in speech.
  • Syllable duration refers to the duration of a syllable.
  • the learning device can extract jitter, shimmer, formants, HNR, speech rate, f0, articulation rate, and syllable length using publicly available software for speech analysis.
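As an illustration of two of these features, the sketch below computes local jitter and shimmer in the form commonly reported by speech-analysis tools such as Praat: the mean absolute difference between consecutive glottal periods (or peak amplitudes), divided by the mean. The function names are illustrative, and the extraction of the period/amplitude sequences themselves is assumed to have been done already.

```python
def local_jitter(periods):
    """Local jitter: mean absolute difference between consecutive
    glottal periods, divided by the mean period. Larger values mean
    a more irregular vibration period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def local_shimmer(amplitudes):
    """Local shimmer: the same ratio computed over the peak amplitudes
    of consecutive periods instead of the periods themselves."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))
```

A perfectly regular voice yields 0 for both measures; irregularity pushes the values up.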
  • Clinical information can consist of 31 items as shown in Table 2 below.
  • the clinical information below includes self-administration variables. Some of the clinical information may be collected through wearable devices, sensor devices, etc. Furthermore, clinical information may consist of any number of items among the items in Table 2 below.
  • The items in Table 2 include, for example, BMI (body mass index), resting SpO2 (blood oxygen saturation), SpO2, resting heart rate, and heart rate after exercise (items 10 and 11 in Table 2).
  • the learning device can perform certain preprocessing on the initial learning data.
  • Preprocessing for voice data may include noise removal, data type conversion, etc.
  • Preprocessing of clinical information may include the process of adjusting values into certain categories.
  • the learning device can normalize clinical information using preprocessing techniques such as Min-Max Normalization and z-score normalization.
  • the learning device can encode categorical values of clinical information as one-hot vectors.
  • the learning device can input encoded clinical information into a learning model.
  • the learning device treats 32 voice variables and 31 types of clinical information as individual input variables and can construct a total of 63 input variables as learning data.
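The preprocessing steps above (min-max normalization, z-score normalization, and one-hot encoding) can be sketched as follows; concatenating the 32 normalized voice features with the 31 encoded clinical items then yields the 63 input variables. All function names here are illustrative.

```python
def min_max(values):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """z-score normalization: zero mean, unit (population) variance."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

def one_hot(value, categories):
    """One-hot encode a categorical clinical item."""
    return [1.0 if value == c else 0.0 for c in categories]
```

A model input is then built as, e.g., `voice_features + clinical_features`, a single 63-element vector per subject.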
  • the learning device builds a classification model using the learning data (220).
  • the learning device extracts one input data from the collected learning data and inputs it into the classification model.
  • the classification model outputs a probability value for lung disease severity for the corresponding input data.
  • the learning device compares the value output by the classification model with the known correct answer (label value) and updates the weight of the classification model so that the classification model outputs a label corresponding to the correct answer.
  • the learning device repeats the learning process using multiple learning data.
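The training loop above (forward pass, compare the output with the label, update the weights) can be sketched with a single linear softmax layer standing in for the deep network; this is an illustrative simplification, not the patent's actual model architecture.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def train_step(weights, biases, x, label, lr=0.1):
    """One supervised update: forward pass, compare the predicted
    severity distribution with the known correct label, and move the
    weights toward the correct answer (cross-entropy gradient).
    Returns the cross-entropy loss for this sample."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(logits)
    for c in range(len(weights)):
        err = probs[c] - (1.0 if c == label else 0.0)
        biases[c] -= lr * err
        weights[c] = [w - lr * err * xi for w, xi in zip(weights[c], x)]
    return -math.log(probs[label])
```

Repeating this step over many labeled samples drives the loss down, which is the learning process described above.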
  • Figure 3 shows the results of verifying the performance of the learning model that classifies lung disease severity. As shown in Figure 3, the built model achieved an average micro AUROC (area under the ROC curve) and an average macro AUROC of 0.87. The classification model therefore showed significantly high performance in classifying lung disease severity.
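For reference, AUROC for a single binary split can be computed with the rank-based (Mann-Whitney) formulation below; micro and macro multi-class AUROC, as reported for Figure 3, are averages of such binary scores over binarized labels. The implementation is an illustrative sketch, not the evaluation code used in the study.

```python
def auroc(labels, scores):
    """Binary AUROC: the probability that a randomly chosen positive
    example scores higher than a randomly chosen negative example
    (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random guessing and 1.0 to perfect separation, so 0.87 indicates strong discrimination between severity groups.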
  • FIG. 4 is an example of an analysis device 300 that classifies the severity of lung disease.
  • the analysis device 300 corresponds to the above-described analysis device (130, 140, or 150 in FIG. 1).
  • the analysis device 300 may be physically implemented in various forms.
  • the analysis device 300 may take the form of a smart device, a computer device such as a PC, a network server, a wearable device, an exercise device, or a chipset dedicated to data processing.
  • the analysis device 300 may include a storage device 310, a memory 320, an arithmetic device 330, an interface device 340, a communication device 350, and an output device 360.
  • the storage device 310 may store the above-described classification model.
  • the classification model is a pre-trained model.
  • the classification model is a model that outputs lung disease severity based on input user data (voice data and clinical information).
  • the storage device 310 can store user data.
  • User data is the user's voice data and clinical information that are subject to analysis.
  • Voice data consists of data collected before exercise and data collected after exercise.
  • Voice data may consist of the items in Table 1.
  • Clinical information may consist of the items in Table 2.
  • the memory 320 may store data and information generated when the analysis device classifies the severity of lung disease using the subject's user data.
  • the interface device 340 is a device that receives certain commands and data from the outside.
  • the interface device 340 may receive the subject's voice data from a physically connected input device or an external storage device.
  • the input device may include a device such as a microphone.
  • Voice data consists of data measured before and after exercise.
  • the interface device 340 may receive the subject's clinical information from a physically connected input device or an external storage device.
  • the interface device 340 may transmit, to an external object, the result of analyzing the subject's user data and classifying the severity of lung disease.
  • the interface device 340 may receive data or information transmitted through the communication device 350 below.
  • the communication device 350 refers to a configuration that receives and transmits certain information through a wired or wireless network.
  • the communication device 350 may receive the subject's voice data from an external object (database, user terminal, microphone, etc.).
  • the communication device 350 may receive clinical information about a subject from an external object.
  • the communication device 350 may transmit the result of analyzing the subject's user data and classifying the severity of lung disease to an external object, such as a user terminal.
  • the output device 360 is a device that outputs certain information.
  • the output device 360 can output interfaces, classification results, etc. required for the data processing process.
  • the computing device 330 may perform certain preprocessing on the user data. For example, the computing device 330 may convert voice data into a certain type of data. Additionally, the computing device 330 may normalize each value of the clinical information to a certain range.
  • the computing device 330 inputs the preprocessed user data into a pre-trained learning model.
  • the computing device 330 may classify the severity of the subject's lung disease based on the probability value output by the learning model.
  • the computing device 330 may be a device such as a processor that processes data and performs certain operations, an AP, or a chip with an embedded program.
  • the method for classifying the severity of a subject's lung disease as described above may be implemented as a program (or application) including an executable algorithm that can be executed on a computer.
  • the program may be stored and provided in a transitory or non-transitory computer-readable medium.
  • a non-transitory readable medium refers to a medium that stores data semi-permanently and can be read by a device, rather than a medium, such as a register, cache, or memory, that stores data only for a short period.
  • the various applications or programs described above may be stored and provided in non-transitory readable media such as a CD, DVD, hard disk, Blu-ray disk, USB drive, memory card, ROM (read-only memory), PROM (programmable ROM), EPROM (erasable PROM), or EEPROM (electrically erasable PROM).
  • Transitory readable media refer to various types of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Animal Behavior & Ethology (AREA)
  • Signal Processing (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Veterinary Medicine (AREA)
  • Surgery (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

This method for classifying the severity of a lung disease of a subject by using voice data and clinical information comprises steps in which an analysis device: receives voice data and clinical information about the subject; pre-processes the voice data and the clinical information; inputs the pre-processed voice data and clinical information into a pre-trained learning model; and classifies the severity of the lung disease of the subject on the basis of an output value of the learning model.

Description

Method and analysis device for classifying the severity of a subject's lung disease using voice data and clinical information
The technology described below relates to a technique for predicting the degree of lung disease using the subject's voice.
Early diagnosis of lung diseases such as COPD (chronic obstructive pulmonary disease) is important to prevent worsening. COPD can be clinically diagnosed by performing pulmonary function tests on patients with coughing, sputum production, and shortness of breath. However, because early symptoms of COPD are difficult to identify, early detection is difficult with basic diagnosis alone.
Recently, studies have been conducted to diagnose COPD using deep learning models that analyze chest CT (computed tomography) images. However, such diagnostic techniques also require chest imaging of the patient, so it is difficult for them to contribute to the early detection of lung disease.
The technology described below seeks to provide a technique for predicting the degree of lung disease, such as COPD, based on the subject's voice and clinical information.
A method of classifying the severity of a subject's lung disease using voice data and clinical information includes the steps of: an analysis device receiving the subject's voice data and clinical information; the analysis device preprocessing the voice data and clinical information; the analysis device inputting the preprocessed voice data and clinical information into a pre-trained learning model; and the analysis device classifying the severity of the subject's lung disease based on the output value of the learning model.
An analysis device that classifies the severity of a subject's lung disease includes an interface device that receives the subject's voice data and clinical information; a storage device that stores a learning model that receives voice data and clinical information and classifies the severity of lung disease; and a computing device that preprocesses the input voice data and clinical information, inputs the preprocessed voice data and clinical information into the learning model, and classifies the severity of the subject's lung disease based on the output value of the learning model.
The technology described below can predict the degree of lung disease by analyzing the user's voice and clinical information, which can be obtained relatively easily. The technology described below can diagnose the severity of lung disease through voice recording and self-diagnosis without the patient having to visit a medical institution.
Figure 1 is an example of a lung disease severity classification system using voice and clinical information.
Figure 2 is an example of the learning process of a learning model for lung disease severity classification.
Figure 3 shows the results of verifying the performance of a learning model that classifies lung disease severity.
Figure 4 is an example of an analysis device that classifies lung disease severity.
The technology described below may be subject to various changes and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the technology described below to particular embodiments, and the technology should be understood to include all changes, equivalents, and substitutes falling within its spirit and technical scope.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the technology described below, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes any one of, or any combination of, a plurality of related listed items.

In the terms used in this specification, singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise, and terms such as "comprises" mean that the described features, numbers, steps, operations, components, parts, or combinations thereof exist, without excluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before describing the drawings in detail, it should be clarified that the division of components in this specification is merely a division by the main function each component performs. That is, two or more components described below may be combined into a single component, or one component may be divided into two or more components by more detailed function. Each component may additionally perform some or all of the functions of other components in addition to its own main functions, and some of the main functions of each component may be carried out entirely by another component.

In performing a method or operating method, the steps constituting the method may occur in an order different from the stated order unless a specific order is clearly required by the context. That is, the steps may occur in the stated order, be performed substantially simultaneously, or be performed in the reverse order.
The technology described below predicts or classifies the severity of lung diseases such as COPD (chronic obstructive pulmonary disease) based on a subject's voice and clinical information. For convenience, the following description focuses on COPD; however, the technology can also be used to predict or classify the severity of various lung diseases other than COPD.

The user data used for the analysis include the user's voice and clinical information. The user data are collected from a specific subject, and may be collected before and after the subject performs a certain exercise. The user's voice is recorded before and after exercise, and the input variables include features extracted from those recordings. Some of the clinical information may likewise be collected before and after exercise. Furthermore, the clinical information may include questionnaire information collected from the subject. The user data are described in detail later.

In the following, the analysis device is described as classifying or predicting the degree of lung disease based on the user's voice and clinical information. The analysis device can be implemented as any of a variety of devices capable of processing data; for example, it may be implemented as a PC, a server on a network, a smart device, a wearable device, or a chipset with an embedded dedicated program. Furthermore, the analysis device may be built into various devices such as exercise equipment, vehicles, and smart speakers.

The analysis device can classify lung disease using a machine learning model. Examples of machine learning models include decision trees, random forests, K-nearest neighbors (KNN), Naive Bayes, support vector machines (SVM), and artificial neural networks (ANN). The learning model is described below mainly with reference to a deep neural network (DNN); however, the learning model for lung disease classification can be implemented as various types of models.
Figure 1 is an example of a lung disease severity classification system 100 using voice and clinical information. Figure 1 shows an example in which the analysis device is a user terminal 130, a computer terminal 140, or a server 150.

Subject A performs a certain exercise for a certain amount of time. In patients with lung disease, vocal characteristics may differ between before and after exercise. Accordingly, user data can be collected from subject A both before and after exercise.

As described above, the user data may include the subject's voice data and clinical information. The voice data consist of pre-exercise voice data and post-exercise voice data, each recorded while the same subject A utters the same words or sentences (text) before and after exercise, respectively. The clinical information may consist of various items, some of which correspond to data collected separately before and after exercise.

The database (DB) 110 may store the subject's voice data and clinical information. The database 110 may be a system such as an electronic medical record (EMR).

The user terminal 120 may receive user data from subject A. In Figure 1, the user terminal 120 is illustrated as a smart device. The user terminal 120 is a device that can record the user's voice through a microphone and receive clinical information through a certain interface device. The user terminal 120 may be any one of various types of devices, such as a smart device, a PC, a wearable device, or a smart speaker.

The user terminal 130 may receive user data from the database 110. Furthermore, the user terminal 120 and the user terminal 130 may be the same device, in which case the user terminal 130 both collects and analyzes the user data. The user terminal 130 may preprocess the subject's user data in a consistent manner. For example, the user terminal 130 may remove noise from the subject's voice data. The user terminal 130 may also convert the voice data into a representation such as a chromagram, Mel-frequency cepstral coefficients (MFCC), or a Mel spectrogram. In addition, the user terminal 130 may preprocess the clinical information by normalizing items with different ranges to a common range. The user terminal 130 may classify the severity of the subject's lung disease by inputting the user data into a previously built learning model. User A can check the degree of the subject's lung disease through the user terminal 130.

The computer terminal 140 receives user data from the database 110 or the user terminal 120. The computer terminal 140 may preprocess the user data in a consistent manner and may classify the severity of the subject's lung disease by inputting the user data into a previously built learning model. User B can check the degree of the subject's lung disease through the computer terminal 140.

The server 150 receives user data from the database 110 or the user terminal 120. The server 150 may preprocess the subject's user data in a consistent manner and may classify the severity of the subject's lung disease by inputting the user data into a previously built learning model. User A can access the server 150 through a user terminal to check the degree of the subject's lung disease.
Figure 2 is an example of the training process 200 of a learning model for lung disease severity classification. The learning model may be any one of various types; in Figure 2, a deep learning model is shown as an example. The learning model that classifies lung disease severity may be called a classification model. The classification model is built using training data, and its training process can be performed by a training device, that is, a computing device that processes digital data and controls the training of the deep learning model.

The training device constructs the training data (210). Training data can be collected from groups of differing lung disease severity; for example, training data may be collected from a normal group, a severity-1 group, ..., and a severity-n group. Lung disease severity can be determined based on FEV1 (forced expiratory volume in one second), the volume of air expelled from the lungs during the first second of a forced exhalation. If a patient's FEV1 is lower than a threshold (for example, the average of the entire population), the patient can be classified as a COPD patient; if the patient's FEV1 is at or above the threshold, the patient can be classified as low severity. In this case, subjects can thus be classified as normal, low-severity lung disease patients, or high-severity lung disease patients. The training data include clinical information and voice data for each group, as well as a label value for each training sample. The voice data are collected separately before and after a certain exercise, and may be collected while the subjects utter the same sentence.
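As one illustration of the labeling rule just described, the grouping by FEV1 can be sketched in Python. The three-class scheme and the threshold value of 50 (matching the FEV1 < 50 / FEV1 ≥ 50 split used in the evaluation described later) are assumptions for illustration; the actual cutoff is a design choice.

```python
def label_severity(fev1_percent, is_patient, threshold=50):
    """Assign a severity label from FEV1 (% of predicted value).

    Returns 0 for a normal subject, 1 for low-severity lung disease
    (FEV1 at or above the threshold) and 2 for high-severity lung
    disease (FEV1 below the threshold).
    """
    if not is_patient:
        return 0  # normal group
    return 2 if fev1_percent < threshold else 1

# A patient with FEV1 = 42% predicted falls in the high-severity group.
print(label_severity(42, is_patient=True))   # 2
print(label_severity(65, is_patient=True))   # 1
print(label_severity(95, is_patient=False))  # 0
```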
The voice data may consist of the items shown in Table 1 below. The training device can extract the 32 features listed in Table 1 from the voice signal. Alternatively, the voice data may consist of any subset of a plurality of the items in Table 1.
[Table 1]
1. Number of silent sections before exercise
2. Number of silent sections after exercise
3. Difference in the number of silent sections before and after exercise
4. Ratio of the number of silent sections before and after exercise
5. Length of silent sections before exercise
6. Length of silent sections after exercise
7. Difference in the length of silent sections before and after exercise
8. Ratio of the length of silent sections before and after exercise
9. Total length of the pre-exercise recording
10. Total length of the post-exercise recording
11. Difference in total recording length before and after exercise
12. Ratio of total recording length before and after exercise
13. Ratio of silent-section length to total recording length before exercise
14. Ratio of silent-section length to total recording length after exercise
15. Difference, before versus after exercise, in the ratio of silent-section length to total recording length
16. Ratio, before versus after exercise, of the ratio of silent-section length to total recording length
17. Jitter before exercise
18. Jitter after exercise
19. Shimmer before exercise
20. Shimmer after exercise
21. HNR (harmonic-to-noise ratio) before exercise
22. HNR after exercise
23. Formant before exercise
24. Formant after exercise
25. Speech rate before exercise
26. Speech rate after exercise
27. f0 (fundamental frequency) before exercise
28. f0 (fundamental frequency) after exercise
29. Articulation rate before exercise
30. Articulation rate after exercise
31. Syllable duration before exercise
32. Syllable duration after exercise
Among the features of the voice signal, the number of silent sections, the length of the silent sections, the recording length, and the ratio of silent sections to recording length are expected to increase significantly after exercise, relative to before exercise, in the COPD patient group compared with the normal group. When the text that the user is to read is set in advance, the training device can extract the silent sections and speech sections from the recording using a speech recognition tool.

A silent section is defined as a section in which a signal with an amplitude level of -36 dBFS (decibels relative to full scale) or lower lasts for 200 ms or more.
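The silent-section definition above (amplitude at or below -36 dBFS, sustained for at least 200 ms) can be applied directly to a sequence of audio samples. The sample-by-sample scan below is a minimal illustrative sketch, not the patent's implementation:

```python
def count_silent_sections(samples, sample_rate, threshold_dbfs=-36.0, min_ms=200):
    """Count silent sections: runs of samples at or below the dBFS
    threshold lasting at least min_ms milliseconds.

    `samples` are floats in [-1.0, 1.0]; full scale (0 dBFS) is 1.0.
    """
    # Convert the dBFS threshold to a linear amplitude: 10^(dB/20).
    linear = 10.0 ** (threshold_dbfs / 20.0)
    min_samples = int(sample_rate * min_ms / 1000.0)

    sections, run = 0, 0
    for s in samples:
        if abs(s) <= linear:
            run += 1          # extend the current quiet run
        else:
            if run >= min_samples:
                sections += 1  # the run was long enough to count
            run = 0
    if run >= min_samples:     # recording may end in silence
        sections += 1
    return sections

# 1 s of silence, a loud burst, then 0.3 s of silence, at 1 kHz sampling:
audio = [0.0] * 1000 + [0.5] * 100 + [0.0] * 300
print(count_silent_sections(audio, sample_rate=1000))  # 2
```

The related features of Table 1 (silent-section length, and its ratio to the recording length) follow from the same run-length scan by accumulating durations instead of counts.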
Jitter is a measure of how constant the period of vocal-fold vibration is; the more irregular the period, the larger the value.

Shimmer is a measure of how constant the amplitude of vocal-fold vibration is; the more irregular the amplitude, the larger the value.
A formant is a resonance that occurs in the vocal tract (the space extending past the pharynx and oral cavity to the nasal cavity and lips).

HNR (harmonic-to-noise ratio) is the average ratio between the harmonics present between 70 and 4,500 Hz and the abnormal harmonics present between 1,500 and 4,500 Hz; here, the larger the value, the higher the proportion of noise.

Speech rate refers to the number of words per minute in the speech.

f0 (fundamental frequency) is the frequency of vocal-fold vibration and perceptually corresponds to pitch.

Articulation rate is the number of syllables per second in the speech.

Syllable duration refers to the duration of a syllable.
The training device can extract jitter, shimmer, formants, HNR, speech rate, f0, articulation rate, and syllable duration using publicly available speech analysis software.
The clinical information may consist of the 31 items shown in Table 2 below, which include self-reported questionnaire variables. Some of the clinical information may be collected through wearable devices, sensor devices, and the like. Alternatively, the clinical information may consist of any subset of a plurality of the items in Table 2.
[Table 2]
1. Gender
2. Age
3. Height
4. Weight
5. BMI (body mass index)
6. Number of sit-to-stand repetitions
7. Resting SpO2 (blood oxygen saturation)
8. Post-exercise SpO2 (blood oxygen saturation)
9. Resting heart rate
10. Post-exercise heart rate
11. Rating of perceived exertion (degree of dyspnea after the sit-to-stand test, on a scale of 1 to 3: 1 = fine, 2 = slightly out of breath, 3 = out of breath)
12. Degree of dyspnea (on a scale of 0 to 4: 0 = normal, 4 = most severe)
13. I never cough / I cough (on a scale of 0 to 5: 0 = not at all, 5 = severe)
14. I have no phlegm in my chest at all / I have phlegm in my chest
15. I do not feel any chest tightness at all / I feel chest tightness
16. I am not out of breath at all when climbing hills or stairs / I am out of breath
17. My activities at home are not limited at all / are limited
18. I am confident going out despite my lung disease / I am not confident
19. I sleep deeply / I do not sleep deeply
20. I have plenty of energy / I have no energy
21. Sum of the above eight items (0-40)
22. Total amount smoked over your lifetime (1 = never smoked or fewer than 5 packs in lifetime, 2 = 5 packs or more in lifetime)
23. Smoking cessation status (1 = quit (currently not smoking at all), 2 = currently smoking (marked as smoking if not yet completely quit, even if attempting to quit))
24. Period since quitting smoking (1 = quit within the last month, 2 = quit within the last year, 3 = quit more than a year ago)
25. Number of cigarettes smoked per day ((x) cigarettes per day)
26. If currently smoking, have you quit for at least one day (24 hours) in the past year in an attempt to stop? (1 = no, 2 = yes (attempted to quit for at least one day in the past year))
27. Age at which you started smoking (about x years old)
28. Number of cigarettes smoked per day while smoking (20 cigarettes = 1 pack; an average of x cigarettes per day)
29. Total period of smoking, including, for those who quit, the period smoked before quitting (a total of x months)
30. Smoking cessation period (quit for about x months)
31. Education level (1 = elementary school or lower, 2 = middle school, 3 = high school, 4 = junior college/4-year university, 5 = graduate school or higher, 6 = other)
The training device may preprocess the initial training data in a consistent manner. Preprocessing of the voice data may include noise removal, data type conversion, and the like. Preprocessing of the clinical information may include adjusting values into a certain range; for example, the training device can normalize the clinical information using preprocessing techniques such as min-max normalization or z-score normalization.
In addition, the training device can convert categorical clinical information values into fixed-length representations by one-hot encoding, and can input the encoded clinical information into the learning model.
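A one-hot encoding of a categorical clinical item, for example the education-level codes 1 to 6 of Table 2, could look like the following sketch; the category list is an assumption for illustration:

```python
def one_hot(value, categories):
    """Encode a categorical value as a one-hot vector over `categories`."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1
    return vec

# Education level coded 1..6 (see Table 2): level 3 (high school)
# becomes a 6-dimensional vector with a single 1 in the third position.
print(one_hot(3, categories=[1, 2, 3, 4, 5, 6]))  # [0, 0, 1, 0, 0, 0]
```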
The training device treats the 32 voice variables and the 31 clinical information items as individual input variables, constructing a total of 63 input variables as the training data.
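Concatenating the two feature groups into a single 63-dimensional model input can be sketched as follows; the helper name is a placeholder, not the patent's terminology:

```python
def build_input_vector(voice_features, clinical_features):
    """Concatenate the 32 voice features (Table 1) and the 31 clinical
    features (Table 2) into one 63-dimensional input vector."""
    assert len(voice_features) == 32, "expected 32 voice variables (Table 1)"
    assert len(clinical_features) == 31, "expected 31 clinical variables (Table 2)"
    return list(voice_features) + list(clinical_features)

x = build_input_vector([0.0] * 32, [0.0] * 31)
print(len(x))  # 63
```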
The training device builds the classification model using the training data (220). The training device extracts one input sample from the collected training data and inputs it into the classification model, which outputs probability values for lung disease severity for that input. The training device compares the value output by the classification model with the known correct answer (label value) and updates the weights of the classification model so that it outputs the label corresponding to the correct answer. The training device repeats this process over a large number of training samples.
The researchers built and verified the classification model described above using data from 248 subjects collected at their institution. The data from the 248 subjects were split 4:1 and used as training data and validation data, respectively. The data consisted of 54 high-severity COPD cases (FEV1 < 50), 144 low-severity cases (FEV1 ≥ 50), and 50 normal cases. The researchers built several machine learning models: a multi-layer perceptron (MLP), a random forest, an extra-trees classifier, XGBoost, and LightGBM. Comparing the performance of these models, the random forest showed the highest performance. Figure 3 shows the results of verifying the performance of the learning model that classifies lung disease severity. As shown in Figure 3, the model achieved an average micro AUROC (area under the ROC curve) and an average macro AUROC of 0.87. The classification model therefore showed considerably high performance in classifying lung disease severity.
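The AUROC figures quoted above can be computed, for one class in a one-vs-rest setting, with the rank-based (Mann-Whitney) formulation sketched below; micro/macro averaging over the three severity classes then proceeds over such per-class scores. This is a generic sketch, not the researchers' evaluation code:

```python
def auroc(labels, scores):
    """Area under the ROC curve for binary labels (1 = positive),
    via the rank-sum (Mann-Whitney U) formulation, with tie handling."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):  # assign average ranks to tied scores
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)

# Perfectly separated scores give AUROC = 1.0; chance level is 0.5.
print(auroc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
```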
Figure 4 is an example of an analysis device 300 that classifies lung disease severity. The analysis device 300 corresponds to the analysis device described above (130, 140, or 150 in Figure 1). The analysis device 300 can be physically implemented in various forms; for example, it may take the form of a smart device, a computer device such as a PC, a network server, a wearable device, exercise equipment, or a chipset dedicated to data processing.

The analysis device 300 may include a storage device 310, a memory 320, a computing device 330, an interface device 340, a communication device 350, and an output device 360.
The storage device 310 may store the classification model described above. The classification model is a pre-trained model that outputs lung disease severity based on the input user data (voice data and clinical information).

The storage device 310 may also store the user data, that is, the voice data and clinical information of the user to be analyzed. The voice data consist of data collected before exercise and data collected after exercise, and may consist of the items in Table 1. The clinical information may consist of the items in Table 2.
The memory 320 may store data and information generated while the analysis device classifies lung disease severity using the subject's user data.

The interface device 340 receives certain commands and data from the outside.

The interface device 340 may receive the subject's voice data from a physically connected input device or an external storage device. The input device may include a device such as a microphone. The voice data consist of data measured before and after exercise, respectively.

The interface device 340 may receive the subject's clinical information from a physically connected input device or an external storage device.

The interface device 340 may also transmit, to an external object, the result of analyzing the subject's user data and classifying the severity of lung disease.

Meanwhile, the interface device 340 may also receive data or information transmitted via the communication device 350 described below.
The communication device 350 refers to a component that receives and transmits certain information through a wired or wireless network.

The communication device 350 may receive the subject's voice data from an external object (a database, a user terminal, a microphone, or the like).

The communication device 350 may receive the subject's clinical information from an external object.

The communication device 350 may also transmit, to an external object such as a user terminal, the result of analyzing the subject's user data and classifying the severity of lung disease.

The output device 360 is a device that outputs certain information, such as the interfaces required during data processing and the classification results.
The computing device 330 may preprocess the user data in a consistent manner. For example, the computing device 330 may convert the voice data into a certain type of data, and may normalize each value of the clinical information to a certain range.

The computing device 330 inputs the preprocessed user data into the pre-trained learning model and can classify the severity of the subject's lung disease based on the probability values output by the learning model.
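Mapping the model's output probabilities to a severity class can be as simple as taking the most probable class and reporting its probability as a confidence; a minimal sketch, with class names assumed for illustration:

```python
SEVERITY_CLASSES = ["normal", "low severity", "high severity"]  # assumed labels

def classify_severity(probabilities):
    """Pick the severity class with the highest model output probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return SEVERITY_CLASSES[best], probabilities[best]

label, confidence = classify_severity([0.10, 0.25, 0.65])
print(label)  # high severity
```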
The computing device 330 may be a device that processes data and performs computations, such as a processor, an AP (application processor), or a chip with an embedded program.
Furthermore, the method for classifying the severity of a subject's lung disease described above may be implemented as a program (or application) containing an executable algorithm that can run on a computer. The program may be stored and provided on a transitory or non-transitory computer-readable medium.
A non-transitory readable medium is a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specifically, the various applications or programs described above may be stored and provided on a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically erasable PROM), or flash memory.
Transitory readable media refers to various kinds of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double-data-rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
This embodiment and the drawings attached to this specification merely illustrate a portion of the technical ideas included in the technology described above, and it is self-evident that all variations and specific embodiments that a person skilled in the art could readily derive within the scope of the technical ideas contained in the specification and drawings fall within the scope of rights of the aforementioned technology.

Claims (8)

  1. A method of classifying the severity of a subject's lung disease using voice data and clinical information, the method comprising: receiving, by an analysis device, the subject's voice data and clinical information;
    preprocessing, by the analysis device, the voice data and the clinical information;
    inputting, by the analysis device, the preprocessed voice data and clinical information into a pre-trained learning model; and
    classifying, by the analysis device, the severity of the subject's lung disease based on an output value of the learning model.
  2. The method of claim 1,
    wherein the voice data includes voice data in which the subject utters a given text before exercise and voice data in which the subject utters the same text after the exercise.
  3. The method of claim 1,
    wherein the voice data includes a plurality of items among: the number of silent sections before exercise; the number of silent sections after exercise; the difference in the number of silent sections before and after exercise; the ratio of the number of silent sections before and after exercise; the length of silent sections before exercise; the length of silent sections after exercise; the difference in silent-section length before and after exercise; the ratio of silent-section length before and after exercise; the total length of the recording before exercise; the total length of the recording after exercise; the difference in total recording length before and after exercise; the ratio of total recording length before and after exercise; the ratio of silent-section length to total recording length before exercise; the ratio of silent-section length to total recording length after exercise; the difference between those ratios before and after exercise; the ratio between those ratios before and after exercise; jitter before exercise; jitter after exercise; shimmer before exercise; shimmer after exercise; HNR (harmonic-to-noise ratio) before exercise; HNR after exercise; formants before exercise; formants after exercise; speech rate before exercise; speech rate after exercise; f0 (fundamental frequency) before exercise; f0 after exercise; articulation rate before exercise; articulation rate after exercise; syllable duration before exercise; and syllable duration after exercise.
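The pre/post-exercise silence measures recited above can be sketched with a simple frame-energy threshold. This is only an illustrative sketch under assumed parameters (25 ms frames, a fixed energy threshold); the patent does not specify how silent sections are detected.

```python
import numpy as np

def silence_stats(signal, sr, frame_ms=25, thresh=0.01):
    """Count silent runs and total silence length via a frame-energy threshold."""
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energies = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                         for i in range(n_frames)])
    silent = energies < thresh
    # A silent run starts at frame 0 if silent, or at a loud-to-silent transition.
    runs = int(np.sum(silent[1:] & ~silent[:-1])) + int(silent[0])
    total_silence = silent.sum() * frame / sr       # seconds of silence
    total_length = len(signal) / sr                 # seconds of recording
    return runs, total_silence, total_length

def pre_post_features(pre, post):
    """Differences and ratios of silence statistics before vs. after exercise."""
    return {
        "silence_count_diff": post[0] - pre[0],
        "silence_count_ratio": post[0] / pre[0] if pre[0] else float("inf"),
        "silence_len_diff": post[1] - pre[1],
        "total_len_ratio": post[2] / pre[2],
    }
```

Jitter, shimmer, HNR, and formants are conventionally extracted with an acoustic-analysis toolkit (e.g., Praat); the sketch above covers only the silence-based items.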
  4. The method of claim 1,
    wherein the clinical information includes gender, age, height, weight, BMI (body mass index), oxygen saturation, heart rate, rating of perceived exertion, degree of dyspnea, presence of cough, smoking-cessation status, smoking-cessation period, and smoking period.
  5. An analysis device for classifying the severity of a subject's lung disease, comprising: an interface device that receives the subject's voice data and clinical information;
    a storage device that stores a learning model that receives voice data and clinical information and classifies the severity of lung disease; and
    a computing device that preprocesses the input voice data and clinical information, inputs the preprocessed voice data and clinical information into the learning model, and classifies the severity of the subject's lung disease based on an output value of the learning model.
  6. The analysis device of claim 5,
    wherein the voice data includes voice data in which the subject utters a given text before exercise and voice data in which the subject utters the same text after the exercise.
  7. The analysis device of claim 5,
    wherein the voice data includes a plurality of items among: the number of silent sections before exercise; the number of silent sections after exercise; the difference in the number of silent sections before and after exercise; the ratio of the number of silent sections before and after exercise; the length of silent sections before exercise; the length of silent sections after exercise; the difference in silent-section length before and after exercise; the ratio of silent-section length before and after exercise; the total length of the recording before exercise; the total length of the recording after exercise; the difference in total recording length before and after exercise; the ratio of total recording length before and after exercise; the ratio of silent-section length to total recording length before exercise; the ratio of silent-section length to total recording length after exercise; the difference between those ratios before and after exercise; the ratio between those ratios before and after exercise; jitter before exercise; jitter after exercise; shimmer before exercise; shimmer after exercise; HNR (harmonic-to-noise ratio) before exercise; HNR after exercise; formants before exercise; formants after exercise; speech rate before exercise; speech rate after exercise; f0 (fundamental frequency) before exercise; f0 after exercise; articulation rate before exercise; articulation rate after exercise; syllable duration before exercise; and syllable duration after exercise.
  8. The analysis device of claim 5,
    wherein the clinical information includes gender, age, height, weight, BMI (body mass index), oxygen saturation, heart rate, rating of perceived exertion, degree of dyspnea, presence of cough, smoking-cessation status, smoking-cessation period, and smoking period.
PCT/KR2023/013863 2022-09-16 2023-09-15 Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information WO2024058585A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2022-0117052 2022-09-16
KR20220117052 2022-09-16
KR10-2023-0122823 2023-09-15
KR1020230122823A KR20240038622A (en) 2022-09-16 2023-09-15 Classification method for severity of pulmonary disease based on vocal data and clinical information and analysis apparatus

Publications (1)

Publication Number Publication Date
WO2024058585A1 true WO2024058585A1 (en) 2024-03-21

Family

ID=90275303

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2023/013863 WO2024058585A1 (en) 2022-09-16 2023-09-15 Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information

Country Status (1)

Country Link
WO (1) WO2024058585A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150127380A (en) * 2014-05-07 2015-11-17 한국 한의학 연구원 Apparatus and method for diagnosis of physical conditions using phonetic analysis
JP2018516616A (en) * 2015-04-16 2018-06-28 コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. Device, system and method for detecting heart and / or respiratory disease in a subject
JP2018534026A (en) * 2015-10-08 2018-11-22 コルディオ メディカル リミテッド Evaluation of lung diseases by speech analysis
US20210076977A1 (en) * 2017-12-21 2021-03-18 The University Of Queensland A method for analysis of cough sounds using disease signatures to diagnose respiratory diseases
JP2022100317A (en) * 2019-03-11 2022-07-05 株式会社RevComm Information processing apparatus

Similar Documents

Publication Publication Date Title
US11810670B2 (en) Intelligent health monitoring
US20200388287A1 (en) Intelligent health monitoring
US10223934B2 (en) Systems and methods for expressive language, developmental disorder, and emotion assessment, and contextual feedback
Alaie et al. Cry-based infant pathology classification using GMMs
Shi et al. Theory and application of audio-based assessment of cough
Muhammad et al. Convergence of artificial intelligence and internet of things in smart healthcare: a case study of voice pathology detection
Stasak et al. Automatic detection of COVID-19 based on short-duration acoustic smartphone speech analysis
Romero et al. Deep learning features for robust detection of acoustic events in sleep-disordered breathing
Vatanparvar et al. CoughMatch–subject verification using cough for personal passive health monitoring
Simply et al. Diagnosis of obstructive sleep apnea using speech signals from awake subjects
Usman et al. Heart rate detection and classification from speech spectral features using machine learning
Blanco et al. Improving automatic detection of obstructive sleep apnea through nonlinear analysis of sustained speech
Ding et al. Severity evaluation of obstructive sleep apnea based on speech features
Popadina et al. Voice analysis framework for asthma-COVID-19 early diagnosis and prediction: AI-based mobile cloud computing application
JP2023531464A (en) A method and system for screening for obstructive sleep apnea during wakefulness using anthropometric information and tracheal breath sounds
WO2024058585A1 (en) Method and analysis device for classifying severity of lung disease of subject by using voice data and clinical information
Romero et al. Snorer diarisation based on deep neural network embeddings
WO2023058946A1 (en) System and method for predicting respiratory disease prognosis through time-series measurements of cough sounds, respiratory sounds, recitation sounds and vocal sounds
Dubnov Signal analysis and classification of audio samples from individuals diagnosed with COVID-19
KR20230050208A (en) Respiratory disease prognosis prediction system and method through time-series cough sound, breathing sound, reading sound or vocal sound measurement
KR20240038622A (en) Classification method for severity of pulmonary disease based on vocal data and clinical information and analysis apparatus
Kim et al. Non-invasive way to diagnose dysphagia by training deep learning model with voice spectrograms
Dutta et al. A Fine-Tuned CatBoost-Based Speech Disorder Detection Model
Xu et al. A Review of Disorder Voice Processing Toward to Applications
Chudasama et al. Voice Based Pathology Detection from Respiratory Sounds using Optimized Classifiers

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23865881

Country of ref document: EP

Kind code of ref document: A1