KR20220007537A

KR20220007537A - Method and device for predicting Alzheimer's disease based on phonetic features

Info

Publication number: KR20220007537A
Application number: KR1020210089014A
Authority: KR
Inventors: 이준영; 고현웅
Original assignee: 서울대학교산학협력단 (Seoul National University R&DB Foundation)
Priority date: 2020-07-10
Filing date: 2021-07-07
Publication date: 2022-01-18
Also published as: KR102659616B1

Abstract

Provided are a method and device for predicting Alzheimer's disease based on a vocal characteristic. According to an embodiment of the present invention, the device for predicting Alzheimer's disease comprises: a voice inputting unit configured to generate a voice sample by recording a voice of a target; a data inputting unit configured to receive demographic information of the target; a vocal characteristic extracting unit extracting the vocal characteristic from the generated voice sample; and a prediction model pre-trained to predict Alzheimer's disease of the target based on the vocal characteristic and the demographic information.

Description

Method and device for predicting Alzheimer's disease based on phonetic features

The present invention relates to a non-invasive method that uses a patient's voice to help medical personnel determine whether the patient has Alzheimer's disease (AD), which accounts for the largest share of the diseases that cause dementia. It can provide a method and an apparatus for non-invasively predicting dementia due to Alzheimer's disease in people at high risk of dementia.

Although the use of AI speakers is gradually increasing along with the Fourth Industrial Revolution, they have not yet become widespread. There are various reasons for their slow adoption, but most consumers say there is not enough content to use them for. In other words, a wider range of content for AI speakers needs to be developed.

Combining AI speakers with the healthcare industry is seen as a way to "catch two rabbits at once", achieving both profitability and public benefit, and many researchers in Korea and abroad are developing healthcare-related content for AI speakers.

The population aged 65 and over in Korea reached 7.1% of the total population in 2000 and is expected to exceed 20% by 2026. Dementia, a representative geriatric disease, affected 280,000 people, or 8.3% of those aged 65 and over, in 2000, and was estimated to reach 430,000 people, or 8.6% of all elderly, in 2010, so it is on the rise. As the elderly population grows and life expectancy increases, the prevalence of dementia continues to rise steadily.

Alzheimer's disease (AD) is the most common cause of dementia, accounting for about 70% of all cases, and about 600,000 people were estimated to have Alzheimer's disease in 2020. Alzheimer's disease is a representative degenerative dementia in which the degeneration of cerebral nerve cells impairs cognitive function and daily living. It is a devastating, ultimately fatal disease that causes suffering not only to patients but also to the families who care for them. Because Alzheimer's disease is irreversible and no curative treatment has been developed, early diagnosis followed by prompt intervention is currently the best available approach. As its prevalence rises, Alzheimer's disease is also having a major impact on public health and medical expenditure.

Speech and language impairments are among the symptoms that accompany dementia caused by cognitive decline and neurodegeneration. They are frequently used as criteria to distinguish normal aging from the diseases that cause dementia, and they often appear as markers of early symptoms. Accordingly, automatic speech recognition research has pursued speech and language markers for diagnosing dementia, and most such studies predict dementia using data from the Pitt corpus of DementiaBank (e.g., descriptions of the Cookie Theft picture).

Cerebrospinal fluid (CSF) testing and positron emission tomography (PET), which are used to diagnose Alzheimer's disease accurately, must be performed by specialists and are restricted to particular places and times. There is therefore a need for a method and apparatus that use AI speakers and smart devices to predict whether a conversation partner has Alzheimer's disease.

Korean Patent Publication No. 10-2014-0119486

The present invention relates to a method for diagnosing Alzheimer's disease using the acoustic characteristics of speech, and more specifically to a method and apparatus that collect a conversation partner's responses through the speech-recognition microphone of an AI speaker or smart device and predict whether, and how likely it is that, the conversation partner has Alzheimer's disease.

According to an embodiment of the present invention, an apparatus for predicting Alzheimer's disease based on speech characteristics includes: a voice input unit configured to generate a voice sample by recording a subject's voice; a data input unit configured to receive demographic information of the subject; a speech characteristic extraction unit that extracts speech characteristics from the generated voice sample; and a prediction model trained in advance to predict whether the subject has Alzheimer's disease based on the speech characteristics and the demographic information.

A method for predicting Alzheimer's disease based on speech characteristics according to an embodiment of the present invention includes: generating a voice sample by recording a subject's voice; receiving demographic information of the subject; extracting speech characteristics from the generated voice sample; and predicting whether the subject has Alzheimer's disease by inputting the speech characteristics and the demographic information into a pre-trained prediction model.

A recording medium according to an embodiment of the present invention is a computer-readable recording medium storing computer-readable instructions that, when executed by at least one processor, cause the at least one processor to perform steps comprising: recording a subject's voice to generate a voice sample; receiving demographic information of the subject; extracting speech characteristics from the generated voice sample; and predicting whether the subject has Alzheimer's disease by inputting the speech characteristics and the demographic information into a pre-trained prediction model.

The method and apparatus for predicting Alzheimer's disease based on speech characteristics according to embodiments of the present invention, which use AI speakers and smart devices to predict whether a conversation partner has Alzheimer's disease, are expected to help overcome the temporal, spatial, and resource limitations of existing tests, and can support medical personnel in diagnosing Alzheimer's disease.

Predicting the risk of Alzheimer's disease from the acoustic characteristics of speech, as provided by the present invention, can support the preparation of appropriate interventions for the steadily growing population with dementia.

FIG. 1 is a block diagram illustrating the configuration of an apparatus for predicting Alzheimer's disease based on speech characteristics according to an embodiment of the present invention.
FIG. 2 illustrates, by way of example, the operation of the apparatus for predicting Alzheimer's disease based on speech characteristics according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating the steps of a method for predicting Alzheimer's disease based on speech characteristics according to an embodiment of the present invention.
FIG. 4 is a graph showing Alzheimer's disease prediction results that take demographic and speech characteristics into account.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings. Although the present invention is described with reference to the embodiment shown in the drawings, that embodiment is only one example, and it does not limit the technical idea of the present invention or its core configuration and operation.

FIG. 1 is a block diagram showing the configuration of an apparatus for predicting Alzheimer's disease according to an embodiment of the present invention. FIG. 2 illustrates, by way of example, the operation of the apparatus for predicting Alzheimer's disease according to an embodiment of the present invention.

Referring to FIGS. 1 and 2, the Alzheimer's disease prediction apparatus 10 according to an embodiment of the present invention includes a voice input unit 100, a data input unit 110, a speech characteristic extraction unit 120, a prediction model 130, and a data storage unit 140.

The Alzheimer's disease prediction apparatus 10 may be implemented entirely in hardware, or partly in hardware and partly in software. For example, the Alzheimer's disease prediction apparatus 10 and each of its units described in this specification may collectively refer to equipment for exchanging data of a specific format and content by electronic communication, together with the related software. In this specification, terms such as "unit", "module", "server", "device", and "terminal" are intended to refer to a combination of hardware and the software driven by that hardware. For example, the hardware may be a data processing device including a CPU or another processor, and the software driven by the hardware may be a running process, an object, an executable, a thread of execution, a program, or the like.

Furthermore, the modules constituting the Alzheimer's disease prediction apparatus 10 are not necessarily intended to be physically separate components. Although FIG. 1 shows the voice input unit 100, the data input unit 110, the speech characteristic extraction unit 120, the prediction model 130, and the data storage unit 140 as separate blocks, this only divides the apparatus 10 functionally according to the operations each part performs. Accordingly, in some embodiments, some or all of these units may be integrated into a single device. For example, the Alzheimer's disease prediction apparatus 10 may be implemented on a device with speech recognition and data processing capabilities, such as an AI speaker. The invention is not limited to this, however: one or more of these components may be implemented as a device physically separate from the other units, or as components communicatively connected to one another in a distributed computing environment.
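To make the functional (rather than physical) division concrete, the Python sketch below models apparatus 10 as a set of injected callables, one per unit, so the same wiring works whether the units share one AI speaker or run on separate machines. The class and field names are hypothetical, and the lambdas in the usage lines are toy stand-ins, not the patent's components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical aliases: a voice sample is a list of float samples;
# features and demographics are name -> value mappings.
VoiceSample = List[float]
Features = Dict[str, float]


@dataclass
class AlzheimerPredictionDevice:
    """Functional view of apparatus 10: each unit is a swappable callable."""

    record_voice: Callable[[], VoiceSample]              # voice input unit 100
    read_demographics: Callable[[], Features]            # data input unit 110
    extract_features: Callable[[VoiceSample], Features]  # extraction unit 120
    predict: Callable[[Features], float]                 # prediction model 130
    storage: Dict[str, object] = field(default_factory=dict)  # data storage unit 140

    def run(self) -> float:
        sample = self.record_voice()
        demographics = self.read_demographics()
        # Keep the raw inputs, as the data storage unit 140 may do.
        self.storage["voice_sample"] = sample
        self.storage["demographics"] = demographics
        features = {**self.extract_features(sample), **demographics}
        return self.predict(features)


if __name__ == "__main__":
    device = AlzheimerPredictionDevice(
        record_voice=lambda: [0.0, 0.2, -0.1],
        read_demographics=lambda: {"age": 74.0},
        extract_features=lambda s: {"peak": max(abs(v) for v in s)},
        predict=lambda f: 0.5,  # placeholder model, not a trained one
    )
    print(device.run())
```

Because every unit is only an interface, replacing a local callable with one that calls a remote service leaves `run` unchanged.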

The voice input unit 100 generates a voice sample of the subject by recording the subject's voice at a given sampling frequency. The voice input unit 100 may include a condenser microphone and a device for controlling it, and may record the subject's voice at a sampling frequency of 16 kHz or higher. In a quiet room suitable for recording, the subject performs at least one of a picture description task, a standard paragraph reading task, and a story recall task, producing spontaneous or read speech, and the voice input unit 100 records this speech to generate the voice sample.
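As a rough illustration of the recording step, the Python sketch below reads a recorded voice sample from a WAV file with the standard-library `wave` module and rejects recordings below a minimum sampling rate. It assumes 16-bit mono PCM, a 16 kHz minimum, and that NumPy is available; the function names `load_voice_sample` and `write_test_tone` are hypothetical, and the synthetic tone only stands in for a real recording.

```python
import io
import wave

import numpy as np

MIN_SAMPLE_RATE_HZ = 16_000  # assumed minimum sampling rate for the recording


def load_voice_sample(source, min_rate: int = MIN_SAMPLE_RATE_HZ):
    """Read a 16-bit mono PCM WAV (path or binary file object).

    Returns (samples, rate) with samples scaled to floats in [-1, 1).
    Raises ValueError if the recording does not meet the assumed format.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        if rate < min_rate:
            raise ValueError(f"sampling rate {rate} Hz is below {min_rate} Hz")
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        raw = wav.readframes(wav.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    return samples, rate


def write_test_tone(rate: int = 16_000, seconds: float = 1.0, freq: float = 220.0):
    """Create an in-memory WAV holding a sine tone (stand-in for a recording)."""
    t = np.arange(int(rate * seconds)) / rate
    pcm = (0.5 * np.sin(2 * np.pi * freq * t) * 32767).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(pcm.tobytes())
    buf.seek(0)
    return buf


if __name__ == "__main__":
    samples, rate = load_voice_sample(write_test_tone())
    print(rate, len(samples))
```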

The data input unit 110 receives the subject's demographic information, which includes at least the subject's age, sex, and years of education. The data received through the data input unit 110 are data from which the subject's age, sex, and years of education can be read or extracted. For example, demographic information may be obtained from the subject's medical interview (questionnaire) records. Such records may contain the patient's age, sex, and years of education; they are produced using the medical judgment of a trained specialist and may also be collected under the supervision of other medical staff and the patient's caregiver. The data input unit 110 may receive demographic information from such questionnaire records, but is not limited to this. In some embodiments, the subject may enter the demographic information directly.
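Before demographic information can enter a regression model it has to be numeric. A minimal sketch, assuming sex is coded as a 0/1 indicator and that age and years of education are used as raw numbers (the patent does not specify an encoding); the `Demographics` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Demographics:
    age: float              # years
    sex: str                # "M" or "F" (assumed coding)
    education_years: float  # years of formal education

    def to_features(self) -> Dict[str, float]:
        """Numeric covariates for the prediction model (hypothetical names)."""
        if self.sex not in ("M", "F"):
            raise ValueError("sex must be 'M' or 'F'")
        if self.age < 0 or self.education_years < 0:
            raise ValueError("age and education_years must be non-negative")
        return {
            "age": float(self.age),
            "sex_female": 1.0 if self.sex == "F" else 0.0,  # 0/1 dummy coding
            "education_years": float(self.education_years),
        }


if __name__ == "__main__":
    print(Demographics(age=72, sex="F", education_years=6).to_features())
```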

Data entered through the voice input unit 100 and the data input unit 110 may be stored in the data storage unit 140. The data storage unit 140 may be configured to store the input data or to provide the temporary storage space that the prediction model, described below, needs for data processing.

The speech characteristic extraction unit 120 may extract the patient's speech characteristics from the subject's input voice sample, including characteristics related to the phonological, source, and spectral properties of the sample. Specifically, the speech characteristic extraction unit 120 may extract as speech characteristics at least one of: the fundamental frequency of the subject's voice; speech-related measures (speech rate, speaking time, utterance length); pause-related measures (degree of pausing, number of pauses, pause duration); shimmer; jitter; formants; harmonic-to-noise ratio; loudness; spectral centroid; Mel-frequency cepstral coefficients (MFCCs); identity vectors (i-vectors); articulation rate; zero-crossing rate (ZCR); voicing probability (VP); line spectral pairs (LSP); period perturbation; amplitude perturbation quotient (APQ); stiffness; energy; intensity (loudness of the voice); and entropy.
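Several of the listed characteristics have short textbook definitions. The NumPy sketch below computes a few of them: zero-crossing rate, RMS energy, spectral centroid, and local jitter and shimmer (mean absolute difference between consecutive pitch periods or peak amplitudes, divided by their mean, in the style of the "local" measures reported by tools such as Praat). It assumes the pitch periods and amplitudes come from a separate pitch tracker that is not shown, and it is not the patent's own extraction code.

```python
import numpy as np


def zero_crossing_rate(x: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(x)
    return float(np.mean(signs[1:] != signs[:-1]))


def rms_energy(x: np.ndarray) -> float:
    """Root-mean-square amplitude of the whole sample."""
    return float(np.sqrt(np.mean(np.square(x))))


def spectral_centroid(x: np.ndarray, sr: int) -> float:
    """Magnitude-weighted mean frequency (Hz) of the signal's spectrum."""
    mag = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    total = float(np.sum(mag))
    return float(np.sum(freqs * mag)) / total if total > 0 else 0.0


def local_jitter(periods) -> float:
    """Mean |difference| of consecutive pitch periods divided by the mean period."""
    p = np.asarray(periods, dtype=float)
    return float(np.mean(np.abs(np.diff(p))) / np.mean(p))


def local_shimmer(amplitudes) -> float:
    """The same measure applied to consecutive peak amplitudes."""
    a = np.asarray(amplitudes, dtype=float)
    return float(np.mean(np.abs(np.diff(a))) / np.mean(a))


if __name__ == "__main__":
    sr = 16_000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)
    print(zero_crossing_rate(tone), rms_energy(tone), spectral_centroid(tone, sr))
```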

Here, the speech characteristic extraction unit 120 may first perform preprocessing to quantify the voice sample. Preprocessing may adjust the voice sample so that its duration, frequency, and similar properties are uniform. Preprocessing may also distinguish human speech from non-speech sounds: the speech characteristic extraction unit 120 may include an artificial neural network model (for example, a convolutional neural network) trained to select only human speech from the input voice sample, and may use this trained model to perform the preprocessing that keeps only human speech. From the preprocessed voice sample, the speech characteristic extraction unit 120 may extract speech characteristics including at least one of fundamental frequency, speech rate, speaking time, utterance length, degree of pausing, number of pauses, pause duration, shimmer, jitter, formants, harmonic-to-noise ratio, loudness, spectral centroid, Mel-frequency cepstral coefficients (MFCCs), identity vectors (i-vectors), articulation rate, zero-crossing rate (ZCR), voicing probability (VP), line spectral pairs (LSP), period perturbation, amplitude perturbation quotient (APQ), stiffness, energy, intensity (loudness of the voice), and entropy. The speech characteristic extraction unit 120 may also use a publicly available speech analysis program (for example, Praat) to extract speech characteristics in the phonological, source, and spectral domains.
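The text describes a trained neural network (for example, a CNN) that keeps only human speech. The sketch below swaps in a much simpler energy-threshold voice activity detector, plainly a substitute and not the patent's method, to show how pause-related characteristics (number of pauses, total pause duration, fraction of voiced frames) can be derived once speech and non-speech frames are separated. The 25 ms frame, the 0.05 relative-energy threshold, and the 250 ms minimum pause are illustrative assumptions, not values from the patent.

```python
import numpy as np


def peak_normalize(x: np.ndarray) -> np.ndarray:
    """Scale so the largest absolute sample is 1; all-zero input is returned as-is."""
    peak = float(np.max(np.abs(x))) if len(x) else 0.0
    return x / peak if peak > 0 else x


def frame_rms(x: np.ndarray, sr: int, frame_ms: float = 25.0):
    """RMS energy of consecutive non-overlapping frames (a partial last frame is dropped)."""
    n = int(sr * frame_ms / 1000)
    n_frames = len(x) // n
    frames = x[: n_frames * n].reshape(n_frames, n)
    return np.sqrt(np.mean(frames ** 2, axis=1)), n


def pause_features(x, sr, frame_ms=25.0, threshold=0.05, min_pause_ms=250.0):
    """Pause statistics from an energy-threshold voice activity detector.

    A silent run counts as a pause only if it lies between two voiced
    stretches and lasts at least `min_pause_ms`.
    """
    energies, n = frame_rms(peak_normalize(np.asarray(x, dtype=float)), sr, frame_ms)
    voiced = energies >= threshold
    min_frames = int(np.ceil(min_pause_ms / frame_ms))
    pauses, run, seen_speech = [], 0, False
    for v in voiced:
        if v:
            if seen_speech and run >= min_frames:
                pauses.append(run)
            seen_speech, run = True, 0
        else:
            run += 1
    frame_s = n / sr
    return {
        "pause_count": len(pauses),
        "pause_total_s": sum(pauses) * frame_s,
        "speech_ratio": float(np.mean(voiced)) if len(voiced) else 0.0,
    }


if __name__ == "__main__":
    sr = 16_000
    tone = np.sin(2 * np.pi * 200 * np.arange(sr // 2) / sr)
    signal = np.concatenate([tone, np.zeros(sr // 2), tone])  # speech, pause, speech
    print(pause_features(signal, sr))
```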

The prediction model 130 may be trained in advance to predict whether the subject has Alzheimer's disease from the speech characteristics extracted by the speech characteristic extraction unit 120 and the demographic information provided through the data input unit 110. The prediction model 130 may include at least one analysis model among a linear regression model, a logistic regression model, a machine learning model, and a neural network model.
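The patent does not say how the prediction model is trained. As one possibility for the logistic regression case, the NumPy sketch below fits the coefficients by batch gradient descent on standardized inputs and then maps them back to the raw input scale, so they can be used directly in a logistic model of the form given in Equation 1. The function names, learning rate, and epoch count are assumptions; in practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit the intercept and coefficients of a logistic model by batch gradient descent.

    X: (n_subjects, p) speech characteristics plus demographic covariates.
    y: 1 for Alzheimer's disease, 0 for normal cognitive function.
    Returns (beta0, beta) on the scale of the raw, unstandardized inputs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0                       # leave constant columns unscaled
    Z = (X - mu) / sd                       # standardize for stable step sizes
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(Z @ w + b)
        # gradient of the mean negative log-likelihood
        w -= lr * (Z.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    # undo standardization: b + w.(x - mu)/sd == beta0 + beta.x
    beta = w / sd
    beta0 = b - float(np.sum(w * mu / sd))
    return beta0, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 2))                  # synthetic, not clinical, data
    y = (X[:, 0] - X[:, 1] > 0).astype(float)
    beta0, beta = fit_logistic(X, y)
    print(beta0, beta)
```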

The prediction model 130 includes a multivariate logistic regression model, which may be configured as in Equation 1 below.

[Equation 1]

$$P = \frac{1}{1 + \exp\left(-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p)\right)}$$

(where $X_1$ to $X_p$ are the independent variables, i.e., the input values of the prediction model, corresponding respectively to the p inputs made up of speech characteristics and demographic information; $\beta_1$ to $\beta_p$ are constants serving as the regression coefficients of the independent variables; $\beta_0$ is the initial constant (intercept); and $P$ is the dementia risk probability)
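Equation 1 can be evaluated directly once the coefficients are known. The Python sketch below does so with a numerically stable form of the logistic function. The patent does not disclose fitted coefficient values, so any numbers passed to `dementia_risk` (a hypothetical name) are placeholders.

```python
import math
from typing import Sequence


def dementia_risk(x: Sequence[float], beta0: float, beta: Sequence[float]) -> float:
    """P from Equation 1: logistic function of beta0 + sum_i beta_i * x_i."""
    if len(x) != len(beta):
        raise ValueError("x and beta must both have length p")
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    # Stable logistic: never call exp on a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Placeholder inputs and coefficients, for illustration only.
    print(dementia_risk([0.3, 1.2], beta0=-0.5, beta=[0.8, 0.1]))
```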

The prediction model 130 constructed on the basis of Equation 1 may output a dementia risk probability and, based on that probability, output status information evaluating the patient's condition. The dementia risk probability is the probability of being diagnosed with dementia, and the status information classifies the patient according to specialists' diagnostic criteria; the patient's condition may be determined as either Alzheimer's disease or normal cognitive function. For example, the prediction model 130 may classify the patient as having Alzheimer's disease when the calculated dementia risk probability is 0.5 or higher, and as having normal cognitive function when it is below 0.5.
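The decision rule above amounts to a threshold on the risk probability. A minimal sketch, with the 0.5 cut-off from the text as the default; the `PredictionResult` type and `to_status` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionResult:
    probability: float  # dementia risk probability from Equation 1
    status: str         # "Alzheimer's disease" or "normal cognitive function"


def to_status(probability: float, threshold: float = 0.5) -> PredictionResult:
    """Map a risk probability to status information using a fixed cut-off."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    status = ("Alzheimer's disease" if probability >= threshold
              else "normal cognitive function")
    return PredictionResult(probability, status)


if __name__ == "__main__":
    print(to_status(0.62))
```

Keeping the threshold as a parameter makes it easy to trade sensitivity against specificity if a different operating point is chosen.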

However, the prediction model 130 of the present invention is not limited to this. The prediction model 130 may instead be a multinomial multivariate logistic regression model, in which case it may determine the patient's condition as one of Alzheimer's disease, mild cognitive impairment, and normal cognitive function based on the calculated risk probabilities.
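For the three-class variant, one common formulation (assumed here, since the patent gives no equation for it) is the baseline-category multinomial logit: normal cognitive function ("NC") is the reference class with logit 0, and Alzheimer's disease ("AD") and mild cognitive impairment ("MCI") each get their own intercept and coefficients. The sketch converts the logits to probabilities with a softmax and picks the most probable class; all names and labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

REFERENCE = "NC"  # normal cognitive function, the reference category


def multinomial_probs(
    x: Sequence[float], coefs: Dict[str, Tuple[float, List[float]]]
) -> Dict[str, float]:
    """Class probabilities under a baseline-category multinomial logit.

    `coefs` maps each non-reference class (e.g. "AD", "MCI") to
    (intercept, coefficients); the reference class has logit 0.
    """
    logits = {REFERENCE: 0.0}
    for cls, (b0, betas) in coefs.items():
        if len(betas) != len(x):
            raise ValueError(f"{cls}: expected {len(x)} coefficients")
        logits[cls] = b0 + sum(b * xi for b, xi in zip(betas, x))
    m = max(logits.values())  # subtract the max logit for numerical stability
    exps = {cls: math.exp(v - m) for cls, v in logits.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


def predict_state(x, coefs):
    """Return the most probable class and the full probability table."""
    probs = multinomial_probs(x, coefs)
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    # Placeholder coefficients, for illustration only.
    print(predict_state([1.0], {"AD": (0.2, [0.4]), "MCI": (0.1, [0.3])}))
```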

Hereinafter, a method for predicting Alzheimer's disease based on speech characteristics according to another embodiment of the present invention is described.

FIG. 3 is a flowchart of a method for predicting Alzheimer's disease based on speech characteristics according to an embodiment. The method may be performed by the prediction apparatus of FIGS. 1 and 2, and FIGS. 1 and 2 and the related description may be referred to for this embodiment.

Referring to FIG. 3, the method for predicting Alzheimer's disease based on speech characteristics according to an embodiment includes: generating a voice sample by recording a subject's voice (S100); receiving demographic information of the subject (S110); extracting speech characteristics from the generated voice sample (S120); and predicting whether the subject has Alzheimer's disease by inputting the speech characteristics and the demographic information into a pre-trained prediction model (S130).
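Steps S100 to S130 can be chained into a single function. The Python sketch below assumes the recording (S100) has already produced a list of samples, uses three toy speech characteristics for S120, and applies the logistic model of Equation 1 with caller-supplied coefficients for S130. Because S100 and S110 only supply inputs, their order does not matter here. All function names are hypothetical, and the coefficients in any call are placeholders rather than fitted values.

```python
import math
from typing import Dict, Sequence


def extract_features(samples: Sequence[float], sr: int) -> Dict[str, float]:
    """S120 (toy version): duration, RMS energy and zero-crossing rate."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    rms = math.sqrt(sum(s * s for s in samples) / n)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {"duration_s": n / sr, "rms": rms, "zcr": crossings / (n - 1)}


def logistic(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_alzheimers(samples, sr, demographics, beta0, betas, threshold=0.5):
    """Run S100 to S130 on an already-recorded voice sample.

    S100: `samples` is the voice sample produced by recording (not shown).
    S110: `demographics` holds numeric covariates such as age.
    S120: speech characteristics are extracted from the sample.
    S130: coefficients `beta0` and `betas` (name -> value) give the risk P.
    """
    features = {**extract_features(samples, sr), **demographics}
    missing = set(betas) - set(features)
    if missing:
        raise KeyError(f"no input for coefficients: {sorted(missing)}")
    z = beta0 + sum(coef * features[name] for name, coef in betas.items())
    p = logistic(z)
    return p, ("Alzheimer's disease" if p >= threshold else "normal cognitive function")


if __name__ == "__main__":
    sr = 16_000
    samples = [math.sin(2 * math.pi * 100 * i / sr) for i in range(sr)]
    print(predict_alzheimers(samples, sr, {"age": 70.0}, -1.0, {"age": 0.01, "zcr": 2.0}))
```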

Steps S100 and S110 of the method according to the embodiment are described in sequence only for convenience of explanation; they are not limited to being performed in the order described. In some embodiments, step S110 may be performed before step S100. In addition, a step of training the prediction model may be performed before the prediction method is carried out.

A voice sample is generated by recording the subject's voice (S100).

In a quiet room suitable for recording, the subject performs at least one of a picture description task, a standard paragraph reading task, and a story recall task, producing spontaneous or read speech, and this speech is recorded to generate the voice sample. The voice input unit 100 may include a condenser microphone and a device for controlling it, and may record the subject's voice at a sampling frequency of 16 kHz or higher.

The subject's demographic information is received (S110).

The demographic information includes the subject's age, sex, and years of education. For example, demographic information may be obtained from the subject's medical interview (questionnaire) records. Such records may contain the patient's age, sex, and years of education; they are produced using the medical judgment of a trained specialist and may also be collected under the supervision of other medical staff and the patient's caregiver. The data input unit 110 may receive demographic information from such questionnaire records, but is not limited to this. In some embodiments, the subject may enter the demographic information directly.

Next, speech characteristics are extracted from the generated voice sample (S120).

The patient's speech characteristics may be extracted from the subject's input voice sample. An artificial neural network model may be used in the preprocessing that quantifies the patient's voice sample. Preprocessing may adjust the voice sample so that its duration, frequency, and similar properties are uniform. In addition, when multiple voice samples are input, preprocessing may be performed to select among them. The speech characteristic extraction unit 120 may extract, from the preprocessed voice sample, speech characteristics including at least one of fundamental frequency, speech rate, speaking time, utterance length, degree of pausing, number of pauses, pause duration, shimmer, jitter, formants, harmonic-to-noise ratio, loudness, spectral centroid, Mel-frequency cepstral coefficients (MFCCs), identity vectors (i-vectors), articulation rate, zero-crossing rate (ZCR), voicing probability (VP), line spectral pairs (LSP), period perturbation, amplitude perturbation quotient (APQ), stiffness, energy, intensity (loudness of the voice), and entropy.

Next, the voice characteristics and the demographic information are input to the pre-trained prediction model to predict whether the subject has Alzheimer's disease (S130).
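Before step S130 the demographic information and the extracted voice characteristics have to be combined into a single input vector in a fixed order. A minimal sketch follows; the feature names, their order, and the binary encoding of sex are hypothetical, since the source does not define them.

```python
# Hypothetical ordering of voice features; the source does not fix one.
VOICE_FEATURES = ["f0_mean", "f0_std", "speech_rate", "pause_count", "pause_duration_mean"]

def build_input_vector(voice, demographics):
    """Concatenate demographic and voice features into one vector X_1..X_p."""
    missing = [name for name in VOICE_FEATURES if name not in voice]
    if missing:
        raise KeyError(f"missing voice features: {missing}")
    # Hypothetical encoding of sex as a binary indicator.
    sex_code = 1.0 if demographics["sex"] == "female" else 0.0
    x = [float(demographics["age"]), sex_code, float(demographics["education_years"])]
    x.extend(float(voice[name]) for name in VOICE_FEATURES)
    return x
```

Keeping the order in one constant matters because each regression coefficient is tied to a specific position in the vector.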

The prediction model 130 includes a multivariate logistic regression model, which may be configured as shown in Equation 1 below.

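[Equation 1]

$$P = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p\right)\right)}$$

(Here, $X_1$ to $X_p$ are the input values entered into the prediction model as independent variables, corresponding to the p voice characteristics and the demographic information; $\beta_1$ to $\beta_p$ are constants corresponding to the regression coefficients of the independent variables; $\beta_0$ is the initial constant value; and $P$ is the dementia risk probability value.)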
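Equation 1, in the standard multivariate logistic form, can be evaluated directly once the coefficients are known. The sketch below assumes the coefficients come from a previously trained model; the function name is hypothetical, and the coefficient values in any call are placeholders, not values from the study.

```python
import math

def dementia_risk_probability(x, beta0, betas):
    """P = 1 / (1 + exp(-(beta0 + beta_1*x_1 + ... + beta_p*x_p)))."""
    if len(x) != len(betas):
        raise ValueError("x and betas must have the same length p")
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    # Numerically stable logistic function: avoids overflow in exp for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)
```

The branch on the sign of `z` computes the same value either way but never calls `exp` on a large positive argument.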

The prediction model 130 constructed based on Equation 1 above may output a dementia risk probability value, and may output state information evaluating the patient's condition based on that value. The dementia risk probability value is the probability of being diagnosed with dementia. The state information is a classification of the patient according to specialist diagnostic criteria, and the patient's state may be determined as either Alzheimer's disease or normal cognitive function. For example, the prediction model 130 may determine the patient's state as Alzheimer's disease when the calculated dementia risk probability value is 0.5 or more, and as normal cognitive function when it is less than 0.5.
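The decision rule can be written as a small function. This is a sketch: the function name and string labels are hypothetical. The `cutoff` parameter is exposed because the experimental example reports a Youden-optimal cutoff of 0.437 in Table 1 rather than 0.5.

```python
def classify_state(probability, cutoff=0.5):
    """Map a dementia risk probability to state information."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # A probability at or above the cutoff is classified as Alzheimer's disease.
    return "Alzheimer's disease" if probability >= cutoff else "normal cognitive function"
```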

The method for predicting Alzheimer's disease based on voice characteristics according to these embodiments may be implemented as an application, or in the form of program instructions executable by various computer components, and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that generated by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to carry out the processing according to the present invention, and vice versa.

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

Experimental Example

An experiment was performed to construct the prediction model of the voice-characteristic-based Alzheimer's disease prediction method and apparatus according to the above-described embodiments, and to validate the constructed model.

Data for constructing the prediction model were collected from a total of 210 subjects: voice and diagnostic information were obtained from patients visiting Boramae Hospital and from people registered at the Dongjak-gu Dementia Safety Center. Of all subjects, 106 were in the Alzheimer's dementia group and 104 were in the normal group. Each subject's voice was recorded while the subject interacted with the examiner.

To quantify the subjects' voices, the voice samples were first preprocessed using an artificial neural network model (a convolutional neural network). That is, when multiple voice samples were input, a preprocessing step that selects human voice samples was performed first, to prevent noise data from being used for training. Voice characteristics were then extracted from the preprocessed voice using an automated extraction method. The extracted characteristics were the fundamental frequency (f0 mean, f0 std), speech-related information (speech rate, speech time, speech length), pause-related information (pause rate, pause count, pause duration mean, pause duration standard deviation (std)), shimmer, jitter, formants, harmonic-to-noise ratio, loudness, and spectral centroid (spectral centroid mean, spectral centroid std).
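The "automated voice characteristic extraction method" is not named in the source. To illustrate what the f0 mean and f0 std features measure, here is a crude autocorrelation pitch estimator, offered only as an assumed stand-in: it picks the lag with the largest unnormalized autocorrelation inside a hypothetical 75 to 400 Hz search range and is prone to octave errors on real speech. The function names and search range are hypothetical.

```python
import statistics

def estimate_f0(frame, sample_rate, f_min=75.0, f_max=400.0):
    """Autocorrelation pitch estimate for one frame in Hz, or None if no positive peak."""
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag is not None else None

def f0_statistics(frames, sample_rate):
    """f0 mean and population std over frames where a pitch was found."""
    values = [f for f in (estimate_f0(fr, sample_rate) for fr in frames) if f is not None]
    if not values:
        return {"f0_mean": 0.0, "f0_std": 0.0}
    return {"f0_mean": statistics.mean(values), "f0_std": statistics.pstdev(values)}
```

Silent frames have no positive autocorrelation peak and are skipped, so the statistics summarize voiced frames only.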

Using the demographic information (age, sex, years of education) and each extracted voice characteristic as input values, an Alzheimer's dementia prediction model was constructed that outputs the subject's dementia risk probability value, together with state information evaluating the patient's condition based on that value. The prediction model was implemented as a multivariate logistic regression model that outputs the dementia risk probability value. The model may determine the patient's state as Alzheimer's disease when the calculated dementia risk probability value is 0.5 or more, and as normal cognitive function when it is less than 0.5.
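The source does not state how the logistic regression coefficients were estimated. One common approach is to minimize the mean log-loss; the sketch below does this with plain batch gradient descent, and it z-scores each column first because age, years of education, and f0 live on very different scales. The learning rate, epoch count, standardization step, and function names are all assumptions for illustration.

```python
import math

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(X):
    """Z-score each column; returns (scaled rows, means, stds). Assumed preprocessing."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # A zero-variance column gets std 1.0 so it is only centered.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (beta0, betas)."""
    n, p = len(X), len(X[0])
    beta0, betas = 0.0, [0.0] * p
    for _ in range(epochs):
        grad0, grads = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            err = _sigmoid(beta0 + sum(b * v for b, v in zip(betas, xi))) - yi
            grad0 += err
            for j in range(p):
                grads[j] += err * xi[j]
        beta0 -= lr * grad0 / n
        betas = [b - lr * g / n for b, g in zip(betas, grads)]
    return beta0, betas

def predict_proba(x, beta0, betas):
    return _sigmoid(beta0 + sum(b * v for b, v in zip(betas, x)))
```

Labels are 1 for the Alzheimer's dementia group and 0 for the normal group. New subjects must be scaled with the training `means` and `stds` before calling `predict_proba`.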

The prediction performance of the constructed model was then tested. Specifically, the AUC (area under the curve) of the ROC (receiver operating characteristic) curve was calculated as an index of prediction performance. AUC, the area under the ROC curve, is a representative index of the overall performance of a prediction model; the closer it is to 1, the better the performance. FIG. 4 is a graph showing the results of the constructed prediction model; at the optimal cutoff score determined by the Youden index, the performance is as shown in Table 1 below.
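The AUC and the Youden-index cutoff can be computed from the model's predicted probabilities and the true group labels. The sketch below uses the rank interpretation of AUC (the probability that a randomly chosen Alzheimer's case scores higher than a randomly chosen normal case, ties counting one half) and scans every observed score as a candidate cutoff to maximize J = sensitivity + specificity - 1. The function names are hypothetical, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing J; a score >= cutoff is classified as positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = (None, -1.0)
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best[1]:
            best = (c, j)
    return best
```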

[Table 1]

| Cutoff | AUC | AUC SE | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| 0.437 | 0.816 | 0.029 | 0.802 | 0.699 | 0.733 | 0.774 |

The model achieved an AUC of 0.816, a sensitivity of 0.802, and a specificity of 0.699. The PPV (positive predictive value) was 0.733 and the NPV (negative predictive value) was 0.774.
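Sensitivity, specificity, PPV, and NPV all come from the confusion matrix at a chosen cutoff. The sketch below shows the four ratios; the function names are hypothetical, and the counts used in any example call are toy values, not the study's data.

```python
def confusion_counts(probabilities, labels, cutoff):
    """Return (tp, fn, tn, fp), treating probability >= cutoff as Alzheimer's disease."""
    tp = fn = tn = fp = 0
    for prob, y in zip(probabilities, labels):
        predicted = 1 if prob >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp

def diagnostic_metrics(tp, fn, tn, fp):
    return {
        "sensitivity": tp / (tp + fn),  # detected share of true Alzheimer's cases
        "specificity": tn / (tn + fp),  # correctly cleared share of normal subjects
        "ppv": tp / (tp + fp),          # share of positive calls that are correct
        "npv": tn / (tn + fn),          # share of negative calls that are correct
    }
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of Alzheimer's cases in the tested population.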

Conventionally, Alzheimer's disease has been diagnosed using amyloid PET imaging of cerebral amyloid deposition and cerebrospinal fluid (CSF) testing. Amyloid PET is expensive, is difficult to access outside hospitals that operate specialized medical centers such as tertiary hospitals, and carries risks such as radiation exposure. CSF analysis requires an invasive lumbar puncture and labor-intensive processing and analysis, and its reliability varies between institutions.

In contrast, the voice-characteristic-based Alzheimer's disease prediction method and apparatus of the present invention diagnose Alzheimer's disease through a non-invasive test. As a psychiatric screening test that can be run on an AI speaker, smartphone, tablet, PC, or similar device, it is not constrained by time, place, or the availability of specialists, so the test can be conveniently performed at home or in other non-hospital settings. In addition, Alzheimer's disease risk can be assessed at primary and secondary clinics, and false positives can be minimized, which saves costs; the approach may also be extended to treatment programs in the future.

That is, cerebrospinal fluid testing and positron emission tomography (PET), which are used to accurately diagnose Alzheimer's disease, a representative cause of dementia, require specialists and are restricted to specific places and times. Given this, a method that uses an AI speaker or smart device to test a conversation partner for Alzheimer's disease, as in the present invention, is expected to help overcome the temporal, spatial, and resource limitations of existing tests.

As a result, Alzheimer's disease risk assessment using the acoustic characteristics of speech is expected to enable appropriate interventions for the steadily growing population with dementia.

10: Alzheimer's disease prediction device
100: voice input unit
110: data input unit
120: voice characteristic extraction unit
130: prediction model
140: data storage unit

Claims

1. An apparatus for predicting Alzheimer's disease, comprising:
a voice input unit configured to record a subject's voice to generate a voice sample;
a data input unit configured to receive demographic information of the subject;
a voice characteristic extracting unit configured to extract voice characteristics from the generated voice sample; and
a prediction model trained in advance to predict whether the subject has Alzheimer's disease based on the voice characteristics and the demographic information.

2. The apparatus of claim 1,
wherein the demographic information includes the age, sex, and years of education of the subject.

3. The apparatus of claim 1,
wherein the voice characteristic extracting unit extracts, as the voice characteristics, at least one of a fundamental frequency of the voice, speech rate, speech time, speech length, degree of pause, number of pauses, pause duration, shimmer, jitter, formants, harmonic-to-noise ratio, loudness, spectral centroid, Mel-frequency cepstral coefficients (MFCC), identity vector (i-vector), articulation rate, zero-crossing rate (ZCR), voicing probability (VP), line spectral pairs (LSP), period perturbation, amplitude perturbation quotient (APQ), stiffness, energy, intensity (voice loudness), and entropy.

4. The apparatus of claim 3,
wherein the voice characteristic extracting unit includes an artificial neural network model that performs preprocessing to select human speech from the voice sample, and
the voice characteristic extracting unit extracts the voice characteristics from the preprocessed voice sample.

5. The apparatus of claim 1,
wherein the prediction model includes at least one analysis model among a linear regression model, a logistic regression model, a machine learning model, and a neural network model.

6. The apparatus of claim 5,
wherein the logistic regression model is a multivariate logistic regression model configured as in Equation 1 below.

[Equation 1]

$$P = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p\right)\right)}$$

(Here, $X_1$ to $X_p$ are input values entered into the prediction model as independent variables, corresponding to the p voice characteristics and the demographic information; $\beta_1$ to $\beta_p$ are the regression coefficients of the independent variables; $\beta_0$ is the initial constant value; and $P$ is the dementia risk probability value.)

7. The apparatus of claim 6,
wherein the prediction model outputs a dementia risk probability value and outputs state information evaluating the patient's condition based on the dementia risk probability value, and
the state information is determined as either Alzheimer's disease or normal cognitive function.

8. A method for predicting Alzheimer's disease, comprising:
recording a subject's voice to generate a voice sample;
receiving demographic information of the subject;
extracting voice characteristics from the generated voice sample; and
predicting whether the subject has Alzheimer's disease by inputting the voice characteristics and the demographic information into a pre-trained prediction model.

9. A computer-readable medium storing computer-readable instructions that, when executed by at least one processor, cause the at least one processor to perform steps comprising:
recording a subject's voice to generate a voice sample;
receiving demographic information of the subject;
extracting voice characteristics from the generated voice sample; and
predicting whether the subject has Alzheimer's disease by inputting the voice characteristics and the demographic information into a pre-trained prediction model.