WO2020134647A1 - Early-stage AD speech auxiliary screening system aiming at Mandarin Chinese - Google Patents

Early-stage AD speech auxiliary screening system aiming at Mandarin Chinese

Info

Publication number
WO2020134647A1
Authority
WO
WIPO (PCT)
Prior art keywords
module
speech
early
picture
feature
Prior art date
Application number
PCT/CN2019/117033
Other languages
French (fr)
Chinese (zh)
Inventor
燕楠
王岚
严泉雷
徐梦真
Original Assignee
深圳先进技术研究院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳先进技术研究院
Publication of WO2020134647A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/04: Segmentation; Word boundary detection
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Definitions

  • the invention relates to an early AD speech auxiliary screening system for Mandarin Chinese.
  • AD Alzheimer's disease
  • researchers have proposed a series of auxiliary diagnostic methods that may become early AD screening tools, approached from perspectives such as cognitive impairment, biochemical diagnosis, and neuroelectrophysiology; examples include speech testing, olfactory testing, gait testing, retinal imaging, Aβ peripheral blood screening, urine AD7c-NTP protein detection, and electroencephalogram (EEG) analysis.
  • existing early AD screening schemes have the following main problems, which hinder the further clinical application of the related diagnostic methods in early AD screening and detection and in the evaluation of AD disease progression.
  • CSF cerebrospinal fluid
  • at present, AD is detected clinically by three kinds of methods: cerebrospinal fluid (CSF) analysis, neuroimaging, and neuropsychological scale testing. Because of limitations such as high cost, the low uptake of invasive tests, and high barriers to access, these methods are difficult to use as diagnostic tools for large-scale early AD screening.
  • current related techniques still have deficiencies in their methods or schemes, such as small sample sizes and reliance on a single feature extraction scheme and feature selection method.
  • the detection tasks, the auxiliary materials used, and the data features extracted from the tasks also differ considerably. Because the language impairment of early AD shows up in several aspects, such as phonetics and semantic extraction, a single speech task cannot fully capture the speech characteristics specific to early AD patients.
  • the present invention proposes an early AD speech auxiliary screening system for Mandarin Chinese.
  • Starting from the impairment of language function in early AD, this system targets early AD screening for Mandarin Chinese.
  • it provides an extraction scheme and supporting devices for speech features specific to early AD. It has the advantages of low cost, real-time access to rich data, easy collection of large samples (usable for big data analysis), and support for remote acquisition and analysis.
  • it has significant application potential in large-scale early AD screening and in long-term disease-course management.
  • the subject test unit is used to test the tested object;
  • the speech feature extraction unit is used to extract the speech feature of the tested object and store the speech feature;
  • the recognition unit is used to recognize the speech features;
  • the subject test unit includes a spontaneous voice module, a picture description module, a word fluency module, a poetry recitation module, a sentence restatement module, a picture naming module, a span test module, a carousel pronunciation module, and a picture matching module;
  • the speech feature extraction unit includes an automatic speech-signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module;
  • the speech data recorded by each module of the subject test unit first passes through the automatic segmentation module to obtain segmented speech, which is then recognized as text by the signal preprocessing and automatic speech recognition modules.
  • the text analysis module and speech analysis module compute and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features from the speech recognition results;
  • the recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes multi-dimensional features to obtain a diagnosis-sensitive feature set; the classification module takes the optimized feature set and applies a multi-task deep belief network algorithm (Multi-task DBN), using task relatedness to jointly improve the prediction of the classification task; the network treats classification as the main task, and the MMSE and MoCA scores are trained as related tasks to help improve the prediction performance of the classification task; the diagnosis and screening of patients with early AD are finally completed according to the prediction results.
  • DBN multi-task deep belief network algorithm
  • the above-mentioned spontaneous speech module mainly tests the language vocabulary output and sentence coherence characteristics of the tested object.
  • the above picture description module provides at least one story picture, and the tested object narrates the plot of the story in the picture in their own words within a given time. It mainly tests the tested object's language output (including language fluency and vocabulary).
  • the above-mentioned word fluency module provides at least one theme, and the tested object says as many related words as possible within 1 minute.
  • the above-mentioned poetry recitation module provides at least one classical poem, which is used to detect the prosodic features of the tested object's speech.
  • the above sentence retelling module provides multiple sentences, and the tested object repeats each sentence they see. This module can detect the speech output of the tested object.
  • the above picture naming module provides multiple pictures that appear in random order, and the tested object names the things in each picture. This module can assess whether the tested object has deficits at the lexical-semantic level, and can also assess whether the tested object has difficulty with vocabulary selection.
  • the above-mentioned span test module presents 2 to 5 Chinese characters with similar pronunciations one after another and requires the tested object to repeat what they saw; the correctness of each repetition is judged from the module data, and the accuracy rate is analyzed for syllable spans of 2 to 5.
  • for short-term memory, speech sounds with low similarity are easier to remember than those with high similarity, so this module can assess the tested object's speech perception.
  • All characters in this module are presented in groups of two, three, four, and five syllables in turn, and played in the tone order Yinping, Yangping, Shangsheng, Qusheng, Yinping, Yangping, Shangsheng, Qusheng.
  • the above-mentioned carousel pronunciation module provides at least one set of three-syllable strings, which the tested object is asked to repeat three times. It is used to detect whether the tested object's coordinated articulation of the syllable sequence is too slow and whether articulatory function is abnormal.
  • each page includes at least three pictures, two of which are related.
  • the tested object is required to select the two related pictures.
  • auxiliary diagnostic markers that can serve as a comprehensive index for high-sensitivity early AD screening, flagging a high risk of AD, increasing the detection rate of early AD patients, and enabling intervention as early as possible so that patients gain more effective time for treatment.
  • FIG. 2 is a schematic structural diagram of an early AD speech assisted screening system for Mandarin Chinese in the present invention
  • Figure 4 is the picture in the picture naming module
  • Language impairment is one of the important features of early AD; it changes the patient's spontaneous speech rhythm, pronunciation cycle, pronunciation quality, and speech processing rate. Its clinical manifestations are relatively non-fluent spontaneous speech, word-finding difficulty, slow speech rate, long pauses, and phoneme errors.
  • the present invention uses objective analysis methods to quantify how the language impairment of early AD manifests in speech information, and detects the relevant speech features through Automatic Speech Analysis (ASA) and automatic speech recognition technologies, with a view to automatic early AD screening of Mandarin Chinese speakers.
  • ASA Automatic Speech Analysis
  • an early AD speech assisted screening system for Mandarin Chinese includes a subject test unit, a speech feature extraction unit and a recognition unit.
  • the subject test unit is used to test the tested object;
  • the speech feature extraction unit is used to extract the speech feature of the tested object and store the speech feature;
  • the recognition unit is used to recognize the speech features.
  • the subject test unit includes a spontaneous voice module, a picture description module, a word fluency module, a poetry recitation module, a sentence restatement module, a picture naming module, a span test module, a carousel pronunciation module and a picture matching module.
  • the above-mentioned spontaneous speech module mainly tests the language vocabulary output and sentence coherence characteristics of the tested object.
  • this module requires patients to complete about 2 minutes of spontaneous emotional speech on the theme of "self-introduction", for example introducing their name, age, family, work, and hobbies.
  • the module data can extract the features of language vocabulary output and sentence coherence.
  • the above picture description module provides at least one story picture, and the tested object narrates the plot of the story in the picture in their own words within a given time. It mainly tests the tested object's language output (including language fluency and vocabulary).
  • this module requires patients to observe a story picture without text and to narrate the plot of the story in the picture in their own words within a given time (1-2 minutes).
  • the patient needs to understand the characters and events in the picture story and present a structured framework in the output. Therefore, the module can test the patient's language output (including language fluency and vocabulary).
  • This module includes a total of 3 pictures (for example, Figure 3); the prescribed description time is about 2 minutes for the first picture and about 1 minute for each of the other two. The actual duration may be adjusted according to the patient's performance.
  • the above-mentioned word fluency module provides at least one theme, and the tested object speaks as many related words as possible within 1 minute.
  • for example, the theme may be fruit; the intervals between the words the patient says can be analyzed in order to assess the patient's long-term memory function.
  • the above-mentioned poetry recitation module provides at least one classical poem, which is used to detect the prosodic features of the tested object's speech.
  • this module contains a total of 6 simple poems for patients to read, for example: 白日依山尽，黄河入海流。欲穷千里目，更上一层楼。 (The white sun sets behind the mountains, the Yellow River flows into the sea. To see a thousand miles farther, climb one more storey.)
  • the above sentence restatement module provides multiple sentences, and the tested object repeats each sentence they see.
  • This module can detect the speech output of the patient.
  • this module requires patients to repeat the sentences they see.
  • the module data is used to calculate the interval between words in the process of repeating sentences.
  • This module contains a total of 14 sentences of gradually increasing difficulty, for example: "Tomorrow is Sunday."
  • this module can assess whether the patient has defects in the semantic level of vocabulary, and can also assess whether the patient has difficulty in vocabulary selection. Naming example: Giraffe.
  • the above-mentioned span test module presents 2 to 5 Chinese characters with similar pronunciations in sequence (for example, when the span is 2, "an" and "class" appear), and the tested object is required to repeat what they see; the correctness of each repetition is judged from the module data, and the accuracy rate is analyzed for syllable spans of 2 to 5.
  • for short-term memory, speech sounds with low similarity are easier to remember than those with high similarity, so this module can assess the patient's speech perception.
  • All characters in this module are presented in groups of two, three, four, and five syllables in turn, and played in the tone order Yinping, Yangping, Shangsheng, Qusheng, Yinping, Yangping, Shangsheng, Qusheng.
  • each page includes at least three pictures, two of which are related.
  • the tested object is required to select the two related pictures.
  • each page of this module shows three pictures (for example, Figure 5), where the picture in the upper row is clearly related to one of the pictures in the lower row; the patient uses the left and right keys on the keyboard to select the most closely related picture in the lower row.
  • a table is generated in the folder; the table records the patient's picture selections, and after the experiment is completed these records can be analyzed to obtain the accuracy rate and reaction time.
  • the reaction time of AD patients is clearly longer than that of healthy people, and their ability to recognize things is poorer. Therefore, by analyzing picture selection accuracy and reaction time, the data from this module can quickly and intuitively distinguish patients from healthy people.
  • combining multiple tasks can reflect the language impairment caused by AD from several angles. Therefore, the present invention establishes a multi-task, multi-dimensional framework for automatically extracting abnormal speech features, with a view to early AD screening based on the differences between the speech features of AD patients and those of healthy people.
  • the recordings and other files generated by each module are analyzed, and task-related multi-dimensional speech features are extracted from the speech output of the different modules, so as to obtain a feature set that is sensitive to early symptoms and can be used for AD screening.
  • the speech feature extraction unit includes an automatic segmentation module of speech signals, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module.
  • the speech data recorded by each module of the subject test unit is first segmented by the automatic segmentation module and then recognized as text by the signal preprocessing and automatic speech recognition modules.
  • the text analysis and speech analysis modules compute and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features from the speech recognition results.
  • the speech features used in the present invention can be roughly divided into four categories: phonetic features, lexical features, grammatical features, and pragmatic features. Among them, some features of speech defects caused by multiple cognitive impairments may belong to multiple levels at the same time.
  • Phonetic features describe the deficiencies of language production at the sound level, which mainly include three aspects.
  • the first aspect covers the time taken to produce words, phonemes, and syllables, the number of pauses in the speech stream, the duration of speech production and of pauses in the speech stream, repetitions, and so on.
  • These features measure the fluency of the speech stream. The cause of a repetition can also be inferred from its position in the speech stream: if it occurs at the beginning or end of a sentence, the patient can be considered to have a comprehension problem; if it occurs within a sentence, the patient can be considered to be trying to repeat the preceding content.
  • Lexical features mainly show the characteristics of language output at the level of vocabulary content.
  • Vocabulary can be classified by part-of-speech, and then the distribution of vocabulary can be analyzed according to the frequency of occurrence of each part of speech. At the same time, it can also be used to analyze which type of vocabulary the patient tends to use or does not tend to use. For example, the output language contains a large number of demonstrative pronouns, which means that the output may be ambiguous.
  • Another type of lexical-semantic feature measures vocabulary richness and information density, mainly by means of the type-token ratio. Types are the distinct words used in the text corresponding to a passage of speech, and tokens are all word occurrences. For speech outputs of equal length, the type-token ratio reflects the richness of the vocabulary to some extent: the larger the ratio, the more varied the vocabulary and the lower the repetition rate.
  • Grammatical features show the characteristics of language output at the grammatical level, and are obtained by measuring the complexity of lexical and syntactic structures. Such features mainly include the frequency, relative proportion, and average length of each grammatical component, the number and length of clauses, the height of the syntactic tree, and the depth of the sentence; they also include checks for syntactic errors, such as errors in syntactic structure and incomplete sentences.
  • Pragmatic features reflect the cohesion and coherence of language production, and are variables at the discourse level. Cohesion concerns the relationships between sentences, and coherence can be divided into local coherence and overall coherence. Local coherence refers to whether a sentence is coherent with the next sentence, and overall coherence refers to whether the content of a sentence is closely related to the theme of the entire text.
  • pragmatic features also include the information integrity of language output. You can determine the information that should appear in the output in advance, and then count the amount of information covered in the patient's output, such as the number of keywords in the language output. The positioning of information needs the help of grammatical analysis.
  • the recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes multi-dimensional features to obtain a diagnosis-sensitive feature set; the classification module takes the optimized feature set and applies a multi-task deep belief network algorithm (Multi-task DBN), using task relatedness to jointly improve the prediction of the classification task; the network treats classification as the main task, and the MMSE and MoCA scores are trained as related tasks to help improve the prediction performance of the classification task. The diagnosis and screening of patients with early AD are finally completed according to the prediction results.
  • DBN multi-task deep belief network algorithm
  • the invention adopts a framework based on Multi-task Deep Learning (MDTL) to realize automatic classification of AD patients.
  • the structure includes two-level modules.
  • the upper-level module performs feature selection based on stability selection within the multi-task deep learning framework
  • the lower-level module is an ELM or SVM classifier for classification.
  • the present invention introduces stability selection for feature selection: the Lasso estimate is repeated 50 times with different values of the regularization parameter λ, and the selection probability of each parameter in the feature vector set is expressed as the frequency with which it appears across these 50 Lasso estimates.
  • the feature set is determined by the threshold t at which the stable feature set changes little.
  • the invention uses the multi-task learning principle to build a multi-task deep confidence network (Multi-task DBN), and uses task correlation to jointly improve the prediction of classification tasks.
  • the network focuses on classification tasks, and MMSE and MoCA scores will be trained as related tasks to help improve the prediction performance of classification tasks.
  • the patient's language output can be identified. If the patient's speech feature set collected by the experimental task matches the standard feature set placed in the AD recognition model, the patient may be regarded as suspected of early AD.
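The feature selection step described above (repeated Lasso estimates over different parameter values, followed by a threshold t on how often each feature appears) can be sketched as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the coordinate-descent Lasso solver, the grid of 50 regularization values, the fixed threshold, and the toy data from the hypothetical helper `make_toy_data` are all illustrative choices. The source also does not say whether the data is resampled between runs, so this sketch fits the full data once per parameter value.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_wj
    return w


def selection_frequencies(X, y, lambdas):
    """Fraction of Lasso fits (one per lambda) in which each feature is nonzero."""
    p = len(X[0])
    counts = [0] * p
    for lam in lambdas:
        w = lasso_cd(X, y, lam)
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [c / len(lambdas) for c in counts]


def stable_features(freqs, t):
    """Indices of features whose selection frequency reaches the threshold t."""
    return [j for j, f in enumerate(freqs) if f >= t]


def make_toy_data(n=60, seed=0):
    """Hypothetical data: 5 features, of which only the first two drive y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]
    return X, y
```

A call such as `stable_features(selection_frequencies(X, y, lambdas), t)` with 50 values in `lambdas` mirrors the 50 repetitions mentioned in the text. In practice, t would be tuned until the selected set stops changing, as the text describes.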

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Machine Translation (AREA)

Abstract

An early-stage AD speech auxiliary screening system for Mandarin Chinese. The system comprises a main body testing unit, a speech feature extracting unit, and a recognizing unit; the main body testing unit is used for testing a tested object; the speech feature extracting unit is used for extracting speech features of the tested object and storing them; the recognizing unit is used for recognizing the speech features; and the main body testing unit comprises a spontaneous voice module, a picture describing module, a word fluency module, a poem reciting module, a sentence repeating module, a picture naming module, a span testing module, a cycle pronouncing module, and a picture matching module. Starting from the impairment of speech function in early-stage AD and aimed at early-stage AD screening for Mandarin Chinese, the system provides a supporting device for speech features specific to early-stage AD, and has the advantages of low cost, real-time acquisition of rich data, easy collection of large samples, and support for remote acquisition and analysis.

Description

An Early AD Speech Auxiliary Screening System for Mandarin Chinese
Technical field
The invention relates to an early AD speech auxiliary screening system for Mandarin Chinese.
Background art
In the past decade, the search for new, non-invasive, economical auxiliary diagnostic markers suitable for early AD screening has become a hot topic in research on the early diagnosis of AD. Researchers have proposed a series of auxiliary diagnostic methods that may become early AD screening tools, approached from perspectives such as cognitive impairment, biochemical diagnosis, and neuroelectrophysiology; examples include speech testing, olfactory testing, gait testing, retinal imaging, Aβ peripheral blood screening, urine AD7c-NTP protein detection, and electroencephalogram (EEG) analysis.
Existing early AD screening schemes have the following main problems, which hinder the further clinical application of the related diagnostic methods in early AD screening and detection and in the evaluation of AD disease progression.
(1) The cost is high, and the professional requirements on the test environment and on the personnel administering the tests make large-scale adoption difficult
At present, three kinds of methods are used clinically to detect AD: cerebrospinal fluid (CSF) analysis, neuroimaging, and neuropsychological scale testing. However, because of limitations such as high cost, the low uptake of invasive tests, and high barriers to access, these methods are difficult to use as diagnostic tools for large-scale early AD screening.
(2) Different speech tasks and the speech features extracted from them have not been used appropriately or examined systematically
Current related techniques still have deficiencies in their methods or schemes, such as small sample sizes and reliance on a single feature extraction scheme and feature selection method. The detection tasks, the auxiliary materials used, and the data features extracted from the tasks also differ considerably. Because the language impairment of early AD shows up in several aspects, such as phonetics and semantic extraction, a single speech task cannot fully capture the speech characteristics specific to early AD patients.
Summary of the invention
To solve the problems in the background art described above, the present invention proposes an early AD speech auxiliary screening system for Mandarin Chinese. Starting from the impairment of language function in early AD and targeting early AD screening for Mandarin Chinese, the system provides an extraction scheme and supporting devices for speech features specific to early AD. It has the advantages of low cost, real-time access to rich data, easy collection of large samples (usable for big data analysis), and support for remote acquisition and analysis, and it has significant application potential in large-scale early AD screening and in long-term disease-course management.
The technical solution of the present invention to the above problems is an early AD speech auxiliary screening system for Mandarin Chinese, characterized in that:
it includes a subject test unit, a speech feature extraction unit, and a recognition unit;
the subject test unit is used to test the tested object; the speech feature extraction unit is used to extract the speech features of the tested object and store them; the recognition unit is used to recognize the speech features;
the subject test unit includes a spontaneous voice module, a picture description module, a word fluency module, a poetry recitation module, a sentence restatement module, a picture naming module, a span test module, a carousel pronunciation module, and a picture matching module;
the speech feature extraction unit includes an automatic speech-signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module; the speech data recorded by each module of the subject test unit first passes through the automatic segmentation module to obtain segmented speech, which is then recognized as text by the signal preprocessing and automatic speech recognition modules; the text analysis module and the speech analysis module compute and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features from the speech recognition results;
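As an illustration of the last step of this pipeline, the sketch below computes two of the features named elsewhere in this document, pause statistics (a phonetic feature) and a type-token ratio (a lexical feature), from word-level recognition output. It is a minimal sketch under assumptions: the `Word` record with start and end times, the 0.25-second pause threshold, and the function names are hypothetical and are not specified by the source.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Word:
    """One recognized word with its time span in seconds (assumed ASR output)."""
    text: str
    start: float
    end: float


def pause_features(words: List[Word], min_pause: float = 0.25) -> Dict[str, float]:
    """Count and total the silent gaps between consecutive words.

    A gap counts as a pause only if it lasts at least `min_pause` seconds;
    that threshold is an illustrative assumption.
    """
    gaps = [b.start - a.end for a, b in zip(words, words[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    speech_time = sum(w.end - w.start for w in words)
    return {
        "pause_count": len(pauses),
        "pause_time": sum(pauses),
        "speech_time": speech_time,
    }


def type_token_ratio(words: List[Word]) -> float:
    """Distinct words (types) divided by all word occurrences (tokens)."""
    tokens = [w.text for w in words]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

For Mandarin text, the tokens would come from a word segmenter applied to the recognized text; that step is outside this sketch.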
the recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes multi-dimensional features to obtain a diagnosis-sensitive feature set; the classification module takes the optimized feature set and applies a multi-task deep belief network algorithm (Multi-task DBN), using task relatedness to jointly improve the prediction of the classification task; the network treats classification as the main task, and the MMSE and MoCA scores are trained as related tasks to help improve the prediction performance of the classification task; the diagnosis and screening of patients with early AD are finally completed according to the prediction results.
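The idea of training classification as the main task with MMSE and MoCA score prediction as related tasks can be illustrated with a joint objective. The sketch below shows only how such a combined loss might be formed for one sample; it does not implement a deep belief network, and the choice of cross-entropy and squared error, the weights `alpha` and `beta`, and all function names are assumptions rather than details given in the source.

```python
import math


def binary_cross_entropy(p: float, label: int) -> float:
    """Cross-entropy of a predicted AD probability p against a 0/1 label."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def squared_error(pred: float, target: float) -> float:
    """Squared error for an auxiliary score regression task."""
    return (pred - target) ** 2


def multitask_loss(p_ad: float, label: int,
                   mmse_pred: float, mmse: float,
                   moca_pred: float, moca: float,
                   alpha: float = 0.1, beta: float = 0.1) -> float:
    """Main classification loss plus weighted auxiliary score losses.

    alpha and beta are hypothetical weights for the MMSE and MoCA tasks.
    """
    return (binary_cross_entropy(p_ad, label)
            + alpha * squared_error(mmse_pred, mmse)
            + beta * squared_error(moca_pred, moca))
```

Keeping the auxiliary weights small reflects the statement that classification is the main task, while the score predictions only help shape the shared representation.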
Further, the spontaneous speech module mainly tests the subject's vocabulary output and sentence coherence.
Further, the picture description module provides at least one story picture, and the subject narrates the plot of the story in the picture in his or her own words within a given time. It mainly tests the subject's language output (including fluency and vocabulary size).
Further, the word fluency module provides at least one topic, and the subject says as many related words as possible within 1 minute.
Further, the poetry recitation module provides at least one classical Chinese poem, which is used to examine the prosodic features of the subject's speech.
Further, the sentence repetition module provides multiple sentences, and the subject repeats each sentence he or she sees; this module can examine the subject's speech output.
Further, the picture naming module provides multiple pictures presented in random order, and the subject names the thing shown in each picture; this module can assess whether the subject has deficits at the lexical-semantic level and whether the subject has difficulty in word selection.
Further, the span test module presents, one after another, 2 to 5 Chinese characters with similar pronunciations and requires the subject to repeat what he or she saw; the correctness of the repetition is judged from the module data, and the accuracy at syllable spans of 2 to 5 is analyzed. For short-term memory, speech sounds with low similarity are easier to remember than those with high similarity, so this module can examine the subject's speech perception on that basis. During testing, all characters used in this module are grouped into two-, three-, four-, and five-syllable sequences and presented in the tone order yinping, yangping, shangsheng, qusheng, yinping, yangping, shangsheng, qusheng (the first to fourth Mandarin tones, repeated).
Further, the alternating-syllable pronunciation module provides at least one three-syllable sequence, which the subject is required to repeat three times. It is used to detect whether the subject's coordinated articulatory movement within the syllable sequence is too slow and whether articulation is abnormal.
Further, the picture matching module provides multiple pages; each page contains at least three pictures, two of which are related. The subject is required to pick out the two related pictures.
Advantages of the invention:
(1) Starting from the impairment of language function in early AD and targeting early AD screening for Mandarin Chinese, the invention provides an extraction scheme, a recognition method, and a supporting device for speech features specific to early AD. It is low in cost, can acquire rich data in real time, makes large-sample collection easy (usable for big-data analysis), and supports remote collection and analysis, giving it significant application potential in large-scale early AD screening and long-term disease-course management;
(2) The patient-facing stage consists mainly of a variety of speech-cognition tasks, and automatic speech analysis is used to extract the early-AD speech feature set. It is convenient to operate and non-invasive, data can be collected in an environment comfortable for the patient, and it offers distinct advantages in user experience;
(3) A scheme is proposed for jointly optimizing the selection and dimensionality reduction of speech-specific acoustic features. The scheme introduces the idea of multi-task learning and combines stability selection with deep learning to jointly optimize feature selection and the classification feature subspace;
(4) It is easy to combine with other methods to form multi-modal auxiliary diagnostic markers, which can serve as a highly sensitive composite index for early AD screening that flags a high risk of AD, raises the detection rate of early AD patients, helps bring intervention to patients as early as possible, and gains more effective time for treatment.
Brief Description of the Drawings
Figure 1 is a flowchart of an embodiment of the early-AD speech-assisted screening system for Mandarin Chinese according to the present invention;
Figure 2 is a schematic structural diagram of the early-AD speech-assisted screening system for Mandarin Chinese according to the present invention;
Figure 3 is a story picture in the picture description module;
Figure 4 is a picture in the picture naming module;
Figure 5 is a schematic diagram of a page in the picture matching module.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Language impairment is one of the important features of early AD. It alters characteristics of the patient's spontaneous speech such as prosody, articulation cycle, articulation quality, and speech processing rate; clinically it presents as relatively non-fluent spontaneous speech, word-finding difficulty, slowed speech rate, long pauses, and phonemic errors. The present invention uses objective analysis to quantify how early-AD language impairment specifically manifests in speech, and effectively detects the relevant speech features through techniques such as Automatic Speech Analysis (ASA) and automatic speech recognition, with the aim of automatic early AD screening for speakers of Mandarin Chinese.
Referring to Figures 1 and 2, an early-AD speech-assisted screening system for Mandarin Chinese includes a subject test unit, a speech feature extraction unit, and a recognition unit.
The subject test unit is used to test the subject; the speech feature extraction unit is used to extract the subject's speech features and store them; the recognition unit is used to recognize the speech features.
The subject test unit includes a spontaneous speech module, a picture description module, a word fluency module, a poetry recitation module, a sentence repetition module, a picture naming module, a span test module, an alternating-syllable pronunciation module, and a picture matching module.
Further, the spontaneous speech module mainly tests the subject's vocabulary output and sentence coherence.
Specifically, this module asks the patient to produce about 2 minutes of spontaneous, emotionally expressive speech on the topic of "self-introduction", for example giving name, age, family, work, and hobbies. Vocabulary-output and sentence-coherence features can be extracted from the module data.
Further, the picture description module provides at least one story picture, and the subject narrates the plot of the story in the picture in his or her own words within a given time. It mainly tests the subject's language output (including fluency and vocabulary size).
Specifically, this module asks the patient to look at a story picture without text and, within a given time (1 to 2 minutes), narrate the plot of the story in the picture in his or her own words. In this module the patient needs to understand the characters and events in the picture story and present a coherent structure in the output. The module can therefore test the patient's language output (including fluency and vocabulary size). The module contains 3 pictures in total (for example, Figure 3); the prescribed description time is about 2 minutes for the first picture and about 1 minute for each of the other two. The actual duration may be adjusted according to the patient's performance.
Further, the word fluency module provides at least one topic, and the subject says as many related words as possible within 1 minute.
Specifically, the topic may be fruit; the intervals between the words the patient produces for each category can be used to analyze the patient's long-term memory function.
Further, the poetry recitation module provides at least one classical Chinese poem, which is used to examine the prosodic features of the subject's speech.
Classical Chinese poems have a well-formed Chinese prosodic structure and can serve as experimental material for detecting prosodic deficits in the pronunciation of Chinese speakers. Specifically, this module contains 6 simple poems for the patient to read, for example: 白日依山尽,黄河入海流。欲穷千里目,更上一层楼。 ("The white sun sinks behind the mountains; the Yellow River flows into the sea. To see a thousand miles farther, climb one more storey.")
Further, the sentence repetition module provides multiple sentences, and the subject repeats each sentence he or she sees; this module can examine the patient's speech output.
Specifically, this module asks the patient to repeat the sentences he or she sees, and the module data are used to compute the intervals between words while the patient repeats each sentence. In general, people remember short words better than long words, and long words require longer rehearsal times. This module can therefore examine the patient's speech output. The module contains 14 sentences of gradually increasing difficulty, for example: 明天星期天 ("Tomorrow is Sunday").
Further, the picture naming module provides multiple pictures presented in random order, and the subject names the thing shown in each picture; this module can assess whether the patient has deficits at the lexical-semantic level and whether the patient has difficulty in word selection.
Specifically, 36 pictures appear in random order in this module (for example, Figure 4), and the patient is asked to name the thing shown in each picture. There is no time limit, and the patient controls when to switch pictures. This module can assess whether the patient has deficits at the lexical-semantic level and whether the patient has difficulty in word selection. Naming example: 长颈鹿 (giraffe).
Further, the span test module presents, one after another, 2 to 5 Chinese characters with similar pronunciations (for example, at a span of 2, "安" (ān) and "班" (bān) appear) and requires the subject to repeat what he or she saw; the correctness of the repetition is judged from the module data, and the accuracy at syllable spans of 2 to 5 is analyzed. For short-term memory, speech sounds with low similarity are easier to remember than those with high similarity, so this module can examine the patient's speech perception on that basis. During testing, all characters used in this module are grouped into two-, three-, four-, and five-syllable sequences and presented in the tone order yinping, yangping, shangsheng, qusheng, yinping, yangping, shangsheng, qusheng (the first to fourth Mandarin tones, repeated).
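The per-span accuracy described above can be sketched as follows. This is only an illustration: the trial format (pairs of presented and repeated character sequences) and the function name `accuracy_by_span` are assumptions, and the third character in the toy data is a made-up stimulus rather than one taken from the test material.

```python
from collections import defaultdict


def accuracy_by_span(trials):
    """Return {span: accuracy} for span-test trials.

    trials: iterable of (presented, repeated) pairs, each a sequence of
    characters. A trial counts as correct only when the repeated sequence
    matches the presented one exactly, in order (an assumed scoring rule).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for presented, repeated in trials:
        span = len(presented)
        totals[span] += 1
        if list(presented) == list(repeated):
            correct[span] += 1
    return {span: correct[span] / totals[span] for span in sorted(totals)}


# Toy data: two trials at span 2 and one at span 3 ("山" is hypothetical).
trials = [
    (["安", "班"], ["安", "班"]),
    (["安", "班"], ["班", "安"]),
    (["安", "班", "山"], ["安", "班", "山"]),
]
print(accuracy_by_span(trials))
```

In practice the repeated sequence would come from the speech recognition output for the trial; a more lenient scoring rule (for example, partial credit per position) would be an equally plausible choice.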
Further, the alternating-syllable pronunciation module provides at least one three-syllable sequence, for example the sequence pa-ta-ka produced in order, which the subject is required to repeat three times. It is used to detect whether the subject's coordinated articulatory movement within the syllable sequence is too slow and whether articulation is abnormal.
Further, the picture matching module provides multiple pages; each page contains at least three pictures, two of which are related. The subject is required to pick out the two related pictures.
Specifically, each page of this module shows three pictures (for example, Figure 5); the picture in the upper row is clearly related to one of the pictures in the lower row, and the patient is asked to use the left and right keys on the keyboard to select the most related picture in the lower row. After each session, a table is generated in a folder; the table records the patient's picture choices, and after the session these records can be analyzed for reaction time and accuracy. AD patients react to things noticeably more slowly than healthy people, and their ability to recognize things is also poorer, so the data from this module can be used to distinguish patients from healthy people quickly and intuitively by analyzing selection accuracy and reaction time.
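As a rough illustration of how the recorded table could be summarized, the sketch below computes selection accuracy and mean reaction time. The record schema (`chosen`, `correct`, `rt_ms`) and the function name are assumptions; the text does not specify the table's columns.

```python
from statistics import mean


def summarize_matching(records):
    """Summarize picture-matching records.

    records: list of dicts with the hypothetical keys
      'chosen'  - key pressed by the patient ('left' or 'right')
      'correct' - key of the truly related picture
      'rt_ms'   - reaction time in milliseconds
    """
    if not records:
        raise ValueError("no records to summarize")
    n_correct = sum(1 for r in records if r["chosen"] == r["correct"])
    return {
        "accuracy": n_correct / len(records),
        "mean_rt_ms": mean(r["rt_ms"] for r in records),
    }


records = [
    {"chosen": "left", "correct": "left", "rt_ms": 1200},
    {"chosen": "right", "correct": "left", "rt_ms": 1800},
]
print(summarize_matching(records))
```

The two summary values would then be passed on as features; any cut-off separating patients from healthy people would have to be learned from data and is not implied here.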
Combining multiple tasks can reflect the language impairment caused by AD from several angles. The present invention therefore establishes a multi-task, multi-dimensional framework for automatically extracting abnormal speech features, with the aim of performing early AD screening on the basis of differences in speech features between AD patients and healthy people. After the cognitive speaking tasks of the subject test unit are finished, the recordings and other files generated by each module are analyzed, and task-related multi-dimensional speech features are extracted from the speech output of each module, yielding a sensitive feature set that can be used to screen for early symptoms of AD.
The speech feature extraction unit includes an automatic speech-signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module. The speech data recorded in each module of the subject test unit first passes through the automatic segmentation module to obtain segmented speech, which is then preprocessed and recognized as text by the automatic speech recognition module; the text analysis module and the speech analysis module compute and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features from the recognition results.
According to linguistic categories, the speech features used in the present invention can be roughly divided into four classes: phonetic, lexical, grammatical, and pragmatic features. Some speech-deficit features caused by impairment of multiple cognitive functions may belong to several of these levels at once.
Phonetic features describe deficits of language output at the level of sound and mainly cover three aspects. The first comprises the time taken to produce words, phonemes, and syllables, the number of pauses in the speech stream, the durations of speech and of pauses in the speech stream, repetitions, and so on. These features all measure the fluency of the speech stream, and for repetitions the cause can additionally be inferred from their position in the speech stream: if a repetition occurs at the beginning or end of a sentence, the patient may be considered to have a comprehension problem; if it occurs within a sentence, the patient may be considered to be trying to restate earlier content. The second comprises Mel-frequency cepstral coefficients, fundamental frequency, sound duration, and so on. Mel-frequency cepstral coefficients describe the logarithmic relationship between sound energy and frequency, which better matches the human ear's perception of sound, and their magnitude reflects the degree of resonance of the sound in the vocal tract. These features can be used to describe prosodic information in speech, such as pitch contour and stress. The third comprises speech errors, that is, sounds that violate the pronunciation rules of the language, and incompletely pronounced words.
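The fluency measures in the first group (number of pauses, speech time, pause time) can be sketched from the output of the segmentation step. The sketch assumes that segmentation yields time-ordered `(start, end)` speech intervals in seconds; the 0.25 s minimum pause length is an illustrative value, not one given in the text.

```python
def pause_statistics(segments, min_pause=0.25):
    """Compute simple fluency measures from speech segments.

    segments: time-ordered list of (start, end) speech intervals in seconds.
    min_pause: gaps shorter than this are not counted as pauses (assumed).
    """
    speech_time = sum(end - start for start, end in segments)
    # Silent gap between the end of one segment and the start of the next.
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(segments, segments[1:])]
    pauses = [g for g in gaps if g >= min_pause]
    return {
        "speech_time": speech_time,
        "pause_count": len(pauses),
        "pause_time": sum(pauses),
    }


segments = [(0.0, 1.0), (1.5, 2.5), (2.6, 4.0)]
print(pause_statistics(segments))
```

Here only the 0.5 s gap counts as a pause; the 0.1 s gap falls below the assumed threshold.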
Lexical features capture the characteristics of language output at the level of vocabulary content. Words can be classified by part of speech, and the distribution of vocabulary can then be analyzed from the frequency of each word class; the same frequencies also show which classes of words the patient tends or tends not to use. For example, a large number of demonstrative pronouns in the output suggests that the output may be semantically vague. Another class of lexical-semantic features measures lexical richness and information density, chiefly through the type-token ratio. Types are the distinct words used in the text corresponding to a stretch of speech, and tokens are all word occurrences. For speech outputs of equal length, the type-token ratio reflects to some extent the lexical richness of the corpus. A larger type-token ratio indicates more varied vocabulary and less repetition.
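A minimal sketch of the type-token ratio follows. It assumes the recognized text has already been segmented into words (Chinese text has no spaces, so a word segmenter would be needed upstream); the example tokens are invented.

```python
def type_token_ratio(tokens):
    """Type-token ratio: number of distinct words / number of word occurrences."""
    if not tokens:
        raise ValueError("empty token list")
    return len(set(tokens)) / len(tokens)


# Hypothetical segmented utterance: 4 distinct words over 6 occurrences.
tokens = ["我", "喜欢", "苹果", "我", "喜欢", "香蕉"]
print(type_token_ratio(tokens))
```

Because the ratio falls as a text gets longer, it is only comparable across outputs of similar length, as the paragraph above notes.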
Grammatical features capture the characteristics of language output at the level of grammar and are obtained by measuring morphological and syntactic complexity. They mainly include the frequency, relative proportion, and mean length of each grammatical constituent, the number and length of clauses, the height of the syntactic tree, and sentence depth; they also cover syntactic errors, such as erroneous syntactic structures and incomplete sentences.
Pragmatic features reflect the cohesion and coherence of language output and are discourse-level variables. Cohesion concerns the relationships between sentences, while coherence can be divided into local and global coherence. Local coherence refers to whether one sentence is coherent with the next; global coherence refers to whether the content of a sentence is closely related to the topic of the whole discourse. Pragmatic features also include the informational completeness of the output: the information that should appear in the output can be determined in advance, and the amount of that information covered by the patient's output can then be counted, for example by counting the keywords that occur in the output. Locating the information requires syntactic analysis.
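The keyword-counting variant of information completeness can be sketched as below. The keyword list and tokens are invented for illustration, and the sketch uses exact word matching only; it does not attempt the syntactic analysis mentioned above.

```python
def information_coverage(tokens, expected_keywords):
    """Return (coverage, found) for a set of pre-defined keywords.

    coverage: fraction of expected keywords that occur in the output.
    found: the set of expected keywords that were actually produced.
    """
    expected = set(expected_keywords)
    if not expected:
        raise ValueError("no expected keywords given")
    found = expected & set(tokens)
    return len(found) / len(expected), found


# Hypothetical picture-description output and expected information units.
tokens = ["男孩", "在", "踢", "球"]
expected_keywords = ["男孩", "球", "狗"]
coverage, found = information_coverage(tokens, expected_keywords)
print(coverage, found)
```

Exact matching misses synonyms and paraphrases, so a real system would likely need a synonym list or the parser-based localization the text refers to.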
The recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes the multi-dimensional features to obtain a feature set sensitive to diagnosis; the classification module uses the optimized feature set and applies a multi-task deep belief network (Multi-task DBN) algorithm, exploiting task relatedness to jointly improve prediction on the classification task; the network takes the classification task as its main task, while the MMSE and MoCA scores are trained as related tasks to help improve prediction performance on the classification task. The diagnosis and screening of early AD patients is finally completed according to the prediction results.
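One common way to train a main task together with related tasks is to minimize a weighted sum of their losses. The sketch below shows such a joint objective for a single sample: cross-entropy for the AD classification plus squared errors for the MMSE and MoCA scores. It is an assumption about how the tasks could be combined, not the network described here; the weights `w_mmse` and `w_moca` are hypothetical, and the deep belief network itself is not modelled.

```python
import math


def multitask_loss(p_ad, label, mmse_pred, mmse_true, moca_pred, moca_true,
                   w_mmse=0.1, w_moca=0.1):
    """Joint loss for one sample: main classification task plus two auxiliary tasks.

    p_ad: predicted probability of AD; label: 1 for AD, 0 for healthy.
    w_mmse, w_moca: assumed weights of the auxiliary regression losses.
    """
    eps = 1e-12
    p = min(max(p_ad, eps), 1.0 - eps)  # keep log() finite
    cross_entropy = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
    mmse_error = (mmse_pred - mmse_true) ** 2
    moca_error = (moca_pred - moca_true) ** 2
    return cross_entropy + w_mmse * mmse_error + w_moca * moca_error


# Perfect score predictions: only the classification term remains.
print(multitask_loss(0.5, 1, 24.0, 24.0, 20.0, 20.0))
```

Keeping the auxiliary weights small leaves classification as the dominant objective while the score predictions still shape the shared representation.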
The present invention uses a framework based on Multi-task Deep Learning (MTDL) to classify AD patients automatically. The structure has two levels of modules: the upper-level module performs feature selection using multi-task deep learning based on stability selection, and the lower-level module performs classification with an ELM or SVM classifier. First, the high-dimensional acoustic features described above, which are related to the brain functional network for language in AD, are normalized and then further optimized with Lasso estimation and stability selection to obtain a feature set sensitive to diagnosis. Second, stability selection is applied as follows: the Lasso estimate is repeated 50 times with different values of the parameter λ, the likelihood of each parameter (feature) in the feature vector set is expressed as the sum of the frequencies with which it appears across these 50 Lasso estimates, and a threshold t at which the stable feature set changes little is chosen to determine the feature set. Finally, following the principle of multi-task learning, a multi-task deep belief network (Multi-task DBN) is constructed that exploits task relatedness to jointly improve prediction on the classification task. The network takes the classification task as its main task, while the MMSE and MoCA scores are trained as related tasks to help improve prediction performance on the classification task.
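The selection step can be sketched in plain Python as follows, under several assumptions: the Lasso objective is taken as (1/2n)·||y − Xw||² + λ·||w||₁ and solved by coordinate descent, the 50 values of λ form an evenly spaced grid, a feature's score is the fraction of the 50 fits in which its coefficient is nonzero, and the threshold t = 0.6 is illustrative. The toy design matrix has orthogonal columns so the behaviour is easy to follow; it is not real speech data, and resampling of the data is omitted because the text mentions only varying λ.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, starting from w = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


def selection_frequencies(X, y, lambdas, tol=1e-8):
    """Fraction of Lasso fits (one per lambda) in which each feature is nonzero."""
    counts = [0] * len(X[0])
    for lam in lambdas:
        for j, coef in enumerate(lasso_cd(X, y, lam)):
            if abs(coef) > tol:
                counts[j] += 1
    return [c / len(lambdas) for c in counts]


def stable_features(freqs, t):
    """Indices of features whose selection frequency reaches the threshold t."""
    return [j for j, f in enumerate(freqs) if f >= t]


# Toy data: feature 0 is strongly informative, feature 1 weakly, 2 and 3 not at all.
cols = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, -1, -1, 1, 1],
]
X = [[col[i] for col in cols] for i in range(8)]
y = [3.0 * row[0] + 0.2 * row[1] for row in X]
lambdas = [0.05 + k * 0.95 / 49 for k in range(50)]  # 50 values in [0.05, 1.0]

freqs = selection_frequencies(X, y, lambdas)
print(freqs, stable_features(freqs, t=0.6))
```

On this toy data the strongly informative feature survives every fit while the weak one drops out once λ exceeds its correlation with the target, so only feature 0 passes the threshold. With real features, a library Lasso solver and normalized inputs would be the more practical choice.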
Therefore, the patient's language output can be recognized from the matching result of the speech feature set. If the speech feature set collected from the patient through the experimental tasks closely matches the standard feature set built into the AD recognition model, the patient can be regarded as a suspected case of early AD.
The above are merely embodiments of the present invention and do not thereby limit its scope of protection. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, or any direct or indirect application in other related system fields, is likewise included within the protection scope of the present invention.

Claims (10)

  1. An early-AD speech-assisted screening system for Mandarin Chinese, characterized in that:
    it includes a subject test unit, a speech feature extraction unit, and a recognition unit;
    the subject test unit is used to test the subject; the speech feature extraction unit is used to extract the subject's speech features and store them; the recognition unit is used to recognize the speech features;
    the subject test unit includes a picture description module, a word fluency module, a poetry recitation module, a sentence repetition module, a picture naming module, a span test module, an alternating-syllable pronunciation module, and a picture matching module;
    the speech feature extraction unit includes an automatic speech-signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module; the speech data recorded in each module of the subject test unit first passes through the automatic segmentation module to obtain segmented speech, which is then preprocessed and recognized as text by the automatic speech recognition module; the text analysis module and the speech analysis module compute and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features from the recognition results;
    the recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes the multi-dimensional features to obtain a feature set sensitive to diagnosis; the classification module uses the optimized feature set and applies a multi-task deep belief network algorithm, exploiting task relatedness to jointly improve prediction on the classification task; the network takes the classification task as its main task, while the MMSE and MoCA scores are trained as related tasks to help improve prediction performance on the classification task; the diagnosis and screening of early AD patients is finally completed according to the prediction results.
  2. 根据权利要求1所述的一种针对汉语普通话的早期AD言语辅 助筛查系统,其特征在于:所述主体测试单元还包括自发语音模块,上述自发语音模块主要测试被测对象的语言词汇产出与语句连贯性特征。An early AD speech assisted screening system for Mandarin Chinese according to claim 1, wherein the subject test unit further includes a spontaneous speech module, and the spontaneous speech module mainly tests the language vocabulary output of the tested object Coherence characteristics with sentences.
  3. 根据权利要求1所述的一种针对汉语普通话的早期AD言语辅助筛查系统,其特征在于:图片描述模块提供至少一张记事图片,被测对象在给定时间内通过自己的语言来叙述图片中故事的情节。An early AD speech assisted screening system for Mandarin Chinese according to claim 1, characterized in that the picture description module provides at least one memo picture, and the tested object narrates the picture in his own language within a given time The plot of the story.
  4. 根据权利要求1-3任一所述的一种针对汉语普通话的早期AD言语辅助筛查系统,其特征在于:词语流利度模块提供至少一个主题,被测对象在1分钟内讲出尽可能多的相关词语。An early AD speech assisted screening system for Mandarin Chinese as claimed in any one of claims 1 to 3, characterized in that: the word fluency module provides at least one topic, and the test subject speaks as much as possible within 1 minute Related words.
  5. 根据权利要求4所述的一种针对汉语普通话的早期AD言语辅助筛查系统,其特征在于:诗词朗诵模块提供至少一首古诗词,用于检测被测对象言语的韵律特征。An early AD speech assisted screening system for Mandarin Chinese as claimed in claim 4, characterized in that the poetry recitation module provides at least one ancient poetry word for detecting the prosodic features of the tested subject's speech.
  6. 根据权利要求5所述的一种针对汉语普通话的早期AD言语辅助筛查系统,其特征在于:句子复述模块提供多个语句,被测对象对看到的句子进行重复,用于检测被测对象的言语产出情况。An early AD speech assisted screening system for Mandarin Chinese as claimed in claim 5, characterized in that the sentence restatement module provides multiple sentences, and the tested object repeats the seen sentence to detect the tested object The output of speech.
  7. The early AD speech-assisted screening system for Mandarin Chinese according to claim 6, wherein the picture naming module provides multiple pictures that appear in random order, and the subject names the things shown in the pictures; this module can assess whether the subject has deficits at the lexical-semantic level, and can also assess whether the subject has difficulty with word selection.
  8. The early AD speech-assisted screening system for Mandarin Chinese according to claim 7, wherein the span test module presents, one after another, 2 to 5 Chinese characters with similar pronunciations and requires the subject to repeat what he or she saw; the correctness of each repetition is judged from the module data, and the accuracy is analyzed for syllable spans of 2 to 5.
  9. The early AD speech-assisted screening system for Mandarin Chinese according to claim 8, wherein the alternating pronunciation module provides at least one group of three-syllable strings, which the subject is required to repeat three times; the module is used to detect whether the subject's coordinated movement within the syllable sequence is too slow and whether articulation function is abnormal.
  10. The early AD speech-assisted screening system for Mandarin Chinese according to claim 9, wherein the picture matching module provides multiple pages; each page includes at least three pictures, two of which are related, and the subject is required to select the two related pictures.
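The joint training described in claim 1 (classification as the main task, MMSE and MoCA scores as related tasks) can be illustrated with a minimal sketch of a combined objective. This is not the deep belief network itself, and the claims do not specify a loss function: the cross-entropy and squared-error terms, the function names, and the `aux_weight` parameter are all assumptions made for illustration. The only outside fact used is that MMSE and MoCA are both scored from 0 to 30.

```python
import math
from typing import Sequence

MAX_SCORE = 30.0  # MMSE and MoCA are both scored on a 0-30 scale


def binary_cross_entropy(p_pred: Sequence[float], y_true: Sequence[int]) -> float:
    """Mean cross-entropy for the main task (hypothetical: 1 = early AD, 0 = control)."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for p, y in zip(p_pred, y_true):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def scaled_mse(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean squared error after rescaling scores to the range [0, 1]."""
    return sum(((a - b) / MAX_SCORE) ** 2 for a, b in zip(pred, true)) / len(true)


def multitask_loss(p_ad, y_ad, mmse_pred, mmse_true, moca_pred, moca_true,
                   aux_weight: float = 0.5) -> float:
    """Main-task loss plus a weighted sum of the two auxiliary regression losses.

    aux_weight is an assumed hyperparameter, not a value taken from the patent.
    """
    main = binary_cross_entropy(p_ad, y_ad)
    aux = scaled_mse(mmse_pred, mmse_true) + scaled_mse(moca_pred, moca_true)
    return main + aux_weight * aux


# One subject: uncertain classification, MMSE over-predicted, MoCA predicted exactly.
loss = multitask_loss([0.5], [1], [30.0], [15.0], [20.0], [20.0])
print(loss)
```

Setting `aux_weight` to 0 reduces the objective to the classification loss alone, which makes it easy to compare single-task and multi-task training under this sketch.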
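The per-span accuracy analysis in claim 8 can be sketched as follows. The claim only states that repetition correctness is judged and that accuracy is analyzed for spans of 2 to 5; the exact-sequence scoring rule, the `span_accuracy` function, and the pinyin trial data below are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Each trial pairs the presented syllables with what the subject repeated.
Trial = Tuple[Sequence[str], Sequence[str]]


def span_accuracy(trials: Iterable[Trial]) -> Dict[int, float]:
    """Return the fraction of exactly repeated trials for each span length (2-5)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for presented, repeated in trials:
        span = len(presented)
        if not 2 <= span <= 5:
            raise ValueError(f"span must be between 2 and 5, got {span}")
        total[span] += 1
        # Assumed rule: a trial counts as correct only if every syllable
        # is repeated in the original order.
        if list(presented) == list(repeated):
            correct[span] += 1
    return {span: correct[span] / total[span] for span in sorted(total)}


example_trials = [
    (["ba1", "pa1"], ["ba1", "pa1"]),                # span 2, correct
    (["ba1", "pa1"], ["ba1", "ba1"]),                # span 2, incorrect
    (["da1", "ta1", "na1"], ["da1", "ta1", "na1"]),  # span 3, correct
]
print(span_accuracy(example_trials))
```

A partial-credit rule (for example, the proportion of syllables recalled in the right position) would be an equally plausible reading; the claims do not say which is used.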
PCT/CN2019/117033 2018-12-29 2019-11-11 Early-stage ad speech auxiliary screening system aiming at mandarin chinese WO2020134647A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811632037.0A CN109841231B (en) 2018-12-29 2018-12-29 Early AD (AD) speech auxiliary screening system for Chinese mandarin
CN201811632037.0 2018-12-29

Publications (1)

Publication Number Publication Date
WO2020134647A1 (en) 2020-07-02

Family

ID=66883600

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117033 WO2020134647A1 (en) 2018-12-29 2019-11-11 Early-stage ad speech auxiliary screening system aiming at mandarin chinese

Country Status (2)

Country Link
CN (1) CN109841231B (en)
WO (1) WO2020134647A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109841231B (en) * 2018-12-29 2020-09-04 深圳先进技术研究院 Early AD (AD) speech auxiliary screening system for Chinese mandarin
CN110379214A (en) * 2019-06-27 2019-10-25 武汉职业技术学院 A kind of Picture writing training method and device based on speech recognition
CN110728997B (en) * 2019-11-29 2022-03-22 中国科学院深圳先进技术研究院 Multi-modal depression detection system based on context awareness
CN112908317B (en) * 2019-12-04 2023-04-07 中国科学院深圳先进技术研究院 Voice recognition system for cognitive impairment
CN114916921A (en) * 2022-07-21 2022-08-19 中国科学院合肥物质科学研究院 Rapid speech cognition assessment method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101983613A (en) * 2010-10-15 2011-03-09 浙江大学 Computer-aided device for screening mild cognitive impairment (MCI) of old people
US20120265024A1 (en) * 2010-10-05 2012-10-18 University Of Florida Research Foundation, Incorporated Systems and methods of screening for medical states using speech and other vocal behaviors
CN205176850U (en) * 2015-10-30 2016-04-20 郑伟宏 Alzheimer disease screening device
US20160253999A1 (en) * 2015-02-26 2016-09-01 Arizona Board Of Regents Systems and Methods for Automated Evaluation of Human Speech
CN108198576A (en) * 2018-02-11 2018-06-22 华南理工大学 A kind of Alzheimer's disease prescreening method based on phonetic feature Non-negative Matrix Factorization
CN109841231A (en) * 2018-12-29 2019-06-04 深圳先进技术研究院 A kind of early stage AD speech auxiliary screening system for standard Chinese

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9763617B2 (en) * 2011-08-02 2017-09-19 Massachusetts Institute Of Technology Phonologically-based biomarkers for major depressive disorder
CN106725532B (en) * 2016-12-13 2018-04-24 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN108597542A (en) * 2018-03-19 2018-09-28 华南理工大学 A kind of dysarthrosis severity method of estimation based on depth audio frequency characteristics

Also Published As

Publication number Publication date
CN109841231A (en) 2019-06-04
CN109841231B (en) 2020-09-04

Similar Documents

Publication Publication Date Title
Moro-Velazquez et al. Advances in Parkinson's disease detection and assessment using voice and speech: A review of the articulatory and phonatory aspects
WO2020134647A1 (en) Early-stage ad speech auxiliary screening system aiming at mandarin chinese
Khodabakhsh et al. Evaluation of linguistic and prosodic features for detection of Alzheimer’s disease in Turkish conversational speech
US9947322B2 (en) Systems and methods for automated evaluation of human speech
CN112750465B (en) Cloud language ability evaluation system and wearable recording terminal
Räsänen et al. ALICE: An open-source tool for automatic measurement of phoneme, syllable, and word counts from child-centered daylong recordings
Fraser et al. Automatic speech recognition in the diagnosis of primary progressive aphasia
French et al. Forensic speech science
Levitan et al. Combining Acoustic-Prosodic, Lexical, and Phonotactic Features for Automatic Deception Detection.
Qin et al. Automatic assessment of speech impairment in cantonese-speaking people with aphasia
Pompili et al. Pragmatic aspects of discourse production for the automatic identification of Alzheimer's disease
Kokkinakis et al. Data collection from persons with mild forms of cognitive impairment and healthy controls-infrastructure for classification and prediction of dementia
Asadi et al. Between-speaker rhythmic variability is not dependent on language rhythm, as evidence from Persian reveals
Liu et al. AI recognition method of pronunciation errors in oral English speech with the help of big data for personalized learning
Zealouk et al. Voice pathology assessment based on automatic speech recognition using Amazigh digits
Han et al. [Retracted] The Modular Design of an English Pronunciation Level Evaluation System Based on Machine Learning
CN112599119B (en) Method for establishing and analyzing mobility dysarthria voice library in big data background
Ding et al. Automatic recognition of student emotions based on deep neural network and its application in depression detection
Li et al. Quantitative intonation modeling of interrogative sentences for Mandarin speech synthesis
Bartelds et al. Measuring foreign accent strength using an acoustic distance measure
CN111583914B (en) Big data voice classification method based on Hadoop platform
Wang et al. Automatic Detection of Putative Mild Cognitive Impairment from Speech Acoustic Features in Mandarin-Speaking Elders
Pan et al. Being a round/y: An acoustic description of high front vowels in Singapore Mandarin elicited by speakers with different bilingual balance in Mandarin and English
Thaler et al. Language characteristics supporting early Alzheimer’s diagnosis through machine learning–a literature review
Duan et al. An English pronunciation and intonation evaluation method based on the DTW algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19904619

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 10.11.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19904619

Country of ref document: EP

Kind code of ref document: A1