WO2020134647A1

WO2020134647A1 - Early-stage AD speech auxiliary screening system for Mandarin Chinese

Info

Publication number: WO2020134647A1
Application number: PCT/CN2019/117033
Authority: WO
Inventors: Yan Nan (燕楠); Wang Lan (王岚); Yan Quanlei (严泉雷); Xu Mengzhen (徐梦真)
Original assignee: Shenzhen Institutes of Advanced Technology (深圳先进技术研究院)
Priority date: 2018-12-29
Filing date: 2019-11-11
Publication date: 2020-07-02
Also published as: CN109841231A; CN109841231B

Abstract

An early-stage Alzheimer's disease (AD) speech auxiliary screening system for Mandarin Chinese. The system comprises a subject testing unit, a speech feature extracting unit, and a recognizing unit. The subject testing unit is used to test the tested object; the speech feature extracting unit is used to extract and store the speech features of the tested object; and the recognizing unit is used to recognize the speech features. The subject testing unit comprises a spontaneous speech module, a picture description module, a word fluency module, a poem recitation module, a sentence repetition module, a picture naming module, a span test module, a cycle pronunciation module, and a picture matching module. Starting from the impairment of speech function in early-stage AD, and targeting early AD screening for Mandarin Chinese speakers, the system provides a supporting device for extracting speech features specific to early-stage AD. It has the advantages of low cost, real-time acquisition of rich data, easy large-sample collection, and support for remote acquisition and analysis.

Description

An Early AD Speech Assisted Screening System for Mandarin Chinese

Technical field

The invention relates to an early AD speech auxiliary screening system for Mandarin Chinese.

Background art

In the past decade, the search for new, non-invasive, economical diagnostic markers suitable for early AD screening has become a major topic in early AD diagnosis research. From different perspectives such as cognitive impairment, biochemical diagnosis, and neuroelectrophysiology, researchers have proposed a series of auxiliary diagnostic methods that may become early AD screening tools, such as speech testing, olfactory testing, gait testing, retinal imaging, peripheral blood Aβ screening, urinary AD7c-NTP protein detection, and electroencephalogram (EEG) analysis.

Existing early AD screening approaches mainly have the following problems, which hinder the further clinical application of the related diagnostic methods to the early detection of AD and to the evaluation of AD progression.

(1) High cost and professional requirements for the testing environment and personnel, which make large-scale deployment difficult

At present, there are three clinical methods for detecting AD: cerebrospinal fluid (CSF) analysis, neuroimaging, and neuropsychological scale testing. However, because of limitations such as high cost, the low uptake of invasive tests, and high barriers to access, these methods are difficult to use as diagnostic tools for large-scale early AD screening.

(2) Lack of rational use and systematic investigation of different speech tasks and of the speech features extracted from them

Current related technologies still have deficiencies in methods and schemes, such as small sample sizes and reliance on a single feature extraction scheme and feature selection method. The detection tasks, the auxiliary materials used, and the features extracted from those tasks also differ widely between studies. Because early AD language impairment manifests at many levels, such as speech production and semantic extraction, a single speech task cannot fully capture the speech characteristics specific to early AD patients.

Summary of the invention

To solve the problems described in the background art, the present invention proposes an early AD speech auxiliary screening system for Mandarin Chinese. Starting from the impairment of language function in early AD, and with the goal of early AD screening for Mandarin Chinese speakers, the system provides an extraction solution and supporting devices for speech features specific to early AD. It has the advantages of low cost, real-time acquisition of rich data, easy large-sample collection (enabling big data analysis), and remote acquisition and analysis, giving it significant application potential for large-scale early AD screening and long-term disease management.

The technical solution of the present invention for solving the above problems is an early AD speech auxiliary screening system for Mandarin Chinese, characterized as follows:

It includes a subject test unit, a speech feature extraction unit, and a recognition unit;

The subject test unit is used to test the tested object; the speech feature extraction unit is used to extract and store the speech features of the tested object; the recognition unit is used to recognize the speech features;

The subject test unit includes a spontaneous voice module, a picture description module, a word fluency module, a poetry recitation module, a sentence restatement module, a picture naming module, a span test module, a carousel pronunciation module, and a picture matching module;

The speech feature extraction unit includes an automatic speech signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module. The speech data recorded by each module of the subject test unit is first segmented by the automatic segmentation module; the segmented speech then passes through the signal preprocessing module and the automatic speech recognition module and is recognized as text. Based on the speech recognition results, the text analysis module and the speech analysis module calculate and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features;

The recognition unit includes a feature selection module and a classification module. The feature selection module selects and optimizes the multi-dimensional features to obtain a diagnosis-sensitive feature set. The classification module applies a multi-task deep belief network (Multi-task DBN) algorithm to the optimized feature set, using the relatedness between tasks to jointly improve the prediction of the classification task: the network treats classification as the main task, and MMSE and MoCA scores are trained as related tasks to help improve classification performance. Diagnosis and screening of early AD patients is completed based on the prediction results.

Further, the spontaneous speech module mainly tests the lexical output and sentence coherence of the tested object.

Further, the picture description module provides at least one memo picture, and the tested object narrates the story shown in the picture in their own words within a given time. This module mainly tests the language output of the tested object (including fluency and vocabulary).

Further, the above-mentioned word fluency module provides at least one theme, and the tested object speaks as many related words as possible within 1 minute.

Further, the poetry recitation module provides at least one classical Chinese poem, used to detect the prosodic features of the tested object's speech.

Further, the sentence restatement module provides multiple sentences, and the tested object repeats each sentence shown. This module can detect the speech output of the tested object.

Further, the picture naming module provides multiple pictures that appear in random order, and the tested object names what is shown in each picture. This module can assess whether the tested object has deficits at the semantic level of words and whether the tested object has difficulty with word selection.

Further, the span test module presents 2 to 5 Chinese characters with similar pronunciations in sequence and requires the tested object to repeat what they saw. The correctness of each repetition is judged from the module data, and the accuracy rate is analyzed for syllable spans of 2 to 5. In short-term memory, phonologically dissimilar items are easier to remember than phonologically similar ones, and this module uses that effect to test the speech perception of the tested object. All characters in this module are tested in the order of two, three, four, and five syllables, and they are played in the tone order Yinping, Yangping, Shangsheng, Qusheng, Yinping, Yangping, Shangsheng, Qusheng (that is, tones 1 to 4, repeated).

Further, the carousel pronunciation module provides at least one three-syllable string, which the tested object is required to repeat three times. It is used to detect whether the tested object's coordinated articulatory movement across the syllable sequence is too slow and whether articulation is abnormal.

Further, the picture matching module provides multiple pages; each page includes at least three pictures, two of which are related. The tested object is required to select the two related pictures.

The advantages of the invention:

(1) Starting from the impairment of language function in early AD, and with the goal of early AD screening for Mandarin Chinese speakers, the invention provides an extraction solution, a recognition method, and supporting devices for speech features specific to early AD. It has low cost, acquires rich data in real time, makes large-sample collection easy (enabling big data analysis), and supports remote collection and analysis, giving it great application potential for large-scale early AD screening and long-term disease management;

(2) The patient-facing part is based mainly on a variety of speech tasks, and automatic speech analysis technology is used to extract the early AD speech feature set. It is convenient to operate and non-invasive, and data can be collected while the patient is comfortable, which gives it a distinct advantage in user experience;

(3) A scheme for the simultaneous optimization of speech-specific acoustic feature selection and dimensionality reduction is proposed. It introduces the idea of multi-task learning and combines stable feature selection with deep learning to optimize feature selection and the classification feature subspace at the same time;

(4) It can easily be combined with other methods to form multi-modal auxiliary diagnostic markers, which can serve as a comprehensive index for highly sensitive early AD screening, flag high AD risk, and increase the detection rate of early AD patients. This helps enable intervention as early as possible and gains more time for effective treatment.

Brief description of the drawings

Figure 1 is a flowchart of an embodiment of the early AD speech assisted screening system for Mandarin Chinese of the present invention;

Figure 2 is a schematic structural diagram of the early AD speech assisted screening system for Mandarin Chinese of the present invention;

Figure 3 shows a memo picture used in the picture description module;

Figure 4 shows a picture used in the picture naming module;

Figure 5 is a schematic diagram of a page in the picture matching module.

Detailed description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Language impairment is one of the important features of early AD. It changes the rhythm, pronunciation cycle, pronunciation quality, and processing rate of the patient's spontaneous speech, and manifests clinically as relatively non-fluent spontaneous speech, word-finding difficulty, slow speech, long pauses, and phoneme errors. The present invention uses objective analysis methods to quantify how early AD language impairment shows up in speech, and detects the relevant speech features through automatic speech analysis (ASA) and automatic speech recognition technologies, with the aim of achieving automatic early AD screening for Mandarin Chinese speakers.

Referring to FIGS. 1-2, an early AD speech assisted screening system for Mandarin Chinese includes a subject test unit, a speech feature extraction unit and a recognition unit.

The subject test unit is used to test the tested object; the speech feature extraction unit is used to extract and store the speech features of the tested object; the recognition unit is used to recognize the speech features.

The subject test unit includes a spontaneous voice module, a picture description module, a word fluency module, a poetry recitation module, a sentence restatement module, a picture naming module, a span test module, a carousel pronunciation module and a picture matching module.

Specifically, the spontaneous speech module requires the patient to give a spontaneous, emotionally expressive talk of about 2 minutes on the theme of "self-introduction", for example introducing their name, age, family, work, and hobbies. Features of lexical output and sentence coherence can be extracted from the module data.

Specifically, the picture description module requires the patient to look at a memo picture without text and narrate the story in the picture in their own words within a given time (1 to 2 minutes). The patient needs to understand the characters and events in the picture story and present them in a structured way, so the module can test the patient's language output (including fluency and vocabulary). The module includes 3 pictures in total (for example, Figure 3); the prescribed description time is about 2 minutes for the first picture and about 1 minute for each of the other two. The actual duration can be adjusted at the examiner's discretion according to the patient's performance.

Specifically, in the word fluency module the theme may be fruit. The intervals between the words spoken by the patient can be analyzed to assess the patient's long-term memory function.
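The word-count and inter-word-interval measures for this module can be sketched as follows, assuming the speech recognition stage yields a time-aligned list of `(word, start_s, end_s)` tuples. The function name `fluency_stats`, the tuple layout, and the rule of counting repeated words once are illustrative assumptions; the description does not specify how the intervals are computed.

```python
from statistics import mean

def fluency_stats(words, duration_s=60.0):
    """Summarize one word fluency trial (hypothetical input format).

    words: list of (word, start_s, end_s) tuples sorted by start time.
    Returns (number of distinct words produced within the time limit,
             mean silent interval in seconds between consecutive words).
    """
    # Keep only words that start inside the test window (1 minute by default).
    in_window = [w for w in words if w[1] < duration_s]
    # Repeated words are counted once (an assumed scoring rule).
    distinct = {w[0] for w in in_window}
    # Interval = end of one word to the start of the next word.
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(in_window, in_window[1:])]
    mean_gap = mean(gaps) if gaps else 0.0
    return len(distinct), mean_gap
```

Longer mean intervals would indicate slower retrieval of category members, which is the kind of signal the module is meant to capture.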

Classical poems have a well-defined Chinese prosodic structure and can serve as experimental material for detecting prosodic deficits in Mandarin speakers. Specifically, the poetry recitation module contains 6 simple poems for the patient to read aloud, for example: 白日依山尽，黄河入海流。欲穷千里目，更上一层楼。 ("The white sun sets behind the mountains, the Yellow River flows into the sea. To see a thousand li further, climb one more floor.")
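Prosodic analysis of a recitation usually starts from a fundamental frequency (F0) track, from which tone contours can be read. A bare-bones autocorrelation estimator for a single voiced frame is sketched below; the 75 to 400 Hz search range is an assumed setting for adult voices, and the function name is hypothetical. Practical systems use more robust pitch trackers.

```python
import math

def estimate_f0(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Chooses the lag with the largest autocorrelation inside the pitch range
    [fmin, fmax]. Returns None for an all-zero frame or a frame too short
    to contain the longest period searched.
    """
    min_lag = int(sr / fmax)
    max_lag = int(sr / fmin)
    if max_lag >= len(frame) or not any(frame):
        return None
    best_lag, best_val = None, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sr / best_lag
```

Applying it frame by frame over a syllable gives an F0 contour whose shape can be compared with the expected Mandarin tone.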

Further, the sentence restatement module provides multiple sentences, and the tested object repeats each sentence shown. This module can detect the patient's speech output.

Specifically, this module requires the patient to repeat the sentences they see, and the module data is used to calculate the intervals between words during repetition. In general, people remember short words better than long words, and long words require longer rehearsal time, so this module can detect the patient's speech output. The module contains 14 sentences of gradually increasing difficulty, for example: "Tomorrow is Sunday."

Further, the picture naming module provides multiple pictures that appear in random order, and the tested object names what is shown in each picture. This module can assess whether the patient has deficits at the semantic level of vocabulary and whether the patient has difficulty with word selection.

Specifically, 36 pictures appear in random order in this module (for example, Figure 4), and the patient is required to name the object in each picture. There is no time limit, and the patient controls when the picture changes. Naming example: giraffe.

Further, the span test module presents 2 to 5 Chinese characters with similar pronunciations in sequence (for example, at span 2 the characters "an" and "class" appear), and the tested object is required to repeat what they saw. The correctness of each repetition is judged from the module data, and the accuracy rate is analyzed for syllable spans of 2 to 5. In short-term memory, phonologically dissimilar items are easier to remember than similar ones, and this module uses that effect to test the patient's speech perception. All characters in this module are tested in the order of two, three, four, and five syllables, and they are played in the tone order Yinping, Yangping, Shangsheng, Qusheng, Yinping, Yangping, Shangsheng, Qusheng.
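The span-test bookkeeping can be sketched as below: one helper cycles through the four tones (Yinping, Yangping, Shangsheng, Qusheng, i.e. tones 1 to 4) for consecutive stimuli, and another computes the accuracy rate per span. The record layout `(span, target, response)` and the all-or-nothing scoring rule (a trial is correct only if every syllable is reproduced in order) are assumptions; the description states only that accuracy is analyzed for spans 2 to 5.

```python
from collections import defaultdict

# The four Mandarin tones in the presentation order given in the description.
TONE_ORDER = ("yinping", "yangping", "shangsheng", "qusheng")

def tone_schedule(n_items):
    """Tone assigned to each of n_items consecutive stimuli, cycling 1-2-3-4."""
    return [TONE_ORDER[i % len(TONE_ORDER)] for i in range(n_items)]

def span_accuracy(trials):
    """Accuracy rate per syllable span.

    trials: iterable of (span, target, response), where target and response
    are sequences of syllables. A trial counts as correct only if the
    response reproduces the target exactly and in order (assumed rule).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for span, target, response in trials:
        total[span] += 1
        if list(response) == list(target):
            correct[span] += 1
    return {span: correct[span] / total[span] for span in sorted(total)}
```

Comparing the accuracy curve across spans 2 to 5 shows where a participant's phonological short-term memory begins to fail.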

Further, the carousel pronunciation module provides at least one three-syllable string; for example, the syllables pa-ta-ka are presented in sequence, and the tested object is required to repeat them three times. This alternating-syllable (diadochokinetic) task is used to detect whether the tested object's coordinated articulatory movement across the syllable sequence is too slow and whether articulation is abnormal.
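A common way to quantify a pa-ta-ka task is the diadochokinetic (DDK) rate in syllables per second, together with the variability of the intervals between syllables. The sketch below assumes syllable onset times have already been detected; the onset detector is not shown, and the function name and the choice of the coefficient of variation are assumptions rather than details from the description.

```python
from statistics import mean, pstdev

def ddk_metrics(onsets):
    """DDK rate and interval variability from syllable onsets.

    onsets: ascending syllable onset times in seconds, e.g. nine onsets for
    three repetitions of pa-ta-ka.
    Returns (syllables_per_second, coefficient_of_variation_of_intervals).
    """
    if len(onsets) < 2 or onsets[-1] <= onsets[0]:
        raise ValueError("need at least two distinct syllable onsets")
    intervals = [b - a for a, b in zip(onsets, onsets[1:])]
    # n onsets span n - 1 intervals between the first and last onset.
    rate = (len(onsets) - 1) / (onsets[-1] - onsets[0])
    # A higher coefficient of variation may indicate less regular alternation.
    cv = pstdev(intervals) / mean(intervals)
    return rate, cv
```

A low rate would correspond to the "too slow" coordinated movement the module screens for, while a high coefficient of variation points to irregular articulation.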

Specifically, each page of the picture matching module shows three pictures (for example, Figure 5): the picture in the upper row is clearly related to one of the pictures in the lower row, and the patient uses the left and right keys on the keyboard to select the most related picture in the lower row. After each experiment, a table recording the patient's picture selections is generated in the output folder; after the experiment, these records can be analyzed to obtain the accuracy rate and reaction time. AD patients react to objects markedly more slowly than healthy people, and their ability to recognize objects is poorer. Therefore, by analyzing selection accuracy and reaction time, the data from this module can quickly and intuitively distinguish patients from healthy people.
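The analysis of the generated table can be sketched as follows. The table format is not specified in the description, so this assumes a CSV with hypothetical columns `trial`, `chosen`, `correct_answer`, and `rt_ms` (reaction time in milliseconds).

```python
import csv
import io
from statistics import mean

def summarize_matching(csv_text):
    """Return (accuracy, mean reaction time in ms) for one picture matching run.

    csv_text: contents of the results table, assumed to have the columns
    trial, chosen, correct_answer, rt_ms.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0.0, 0.0
    # True counts as 1 when summed.
    hits = sum(r["chosen"] == r["correct_answer"] for r in rows)
    accuracy = hits / len(rows)
    mean_rt = mean(float(r["rt_ms"]) for r in rows)
    return accuracy, mean_rt
```

To analyze a file on disk, pass its text, e.g. `summarize_matching(open(path, encoding="utf-8").read())`. Taking text rather than a path keeps the function easy to test.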

Combining multiple tasks allows the language impairment caused by AD to be observed from several angles. The present invention therefore establishes a multi-task, multi-dimensional framework for automatically extracting abnormal speech features, with the aim of screening for early AD based on the differences between the speech features of AD patients and healthy people. After the subject completes the cognitive speech tasks of the subject test unit, the recordings and other files generated by each module are analyzed, and multi-dimensional, task-related speech features are extracted from the speech output of each module, yielding a feature set sensitive to the early symptoms of AD that can be used for screening.

The speech feature extraction unit includes an automatic speech signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module. The speech data recorded by each module of the subject test unit is first segmented by the automatic segmentation module and then recognized as text by the signal preprocessing and automatic speech recognition modules. Based on the speech recognition results, the text analysis and speech analysis modules calculate and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features.
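The description does not say how automatic segmentation works. A minimal energy-based sketch is given below, assuming mono samples as floats in [-1, 1], a hand-picked dB threshold, and a minimum silence length for splitting; all parameter values and the function name are illustrative assumptions, and a deployed system would more likely use a trained voice activity detector.

```python
import math

def segment_speech(samples, sr=16000, frame_ms=25, threshold_db=-35.0, min_gap_frames=8):
    """Split audio into (start_s, end_s) speech segments by frame energy.

    Frames whose RMS level (dB relative to full scale) exceeds threshold_db
    count as speech. Silent runs shorter than min_gap_frames are bridged,
    so brief pauses do not split a segment.
    """
    frame_len = int(sr * frame_ms / 1000)
    voiced = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        level = 20 * math.log10(rms) if rms > 0 else -math.inf
        voiced.append(level > threshold_db)

    segments, start, silence = [], None, 0
    for idx, v in enumerate(voiced):
        if v:
            if start is None:
                start = idx
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_gap_frames:
                # The segment ends after the last voiced frame before the silence.
                end = idx - silence + 1
                segments.append((start * frame_len / sr, end * frame_len / sr))
                start, silence = None, 0
    if start is not None:
        end = len(voiced) - silence  # drop trailing silent frames
        segments.append((start * frame_len / sr, end * frame_len / sr))
    return segments
```

The resulting time spans can be cut out of the recording and passed on to preprocessing and speech recognition one segment at a time.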

Following linguistic classification, the speech features used in the present invention fall roughly into four categories: phonetic, lexical, grammatical, and pragmatic features. Some speech deficits caused by multiple cognitive impairments may belong to several of these levels at once.

Phonetic features describe deficits in language production at the sound level and cover three main aspects. The first is timing: the duration of words, phonemes, and syllables, the number of pauses in the speech stream, the durations of speech and of pauses, repetitions, and so on. These features measure the fluency of the speech stream. The cause of a repetition can also be judged from its position in the speech stream: a repetition at the beginning or end of a sentence suggests that the patient has a comprehension problem, while a repetition within a sentence suggests that the patient intends to restate the preceding content. The second is acoustic: phonetic features also include Mel-frequency cepstral coefficients (MFCCs), fundamental frequency, and duration. MFCCs are derived from the log energy of the spectrum on the Mel frequency scale, which matches human auditory perception more closely than a linear scale, and they reflect the resonance of sound in the vocal tract. These features can describe prosodic information in speech, such as tone contours and stress. The third is phonetic errors, that is, sounds that do not conform to the pronunciation rules of the language, and incompletely articulated words.
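The timing-based phonetic features listed above (pause count, total pause time, phonation time, speech rate) can be computed from time-aligned speech units. The sketch assumes the recognizer or a forced aligner returns `(label, start_s, end_s)` tuples, and uses a 0.25 s minimum pause chosen here only for illustration; neither detail comes from the description.

```python
def timing_features(units, min_pause_s=0.25):
    """Simple fluency features from time-aligned speech units.

    units: list of (label, start_s, end_s) sorted by start time, e.g.
    syllables or words (assumed input format).
    """
    if not units:
        return {"n_pauses": 0, "total_pause_s": 0.0, "phonation_s": 0.0,
                "speech_rate": 0.0, "phonation_ratio": 0.0}
    phonation = sum(end - start for _, start, end in units)
    gaps = [nxt[1] - cur[2] for cur, nxt in zip(units, units[1:])]
    pauses = [g for g in gaps if g >= min_pause_s]
    total = units[-1][2] - units[0][1]
    return {
        "n_pauses": len(pauses),
        "total_pause_s": sum(pauses),
        "phonation_s": phonation,
        # Units (e.g. syllables) per second over the whole utterance.
        "speech_rate": len(units) / total if total > 0 else 0.0,
        # Share of the utterance spent actually speaking.
        "phonation_ratio": phonation / total if total > 0 else 0.0,
    }
```

Long or frequent pauses and a low phonation ratio correspond to the "long pauses" and "slow speech" symptoms mentioned earlier.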

Lexical features describe language output at the level of vocabulary content. Words can be classified by part of speech, and the vocabulary distribution can then be analyzed from the frequency of each part of speech, which also shows which types of words the patient tends to use or avoid. For example, output containing many demonstrative pronouns may be ambiguous. Another type of lexical-semantic feature measures lexical richness and information density, mainly through the type-token ratio (TTR). Types are the distinct words in the transcript of a speech sample, and tokens are all word occurrences. For speech outputs of equal length, the TTR reflects lexical richness to some extent: the larger the TTR, the more varied the vocabulary and the lower the repetition rate.
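The type-token ratio and part-of-speech proportions can be sketched as follows. Mandarin word segmentation and part-of-speech tagging are assumed to happen upstream; the `(word, tag)` input format and the tag names used in any example are illustrative, not a particular tagger's tag set.

```python
from collections import Counter

def lexical_features(tagged):
    """Type-token ratio and part-of-speech proportions for one speech sample.

    tagged: list of (word, pos_tag) pairs from an upstream segmenter/tagger.
    Returns (ttr, {pos_tag: proportion_of_tokens}).
    """
    tokens = [w for w, _ in tagged]
    if not tokens:
        return 0.0, {}
    # Types = distinct words; tokens = all word occurrences.
    ttr = len(set(tokens)) / len(tokens)
    pos_counts = Counter(tag for _, tag in tagged)
    pos_dist = {tag: n / len(tokens) for tag, n in pos_counts.items()}
    return ttr, pos_dist
```

Because the TTR tends to fall as samples get longer, it is only comparable between samples of similar length, as the description points out. A high pronoun proportion in `pos_dist` would flag the ambiguous, demonstrative-heavy output mentioned above.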

Grammatical features describe language output at the grammatical level and are obtained by measuring the complexity of lexical and syntactic structures. They mainly include the frequency, relative proportion, and average length of each grammatical component, the number and length of clauses, the height of the syntactic tree, and sentence depth. They also include syntactic errors, such as malformed syntactic structures and incomplete sentences.
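Syntactic tree height can be computed from parser output. The sketch assumes a dependency parse encoded as a list of 1-based head indices with 0 marking the root, a common treebank convention; the description does not say which parser or tree representation is used, so this is one possible instantiation.

```python
def tree_height(heads):
    """Height of a dependency tree given 1-based head indices (0 = root).

    The root word has depth 1; the height is the maximum depth of any word.
    Raises ValueError if the head indices contain a cycle.
    """
    def depth(i):
        d = 1
        seen = set()
        while heads[i - 1] != 0:
            if i in seen:
                raise ValueError("cycle in head indices")
            seen.add(i)
            i = heads[i - 1]  # move up to the head of word i
            d += 1
        return d

    return max((depth(i) for i in range(1, len(heads) + 1)), default=0)
```

For example, `heads = [2, 0, 2, 3]` describes a four-word sentence whose second word is the root and whose deepest word sits three levels down, so its height is 3.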

Pragmatic features reflect the cohesion and coherence of language production and are discourse-level variables. Cohesion concerns the relationships between sentences, and coherence can be divided into local coherence and global coherence. Local coherence refers to whether a sentence is coherent with the next one, and global coherence refers to whether the content of a sentence is closely related to the theme of the whole text. Pragmatic features also include the information completeness of the language output: the information that should appear in the output can be determined in advance, and the amount of it covered in the patient's output, such as the number of keywords, can then be counted. Locating this information requires the help of syntactic analysis.
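The information-completeness measure can be approximated as keyword coverage: a list of expected information units is fixed in advance for each task (for example, each picture), and the share of them mentioned in the transcript is counted. Plain substring matching is a simplifying assumption; as noted above, locating information properly relies on syntactic analysis.

```python
def keyword_coverage(transcript, expected_keywords):
    """Fraction of predefined information units mentioned in a transcript.

    Uses plain substring matching, which works on unsegmented Chinese text
    but ignores synonyms, paraphrases, and negation.
    """
    if not expected_keywords:
        return 0.0
    found = [k for k in expected_keywords if k in transcript]
    return len(found) / len(expected_keywords)
```

A lower coverage for the same picture suggests less informative, emptier narration.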

The recognition unit includes a feature selection module and a classification module. The feature selection module selects and optimizes the multi-dimensional features to obtain a diagnosis-sensitive feature set. The classification module applies a multi-task deep belief network (Multi-task DBN) algorithm to the optimized feature set, using task relatedness to jointly improve the prediction of the classification task: the network treats classification as the main task, and MMSE and MoCA scores are trained as related tasks to help improve classification performance. Diagnosis and screening of early AD patients is finally completed according to the prediction results.
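The multi-task idea, with AD classification as the main task and MMSE and MoCA score prediction as related tasks, is reflected in the combined training objective. The sketch below shows only such a loss for one sample; it is not a deep belief network (no RBM pre-training or network layers). The auxiliary weights of 0.3 and the scaling of both scores by their 30-point maximum are assumptions made for illustration.

```python
import math

def multitask_loss(p_ad, label, mmse_pred, mmse_true, moca_pred, moca_true,
                   w_mmse=0.3, w_moca=0.3):
    """Joint loss for one sample: main task plus two auxiliary tasks.

    Main task: binary cross-entropy for AD (label 1) vs. control (label 0),
    given the predicted AD probability p_ad.
    Auxiliary tasks: squared errors for MMSE and MoCA predictions. Both
    tests are scored 0-30, so errors are divided by 30 to keep the terms
    on a comparable scale.
    """
    eps = 1e-12  # keeps log() away from 0
    p = min(max(p_ad, eps), 1 - eps)
    bce = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    mmse_err = ((mmse_pred - mmse_true) / 30.0) ** 2
    moca_err = ((moca_pred - moca_true) / 30.0) ** 2
    return bce + w_mmse * mmse_err + w_moca * moca_err
```

In a network with shared hidden layers, gradients from the two auxiliary terms also shape the shared representation, which is how the related tasks can help the main classification task.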

The invention adopts a framework based on Multi-task Deep Learning (MDTL) to classify AD patients automatically. The framework has two levels: the upper level performs feature selection based on multi-task deep learning with stability selection, and the lower level is an ELM (extreme learning machine) or SVM classifier. First, the high-dimensional acoustic features related to the AD language brain function network described above are normalized and further optimized with Lasso estimation and stability selection to obtain the diagnosis-sensitive feature set. Second, in the stability selection step, Lasso estimation is repeated 50 times with different values of the regularization parameter λ; the selection probability of each feature is given by how often it is selected across these 50 Lasso estimations, and the feature set is determined by a threshold t chosen such that the stable feature set changes little. Finally, the invention uses the principle of multi-task learning to build a multi-task deep belief network (Multi-task DBN) that uses task relatedness to jointly improve the prediction of the classification task. The network focuses on classification as the main task, and MMSE and MoCA scores are trained as related tasks to help improve classification performance.
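The stability-selection step can be sketched in plain Python with a coordinate-descent Lasso. Two points are assumptions rather than details from the description: each run refits the Lasso on a random half of the samples (the usual stability-selection recipe, while the description mentions only varying λ), and the target `y` is numeric, e.g. 0/1 diagnosis labels treated as a regression target. Function names, the λ grid, and the default threshold 0.6 (playing the role of t) are illustrative.

```python
import random

def _soft_threshold(a, lam):
    """Proximal operator of the L1 penalty."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0

def _center(X, y):
    """Subtract column means from X and the mean from y (so no intercept is needed)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    return [[row[j] - means[j] for j in range(p)] for row in X], [v - y_mean for v in y]

def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    r = list(y)  # residual y - Xw (w starts at zero)
    z = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue
            # Correlation of feature j with the partial residual that excludes w[j].
            rho = sum(X[i][j] * (r[i] + X[i][j] * w[j]) for i in range(n)) / n
            new = _soft_threshold(rho, lam) / z[j]
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                w[j] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    return w

def stability_selection(X, y, lambdas, n_runs=50, threshold=0.6, seed=0):
    """Selection probability per feature over n_runs Lasso fits.

    Each run uses one lambda from `lambdas` (cycled) and a random half of the
    samples. Returns (probabilities, indices with probability >= threshold).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for run in range(n_runs):
        lam = lambdas[run % len(lambdas)]
        idx = rng.sample(range(n), n // 2)
        Xs, ys = _center([X[i] for i in idx], [y[i] for i in idx])
        w = lasso_cd(Xs, ys, lam)
        for j in range(p):
            if abs(w[j]) > 1e-10:
                counts[j] += 1
    probs = [c / n_runs for c in counts]
    return probs, [j for j, pr in enumerate(probs) if pr >= threshold]
```

Features that survive most of the 50 perturbed fits form the stable, diagnosis-sensitive set; raising `threshold` trades recall for a smaller, more reliable feature set.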

Thus, the patient's language output is identified from how well their speech feature set matches. If the speech feature set collected from the patient through the test tasks matches the reference feature set stored in the AD recognition model, the patient may be regarded as suspected of having early AD.

The above are only embodiments of the present invention and are not intended to limit its scope of protection. Any equivalent structural or process transformation made using the description and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included in the protection scope of the present invention.

Claims

An early AD speech assisted screening system for Mandarin Chinese, which is characterized by:

It includes a subject test unit, a speech feature extraction unit, and a recognition unit;

The subject test unit is used to test the tested object; the speech feature extraction unit is used to extract and store the speech features of the tested object; the recognition unit is used to recognize the speech features;

The subject test unit includes a picture description module, a word fluency module, a poetry recitation module, a sentence repetition module, a picture naming module, a span test module, a carousel pronunciation module and a picture matching module;

The speech feature extraction unit includes an automatic speech signal segmentation module, a signal preprocessing module, an automatic speech recognition module, a text analysis module, and a speech analysis module; the speech data recorded by each module of the subject test unit is first segmented by the automatic segmentation module, and the segmented speech is then recognized as text by the signal preprocessing module and the automatic speech recognition module; based on the speech recognition results, the text analysis module and the speech analysis module calculate and estimate the corresponding phonetic, lexical, grammatical, and pragmatic features;

The recognition unit includes a feature selection module and a classification module; the feature selection module selects and optimizes the multi-dimensional features to obtain a diagnosis-sensitive feature set; the classification module applies a multi-task deep belief network algorithm to the optimized feature set, using task relatedness to jointly improve the prediction of the classification task; the network treats classification as the main task, and MMSE and MoCA scores are trained as related tasks to help improve classification performance; diagnosis and screening of early AD patients is finally completed according to the prediction results.
The early AD speech assisted screening system for Mandarin Chinese according to claim 1, wherein the subject test unit further includes a spontaneous speech module, and the spontaneous speech module mainly tests the lexical output and sentence coherence of the tested object.
The early AD speech assisted screening system for Mandarin Chinese according to claim 1, characterized in that the picture description module provides at least one memo picture, and the tested object narrates the story in the picture in their own words within a given time.
The early AD speech assisted screening system for Mandarin Chinese according to any one of claims 1 to 3, characterized in that the word fluency module provides at least one topic, and the tested object says as many related words as possible within 1 minute.
The early AD speech assisted screening system for Mandarin Chinese according to claim 4, characterized in that the poetry recitation module provides at least one classical Chinese poem for detecting the prosodic features of the tested object's speech.
The early AD speech assisted screening system for Mandarin Chinese according to claim 5, characterized in that the sentence repetition module provides multiple sentences, and the tested object repeats each sentence shown, so as to detect the speech output of the tested object.
The early AD speech assisted screening system for Mandarin Chinese according to claim 6, characterized in that the picture naming module provides multiple pictures that appear in random order, and the tested object names what is shown in each picture; this module can assess whether the tested object has deficits at the semantic level of vocabulary and whether the tested object has difficulty with word selection.
The early AD speech assisted screening system for Mandarin Chinese according to claim 7, characterized in that the span test module presents 2 to 5 Chinese characters with similar pronunciations in sequence and requires the tested object to repeat what they saw; the correctness of each repetition is judged from the module data, and the accuracy rate is analyzed for syllable spans of 2 to 5.
The early AD speech assisted screening system for Mandarin Chinese according to claim 8, characterized in that the carousel pronunciation module provides at least one three-syllable string, which the tested object is required to repeat three times, for detecting whether the tested object's coordinated articulatory movement across the syllable sequence is too slow and whether articulation is abnormal.
The early AD speech assisted screening system for Mandarin Chinese according to claim 9, characterized in that the picture matching module provides multiple pages; each page includes at least three pictures, two of which are related, and the tested object is required to select the two related pictures.