CN109584861A - Screening method for Alzheimer's disease speech signals based on deep learning - Google Patents

Screening method for Alzheimer's disease speech signals based on deep learning Download PDF

Info

Publication number
CN109584861A
Authority
CN
China
Prior art keywords
voice
alzheimer
feature
disease
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811464595.0A
Other languages
Chinese (zh)
Inventor
周青 (Zhou Qing)
顾明亮 (Gu Mingliang)
马勇 (Ma Yong)
朱祖德 (Zhu Zude)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University filed Critical Jiangsu Normal University
Priority to CN201811464595.0A priority Critical patent/CN109584861A/en
Publication of CN109584861A publication Critical patent/CN109584861A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Abstract

A method for screening Alzheimer's disease speech signals based on deep learning, relating to speech processing technology, comprising the steps of: training a deep belief network model for later use; having the subject perform different spoken-output tasks and collecting the subject's speech; preprocessing the collected speech; extracting pathological features related to Alzheimer's disease from the preprocessed speech and feeding them into the trained deep belief network model to obtain optimized features; and feeding the optimized features into a trained SVM classifier for classification, the classification results being the screening results. The method of the invention uses deep learning to realize rapid AD screening and can make a preliminary judgment from the subject's speech alone; it is simple in method and highly intelligent.

Description

Screening method for Alzheimer's disease speech signals based on deep learning
Technical field
The present invention relates to speech processing technology, and in particular to a method for screening Alzheimer's disease speech signals based on deep learning.
Background technique
Alzheimer's disease (AD) has become one of the focal concerns of aging societies. A national epidemiological survey shows that the prevalence of Alzheimer's disease among China's population over 65 reaches 4.8%. Current clinical AD diagnosis requires 2-3 hours of standardized neuropsychological assessment by experienced clinicians, together with neural-marker examinations of low availability and high cost, such as PET imaging or invasive spinal puncture; screening the nearly ten million potential dementia patients through this conventional route is very difficult.
Earlier studies observing the spoken-output impairments of AD patients found that abnormalities of linguistic function can serve as important evidence for AD assessment and diagnosis. Therefore, by analyzing the phonetic features of the subject's speech signal, a pathological-feature model is trained with a deep neural network algorithm to find effective pathological features of AD patients, and an SVM classifier is used to rapidly screen AD patients in a non-intrusive manner, providing the clinical diagnosis of AD with an objective measurement method that is low-cost, highly feasible, structurally simple, and intelligent.
Summary of the invention
The object of the present invention is to provide a rapid Alzheimer's disease screening technique based on deep-belief-network feature optimization. The subject's speech signal is processed and analyzed to extract relevant pathological features, including fundamental frequency, jitter, shimmer, harmonics-to-noise ratio (HNR), signal-to-noise ratio (SNR), short-time zero-crossing rate, short-time energy, formants, MFCC, LPC, speech pauses, and speech rate. The extracted pathological features are analyzed, and a deep belief network model for feature optimization and an SVM model for classification are established and trained, thereby realizing rapid screening of Alzheimer's disease patients.
To achieve the above object, the technical solution of the present invention is as follows:
A method for screening Alzheimer's disease speech signals based on deep learning, comprising the steps of:
S1: training a deep belief network model for later use;
S2: having the subject perform different spoken-output tasks and collecting the subject's speech;
S3: preprocessing the collected speech;
S4: extracting pathological features related to Alzheimer's disease from the preprocessed speech and feeding them into the trained deep belief network model to obtain optimized features;
S5: feeding the optimized features into a trained SVM classifier for classification, the classification results being the screening results.
As a further improvement of the present invention, step S2 specifically comprises: measuring the on-site noise, eliminating noise sources, and performing speech collection once the noise meets requirements; and, during speech collection, having the subject perform different spoken-output tasks and labeling and organizing the speech.
As a further improvement of the present invention, step S2 specifically comprises: measuring the on-site noise, eliminating noise sources, and performing speech collection once the noise meets requirements; and, during speech collection, having the subject perform different spoken-output tasks, the spoken-output tasks including self-introduction, a verbal fluency test, picture description, and sustained vowel phonation, and labeling and organizing the speech.
As a further improvement of the present invention, step S3 specifically comprises: denoising, parameter normalization, pre-emphasis, windowing, and framing of the collected speech data to obtain a speech frame sequence.
As a further improvement of the present invention, step S3 specifically comprises: denoising, parameter normalization, pre-emphasis, windowing, and framing of the collected speech data to obtain a speech frame sequence, wherein the pre-emphasis, windowing, and framing are performed with openSMILE.
As a further improvement of the present invention, step S4 specifically comprises: extracting the pathological features of each speech frame in the speech frame sequence and computing their first-order and second-order differences to form new multidimensional pathological features; and feeding the multidimensional pathological features into the trained deep belief network model, which outputs the optimized features.
As a further improvement of the present invention, step S4 specifically comprises: extracting the pathological features of each speech frame in the speech frame sequence and computing their first-order and second-order differences to form new multidimensional pathological features, wherein the pathological features include: fundamental frequency, jitter, shimmer, harmonics-to-noise ratio, signal-to-noise ratio, short-time zero-crossing rate, short-time energy, formants, MFCC, LPC, speech pauses, and speech rate; and feeding the multidimensional pathological features into the trained deep belief network model, which outputs the optimized features.
As a further improvement of the present invention, step S5 specifically comprises: feeding the optimized features as input into the trained SVM classifier for classification, the classification results being the detection results, wherein the training process of the SVM classifier model is: preprocessing the data in the training set, extracting pathological features, feeding them into the deep belief network model to obtain optimized features, and feeding those optimized features into the SVM classifier for training to obtain the trained SVM classifier model.
Compared with the prior art, the beneficial effects of the present invention are as follows: the method for screening Alzheimer's disease speech signals based on deep learning realizes rapid AD screening through deep learning, can make a preliminary judgment from the subject's speech alone, is simple in method, and is highly intelligent.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the present invention;
Fig. 2 is a schematic flow chart of speech collection;
Fig. 3 is a schematic flow chart of speech preprocessing;
Fig. 4 is a schematic flow chart of feature extraction;
Fig. 5 is a schematic diagram of training optimized features with the deep network framework;
Fig. 6 is the flow chart of RBM parameter training;
Fig. 7 is the flow chart of SVM classifier training and classification.
Specific embodiment:
The present invention is described further with reference to the accompanying drawings.
Embodiment
Fig. 1 is a schematic flow chart of the method for screening Alzheimer's disease speech signals based on deep learning according to the present invention, comprising the steps of:
1) collecting and organizing the subject's speech while the subject performs different spoken-output tasks;
2) preprocessing the above speech;
3) extracting the acoustic features of the above speech and feeding them into the deep belief network for training to obtain optimized features;
4) feeding the optimized features into the trained SVM classifier for classification, thereby realizing automatic identification of Alzheimer's disease patients from the input speech.
Fig. 2 is a schematic flow chart of speech collection. The purpose of this part is to acquire the raw experimental data and collect the training speech files needed by the subsequent algorithms. The test administrator first measures the on-site noise; if it exceeds 55 dB, noise sources are eliminated, and speech collection proceeds only after the noise has fallen to 55 dB or below.
During speech collection, the subject performs four different spoken-output tasks: "self-introduction", "verbal fluency test", "picture description", and "sustained vowel phonation"; the speech is then saved.
The training-set speech is saved, labeled, and organized as follows: all recordings of each subject are saved in a folder bearing the same number as the subject; no personal information is kept in the saving process, only the distinguishing number and the diagnosis result (young adult, healthy elderly, AD patient, or not yet diagnosed).
Fig. 3 is a schematic flow chart of speech preprocessing. Training data and test data are each denoised and parameter-normalized, then successively pre-emphasized, windowed, and framed to obtain speech frame sequences. Denoising: an automatic segmentation program detects the speech precisely and removes interference such as coughing; the training data are additionally proofread manually, and obvious noise and long silent stretches within the speech segments are annotated and cut. Parameter normalization: since recording environments and equipment differ, after the data are pooled, the sampling rate, bit rate, and other parameters are unified according to the experimental requirements, and amplitude normalization is performed with Audition software to eliminate interference. Cutting: to examine the influence of speech segments of different durations on discrimination, an automatic segmentation program is designed to cut the training data into segments, with the cutting duration settable manually.
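As a concrete illustration of this preprocessing chain, the following is a minimal Python sketch of the pre-emphasis, framing, and windowing steps. The patent performs these with openSMILE; the 0.97 pre-emphasis coefficient, 25 ms frame length, and 10 ms frame shift used here are conventional assumptions, not values stated in the patent.

```python
import numpy as np

def preprocess(signal, sr, pre_coef=0.97, frame_ms=25, shift_ms=10):
    """Pre-emphasize a speech signal, split it into overlapping frames,
    and apply a Hamming window to each frame.

    Parameter defaults are conventional assumptions; the patent itself
    delegates these steps to openSMILE. Assumes len(signal) >= one frame.
    """
    # Pre-emphasis: y[n] = x[n] - a * x[n-1], boosting high frequencies
    emphasized = np.append(signal[0], signal[1:] - pre_coef * signal[:-1])

    frame_len = int(sr * frame_ms / 1000)
    frame_shift = int(sr * shift_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)

    # Overlapping frames, each multiplied by a Hamming window
    frames = np.stack([
        emphasized[k * frame_shift : k * frame_shift + frame_len]
        for k in range(n_frames)
    ])
    return frames * np.hamming(frame_len)
```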
After the collected speech signal has been processed, pathological feature extraction is performed. Fig. 4 is the pathological feature extraction flow chart. The extracted features include, but are not limited to: fundamental frequency, jitter, shimmer, harmonics-to-noise ratio (HNR), signal-to-noise ratio (SNR), short-time zero-crossing rate, short-time energy, formants, MFCC, LPC, speech pauses, and speech rate. The training data undergo the same feature extraction.
The MFCC feature is taken as an example below.
To extract the MFCC features of each speech frame, the frequency-domain signal is first obtained by Fourier transform and taking the modulus; the mel-domain output is obtained through triangular filter banks; the logarithm is taken and decorrelated by the discrete cosine transform to obtain 13th-order MFCC parameters; first-order and second-order differences are then computed from them to form 39-dimensional MFCC features.
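A minimal sketch of this 39-dimensional MFCC computation, written here with the librosa library rather than the openSMILE toolchain the patent actually uses; the 16 kHz sampling rate is an assumption.

```python
import numpy as np
import librosa

def mfcc_39(wav_path):
    """Per-frame 13 MFCCs plus first- and second-order differences (39 dims)."""
    y, sr = librosa.load(wav_path, sr=16000)       # assumed 16 kHz sampling rate
    # FFT magnitude -> mel triangular filter bank -> log -> DCT (13 coefficients)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta1 = librosa.feature.delta(mfcc, order=1)  # first-order difference
    delta2 = librosa.feature.delta(mfcc, order=2)  # second-order difference
    return np.vstack([mfcc, delta1, delta2])       # shape: (39, n_frames)
```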
The features are extracted as follows: openSMILE computes features including fundamental frequency, jitter, shimmer, MFCC, and LPC; the MATLAB voicebox toolkit extracts features including harmonics-to-noise ratio (HNR), signal-to-noise ratio (SNR), short-time zero-crossing rate, and short-time energy; speech rate, speech pauses, and formant features are obtained with Praat scripts.
In particular, for the extraction of speech-pause features, one of the pathological features of AD patients, five measures are used as a global characterization of pausing in the speech: total duration of the speech segment, total phonation duration, total pause duration, number of pauses, and phonation/pause ratio.
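A minimal sketch of these five pause measures, assuming a crude energy-threshold voiced/unvoiced decision in place of the Praat script the patent uses; the energy threshold and the 300 ms minimum pause length are illustrative assumptions.

```python
import numpy as np

def pause_features(frames, frame_shift_s=0.010, min_pause_s=0.3):
    """Total duration, phonation duration, pause duration, pause count,
    and phonation/pause ratio of a speech segment, from windowed frames.

    The energy threshold below is a placeholder for a real voice
    activity detector; the patent derives these measures with Praat.
    """
    energy = np.sum(frames ** 2, axis=1)
    voiced = energy > 0.1 * np.mean(energy)        # illustrative threshold

    total_s = len(voiced) * frame_shift_s
    phonation_s = float(np.sum(voiced)) * frame_shift_s
    pause_s = total_s - phonation_s

    # Count unvoiced runs of at least min_pause_s as pauses
    min_frames = int(min_pause_s / frame_shift_s)
    n_pauses, run = 0, 0
    for v in voiced:
        run = run + 1 if not v else 0
        if run == min_frames:                      # run just reached pause length
            n_pauses += 1

    ratio = phonation_s / pause_s if pause_s > 0 else float("inf")
    return {"total_s": total_s, "phonation_s": phonation_s,
            "pause_s": pause_s, "n_pauses": n_pauses, "ratio": ratio}
```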
To better preserve the effective information of the features, the feature-optimization step here uses a deep belief network model: the pathological features are fed into the pre-trained deep belief network (DBN) model, which outputs the optimized features.
A typical deep belief network is composed of multiple layers of Restricted Boltzmann Machines (RBMs) and one layer of BP neural network. The whole training process can be summarized in two steps, bottom-up unsupervised learning and top-down supervised learning:
1. In the first step, the RBM parameters of each layer are trained layer by layer from the bottom up using unlabeled data; this is the pre-training of the DBN.
2. In the second step, the BP neural network propagates the difference between the network's final output and the labeled data back from the top down, adjusting the network parameters to the optimum; this is the fine-tuning of the network.
Fig. 5 is the flow chart of deep belief network training: the extracted pathological features are input, the RBM parameters of each layer are trained layer by layer from the bottom up, and the output of pre-training serves as the input of the SVM classifier.
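The greedy layer-wise pre-training of Fig. 5 can be sketched as follows; the layer widths and the train_rbm helper are illustrative assumptions (one possible CD-1 implementation of such a helper is sketched after the Fig. 6 description below).

```python
import numpy as np

def pretrain_dbn(X, hidden_sizes, train_rbm):
    """Greedy layer-wise DBN pre-training sketch.

    X:            (n_samples, n_features) pathological features scaled to [0, 1].
    hidden_sizes: widths of the hidden layers, e.g. [256, 128, 64] (assumed).
    train_rbm:    hypothetical helper fitting one RBM and returning (W, a, b).
    """
    data, params = X, []
    for n_hidden in hidden_sizes:
        W, a, b = train_rbm(data, n_hidden)
        params.append((W, a, b))
        # The hidden activations of this RBM become the next layer's input
        data = 1.0 / (1.0 + np.exp(-(data @ W + b)))
    return params, data   # 'data' is the optimized feature representation
```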
The RBM is the essential component of the DBN. It is an undirected generative probabilistic model composed of two layers of neurons (a visible layer v and a hidden layer h). The visible units of an RBM take values in [0, 1], while the hidden units can only take the values 0 or 1. The connectivity of the network leaves the neurons within a layer statistically independent given the other layer, and the energy function of the RBM over v and h is:

$$E(v, h \mid \theta) = -\sum_{i=1}^{I} a_i v_i - \sum_{j=1}^{J} b_j h_j - \sum_{i=1}^{I}\sum_{j=1}^{J} v_i w_{ij} h_j$$

where I and J are the numbers of visible-layer and hidden-layer neurons, v and h are the visible and hidden units, and θ = {a, b, w} are the parameters of the RBM model.

The conditional probabilities that $h_j = 1$ or $v_i = 1$ are

$$P(h_j = 1 \mid v) = \sigma\Big(b_j + \sum_{i} v_i w_{ij}\Big), \qquad P(v_i = 1 \mid h) = \sigma\Big(a_i + \sum_{j} w_{ij} h_j\Big)$$

where the activation function is $\sigma(x) = \frac{1}{1 + e^{-x}}$.
Fig. 6 is the flow chart of RBM parameter training. The goal of RBM learning is to obtain the parameters of its network model, and gradient descent is used to seek the minimum energy of the network structure.
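A minimal numpy sketch of one contrastive-divergence (CD-1) update, the standard gradient approximation for the RBM learning described above; the learning rate and the choice of CD-1 are assumptions, since the patent states only that gradient descent seeks the minimum energy.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, lr=0.01, rng=None):
    """One CD-1 update of the RBM parameters theta = {a, b, W}.

    Shapes: v0 (batch, I) with values in [0, 1], W (I, J), a (I,), b (J,).
    """
    rng = rng or np.random.default_rng()
    # Up pass: P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)

    # Down pass (reconstruction): P(v_i = 1 | h) = sigmoid(a_i + sum_j w_ij h_j)
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)             # up pass from the reconstruction

    # Parameter update: <v h>_data minus <v h>_reconstruction
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
    a += lr * np.mean(v0 - pv1, axis=0)
    b += lr * np.mean(ph0 - ph1, axis=0)
    return W, a, b
```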
The optimized features output by the deep belief network model serve as the input of the SVM classifier. Fig. 7 is the flow chart of SVM classifier training and classification. The optimized features obtained from the test data through the above steps are fed into the trained SVM classifier for classification, and the classification results are the detection results. The training process is: the training data are first preprocessed and their features extracted; the optimized features output by the deep belief network model are then fed into the SVM classifier for training, and the classification performance is verified with 5-fold cross-validation. The SVM is implemented with LIBSVM, and the chosen kernel function is the RBF (Radial Basis Function) kernel.
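A minimal sketch of this classification stage under the stated setup (RBF kernel, 5-fold cross-validation). scikit-learn's SVC, which wraps the LIBSVM library named in the patent, is used here, and X_opt is a hypothetical name for the DBN-optimized feature matrix.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def train_and_validate(X_opt, y):
    """Train an RBF-kernel SVM on DBN-optimized features and report
    5-fold cross-validation accuracy.

    X_opt: (n_samples, n_features) optimized features (hypothetical name).
    y:     labels, e.g. 0 = control, 1 = AD patient.
    """
    clf = SVC(kernel="rbf")                          # RBF kernel, per the patent
    cv_acc = cross_val_score(clf, X_opt, y, cv=5)    # 5-fold cross-validation
    clf.fit(X_opt, y)                                # final model on all data
    return clf, cv_acc.mean()
```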
Although some embodiments of the present invention have been presented herein, those skilled in the art will appreciate that the embodiments herein may be changed without departing from the spirit of the invention. The above embodiments are merely exemplary and should not be taken as limiting the scope of the present invention.

Claims (8)

1. A method for screening Alzheimer's disease speech signals based on deep learning, characterized by comprising the steps of:
S1: training a deep belief network model for later use;
S2: having the subject perform different spoken-output tasks and collecting the subject's speech;
S3: preprocessing the collected speech;
S4: extracting pathological features related to Alzheimer's disease from the preprocessed speech and feeding them into the trained deep belief network model to obtain optimized features;
S5: feeding the optimized features into a trained SVM classifier for classification, the classification results being the screening results.
2. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 1, characterized in that step S2 specifically comprises: measuring the on-site noise, eliminating noise sources, and performing speech collection once the noise meets requirements; and, during speech collection, having the subject perform different spoken-output tasks and labeling and organizing the speech.
3. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 2, characterized in that step S2 specifically comprises: measuring the on-site noise, eliminating noise sources, and performing speech collection once the noise meets requirements; and, during speech collection, having the subject perform different spoken-output tasks, the spoken-output tasks including self-introduction, a verbal fluency test, picture description, and sustained vowel phonation, and labeling and organizing the speech.
4. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 1, characterized in that step S3 specifically comprises: denoising, parameter normalization, pre-emphasis, windowing, and framing of the collected speech data to obtain a speech frame sequence.
5. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 4, characterized in that step S3 specifically comprises: denoising, parameter normalization, pre-emphasis, windowing, and framing of the collected speech data to obtain a speech frame sequence, wherein the pre-emphasis, windowing, and framing are performed with openSMILE.
6. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 1, characterized in that step S4 specifically comprises: extracting the pathological features of each speech frame in the speech frame sequence and computing their first-order and second-order differences to form new multidimensional pathological features; and feeding the multidimensional pathological features into the trained deep belief network model, which outputs the optimized features.
7. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 6, characterized in that step S4 specifically comprises: extracting the pathological features of each speech frame in the speech frame sequence and computing their first-order and second-order differences to form new multidimensional pathological features, wherein the pathological features include: fundamental frequency, jitter, shimmer, harmonics-to-noise ratio, signal-to-noise ratio, short-time zero-crossing rate, short-time energy, formants, MFCC, LPC, speech pauses, and speech rate; and feeding the multidimensional pathological features into the trained deep belief network model, which outputs the optimized features.
8. The method for screening Alzheimer's disease speech signals based on deep learning according to claim 1, characterized in that step S5 specifically comprises: feeding the optimized features as input into the trained SVM classifier for classification, the classification results being the detection results, wherein the training process of the SVM classifier model is: preprocessing the data in the training set, extracting pathological features, feeding them into the deep belief network model to obtain optimized features, and feeding those optimized features into the SVM classifier for training to obtain the trained SVM classifier model.
CN201811464595.0A 2018-12-03 2018-12-03 Screening method for Alzheimer's disease speech signals based on deep learning Pending CN109584861A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811464595.0A CN109584861A (en) 2018-12-03 2018-12-03 Screening method for Alzheimer's disease speech signals based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811464595.0A CN109584861A (en) 2018-12-03 2018-12-03 Screening method for Alzheimer's disease speech signals based on deep learning

Publications (1)

Publication Number Publication Date
CN109584861A true CN109584861A (en) 2019-04-05

Family

ID=65926673

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811464595.0A Pending CN109584861A (en) 2018-12-03 2018-12-03 Screening method for Alzheimer's disease speech signals based on deep learning

Country Status (1)

Country Link
CN (1) CN109584861A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081229A (en) * 2019-12-23 2020-04-28 科大讯飞股份有限公司 Scoring method based on voice and related device
CN113440107A (en) * 2021-07-06 2021-09-28 浙江大学 Alzheimer's symptom diagnosis device based on voice signal analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106725532A (en) * 2016-12-13 2017-05-31 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
US9763617B2 (en) * 2011-08-02 2017-09-19 Massachusetts Institute Of Technology Phonologically-based biomarkers for major depressive disorder
CN107944360A (en) * 2017-11-13 2018-04-20 中国科学院深圳先进技术研究院 A kind of induced multi-potent stem cell recognition methods, system and electronic equipment
CN108198576A (en) * 2018-02-11 2018-06-22 华南理工大学 A kind of Alzheimer's disease prescreening method based on phonetic feature Non-negative Matrix Factorization
CN108597542A (en) * 2018-03-19 2018-09-28 华南理工大学 A kind of dysarthrosis severity method of estimation based on depth audio frequency characteristics
CN108877917A (en) * 2018-06-14 2018-11-23 杭州电子科技大学 The system and method for network remote monitoring Parkinson's disease severity

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9763617B2 (en) * 2011-08-02 2017-09-19 Massachusetts Institute Of Technology Phonologically-based biomarkers for major depressive disorder
CN106725532A (en) * 2016-12-13 2017-05-31 兰州大学 Depression automatic evaluation system and method based on phonetic feature and machine learning
CN107944360A (en) * 2017-11-13 2018-04-20 中国科学院深圳先进技术研究院 A kind of induced multi-potent stem cell recognition methods, system and electronic equipment
CN108198576A (en) * 2018-02-11 2018-06-22 华南理工大学 A kind of Alzheimer's disease prescreening method based on phonetic feature Non-negative Matrix Factorization
CN108597542A (en) * 2018-03-19 2018-09-28 华南理工大学 A kind of dysarthrosis severity method of estimation based on depth audio frequency characteristics
CN108877917A (en) * 2018-06-14 2018-11-23 杭州电子科技大学 The system and method for network remote monitoring Parkinson's disease severity

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111081229A (en) * 2019-12-23 2020-04-28 科大讯飞股份有限公司 Scoring method based on voice and related device
CN111081229B (en) * 2019-12-23 2022-06-07 科大讯飞股份有限公司 Scoring method based on voice and related device
CN113440107A (en) * 2021-07-06 2021-09-28 浙江大学 Alzheimer's symptom diagnosis device based on voice signal analysis

Similar Documents

Publication Publication Date Title
CN106725532B (en) Depression automatic evaluation system and method based on phonetic feature and machine learning
Godino-Llorente et al. Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors
Orozco et al. Detecting pathologies from infant cry applying scaled conjugate gradient neural networks
Deb et al. Multiscale amplitude feature and significance of enhanced vocal tract information for emotion classification
Wallen et al. A screening test for speech pathology assessment using objective quality measures
CN111951824A (en) Detection method for distinguishing depression based on sound
Chaki Pattern analysis based acoustic signal processing: a survey of the state-of-art
Hasan et al. Emotion recognition from bengali speech using rnn modulation-based categorization
Turan et al. Monitoring Infant's Emotional Cry in Domestic Environments Using the Capsule Network Architecture.
CN113257406A (en) Disaster rescue triage and auxiliary diagnosis method based on intelligent glasses
Warule et al. Significance of voiced and unvoiced speech segments for the detection of common cold
CN109584861A (en) The screening method of Alzheimer's disease voice signal based on deep learning
Sharma et al. Audio texture and age-wise analysis of disordered speech in children having specific language impairment
da Silva et al. Evaluation of a sliding window mechanism as DataAugmentation over emotion detection on speech
Nouhaila et al. An intelligent approach based on the combination of the discrete wavelet transform, delta delta MFCC for Parkinson's disease diagnosis
Wang et al. Continuous speech for improved learning pathological voice disorders
CN112466284B (en) Mask voice identification method
Wang et al. Unsupervised instance discriminative learning for depression detection from speech signals
Esmaili et al. An automatic prolongation detection approach in continuous speech with robustness against speaking rate variations
Wijesinghe et al. Machine learning based automated speech dialog analysis of autistic children
Deepa et al. Speech technology in healthcare
Marck et al. Identification, analysis and characterization of base units of bird vocal communication: The white spectacled bulbul (Pycnonotus xanthopygos) as a case study
Khanum et al. Speech based gender identification using feed forward neural networks
Sheikh et al. Advancing stuttering detection via data augmentation, class-balanced loss and multi-contextual deep learning
Yagnavajjula et al. Detection of neurogenic voice disorders using the fisher vector representation of cepstral features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190405