CN104992707A - Cleft palate voice glottal stop automatic identification algorithm and device - Google Patents


Info

Publication number
CN104992707A
Authority
CN
China
Legal status: Pending (assumed; not a legal conclusion)
Application number
CN201510257555.9A
Other languages
Chinese (zh)
Inventor
何凌
谭洁
尹恒
刘奇
郭春丽
严苗
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Application filed by Sichuan University
Priority to CN201510257555.9A
Publication of CN104992707A
Legal status: Pending


Abstract

The invention discloses an automatic identification algorithm and device for glottal stops in cleft palate speech, relates to the technical field of speech analysis and recognition, and aims to provide an automatic glottal stop identification method and device. A computer is used to automatically identify glottal stops in cleft palate speech, providing patients and speech therapists with effective and objective auxiliary diagnosis, and facilitating the wide adoption of cleft palate speech assessment and speech therapy. According to the key technical points of the invention, the method comprises the steps of: 1, collecting the syllable speech signal to be tested; 2, performing initial/final segmentation on the syllable speech signal and retaining the initial speech signal; 3, extracting feature values of the initial speech signal; and 4, feeding the feature values into a trained recognition model, which judges from the feature values whether a glottal stop is present in the syllable speech signal.

Description

Automatic identification algorithm and device for glottal stops in cleft palate speech
Technical field
The present invention relates to the technical field of speech analysis and recognition, and in particular to an automatic identification algorithm and device for glottal stops in cleft palate speech.
Background technology
Cleft lip and palate are the most common congenital craniofacial anomalies, and China has the largest cleft population in the world. Unlike cleft lip, whose main impact is the defect in facial appearance, cleft palate causes defects and deformities of the palatal bone and soft tissue of varying degree, leading to dysfunctions of speech and language, sucking, and feeding that severely affect the patient's quality of life. Even after primary palatoplasty, a large number of patients still have speech disorders of varying degree. The treatment of cleft palate speech disorders is an important step in the sequential treatment of cleft palate.
At present, the assessment of cleft palate speech is realized through the perceptual judgment of professional speech therapists, a method affected by factors such as the therapist's clinical experience and subjective state.
The clinical manifestations of cleft palate speech mainly comprise resonance disorders and articulation disorders. The main clinical manifestations of resonance disorders are hypernasality, nasal emission, and the like; the main clinical manifestations of articulation disorders are consonant omission, compensation, weakening, and substitution. Among these, compensatory articulation is one of the most common abnormal articulation patterns in cleft palate patients. Its mechanism is that when a cleft palate patient produces a consonant, oral airflow is diverted into the nasal cavity through the incompletely closed velopharyngeal port, producing nasal emission and insufficient intraoral pressure, so the patient learns a compensatory pronunciation that exploits the airflow in the pharyngeal cavity before it reaches the oral cavity. The glottal stop is the most common clinical form of compensatory articulation; it has the greatest impact on speech intelligibility and can occur on all pressure consonants. Perceptually, the patient's voice quality is "hard and short" and indistinct, and in the long term it can cause vocal cord thickening, vocal nodules, and a rough, hoarse voice. Because compensatory articulation is closely related to velopharyngeal function and directly reflects its degree, its accurate evaluation has important clinical significance.
Summary of the invention
The technical problem to be solved by the present invention is, in view of the above problems, to provide an automatic glottal stop identification method and device that uses a computer to automatically identify glottal stops in cleft palate speech, providing patients and speech therapists with effective and objective auxiliary diagnosis and facilitating the wide adoption of cleft palate speech assessment and speech therapy.
The cleft palate speech glottal stop automatic identification algorithm provided by the invention comprises:
Step 1: collecting the syllable speech signal to be tested;
Step 2: performing initial/final segmentation on the syllable speech signal and retaining the initial speech signal;
Step 3: extracting feature values of the initial speech signal;
Step 4: feeding the feature values into a trained recognition model, the recognition model judging from the feature values whether a glottal stop is present in the syllable speech signal.
Step 2 further comprises:
Step 21: windowing and framing the syllable speech signal to obtain speech frames x_i[n], i = 1, 2, ..., M;
Step 22: computing the short-time energy E_i and short-time zero-crossing rate Z_i of each speech frame;
Step 23: computing the energy difference and zero-crossing-rate difference of adjacent frames: e(i) = E_{i+1} - E_i and z(i) = Z_{i+1} - Z_i, i = 1, 2, ..., M-1;
Step 24: comparing each energy difference e(i) with a threshold T1 and each zero-crossing-rate difference z(i) with a threshold T2; when e(i) >= T1 and z(i) <= T2 hold simultaneously, denoting that frame index i by I; the speech frames x_i[n], i = 1, 2, ..., I, are then taken as the initial speech signal of the syllable speech signal.
The initial speech signal feature values extracted in step 3 comprise one or more of the following: the spectral energy concentration feature, the MFCC acoustic feature, the critical-band short-time power spectrum feature, the wavelet transform and information entropy feature, and the wavelet packet transform and information entropy feature; wherein,
Extracting the spectral energy concentration feature of the initial speech signal: the first to fifth spectral energy concentration values of every initial speech frame are computed; the mean of the first value over all initial speech frames is taken as the first spectral energy concentration feature of the initial speech signal, and by analogy the second to fifth spectral energy concentration features of the initial speech signal are computed;
Extracting the MFCC acoustic feature of the initial speech signal: the MFCC acoustic feature of every initial speech frame is computed with 12 MFCC coefficients, giving 12 MFCC values per initial speech frame; the mean of the first MFCC value over all initial speech frames is taken as the first MFCC feature of the initial speech signal, and by analogy the second to twelfth MFCC features of the initial speech signal are computed;
Extracting the critical-band short-time power spectrum feature of the initial speech signal: a short-time Fourier transform is applied to every initial speech frame to obtain its short-time power spectrum; according to the critical-band partition rule, the short-time power spectrum of every initial speech frame is divided into 20 critical bands; the power in the first critical band is summed over all initial speech frames to obtain the first critical-band short-time power spectrum feature of the initial speech signal, and by analogy the second to twentieth critical-band short-time power spectrum features are obtained;
Extracting the wavelet transform and information entropy feature of the initial speech signal: a three-level wavelet transform is applied to every initial speech frame, the signals after the three-level wavelet decomposition are reconstructed to obtain 4 reconstructed signals, and the information entropy of each reconstructed signal is computed; the mean of the information entropy of the first reconstructed signal over all initial speech frames is taken as the first wavelet transform and information entropy feature of the initial speech signal, and by analogy the second to fourth wavelet transform and information entropy features are computed;
Extracting the wavelet packet transform and information entropy feature of the initial speech signal: a three-level wavelet packet transform is applied to every initial speech frame, the signals after the three-level wavelet packet decomposition are reconstructed to obtain 8 reconstructed signals, and the information entropy of each reconstructed signal is computed; the mean of the information entropy of the first reconstructed signal over all initial speech frames is taken as the first wavelet packet transform and information entropy feature of the initial speech signal, and by analogy the second to eighth wavelet packet transform and information entropy features are computed.
Step 4 further comprises:
Choosing a number of syllable speech signals known to contain glottal stops to form a positive training sample set, and choosing a number of syllable speech signals known not to contain glottal stops to form a negative training sample set;
Extracting, for each sample of the two training sample sets, the spectral energy concentration feature, MFCC acoustic feature, critical-band short-time power spectrum feature, wavelet transform and information entropy feature, and wavelet packet transform and information entropy feature;
Obtaining the initial speech signal feature values of the syllable speech signal to be tested from step 3;
Computing the distance between the initial speech signal feature values of this syllable speech signal to be tested and each training sample:

D_1 = \sum_{l=1}^{5} a (x_l - y_l)^2 + \sum_{l=6}^{17} b (x_l - y_l)^2 + \sum_{l=18}^{37} c (x_l - y_l)^2 + \sum_{l=38}^{41} d (x_l - y_l)^2 + \sum_{l=42}^{49} e (x_l - y_l)^2

Choosing the several training samples whose feature-value distance from the syllable speech signal to be tested is shortest; when most of them belong to the positive training sample set, the syllable speech signal to be tested is judged to contain a glottal stop;
wherein: x_l, l = 1 ~ 5, are the first to fifth spectral energy concentration features of the syllable speech signal to be tested;
x_l, l = 6 ~ 17, are the first to twelfth MFCC acoustic features of the syllable speech signal to be tested;
x_l, l = 18 ~ 37, are the first to twentieth critical-band short-time power spectrum features of the syllable speech signal to be tested;
x_l, l = 38 ~ 41, are the first to fourth wavelet transform and information entropy features of the syllable speech signal to be tested;
x_l, l = 42 ~ 49, are the first to eighth wavelet packet transform and information entropy features of the syllable speech signal to be tested;
y_l, l = 1 ~ 5, are the first to fifth spectral energy concentration features of a training sample;
y_l, l = 6 ~ 17, are the first to twelfth MFCC acoustic features of a training sample;
y_l, l = 18 ~ 37, are the first to twentieth critical-band short-time power spectrum features of a training sample;
y_l, l = 38 ~ 41, are the first to fourth wavelet transform and information entropy features of a training sample;
y_l, l = 42 ~ 49, are the first to eighth wavelet packet transform and information entropy features of a training sample;
a, b, c, d, e are weights.
Preferably, the method for obtaining the values of the weights comprises:
Choosing a number of syllable speech signals known to contain glottal stops to form a positive sample space, and choosing a number of syllable speech signals known not to contain glottal stops to form a negative sample space;
Extracting, for each sample of the two sample spaces, the spectral energy concentration feature, MFCC acoustic feature, critical-band short-time power spectrum feature, wavelet transform and information entropy feature, and wavelet packet transform and information entropy feature;
With the spectral energy concentration features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is a;
With the MFCC acoustic features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is b;
With the critical-band short-time power spectrum features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is c;
With the wavelet transform and information entropy features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is d;
With the wavelet packet transform and information entropy features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is e.
In summary, owing to the adoption of the above technical scheme, the beneficial effects of the invention are:
1. The invention realizes automatic computer identification of glottal stops in cleft palate speech.
2. An improved KNN classification model is proposed, with recognition accuracy up to 93.1%.
Brief description of the drawings
Examples of the present invention will be described with reference to the accompanying drawings, in which:
Fig. 1 is the algorithm flow chart of the present invention.
Fig. 2 is the flow chart of critical-band short-time power spectrum feature extraction in the present invention.
Fig. 3 is the flow chart of wavelet/wavelet packet transform and information entropy feature extraction in the present invention.
Fig. 4 is the tree structure diagram of the three-level wavelet transform in the present invention.
Fig. 5 is the flow chart of computing the wavelet transform and information entropy feature for each speech frame in the present invention.
Fig. 6 is the tree structure diagram of the three-level wavelet packet transform in the present invention.
Fig. 7 is the flow chart of computing the wavelet packet transform and information entropy feature for each speech frame in the present invention.
Embodiment
All features disclosed in this specification, or the steps of all methods or processes disclosed, may be combined in any manner, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification, unless specifically stated otherwise, may be replaced by an equivalent or alternative feature serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
As shown in Fig. 1, the input cleft palate speech is first preprocessed by framing and windowing. Because glottal stops occur only in the initial portion of a syllable, the algorithm first performs initial/final segmentation, and the automatic identification algorithm operates only on the speech frames of the initial portion.
Feature extraction is then performed on the speech signal of the initial portion.
In this algorithm, the pattern recognition classifiers adopt the K-nearest-neighbor (KNN: k-Nearest Neighbor) classification algorithm, an improved KNN classification algorithm, and the support vector machine (SVM: Support Vector Machines) classification algorithm to realize automatic identification of the two classes of speech signal, with or without glottal stop.
The automatic recognition systems based on KNN, improved KNN, and SVM each divide into two major parts: model training and testing. In the training stage, speech signals known to contain or not contain glottal stops are preprocessed and their acoustic feature values extracted; these feature values serve as training samples to train the pattern recognition classifier (respectively: the KNN, improved KNN, or SVM classifier) so that it acquires recognition capability. In the testing stage, the same acoustic feature values are extracted from the preprocessed input speech signal under test, and the trained recognition model automatically discriminates between the two classes, with or without glottal stop.
The implementation of each step is elaborated below:
1 Framing and windowing of the speech signal
The production of a speech signal depends on the coordinated action of the vocal organs and yields a quasi-periodic vibration signal. A speech signal is a non-stationary random signal, but it is generally considered to be short-time stationary over a range of about 10 ~ 30 ms.
In cleft palate speech, glottal stops occur only in the initial portion. In this algorithm, the initial and final of a syllable are segmented to obtain the speech signal of the initial portion, and the automatic identification algorithm operates only on the initial speech signal. Most initials are short in duration: under normal circumstances, the duration of unaspirated stops lies in the range 0 ~ 32 ms; the duration of fricatives lies between 90 ms ~ 220.3 ms; the durations of unaspirated affricates, aspirated stops, and aspirated affricates lie between 0 ~ 220.3 ms; and the duration of voiced initials lies between 0 ~ 124 ms. Considering that some initials are very short, the duration of each speech frame is chosen as 10 ms, with a frame shift of 1/2 frame length.
The analysis window adopted in this algorithm is the Hamming window; in the time domain, the speech signal is multiplied by the window function to obtain the framed, windowed signal. Since the sampling frequency of the speech signal is 16000 Hz, each frame is 160 samples long and the frame shift is 80 samples.
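For reference, a minimal framing and windowing sketch in Python (NumPy) under the stated parameters (10 ms frames, 50% frame shift, Hamming window, 16 kHz sampling); function and argument names are illustrative, not from the patent:

```python
import numpy as np

def frame_signal(x, fs=16000, frame_ms=10, hop_ratio=0.5):
    """Split a speech signal into overlapping Hamming-windowed frames:
    10 ms frames (160 samples at 16 kHz) with a 50% (80-sample) shift."""
    frame_len = int(fs * frame_ms / 1000)   # 160 samples
    hop = int(frame_len * hop_ratio)        # 80 samples
    window = np.hamming(frame_len)
    n_frames = (len(x) - frame_len) // hop + 1
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return frames                           # shape: (M, 160)
```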
2 Segmentation of initials and finals
In Mandarin, a Chinese character is pronounced as one syllable. A complete syllable comprises an initial part and a final part. Initials are consonants and, by manner of articulation, divide into stops, affricates, fricatives, nasals, and laterals; Mandarin has 21 initials, most of which are voiceless, only some being voiced. Finals are composed of simple and compound vowels; vowel production involves vocal cord vibration and belongs to voiced sound.
Because the articulation characteristics of initials and finals are distinctly different, the algorithm performs initial/final segmentation at the abrupt-change points of the short-time energy and short-time zero-crossing rate parameters, the abrupt-change point being the initial/final boundary. The algorithm steps are as follows:
(1) Let the input speech signal of one Chinese character be x, with total signal length L. Frame and window this speech signal with a frame length of 10 ms (160 samples) and a frame shift of 5 ms (80 samples), obtaining the speech frames x_i[n], n = 1, 2, ..., 160, i = 1, 2, ..., M, where the number of frames M follows from L, the frame length, and the frame shift (floor, denoting rounding down, is used in computing M).
(2) For every speech frame x_i[n], compute the short-time energy E_i and the short-time zero-crossing rate Z_i:

E_i = \sum_{n=1}^{160} x_i^2[n];

Z_i = \frac{1}{2} \sum_{n=1}^{160} \left| \mathrm{sgn}(x_i[n]) - \mathrm{sgn}(x_i[n-1]) \right|;

where sgn is the sign function, that is:

\mathrm{sgn}(c) = \begin{cases} 1, & c \geq 0 \\ -1, & c < 0 \end{cases}
(3) Compute the energy difference e(i) and zero-crossing-rate difference z(i) of adjacent frames, as follows:

e(i) = E_{i+1} - E_i, i = 1, 2, ..., M-1
z(i) = Z_{i+1} - Z_i, i = 1, 2, ..., M-1

Compare each value of the energy difference e(i) and the zero-crossing-rate difference z(i) with the thresholds T1 and T2. When e(i) >= T1 and z(i) <= T2 hold simultaneously, denote that i by I. Then frames I and I+1 straddle the initial/final boundary of the speech signal, and the first I frames of the speech signal constitute the initial portion of the syllable. The values of T1 and T2 were determined empirically over many experiments: T1 = 0.015, T2 = 8.
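A minimal sketch of this segmentation step, assuming frames produced by the framing sketch above and signal amplitudes normalized so that the empirical thresholds apply:

```python
import numpy as np

def split_initial(frames, T1=0.015, T2=8):
    """Initial/final segmentation: per-frame short-time energy and
    zero-crossing rate, adjacent-frame differences, and the first frame
    index I where e(i) >= T1 and z(i) <= T2 hold simultaneously."""
    E = np.sum(frames ** 2, axis=1)                          # E_i
    sgn = np.where(frames >= 0, 1, -1)
    Z = 0.5 * np.sum(np.abs(np.diff(sgn, axis=1)), axis=1)   # Z_i
    e = np.diff(E)                                           # e(i) = E_{i+1} - E_i
    z = np.diff(Z)                                           # z(i) = Z_{i+1} - Z_i
    hits = np.where((e >= T1) & (z <= T2))[0]
    I = hits[0] + 1 if hits.size else len(frames)            # boundary frame index
    return frames[:I]                                        # initial-portion frames
```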
3 Feature extraction
3.1 Spectral energy concentration acoustic feature F
The articulators of cleft palate patients are normal; cleft palate speech arises mainly in the resonance system. Under the classic source-filter model, the sound source excitation system of cleft palate patients is normal, and the abnormality of phonation occurs at the vocal tract filter and the oral radiation. Formant parameters are the typical acoustic features of the vocal tract filter system, and formants are important parameters characterizing vowels; since it is the initial (consonant) of the syllable that is processed here, the spectral energy concentration is adopted here as the acoustic feature of the initial. The spectral energy concentration parameter is similar in physical meaning to the formant parameter, and its computation method is the same. Linear predictive coding (LPC: Linear Predictive Coding) is adopted here to estimate the first to fifth spectral energy concentrations. Using the initial/final segmentation algorithm of the previous section, the initial speech signal x_i[n], i = 1, 2, ..., I, is obtained. For every initial speech frame x_i[n], the first to fifth spectral energy concentrations F_i = [f_{i,1}, f_{i,2}, f_{i,3}, f_{i,4}, f_{i,5}], i = 1, 2, ..., I, are computed. Averaging each of the first to fifth spectral energy concentrations over all speech frames of the initial portion gives the spectral energy concentration feature of the initial speech signal:

F = [f_1, f_2, f_3, f_4, f_5].
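A sketch of the per-frame LPC estimation, in the style of formant estimation from LPC pole angles; the LPC order of 12 is an assumption, as the patent only states that LPC is used:

```python
import numpy as np
import librosa

def spectral_peaks_lpc(frame, fs=16000, order=12, n_peaks=5):
    """Estimate the first five spectral energy concentrations of one frame
    from the angles of the LPC polynomial roots (formant-like peaks)."""
    a = librosa.lpc(frame.astype(float), order=order)    # LPC coefficients
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]                    # upper half-plane only
    freqs = np.sort(np.angle(roots) * fs / (2 * np.pi))  # pole angles -> Hz
    return freqs[:n_peaks]                               # [f_{i,1} ... f_{i,5}]
```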
3.2 MFCC acoustic feature
Mel-frequency cepstral coefficients (MFCC: Mel-Frequency Cepstral Coefficients) are based on the auditory characteristics of the human ear. The MFCC acoustic feature realizes the separation of the sound source excitation signal from the vocal tract response through homomorphic processing of the speech signal. In this algorithm, the number of MFCC coefficients is chosen as 12.
Using the initial/final segmentation algorithm of the previous section, the initial speech signal x_i[n], i = 1, 2, ..., I, is obtained. For every initial speech frame x_i[n], the MFCC feature M_i = [m_{i,1}, m_{i,2}, ..., m_{i,12}], i = 1, 2, ..., I, is computed. Averaging the MFCC parameters over all speech frames of the initial portion gives the MFCC acoustic feature of the initial speech signal:

M = [m_1, m_2, \dots, m_{12}].
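A sketch using librosa on the initial-portion signal, with the text's 10 ms frames and 5 ms hop; the mel filterbank size (n_mels=26) and other librosa settings are assumptions not given in the patent:

```python
import numpy as np
import librosa

def initial_mfcc(initial_signal, fs=16000):
    """12-coefficient MFCCs per frame, averaged over all initial frames."""
    mfcc = librosa.feature.mfcc(y=initial_signal.astype(float), sr=fs,
                                n_mfcc=12, n_fft=160, hop_length=80,
                                n_mels=26)
    return mfcc.mean(axis=1)    # M = [m_1, ..., m_12]
```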
3.3 Acoustic feature PSCB based on critical bands and the short-time power spectrum
This algorithm proposes an acoustic feature based on critical bands and short-time power (PSCB: Power Spectrum in Critical Bands). Its algorithm flow is shown in Fig. 2:
Using the initial/final segmentation algorithm of the previous section, the initial speech signal x_i[n], i = 1, 2, ..., I, is obtained. A short-time Fourier transform is applied to every initial speech frame x_i[n], with the discrete Fourier transform taken over 8192 points:

X_i[k] = \sum_{n=0}^{N-1} x_i[n] e^{-j \frac{2\pi}{N} kn}

From the short-time Fourier transform, the short-time power spectrum of every initial speech frame is computed:

S_i[k] = X_i[k] \cdot X_i^*[k] = |X_i[k]|^2

The short-time power spectrum of the whole initial speech signal is then the matrix:

S = \begin{bmatrix} S_1[k] \\ S_2[k] \\ \vdots \\ S_I[k] \end{bmatrix}.
Critical bands are divided according to the auditory characteristics of the human ear and are a standard well known in the art. The frequencies and bandwidths of the critical bands are shown in Table 1.
Table 1. Frequencies and bandwidths of the critical bands (Hz)
Band  Low (Hz)  High (Hz)  Bandwidth    Band  Low (Hz)  High (Hz)  Bandwidth
0     0         100        100          11    1480      1720       240
1     100       200        100          12    1720      2000       280
2     200       300        100          13    2000      2320       320
3     300       400        100          14    2320      2700       380
4     400       510        110          15    2700      3150       450
5     510       630        120          16    3150      3700       550
6     630       770        140          17    3700      4400       700
7     770       920        150          18    4400      5300       900
8     920       1080       160          19    5300      6400       1100
9     1080      1270       190          20    6400      7700       1300
10    1270      1480       210
For the short-time power spectrum matrix S of the initial speech signal, the S matrix is partitioned by frequency according to the frequencies and bandwidths of the critical bands, into 20 bands in total. The power sum p_j, j = 1, 2, ..., 20, within each band is computed, finally giving the critical-band short-time power acoustic feature of the initial speech signal: PSCB = (p_1, p_2, ..., p_20).
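A sketch of the PSCB computation; since Table 1 lists 21 rows while the feature uses 20 bands, treating bands 1-20 (100-7700 Hz) as the 20 bands and dropping the 0-100 Hz row is an assumption:

```python
import numpy as np

# Critical-band edges (Hz) from Table 1, bands 1-20
EDGES = [100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
         1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700]

def pscb(frames, fs=16000, nfft=8192):
    """8192-point short-time power spectra of the initial frames, with the
    power summed within each critical band and accumulated over all frames."""
    spec = np.fft.rfft(frames, n=nfft, axis=1)      # X_i[k]
    power = np.abs(spec) ** 2                       # S_i[k] = |X_i[k]|^2
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    p = np.zeros(20)
    for j in range(20):
        band = (freqs >= EDGES[j]) & (freqs < EDGES[j + 1])
        p[j] = power[:, band].sum()                 # p_j over all frames
    return p                                        # PSCB = (p_1, ..., p_20)
```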
3.4 Acoustic features based on the wavelet and wavelet packet transforms and information entropy
In wavelet analysis, signal analysis is a kind of multiresolution analysis realized through filter banks. Each decomposition stage splits its input signal into a low-frequency rough approximation and a high-frequency detail part. Signal reconstruction is the inverse process of decomposition. As the wavelet scale changes, a coarse-to-fine multiscale analysis of the signal can be realized. Based on multiresolution theory, Mallat proposed a fast algorithm for the wavelet transform, called the Mallat algorithm. This algorithm adopts the Mallat algorithm to realize the decomposition and reconstruction of wavelets and wavelet packets.
This algorithm proposes acoustic features based on the wavelet and wavelet packet transforms and information entropy (WTE: Wavelet Transform based Entropy; WPE: Wavelet Packet based Entropy). The algorithm flow is shown in Fig. 3.
WTE: Using the initial/final segmentation algorithm of the previous section, the initial speech signal x_i[n], i = 1, 2, ..., I, is obtained. A 3-level wavelet decomposition is applied to every speech frame (the wavelet decomposition tree is shown in Fig. 4), the leaf nodes of the decomposition are reconstructed, and the information entropy of each reconstructed signal is computed (the process is shown in Fig. 5), with the formulas:

g_1 = -\sum_h c_{3,0}^2(h) \log\{c_{3,0}^2(h)\};
g_2 = -\sum_h c_{3,1}^2(h) \log\{c_{3,1}^2(h)\};
g_3 = -\sum_h c_{2,1}^2(h) \log\{c_{2,1}^2(h)\};
g_4 = -\sum_h c_{1,1}^2(h) \log\{c_{1,1}^2(h)\};

where c_{j,k}(h) denotes the signal reconstructed from node k at level j.
WPE: Using the initial/final segmentation algorithm of the previous section, the initial speech signal x_i[n], i = 1, 2, ..., I, is obtained. A 3-level wavelet packet decomposition is applied to every speech frame (the wavelet packet tree structure is shown in Fig. 6), and the signals after the third-level wavelet packet decomposition are reconstructed. As in the WTE algorithm, the information entropy of each reconstructed signal is computed (the process is shown in Fig. 7), with the formula: e_w = -\sum_r d_{3,w}^2(r) \log\{d_{3,w}^2(r)\}, w = 0, 1, ..., 7.
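A sketch of both features using PyWavelets; the db4 mother wavelet is an assumption (the patent does not name one), and the small epsilon guarding log(0) is likewise an implementation assumption:

```python
import numpy as np
import pywt

def _entropy(sig):
    """Information entropy in the text's form -sum s^2 log(s^2)."""
    e = sig ** 2
    return -np.sum(e * np.log(e + 1e-12))   # epsilon guards log(0)

def wte(frame, wavelet="db4"):
    """WTE: 3-level wavelet decomposition (Mallat algorithm via pywt),
    each leaf node reconstructed separately, entropy of each reconstruction."""
    coeffs = pywt.wavedec(frame, wavelet, level=3)   # [cA3, cD3, cD2, cD1]
    ents = []
    for j in range(len(coeffs)):
        parts = [c if i == j else np.zeros_like(c) for i, c in enumerate(coeffs)]
        rec = pywt.waverec(parts, wavelet)[: len(frame)]
        ents.append(_entropy(rec))
    return ents                                      # [g_1, g_2, g_3, g_4]

def wpe(frame, wavelet="db4"):
    """WPE: 3-level wavelet packet decomposition, each of the 8 third-level
    nodes reconstructed separately, entropy of each reconstruction."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=3)
    ents = []
    for node in wp.get_level(3, order="natural"):
        single = pywt.WaveletPacket(data=None, wavelet=wavelet, maxlevel=3)
        single[node.path] = node.data
        rec = single.reconstruct(update=False)[: len(frame)]
        ents.append(_entropy(rec))
    return ents                                      # [e_0, ..., e_7]
```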
4 Pattern recognition algorithms
4.1 The classic KNN classification algorithm
The KNN algorithm is one of the classic pattern recognition methods. Its basic idea is: for a sample under test, find the K training samples closest to it in feature space, then perform statistics and analysis on those K training samples and find the class with the largest count or the highest similarity; the sample under test is identified as belonging to that class.
In the KNN recognizer used here, the number of nearest neighbors K is 5. Its calculation steps are as follows:
(1) Collect syllable speech signals known to contain glottal stops and syllable speech signals known not to contain glottal stops as training samples, the signals with glottal stops forming one sample set and the signals without glottal stops forming the other, each class denoted C_i (i = 1, 2).
(2) For the sample under test and the training samples, compute the same acoustic feature value: one of the feature values enumerated in Section 3.
(3) Compute the distance between the sample under test and all training samples, with the distance formula:

D = \sum_{l=1}^{N} (x_l - y_l)^2

where x is the feature value of the sample under test, y is the feature value of a training sample, and N is the number of feature values.
(4) Sort the distances from the sample under test to all training samples and take the 5 training samples closest to it; among the classes of these 5 training samples, the class C_i with the largest count is the class of the sample under test.
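A minimal sketch of this classification step, with class labels 1 (glottal stop) and 0 (no glottal stop) as an illustrative encoding:

```python
import numpy as np

def knn_classify(x, train_X, train_y, K=5):
    """Classic KNN: squared-Euclidean distance from the test feature vector
    x to every training sample, then a majority vote over the K nearest."""
    d = np.sum((train_X - x) ** 2, axis=1)   # D = sum_l (x_l - y_l)^2
    nearest = train_y[np.argsort(d)[:K]]
    return int(np.sum(nearest) > K / 2)      # majority class
```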
4.2 The improved KNN classification algorithm
This algorithm improves the KNN algorithm by proposing to weight the feature values by feature group.
(1) For the training samples and the sample under test, compute the same acoustic feature values: F, MFCC, PSCB, WTE, WPE. Concatenate these five acoustic feature values in order into one vector as the feature value. For each initial speech signal, the dimensions of the parameters are: F: 5 dimensions, MFCC: 12 dimensions, PSCB: 20 dimensions, WTE: 4 dimensions, WPE: 8 dimensions.
(2) Compute the distance between the sample under test and all training samples. When computing the distance from the sample under test to each training sample, a different weight is assigned to each acoustic feature: F is assigned weight a, MFCC weight b, PSCB weight c, WTE weight d, and WPE weight e. The distance formula is improved to:

D_1 = \sum_{l=1}^{5} a (x_l - y_l)^2 + \sum_{l=6}^{17} b (x_l - y_l)^2 + \sum_{l=18}^{37} c (x_l - y_l)^2 + \sum_{l=38}^{41} d (x_l - y_l)^2 + \sum_{l=42}^{49} e (x_l - y_l)^2

(3) The weight corresponding to each acoustic feature is preferably the accuracy that the feature alone attains, with a KNN classifier, in discriminating the two classes with or without glottal stop. That is, with the spectral energy concentration features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model is a; with the MFCC acoustic features as the samples, the accuracy is b; with the critical-band short-time power spectrum features as the samples, the accuracy is c; with the wavelet transform and information entropy features as the samples, the accuracy is d; and with the wavelet packet transform and information entropy features as the samples, the accuracy is e.
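A sketch of the improved distance over the 49-dimensional concatenated vector [F, MFCC, PSCB, WTE, WPE]; the slice boundaries follow the index ranges in the formula:

```python
import numpy as np

# Feature-group slices in the 49-dim vector: F, MFCC, PSCB, WTE, WPE
GROUPS = [slice(0, 5), slice(5, 17), slice(17, 37), slice(37, 41), slice(41, 49)]

def weighted_distance(x, y, w):
    """Improved distance D_1: each group's squared differences scaled by its
    weight w = (a, b, c, d, e), the per-group KNN accuracies described above."""
    return sum(wk * np.sum((x[g] - y[g]) ** 2) for wk, g in zip(w, GROUPS))
```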
4.3 SVM pattern recognition algorithm
The support vector machine (Support Vector Machines, SVM) pattern classification algorithm is widely used in speech signal processing. Based on the structural risk minimization principle, SVM constructs an optimal decision hyperplane that maximizes the distance from the hyperplane to the nearest samples of the two classes on either side, thereby providing good generalization ability for classification problems. SVM is very effective for two-class classification problems. Common SVM kernel functions include polynomial functions, radial basis functions, and multilayer perceptrons. The Gaussian kernel is the most commonly used radial basis function and has quite high flexibility; some studies also show that this kernel achieves good results in speech signal processing. The Gaussian kernel is used here to realize the discrimination of the two classes, with or without glottal stop. Its calculation steps are as follows:
(1) For the sample under test and the training samples, compute the same acoustic feature value, e.g. the spectral energy concentration feature F.
(2) Train the SVM model with the spectral energy concentration features of the training samples.
(3) Input the spectral energy concentration feature of the test sample into the trained SVM to obtain the computer's automatic discrimination result.
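A sketch of this step with scikit-learn's SVC and the Gaussian (RBF) kernel the text specifies; the C and gamma hyperparameters are left at sklearn defaults, an assumption not stated in the patent:

```python
from sklearn.svm import SVC

def train_svm(train_X, train_y):
    """Train an RBF-kernel SVM on feature vectors (e.g. F per sample)."""
    clf = SVC(kernel="rbf")
    clf.fit(train_X, train_y)
    return clf

# usage: pred = train_svm(X, y).predict(test_X)  # 1 = glottal stop present
```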
The training samples in this algorithm comprise the cleft palate speech of child cleft palate patients aged 4-11. Recording was carried out in a professional recording studio, and during recording the speakers were asked to keep their most natural and habitual manner of articulation. With the speaker's lips about 5 cm from a Creative HS300 digital microphone, the syllables of the "Mandarin articulation test of the Speech Therapy Center, West China Hospital of Stomatology, Sichuan University" were produced at a rate of about one syllable every 2 s. The cleft palate speech database used here comprises 28 recordings of female child patients and 30 recordings of male child patients. The collected cleft palate speech was judged perceptually and independently by 3 professional speech therapists, who judged for each syllable (Chinese character) whether a glottal stop occurred in the initial portion.
5 Accuracy verification experiments
The present invention uses 10 rounds of k-fold cross validation (k-fold cross validation), with k = 10, to verify the recognition accuracy of each class of model in Section 4. 300 syllable speech signals, some containing glottal stops and some not (judged perceptually by professional speech therapists, who determined for each syllable (Chinese character) whether a glottal stop occurred in the initial portion), are taken as the standard samples. The various feature values of the standard samples are extracted according to the preceding methods.
5.1 Verification of the classic KNN classification algorithm
The 300 standard samples are randomly divided into ten parts; in turn, 9 parts serve as training samples and the remaining part as test samples.
The classic KNN classification algorithm is used to identify whether each test sample contains a glottal stop; the recognition results are compared with the perceptual judgments of the professional speech therapists, the number of correct recognitions in the test part is counted, and the accuracy is computed.
The second part is then taken as the test samples, with the other 9 parts as training samples, and the recognition accuracy is computed; by analogy, each of the remaining eight parts is taken in turn as the test samples, with the other 9 parts as training samples, and the recognition accuracy computed.
After one such traversal, 10 accuracies are obtained, and their mean is computed.
The 300 standard samples are then randomly divided into ten parts again, each part taken in turn as the test samples with the remaining nine parts as training samples, yielding 10 accuracies whose mean is computed. Likewise, 8 more such random divisions and mean-accuracy computations are performed. This finally yields 10 accuracy means; averaging these 10 means gives the accuracy of the recognition model.
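A sketch of this validation scheme with scikit-learn's RepeatedKFold, using the classic KNN (K = 5) as a stand-in for the model under test:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold
from sklearn.neighbors import KNeighborsClassifier

def repeated_cv_accuracy(X, y, n_repeats=10):
    """10 random ten-fold partitions; each fold is tested once and the fold
    accuracies are averaged. The mean of the 10 per-repetition means equals
    the overall mean over all 100 folds, so a single mean suffices here."""
    rkf = RepeatedKFold(n_splits=10, n_repeats=n_repeats)
    accs = []
    for train_idx, test_idx in rkf.split(X):
        clf = KNeighborsClassifier(n_neighbors=5).fit(X[train_idx], y[train_idx])
        accs.append(clf.score(X[test_idx], y[test_idx]))
    return float(np.mean(accs))
```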
5.2 Verification of the improved KNN classification algorithm
Similar to the method of Section 5.1, with the difference that the sample features are replaced with the feature values of the improved KNN classification model and the recognition model is replaced with the improved KNN classification model. The accuracy of this model is computed.
5.3 Verification of the SVM pattern recognition algorithm
Similar to the method of Section 5.1, with the recognition model replaced by the SVM recognition model. The accuracy of this model is computed.
Finally, the recognition accuracies of all classes of recognition model are obtained; see Table 2. The accuracy of the improved KNN classification model is the highest.
Table 2. Automatic recognition accuracy for cleft palate speech with or without glottal stop
The present invention is not limited to the foregoing embodiments. The present invention extends to any new feature disclosed in this specification and any new combination, and to the steps of any new method or process disclosed and any new combination.

Claims (10)

1. A cleft palate speech glottal stop automatic identification algorithm, characterized by comprising:
Step 1: collecting the syllable speech signal to be tested;
Step 2: performing initial/final segmentation on said syllable speech signal and retaining the initial speech signal;
Step 3: extracting feature values of said initial speech signal;
Step 4: feeding said feature values into a trained recognition model, the recognition model judging from said feature values whether a glottal stop is present in said syllable speech signal.
2. The cleft palate speech glottal stop automatic identification algorithm according to claim 1, characterized in that said step 2 further comprises:
Step 21: windowing and framing the syllable speech signal to obtain speech frames x_i[n], i = 1, 2, ..., M, n = 1, 2, ..., N, where N is the frame length;
Step 22: computing the short-time energy E_i and short-time zero-crossing rate Z_i of each speech frame;
Step 23: computing the energy difference and zero-crossing-rate difference of adjacent frames: e(i) = E_{i+1} - E_i and z(i) = Z_{i+1} - Z_i, i = 1, 2, ..., M-1;
Step 24: comparing each energy difference e(i) with a threshold T1 and each zero-crossing-rate difference z(i) with a threshold T2; when e(i) >= T1 and z(i) <= T2 hold simultaneously, denoting that frame index i by I; the speech frames x_i[n], i = 1, 2, ..., I, being the initial speech signal of the syllable speech signal.
3. The cleft palate speech glottal stop automatic identification algorithm according to claim 1, characterized in that the initial speech signal feature values extracted in said step 3 comprise one or more of the following: the spectral energy concentration feature, the MFCC acoustic feature, the critical-band short-time power spectrum feature, the wavelet transform and information entropy feature, and the wavelet packet transform and information entropy feature; wherein,
extracting the spectral energy concentration feature of the initial speech signal: the first to fifth spectral energy concentration values of every initial speech frame are computed; the mean of the first value over all initial speech frames is taken as the first spectral energy concentration feature of the initial speech signal, and by analogy the second to fifth spectral energy concentration features of the initial speech signal are computed;
extracting the MFCC acoustic feature of the initial speech signal: the MFCC acoustic feature of every initial speech frame is computed with 12 MFCC coefficients, giving 12 MFCC values per initial speech frame; the mean of the first MFCC value over all initial speech frames is taken as the first MFCC feature of the initial speech signal, and by analogy the second to twelfth MFCC features of the initial speech signal are computed;
extracting the critical-band short-time power spectrum feature of the initial speech signal: a short-time Fourier transform is applied to every initial speech frame to obtain its short-time power spectrum; according to the critical-band partition rule, the short-time power spectrum of every initial speech frame is divided into 20 critical bands; the power in the first critical band is summed over all initial speech frames to obtain the first critical-band short-time power spectrum feature of the initial speech signal, and by analogy the second to twentieth critical-band short-time power spectrum features are obtained;
extracting the wavelet transform and information entropy feature of the initial speech signal: a three-level wavelet transform is applied to every initial speech frame, the signals after the three-level wavelet decomposition are reconstructed to obtain 4 reconstructed signals, and the information entropy of each reconstructed signal is computed; the mean of the information entropy of the first reconstructed signal over all initial speech frames is taken as the first wavelet transform and information entropy feature of the initial speech signal, and by analogy the second to fourth wavelet transform and information entropy features are computed;
extracting the wavelet packet transform and information entropy feature of the initial speech signal: a three-level wavelet packet transform is applied to every initial speech frame, the signals after the three-level wavelet packet decomposition are reconstructed to obtain 8 reconstructed signals, and the information entropy of each reconstructed signal is computed; the mean of the information entropy of the first reconstructed signal over all initial speech frames is taken as the first wavelet packet transform and information entropy feature of the initial speech signal, and by analogy the second to eighth wavelet packet transform and information entropy features are computed.
4. The cleft palate speech glottal stop automatic identification algorithm according to claim 3, characterized in that step 4 further comprises:
choosing a number of syllable speech signals known to contain glottal stops to form a positive training sample set, and choosing a number of syllable speech signals known not to contain glottal stops to form a negative training sample set;
extracting, for each sample of the two training sample sets, the spectral energy concentration feature, MFCC acoustic feature, critical-band short-time power spectrum feature, wavelet transform and information entropy feature, and wavelet packet transform and information entropy feature;
obtaining the initial speech signal feature values of the syllable speech signal to be tested extracted in step 3;
computing the distance between the initial speech signal feature values of this syllable speech signal to be tested and each training sample:

D_1 = \sum_{l=1}^{5} a (x_l - y_l)^2 + \sum_{l=6}^{17} b (x_l - y_l)^2 + \sum_{l=18}^{37} c (x_l - y_l)^2 + \sum_{l=38}^{41} d (x_l - y_l)^2 + \sum_{l=42}^{49} e (x_l - y_l)^2;

choosing the several training samples whose feature-value distance from the syllable speech signal to be tested is shortest; when most of them belong to the positive training sample set, the syllable speech signal to be tested is judged to contain a glottal stop;
wherein: x_l, l = 1 ~ 5, are the first to fifth spectral energy concentration features of the syllable speech signal to be tested;
x_l, l = 6 ~ 17, are the first to twelfth MFCC acoustic features of the syllable speech signal to be tested;
x_l, l = 18 ~ 37, are the first to twentieth critical-band short-time power spectrum features of the syllable speech signal to be tested;
x_l, l = 38 ~ 41, are the first to fourth wavelet transform and information entropy features of the syllable speech signal to be tested;
x_l, l = 42 ~ 49, are the first to eighth wavelet packet transform and information entropy features of the syllable speech signal to be tested;
y_l, l = 1 ~ 5, are the first to fifth spectral energy concentration features of a training sample;
y_l, l = 6 ~ 17, are the first to twelfth MFCC acoustic features of a training sample;
y_l, l = 18 ~ 37, are the first to twentieth critical-band short-time power spectrum features of a training sample;
y_l, l = 38 ~ 41, are the first to fourth wavelet transform and information entropy features of a training sample;
y_l, l = 42 ~ 49, are the first to eighth wavelet packet transform and information entropy features of a training sample;
a, b, c, d, e are weights.
5. The cleft palate speech glottal stop automatic identification algorithm according to claim 4, characterized in that the method for obtaining the values of said weights comprises:
choosing a number of syllable speech signals known to contain glottal stops to form a positive sample space, and choosing a number of syllable speech signals known not to contain glottal stops to form a negative sample space;
extracting, for each sample of the two sample spaces, the spectral energy concentration feature, MFCC acoustic feature, critical-band short-time power spectrum feature, wavelet transform and information entropy feature, and wavelet packet transform and information entropy feature;
with the spectral energy concentration features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model being a;
with the MFCC acoustic features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model being b;
with the critical-band short-time power spectrum features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model being c;
with the wavelet transform and information entropy features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model being d;
with the wavelet packet transform and information entropy features of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of that KNN model being e.
6. A cleft palate speech glottal stop automatic identification device, characterized by comprising:
a speech collection unit for collecting the syllable speech signal to be tested;
an initial extraction unit for performing initial/final segmentation on said syllable speech signal and retaining the initial speech signal;
an initial feature extraction unit for extracting feature values of said initial speech signal;
a recognition unit for feeding said feature values into a trained recognition model, the recognition model judging from said feature values whether a glottal stop is present in said syllable speech signal.
7. The cleft palate speech glottal stop automatic identification device according to claim 6, characterized in that said initial extraction unit further comprises:
a windowing and framing subunit for windowing and framing the syllable speech signal to obtain speech frames x_i[n], i = 1, 2, ..., M, n = 1, 2, ..., N, where N is the frame length;
a short-time energy computation unit for computing the short-time energy E_i of each speech frame;
a short-time zero-crossing rate computation unit for computing the short-time zero-crossing rate Z_i of each speech frame;
an energy difference computation unit for computing the energy difference of adjacent frames: e(i) = E_{i+1} - E_i, i = 1, 2, ..., M-1;
a zero-crossing-rate difference computation unit for computing the zero-crossing-rate difference of adjacent frames: z(i) = Z_{i+1} - Z_i, i = 1, 2, ..., M-1;
a comparison unit for comparing each energy difference e(i) with a threshold T1 and each zero-crossing-rate difference z(i) with a threshold T2; when e(i) >= T1 and z(i) <= T2 hold simultaneously, denoting that frame index i by I, the speech frames x_i[n], i = 1, 2, ..., I, being the initial speech signal of the syllable speech signal.
8. The cleft palate speech glottal stop automatic identification device according to claim 6, characterized in that the initial feature extraction unit comprises one or more of the following subunits:
a spectral energy concentration feature extraction subunit for computing the first to fifth spectral energy concentration values of every initial speech frame; taking the mean of the first value over all initial speech frames as the first spectral energy concentration feature of the initial speech signal, and by analogy computing the second to fifth spectral energy concentration features of the initial speech signal;
an MFCC acoustic feature extraction subunit for computing the MFCC acoustic feature of every initial speech frame with 12 MFCC coefficients, giving 12 MFCC values per initial speech frame; taking the mean of the first MFCC value over all initial speech frames as the first MFCC feature of the initial speech signal, and by analogy computing the second to twelfth MFCC features of the initial speech signal;
a critical-band short-time power spectrum feature extraction subunit for applying a short-time Fourier transform to every initial speech frame to obtain its short-time power spectrum; dividing the short-time power spectrum of every initial speech frame into 20 critical bands according to the critical-band partition rule; summing the power in the first critical band over all initial speech frames to obtain the first critical-band short-time power spectrum feature of the initial speech signal, and by analogy obtaining the second to twentieth critical-band short-time power spectrum features;
a wavelet transform and information entropy feature extraction subunit for applying a three-level wavelet transform to every initial speech frame, reconstructing the signals after the three-level wavelet decomposition to obtain 4 reconstructed signals, and computing the information entropy of each reconstructed signal; taking the mean of the information entropy of the first reconstructed signal over all initial speech frames as the first wavelet transform and information entropy feature of the initial speech signal, and by analogy computing the second to fourth wavelet transform and information entropy features;
a wavelet packet transform and information entropy feature extraction subunit for applying a three-level wavelet packet transform to every initial speech frame, reconstructing the signals after the three-level wavelet packet decomposition to obtain 8 reconstructed signals, and computing the information entropy of each reconstructed signal; taking the mean of the information entropy of the first reconstructed signal over all initial speech frames as the first wavelet packet transform and information entropy feature of the initial speech signal, and by analogy computing the second to eighth wavelet packet transform and information entropy features.
9. a kind of cleft palate speech glottal stop automatic identification equipment according to claim 8, it is characterized in that, recognition unit comprises further:
Sample space collects unit, for choosing the syllable verbal audio signal some composition true training sample set of known packets containing glottal stop, chooses the known false training sample set of the some compositions of syllable verbal audio signal not comprising glottal stop;
Sample characteristics extraction unit, for extracting the spectrum energy strengthening segment eigenwert of each training sample of two training sample sets, MFCC acoustic feature value, critical bands short-time rating spectroscopic eigenvalue, wavelet transformation and Information Entropy Features value and wavelet package transforms and Information Entropy Features value;
Syllable verbal audio signal characteristic value acquiring unit to be measured, for receiving the initial consonant phonic signal character value of the syllable verbal audio signal to be measured that initial consonant characteristics extraction unit extracts;
Metrics calculation unit, the distance for the initial consonant phonic signal character value and each training sample that calculate this syllable verbal audio signal to be measured:
D 1 = &Sigma; l = 1 5 a ( x l - y 1 ) 2 + &Sigma; l = 6 17 b ( x l - y l ) 2 + &Sigma; l = 18 37 c ( x l - y l ) 2 + &Sigma; l = 38 41 d ( x l - y l ) 2 + &Sigma; l = 42 49 e ( x l - y l ) 2 ;
Choose the shortest some training samples of initial consonant phonic signal character value distance from syllable verbal audio signal to be measured, wherein belong to the training sample of true training sample set maximum time then think in described syllable verbal audio signal to be measured containing glottal stop;
Wherein: x_l, for l = 1 to 5, are the first through fifth spectral energy enhancement segment feature values of the syllable voice signal under test;
x_l, for l = 6 to 17, are the first through twelfth MFCC acoustic feature values of the syllable voice signal under test;
x_l, for l = 18 to 37, are the first through twentieth critical-band short-time power spectrum feature values of the syllable voice signal under test;
x_l, for l = 38 to 41, are the first through fourth wavelet transform and information entropy feature values of the syllable voice signal under test;
x_l, for l = 42 to 49, are the first through eighth wavelet packet transform and information entropy feature values of the syllable voice signal under test;
y_l, for l = 1 to 5, are the first through fifth spectral energy enhancement segment feature values of the training sample;
y_l, for l = 6 to 17, are the first through twelfth MFCC acoustic feature values of the training sample;
y_l, for l = 18 to 37, are the first through twentieth critical-band short-time power spectrum feature values of the training sample;
y_l, for l = 38 to 41, are the first through fourth wavelet transform and information entropy feature values of the training sample;
y_l, for l = 42 to 49, are the first through eighth wavelet packet transform and information entropy feature values of the training sample;
a, b, c, d, e are weights.
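For concreteness, a minimal sketch of the weighted distance and the KNN vote defined in this claim follows. The feature-group index ranges match the claim; k = 5 and the helper names are illustrative assumptions, since the claim only requires "a number of" nearest training samples.

```python
import numpy as np

# 0-based (start, stop) slices of the five feature groups in the
# 49-dimensional vector, matching the index ranges l = 1..5, 6..17,
# 18..37, 38..41 and 42..49 in the claim.
GROUPS = [(0, 5), (5, 17), (17, 37), (37, 41), (41, 49)]

def weighted_distance(x, y, weights):
    """x, y: length-49 feature vectors; weights: (a, b, c, d, e)."""
    return sum(w * np.sum((x[s:e] - y[s:e]) ** 2)
               for w, (s, e) in zip(weights, GROUPS))

def has_glottal_stop(x, samples, labels, weights, k=5):
    """samples: (n, 49) training features; labels[i] is True when sample i
    belongs to the true (glottal stop) training sample set."""
    d = [weighted_distance(x, s, weights) for s in samples]
    nearest = np.argsort(d)[:k]
    # The syllable is judged to contain a glottal stop when most of the
    # k nearest training samples come from the true training sample set.
    return np.asarray(labels)[nearest].sum() > k / 2
```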
10. The cleft palate speech glottal stop automatic identification device according to claim 9, characterized in that the method for obtaining the values of the weights comprises:
Selecting a number of syllable voice signals known to contain a glottal stop to form a true sample space, and selecting a number of syllable voice signals known not to contain a glottal stop to form a false sample space;
Extracting the spectral energy enhancement segment feature values, MFCC acoustic feature values, critical-band short-time power spectrum feature values, wavelet transform and information entropy feature values, and wavelet packet transform and information entropy feature values of each sample of the two sample spaces;
Using the spectral energy enhancement segment feature values of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of the KNN recognition model is taken as a;
Using the MFCC acoustic feature values of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of the KNN recognition model is taken as b;
Using the critical-band short-time power spectrum feature values of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of the KNN recognition model is taken as c;
Using the wavelet transform and information entropy feature values of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of the KNN recognition model is taken as d;
Using the wavelet packet transform and information entropy feature values of the samples of the two sample spaces as the samples of a KNN recognition model, the recognition accuracy of the KNN recognition model is taken as e.
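A hedged sketch of this weight selection procedure: each weight is the recognition accuracy of a plain KNN model run on one feature group alone. scikit-learn's KNeighborsClassifier and the cross-validated accuracy estimate are assumptions; the claim only states that the accuracy obtained with each feature group is taken as the corresponding weight.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

GROUPS = [(0, 5), (5, 17), (17, 37), (37, 41), (41, 49)]  # as above

def estimate_weights(samples, labels, k=5):
    """samples: (n, 49) feature array drawn from the true and false sample
    spaces; labels: 1 for glottal stop, 0 otherwise. Returns (a, ..., e)."""
    weights = []
    for start, stop in GROUPS:
        knn = KNeighborsClassifier(n_neighbors=k)
        # Recognition accuracy of KNN using this feature group only
        # (cross-validation is one way to estimate the claim's
        # "recognition correct rate").
        acc = cross_val_score(knn, samples[:, start:stop], labels,
                              cv=5, scoring='accuracy').mean()
        weights.append(acc)
    return tuple(weights)
```

With weights chosen this way, the feature groups that separate glottal stop samples best dominate the distance D_1 in the classification step.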
CN201510257555.9A 2015-05-19 2015-05-19 Cleft palate voice glottal stop automatic identification algorithm and device Pending CN104992707A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510257555.9A CN104992707A (en) 2015-05-19 2015-05-19 Cleft palate voice glottal stop automatic identification algorithm and device

Publications (1)

Publication Number Publication Date
CN104992707A (en) 2015-10-21

Family

ID=54304510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510257555.9A Pending CN104992707A (en) 2015-05-19 2015-05-19 Cleft palate voice glottal stop automatic identification algorithm and device

Country Status (1)

Country Link
CN (1) CN104992707A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290766A (en) * 2007-04-20 2008-10-22 西北民族大学 Syllable splitting method of Tibetan language of Anduo
CN101825489A (en) * 2010-01-29 2010-09-08 浙江大学 Method for separating OLTC (On-Load Tap Changer) vibration signals of power transformer
CN101829689A (en) * 2010-03-31 2010-09-15 北京科技大学 Drift fault recognition method of hot-rolling strip steel based on sound signals
CN103308919A (en) * 2012-03-12 2013-09-18 中国科学院声学研究所 Fish identification method and system based on wavelet packet multi-scale information entropy
CN102800316A (en) * 2012-08-30 2012-11-28 重庆大学 Optimal codebook design method for voiceprint recognition system based on nerve network
CN102968986A (en) * 2012-11-07 2013-03-13 华南理工大学 Overlapped voice and single voice distinguishing method based on long time characteristics and short time characteristics
CN103400580A (en) * 2013-07-23 2013-11-20 华南理工大学 Method for estimating importance degree of speaker in multiuser session voice
CN104021785A (en) * 2014-05-28 2014-09-03 华南理工大学 Method of extracting speech of most important guest in meeting

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
He Ling, Yuan Yanan, Yin Heng, Zhang Yatong, Zhang Jin, Liu Qi, Li Yang: "Research on automatic recognition of hypernasality grades in cleft palate speech", Journal of Sichuan University (Engineering Science Edition) *
Xiang Biao: "Research on voice prompt technology based on the fusion of ultrasonic and visual information", China Master's Theses Full-text Database, Information Science and Technology *
Tang Nana: "Research on noise-robust speech recognition methods based on robust PLPC", China Master's Theses Full-text Database, Information Science and Technology *
Xia Dongdong: "Research on speech enhancement algorithms in non-stationary environments", China Master's Theses Full-text Database, Information Science and Technology *
Yin Heng, He Ling, Zhang Jin, Li Yang: "Automatic recognition of hypernasality in cleft palate patients based on nonlinear parameters", Computer Engineering and Design *
Zhang Yanyan, Zhang Rong: "A method for initial/final segmentation and initial classification based on time-domain parameters", National Conference on Man-Machine Speech Communication *
Lin Zhimin: "Research on key technologies of echo cancellation and loudness compensation in digital hearing aids", China Master's Theses Full-text Database, Information Science and Technology *
Wang Guomin: "Cleft Lip and Palate Repair and Speech Therapy", 31 January 2013 *
Wang Pan, Shen Jizhong, Shi Jinhe: "P300 feature extraction algorithm based on wavelet transform and time-domain energy entropy", Chinese Journal of Scientific Instrument *
Zhao Li: "Speech Signal Processing", 30 June 2009 *
Chen Pandi: "Automatic recognition algorithm for consonant omission in cleft palate speech based on HMM and LPCC", Information & Computer *
Gu Yaqiang: "Research on key technologies of speaker-independent speech recognition", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105286798B (en) * 2015-11-04 2018-07-20 深圳市福生医疗器械有限公司 Velopharyngeal closure detection device and detection method
CN105286798A (en) * 2015-11-04 2016-02-03 深圳市福生医疗器械有限公司 Velopharyngeal closure detection device and method
CN105679332B (en) * 2016-03-09 2019-06-11 四川大学 A kind of cleft palate speech sound mother automatic segmentation method and system
CN105679332A (en) * 2016-03-09 2016-06-15 四川大学 Cleft palate speech initial and final automatic segmentation method and system
CN107274886A (en) * 2016-04-06 2017-10-20 中兴通讯股份有限公司 A kind of audio recognition method and device
CN107274886B (en) * 2016-04-06 2021-10-15 中兴通讯股份有限公司 Voice recognition method and device
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN108596897A (en) * 2018-04-27 2018-09-28 四川大学 Fully automatic detection method for velopharyngeal closure under nasopharyngoscopy based on image processing
CN108596897B (en) * 2018-04-27 2021-08-20 四川大学 Image processing-based full-automatic detection method for nasopharyngoscope hypopharynx closing degree
CN108596898B (en) * 2018-04-27 2021-08-24 四川大学 Semi-automatic detection method for nasopharyngoscope hypopharynx closing degree based on image processing
CN108596898A (en) * 2018-04-27 2018-09-28 四川大学 Semi-automatic detection method for velopharyngeal closure under nasopharyngoscopy based on image processing
CN111883169A (en) * 2019-12-12 2020-11-03 马上消费金融股份有限公司 Audio file cutting position processing method and device
CN111354375A (en) * 2020-02-25 2020-06-30 咪咕文化科技有限公司 Cry classification method, device, server and readable storage medium

Similar Documents

Publication Publication Date Title
CN104992707A (en) Cleft palate voice glottal stop automatic identification algorithm and device
Sroka et al. Human and machine consonant recognition
CN104050965A (en) English phonetic pronunciation quality evaluation system with emotion recognition function and method thereof
CN105825852A (en) Oral English reading test scoring method
Quintas et al. Automatic Prediction of Speech Intelligibility Based on X-Vectors in the Context of Head and Neck Cancer.
Ryant et al. Highly accurate mandarin tone classification in the absence of pitch information
CN103405217A (en) System and method for multi-dimensional measurement of dysarthria based on real-time articulation modeling technology
CN110942784A (en) Snore classification system based on support vector machine
CN109300339A (en) A kind of exercising method and system of Oral English Practice
CN112397074A (en) Voiceprint recognition method based on MFCC (Mel frequency cepstrum coefficient) and vector element learning
JP2023018658A (en) Difficult airway evaluation method and device based on machine learning voice technology
Nieto et al. Pattern recognition of hypernasality in voice of patients with Cleft and Lip Palate
Cai et al. The best input feature when using convolutional neural network for cough recognition
Neto et al. Feature estimation for vocal fold edema detection using short-term cepstral analysis
CN114550701A (en) Deep neural network-based Chinese electronic larynx voice conversion device and method
Lv et al. Objective evaluation method of broadcasting vocal timbre based on feature selection
Baquirin et al. Artificial neural network (ANN) in a small dataset to determine neutrality in the pronunciation of english as a foreign language in filipino call center agents: Neutrality classification of Filipino call center agent's pronunciation
Nwe et al. Stress classification using subband based features
CN113129923A (en) Multi-dimensional singing playing analysis evaluation method and system in art evaluation
CN106297805A (en) A kind of method for distinguishing speek person based on respiratory characteristic
Sahoo et al. Detection of speech-based physical load using transfer learning approach
Gomathy et al. Gender clustering and classification algorithms in speech processing: a comprehensive performance analysis
Liu et al. Hypernasality detection in cleft palate speech based on natural computation
Koolagudi et al. Spectral features for emotion classification
Gore et al. Disease detection using voice analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20151021