CN109727608A - A kind of ill voice appraisal procedure based on Chinese speech - Google Patents
- Publication number
- CN109727608A (application CN201711010103.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Electrically Operated Instructional Devices (AREA)
Abstract
The present invention provides a pathological voice assessment method based on Chinese speech, comprising the following steps: using a Chinese automatic speech recognition system, automatically align the input continuous speech signal with its corresponding text and complete the phone segmentation; according to the time sequence of the phone segmentation, cut the continuous speech signal into vowels, voiceless consonants and voiced consonants, and then extract features; the extracted features form a feature parameter set, which is screened by a feature selector and then input to a classifier to obtain fine-grained scores for voice problems. The method trains the automatic speech recognition system on a large-scale normal-voice database, classifies the segmented phone sequence according to the articulation characteristics of the different phonemes, designs feature parameters separately for the different phoneme types, and finally obtains both an objective overall score and fine-grained scores for each voice problem, providing an important reference for clinical diagnosis and rehabilitation.
Description
Technical field
The present invention relates to the field of voice detection technology, and in particular to a pathological voice assessment method based on Chinese speech.
Background art
Voice is the carrier of human verbal communication. The phonation system produces sound under the driving of vocal-fold vibration, and the sound propagates outward through the channel formed by the pharynx, larynx and oral cavity. Sound carries different kinds of information and is an essential means of communication between people. The sound produced by the phonation system can be described by a waveform signal, called the voice signal. When the vocal organs are in a normal state, the vibration of the vocal folds has an obviously periodic pattern and the vocal tract formed by the throat and oral cavity also varies regularly, so the resulting voice signal is likewise very regular.
In daily life, voice is used not only to exchange information but also for singing, performing and expressing emotion. A voice disorder makes everyday communication difficult and reduces its efficiency. The main causes of voice disorders include overuse of the throat, improper phonation, and infections of the vocal organs. A voice disorder shows up clearly in the voice signal: for example, the pitch, volume, formants or other features become irregular, or the sound quality and timbre of the voice change abnormally in ways inconsistent with the speaker's age, gender, mother tongue and cultural background.
In the clinical diagnosis and treatment of voice disorders, human perceptual evaluation of the voice signal is a standard and important non-invasive method. Although its evaluation criteria and operating procedures have been developed over many years, human perceptual evaluation cannot satisfy the demand of modern society because sufficiently experienced speech therapists are scarce. Automatic voice assessment based on digital signal processing therefore has important practical value for the clinical diagnosis and treatment of voice disorders; its advantages include non-invasiveness, repeatability, objectivity and reliability. Traditional voice analysis methods typically analyze only sustained long vowels produced at a fixed articulation position, which simplifies the acoustic analysis, but compared with connected speech, long vowels have the following limitations:
(1) Studies have shown that voice signal quality is influenced by linguistic factors such as consonant-vowel transitions; since these factors are absent in a sustained vowel, the voice signal produced during a long vowel cannot represent the voice signal of complex natural-language communication;
(2) Producing a long vowel is subject to many constraints, such as keeping the volume, pitch and articulator position constant, so recording long vowels is less easy and flexible than recording natural continuous speech;
(3) In addition, some studies have shown that speech therapists' perceptual evaluations are less reliable on long vowels than on continuous speech.
At present, research on the objective evaluation of pathological voice mostly focuses on evaluating the overall quality of the voice signal, using regression analysis or classification. In reality, however, pathological voice has diverse clinical manifestations, including hoarseness, strained phonation, glottal dryness, mispronunciation and tremor; evaluating only the overall severity of a voice problem cannot distinguish the specific problems and so offers little help to the clinic. Little research to date can quantify individual fine-grained aspects of a voice, such as roughness or breathiness. We believe that studying these fine-grained characteristics is of key importance for clinical treatment.
Summary of the invention
To solve the problem that none of the existing methods can evaluate pathological voice at a fine-grained level, the present invention provides a pathological voice assessment method based on Chinese speech.
To solve the above-mentioned problems, the technical solution adopted by the present invention is as described below:
A pathological voice assessment method based on Chinese speech comprises the following steps: S1, using a Chinese automatic speech recognition system, automatically align the input continuous speech signal with its corresponding text and complete the phone segmentation; S2, according to the time sequence of the phone segmentation, cut the continuous speech signal into vowels, voiceless consonants and voiced consonants, and then extract features; S3, the extracted features form a feature parameter set, which is screened by a feature selector and then input to a classifier to obtain fine-grained scores for voice problems.
Preferably, the Chinese automatic speech recognition system is a deep neural network-hidden Markov model (DNN-HMM) Chinese speech recognition system trained on a normal-voice database.
Preferably, the Chinese automatic speech recognition system uses speaker adaptation technology.
Preferably, the Chinese automatic speech recognition system uses the Viterbi algorithm to force-align the continuous speech signal with the phone sequence of its corresponding text, completing the phone segmentation.
Preferably, the features in step S2 include a first class of features: fundamental frequency jitter, amplitude shimmer, harmonics-to-noise ratio, pitch, energy intensity; a second class: voice breaks, unvoiced-signal fraction; a third class: Mel-frequency cepstral coefficients; and a fourth class: high/low-frequency energy ratio, pitch variation, HNR variation, continuous energy difference, continuous-energy-difference variation, normalized energy.
Preferably, the parameters of the first, second and fourth feature classes are extracted from the vowels; the parameters of the second, third and fourth classes from the voiced consonants; and the parameters of the third and fourth classes from the voiceless consonants.
Preferably, in the fourth class of features: the high/low-frequency energy ratio HiLo is computed from Ehi, the energy of the band above 1200 Hz, Elo, the energy of the band below 1200 Hz, and E0, the background-noise energy; the pitch variation var_F0 is the variance of the per-frame fundamental frequency F0 over all n frames of a segment of continuous speech, where var is the var function of MATLAB; the HNR variation var_HNR is the variance of the per-frame harmonics-to-noise ratio HNR over the n frames; the continuous energy difference D3_Energy is computed from E(t), the time-domain energy of frame t, and describes the energy variation over the 3 frames before and after frame t; the continuous-energy-difference variation var_D3 is the variance of D3_Energy over the n frames of the segment; and the normalized energy E_nor is computed from Evoice, the energy of the speech signal, and Enoise, the noise-floor energy of the silent portions.
Preferably, each fine-grained score in step S3 addresses one voice problem to be assessed, and the voice problems include the following: Q1: the overall perceived severity of deviance in a passage of voice; Q2: the voice sounds abnormal, irregular or unclear; Q3: audible breathiness or air leakage in the voice; Q4: phonation sounds excessively effortful; Q5: sudden uncontrolled pauses during phonation; Q6: sudden uncontrolled changes of fundamental frequency or pitch; Q7: rapid successive crackles during phonation; Q8: regular rhythmic changes of pitch or loudness; Q9: an obvious fundamental frequency or pitch inconsistent with the speaker's gender, age and educational background; Q10: a speaking volume obviously too loud or too soft for the speech content.
Preferably, the fine-grained score gives the severity of its voice problem, the severity being: normal, slight, moderate or serious.
The present invention also provides a pathological voice assessment system based on Chinese speech, comprising a voice input module, a voice assessment module using any of the methods above, and an assessment result output module. The voice input module receives real-time recordings or audio and transmits them to the voice assessment module; the voice assessment module assesses the real-time recording or audio and transmits the assessment result to the result output module; the result output module outputs the assessment result.
The beneficial effects of the invention are as follows: a pathological voice assessment method based on Chinese speech is provided which trains on a large-scale normal-voice database, classifies the segmented phone sequence according to the articulation characteristics of the different phonemes, designs different feature parameters for the different phone types, and finally obtains fine-grained scores for the voice problems as well as an objective overall score, providing an important reference for clinical diagnosis and rehabilitation.
Description of the drawings
Fig. 1 is a flow chart of the steps of the pathological voice assessment method based on Chinese speech in Embodiment 1 of the present invention.
Fig. 2 is a schematic demonstration of the pathological voice assessment method based on Chinese speech in Embodiment 1.
Fig. 3 shows the deep neural network-hidden Markov model Chinese speech recognition system in Embodiment 1.
Fig. 4 is a schematic diagram of the score-classifier method in Embodiment 1.
Fig. 5 is a histogram of the HiLo values of vowels in voices without roughness in Embodiment 1.
Fig. 6 is a histogram of the HiLo values of vowels in voices with serious roughness in Embodiment 1.
Fig. 7 is a histogram of the HiLo values of vowels in voices without breathiness in Embodiment 1.
Fig. 8 is a histogram of the HiLo values of vowels in voices with serious breathiness in Embodiment 1.
Fig. 9 is a histogram of the E_nor values of vowels in voices with normal volume in Embodiment 1.
Fig. 10 is a histogram of the E_nor values of vowels in voices with moderately or seriously low volume in Embodiment 1.
Fig. 11 is a schematic diagram of the pathological voice assessment system based on Chinese speech in Embodiment 3 of the present invention.
Specific embodiment
The present invention is described in detail below through specific embodiments with reference to the accompanying drawings, for a better understanding of the invention; the following embodiments, however, do not limit the scope of the invention. In addition, it should be noted that the figures provided with the following embodiments only illustrate the basic conception of the invention in a schematic way: they show only the components related to the invention rather than the actual number, shape and size of the components in implementation; in an actual implementation the shape, quantity and proportion of each component may vary arbitrarily, and the component layout may be more complex.
Embodiment 1
In the fields of voice signal processing and speech therapy, the objective analysis of pathological voice has attracted wide attention. For example, MDVP is a voice-signal analysis software system commonly used by speech therapists; it provides 22 acoustic signal parameters to describe voice quality, which therapists use as a basis for voice assessment. LingWAVES is another commercial software system that assists doctors in making clinical diagnoses of voice problems; it can perform basic acoustic analyses such as spectrography, pitch analysis and volume analysis. In recent years many researchers have performed acoustic analysis with the free software Praat, but Praat is not a tool designed specifically for pathological voice analysis and cannot judge the type or severity of a disorder.
From the signal-processing point of view, related research is mainly concerned with extracting effective feature parameters to describe the irregular characteristics of the pathological voice signal. The most common parameters are fundamental frequency jitter, amplitude shimmer and the harmonics-to-noise ratio (HNR). Jitter, shimmer and related parameters describe the time-domain variation of the voice signal in pitch and volume; because pathological voice is unstable, its jitter and shimmer values are usually larger. HNR calculates the energy ratio between the periodic component and the additive-noise component of the voice signal, and is usually lower for pathological voice. Other related acoustic parameters include the pitch period entropy (PPE), glottal quotient (GQ), voice turbulence index (VTI) and soft phonation index (SPI). On the other hand, Mel-frequency cepstral coefficients (MFCC), common features in speech recognition and speaker identification, have also been tried for the detection and classification of pathological voice, with artificial neural networks (ANN) and Gaussian mixture models (GMM) used as classifiers. As mentioned above, the analyzed voice signals have mainly been sustained long vowels, but with the development of machine learning in recent years, research on pathological voice signals has also begun to turn to continuous speech. These studies use connected-speech signals, but their acoustic analysis does not carefully distinguish the different linguistic content within the signal. Given a segment of continuous speech, the above feature parameters are usually extracted frame by frame first; the frame-level feature vectors are then analyzed statistically, for example by taking the mean, standard deviation or median of a feature over all frames, and the resulting statistic is regarded as a feature of the whole continuous speech signal and used as the input of a subsequent regression or classification problem.
Summarizing the existing research work, we believe it has the following shortcomings:
Common feature parameters such as jitter and shimmer mainly describe a relatively stable phonation state. They can be used to distinguish normal voice from pathological voice, but cannot further detect the type and degree of the disorder at a fine-grained level. Parameters like jitter and shimmer are better suited to extraction from stable long vowels, whereas natural continuous speech switches rapidly between phonemes and its acoustic features change constantly, for example the fundamental frequency describing vocal-fold vibration and the cepstral coefficients describing vocal-tract changes. These variations in continuous speech not only invalidate some traditional feature parameters (such as jitter and shimmer); statistics over all the frame-level features of continuous speech also cannot eliminate the influence of the varying linguistic content.
Most of this work uses statistical models, such as Gaussian mixture models, for detection and classification. When the amount of training data is sufficient and the training data match the actual test data, statistical methods can effectively and accurately describe the feature distributions of the different voice classes. But pathological voice has many causes and types, its acoustic feature distribution is irregular, and the training data of many classes are insufficient, so these statistical methods cannot obtain effective results.
Little research to date can quantify individual fine-grained aspects of a voice, such as roughness or breathiness. We believe that studying these fine-grained characteristics is of key importance for clinical treatment.
In view of the deficiencies of the prior art, this auxiliary voice assessment tool cuts the voice signal according to its linguistic content and designs or finds suitable feature parameters for the different classes of speech signal. Besides an overall evaluation of voice problems, the design also trains different evaluation models for different voice problems, such as roughness, glottal fry and strain, so as to score natural continuous speech on multiple voice-problem dimensions, both overall and fine-grained, providing an important reference for the diagnosis and treatment of voice problems.
As shown in Fig. 1, a pathological voice assessment method based on Chinese speech comprises steps 1-3, detailed as follows:
1. Use the Chinese automatic speech recognition system to automatically align the input continuous speech signal with its corresponding text and complete the phone segmentation.
As shown in Fig. 2, the Chinese automatic speech recognition system aligns the input continuous speech with its corresponding text in order to segment the continuous speech into phones. Although pathological voice must be recognized in actual use, no large-scale database of pathological continuous speech exists.
As shown in Fig. 3, the method therefore trains a deep neural network-hidden Markov model (DNN-HMM) Chinese speech recognition system on a large-scale normal-voice database. In Fig. 3, 1 is the acoustic feature vector Ot of the continuous speech signal and serves as the input of 2, the deep neural network (DNN); 3 is the posterior probability vector P(aj|Ot) output by the DNN, where aj denotes an HMM state of a phoneme; 4 is the HMM chain. To improve the recognition accuracy of the Chinese speech recognition system on pathological voice, speaker adaptive training (SAT) is used when training the acoustic model.
In clinical voice assessment and diagnosis, the speech therapist instructs the patient to produce different types of utterances, generally including sustained long vowels, reading a short passage, and an open-ended interview. Because the content of the read passage is known, for this part of the continuous speech the text can be used as input to the speech recognition system. Inside the recognizer, the Viterbi algorithm force-aligns the phone sequence of the text with the speech signal to complete the phone segmentation.
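Assuming the forced aligner emits (phone, start time, end time) triples, the subsequent phone-level cutting of the waveform can be sketched as follows; the data format and values are illustrative, not the recognizer's actual output format:

```python
def slice_by_alignment(samples, sample_rate, alignment):
    """Cut a waveform into per-phone segments using forced-alignment times.

    `alignment` is a list of (phone, start_sec, end_sec) tuples such as a
    Viterbi forced aligner might emit.
    """
    segments = []
    for phone, start, end in alignment:
        i, j = round(start * sample_rate), round(end * sample_rate)
        segments.append((phone, samples[i:j]))
    return segments

# 1-second toy signal at a 100 Hz "sample rate" with a 3-phone alignment.
samples = list(range(100))
alignment = [("sil", 0.0, 0.2), ("a", 0.2, 0.7), ("sh", 0.7, 1.0)]
segments = slice_by_alignment(samples, 100, alignment)
print([(p, len(s)) for p, s in segments])   # [('sil', 20), ('a', 50), ('sh', 30)]
```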
2. According to the time sequence of the phone segmentation, cut the continuous speech signal into vowels, voiceless consonants and voiced consonants, and then extract features.
Features of 4 sub-classes are extracted. First class: fundamental frequency jitter, amplitude shimmer, harmonics-to-noise ratio (HNR), pitch, energy intensity. Second class: voice breaks, unvoiced fraction. Third class: Mel-frequency cepstral coefficients (MFCC). Fourth class, self-designed parameters: high/low-frequency energy ratio HiLo; pitch variation var_F0; HNR variation var_HNR; continuous energy difference D3_Energy; its variation var_D3; and normalized energy E_nor. According to the phone segmentation time sequence, the continuous speech signal is cut into vowels, voiceless consonants and voiced consonants, and different feature parameters are extracted for these phone classes: the first, second and fourth classes from the vowels; the second, third and fourth classes from the voiced consonants; and the third and fourth classes from the voiceless consonants. The self-designed fourth-class parameters proposed by this method are described below:
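The per-phone-class feature routing described above can be expressed as a small lookup. The phone inventory below is a hypothetical simplification of a real Mandarin phone set, and the feature classes are numbered 1-4 as in the text:

```python
# Hypothetical, simplified phone inventory for illustration only;
# a real system would use the recognizer's Mandarin phone set.
VOWELS = {"a", "o", "e", "i", "u", "v"}
VOICELESS_CONSONANTS = {"p", "t", "k", "f", "s", "sh", "x", "h", "c", "ch", "q"}

def phone_class(phone):
    if phone in VOWELS:
        return "vowel"
    if phone in VOICELESS_CONSONANTS:
        return "voiceless_consonant"
    return "voiced_consonant"

# Feature classes (1-4) extracted per phone type, as listed in the text.
FEATURES_BY_CLASS = {
    "vowel": {1, 2, 4},
    "voiced_consonant": {2, 3, 4},
    "voiceless_consonant": {3, 4},
}

print(FEATURES_BY_CLASS[phone_class("a")])    # {1, 2, 4}
print(FEATURES_BY_CLASS[phone_class("sh")])   # {3, 4}
```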
(1) HiLo, the high/low-frequency energy ratio, where Ehi is the energy of the high band (above 6000 Hz), Elo is the energy of the low band (below 6000 Hz), and E0 is the background-noise energy. Experimental data show that for voice signals with roughness and breathiness, the HiLo value is higher.
(2) var_F0, the pitch variation, where F0 is the fundamental frequency of one frame, n is the total number of frames of a segment of continuous speech, and var is the var function of MATLAB.
(3) var_HNR, the HNR variation, where HNR is the harmonics-to-noise ratio and n is the total number of frames of a segment of continuous speech. For some pathological voice signals, not only is HNR lower, its variation is also larger, which is reflected in var_HNR.
(4) D3_Energy, the continuous energy difference, where E(t) is the time-domain energy of frame t. D3_Energy describes the variation of the speech-signal energy over the 3 frames before and after frame t.
(5) var_D3, the variation of D3_Energy, where n is the total number of frames of a segment of continuous speech.
(6) E_nor, the normalized energy, where Evoice is the energy of the speech signal and Enoise is the noise-floor energy of the silent parts. E_nor measures the speech-signal energy relative to the ambient noise and helps judge whether the speaking volume is too loud or too soft (Loudness High/Low).
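A minimal sketch of some of these self-designed parameters, under stated assumptions: var_F0 and var_HNR are taken as the sample variance over frames (mirroring MATLAB's var), D3_Energy is assumed to compare the summed energy of the 3 frames after frame t with the 3 frames before it (the source does not reproduce the exact formula), and E_nor is assumed to be a simple energy ratio:

```python
from statistics import variance

def var_of(values):
    """MATLAB-style var: sample variance over the n frames of a segment."""
    return variance(values)

def d3_energy(E, t):
    """Assumed form of the continuous energy difference at frame t:
    summed energy of the 3 frames after t minus the 3 frames before it."""
    return abs(sum(E[t + 1:t + 4]) - sum(E[t - 3:t]))

def e_nor(e_voice, e_noise):
    """Assumed form of the normalized energy: speech energy over the
    noise-floor energy of the silent parts."""
    return e_voice / e_noise

f0 = [200.0, 210.0, 190.0, 205.0]               # per-frame F0 of a toy segment
hnr = [12.0, 11.5, 12.5, 12.0]                  # per-frame harmonics-to-noise ratio
E = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]    # per-frame time-domain energy

print(var_of(f0))     # var_F0
print(var_of(hnr))    # var_HNR
d3 = [d3_energy(E, t) for t in range(3, len(E) - 3)]
print(var_of(d3))     # var_D3
print(e_nor(3.0, 0.5))  # E_nor
```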
3. The extracted features form a feature parameter set, which is screened by a feature selector and then input to a classifier to obtain the fine-grained scores for voice problems.
As shown in Fig. 2, there are 10 feature selectors, one for each of the 10 classes of voice problems to be assessed; see Table 1.
Table 1 List of voice problems
Correspondingly, there are 10 classifiers, each trained for one of the 10 classes of voice problems to be assessed. Each pair of feature selector and classifier constitutes one scoring model, obtained by training for one specific problem among the 10. The large feature parameter set output by the feature extraction module first passes through the feature selector; the screened parameters are input to the classifier, which finally produces the score for the corresponding voice problem. For feature selection, besides combining common algorithms such as LASSO (least absolute shrinkage and selection operator), mRMR (minimum redundancy maximum relevance) and LLBFS (local learning-based feature selection), the designer's experience in signal analysis of the various voice problems can also be consulted. Experiments show that specific voice problems manifest in specific acoustic parameters: for example, hoarseness increases the HiLo value of the voice signal, and tremor gives larger var_F0 and var_HNR. With reference to clinical speech therapists' evaluations of voice problems, the design uses 4-class classifiers whose output indicates that the voice is normal or has a slight, moderate or serious voice problem. Since this is a multi-class classification problem, the method chooses a multilayer perceptron (MLP, Multi-layer Perceptron) as the classifier. The voice data used to train the classifiers carry professional speech therapists' subjective scores for every voice problem, and these subjective scores serve as the reference standard for training the MLP.
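The selector-plus-classifier pairing can be illustrated with a deliberately simple selector: ranking features by their absolute correlation with the therapists' subjective score. This is only a stand-in for the LASSO/mRMR/LLBFS methods named above, and the data are toy values:

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def select_features(X, y, names, k):
    """Keep the k features most correlated (in absolute value) with the
    subjective score y -- a simple stand-in for LASSO/mRMR/LLBFS."""
    scores = []
    for j, name in enumerate(names):
        col = [row[j] for row in X]
        scores.append((abs(pearson(col, y)), name))
    return [name for _, name in sorted(scores, reverse=True)[:k]]

# Toy data: HiLo tracks the severity score, var_F0 here is just noise.
X = [[0.1, 5.0], [0.4, 4.0], [0.7, 6.0], [0.9, 5.0]]
y = [0, 1, 2, 3]
print(select_features(X, y, ["HiLo", "var_F0"], k=1))   # ['HiLo']
```

The retained parameters would then be fed to the MLP classifier for that voice problem.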
As shown in Fig. 4, the principle of each scoring classifier is illustrated with roughness as the example: for each input feature parameter vector, the MLP outputs a 4-dimensional one-hot vector indicating the severity of the roughness; see Table 2. The other 9 voice problems are assessed in the same way.
Table 2 Severity of a voice problem
0001 | Normal (no roughness problem) |
0010 | Slight (slight roughness) |
0100 | Moderate (moderate roughness) |
1000 | Serious (serious roughness) |
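Decoding the 4-dimensional one-hot output into a severity label, per Table 2, can be sketched as follows (labels shown for the roughness problem; the mapping is the same for the other problems):

```python
# One-hot codes from Table 2, highest-order bit = most severe.
SEVERITY = {
    (0, 0, 0, 1): "normal",
    (0, 0, 1, 0): "slight",
    (0, 1, 0, 0): "moderate",
    (1, 0, 0, 0): "serious",
}

def decode(one_hot):
    """Map the MLP's 4-dim one-hot output to a severity label."""
    return SEVERITY[tuple(one_hot)]

print(decode([0, 1, 0, 0]))   # moderate
```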
The effect of the self-designed parameters in experiments is illustrated below:
(1) About the HiLo value
Fig. 5 shows the histogram of the HiLo values of vowels in voices without roughness, and Fig. 6 the histogram for voices with serious roughness; in both, the abscissa is the HiLo value and the ordinate is the sample count. Likewise, Fig. 7 shows the histogram of the HiLo values of vowels in voices without breathiness, and Fig. 8 the histogram for voices with serious breathiness, with the same axes. Comparing Fig. 5 with Fig. 6 (no roughness versus serious roughness) and Fig. 7 with Fig. 8 (no breathiness versus serious breathiness) shows that the HiLo values of the pathological voices are higher.
(2) About the var_F0 value: experimental data show that voices with serious voice problems have higher var_F0 values; Tables 3 and 4 compare the var_F0 distributions of voices without tremor and with serious tremor.
Table 3 var_F0 values of vowels in voices without tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_F0 (×10⁻³) | 0.91 | 31.11 | 22.32 | 33.34 | 0.16 | 6.93 | 38.89 | 34.83 | 12.34 |
Table 4 var_F0 values of vowels in voices with serious tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_F0 (×10⁻³) | 91.47 | 150.47 | 241.33 | 49.23 | 20.37 | 39.53 | 69.50 | 119.30 | 63.11 |
(3) About var_HNR: the var_HNR value of some pathological voice signals is larger; Tables 5 and 6 compare the var_HNR distributions of voice signals without tremor and with serious tremor.
Table 5 var_HNR values of vowels in voices without tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_HNR (×10⁻³) | 4.07 | 6.07 | 7.47 | 3.67 | 2.00 | 4.38 | 8.16 | 5.65 | 6.73 |
Table 6 var_HNR values of vowels in voices with serious tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_HNR (×10⁻³) | 198.30 | 215.85 | 718.98 | 140.82 | 90.25 | 116.21 | 134.84 | 149.75 | 215.21 |
(4) About var_D3: pathological voice generally shows larger energy variation. D3_Energy is used to compute var_D3, which measures the energy stability of a segment of voice signal; the var_D3 value of pathological voice is usually higher, as shown by Tables 7 and 8, which compare the var_D3 distributions of voice signals without tremor and with serious tremor.
Table 7 var_D3 values of vowels in voices without tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_D3 (×10⁻⁵) | 0.06 | 0.004 | 0.003 | 0.004 | 0.006 | 0.003 | 0.005 | 0.007 | 0.003 |
Table 8 var_D3 values of vowels in voices with serious tremor
Voice serial number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
var_D3 (×10⁻⁵) | 15 | 723 | 2218 | 5730 | 1217 | 825 | 62 | 840 | 306 |
(5) About E_nor: comparing Fig. 9 with Fig. 10 (normal volume versus low volume) shows that the E_nor value of low-volume voice is relatively low.
Fig. 9 shows the histogram of the E_nor values of vowels in voices with normal volume, and Fig. 10 the histogram for voices with moderately or seriously low volume; in both, the abscissa is the E_nor value and the ordinate is the sample count.
Embodiment 2
A specific embodiment of pathological voice assessment using the method of the present invention is described below:
(1) Establish a pathological voice database. The database was built jointly by hospital speech therapists, researchers and volunteer patients. The patient cohort comprises 230 native Chinese speakers, balanced in gender and age. In a relatively controlled environment, each patient's recording includes sustained vowels, a read-aloud short passage and question answering. The recordings are two-channel, 16-bit wav files with a 44.1 kHz sampling rate. After simple pre-processing (deleting the speech therapist's speech using speaker diarisation, fixed-gain amplification, and noise reduction), the collected voice signals were given to 48 professional speech therapists in total for subjective scoring, each patient's recording receiving subjective scores on the 10 voice problems. To improve the reliability of the subjective scores, the scores of all therapists were finally analyzed statistically: implausibly high or low scores were removed, and the median, variance and so on were computed, finally yielding each patient's subjective scores on the 10 aspects of his or her voice.
(2) Chinese speech recognition training: a Chinese automatic speech recognition system is trained on a large-scale speech database of normal voice.
(3) The read-passage recording of each patient in the pathological voice database, together with the corresponding text, is input to the Chinese automatic speech recognition system to obtain phoneme time-sequence alignment data.
(4) According to the phoneme time-sequence alignment data of step (3), characteristic parameters are extracted separately for the vowels, voiceless consonants, and voiced consonants in the read-passage speech signal.
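Step (4) can be sketched as a dispatch over the three phoneme classes, each receiving its own parameter groups (per the grouping of claim 6). The group names and the `extract_features` interface are illustrative assumptions, not the patent's literal code.

```python
# Hypothetical feature-group dispatch: each phoneme class gets its
# own parameter groups, mirroring the vowel/voiced/voiceless split.
FEATURE_GROUPS = {
    "vowel":               ["jitter_group", "interruption_group", "energy_group"],
    "voiced_consonant":    ["interruption_group", "mfcc_group", "energy_group"],
    "voiceless_consonant": ["mfcc_group", "energy_group"],
}

def extract_features(segments, extractors):
    """segments: list of (phoneme_class, frames) pairs from the
    phoneme segmentation; extractors maps a group name to a callable
    returning a list of parameter values for those frames."""
    feats = []
    for phone_class, frames in segments:
        vec = []
        for group in FEATURE_GROUPS[phone_class]:
            vec.extend(extractors[group](frames))
        feats.append(vec)
    return feats
```

The point of the dispatch is that a vowel segment never receives consonant-specific parameters and vice versa, which is the distinction the method draws between phoneme types.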
(5) The parameters extracted in step (4) are fed to 10 automatic scoring modules (10 feature selectors + 10 four-class classifiers), one module per voice problem. In the scoring-model training stage, the subjective scores obtained in step (1) serve as target values, and a least-mean-square (LMS) algorithm optimizes each scoring model.
(6) Once the automatic scoring models are optimized, the method is deployed on a computer system for clinical trials or scientific research. Besides the computing unit required by the models (an NVIDIA GPU), the test system needs a front-end recording device (e.g. a Shure omnidirectional microphone with a TASCAM 2x2 audio interface), an audio storage unit (computer memory), and a result display unit (LCD screen). At the start of a test, the subject, with the mouth a fixed distance (30 cm) from the microphone, first produces the sustained vowels /a/, /i/, /u/, then reads a short passage aloud and answers two questions. The recorded signal is first pre-processed; the read-passage portion then enters the computing system for phoneme segmentation, feature extraction, and automatic scoring, and the display unit finally outputs the assessment results for all 10 voice problems.
Embodiment 3
As shown in Fig. 5, the present invention also provides a pathological voice assessment system based on Chinese speech, comprising a voice input module, a voice assessment module using the method of Embodiment 1, and an assessment-result output module.
The voice input module receives a real-time recording or audio and transmits it to the voice assessment module; the voice assessment module assesses the real-time recording or audio and transmits the assessment result to the assessment-result output module; the result output module outputs the assessment result.
The method and system provided by the present invention have the following beneficial effects:
(1) phoneme segmentation of the input pathological continuous speech signal is based on automatic speech recognition technology, and the segmented voice sequence is classified by the articulatory character of the phonemes, e.g. vowel vs. consonant, voiced vs. voiceless;
(2) different characteristic parameters are designed for different voice sequences: the traditional jitter and shimmer parameters are used on vowels to reflect the regularity of vocal-fold vibration, while MFCC and HiLo parameters are used on voiceless sounds to reflect changes of the vocal-tract filter and the distribution of energy across frequency bands;
(3) a separate MLP-based scoring model is trained for each voice problem, realizing objective scoring of the pathological voice as a whole and of 10 specific voice problems such as rough voice, providing an important reference for clinical diagnosis and rehabilitation.
This method/system can perform automatic voice assessment on continuous speech of fixed text content. If a large-scale database of pathological continuous speech is built in the future, the method/system can be extended to natural continuous speech of arbitrary content. The prior art does not consider the variation of linguistic content within continuous speech and extracts the same characteristic parameters from all voice sequences; this method/system processes voice sequences of different content types separately, enabling a finer analysis of how a voice problem affects the production of different phonetic content and thereby offering guidance and suggestions for rehabilitation. Unlike the prior art's single evaluation of voice, this method/system assesses the voice along multiple dimensions, realizing judgment of both the type and the severity of the voice problem.
Where manpower permits, the phoneme segmentation module of this method/system may also use manual annotation, yielding phoneme time sequences of higher precision. If a large-scale database of pathological continuous speech is available, it can be used to train the automatic Chinese speech recognition system, producing a more reliable phoneme segmentation time sequence than a recognizer trained on normal voice. In the scoring module, sparse-estimation methods may also be used to quantitatively distinguish the category and severity of the pathological voice.
The above content is a further detailed description of the present invention in conjunction with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several equivalent substitutions or obvious modifications of identical performance or use may be made without departing from the inventive concept, and all of them shall be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A pathological voice assessment method based on Chinese speech, characterized by comprising the steps of:
S1: using a Chinese automatic speech recognition system to complete the automatic alignment of the input continuous voice signal with its corresponding text, thereby completing phoneme segmentation;
S2: according to the time series of the phoneme segmentation, cutting the continuous speech signal into vowels, voiceless consonants and voiced consonants, and then extracting features;
S3: forming a characteristic parameter set from the extracted features, screening the set with a feature selector, and then inputting it to a classifier to obtain a detailed score for each voice problem.
2. The pathological voice assessment method based on Chinese speech of claim 1, characterized in that the Chinese automatic speech recognition system is a Chinese speech recognition system based on a deep neural network-hidden Markov model trained on a normal voice database.
3. The pathological voice assessment method based on Chinese speech of claim 1, characterized in that the Chinese automatic speech recognition system uses speaker adaptation technology.
4. The pathological voice assessment method based on Chinese speech of claim 1, characterized in that the Chinese automatic speech recognition system uses the Viterbi algorithm to force-align the continuous voice signal with the phoneme sequence of its corresponding text, thereby completing the phoneme segmentation.
5. The pathological voice assessment method based on Chinese speech of claim 1, characterized in that the features in step S2 include:
a first class of features: fundamental-frequency jitter, amplitude shimmer, signal-to-noise ratio, pitch, energy intensity;
a second class of features: voice interruption, unvoiced-signal ratio;
a third class of features: Mel-frequency cepstral coefficients;
a fourth class of features: high-low frequency energy ratio, pitch variation, signal-to-noise-ratio variation, continuous energy difference, variation of the continuous energy difference, normalized energy.
6. The pathological voice assessment method based on Chinese speech of claim 5, characterized in that the parameters of the first, second and fourth classes of features are extracted for the vowels; the parameters of the second, third and fourth classes of features are extracted for the voiced consonants; and the parameters of the third and fourth classes of features are extracted for the voiceless consonants.
7. The pathological voice assessment method based on Chinese speech of claim 5, characterized in that, among the fourth class of features:
the high-low frequency energy ratio HiLo is calculated as HiLo = (E_hi − E_0) / (E_lo − E_0), where E_hi is the energy of the frequency band above 6000 Hz, E_lo is the energy of the frequency band below 6000 Hz, and E_0 is the background-noise energy;
the pitch variation var_F0 is calculated as var_F0 = var(F0(1), …, F0(n)), where F0(t) is the fundamental frequency of frame t, n is the total number of frames of a continuous speech segment, and var is the variance function (the var function in MATLAB);
the signal-to-noise-ratio variation var_HNR is calculated as var_HNR = var(HNR(1), …, HNR(n)), where HNR is the signal-to-noise ratio of a frame and n is the total number of frames of a continuous speech segment;
the continuous energy difference D3_Energy is calculated as D3(t) = E(t−1) − 2E(t) + E(t+1), where E(t) is the time-domain energy of frame t;
the variation of the continuous energy difference var_D3 is calculated as var_D3 = var(D3(2), …, D3(n−1)), where n is the total number of frames of a continuous speech segment;
the normalized energy E_nor is calculated as E_nor = E_voice / E_noise, where E_voice is the energy of the voice signal and E_noise is the noise-floor energy of the silent part.
8. The pathological voice assessment method based on Chinese speech of claim 1, characterized in that the detailed score in step S3 is directed at the voice problems to be assessed, which include the following:
Q1: the overall severity of the pathology perceived in a passage of voice;
Q2: the voice sounds rough, irregular or unclear;
Q3: breathiness or air leakage is perceived in the voice;
Q4: phonation sounds overly strained;
Q5: sudden uncontrolled pauses occur during phonation;
Q6: sudden uncontrolled changes of fundamental frequency or tone are perceived;
Q7: rapid consecutive crackles occur during phonation;
Q8: regular rhythmic changes of tone or loudness are perceived;
Q9: the fundamental frequency or tone of the voice is obviously inconsistent with the speaker's gender, age, or educational background;
Q10: the speaking volume is obviously inconsistent with the speech content, being too loud or too soft.
9. The pathological voice assessment method based on Chinese speech of claim 8, characterized in that the detailed score gives the severity of each voice problem, the severity being: normal or slight, moderate, or serious.
10. A pathological voice assessment system based on Chinese speech, characterized by comprising a voice input module, a voice assessment module using the method of any one of claims 1-4, and an assessment-result output module, wherein:
the voice input module receives a real-time recording or audio and transmits it to the voice assessment module;
the voice assessment module assesses the real-time recording or audio and transmits the assessment result to the assessment-result output module;
the assessment-result output module outputs the assessment result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711010103.6A CN109727608B (en) | 2017-10-25 | 2017-10-25 | Chinese speech-based ill voice evaluation system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109727608A true CN109727608A (en) | 2019-05-07 |
CN109727608B CN109727608B (en) | 2020-07-24 |
Family
ID=66290570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711010103.6A Active CN109727608B (en) | 2017-10-25 | 2017-10-25 | Chinese speech-based ill voice evaluation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109727608B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5185841B2 (en) * | 2008-01-10 | 2013-04-17 | 旭化成株式会社 | User voice diagnosis apparatus and user voice diagnosis method |
CN101740025A (en) * | 2008-11-21 | 2010-06-16 | 三星电子株式会社 | Singing score evaluation method and karaoke apparatus using the same |
CN102663925A (en) * | 2012-05-09 | 2012-09-12 | 长春大学 | Method and system for tongue training for language training of hearing-impaired children |
CN103093759A (en) * | 2013-01-16 | 2013-05-08 | 东北大学 | Device and method of voice detection and evaluation based on mobile terminal |
CN103093759B (en) * | 2013-01-16 | 2014-12-10 | 东北大学 | Device and method of voice detection and evaluation based on mobile terminal |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN103928023A (en) * | 2014-04-29 | 2014-07-16 | 广东外语外贸大学 | Voice scoring method and system |
CN106531189A (en) * | 2016-12-20 | 2017-03-22 | 潘奕君 | Intelligent spoken language evaluation method |
Non-Patent Citations (1)
Title |
---|
TAN LEE ET AL.: "Automatic speech recognition for acoustical analysis and assessment of cantonese pathological voice and speech", 《 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110876609A (en) * | 2019-07-01 | 2020-03-13 | 上海慧敏医疗器械有限公司 | Voice treatment instrument and method for frequency band energy concentration rate measurement and audio-visual feedback |
CN111358428A (en) * | 2020-01-20 | 2020-07-03 | 书丸子(北京)科技有限公司 | Observation capability test evaluation method and device |
CN114373452A (en) * | 2020-10-15 | 2022-04-19 | 华东师范大学 | Voice abnormity identification and evaluation method and system based on deep learning |
CN112687297A (en) * | 2020-12-21 | 2021-04-20 | 安徽科大讯飞医疗信息技术有限公司 | Voice state classification method and device, electronic equipment and storage medium |
CN112687297B (en) * | 2020-12-21 | 2023-03-28 | 安徽讯飞医疗股份有限公司 | Voice state classification method and device, electronic equipment and storage medium |
CN112735480A (en) * | 2021-01-25 | 2021-04-30 | 杭州电子科技大学 | Vocal cord pathological change detection device based on neural network |
CN113143217A (en) * | 2021-05-07 | 2021-07-23 | 张育青 | Method and system for predicting human body sound production effect |
CN113143217B (en) * | 2021-05-07 | 2024-01-26 | 张育青 | Method and system for predicting sounding effect of human body |
CN113571088A (en) * | 2021-07-27 | 2021-10-29 | 上海交通大学医学院附属第九人民医院 | Difficult airway assessment method and device based on deep learning voiceprint recognition |
CN113571088B (en) * | 2021-07-27 | 2023-10-03 | 上海交通大学医学院附属第九人民医院 | Difficult airway assessment method and device based on deep learning voiceprint recognition |
CN116110437A (en) * | 2023-04-14 | 2023-05-12 | 天津大学 | Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics |
CN116110437B (en) * | 2023-04-14 | 2023-06-13 | 天津大学 | Pathological voice quality evaluation method based on fusion of voice characteristics and speaker characteristics |
CN117976141A (en) * | 2024-04-01 | 2024-05-03 | 四川大学华西医院 | Voice rehabilitation analysis method and system based on acoustic analysis algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN109727608B (en) | 2020-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
Cernak et al. | Characterisation of voice quality of Parkinson’s disease using differential phonological posterior features | |
CN106073706B (en) | A kind of customized information and audio data analysis method and system towards Mini-mental Status Examination | |
US20160351074A1 (en) | Systems and methods for expressive language, developmental disorder, and emotion assessment, and contextual feedback | |
Muhammad et al. | Convergence of artificial intelligence and internet of things in smart healthcare: a case study of voice pathology detection | |
CN103996155A (en) | Intelligent interaction and psychological comfort robot service system | |
Borsky et al. | Modal and nonmodal voice quality classification using acoustic and electroglottographic features | |
CN110223688A (en) | A kind of self-evaluating system of compressed sensing based hepatolenticular degeneration disfluency | |
CN114916921A (en) | Rapid speech cognition assessment method and device | |
Zhang et al. | Adjustment of cue weighting in speech by speakers and listeners: Evidence from amplitude and duration modifications of Mandarin Chinese tone | |
Alshehri et al. | Detection and Diagnosis of Learning Disabilities in Children of Saudi Arabia with Artificial Intelligence | |
Gu et al. | Disordered speech assessment using automatic methods based on quantitative measures | |
Deepa et al. | Speech technology in healthcare | |
Padmini et al. | Age-Based Automatic Voice Conversion Using Blood Relation for Voice Impaired. | |
He | Stress and emotion recognition in natural speech in the work and family environments | |
Sahoo et al. | Detection of speech-based physical load using transfer learning approach | |
CN111210845A (en) | Pathological voice detection device based on improved autocorrelation characteristics | |
CN110211566A (en) | A kind of classification method of compressed sensing based hepatolenticular degeneration disfluency | |
Ferrat et al. | A study of sounds produced by Algerian esophageal speakers | |
Liao | Analysis of Rehabilitation Occupational Therapy Techniques Based on Instrumental Music Chinese Tonal Language Spectrogram Analysis | |
Duan et al. | An English pronunciation and intonation evaluation method based on the DTW algorithm | |
Li | Modular design of English pronunciation proficiency evaluation system based on Speech Recognition Technology | |
Hanani et al. | English speaking proficiency assessment using speech and electroencephalography signals | |
Xu et al. | Application of Multimodal NLP Instruction Combined with Speech Recognition in Oral English Practice | |
Zainon et al. | Speech quality based on Arabic pronunciation using MFCC and LDA: Investigating the emphatic consonants |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||