CN109119094A - Voice classification method by utilizing vocal cord modeling inversion - Google Patents
- Publication number
- CN109119094A (application CN201810824379.6A, filed 2018-07-25)
- Authority
- CN
- China
- Prior art keywords
- glottis
- voice
- vocal cords
- model
- wave
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G — Physics; G10 — Musical instruments; acoustics; G10L — Speech analysis techniques or speech synthesis; speech recognition; speech or voice processing techniques; speech or audio coding or decoding
- G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27 — characterised by the analysis technique
- G10L25/39 — using genetic algorithms
- G10L25/03 — characterised by the type of extracted parameters
- G10L25/24 — the extracted parameters being the cepstrum
- G10L25/48 — specially adapted for particular use
- G10L25/51 — for comparison or discrimination
Abstract
The invention discloses a voice classification method using vocal cord modeling inversion, which effectively distinguishes different voice types from the perspective of the voice production mechanism. The method uses complex cepstrum phase decomposition to obtain the actual glottal wave as the target glottal wave, then performs vocal cord dynamics model inversion with an optimization algorithm that matches the feature parameters of the target and model glottal waves. After an actual voice signal is input, the actual glottal wave is extracted as the target, and a genetic algorithm is used for inversion to optimize the original model, simulating the vocal cord vibration conditions under which different voices are produced. Experimental results show that the relative matching error of each feature parameter after model inversion does not exceed 1.95%, indicating a good inversion effect. Normal voices and special (pathological) voices were selected for identification and classification with high accuracy.
Description
Technical field
The present invention relates to the field of voice classification, and in particular to a voice classification method based on vocal cord modeling inversion.
Background art
Voice classification technology performs feature analysis on voice signals in order to distinguish different types of voices; it can be applied to emotional speech analysis, voice quality assessment, and similar tasks. Voice quality directly affects a person's verbal expression and is particularly important for teachers, broadcasters, and singers. When a person speaks for a long time or is under stress, the voice changes and hoarseness may even appear.
The acoustic analysis methods of voice widely used at present can only provide acoustic information about the voice; they cannot be linked to the physiological structure of the actual voice production system, cannot provide a good classification standard, and their classification results contain large errors. Research shows that when the same voice is produced, the vocal cord vibration conditions of some special voices (e.g. vocal nodules, vocal cord polyps, hyperthyroid voices) differ from those of normal voices, and voices produced under different emotional states also correspond to different vocal cord vibration conditions. Since the vocal cords are one of the key components of the speech production system and their vibration directly affects voice quality, the present work combines vocal cord modeling with acoustic analysis, simulating the glottal wave output by the vocal cords in different states in order to classify voices.
When vocal cord modeling is used for voice classification, the model outputs a glottal wave that simulates the actual voice, and the voice is classified on that basis. In practice, however, directly setting the physical parameters of the model makes it difficult to produce a glottal wave that matches the actual voice signal, which significantly affects the design of the subsequent classification standard.
Summary of the invention
The technical problem to be solved by the present invention: to address the shortcomings of the background art, the present invention builds on the traditional two-mass vocal cord model, adds an impact-force correction, and uses an inversion algorithm to perform matching optimization on the model, so that it accurately simulates the physiological conditions of actual vocal cord vibration and enables the classification of normal voices versus special voices.
The present invention adopts the following technical scheme to solve the above technical problem.
The present invention proposes a voice classification method based on vocal cord dynamics model inversion, which comprises the following steps:
Step 1: estimate the glottal wave using complex cepstrum phase decomposition (CCPD), specifically:
(1) First compute the pitch period of a frame of the voice signal and obtain the glottal closure instants of the frame with the DYPSA algorithm; the glottal closure instants correspond to the pitch periods, giving the exact position of the glottal closure instant within each pitch period.
(2) Take the voice signal within each pitch period, decompose it into a maximum-phase signal and a minimum-phase signal using the complex cepstrum, and differentiate both signals.
(3) Combine the differentiated maximum- and minimum-phase signals with the glottal closure instant: the maximum-phase signal is placed before the glottal closure instant and the minimum-phase signal after it, yielding an estimate of the glottal flow derivative.
(4) Integrate the glottal flow derivative to obtain the glottal source estimate.
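The core of step (2), splitting one pitch-period frame into minimum-phase and maximum-phase components via the complex cepstrum, can be sketched numerically. This is a minimal illustration only, not the patent's full CCPD pipeline: DYPSA, linear-phase handling before the cepstrum, differentiation, and the placement around the closure instant are omitted, and the function name is ours.

```python
import numpy as np

def split_min_max_phase(frame):
    """Split a frame into minimum-phase (causal cepstrum) and maximum-phase
    (anticausal cepstrum) components; their circular convolution reconstructs
    the frame. A production CCPD would also remove the linear phase term."""
    n = len(frame)
    X = np.fft.fft(frame)
    # complex log spectrum; the small offset regularizes log(0)
    log_X = np.log(np.abs(X) + 1e-12) + 1j * np.unwrap(np.angle(X))
    chat = np.fft.ifft(log_X)                  # complex cepstrum
    c_min = np.zeros(n, dtype=complex)         # causal part -> minimum phase
    c_max = np.zeros(n, dtype=complex)         # anticausal part -> maximum phase
    c_min[0] = c_max[0] = chat[0] / 2
    c_min[1:n // 2 + 1] = chat[1:n // 2 + 1]
    c_max[n // 2 + 1:] = chat[n // 2 + 1:]
    # back to the time domain through the exponential of the log spectrum
    x_min = np.fft.ifft(np.exp(np.fft.fft(c_min)))
    x_max = np.fft.ifft(np.exp(np.fft.fft(c_max)))
    return x_min, x_max
```

By construction the two components multiply back to the original spectrum, so convolving them recovers the frame, which is a convenient sanity check for the decomposition.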
Step 2: establish the impact-force-corrected two-mass vocal cord vibration model and determine the model optimization parameter vector, specifically:
Establish the impact-force-corrected two-mass vocal cord vibration model (IFCM), derive the dynamic equations of the vibrating system, and introduce the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr, and the subglottal pressure scaling factor QP as the model optimization parameters.
Here m_iα is the mass of each mass block, k_iα the stiffness coefficient, and r_iα the damping coefficient, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides; L is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the two vocal cords collide; a_i denotes the glottal area and Ps the subglottal pressure.
The model parameters are corrected as follows:
where m_il0, k_il0, k_cl0, c_il0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient of the left mass blocks; m_ir0, k_ir0, k_cr0, c_ir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
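The parameter-correction formulas of this step appear as equation images in the original patent and were lost in extraction. A plausible reconstruction, following the scaling convention common in asymmetric two-mass models (e.g. the Steinecke-Herzel formulation), is given below; this is an assumption for the reader's orientation, not the patent's verbatim formula:

```latex
m_{i\alpha} = \frac{m_{i\alpha 0}}{Q_\alpha}, \qquad
k_{i\alpha} = Q_\alpha \, k_{i\alpha 0}, \qquad
k_{c\alpha} = Q_{c\alpha} \, k_{c\alpha 0}, \qquad
c_{i\alpha} = Q_\alpha \, c_{i\alpha 0}, \qquad
P_s = Q_P \, P_{s0},
\qquad i = 1, 2, \; \alpha = l, r
```

Under this convention each Q factor rescales one side's natural frequency (mass down, stiffness up), so asymmetric left/right values of Ql and Qr directly model unilateral vocal cord abnormality.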
Step 3: extract the acoustic parameters from the glottal wave.
Take the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extract the glottal wave feature parameters from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'| (toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the speed ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next cycle, and tm the instant of the glottal wave peak Ug; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the glottal wave peak value.
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with a quasi-Newton method, specifically:
A. Generate the initial population and set the maximum number of generations.
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best.
The fitness function is:
where the subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population.
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next generation.
E. Repeat the above operations until the maximum number of generations is reached, then output the globally best individual.
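The GA-QN loop above can be sketched as follows. This is a generic hybrid, not the patent's exact operators: tournament selection, arithmetic crossover, Gaussian mutation, and SciPy's L-BFGS-B as the quasi-Newton refiner are our assumptions, and the fitness function is supplied by the caller (in the patent it measures the match between model and target glottal-wave parameters).

```python
import numpy as np
from scipy.optimize import minimize

def ga_qn(fitness, bounds, pop_size=20, generations=30, pc=0.7, pm=0.3, rng=None):
    """Hybrid GA + quasi-Newton: the GA explores globally (steps A-C),
    then BFGS locally refines every new individual (step D)."""
    rng = rng or np.random.default_rng(0)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))      # step A
    for _ in range(generations):
        fit = np.array([fitness(p) for p in pop])            # step B
        # tournament selection: each slot pits two random individuals
        idx = rng.integers(0, pop_size, (pop_size, 2))
        parents = pop[np.where(fit[idx[:, 0]] > fit[idx[:, 1]],
                               idx[:, 0], idx[:, 1])]
        # arithmetic crossover with probability pc
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                a = rng.random()
                children[i] = a * parents[i] + (1 - a) * parents[i + 1]
                children[i + 1] = a * parents[i + 1] + (1 - a) * parents[i]
        # Gaussian mutation with probability pm per gene
        mut = rng.random(children.shape) < pm
        children = np.clip(children + mut * rng.normal(0, 0.1, children.shape),
                           lo, hi)
        # quasi-Newton refinement of every individual (maximize = minimize -f)
        for i in range(pop_size):
            res = minimize(lambda x: -fitness(x), children[i],
                           method="L-BFGS-B", bounds=list(zip(lo, hi)))
            children[i] = res.x
        pop = children
    fit = np.array([fitness(p) for p in pop])
    return pop[np.argmax(fit)]
```

The local refinement is what lets the hybrid converge tightly: on a smooth fitness surface the GA only needs to land each individual in the right basin, and BFGS finishes the job.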
Step 5: classify the voice.
Based on an analysis of the variance of each feature parameter under different voice productions, the parameters are combined with weights, and two indices are proposed: the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS.
The two indices WAR and NWS are combined to distinguish normal voices from special voices. WAR describes the symmetry of the vocal cords: the smaller WAR is, the higher the asymmetry of the vocal cords. NWS describes the severity of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
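The two-index decision described above can be sketched as a simple rule. The WAR and NWS formulas and the decision thresholds are not reproduced in this text (the patent derives them from its training data), so everything below, including the threshold values, is a placeholder illustrating only the combination logic: low WAR flags asymmetry (unilateral abnormality), high NWS flags bilateral abnormality.

```python
def classify_voice(war, nws, war_thresh=0.5, nws_thresh=0.5):
    """Illustrative two-index rule; thresholds are placeholders, not the patent's."""
    if war >= war_thresh and nws <= nws_thresh:
        return "normal"                     # symmetric folds, low abnormality
    if war < war_thresh:
        return "special: unilateral"        # low WAR -> left-right asymmetry
    return "special: bilateral"             # high NWS -> bilateral abnormality
```

Using the two indices jointly is what allows the method to separate the two abnormality types that a single index confuses, which is the point made by Figs. 10 and 11.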
Compared with the prior art, the above technical scheme has the following technical effect: after an actual speech signal is input, the actual glottal wave is extracted as the target, and inversion with a genetic algorithm optimizes the original model, simulating the vocal cord vibration conditions of different voice productions. Experimental results show that the relative matching error of each feature parameter after model inversion does not exceed 3.6%, indicating a good inversion effect. Normal voices and special voices were selected for identification and analysis with high accuracy.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is the flow chart of the CCPD algorithm.
Fig. 3 is a diagram of the IFCM model.
Fig. 4 is the table of initial values of the model physical parameters.
Fig. 5 shows the normalized glottal wave Ug.
Fig. 6 shows the waveform of the glottal flow derivative Ug'.
Fig. 7 is the flow chart of the GA-QN algorithm.
Fig. 8 is the comparison table of glottal parameter errors after inversion of normal voices.
Fig. 9 is the comparison table of glottal parameter errors after inversion of special voices.
Fig. 10 is the table of recognition results for normal and special voices using a single classification index.
Fig. 11 is the table of recognition results for normal and special voices using the two classification indices combined.
Specific embodiments
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless defined as here, are not to be interpreted in an idealized or overly formal sense.
The present invention obtains practical voice glottis wave as target glottis wave, using optimization algorithm using cepstrum polyphase decomposition
Vocal cords kinetic model under operation is carried out by matching target and model glottis wave characteristic parameter, after optimizing according to inverting most
Excellent parameter vector proposes voice classification standard.Specific step is as follows:
Step 1: obtaining practical glottis wave;
The present invention is estimated using cepstrum polyphase decomposition (complex cepstrumphase decomposition, CCPD)
Count glottis wave.The pitch period for seeking a frame voice sound signal first passes through DYPSA (Dynamic Programming
Projected Phase-Slope Algorithm) algorithm obtain a frame voice sound signal glottis closing point position, glottis is closed
Chalaza position is corresponding with pitch period, obtains glottis closing point specific location in each pitch period.Obtain each fundamental tone week
Voice sound signal in this period is decomposed into maximum phase using the method for cepstrum and minimum phase is believed by the voice sound signal in the phase
Number and differential, with glottis closing point position ining conjunction with, the component part of maximum phase is opened with glottis to match, minimum phase composition part and
Glottis, which closes, to match.It is that glottis opens phase before glottis closing point, before maximum phase signal is placed on glottis closing point;Glottis closure
Phase is closed for glottis after point, after minimum phase signal is placed on glottis closing point, obtains derivative glottal flow estimation;By differential glottis
Wave integral, realizes glottal source estimation.
Step 2: determine the model optimization parameter vector.
The impact-force-corrected two-mass vocal cord vibration model (Impact Force Correction Model, IFCM) is established, the dynamic equations of the vibrating system are derived, and the left vocal cord scaling factor Ql, right vocal cord scaling factor Qr, left coupling scaling factor Qcl, right coupling scaling factor Qcr, and subglottal pressure scaling factor QP are introduced as the model optimization parameters.
Here m_iα is the mass of each mass block, k_iα the stiffness coefficient, and r_iα the damping coefficient, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides; L is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the two vocal cords collide; a_i denotes the glottal area and Ps the subglottal pressure.
The model parameters are corrected as follows:
where m_il0, k_il0, k_cl0, c_il0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient of the left mass blocks; m_ir0, k_ir0, k_cr0, c_ir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
Step 3: extract the acoustic parameters from the glottal wave.
Take the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extract the glottal wave feature parameters from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'| (toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the speed ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next cycle, and tm the instant of the glottal wave peak Ug; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the glottal wave peak value.
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with a quasi-Newton method, specifically:
A. Generate the initial population and set the maximum number of generations.
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best.
The fitness function is:
where the subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population.
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next generation.
E. Repeat the above operations until the maximum number of generations is reached, then output the globally best individual.
Step 5: classify the voice.
Based on an analysis of the variance of each feature parameter under different voice productions, the present invention combines the parameters with weights and proposes the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS:
The two indices WAR and NWS are combined to distinguish normal voices from special voices. WAR describes the symmetry of the vocal cords: the smaller WAR is, the higher the asymmetry of the vocal cords. NWS describes the severity of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
Embodiment 1:
The process of the invention is shown in Fig. 1. First the glottal wave is estimated using complex cepstrum phase decomposition (CCPD). The pitch period of a frame of the voice signal is computed, and the glottal closure instants of the frame are obtained with the DYPSA (Dynamic Programming Projected Phase-Slope Algorithm) algorithm; the glottal closure instants correspond to the pitch periods, giving the exact position of the closure instant within each pitch period. The voice signal within each pitch period is decomposed into maximum-phase and minimum-phase signals using the complex cepstrum and differentiated; combined with the glottal closure instant, the maximum-phase component corresponds to the glottal opening phase and the minimum-phase component to the glottal closing phase. Before the closure instant the glottis is opening, so the maximum-phase signal is placed before it; after the closure instant the glottis is closing, so the minimum-phase signal is placed after it, yielding the glottal flow derivative estimate. Integrating the glottal flow derivative realizes the glottal source estimation; the specific procedure is shown in Fig. 2.
The standard two-mass model shown in Fig. 3 is established: each vocal cord is represented by two mass blocks connected by springs k_iα and dampers r_iα, where i = 1, 2 denotes the lower and upper mass blocks and α = l, r the left and right sides, and the two mass blocks on each side are coupled to each other through an additional spring k_cα. The model optimization parameters Ql, Qr, Qcl, Qcr, QP are determined, and the glottal airflow is coupled in to obtain the model glottal wave; the initial values of the physical parameters of the model are shown in Fig. 4.
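To make the two-mass structure concrete, here is a drastically simplified, dimensionless sketch of one side of the model: two coupled damped oscillators, driven by a constant "subglottal pressure" force while the glottis is open. All parameter values are illustrative stand-ins rather than the Fig. 4 initial values (which are an image in the original), and the collision forces and full Bernoulli aerodynamics of the real model are omitted.

```python
import numpy as np
from scipy.integrate import solve_ivp

def two_mass_rhs(t, y, m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025,
                 r1=0.02, r2=0.02, Ps=0.004, x0=0.18, L=1.4, d1=0.25):
    """Right-hand side of the simplified two-mass ODE.
    y = [x1, v1, x2, v2]: displacement/velocity of the lower (1) and upper (2)
    mass; x0 is the rest half-width of the glottis."""
    x1, v1, x2, v2 = y
    glottis_open = (x0 + x1 > 0) and (x0 + x2 > 0)   # both gaps positive
    F1 = Ps * L * d1 if glottis_open else 0.0        # pressure force on lower mass
    a1 = (F1 - r1 * v1 - k1 * x1 - kc * (x1 - x2)) / m1
    a2 = (-r2 * v2 - k2 * x2 - kc * (x2 - x1)) / m2
    return [v1, a1, v2, a2]

# integrate from a small initial displacement
sol = solve_ivp(two_mass_rhs, (0.0, 50.0), [0.01, 0.0, 0.01, 0.0], max_step=0.1)
```

In the full IFCM, the coupling of this mechanical system with the glottal airflow (plus the nonlinear pressure and collision terms left out here) is what sustains oscillation and produces the model glottal wave that the inversion matches against the target.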
The GA-QN algorithm performs the vocal cord dynamics model inversion by matching the feature parameters of the target and model glottal waves; the operating procedure is shown in Fig. 7. The population size Np is set to 50, the number of iterations does not exceed 300, the crossover probability Pc is 0.7, and the mutation probability Pm is 0.3. The fitness function of each individual in the population is as follows:
where the glottal wave feature parameters are the fundamental frequency F0 = 1/(toT - to), the open quotient OQ = (tc - to)/(toT - to), the closing quotient CIQ = (tc - tm)/(toT - to), the speed ratio Sr = Ugm'/MFDR, and the normalized amplitude quotient NAQ = Ugm/(|Ugc'| (toT - to)); the main time points in these formulas are shown in Fig. 5 and Fig. 6. The subscript m denotes a model glottal wave feature parameter and the subscript o an actual glottal wave feature parameter.
The present invention uses the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set of the database consists of the sustained vowel /a/. Normal and special voices were selected from the database, the special voices comprising vocal cord polyp voices and vocal cord paralysis voices. Forty normal voices and forty special voices (20 with unilateral vocal cord abnormality and 20 with bilateral vocal cord abnormality) were chosen from these samples; the sampling frequency of the speech samples is 25 kHz.
The matching error values of each feature parameter of the glottal waves of normal and special voices are shown in Fig. 8 and Fig. 9. As can be seen from the figures, compared with the traditional two-mass model (TMM), the matching error of each feature value of the proposed IFCM model is smaller and the inversion effect is better, indicating that the IFCM model better reflects the actual vibration of the vocal cords during phonation. When the input voice is a special voice, abnormalities at the vocal cords reduce the degree of glottal closure and the regularity of the glottal wave, and the glottal frequency and amplitude vary, so the matching error of each feature parameter of the model is higher than for normal voices. This shows that the inversion effect for abnormal vocal cord voices is slightly inferior to that for normal voices; however, the matching error of every feature parameter is at most 1.95%, indicating that the vocal cord model inversion proposed here works well and provides a sound basis for the analysis of the optimized model parameters.
Fig. 10 shows the results of identifying normal versus special voices using only the WAR index or only the NWS index; Fig. 11 shows the results of identifying normal voices versus special voices and the abnormal vocal cord side using the WAR and NWS indices combined, including the recognition rate and the Kappa index. The Kappa index describes the quality of the identification: the closer it is to 1, the better the recognition result. As can be seen from the figures, with the WAR index alone the recognition rate for normal versus special voices is 86.25%, and with the NWS index alone it is 81.25%. Combining the two indices, the accuracy of the recognition results reaches 97.50%. The experimental results show that the proposed method can effectively and comprehensively distinguish normal voices from special voices and identify the abnormal condition of the vocal cords.
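The Kappa index reported in Figs. 10 and 11 is Cohen's kappa, which corrects raw accuracy for chance agreement. A short implementation, with an illustrative (not the patent's actual) confusion matrix: a balanced 40/40 split with one error in each class corresponds to 97.5% accuracy and κ = 0.95.

```python
import numpy as np

def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                                  # observed agreement (= accuracy)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2    # agreement expected by chance
    return (po - pe) / (1.0 - pe)
```

Because the two classes are balanced here, chance agreement is 0.5, so kappa stretches the 0.5-1.0 accuracy range onto 0-1; that is why a value close to 1 indicates genuinely good recognition rather than class-imbalance luck.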
The above are only some embodiments of the invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A voice classification method based on vocal cord dynamics model inversion, characterized by comprising:
step 1: estimating the glottal wave using complex cepstrum phase decomposition (CCPD);
step 2: establishing the impact-force-corrected two-mass vocal cord vibration model and determining the model optimization parameter vector;
step 3: extracting the acoustic parameters from the glottal wave;
step 4: performing the vocal cord dynamics model inversion using the GA-QN optimization algorithm, which combines a genetic algorithm with a quasi-Newton method, by matching the feature parameters of the target and model glottal waves;
step 5: classifying the voice: based on an analysis of the variance of each feature parameter under different voice productions, combining the parameters with weights, proposing the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS, and using the WAR and NWS indices in combination to distinguish normal voices from special voices.
2. The voice classification method according to claim 1, characterized in that estimating the glottal wave using cepstrum phase decomposition (CCPD) in step 1 specifically comprises:
(1) first determining the pitch period of a frame of the voice signal, obtaining the glottal closure instants of the frame with the DYPSA algorithm, and aligning the closure instants with the pitch periods to locate the exact closure instant within each pitch period;
(2) taking the voice signal within each pitch period, decomposing it by the cepstral method into a maximum-phase signal and a minimum-phase signal, and differentiating both signals;
(3) combining the differentiated maximum- and minimum-phase signals with the glottal closure instant, placing the maximum-phase signal before the closure instant and the minimum-phase signal after it, to obtain an estimate of the glottal flow derivative;
(4) integrating the glottal flow derivative to obtain the glottal source estimate.
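The cepstral maximum/minimum-phase split at the core of sub-step (2) can be sketched as follows. This is an illustrative numpy implementation, not the patent's code: the function names, the FFT length of 64 and the toy two-zero test signal are all assumptions, and a real system would apply the split to each pitch period delimited by the DYPSA closure instants.

```python
import numpy as np

def complex_cepstrum(x, n):
    # complex cepstrum with the usual integer linear-phase (delay) removal
    X = np.fft.fft(x, n)
    ph = np.unwrap(np.angle(X))
    d = round(ph[n // 2] / np.pi)                  # integer delay in samples
    ph = ph - np.pi * d * np.arange(n) / (n // 2)
    return np.real(np.fft.ifft(np.log(np.abs(X)) + 1j * ph))

def split_max_min_phase(x, n=64):
    # causal quefrencies -> minimum-phase part; negative (wrapped-around)
    # quefrencies -> maximum-phase part; exponentiate back to the time domain
    c = complex_cepstrum(x, n)
    c_min = np.zeros(n)
    c_min[: n // 2 + 1] = c[: n // 2 + 1]
    c_max = np.zeros(n)
    c_max[n // 2 + 1:] = c[n // 2 + 1:]
    x_min = np.real(np.fft.ifft(np.exp(np.fft.fft(c_min))))
    x_max = np.real(np.fft.ifft(np.exp(np.fft.fft(c_max))))
    return x_max, x_min

# toy mixed-phase "period": a minimum-phase zero (-0.5) convolved with a
# maximum-phase zero (-2); the split should recover the two factors
x = np.convolve([1.0, 0.5], [0.5, 1.0])
x_max, x_min = split_max_min_phase(x, 64)
```

The product of the two component spectra reproduces the magnitude spectrum of the input, which is the sanity check that the decomposition lost nothing.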
3. The voice classification method according to claim 1, characterized in that establishing the impact-force-corrected two-mass vocal cord vibration model and determining the model optimization parameter vector in step 2 specifically comprises:
establishing the impact-force-corrected two-mass vibration model (IFCM) of the vocal cords, obtaining the dynamic equations of the system vibration, and introducing the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr and the subglottal pressure scaling factor QP as the model optimization parameters;
where m_ia is the mass of each block, k_ia its stiffness coefficient and r_ia its damping coefficient, with i = 1, 2 denoting the lower and upper mass respectively and a = l, r denoting the left and right mass; l is the vocal cord length, d_i the thickness of each mass, k_ca the coupling coefficient, and c_ia the additional elastic coefficient applied when the two folds collide; a_i denotes the glottal gap area and P_i the subglottal pressure;
the parameters of the mathematical model are corrected as follows:
where m_il0, k_il0, k_cl0 and c_il0 respectively represent the standard values of the left mass, its stiffness coefficient, its coupling coefficient and the additional elastic coefficient at collision of the two folds; m_ir0, k_ir0, k_cr0 and c_ir0 represent the corresponding standard values on the right side; and P_i0 is the standard value of the subglottal pressure.
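The correction formulas themselves are not reproduced in this text, so the sketch below assumes the common Steinecke-Herzel convention for asymmetric two-mass models (mass divided by Q, stiffness multiplied by Q, coupling multiplied by Qc, subglottal pressure multiplied by QP) as a stand-in for the patent's omitted equations; the numeric standard values are likewise illustrative, not the patent's table.

```python
# illustrative standard (normal-voice) values in the usual CGS-style
# two-mass-model units; placeholders, not the patent's parameter table
STD = dict(m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025,
           c1=0.24, c2=0.024, Ps=0.008)

def scaled_params(Ql, Qr, Qcl, Qcr, QP, std=STD):
    # assumed scaling rule: mass -> m/Q, stiffness -> Q*k (Steinecke-Herzel),
    # coupling -> Qc*kc, subglottal pressure -> QP*Ps0
    def side(Q, Qc):
        return dict(m1=std["m1"] / Q, m2=std["m2"] / Q,
                    k1=Q * std["k1"], k2=Q * std["k2"],
                    kc=Qc * std["kc"],
                    c1=std["c1"], c2=std["c2"])
    return dict(left=side(Ql, Qcl), right=side(Qr, Qcr), Ps=QP * std["Ps"])

# e.g. a right fold slackened to Qr = 0.6 with raised driving pressure
p = scaled_params(1.0, 0.6, 1.0, 1.0, 1.2)
```

Under this convention a smaller Q makes a fold heavier and slacker, which lowers its natural frequency relative to the healthy side; the five factors (Ql, Qr, Qcl, Qcr, QP) form exactly the optimization vector of the claim.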
4. The voice classification method according to claim 1, characterized in that extracting the acoustic parameters of the glottal wave in step 3 specifically comprises:
taking the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug', and extracting the glottal wave characteristic parameters from the principal time instants of the glottal wave and of the glottal flow derivative:
F0 = 1/(t_oT - t_o)   (5)
OQ = (t_c - t_o)/(t_oT - t_o)   (6)
CIQ = (t_c - t_m)/(t_oT - t_o)   (7)
Sr = Ug'_c/Ug'_r   (8)
NAQ = Ug_m/(|Ug'_c| (t_oT - t_o))   (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the slope ratio and NAQ the normalized amplitude quotient; t_o is the glottal opening instant, t_oT the opening instant of the following period, t_c the glottal closure instant and t_m the instant of the glottal wave peak Ug_m; Ug'_c is the minimum of the glottal flow derivative Ug' and Ug'_r its maximum.
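The five parameters of equations (5)-(9) can be computed from one glottal-flow period as sketched below. The landmark detection (peak of Ug, minimum of Ug') is deliberately naive, and the frame is assumed to start exactly at the opening instant t_o; the Rosenberg-type test pulse is an illustrative stand-in for a measured glottal wave.

```python
import numpy as np

def glottal_features(ug, fs):
    # one pitch period of glottal flow Ug sampled at fs Hz; the frame is
    # assumed to start at t_o, so the period t_oT - t_o equals len(ug)/fs
    ug = np.asarray(ug, dtype=float)
    dug = np.gradient(ug) * fs             # glottal flow derivative Ug'
    n = len(ug)
    T = n / fs                             # pitch period t_oT - t_o
    t = np.arange(n) / fs
    tm = t[np.argmax(ug)]                  # instant of the peak Ug_m
    tc = t[np.argmin(dug)]                 # closure instant: minimum of Ug'
    Ugm = float(ug.max())
    Ugc = float(dug.min())                 # Ug'_c
    Ugr = float(dug.max())                 # Ug'_r
    return dict(F0=1.0 / T,                # eq. (5)
                OQ=tc / T,                 # eq. (6), with t_o = 0
                CIQ=(tc - tm) / T,         # eq. (7)
                Sr=Ugc / Ugr,              # eq. (8), signed ratio
                NAQ=Ugm / (abs(Ugc) * T))  # eq. (9)

# usage on a synthetic Rosenberg-type pulse: 10 ms period at 10 kHz,
# 4 ms cosine opening phase, 1.6 ms quarter-cosine closing phase
fs, n, tp, tn = 10000, 100, 40, 16
i = np.arange(n)
ug = np.where(i <= tp, 0.5 * (1.0 - np.cos(np.pi * i / tp)),
              np.where(i <= tp + tn,
                       np.cos(np.pi * (i - tp) / (2 * tn)), 0.0))
feats = glottal_features(ug, fs)
```

For this pulse the features land in the physiologically plausible ranges (F0 = 100 Hz, OQ near 0.55, NAQ near 0.1), which is a quick plausibility check for the extraction.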
5. The voice classification method according to claim 4, characterized in that inverting the vocal cord dynamics model in step 4 with the combined genetic algorithm / quasi-Newton optimization algorithm (GA-QN), by matching the characteristic parameters of the target and model glottal waves, specifically comprises:
A. generating an initial population and setting the maximum number of generations;
B. computing the fitness f(Phi_i) of each individual in the population and taking the individual with the highest fitness as the optimum of the current generation;
the fitness function is:
where the subscript m denotes the model glottal wave characteristic parameters and the subscript o denotes the measured glottal wave characteristic parameters;
C. applying the selection, crossover and mutation operators to the current population to obtain a new population;
D. applying the quasi-Newton algorithm to every individual of the new population to obtain the next generation;
E. repeating steps B-D until the maximum number of generations is reached, and outputting the globally optimal individual.
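Steps A-E can be sketched as a genetic algorithm whose offspring are polished by a few BFGS (quasi-Newton) iterations. The toy cost below, a squared mismatch against a fixed target vector, stands in for the glottal-wave parameter matching, and the fitness 1/(1 + cost) mirrors step B only in spirit, since the patent's exact fitness formula is not reproduced in this text.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy stand-in for the patent's cost: squared mismatch between a "model"
# parameter vector and a fixed "target" glottal-feature vector
target = np.array([1.2, -0.7, 0.4])
def cost(q):
    return float(np.sum((q - target) ** 2))

def numgrad(f, x, h=1e-6):
    # central-difference gradient, since the model is treated as a black box
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = h
        g[j] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def bfgs(f, x, iters=3):
    # a few quasi-Newton iterations: BFGS inverse-Hessian update
    # plus an Armijo backtracking line search
    x = np.array(x, dtype=float)
    H = np.eye(len(x))
    g = numgrad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-10:
            break
        p = -H @ g
        t, fx, slope = 1.0, f(x), float(g @ p)
        while f(x + t * p) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        s = t * p
        x_new = x + s
        g_new = numgrad(f, x_new)
        y = g_new - g
        if float(y @ s) > 1e-12:           # curvature condition
            rho = 1.0 / float(y @ s)
            I = np.eye(len(x))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

def ga_qn(f, dim, pop_size=16, gens=10):
    pop = rng.uniform(-2.0, 2.0, (pop_size, dim))        # A: initial population
    for _ in range(gens):
        fit = np.array([1.0 / (1.0 + f(q)) for q in pop])  # B: fitness
        new = []
        for _k in range(pop_size):                       # C: select/cross/mutate
            a, b = rng.integers(0, pop_size, 2)
            p1 = pop[a] if fit[a] > fit[b] else pop[b]   # tournament selection
            c, d = rng.integers(0, pop_size, 2)
            p2 = pop[c] if fit[c] > fit[d] else pop[d]
            w = rng.random()
            child = w * p1 + (1 - w) * p2                # arithmetic crossover
            new.append(child + rng.normal(0.0, 0.1, dim))  # gaussian mutation
        pop = np.array([bfgs(f, q) for q in new])        # D: quasi-Newton polish
    return min(pop, key=f)                               # E: global best

best = ga_qn(cost, 3)
```

The division of labor is the point of the hybrid: the GA explores globally so the search does not stall in a local minimum, while the BFGS step gives the fast local convergence a plain GA lacks.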
6. The voice classification method according to claim 3, characterized in that the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS in step 5 are as follows:
WAR and NWS are used jointly to distinguish normal voices from pathological voices; WAR describes the degree of symmetry of the vocal cords, and the smaller WAR is, the higher the degree of asymmetry; NWS describes the degree of bilateral vocal cord abnormality, and the larger NWS is, the more severe the bilateral abnormality.
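The defining formulas of WAR and NWS are not reproduced in this text, so the construction below is hypothetical: WAR as a weighted smaller-to-larger ratio of the left and right scaling factors (1 for symmetric folds, smaller for stronger asymmetry) and NWS as a weighted distance of both sides' factors from the normal value 1, with made-up weights w standing in for the variance-analysis weights of the claim.

```python
def war_nws(Ql, Qr, Qcl, Qcr, w=(0.7, 0.3)):
    # hypothetical weights w; in the patent they come from a variance
    # analysis of the parameters over different voice productions
    pairs = [(Ql, Qr), (Qcl, Qcr)]
    # WAR: weighted smaller/larger ratio of the left vs right factors;
    # 1 = symmetric folds, smaller values = stronger asymmetry
    WAR = sum(wi * min(a, b) / max(a, b) for wi, (a, b) in zip(w, pairs))
    # NWS: weighted distance of both sides from the normal value 1;
    # larger values = stronger bilateral abnormality
    NWS = sum(wi * (abs(a - 1.0) + abs(b - 1.0)) / 2.0
              for wi, (a, b) in zip(w, pairs))
    return WAR, NWS

normal = war_nws(1.0, 1.0, 1.0, 1.0)   # symmetric, normal folds
lesion = war_nws(1.0, 0.5, 1.0, 1.0)   # one abnormal right fold
```

Whatever the exact formulas, the classification logic of the claim is preserved: the pathological case scores a lower WAR and a higher NWS than the normal one, and a threshold on the (WAR, NWS) pair separates the two classes.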
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810824379.6A CN109119094B (en) | 2018-07-25 | 2018-07-25 | Vocal classification method using vocal cord modeling inversion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109119094A true CN109119094A (en) | 2019-01-01 |
CN109119094B CN109119094B (en) | 2023-04-28 |
Family
ID=64863285
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810824379.6A Active CN109119094B (en) | 2018-07-25 | 2018-07-25 | Vocal classification method using vocal cord modeling inversion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109119094B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4343969A (en) * | 1978-10-02 | 1982-08-10 | Trans-Data Associates | Apparatus and method for articulatory speech recognition |
US20010020141A1 (en) * | 1995-10-31 | 2001-09-06 | Elvina Ivanovna Chahine | Method of restoring speech functions in patients suffering from various forms of dysarthria, and dysarthria probes |
US20080300867A1 (en) * | 2007-06-03 | 2008-12-04 | Yan Yuling | System and method of analyzing voice via visual and acoustic data |
CN101916566A (en) * | 2010-07-09 | 2010-12-15 | 西安交通大学 | Electronic larynx speech reconstructing method and system thereof |
CN103730130A (en) * | 2013-12-20 | 2014-04-16 | 中国科学院深圳先进技术研究院 | Detection method and system for pathological voice |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
CN105359211A (en) * | 2013-09-09 | 2016-02-24 | 华为技术有限公司 | Unvoiced/voiced decision for speech processing |
CN108133713A (en) * | 2017-11-27 | 2018-06-08 | 苏州大学 | Method for estimating sound channel area under glottic closed phase |
Non-Patent Citations (2)
Title |
---|
HUIQUN DENG ET AL.: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 * |
ZENG XIAOLIANG ET AL.: "Pathological voice classification using parameter inversion of a vocal cord dynamics model",《ACTA ACUSTICA (声学学报)》 * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110870765A (en) * | 2019-06-27 | 2020-03-10 | 上海慧敏医疗器械有限公司 | Voice treatment instrument and method adopting glottis closing real-time measurement and audio-visual feedback technology |
CN111081273A (en) * | 2019-12-31 | 2020-04-28 | 湖南景程电子科技有限公司 | Voice emotion recognition method based on glottal wave signal feature extraction |
CN112201226A (en) * | 2020-09-28 | 2021-01-08 | 复旦大学 | Sound production mode judging method and system |
CN112201226B (en) * | 2020-09-28 | 2022-09-16 | 复旦大学 | Sound production mode judging method and system |
CN112562650A (en) * | 2020-10-31 | 2021-03-26 | 苏州大学 | Voice recognition classification method based on vocal cord characteristic parameters |
CN113012716A (en) * | 2021-02-26 | 2021-06-22 | 武汉星巡智能科技有限公司 | Method, device and equipment for identifying baby cry category |
CN113012716B (en) * | 2021-02-26 | 2023-08-04 | 武汉星巡智能科技有限公司 | Infant crying type identification method, device and equipment |
CN116631443A (en) * | 2021-02-26 | 2023-08-22 | 武汉星巡智能科技有限公司 | Infant crying type detection method, device and equipment based on vibration spectrum comparison |
CN116631443B (en) * | 2021-02-26 | 2024-05-07 | 武汉星巡智能科技有限公司 | Infant crying type detection method, device and equipment based on vibration spectrum comparison |
Also Published As
Publication number | Publication date |
---|---|
CN109119094B (en) | 2023-04-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109119094A (en) | Voice classification method by utilizing vocal cord modeling inversion | |
Drugman et al. | Glottal source processing: From analysis to applications | |
CN102664016B (en) | Singing evaluation method and system | |
CN102521281B (en) | Humming computer music searching method based on longest matching subsequence algorithm | |
CN105023570B (en) | A kind of method and system for realizing sound conversion | |
CN109243494A (en) | Childhood emotional recognition methods based on the long memory network in short-term of multiple attention mechanism | |
Ewert et al. | Piano transcription in the studio using an extensible alternating directions framework | |
Bhagatpatil et al. | An automatic infant’s cry detection using linear frequency cepstrum coefficients (LFCC) | |
Zhang et al. | Multiple vowels repair based on pitch extraction and line spectrum pair feature for voice disorder | |
Perez-Carrillo et al. | Indirect acquisition of violin instrumental controls from audio signal with hidden Markov models | |
Narendra et al. | Estimation of the glottal source from coded telephone speech using deep neural networks | |
CN111081273A (en) | Voice emotion recognition method based on glottal wave signal feature extraction | |
Cummings et al. | Glottal models for digital speech processing: A historical survey and new results | |
Le et al. | Personalized speech enhancement combining band-split rnn and speaker attentive module | |
Little et al. | Biomechanically informed nonlinear speech signal processing | |
Parlak et al. | Harmonic differences method for robust fundamental frequency detection in wideband and narrowband speech signals | |
Gao | Audio deepfake detection based on differences in human and machine generated speech | |
Albornoz et al. | Snore recognition using a reduced set of spectral features | |
Zheng et al. | Throat microphone speech enhancement via progressive learning of spectral mapping based on lstm-rnn | |
Zhang et al. | Pathological voice classification based on the features of an asymmetric fluid–structure interaction vocal cord model | |
Patil et al. | Combining evidence from spectral and source-like features for person recognition from humming | |
Lv et al. | Objective evaluation method of broadcasting vocal timbre based on feature selection | |
Nhu et al. | Singing performance of the talking robot with newly redesigned artificial vocal cords | |
Bai et al. | Glottal Features Under Workload in Human-Robot Interaction | |
Yan et al. | A Dual-Mode Real-Time Lip-Sync System for a Bionic Dinosaur Robot |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||