CN109119094A - Voice classification method by utilizing vocal cord modeling inversion - Google Patents

Voice classification method by utilizing vocal cord modeling inversion

Info

Publication number
CN109119094A
CN109119094A (application CN201810824379.6A; granted publication CN109119094B)
Authority
CN
China
Prior art keywords
glottis
voice
vocal cords
model
wave
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810824379.6A
Other languages
Chinese (zh)
Other versions
CN109119094B (en)
Inventor
孙宝印
陶智
陈莉媛
张晓俊
吴迪
肖仲喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201810824379.6A
Publication of CN109119094A
Application granted
Publication of CN109119094B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 — Speech or voice analysis techniques not restricted to a single one of groups G10L15/00–G10L21/00
    • G10L25/03 — characterised by the type of extracted parameters
    • G10L25/24 — the extracted parameters being the cepstrum
    • G10L25/27 — characterised by the analysis technique
    • G10L25/39 — using genetic algorithms
    • G10L25/48 — specially adapted for particular use
    • G10L25/51 — specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Prostheses (AREA)

Abstract

The invention discloses a voice classification method based on vocal cord modeling inversion, which distinguishes different types of voice from the perspective of the voicing mechanism. The method uses complex cepstrum phase decomposition to extract the actual glottal wave as the target glottal wave, then inverts a vocal cord dynamics model with an optimization algorithm by matching the characteristic parameters of the target and model glottal waves. After an actual speech signal is input, the extracted actual glottal wave serves as the inversion target, and a genetic algorithm inverts and optimizes the original model, so that the vocal cord vibration underlying different voices is simulated. Experimental results show that the relative matching error of each characteristic parameter after model inversion does not exceed 1.95%, indicating a good inversion effect. Normal voice and special voice selected for identification and classification are recognized with high accuracy.

Description

Voice classification method using vocal cord modeling inversion
Technical field
The present invention relates to the field of voice classification, and in particular to a voice classification method based on vocal cord model inversion.
Background art
Voice classification technology analyzes the features of a speech signal in order to distinguish different types of voice; it can be applied to emotional speech analysis, voice quality assessment, and similar tasks. Voice quality directly affects a person's spoken expression and is particularly important for teachers, broadcasters, and singers. After prolonged voicing, or under stress, the voice changes and may even become hoarse.
Acoustic analysis of voice is widely used at present, but it provides only acoustic information: it cannot be related to the physiology of the actual voicing system, offers no sound classification criterion, and its classification results carry large errors. Research shows that, when the same utterance is produced, the vocal cord vibration of certain special voices (e.g., vocal nodules, vocal cord polyps, hyperthyroid voice) differs from that of normal voice, and the vocal cord vibration corresponding to voices produced under different emotional states also differs. Since the vocal cords are one of the key components of the speech production system and their vibration directly determines voice quality, this invention combines vocal cord modeling with acoustic analysis and simulates the glottal wave output by the vocal cords in different states in order to classify voice.
When vocal cord modeling is used for voice classification, the model mainly outputs a glottal wave that simulates the actual voice, and classification is based on this simulation. In practice, if the physical parameters of the model are set directly, it is difficult to produce a glottal wave that matches the actual speech signal, which significantly affects the design of the subsequent classification criterion.
Summary of the invention
The technical problem to be solved by the present invention is: addressing the defects of the background art, the present invention adds an impact force correction to the traditional two-mass vocal cord model and uses an inversion algorithm to match and optimize the model, so as to accurately simulate the physiological state of actual vocal cord vibration and thereby classify normal voice and special voice.
The present invention adopts the following technical scheme to solve the above technical problem:
The invention proposes a voice classification method based on vocal cord dynamics model inversion, which specifically comprises the following steps:
Step 1: estimate the glottal wave using complex cepstrum phase decomposition (CCPD), specifically:
(1) First compute the pitch period of a frame of the speech signal, and locate the glottal closure instants of the frame with the DYPSA algorithm; the closure instants correspond to the pitch periods, giving the exact closure position within each pitch period;
(2) Take the speech signal within each pitch period, decompose it into a maximum-phase signal and a minimum-phase signal using the cepstrum method, and differentiate the maximum- and minimum-phase signals;
(3) Combine the differentiated maximum- and minimum-phase signals at the glottal closure instant, placing the maximum-phase signal before the closure instant and the minimum-phase signal after it, to obtain an estimate of the glottal flow derivative;
(4) Integrate the glottal flow derivative to realize the glottal source estimate.
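The cepstral split in sub-steps (2)–(3) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it computes the complex cepstrum of a short frame, keeps the causal (positive-quefrency) part as the minimum-phase component and the anticausal part as the maximum-phase component, and checks the split on a known minimum-phase sequence. Frame selection, DYPSA closure detection, differentiation, and integration are omitted.

```python
import numpy as np

def complex_cepstrum(x, n_fft=4096):
    """Complex cepstrum via FFT with unwrapped phase (no linear-phase term
    is removed, so the input is assumed to start at time zero)."""
    X = np.fft.fft(x, n_fft)
    log_X = np.log(np.abs(X)) + 1j * np.unwrap(np.angle(X))
    return np.fft.ifft(log_X).real

def min_max_phase_split(x, n_fft=4096):
    """Split x into a minimum-phase component (causal cepstrum) and a
    maximum-phase component (anticausal cepstrum), in the time domain."""
    c = complex_cepstrum(x, n_fft)
    half = n_fft // 2
    c_min = np.zeros(n_fft); c_min[:half] = c[:half]   # causal part (incl. c[0])
    c_max = np.zeros(n_fft); c_max[half:] = c[half:]   # anticausal part
    x_min = np.fft.ifft(np.exp(np.fft.fft(c_min))).real
    x_max = np.fft.ifft(np.exp(np.fft.fft(c_max))).real
    return x_min, x_max

if __name__ == "__main__":
    x = np.array([1.0, 0.5])   # single zero at z = -0.5: minimum phase
    c = complex_cepstrum(x)
    # the anticausal cepstrum of a minimum-phase sequence is numerically zero
    print(np.max(np.abs(c[2048:])) < 1e-3)                 # True
    x_min, _ = min_max_phase_split(x)
    print(np.allclose(x_min[:2], [1.0, 0.5], atol=1e-2))   # True
```

For a voiced-speech frame, the maximum-phase part of this split is what CCPD associates with the glottal open phase and the minimum-phase part with the closed phase.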
Step 2: establish the impact-force-corrected two-mass vocal cord vibration model and determine the model optimization parameter vector, specifically:
The impact-force-corrected two-mass vocal cord vibration model (IFCM) is established, the dynamic equations of the vibrating system are derived, and the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr, and the subglottal pressure scaling factor QP are introduced as the model optimization parameters;
where miα is the mass of each mass block, kiα its stiffness coefficient, and riα its damping coefficient, with i = 1, 2 denoting the lower and upper mass blocks and α = l, r the left and right vocal cords; l is the vocal cord length, di the thickness of each mass block, kcα the coupling coefficient, and ciα the additional elastic coefficient that acts when the two vocal cords collide; ai denotes the glottal gap area and Ps the subglottal pressure;
The mathematical model parameters are corrected as follows:
where mil0, kil0, kcl0, and cil0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient for the left mass blocks; mir0, kir0, kcr0, and cir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
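The correction formulas above appear only as images in the original, so the sketch below is an assumed stand-in: it applies the five scaling factors to hypothetical standard values using a convention common in the two-mass-model literature (mass divided by the tension-like factor, stiffness multiplied by it). The symbol names, numeric standard values, and the scaling convention itself are all illustrative, not the patent's (its standard values are in Fig. 4).

```python
from dataclasses import dataclass

@dataclass
class SideParams:
    m1: float; m2: float   # masses of the lower/upper blocks
    k1: float; k2: float   # stiffness coefficients
    kc: float              # coupling coefficient
    c1: float; c2: float   # collision elastic coefficients

# hypothetical standard values (placeholders for Fig. 4 of the patent)
STD = SideParams(m1=0.125, m2=0.025, k1=0.08, k2=0.008, kc=0.025, c1=0.24, c2=0.024)
PS0 = 8.0   # standard subglottal pressure (placeholder)

def corrected(std: SideParams, Q: float, Qc: float) -> SideParams:
    """Assumed correction: m = m0/Q, k = Q*k0 (likewise for the collision
    terms), kc = Qc*kc0 -- a hedged stand-in for equations (1)-(4)."""
    return SideParams(m1=std.m1 / Q, m2=std.m2 / Q,
                      k1=Q * std.k1, k2=Q * std.k2,
                      kc=Qc * std.kc,
                      c1=Q * std.c1, c2=Q * std.c2)

def corrected_pressure(Qp: float) -> float:
    """Assumed pressure correction: Ps = QP * Ps0."""
    return Qp * PS0

left = corrected(STD, Q=1.2, Qc=0.9)    # Ql, Qcl applied to the left cord
right = corrected(STD, Q=0.8, Qc=1.1)   # Qr, Qcr applied to the right cord
```

The point of the construction is that the inversion only searches the five-dimensional vector (Ql, Qr, Qcl, Qcr, QP) rather than every physical parameter individually.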
Step 3: extract the acoustic parameters of the glottal wave;
The first derivative of the glottal wave Ug is taken to obtain the glottal flow derivative Ug', and the glottal wave characteristic parameters are extracted from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'|·(toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the slope ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next period, and tm the instant of the Ug peak; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the peak value of the glottal wave.
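A minimal sketch of equations (5)–(9) on a synthetic glottal pulse (a raised-sine open phase); the time-point detection is deliberately simplistic and all waveform settings are illustrative, not the patent's:

```python
import numpy as np

fs = 10_000                      # sampling rate (Hz), illustrative
T0, open_frac = 0.01, 0.6        # 100 Hz pitch, 60% open phase
n = np.arange(2 * int(T0 * fs))  # two pitch periods
t_in = (n % int(T0 * fs)) / fs   # time within the current period
Ta = open_frac * T0
Ug = np.where(t_in < Ta, np.sin(np.pi * t_in / Ta) ** 2, 0.0)  # glottal wave
dUg = np.gradient(Ug) * fs                                     # Ug'

eps = 1e-6
to = np.argmax(Ug > eps)                     # glottal opening instant (samples)
toT = int(T0 * fs) + to                      # opening instant of the next period
tm = int(np.argmax(Ug[:toT]))                # instant of the Ug peak
tc = tm + int(np.argmax(Ug[tm:toT] < eps))   # glottal closure instant

T = (toT - to) / fs
F0 = 1 / T                                   # (5) fundamental frequency
OQ = (tc - to) / (toT - to)                  # (6) open quotient
CIQ = (tc - tm) / (toT - to)                 # (7) closing quotient
Sr = dUg.min() / dUg.max()                   # (8) slope ratio Ugc'/Ugr'
NAQ = Ug.max() / (abs(dUg.min()) * T)        # (9) normalized amplitude quotient
print(round(F0), round(OQ, 2), round(CIQ, 2))   # 100 0.59 0.3
```

For this symmetric pulse Sr comes out close to -1 and NAQ close to 0.19; on real glottal waves the open phase is skewed, which is exactly what Sr and NAQ quantify.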
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with the quasi-Newton method, specifically:
A. First generate an initial population and determine the maximum number of generations;
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best individual;
The fitness function is:
where subscript m denotes a model glottal wave characteristic parameter and subscript o the corresponding actual glottal wave characteristic parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population;
D. Refine each individual of the new population with the quasi-Newton algorithm to obtain the next generation;
E. Repeat operations A–D until the maximum number of generations is reached, and output the globally best individual.
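Steps A–E can be sketched as a hybrid loop: a genetic algorithm explores the parameter space, and every child is polished with BFGS (a quasi-Newton method). The objective below is a toy quadratic stand-in for the glottal-parameter matching error, and all GA settings here are illustrative, not the patent's:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def fitness(phi, target):
    # stand-in for the glottal-wave parameter matching error of the patent
    return float(np.sum((phi - target) ** 2))

def ga_qn(target, pop_size=20, n_gen=5, pc=0.7, pm=0.3, bounds=(0.0, 2.0)):
    dim = len(target)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, size=(pop_size, dim))       # A: initial population
    for _ in range(n_gen):
        f = np.array([fitness(ind, target) for ind in pop])   # B: fitness
        parents = pop[np.argsort(f)[:pop_size // 2]]          # C: selection
        children = []
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            child = a.copy()
            if rng.random() < pc:                             # C: crossover
                w = rng.random()
                child = w * a + (1 - w) * b
            if rng.random() < pm:                             # C: mutation
                child = child + rng.normal(0, 0.05, size=dim)
            children.append(np.clip(child, lo, hi))
        # D: quasi-Newton (BFGS) refinement of every individual
        pop = np.array([minimize(fitness, c, args=(target,), method="BFGS",
                                 options={"maxiter": 20}).x
                        for c in children])
    f = np.array([fitness(ind, target) for ind in pop])       # E: best individual
    return pop[np.argmin(f)], float(f.min())

best, err = ga_qn(np.array([0.9, 1.1, 1.0]))
print(err < 1e-8)   # True
```

The GA supplies global exploration (robust to the many local minima of a real vocal cord model), while the quasi-Newton step supplies fast local convergence.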
Step 5: perform voice classification;
Based on an analysis of variance of the characteristic parameters for different voice types, the parameters are combined with weights to define the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS:
Normal voice and special voice are distinguished by combining the two indices WAR and NWS. WAR describes the symmetry of the vocal cords: the smaller WAR is, the more asymmetric the vocal cords. NWS describes the degree of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
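The WAR and NWS formulas are given as images in the original and are not reproduced here; the sketch below only illustrates the combined two-index decision rule described above, with hypothetical threshold values:

```python
def classify(war: float, nws: float,
             war_thresh: float = 0.5, nws_thresh: float = 1.5) -> str:
    """Combined two-index rule (thresholds are illustrative, not the patent's):
    a low WAR signals strong left-right asymmetry, a high NWS signals strong
    bilateral abnormality; either condition flags the voice as special."""
    if war < war_thresh or nws > nws_thresh:
        return "special"
    return "normal"

print(classify(0.9, 0.2))   # normal
print(classify(0.2, 0.2))   # special (unilateral asymmetry)
print(classify(0.9, 3.0))   # special (bilateral abnormality)
```

Using both indices covers both failure modes — a unilaterally abnormal cord lowers WAR even when NWS stays moderate, and vice versa — which is why the patent reports higher accuracy for the combination than for either index alone.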
Compared with the prior art, the above technical scheme of the invention has the following technical effects:
After an actual speech signal is input, the extracted actual glottal wave serves as the inversion target, and a genetic algorithm inverts and optimizes the original model, so that the vocal cord vibration underlying different voices is simulated. Experimental results show that the relative matching error of each characteristic parameter after model inversion does not exceed 3.6%, indicating a good inversion effect. Normal voice and special voice selected for identification and analysis are recognized with high accuracy.
Brief description of the drawings
Fig. 1 is the flow chart of the invention.
Fig. 2 is the flow chart of the CCPD algorithm.
Fig. 3 is a diagram of the IFCM model.
Fig. 4 is a table of initial values of the model's physical parameters.
Fig. 5 is a plot of the normalized glottal wave Ug.
Fig. 6 is a waveform plot of the glottal flow derivative Ug'.
Fig. 7 is the flow chart of the GA-QN algorithm.
Fig. 8 is a table comparing glottal parameter errors after inversion for normal voice.
Fig. 9 is a table comparing glottal parameter errors after inversion for special voice.
Fig. 10 is a table of recognition results for normal and special voices using a single classification index.
Fig. 11 is a table of recognition results for normal and special voices using the two classification indices combined.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings:
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It should also be understood that terms such as those defined in ordinary dictionaries should be interpreted as having meanings consistent with their meaning in the context of the prior art and, unless defined as here, will not be interpreted in an idealized or overly formal sense.
The present invention obtains the actual glottal wave of the voice via complex cepstrum phase decomposition and uses it as the target glottal wave, applies an optimization algorithm to perform the vocal cord dynamics model inversion by matching the characteristic parameters of the target and model glottal waves, and proposes a voice classification criterion based on the optimal parameter vector obtained after the inversion. The specific steps are as follows:
Step 1: obtain the actual glottal wave;
The present invention estimates the glottal wave using complex cepstrum phase decomposition (CCPD). First, the pitch period of a frame of the speech signal is computed, and the glottal closure instants of the frame are located with the DYPSA (Dynamic Programming Projected Phase-Slope Algorithm); the closure instants correspond to the pitch periods, giving the exact closure position within each pitch period. The speech signal within each pitch period is then decomposed, via the cepstrum, into maximum-phase and minimum-phase signals, which are differentiated and recombined at the glottal closure instant: the maximum-phase component matches the open phase of the glottis and the minimum-phase component the closed phase. The interval before the closure instant is the open phase, so the maximum-phase signal is placed before the closure instant; the interval after it is the closed phase, so the minimum-phase signal is placed after it, giving an estimate of the glottal flow derivative. Integrating the glottal flow derivative realizes the glottal source estimate.
Step 2: determine the model optimization parameter vector;
The impact-force-corrected two-mass vocal cord vibration model (impact force correction model, IFCM) is established, the dynamic equations of the vibrating system are derived, and the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr, and the subglottal pressure scaling factor QP are introduced as the model optimization parameters.
where miα is the mass of each mass block, kiα its stiffness coefficient, and riα its damping coefficient, with i = 1, 2 denoting the lower and upper mass blocks and α = l, r the left and right vocal cords; the two mass blocks on each side are coupled to each other through the spring kcα; l is the vocal cord length, di the thickness of each mass block, kcα the coupling coefficient, and ciα the additional elastic coefficient that acts when the two vocal cords collide; ai denotes the glottal gap area and Ps the subglottal pressure.
The mathematical model parameters are corrected as follows:
where mil0, kil0, kcl0, and cil0 are the standard values of the mass, stiffness coefficient, coupling coefficient, and collision elastic coefficient for the left mass blocks; mir0, kir0, kcr0, and cir0 are the corresponding standard values for the right mass blocks; and Ps0 is the standard value of the subglottal pressure.
Step 3: extract the acoustic parameters of the glottal wave;
The first derivative of the glottal wave Ug is taken to obtain the glottal flow derivative Ug', and the glottal wave characteristic parameters are extracted from the main time points of Ug and Ug':
F0 = 1/(toT - to) (5)
OQ = (tc - to)/(toT - to) (6)
CIQ = (tc - tm)/(toT - to) (7)
Sr = Ugc'/Ugr' (8)
NAQ = Ugm/(|Ugc'|·(toT - to)) (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the slope ratio, and NAQ the normalized amplitude quotient; to is the glottal opening instant, tc the glottal closure instant, toT the opening instant of the next period, and tm the instant of the Ug peak; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' its maximum, and Ugm the peak value of the glottal wave.
Step 4: perform the model inversion using GA-QN, an optimization algorithm combining a genetic algorithm with the quasi-Newton method, specifically:
A. First generate an initial population and determine the maximum number of generations;
B. Compute the fitness f(Φi) of each individual in the population and take the individual with the highest fitness as the current best individual;
The fitness function is:
where subscript m denotes a model glottal wave characteristic parameter and subscript o the corresponding actual glottal wave characteristic parameter.
C. Apply selection, crossover, and mutation operators to the current population to obtain a new population;
D. Refine each individual of the new population with the quasi-Newton algorithm to obtain the next generation;
E. Repeat operations A–D until the maximum number of generations is reached, and output the globally best individual;
Step 5: perform voice classification;
Based on an analysis of variance of the characteristic parameters for different voice types, the present invention combines them with weights and proposes the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS:
Normal voice and special voice are distinguished by combining the two indices WAR and NWS. WAR describes the symmetry of the vocal cords: the smaller WAR is, the more asymmetric the vocal cords. NWS describes the degree of bilateral vocal cord abnormality: the larger NWS is, the more severe the bilateral abnormality.
Embodiment 1:
The flow of the invention is shown in Fig. 1. The glottal wave is first estimated with complex cepstrum phase decomposition (CCPD). The pitch period of a frame of the speech signal is computed, and the glottal closure instants of the frame are located with the DYPSA (Dynamic Programming Projected Phase-Slope Algorithm); the closure instants correspond to the pitch periods, giving the exact closure position within each pitch period. The speech signal within each pitch period is then decomposed, via the cepstrum, into maximum-phase and minimum-phase signals, which are differentiated and recombined at the glottal closure instant: the maximum-phase component matches the open phase of the glottis and the minimum-phase component the closed phase. The interval before the closure instant is the open phase, so the maximum-phase signal is placed before the closure instant; the interval after it is the closed phase, so the minimum-phase signal is placed after it, giving an estimate of the glottal flow derivative. Integrating the glottal flow derivative realizes the glottal source estimate; the concrete procedure is shown in Fig. 2.
The standard two-mass model shown in Fig. 3 is established: each vocal cord is represented by two mass blocks connected by springs kiα and dampers riα, where i = 1, 2 denote the lower and upper mass blocks and α = l, r the left and right vocal cords; the two mass blocks on each side are coupled to each other through an additional spring kcα. The model optimization parameters Ql, Qr, Qcl, Qcr, and QP are determined, and the model glottal wave is obtained by coupling in the glottal airflow; the initial values of the model's physical parameters are listed in Fig. 4.
The GA-QN algorithm is used to perform the vocal cord dynamics model inversion by matching the characteristic parameters of the target and model glottal waves; the operating procedure is shown in Fig. 7. The population size Np is set to 50, the number of iterations does not exceed 300, the crossover probability Pc is 0.7, and the mutation probability Pm is 0.3; the fitness function of each individual in the population is as follows:
where the glottal wave characteristic parameters are the fundamental frequency F0 = 1/(toT - to), the open quotient OQ = (tc - to)/(toT - to), the closing quotient CIQ = (tc - tm)/(toT - to), the slope ratio Sr = Ugc'/Ugr', and the normalized amplitude quotient NAQ = Ugm/(|Ugc'|·(toT - to)); the main time points in these formulas are shown in Fig. 5 and Fig. 6. Subscript m denotes a model glottal wave characteristic parameter and subscript o the corresponding actual glottal wave characteristic parameter.
The present invention uses the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set consists of the sustained vowel /a/. Normal and special voices were selected from the database, the special voices comprising two kinds: vocal cord polyp voice and vocal cord paralysis voice. From these samples, 40 normal voices and 40 special voices (20 with unilateral vocal cord abnormality and 20 with bilateral vocal cord abnormality) were chosen; the sampling frequency of the speech samples is 25 kHz.
The matching error of each glottal wave characteristic parameter for normal and special voices is shown in Fig. 8 and Fig. 9. As the figures show, compared with the traditional two-mass model (TMM), the matching error of each characteristic value under the proposed IFCM model is smaller and the inversion effect better, indicating that the IFCM model better reflects the actual vibration of the vocal cords during phonation. When the input voice is a special voice, the abnormality at the vocal cords reduces the degree of glottal closure and the regularity of the glottal wave, and the glottal frequency and amplitude vary, so the matching error of each characteristic parameter of the model is higher than for normal voice; that is, the inversion effect for abnormal-vocal-cord voice is slightly inferior to that for normal voice. Nevertheless, the matching error of every characteristic parameter is at most 1.95%, showing that the vocal cord model inversion proposed here works well and provides a sound basis for analyzing the optimized model parameters.
Fig. 10 shows the results of discriminating normal voice from special voice using the WAR index alone or the NWS index alone; Fig. 11 shows the results of combining the WAR and NWS indices to identify normal voice, special voice, and the side of the vocal cord abnormality, including the recognition rate and the Kappa index. The Kappa index describes the quality of the identification: the closer it is to 1, the better the recognition result. As the figures show, using the WAR index alone, the recognition rate for normal versus special voice is 86.25%, and using the NWS index alone it is 81.25%. Combining the two indices, the recognition accuracy reaches 97.50%. The experimental results show that the proposed method can effectively and comprehensively distinguish normal voice from special voice and the abnormal condition of the vocal cords.
The above are only some embodiments of the invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention.

Claims (6)

1. A voice classification method based on vocal cord dynamics model inversion, characterized by comprising:
Step 1: estimating the glottal wave using complex cepstrum phase decomposition (CCPD);
Step 2: establishing an impact-force-corrected two-mass vocal cord vibration model and determining the model optimization parameter vector;
Step 3: extracting the acoustic parameters of the glottal wave;
Step 4: performing the vocal cord dynamics model inversion with GA-QN, an optimization algorithm combining a genetic algorithm with the quasi-Newton method, by matching the characteristic parameters of the target and model glottal waves;
Step 5: performing voice classification;
wherein, based on an analysis of variance of the characteristic parameters for different voice types, the parameters are combined with weights to define the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling factor NWS, and normal voice is distinguished from special voice by combining the two indices WAR and NWS.
2. The voice classification method according to claim 1, characterized in that estimating the glottal wave using complex cepstrum phase decomposition (CCPD) in step 1 specifically comprises:
(1) first computing the pitch period of a frame of the speech signal, and locating the glottal closure instants of the frame with the DYPSA algorithm, the closure instants corresponding to the pitch periods so that the exact closure position within each pitch period is obtained;
(2) taking the speech signal within each pitch period, decomposing it into a maximum-phase signal and a minimum-phase signal using the cepstrum method, and differentiating the maximum- and minimum-phase signals;
(3) combining the differentiated maximum- and minimum-phase signals at the glottal closure instant, the maximum-phase signal being placed before the closure instant and the minimum-phase signal after it, to obtain an estimate of the glottal flow derivative;
(4) integrating the glottal flow derivative to realize the glottal source estimate.
3. The voice classification method according to claim 1, wherein step 3 establishes the impact-force-corrected two-mass vocal cord vibration model and determines the model optimization parameter vector, specifically:
Establish the impact-force-corrected two-mass vocal cord vibration model (IFCM), obtain the dynamic equations of the system vibration, and introduce the left vocal cord scaling factor Ql, the right vocal cord scaling factor Qr, the left coupling scaling factor Qcl, the right coupling scaling factor Qcr and the subglottal pressure scaling factor QP as the model optimization parameters;
where m_iα is the mass of each block, k_iα its stiffness coefficient and r_iα its damping coefficient, with i = 1, 2 indexing the lower and upper masses and α = l, r the left and right vocal cords; L is the vocal cord length, d_i the thickness of each mass, k_cα the coupling coefficient, and c_iα the additional elastic coefficient applied when the two vocal cords collide; a_i denotes the glottal gap area and P_i the subglottal pressure;
The mathematical model parameters are corrected as follows:
where m_il0, k_il0, k_cl0 and c_il0 are the standard values of the left masses' mass, stiffness coefficient, coupling coefficient and additional collision elastic coefficient; m_ir0, k_ir0, k_cr0 and c_ir0 are the corresponding standard values for the right masses; and P_i0 is the standard value of the subglottal pressure.
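As an illustration of the model structure, the sketch below time-steps a simplified asymmetric two-mass model. The patent's own IFCM equations and parameter-correction formulas appear only as images in the source, so the sketch assumes the standard Steinecke–Herzel convention (m → m/Q, k → Q·k) and textbook standard values; `simulate_ifcm` and all numeric constants are illustrative assumptions, not the patent's.

```python
import numpy as np

def simulate_ifcm(Ql=1.0, Qr=1.0, Qp=1.0, dt=0.005, steps=4000):
    """Euler time-stepping of a simplified asymmetric two-mass vocal cord
    model (Steinecke-Herzel style, cgs units with time in ms).
    The side factors scale the parameters as m -> m/Q, k -> Q*k, an
    assumption standing in for the patent's image-only corrections.
    Returns the lower glottal area a1(t), clipped at zero."""
    Q = np.array([Ql, Qr])                        # side factors [left, right]
    m = np.array([[0.125], [0.025]]) / Q          # masses m_{i,alpha}
    k = np.array([[0.08], [0.008]]) * Q           # stiffnesses k_{i,alpha}
    c = 3.0 * k                                   # collision stiffness c_{i,alpha}
    kc = 0.025 * Q                                # coupling coefficients k_{c,alpha}
    r = np.array([[0.02], [0.017]]) * np.ones(2)  # damping r_{i,alpha}
    a0 = np.array([0.05, 0.05])                   # rest areas a_{0i} (cm^2)
    L, d1, ps = 1.4, 0.25, Qp * 0.008             # cord length, thickness, scaled P_s
    x = np.full((2, 2), 0.01)                     # displacements x[i, alpha]
    v = np.zeros((2, 2))
    area = np.empty(steps)
    for t in range(steps):
        a = a0 + L * x.sum(axis=1)                # glottal areas a_i
        amin = a.min()
        # Bernoulli pressure drives the lower masses while the glottis is open
        p1 = ps * (1.0 - (max(amin, 0.0) / a[0]) ** 2) if a[0] > 0 else 0.0
        F = np.zeros((2, 2))
        F[0, :] = L * d1 * p1
        for i in range(2):                        # restoring impact force on collision
            if a[i] < 0:
                F[i, :] -= c[i, :] * a[i] / (2 * L)
        F += -kc * (x - x[::-1, :])               # lower<->upper coupling on each side
        acc = (F - r * v - k * x) / m
        v += dt * acc
        x += dt * v
        area[t] = max(a[0], 0.0)
    return area
```

Lowering Ql (or Qr) stiffens one side relative to the other, producing the left-right asymmetry that the WAR index of claim 6 is meant to capture.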
4. The voice classification method according to claim 1, wherein the acoustic parameters in the glottal wave are extracted in step 3 as follows:
Take the first derivative of the glottal wave Ug to obtain the glottal flow derivative Ug'; from the principal time instants of the glottal wave and the glottal flow derivative, extract the glottal wave characteristic parameters:
F0 = 1/(t_oT - t_o)   (5)
OQ = (t_c - t_o)/(t_oT - t_o)   (6)
CIQ = (t_c - t_m)/(t_oT - t_o)   (7)
Sr = Ugc'/Ugr'   (8)
NAQ = Ugm/(|Ugc'|·(t_oT - t_o))   (9)
where F0 is the fundamental frequency, OQ the open quotient, CIQ the closing quotient, Sr the slope ratio and NAQ the normalized amplitude quotient; t_o is the glottal opening instant, t_c the glottal closure instant, t_oT the opening instant of the next cycle, and t_m the instant of the peak of the glottal wave Ug; Ugc' is the minimum of the glottal flow derivative Ug', Ugr' the maximum of Ug', and Ugm the peak value of the glottal wave.
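Equations (5)-(9) translate directly into code once the opening instant t_o, closure instant t_c and next-cycle opening instant t_oT are known as sample indices (for example from the DYPSA alignment of claim 2). A minimal sketch:

```python
import numpy as np

def glottal_parameters(ug, fs, t_o, t_c, t_oT):
    """Compute F0, OQ, CIQ, Sr and NAQ of equations (5)-(9) for one
    glottal cycle; t_o, t_c and t_oT are the opening instant, closure
    instant and next-cycle opening instant given as sample indices."""
    T = (t_oT - t_o) / fs                     # pitch period in seconds
    dug = np.gradient(ug) * fs                # glottal flow derivative Ug'
    cycle = slice(t_o, t_oT)
    t_m = t_o + int(np.argmax(ug[cycle]))     # instant of the glottal wave peak
    F0 = 1.0 / T                                          # (5)
    OQ = (t_c - t_o) / (t_oT - t_o)                       # (6)
    CIQ = (t_c - t_m) / (t_oT - t_o)                      # (7)
    Sr = dug[cycle].min() / dug[cycle].max()              # (8) Ugc'/Ugr'
    NAQ = ug[t_m] / (abs(dug[cycle].min()) * T)           # (9)
    return F0, OQ, CIQ, Sr, NAQ
```

For a symmetric half-sine open phase, OQ equals the duty cycle of the pulse and CIQ half of it, which makes the two quotients easy to sanity-check on synthetic data.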
5. The voice classification method according to claim 4, wherein step 4 uses the optimization algorithm GA-QN, which combines a genetic algorithm with the quasi-Newton method, to invert the vocal cord dynamic model by matching the target and model glottal wave characteristic parameters, specifically:
A. Generate an initial population and set the maximum number of generations;
B. Compute the fitness f(Φi) of each individual in the population; the individual with the maximum fitness is the current-generation optimum;
The fitness function is:
where subscript m denotes the model glottal wave characteristic parameters and subscript o the measured glottal wave characteristic parameters;
C. Apply selection, crossover and mutation operators to the current population to obtain a new population;
D. Refine each individual of the new population with the quasi-Newton algorithm to obtain the next-generation population;
E. Repeat steps B-D until the maximum number of generations is reached, then output the global optimum individual.
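Steps A-E can be sketched as follows. Because the patent's fitness formula appears only as an image in the source, the sketch assumes the common form f = 1/(1 + cost), uses a toy quadratic cost in place of the glottal-parameter mismatch, and substitutes a small hand-written BFGS routine for the quasi-Newton step; all function names are our own.

```python
import numpy as np

def numgrad(f, x, h=1e-6):
    """Central-difference gradient (the quasi-Newton step needs one)."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def bfgs_refine(f, x, iters=20):
    """Minimal BFGS loop with Armijo backtracking, standing in for the
    quasi-Newton refinement of step D."""
    H = np.eye(len(x))
    g = numgrad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < 1e-8:
            break
        d = -H @ g
        t, fx, gd = 1.0, f(x), g @ d
        while f(x + t * d) > fx + 1e-4 * t * gd and t > 1e-10:
            t *= 0.5                          # backtracking line search
        s = t * d
        x_new = x + s
        g_new = numgrad(f, x_new)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                        # curvature condition: update H
            rho, I = 1.0 / sy, np.eye(len(x))
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

def ga_qn(cost, bounds, pop_size=12, generations=4, seed=0):
    """Hybrid GA + quasi-Newton search following steps A-E of claim 5."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))        # A: initial population
    for _ in range(generations):
        fit = 1.0 / (1.0 + np.array([cost(p) for p in pop]))   # B: fitness (assumed form)
        parents = pop[rng.choice(pop_size, size=(pop_size, 2), p=fit / fit.sum())]
        w = rng.uniform(size=(pop_size, 1))                    # C: selection, crossover,
        pop = w * parents[:, 0] + (1 - w) * parents[:, 1]      #    Gaussian mutation
        pop = np.clip(pop + rng.normal(0, 0.02, pop.shape) * (hi - lo), lo, hi)
        pop = np.array([bfgs_refine(cost, p) for p in pop])    # D: quasi-Newton step
    return min(pop, key=cost)                                  # E: global optimum
```

The GA supplies global exploration of the five Q factors while the quasi-Newton step supplies fast local convergence, which is the rationale for the hybrid.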
6. The voice classification method according to claim 3, wherein the left-right vocal cord weighted asymmetry degree WAR and the normalized weighted scaling factor NWS in step 5 are used as follows:
The two indices WAR and NWS are combined to distinguish normal voice from pathological voice: WAR describes the symmetry of the vocal cords, and the smaller WAR is, the higher their asymmetry; NWS describes the bilateral abnormality of the vocal cords, and the larger NWS is, the more severe the bilateral abnormality.
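Claim 6 states only the behaviour of WAR and NWS, not their formulas; purely as an illustration of how two such indices separate the classes, the sketch below assumes hypothetical forms (WAR as the min/max ratio of the side scaling factors, NWS as the mean deviation of both factors from their standard value 1) and a simple threshold rule. Both functions and both thresholds are assumptions, not the patent's definitions.

```python
def war_nws(Ql, Qr):
    """Hypothetical index forms matching the described behaviour:
    WAR = 1 for perfectly symmetric vocal cords and decreases with
    asymmetry; NWS = 0 for standard (normal) cords and grows as both
    sides deviate from their standard values."""
    war = min(Ql, Qr) / max(Ql, Qr)               # smaller WAR = more asymmetric
    nws = (abs(Ql - 1.0) + abs(Qr - 1.0)) / 2.0   # larger NWS = more abnormal
    return war, nws

def classify_voice(Ql, Qr, war_thr=0.9, nws_thr=0.1):
    """Threshold rule combining the two indices (thresholds illustrative)."""
    war, nws = war_nws(Ql, Qr)
    return "normal" if war >= war_thr and nws <= nws_thr else "pathological"
```
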
CN201810824379.6A 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion Active CN109119094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810824379.6A CN109119094B (en) 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion

Publications (2)

Publication Number Publication Date
CN109119094A true CN109119094A (en) 2019-01-01
CN109119094B CN109119094B (en) 2023-04-28

Family

ID=64863285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810824379.6A Active CN109119094B (en) 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion

Country Status (1)

Country Link
CN (1) CN109119094B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4343969A (en) * 1978-10-02 1982-08-10 Trans-Data Associates Apparatus and method for articulatory speech recognition
US20010020141A1 (en) * 1995-10-31 2001-09-06 Elvina Ivanovna Chahine Method of restoring speech functions in patients suffering from various forms of dysarthria, and dysarthria probes
US20080300867A1 (en) * 2007-06-03 2008-12-04 Yan Yuling System and method of analyzing voice via visual and acoustic data
CN101916566A (en) * 2010-07-09 2010-12-15 西安交通大学 Electronic larynx speech reconstructing method and system thereof
CN103730130A (en) * 2013-12-20 2014-04-16 中国科学院深圳先进技术研究院 Detection method and system for pathological voice
CN103778913A (en) * 2014-01-22 2014-05-07 苏州大学 Pathological voice recognition method
CN105359211A (en) * 2013-09-09 2016-02-24 华为技术有限公司 Unvoiced/voiced decision for speech processing
CN108133713A (en) * 2017-11-27 2018-06-08 苏州大学 Method for estimating sound channel area under glottic closed phase

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HUIQUN DENG ET AL.: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 *
曾晓亮 (ZENG Xiaoliang) et al.: "Pathological voice classification using the vocal cord dynamics model parameter inversion method", 《声学学报》 (Acta Acustica) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110870765A (en) * 2019-06-27 2020-03-10 上海慧敏医疗器械有限公司 Voice treatment instrument and method adopting glottis closing real-time measurement and audio-visual feedback technology
CN111081273A (en) * 2019-12-31 2020-04-28 湖南景程电子科技有限公司 Voice emotion recognition method based on glottal wave signal feature extraction
CN112201226A (en) * 2020-09-28 2021-01-08 复旦大学 Sound production mode judging method and system
CN112201226B (en) * 2020-09-28 2022-09-16 复旦大学 Sound production mode judging method and system
CN112562650A (en) * 2020-10-31 2021-03-26 苏州大学 Voice recognition classification method based on vocal cord characteristic parameters
CN113012716A (en) * 2021-02-26 2021-06-22 武汉星巡智能科技有限公司 Method, device and equipment for identifying baby cry category
CN113012716B (en) * 2021-02-26 2023-08-04 武汉星巡智能科技有限公司 Infant crying type identification method, device and equipment
CN116631443A (en) * 2021-02-26 2023-08-22 武汉星巡智能科技有限公司 Infant crying type detection method, device and equipment based on vibration spectrum comparison
CN116631443B (en) * 2021-02-26 2024-05-07 武汉星巡智能科技有限公司 Infant crying type detection method, device and equipment based on vibration spectrum comparison

Also Published As

Publication number Publication date
CN109119094B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
CN109119094A (en) Voice classification method by utilizing vocal cord modeling inversion
Drugman et al. Glottal source processing: From analysis to applications
CN102664016B (en) Singing evaluation method and system
CN102521281B (en) Humming computer music searching method based on longest matching subsequence algorithm
CN105023570B (en) A kind of method and system for realizing sound conversion
CN109243494A (en) Childhood emotional recognition methods based on the long memory network in short-term of multiple attention mechanism
Ewert et al. Piano transcription in the studio using an extensible alternating directions framework
Bhagatpatil et al. An automatic infant’s cry detection using linear frequency cepstrum coefficients (LFCC)
Zhang et al. Multiple vowels repair based on pitch extraction and line spectrum pair feature for voice disorder
Perez-Carrillo et al. Indirect acquisition of violin instrumental controls from audio signal with hidden Markov models
Narendra et al. Estimation of the glottal source from coded telephone speech using deep neural networks
CN111081273A (en) Voice emotion recognition method based on glottal wave signal feature extraction
Cummings et al. Glottal models for digital speech processing: A historical survey and new results
Le et al. Personalized speech enhancement combining band-split rnn and speaker attentive module
Little et al. Biomechanically informed nonlinear speech signal processing
Parlak et al. Harmonic differences method for robust fundamental frequency detection in wideband and narrowband speech signals
Gao Audio deepfake detection based on differences in human and machine generated speech
Albornoz et al. Snore recognition using a reduced set of spectral features
Zheng et al. Throat microphone speech enhancement via progressive learning of spectral mapping based on lstm-rnn
Zhang et al. Pathological voice classification based on the features of an asymmetric fluid–structure interaction vocal cord model
Patil et al. Combining evidence from spectral and source-like features for person recognition from humming
Lv et al. Objective evaluation method of broadcasting vocal timbre based on feature selection
Nhu et al. Singing performance of the talking robot with newly redesigned artificial vocal cords
Bai et al. Glottal Features Under Workload in Human-Robot Interaction
Yan et al. A Dual-Mode Real-Time Lip-Sync System for a Bionic Dinosaur Robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant