CN109119094B - Vocal classification method using vocal cord modeling inversion - Google Patents

Vocal classification method using vocal cord modeling inversion

Info

Publication number: CN109119094B
Application number: CN201810824379.6A
Authority: CN (China)
Legal status: Active (granted)
Prior art keywords: glottal, vocal cord, voice, vocal, wave
Other languages: Chinese (zh)
Other versions: CN109119094A
Inventors: 孙宝印, 陶智, 陈莉媛, 张晓俊, 吴迪, 肖仲喆
Current Assignee: Suzhou University
Original Assignee: Suzhou University
Priority/Filing date: 2018-07-25
Publication of CN109119094A: 2019-01-01
Grant/Publication of CN109119094B: 2023-04-28

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03: characterised by the type of extracted parameters
    • G10L25/24: characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/27: characterised by the analysis technique
    • G10L25/39: characterised by the analysis technique, using genetic algorithms
    • G10L25/48: specially adapted for particular use
    • G10L25/51: specially adapted for particular use, for comparison or discrimination


Abstract

The invention discloses a voice classification method using vocal cord model inversion, which effectively distinguishes different voice types from the perspective of the phonation mechanism. The method mainly uses complex cepstrum phase decomposition to obtain the actual glottal wave as the target glottal wave, applies an optimization algorithm to invert a vocal cord dynamics model by matching the characteristic parameters of the target and model glottal waves, and selects normal voices and special voices for recognition and classification with good accuracy. According to the invention, after the actual voice signal is input, the actual glottal wave is extracted as the target, and the original model is optimized by inversion with a genetic algorithm, so that the vocal cord vibration of different voice types is simulated. Experimental results show that the relative matching error of the characteristic parameters after model inversion is no more than 1.95%, indicating a good inversion effect. Normal voices and special voices were selected for recognition and analysis, and the method shows high accuracy.

Description

Vocal classification method using vocal cord modeling inversion
Technical Field
The invention relates to the field of voice classification, and in particular to a voice classification method based on vocal cord model inversion.
Background
Voice classification technology performs feature analysis on voice signals to distinguish different voice types, and can be applied to emotional voice analysis, voice quality evaluation, and the like. Voice quality directly affects a person's verbal expression and is especially important for teachers, broadcasters, and singers; after prolonged voice use, or under tension and stress, the voice can change and even become hoarse.
At present, acoustic analysis methods of voice are widely used, but they provide only the acoustic information of the voice: they cannot be linked to the physiological structures of the actual phonation system and cannot supply a good classification standard, so large errors remain. Research shows that when the same sound is produced, some special voices (such as vocal cord nodule, vocal cord polyp, and hyperthyroidism voices) exhibit vocal cord vibration different from normal, and the vocal cord vibration corresponding to voices under different emotions also differs. Since the vocal cords are one of the key systems in voice production and their vibration directly affects voice quality, vocal cord modeling is combined here with voice analysis to simulate the glottal waves output when the vocal cords are in different states, and thereby classify voices.
When voice classification is performed with vocal cord modeling, the model outputs a glottal wave that is meant to reproduce the actual voice, and the voice is classified on that basis. In practice, directly setting the physical parameters of the model makes it difficult to simulate a glottal wave that matches the actual voice signal, which greatly affects the setting of the subsequent classification standard.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the defects of the background art, the invention corrects the collision force of the traditional vocal cord two-mass model and matches and optimizes the model with an inversion algorithm, so as to accurately simulate the physiology of the actual vocal cord vibration and thereby classify normal voice and special voice.
The invention adopts the following technical scheme for solving the technical problems:
the invention provides a voice classification method based on vocal cord dynamics model inversion, which specifically comprises the following steps:
step 1, estimating a glottal wave by utilizing Complex Cepstrum Phase Decomposition (CCPD), which specifically comprises the following steps:
(1) Firstly, the pitch period of a frame of the voice signal is obtained, and the glottal closing points of the frame are located with the DYPSA algorithm; the closing points are matched to the pitch periods to obtain the specific position of the glottal closing point in each pitch period;
(2) The voice signal within each pitch period is obtained and decomposed into a maximum phase signal and a minimum phase signal by the complex cepstrum method, and both are differentiated;
(3) The differentiated maximum and minimum phase signals are combined at the position of the glottal closing point, the maximum phase signal placed before the closing point and the minimum phase signal after it, to obtain the differential glottal wave estimate;
(4) The differential glottal wave is integrated to realize the glottal wave estimation (a minimal code sketch of the phase split follows).
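To make the phase split in sub-steps (2)-(3) concrete, the following is a minimal Python sketch of the complex-cepstrum decomposition of one glottal cycle. It assumes the cycle has already been segmented at the glottal closing points (e.g. by DYPSA); the function name and the simplifications (no linear-phase removal, no windowing) are illustrative, not the patent's exact procedure:

```python
import numpy as np

def ccpd_split(cycle):
    """Split one glottal cycle into maximum-phase (open-phase) and
    minimum-phase (closed-phase) components via the complex cepstrum.
    Minimal sketch: linear-phase removal and windowing are omitted."""
    n = len(cycle)
    nfft = 4 * n                                   # zero-pad for a smoother unwrapped phase
    spec = np.fft.fft(cycle, nfft)
    log_spec = np.log(np.abs(spec) + 1e-12) + 1j * np.unwrap(np.angle(spec))
    cc = np.real(np.fft.ifft(log_spec))            # complex cepstrum of the cycle

    cc_min = np.zeros(nfft)
    cc_max = np.zeros(nfft)
    cc_min[:nfft // 2] = cc[:nfft // 2]            # causal quefrencies: minimum phase
    cc_max[nfft // 2:] = cc[nfft // 2:]            # anticausal quefrencies: maximum phase

    def invert(c):                                 # back from cepstrum to a time signal
        return np.real(np.fft.ifft(np.exp(np.fft.fft(c))))[:n]

    return invert(cc_max), invert(cc_min)

# Per sub-steps (2)-(4): the two components are differentiated (np.diff),
# placed before and after the glottal closing point respectively, and the
# assembled differential glottal wave is integrated (np.cumsum).
```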
Step 2, establishing a collision force modified vocal cord two-mass vibration model, and determining a model optimization parameter vector; the method comprises the following steps:
establishing the collision force modified vocal cord two-mass vibration model IFCM to obtain the dynamic equation of the system vibration, and introducing the left vocal cord scaling coefficient Q_l, the right vocal cord scaling coefficient Q_r, the left vocal cord coupling scaling coefficient Q_cl, the right vocal cord coupling scaling coefficient Q_cr, and the subglottal pressure scaling coefficient Q_P as the model optimization parameters;
m_iα·ẍ_iα + r_iα·ẋ_iα + k_iα·x_iα + Θ(-a_i)·c_iα·a_i/(2l) + k_cα·(x_iα - x_jα) = P_i·l·d_i,  i = 1, 2, j = 3 - i, α = l, r  (1)
wherein m_iα is the mass of the mass blocks on the two sides, k_iα the stiffness coefficient, and r_iα the damping coefficient, with i = 1, 2 denoting the upper and lower mass blocks and α = l, r the left and right sides; x_iα is the lateral displacement of each mass block and Θ(·) the unit step function, so the collision term acts only during collision (a_i < 0); l is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the vocal cords on the two sides collide; a_i is the glottal gap area, P_i the glottal pressure acting on mass block i, and P_s the subglottal pressure;
the model-related parameters are modified as follows:
m_il = m_il0/Q_l,  k_il = Q_l·k_il0,  k_cl = Q_cl·k_cl0,  c_il = Q_l·c_il0  (2)
m_ir = m_ir0/Q_r,  k_ir = Q_r·k_ir0,  k_cr = Q_cr·k_cr0,  c_ir = Q_r·c_ir0  (3)
P_s = Q_P·P_s0  (4)
wherein m_il0, k_il0, k_cl0, and c_il0 respectively denote the standard values of the left mass block's mass, stiffness coefficient, coupling coefficient, and additional elastic coefficient for collision of the two vocal cords; m_ir0, k_ir0, k_cr0, and c_ir0 denote the corresponding standard values for the right mass block; and P_s0 is the standard value of the subglottal pressure (a sketch of this scaling follows).
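Since equations (2)-(4) are rendered only as images in the original, the sketch below encodes one plausible reading of the scaling, assuming the convention that is standard for two-mass models (mass divided by its Q factor, stiffness and collision coefficients multiplied by it); the function names are illustrative:

```python
def scale_left(m0, k0, kc0, c0, Ql, Qcl):
    """Equation (2), assumed form: scale the left-side standard values."""
    return m0 / Ql, Ql * k0, Qcl * kc0, Ql * c0

def scale_right(m0, k0, kc0, c0, Qr, Qcr):
    """Equation (3), assumed form: scale the right-side standard values."""
    return m0 / Qr, Qr * k0, Qcr * kc0, Qr * c0

def scale_pressure(Ps0, Qp):
    """Equation (4), assumed form: scale the standard subglottal pressure."""
    return Qp * Ps0
```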
Step 3, extracting acoustic parameters in the glottal wave;
taking the first derivative of the glottal wave Ug to obtain the glottal wave derivative Ug', and extracting the glottal wave characteristic parameters from the main time points of the glottal wave and its derivative:
F0 = 1/(to_T - to)  (5)
OQ = (tc - to)/(to_T - to)  (6)
CIQ = (tc - tm)/(to_T - to)  (7)
Sr = Ugc'/Ugr'  (8)
NAQ = Ugm/(|Ugc'|·(to_T - to))  (9)
wherein F0 represents the fundamental frequency, OQ the open quotient, CIQ the closed quotient, Sr the skew ratio, and NAQ the normalized amplitude quotient; to is the glottal opening time, to_T the glottal opening time of the next cycle (so that to_T - to is one pitch period), tc the glottal closing time, tm the time corresponding to the peak of the glottal wave Ug, Ugc' the minimum of the glottal wave derivative Ug', Ugr' the maximum of Ug', and Ugm the glottal wave peak (a code sketch of this extraction follows).
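The sketch below implements equations (5)-(9) directly; it assumes the glottal wave Ug is a NumPy array sampled at fs and that the time points to, tc, tm, and to_T are given as sample indices (locating them is the GCI detection of step 1):

```python
import numpy as np

def glottal_features(ug, fs, to, tc, tm, to_T):
    """Extract F0, OQ, CIQ, Sr, NAQ from one cycle of the glottal wave Ug
    and its derivative Ug', following equations (5)-(9)."""
    T = (to_T - to) / fs            # pitch period in seconds
    dug = np.diff(ug) * fs          # glottal wave derivative Ug'
    ugc = dug.min()                 # Ugc': minimum of Ug' (closing peak)
    ugr = dug.max()                 # Ugr': maximum of Ug' (opening peak)
    ugm = ug.max()                  # Ugm: glottal wave peak
    return {
        "F0":  1.0 / T,                       # (5) fundamental frequency
        "OQ":  (tc - to) / (to_T - to),       # (6) open quotient
        "CIQ": (tc - tm) / (to_T - to),       # (7) closed quotient
        "Sr":  ugc / ugr,                     # (8) skew ratio
        "NAQ": ugm / (abs(ugc) * T),          # (9) normalized amplitude quotient
    }
```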
Step 4, performing the model inversion operation with the optimization algorithm GA-QN, which combines a genetic algorithm and a quasi-Newton method; the model inversion operation comprises the following steps (see the sketch after this list):
A. Firstly, generate an initial population and determine the maximum number of generations;
B. Calculate the fitness f(φ_i) of each individual φ_i in the population, and take the individual with the greatest fitness as the current best individual;
the fitness function is:
f(φ_i) = [ Σ_p ((p_m - p_o)/p_o)² ]^(-1),  p ∈ {F0, OQ, CIQ, Sr, NAQ}  (10)
wherein the subscript m represents the model glottal wave characteristic parameter and the subscript o represents the actual glottal wave characteristic parameter.
C. Apply selection, crossover, and mutation to the current population to obtain a new population;
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next-generation population;
E. Repeat operations B-D until the maximum number of generations is reached, and output the globally best individual.
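A compact Python sketch of the GA-QN loop of steps A-E is given below. The selection, crossover, and mutation operators are illustrative choices (the patent does not fix them), and SciPy's BFGS is used as the quasi-Newton refinement:

```python
import numpy as np
from scipy.optimize import minimize

def ga_qn(fitness, bounds, np_pop=50, max_gen=300, pc=0.7, pm=0.3, seed=0):
    """Hybrid genetic algorithm + quasi-Newton optimizer (maximizes `fitness`)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pop = rng.uniform(lo, hi, size=(np_pop, len(lo)))        # A. initial population
    for _ in range(max_gen):
        fit = np.array([fitness(ind) for ind in pop])        # B. evaluate fitness
        new_pop = []
        for _ in range(np_pop):                              # C. selection / crossover / mutation
            i, j = rng.integers(np_pop, size=2)
            parent = pop[i] if fit[i] > fit[j] else pop[j]   # tournament selection
            mate = pop[rng.integers(np_pop)]
            cross = rng.random(len(lo)) < pc                 # per-gene arithmetic crossover
            child = np.where(cross, 0.5 * (parent + mate), parent)
            mut = rng.random(len(lo)) < pm                   # uniform mutation
            child[mut] = rng.uniform(lo[mut], hi[mut])
            new_pop.append(np.clip(child, lo, hi))
        pop = np.array([                                     # D. quasi-Newton (BFGS) refinement
            np.clip(minimize(lambda x: -fitness(x), ind,
                             method="BFGS", options={"maxiter": 5}).x, lo, hi)
            for ind in new_pop
        ])
    fit = np.array([fitness(ind) for ind in pop])            # E. return the best individual
    return pop[int(fit.argmax())]
```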
Step 5, voice classification is carried out;
according to the analysis of the differences of each characteristic parameter across different voice types, the optimized parameters are weighted and combined to give the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling coefficient NWS:
WAR: left-right vocal cord weighted asymmetry ratio, a weighted combination of the optimized left and right parameters (the formula is rendered only as an image in the original)  (11)
NWS: normalized weighted scaling coefficient, a normalized weighted combination of the optimized scaling parameters (the formula is rendered only as an image in the original)  (12)
the WAR and the NWS are used in combination to distinguish normal voice from special voice: the WAR describes the degree of symmetry of the vocal cords, and the smaller the WAR, the higher the degree of asymmetry; the NWS describes the degree of bilateral abnormality of the vocal cords, and the greater the NWS, the more severe the bilateral abnormality (the decision rule is sketched below).
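Because the WAR and NWS formulas themselves are given only as images, the sketch below captures just the decision logic the text states (smaller WAR means more asymmetric, larger NWS means more severe bilateral abnormality); the threshold values are placeholders to be fit on training data, not values from the patent:

```python
def classify_voice(war, nws, war_thresh=0.9, nws_thresh=0.1):
    """Two-index decision rule combining WAR and NWS.
    The thresholds are illustrative assumptions."""
    if war < war_thresh:
        return "special voice (unilateral vocal cord asymmetry)"
    if nws > nws_thresh:
        return "special voice (bilateral vocal cord abnormality)"
    return "normal voice"
```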
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
according to the invention, after the actual voice signal is input, the actual glottal wave is extracted as the target, and the original model is optimized by inversion with a genetic algorithm, so that the vocal cord vibration of different voice types is simulated. Experimental results show that the relative matching error of the characteristic parameters after model inversion is no more than 1.95%, indicating a good inversion effect. Normal voices and special voices were selected for recognition and analysis, and the method shows high accuracy.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a CCPD algorithm flow chart.
Fig. 3 is an IFCM model diagram.
Fig. 4 is a table of model physical parameter initial values.
Fig. 5 is a normalized glottal wave Ug graph.
Fig. 6 is a waveform diagram of the glottal wave derivative Ug'.
FIG. 7 is a flow chart of the GA-QN algorithm.
Fig. 8 is a table of glottal parameter error comparisons after normal voice inversion.
Fig. 9 is a table of glottal parameter error comparisons after special vocal inversion.
Fig. 10 is a table of normal voice and special voice recognition cases using a single classification index.
Fig. 11 is a table of recognition cases of normal voice and special voice using a combination of double classification indexes.
Detailed Description
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
it will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
According to the invention, the actual glottal wave of the voiced sound is obtained by complex cepstrum phase decomposition and used as the target glottal wave; the vocal cord dynamics model inversion operation is carried out by matching the characteristic parameters of the target and model glottal waves with an optimization algorithm, and the voice classification standard is derived from the optimal parameter vector after inversion optimization. The specific steps are as follows:
step 1: acquiring an actual glottal wave;
the present invention utilizes complex cepstral phase decomposition (complex cepstrumphase decomposition, CCPD) to estimate the glottal wave. Firstly, the pitch period of a frame of voice signal is obtained, the position of a glottal closing point of the frame of voice signal is obtained through a DYPSA (Dynamic Programming Projected Phase-Slope Algorithm) Algorithm, the position of the glottal closing point corresponds to the pitch period, and the specific position of the glottal closing point in each pitch period is obtained. The voice signal in each pitch period is obtained, the voice signal in the period is decomposed into a maximum phase signal and a minimum phase signal by adopting a complex cepstrum method and is differentiated, the maximum phase signal and the minimum phase signal are combined with the position of a glottal closing point, the maximum phase component is matched with a glottal opening, and the minimum phase component is matched with a glottal closing point. Before the glottis closing point is the glottis opening phase, the maximum phase signal is placed before the glottis closing point; the glottal closing point is followed by a glottal closing phase, and the minimum phase signal is placed at the glottal closing point to obtain differential glottal wave estimation; and integrating the differential glottal wave to realize glottal wave estimation.
Step 2: determining a model optimization parameter vector;
establishing the collision force modified vocal cord two-mass vibration model (impact force correction model, IFCM) to obtain the dynamic equation of the system vibration, and introducing the left vocal cord scaling coefficient Q_l, the right vocal cord scaling coefficient Q_r, the left vocal cord coupling scaling coefficient Q_cl, the right vocal cord coupling scaling coefficient Q_cr, and the subglottal pressure scaling coefficient Q_P as the model optimization parameters.
m_iα·ẍ_iα + r_iα·ẋ_iα + k_iα·x_iα + Θ(-a_i)·c_iα·a_i/(2l) + k_cα·(x_iα - x_jα) = P_i·l·d_i,  i = 1, 2, j = 3 - i, α = l, r  (1)
Wherein m_iα is the mass of the mass blocks on the two sides, k_iα the stiffness coefficient, and r_iα the damping coefficient, with i = 1, 2 denoting the upper and lower mass blocks and α = l, r the left and right sides; x_iα is the lateral displacement of each mass block and Θ(·) the unit step function, so the collision term acts only during collision (a_i < 0); l is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the vocal cords on the two sides collide; a_i is the glottal gap area, P_i the glottal pressure acting on mass block i, and P_s the subglottal pressure.
The model-related parameters are modified as follows:
m_il = m_il0/Q_l,  k_il = Q_l·k_il0,  k_cl = Q_cl·k_cl0,  c_il = Q_l·c_il0  (2)
m_ir = m_ir0/Q_r,  k_ir = Q_r·k_ir0,  k_cr = Q_cr·k_cr0,  c_ir = Q_r·c_ir0  (3)
P_s = Q_P·P_s0  (4)
wherein m_il0, k_il0, k_cl0, and c_il0 respectively denote the standard values of the left mass block's mass, stiffness coefficient, coupling coefficient, and additional elastic coefficient for collision of the two vocal cords; m_ir0, k_ir0, k_cr0, and c_ir0 denote the corresponding standard values for the right mass block; and P_s0 is the standard value of the subglottal pressure.
Step 3, extracting acoustic parameters in the glottal wave;
taking the first derivative of the glottal wave Ug to obtain the glottal wave derivative Ug', and extracting the glottal wave characteristic parameters from the main time points of the glottal wave and its derivative:
F0 = 1/(to_T - to)  (5)
OQ = (tc - to)/(to_T - to)  (6)
CIQ = (tc - tm)/(to_T - to)  (7)
Sr = Ugc'/Ugr'  (8)
NAQ = Ugm/(|Ugc'|·(to_T - to))  (9)
wherein F0 represents the fundamental frequency, OQ the open quotient, CIQ the closed quotient, Sr the skew ratio, and NAQ the normalized amplitude quotient; to is the glottal opening time, to_T the glottal opening time of the next cycle, tc the glottal closing time, tm the time corresponding to the peak of the glottal wave Ug, Ugc' the minimum of the glottal wave derivative Ug', Ugr' the maximum of Ug', and Ugm the glottal wave peak.
Step 4, performing the model inversion operation with the optimization algorithm GA-QN, which combines a genetic algorithm and a quasi-Newton method; the model inversion operation comprises the following steps:
A. Firstly, generate an initial population and determine the maximum number of generations;
B. Calculate the fitness f(φ_i) of each individual φ_i in the population, and take the individual with the greatest fitness as the current best individual;
the fitness function is:
f(φ_i) = [ Σ_p ((p_m - p_o)/p_o)² ]^(-1),  p ∈ {F0, OQ, CIQ, Sr, NAQ}  (10)
wherein the subscript m represents the model glottal wave characteristic parameter and the subscript o represents the actual glottal wave characteristic parameter.
C. Apply selection, crossover, and mutation to the current population to obtain a new population;
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next-generation population;
E. Repeat operations B-D until the maximum number of generations is reached, and output the globally best individual;
step 5, voice classification is carried out;
according to the analysis of the differences of each characteristic parameter across different voice types, the invention combines the optimized parameters with weights and proposes the left-right vocal cord weighted asymmetry ratio WAR and the normalized weighted scaling coefficient NWS:
WAR: left-right vocal cord weighted asymmetry ratio, a weighted combination of the optimized left and right parameters (the formula is rendered only as an image in the original)  (11)
NWS: normalized weighted scaling coefficient, a normalized weighted combination of the optimized scaling parameters (the formula is rendered only as an image in the original)  (12)
the WAR and NWS indicators are used in combination to distinguish between normal voice and special voice. The WAR describes the degree of symmetry of the vocal cords, the smaller the WAR, the higher the degree of asymmetry of the vocal cords; the NWS describes the degree of bilateral abnormalities of the vocal cords, the greater the NWS the more severe the degree of bilateral abnormalities of the vocal cords.
Example 1:
the flow of the present invention is shown in fig. 1, where the glottal wave is estimated by complex cepstrum phase decomposition (CCPD). Firstly, the pitch period of a frame of the voice signal is obtained, and the glottal closing points are located with the DYPSA (Dynamic Programming Projected Phase-Slope Algorithm) algorithm; the closing points are matched to the pitch periods to obtain the specific position of the glottal closing point in each pitch period. The voice signal within each pitch period is decomposed into a maximum phase signal and a minimum phase signal by the complex cepstrum method and differentiated; the differentiated signals are combined at the position of the glottal closing point, the maximum phase component corresponding to the glottal opening and the minimum phase component to the glottal closure, so the maximum phase signal is placed before the closing point and the minimum phase signal after it, giving the differential glottal wave estimate. Integrating the differential glottal wave yields the glottal wave estimate; the specific operation flow is shown in fig. 2.
A standard two-mass model is built as shown in fig. 3: each vocal cord is represented by two mass blocks connected through springs k_iα and dampers r_iα, where i = 1, 2 denote the upper and lower mass blocks and α = l, r the left and right sides; the two mass blocks on the same side are coupled to each other by an additional spring k_cα. The model optimization parameters Q_l, Q_r, Q_cl, Q_cr, and Q_P are determined, and the model glottal wave is obtained by coupling in the glottal airflow; the initial values of the physical parameters of the model are listed in fig. 4.
The vocal cord dynamics model inversion operation is carried out with the GA-QN algorithm by matching the characteristic parameters of the target and model glottal waves; the operation flow is shown in fig. 7. The population size Np is set to 50, the number of iterations does not exceed 300, the crossover probability Pc is 0.7, and the mutation probability Pm is 0.3; the fitness function of each individual in the population is:
f(φ_i) = [ Σ_p ((p_m - p_o)/p_o)² ]^(-1),  p ∈ {F0, OQ, CIQ, Sr, NAQ}  (10)
wherein the glottal characteristic parameters are the fundamental frequency F0 = 1/(to_T - to), the open quotient OQ = (tc - to)/(to_T - to), the closed quotient CIQ = (tc - tm)/(to_T - to), the skew ratio Sr = Ugc'/Ugr', and the normalized amplitude quotient NAQ = Ugm/(|Ugc'|·(to_T - to)); the main time points are shown in figs. 5 and 6. The subscript m denotes a model glottal wave characteristic parameter and the subscript o the corresponding actual parameter (a configuration sketch with the settings above follows).
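Wiring the embodiment's settings (Np = 50, at most 300 generations, Pc = 0.7, Pm = 0.3) to the earlier ga_qn sketch might look as follows. The reciprocal-of-squared-relative-error fitness is an assumed reading of equation (10), and model_features and the parameter bounds are illustrative:

```python
FEATURES = ("F0", "OQ", "CIQ", "Sr", "NAQ")

def make_fitness(target, model_features):
    """target: measured glottal features (subscript o); model_features(q)
    returns the model's features (subscript m) for q = (Ql, Qr, Qcl, Qcr, Qp)."""
    def fitness(q):
        feats = model_features(q)
        err = sum(((feats[k] - target[k]) / target[k]) ** 2 for k in FEATURES)
        return 1.0 / (1e-9 + err)     # larger fitness = closer parameter match
    return fitness

# best_q = ga_qn(make_fitness(target, model_features),
#                bounds=[(0.5, 2.0)] * 5,   # illustrative search ranges for the Q's
#                np_pop=50, max_gen=300, pc=0.7, pm=0.3)
```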
The invention adopts the MEEI (Massachusetts Eye and Ear Infirmary) database. The test material is the sustained vowel /a/; normal voices and special voices were selected from the database, the special voices comprising vocal cord polyp voices and vocal cord paralysis voices. From these samples, 40 normal voices and 40 special voices (20 with unilateral vocal cord abnormality and 20 with bilateral vocal cord abnormality) were selected; the sampling frequency of the voice samples is 25 kHz.
The glottal parameter matching errors for normal voice and special voice are shown in figs. 8 and 9. Compared with the traditional two-mass model (TMM), the matching error of each characteristic value for the proposed IFCM model is smaller and the inversion effect is better, showing that the IFCM model better reflects the actual vibration of the vocal cords during phonation. When the input voice is a special voice, the voice abnormality reduces the degree of glottal closure and the regularity of the glottal wave, and the fundamental frequency and amplitude of the glottal wave deviate; the matching errors of the model's characteristic parameters are therefore higher than for normal voice, and the inversion effect for abnormal voices is slightly inferior. Nevertheless, the matching error of all characteristic parameters is at most 1.95%, so the model inversion works well and provides a sound basis for analyzing the optimized model parameters.
Fig. 10 shows the results of identifying normal voice and special voice using the WAR index or the NWS index alone, and fig. 11 shows the results of identifying the abnormal vocal cord positions of normal and special voices using the WAR and NWS indexes in combination, including the recognition rate and the Kappa index. The Kappa index describes the quality of the recognition: the closer its value is to 1, the better the recognition result. With the WAR index alone, the recognition rate for normal and special voices is 86.25%; with the NWS index alone, it is 81.25%. Combining the two indexes to identify normal and special voices raises the accuracy of the recognition result to 97.50%. The experimental results show that the proposed method can effectively and comprehensively distinguish normal voices from special voices with abnormal vocal cords.
The foregoing is only a partial embodiment of the present invention, and it should be noted that it will be apparent to those skilled in the art that modifications and adaptations can be made without departing from the principles of the present invention, and such modifications and adaptations are intended to be comprehended within the scope of the present invention.

Claims (4)

1. A voice classification method based on vocal cord dynamics model inversion, comprising:
step 1, estimating a glottal wave by utilizing Complex Cepstrum Phase Decomposition (CCPD), which specifically comprises the following steps:
(1) Firstly, the pitch period of a frame of the voice signal is obtained, and the glottal closing points of the frame are located with the DYPSA algorithm; the closing points are matched to the pitch periods to obtain the specific position of the glottal closing point in each pitch period;
(2) The voice signal within each pitch period is obtained and decomposed into a maximum phase signal and a minimum phase signal by the complex cepstrum method, and both are differentiated;
(3) The differentiated maximum and minimum phase signals are combined at the position of the glottal closing point, the maximum phase signal placed before the closing point and the minimum phase signal after it, to obtain the differential glottal wave estimate;
(4) The differential glottal wave is integrated to realize the glottal wave estimation;
step 2, establishing a collision force modified vocal cord two-mass vibration model, and determining a model optimization parameter vector;
step 3, extracting acoustic parameters in the glottal wave;
step 4, adopting an optimization algorithm GA-QN combining a genetic algorithm and a quasi-Newton method, and carrying out vocal cord dynamics model inversion operation by matching a target and model glottal wave characteristic parameters;
step 5, voice classification is carried out;
according to the analysis of the differences of each characteristic parameter when different voices are produced, the optimized parameters are weighted and combined to give a left-right vocal cord weighted asymmetry ratio WAR and a normalized weighted scaling coefficient NWS, and the WAR and the NWS are used in combination to distinguish normal voice from special voice, wherein:
WAR: left-right vocal cord weighted asymmetry ratio, a weighted combination of the optimized left and right parameters (the formula is rendered only as an image in the original)  (11)
NWS: normalized weighted scaling coefficient, a normalized weighted combination of the optimized scaling parameters (the formula is rendered only as an image in the original)  (12)
wherein Q_l is the left vocal cord scaling coefficient, Q_r the right vocal cord scaling coefficient, Q_cl the left vocal cord coupling scaling coefficient, Q_cr the right vocal cord coupling scaling coefficient, and Q_P the subglottal pressure scaling coefficient;
the WAR and the NWS are used in combination to distinguish normal voice from special voice: the WAR describes the degree of symmetry of the vocal cords, and the smaller the WAR, the higher the degree of asymmetry; the NWS describes the degree of bilateral abnormality of the vocal cords, and the greater the NWS, the more severe the bilateral abnormality.
2. The voice classification method according to claim 1, wherein in step 2 the collision force modified vocal cord two-mass vibration model is established and the model optimization parameter vector is determined, specifically:
establishing the collision force modified vocal cord two-mass vibration model IFCM to obtain the dynamic equation of the system vibration, and introducing the left vocal cord scaling coefficient Q_l, the right vocal cord scaling coefficient Q_r, the left vocal cord coupling scaling coefficient Q_cl, the right vocal cord coupling scaling coefficient Q_cr, and the subglottal pressure scaling coefficient Q_P as the model optimization parameters;
m_iα·ẍ_iα + r_iα·ẋ_iα + k_iα·x_iα + Θ(-a_i)·c_iα·a_i/(2l) + k_cα·(x_iα - x_jα) = P_i·l·d_i,  i = 1, 2, j = 3 - i, α = l, r  (1)
wherein m_iα is the mass of the mass blocks on the two sides, k_iα the stiffness coefficient, and r_iα the damping coefficient, with i = 1 or 2 denoting the upper and lower mass blocks and α = l or r the left and right sides; x_iα is the lateral displacement of each mass block and Θ(·) the unit step function; l is the vocal cord length, d_i the thickness of each mass block, k_cα the coupling coefficient, and c_iα the additional elastic coefficient when the vocal cords on the two sides collide; a_i is the glottal gap area, P_i the glottal pressure acting on mass block i, and P_s the subglottal pressure;
the model-related parameters are modified as follows:
m_il = m_il0/Q_l,  k_il = Q_l·k_il0,  k_cl = Q_cl·k_cl0,  c_il = Q_l·c_il0  (2)
m_ir = m_ir0/Q_r,  k_ir = Q_r·k_ir0,  k_cr = Q_cr·k_cr0,  c_ir = Q_r·c_ir0  (3)
P_s = Q_P·P_s0  (4)
wherein m_il0, k_il0, k_cl0, and c_il0 respectively denote the standard values of the left mass block's mass, stiffness coefficient, coupling coefficient, and additional elastic coefficient for collision of the two vocal cords; m_ir0, k_ir0, k_cr0, and c_ir0 denote the corresponding standard values for the right mass block; and P_s0 is the standard value of the subglottal pressure.
3. The voice classification method according to claim 1, wherein extracting the acoustic parameters in the glottal wave in step 3 specifically comprises:
taking the first derivative of the glottal wave Ug to obtain the glottal wave derivative Ug', and extracting the glottal wave characteristic parameters from the main time points of the glottal wave and its derivative:
F0 = 1/(to_T - to)  (5)
OQ = (tc - to)/(to_T - to)  (6)
CIQ = (tc - tm)/(to_T - to)  (7)
Sr = Ugc'/Ugr'  (8)
NAQ = Ugm/(|Ugc'|·(to_T - to))  (9)
wherein F0 represents the fundamental frequency, OQ the open quotient, CIQ the closed quotient, Sr the skew ratio, and NAQ the normalized amplitude quotient; to is the glottal opening time, to_T the glottal opening time of the next cycle, tc the glottal closing time, tm the time corresponding to the peak of the glottal wave Ug, Ugc' the minimum of the glottal wave derivative Ug', Ugr' the maximum of Ug', and Ugm the glottal wave peak.
4. The voice classification method according to claim 3, wherein in step 4 the optimization algorithm GA-QN combining the genetic algorithm and the quasi-Newton method is adopted to perform the vocal cord dynamics model inversion operation by matching the target and model glottal wave characteristic parameters, specifically:
A. Firstly, generate an initial population and determine the maximum number of generations;
B. Calculate the fitness f(φ_i) of each individual φ_i in the population, and take the individual with the greatest fitness as the current best individual;
the fitness function is:
f(φ_i) = [ Σ_p ((p_m - p_o)/p_o)² ]^(-1),  p ∈ {F0, OQ, CIQ, Sr, NAQ}  (10)
wherein the subscript m represents the model glottal wave characteristic parameter, and the subscript o represents the actual glottal wave characteristic parameter;
C. Apply selection, crossover, and mutation to the current population to obtain a new population;
D. Apply the quasi-Newton algorithm to each individual of the new population to obtain the next-generation population;
E. Repeat operations B-D until the maximum number of generations is reached, and output the globally best individual.
CN201810824379.6A 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion Active CN109119094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810824379.6A 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810824379.6A 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion

Publications (2)

Publication Number Publication Date
CN109119094A CN109119094A (en) 2019-01-01
CN109119094B true CN109119094B (en) 2023-04-28

Family

ID=64863285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810824379.6A Active CN109119094B (en) 2018-07-25 2018-07-25 Vocal classification method using vocal cord modeling inversion

Country Status (1)

Country Link
CN (1) CN109119094B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110870765A (en) * 2019-06-27 2020-03-10 上海慧敏医疗器械有限公司 Voice treatment instrument and method adopting glottis closing real-time measurement and audio-visual feedback technology
CN111081273A (en) * 2019-12-31 2020-04-28 湖南景程电子科技有限公司 Voice emotion recognition method based on glottal wave signal feature extraction
CN112201226B (en) * 2020-09-28 2022-09-16 复旦大学 Sound production mode judging method and system
CN112562650A (en) * 2020-10-31 2021-03-26 苏州大学 Voice recognition classification method based on vocal cord characteristic parameters
CN116631443B (en) * 2021-02-26 2024-05-07 武汉星巡智能科技有限公司 Infant crying type detection method, device and equipment based on vibration spectrum comparison

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4343969A (en) * 1978-10-02 1982-08-10 Trans-Data Associates Apparatus and method for articulatory speech recognition
RU2066990C1 (en) * 1995-10-31 1996-09-27 Эльвина Ивановна Скляренко Method and set for restoring speech function in patients having various kinds of dysarthria by reflexogenic stimulation of biologically active points
US20080300867A1 (en) * 2007-06-03 2008-12-04 Yan Yuling System and method of analyzing voice via visual and acoustic data
CN101916566B (en) * 2010-07-09 2012-07-04 西安交通大学 Electronic larynx speech reconstructing method and system thereof
US9570093B2 (en) * 2013-09-09 2017-02-14 Huawei Technologies Co., Ltd. Unvoiced/voiced decision for speech processing
CN103730130B (en) * 2013-12-20 2019-03-01 中国科学院深圳先进技术研究院 A kind of detection system of pathological voice
CN103778913A (en) * 2014-01-22 2014-05-07 苏州大学 Pathologic voice recognizing method
CN108133713B (en) * 2017-11-27 2020-10-02 苏州大学 Method for estimating sound channel area under glottic closed phase

Also Published As

Publication number Publication date
CN109119094A (en) 2019-01-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant