CN108133713A - Method for estimating vocal tract area during the glottal closed phase - Google Patents

Method for estimating vocal tract area during the glottal closed phase

Info

Publication number
CN108133713A
CN108133713A (application CN201711206456.3A)
Authority
CN
China
Prior art keywords
vocal tract
GCI
glottis
vocal tract area
estimation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711206456.3A
Other languages
Chinese (zh)
Other versions
CN108133713B (en)
Inventor
陶智
孙宝印
邵雅婷
张晓俊
吴迪
肖仲喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University filed Critical Suzhou University
Priority to CN201711206456.3A priority Critical patent/CN108133713B/en
Publication of CN108133713A publication Critical patent/CN108133713A/en
Application granted granted Critical
Publication of CN108133713B publication Critical patent/CN108133713B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/75 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/12 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a method for estimating the vocal tract area during the glottal closed phase. The method first determines the positions of two adjacent glottal closure instants with the DYPSA algorithm and, taking the interval between the two adjacent closure instants as one unit, synchronously computes an attenuating weight function; it then computes the reflection coefficients of the closed-phase vocal tract model by weighted linear prediction, and finally iterates to obtain the discrete vocal tract area function. The superiority of the method is verified from the inverse-filtering perspective; six classes of vocal tract area features are selected for recognition analysis, achieving a 7% accuracy improvement over a feature-fusion optimization algorithm on the same speech corpus.

Description

Method for estimating the vocal tract area during the glottal closed phase
Technical field
The present invention relates to the field of vocal tract area estimation by linear prediction, and in particular to a method for estimating the vocal tract area during the glottal closed phase.
Background art
The vocal tract is one of the key systems in speech production, and research on vocal tract shape can be applied to speech synthesis, speech recognition, speech training, music control, and so on. Studies show that when producing the same sound, certain pathological voices (e.g., vocal nodules, vocal polyps, hyperthyroid voice) correspond to vocal tract shapes different from those of normal voice. Medical imaging methods such as X-ray imaging, ultrasound imaging, and MRI (magnetic resonance imaging) can yield accurate vocal tract areas, but they expose the subject to various kinds of radiation and electromagnetic fields, which poses potential risks; moreover, the equipment requirements are high, operation is complicated, and the procedures are inflexible and inconvenient. Indirect methods that estimate vocal tract shape need only process speech data and are simple and practical. At present, vocal tract area estimation mainly uses the formant method and linear-prediction inverse filtering, where the inverse-filtering approach involves assumptions about boundary conditions.
In research on vocal tract area estimation by linear prediction, two different boundary conditions have been used: the glottis is completely closed, i.e., the glottal reflection coefficient is 1 and vocal tract losses are concentrated at the lips; or the lips are completely closed, i.e., the lip reflection coefficient is 1 and vocal tract losses are concentrated at the glottis.
In practice neither assumption is well satisfied, which hampers the estimation of the vocal tract area function. During phonation the glottis opens and closes periodically, and at very low frequencies the lip radiation impedance can be regarded as 0, so the first boundary condition cannot yield reasonable results; likewise, the pronunciation of certain vowels (e.g., the vowel /a/) makes the second condition inconsistent with reality.
Deng H proposed estimating the vocal tract area function during the glottal closed phase, but linked the closed phase only to the amplitude of the glottal wave: the portion of the glottal wave below half of its peak amplitude is taken as the closed phase. This criterion is not strictly accurate, and it leaves insufficient data for autocorrelation analysis.
Summary of the invention
To overcome the shortcomings of the prior art described above, the present invention proposes a new algorithm on the basis of the glottal closed-phase method, so as to accurately estimate the vocal tract area of speech during the closed phase.
The present invention adopts the following technical scheme to solve the above technical problem:
A method for estimating the vocal tract area during the glottal closed phase, comprising the following steps:
Step 1: Determine two adjacent glottal closure instants, GCI1 and GCI2;
Step 2: From the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight function Wn, as follows:
Taking the interval between GCI1 and GCI2 as one cycle, Wn is set to d in the neighborhood of the two closure instants. With GCI1 as the origin of coordinates, the weight function Wn rises from d to 1 with a constant slope, later falls from 1 back to d with a slope of equal absolute value, and remains d until GCI2, so that Wn forms a trapezoidal piecewise function;
where d is a positive constant less than 1, n denotes the n-th speech sample counted from the origin, N is the total number of samples in one cycle, α and β are the fractions of the cycle occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weight rises from d to 1;
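The trapezoidal weight function of step 2 can be sketched as follows. The exact piecewise formula appears only as an image in the original, so the layout below (flat at d near both GCIs, linear ramps of NSlope samples, a plateau at 1 in between, with α and β taken as the fractions of the cycle where the ramps begin) is an assumption, as are the function and parameter names.

```python
import numpy as np

def trapezoid_weights(N, d=1e-4, alpha=0.05, beta=0.7, n_slope=7):
    """Sketch of the attenuating weight function W_n over one glottal
    cycle of N samples, with GCI_1 at index 0 and GCI_2 at index N-1.

    Assumed layout: W_n stays at d near both closure instants, ramps
    from d to 1 over n_slope samples, holds 1 over the middle
    (closed-phase) portion, and ramps back down to d.
    """
    w = np.full(N, d)
    rise_start = int(alpha * N)           # where the rising ramp begins
    fall_start = int(beta * N)            # where the falling ramp begins
    ramp_up = np.linspace(d, 1.0, n_slope)
    w[rise_start:rise_start + n_slope] = ramp_up
    w[rise_start + n_slope:fall_start] = 1.0
    w[fall_start:fall_start + n_slope] = ramp_up[::-1]
    return w
```

The default parameter values (d = 10⁻⁴, α = 0.05, β = 0.7, NSlope = 7) follow the preferred schemes stated later in the description.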
Step 3: Compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: Iteratively compute the discrete vocal tract area function of the lossless-tube model:
Using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless-tube model as follows:
where μm, the m-th prediction coefficient of the m-th order linear prediction, i.e. am(m), serves as the reflection coefficient, and Am denotes the cross-sectional area of the m-th tube section.
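The step-4 recursion can be illustrated with the standard lossless-tube (Kelly-Lochbaum) relation between reflection coefficients and section areas. Since the patent's recursion formula appears only as an image, the sign convention and the lip-end normalization used here are assumptions, as is the function name.

```python
def areas_from_reflection(mu, a_lip=1.0):
    """Sketch of the lossless-tube recursion: given reflection
    coefficients mu_m (the PARCOR coefficients a_m(m)), recover
    normalized cross-sectional areas A_m, starting from the lip end
    (normalized to a_lip) and moving section by section.

    Assumed relation: A_{m+1} = A_m * (1 - mu_m) / (1 + mu_m).
    """
    areas = [a_lip]
    for k in mu:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas
```

With all reflection coefficients zero the tube is uniform, so every section keeps the same area; a positive coefficient narrows the next section under this sign convention.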
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 1 a sustained vowel is selected from the test set of the database, and the DYPSA algorithm is applied to its waveform to determine the two adjacent glottal closure instants GCI1 and GCI2.
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, step 3 specifically comprises the following steps:
Step 3.1: Compute the mean square error of the weighted linear prediction, as follows:
where E is the mean square error of the weighted linear prediction, en denotes the prediction error, Wn is the weight function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: Differentiate the mean square error E obtained in step 3.1 so that the partial derivative with respect to every ai is 0, as follows:
Step 3.3: Solve the above matrix equation to obtain all coefficients ai.
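Steps 3.1-3.3 amount to solving a weighted, covariance-style system of normal equations. The matrix equation itself is an image in the original, so the sketch below reconstructs it from the stated error criterion E = Σ Wn·en² with en = sn − Σ ai·s(n−i); the function and variable names are assumptions.

```python
import numpy as np

def wlp_coefficients(s, w, p):
    """Sketch of weighted linear prediction over a closed-phase
    segment s (samples between GCI_1 and GCI_2) with weights w and
    prediction order p. Setting dE/da_i = 0 for every i yields the
    weighted normal equations R a = r, solved here directly.
    """
    n = np.arange(p, len(s))
    # past-sample matrix: column j-1 holds s[n - j]
    S = np.column_stack([s[n - j] for j in range(1, p + 1)])
    Wn = w[n]
    R = S.T @ (Wn[:, None] * S)   # weighted covariance matrix
    r = S.T @ (Wn * s[n])         # weighted cross-correlation vector
    return np.linalg.solve(R, r)
```

On a noiseless autoregressive signal the solve recovers the true prediction coefficients exactly, which is a convenient sanity check for the formulation.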
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 4 the vocal tract from glottis to lips is regarded as a slowly varying, lossless, uniform acoustic tube, and is modeled as a series connection of lossless tube sections of equal length and different cross-sectional areas.
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 2, d is 10⁻⁴.
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 2, α is 0.05.
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 2, β is 0.7.
As a further preferred scheme of the method for estimating the vocal tract area during the glottal closed phase, in step 2, NSlope is 7.
Compared with the prior art, the present invention achieves the following technical effects by adopting the above technical scheme:
The invention uses the DYPSA algorithm to determine the glottal closure instants and constructs a weight function that attenuates the main excitation, so that weighted linear prediction yields the vocal tract area during the glottal closed phase. Comparing the vocal tract model parameters obtained by inverse filtering, the amplitude-defined closed-phase method gives an average reflection-coefficient estimation error of 2.66, while the weighted linear prediction algorithm proposed here reduces the estimation error to 2.01, a 24.3% improvement. The method also achieves up to 99% recognition of normal versus pathological voices and 96% accuracy in distinguishing vocal polyps from vocal nodules.
Description of the drawings
Fig. 1 is the flow chart of the implementation of the invention;
Fig. 2(a) is the normalized glottal wave waveform;
Fig. 2(b) is the normalized glottal flow derivative waveform;
Fig. 3 shows the LF model of the glottal flow derivative and its weight function;
Fig. 4 compares the MRI vocal tract area with the vocal tract areas obtained by the two glottal closed-phase methods;
Fig. 5(a) is the vocal tract area distribution of normal voice over different frames;
Fig. 5(b) is the vocal tract area distribution of vocal nodule voice over different frames;
Fig. 5(c) is the vocal tract area distribution of vocal polyp voice over different frames;
Fig. 5(d) is the vocal tract area distribution of hyperthyroid edema voice over different frames;
Fig. 6 is the glottal source parameter error table for the three methods;
Fig. 7 shows the recognition results under three recognition algorithms.
Specific embodiment
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:
Embodiment 1
As shown in Fig. 1, the algorithm first uses the DYPSA algorithm to obtain the GCI positions of the LF (Liljencrants-Fant) model. Within one vibration cycle, the vocal folds move from the open state to the closed state, forming the two main parts of a glottal pulse: the open phase and the closed phase. Fig. 2(a) shows the glottal wave signal, and Fig. 2(b) the LF model of the glottal flow derivative signal. The DYPSA algorithm uses a phase-slope function to obtain the GCIs automatically from the speech signal. The weight function is then built from the detected GCIs, which also yield the signal period T; weighted linear prediction analysis of the speech signal gives reflection coefficients equivalent to those of the tube model, and finally the discrete vocal tract areas are computed by the iteration function.
The choice of the weight function Wn is an essential part of weighted linear prediction. An attenuating weight function is chosen here (Fig. 3), based mainly on the actual glottal flow derivative waveform, to reduce the contribution of the speech samples near the GCIs, where the main vocal tract excitation is located. As described in step 2, d is preferably 10⁻⁴, α preferably 0.05, β preferably 0.7, and NSlope preferably 7, giving the weight sequence Wn. In the matrix equation (3), the positions GCI1 and GCI2, the speech data sn, and the weights Wn are all known, so all unknowns ai can be solved, yielding the reflection coefficients μm, from which the discrete area function Am is solved recursively.
The experiments use the Suzhou University speech corpus and the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set is the sustained vowel /a/; normal and pathological voices are drawn from the databases, the pathological voices comprising vocal nodule, vocal polyp, and hyperthyroid voices. From these samples, 100 normal and 230 pathological speech samples (100 vocal nodules, 100 vocal polyps, 30 hyperthyroid voices) are selected. The sampling frequency is 25 kHz, with a frame length of 60 ms and a frame shift of 30 ms. Vocal tract shape estimation is affected by the linear prediction order and the vocal tract length; the literature gives, for the vowel /a/, an optimal adult vocal tract length of 17 cm and an optimal linear prediction order of 12.
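The framing described above (25 kHz sampling, 60 ms frames, 30 ms shift) can be sketched as follows; the function name is illustrative.

```python
def frame_signal(x, fs=25000, frame_ms=60, hop_ms=30):
    """Split a signal into overlapping frames using the experimental
    setup from the description: 25 kHz sampling rate, 60 ms frame
    length (1500 samples), 30 ms frame shift (750 samples)."""
    flen = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    return [x[i:i + flen] for i in range(0, len(x) - flen + 1, hop)]
```

One second of signal at these settings yields 32 half-overlapping frames.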
Fig. 4 compares the MRI vocal tract area with the areas obtained by the two closed-phase methods, where VTA-MRI denotes the area from magnetic resonance imaging, VTA-WLP (vocal tract area - weighted linear prediction) the area from the weighted-linear-prediction closed-phase method proposed here, and VTA-HPV (vocal tract area - half peak value) the area from Deng H's half-peak-amplitude closed-phase method. The abscissa is the tube-node index from glottis to lips, and the ordinate is the normalized cross-sectional area. Since the linear prediction order used here is 12, the 12 obtained tube-node areas are interpolated at equal intervals. The figure shows that VTA-WLP is closer to VTA-MRI than VTA-HPV and exhibits more detail. Computing the mean square error of each method's areas against the MRI area data gives MSE_area = 0.1542 for the HPV method and MSE_area = 0.0341 for the WLP method, demonstrating that the proposed method computes the vocal tract area more accurately. Fig. 5 shows the area distributions of the different voice classes over different frames: Fig. 5(a) the VTAF of normal voice; Fig. 5(b) the VTAF of vocal nodule voice; Fig. 5(c) the VTAF of vocal polyp voice; Fig. 5(d) the VTAF of hyperthyroid edema voice. When /a/ is produced, the area at the lips is larger; compared with the areas obtained by MRI, the normal voice matches reality, while the other three classes appear disordered and differ considerably from the reference shape.
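The Fig. 4 comparison (equal-interval interpolation of the 12 tube-node areas, then mean square error against the MRI curve) can be sketched as below; the function names, the normalized glottis-to-lip axis, and the grid size are illustrative assumptions.

```python
import numpy as np

def area_mse(vta_est, vta_ref, n_points=100):
    """Interpolate two vocal tract area curves onto a common
    equal-interval grid along a normalized glottis-to-lip axis and
    return their mean square error, mirroring the MSE_area comparison
    in the description."""
    x_est = np.linspace(0.0, 1.0, len(vta_est))
    x_ref = np.linspace(0.0, 1.0, len(vta_ref))
    grid = np.linspace(0.0, 1.0, n_points)
    est = np.interp(grid, x_est, vta_est)
    ref = np.interp(grid, x_ref, vta_ref)
    return float(np.mean((est - ref) ** 2))
```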
The validity of the improved method is also verified from the inverse-filtering perspective, in comparison with the widely used iterative adaptive inverse filtering (IAIF) algorithm. IAIF models the formant influence of the vocal tract and removes it by inverse filtering: the vocal tract model is accurately estimated by linear prediction and discrete all-pole modeling, and the glottal signal is finally obtained by inverse filtering. Accurate assessment of an inverse-filtering method requires synthetic speech whose glottal wave is known: LF-model glottal waves are constructed from glottal flow derivative parameters extracted from real speech, the corresponding vocal tract parameters are extracted, and test speech is synthesized.
The assessment compares the errors between the predicted parameters and those of the actual test speech, including the normalized amplitude quotient NAQ = Ugm/(|Ugc'|·(t0T − t0)), the quasi-open quotient QOQ = (tc − t0)/(t0T − t0), and the slope ratio Sr = Ugc'/Ugr'; the positions of these variables are marked in Fig. 2. The relative errors are shown in Fig. 6, where CP_WLP (closed phase - weighted linear prediction) denotes the weighted-linear-prediction closed-phase method proposed here, CP_HPV (closed phase - half peak value) denotes Deng H's half-peak-amplitude closed-phase method, and IAIF is the iterative adaptive inverse filtering algorithm. Test group A gives the average error of ten synthetic voices with fundamental frequency below 150 Hz (average 130 Hz); group C gives the error of ten synthetic voices with fundamental frequency above 250 Hz (average 280 Hz); group B covers ten synthetic voices with fundamental frequency between 150 and 250 Hz (average 220 Hz). The results show that, except for the high-frequency band of the Sr parameter, CP_WLP performs best on all parameters in all bands; the IAIF algorithm has the largest error, and the two closed-phase methods hold a clear advantage over IAIF inverse filtering, which illustrates the necessity of closed-phase analysis and proves the superiority of the proposed CP_WLP.
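As an illustration of one of the evaluation parameters above, here is a minimal sketch of the normalized amplitude quotient NAQ; the function name and the single-cycle treatment of the input are assumptions, not part of the patent.

```python
import numpy as np

def naq(ug, fs):
    """Sketch of the normalized amplitude quotient:
    NAQ = U_gm / (|U'_gc| * T0), with U_gm the peak glottal flow,
    U'_gc the negative peak of its derivative, and T0 the cycle
    length. The input is assumed to hold exactly one glottal cycle.
    """
    dug = np.diff(ug) * fs    # forward-difference derivative
    T0 = len(ug) / fs         # cycle length in seconds
    return ug.max() / (abs(dug.min()) * T0)
```

For a half-sine flow pulse spanning one cycle, NAQ comes out close to 1/π, independent of the sampling rate, which is a quick consistency check.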
Fig. 7 gives, for three recognition methods, the recognition results for task A (normal versus pathological voice) and the subdivision results for task B (100 vocal nodule versus 100 vocal polyp voices), including recognition rate, AUC index, and Kappa index. AUC and Kappa describe the quality of the recognition: the closer both indices are to 1, the better the result. The table shows that the B subdivision results are slightly lower than the recognition results for A, because the difference between normal and pathological voices is more pronounced than that between the two pathological classes. The highest recognition rate for A reaches 99%, and the highest subdivision result for B reaches 96%; the algorithm thus achieves a 7% improvement over the feature-fusion optimization algorithm on the same speech corpus.

Claims (8)

1. A method for estimating the vocal tract area during the glottal closed phase, characterized by comprising the following steps:
Step 1: Determine two adjacent glottal closure instants, GCI1 and GCI2;
Step 2: From the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight function Wn, as follows:
Taking the interval between GCI1 and GCI2 as one cycle, Wn is set to d in the neighborhood of the two closure instants. With GCI1 as the origin of coordinates, the weight function Wn rises from d to 1 with a constant slope, later falls from 1 back to d with a slope of equal absolute value, and remains d until GCI2, so that Wn forms a trapezoidal piecewise function;
where d is a positive constant less than 1, n denotes the n-th speech sample counted from the origin, N is the total number of samples in one cycle, α and β are the fractions of the cycle occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weight rises from d to 1;
Step 3: Compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: Iteratively compute the discrete vocal tract area function of the lossless-tube model: using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless-tube model as follows:
where μm, the m-th prediction coefficient of the m-th order linear prediction, i.e. am(m), serves as the reflection coefficient, and Am denotes the cross-sectional area of the m-th tube section.
2. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 1, a sustained vowel is selected from the test set of the database, and the DYPSA algorithm is applied to its waveform to determine the two adjacent glottal closure instants GCI1 and GCI2.
3. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that step 3 specifically comprises the following steps:
Step 3.1: Compute the mean square error of the weighted linear prediction, as follows:
where E is the mean square error of the weighted linear prediction, en denotes the prediction error, Wn is the weight function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: Differentiate the mean square error E obtained in step 3.1 so that the partial derivative with respect to every ai is 0, as follows:
Step 3.3: Solve the matrix equation to obtain all coefficients ai.
4. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 4, the vocal tract from glottis to lips is regarded as a slowly varying, lossless, uniform acoustic tube, modeled as a series connection of lossless tube sections of equal length and different cross-sectional areas.
5. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, d is 10⁻⁴.
6. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, α is 0.05.
7. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, β is 0.7.
8. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, NSlope is 7.
CN201711206456.3A 2017-11-27 2017-11-27 Method for estimating vocal tract area during the glottal closed phase Active CN108133713B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711206456.3A CN108133713B (en) 2017-11-27 2017-11-27 Method for estimating vocal tract area during the glottal closed phase

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711206456.3A CN108133713B (en) 2017-11-27 2017-11-27 Method for estimating vocal tract area during the glottal closed phase

Publications (2)

Publication Number Publication Date
CN108133713A true CN108133713A (en) 2018-06-08
CN108133713B CN108133713B (en) 2020-10-02

Family

ID=62389887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711206456.3A Active CN108133713B (en) Method for estimating vocal tract area during the glottal closed phase

Country Status (1)

Country Link
CN (1) CN108133713B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830232A (en) * 2018-06-21 2018-11-16 浙江中点人工智能科技有限公司 A kind of voice signal period divisions method based on multiple dimensioned nonlinear energy operator
CN109119094A (en) * 2018-07-25 2019-01-01 苏州大学 Voice classification method by utilizing vocal cord modeling inversion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5744742A (en) * 1995-11-07 1998-04-28 Euphonics, Incorporated Parametric signal modeling musical synthesizer
CN101578659A (en) * 2007-05-14 2009-11-11 松下电器产业株式会社 Voice tone converting device and voice tone converting method
CN102610236A (en) * 2012-02-29 2012-07-25 山东大学 Method for improving voice quality of throat microphone
CN102799759A (en) * 2012-06-14 2012-11-28 天津大学 Vocal tract morphological standardization method during large-scale physiological pronunciation data processing
CN103117059A (en) * 2012-12-27 2013-05-22 北京理工大学 Voice signal characteristics extracting method based on tensor decomposition
CN103778913A (en) * 2014-01-22 2014-05-07 苏州大学 Pathological voice recognition method
US9263052B1 (en) * 2013-01-25 2016-02-16 Google Inc. Simultaneous estimation of fundamental frequency, voicing state, and glottal closure instant
CN105679333A (en) * 2016-03-03 2016-06-15 河海大学常州校区 Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HUIQUN DENG et al.: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", IEEE Transactions on Audio, Speech, and Language Processing *
HUIQUN DENG et al.: "Estimating Vocal-Tract Area Functions from Vowel Sound Signals over Closed Glottal Phases", 2004 International Conference on Acoustics, Speech, and Signal Processing *
TALAL BIN AMIN et al.: "Glottal and Vocal Tract Characteristics of Voice Impersonators", IEEE Transactions on Multimedia *
中岛隆之 et al.: "Estimating the vocal tract area function by adaptive inverse filtering", 电子计算机参考资料 *
俞振利 et al.: "A method for estimating vocal tract area parameters from a finite number of formant frequencies of a speech signal", Acta Electronica Sinica (电子学报) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830232A (en) * 2018-06-21 2018-11-16 浙江中点人工智能科技有限公司 A kind of voice signal period divisions method based on multiple dimensioned nonlinear energy operator
CN108830232B (en) * 2018-06-21 2021-06-15 浙江中点人工智能科技有限公司 Voice signal period segmentation method based on multi-scale nonlinear energy operator
CN109119094A (en) * 2018-07-25 2019-01-01 苏州大学 Voice classification method by utilizing vocal cord modeling inversion

Also Published As

Publication number Publication date
CN108133713B (en) 2020-10-02

Similar Documents

Publication Publication Date Title
Ghosh et al. A generalized smoothness criterion for acoustic-to-articulatory inversion
Uria et al. A deep neural network for acoustic-articulatory speech inversion
Degottex et al. Phase minimization for glottal model estimation
CN111048071B (en) Voice data processing method, device, computer equipment and storage medium
CN104221018A (en) Sound detecting apparatus, sound detecting method, sound feature value detecting apparatus, sound feature value detecting method, sound section detecting apparatus, sound section detecting method, and program
van Santen et al. High-accuracy automatic segmentation.
EP2843659B1 (en) Method and apparatus for detecting correctness of pitch period
CN108133713A (en) 2018-06-08 Method for estimating vocal tract area during the glottal closed phase
Pruthi et al. Simulation and analysis of nasalized vowels based on magnetic resonance imaging data
CN108369803B (en) Method for forming an excitation signal for a parametric speech synthesis system based on a glottal pulse model
Greenwood et al. Measurements of vocal tract shapes using magnetic resonance imaging
CA2947957C (en) Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system
Xie et al. Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion
CN115116475B (en) Voice depression automatic detection method and device based on time delay neural network
Rodriguez et al. A fuzzy information space approach to speech signal non‐linear analysis
Airaksinen et al. Automatic estimation of the lip radiation effect in glottal inverse filtering
Degottex et al. Joint estimate of shape and time-synchronization of a glottal source model by phase flatness
Naikare et al. Classification of voice disorders using i-vector analysis
Arroabarren et al. Glottal source parameterization: a comparative study
Laprie A concurrent curve strategy for formant tracking.
Wood et al. Excitation synchronous formant analysis
Bous et al. Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework
Vernekar et al. Deep learning model for speech emotion classification based on GCI and GOI detection
WO2023242445A1 (en) Glottal features extraction using neural networks
Rasilo Estimation of vocal tract shape trajectory using lossy Kelly-Lochbaum model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant