CN108133713A - Method for estimating sound channel area under glottic closed phase - Google Patents
- Publication number
- CN108133713A CN108133713A CN201711206456.3A CN201711206456A CN108133713A CN 108133713 A CN108133713 A CN 108133713A CN 201711206456 A CN201711206456 A CN 201711206456A CN 108133713 A CN108133713 A CN 108133713A
- Authority
- CN
- China
- Prior art keywords
- sound channel
- gci
- glottis
- channel area
- estimation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/75—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 for modelling vocal tract parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/12—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being prediction coefficients
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/60—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a method for estimating the vocal tract area during the glottal closed phase. The method first determines the positions of two adjacent glottal closure instants using the DYPSA algorithm and, taking the interval between them as the analysis unit, computes an attenuating weight excitation function; it then computes the reflection coefficients of a closed-phase vocal tract model by weighted linear prediction, and finally iterates to obtain the discrete vocal tract area function. The superiority of the method is verified from the perspective of inverse filtering; six classes of vocal tract area features are selected for recognition analysis, achieving a 7% accuracy improvement over a feature-fusion optimization algorithm on the same speech corpus.
Description
Technical field
The present invention relates to the technical field of vocal tract area estimation by linear prediction, and more particularly to a method for estimating the vocal tract area during the glottal closed phase.
Background technology
The vocal tract is one of the key systems in speech production, and research on vocal tract shape can be applied to speech synthesis, speech recognition, speech training, music control, and more. Research shows that when producing the same utterance, certain pathological voices (e.g., vocal nodules, vocal cord polyps, hyperthyroid voice) exhibit vocal tract shapes different from those of normal voices. Medical procedures such as X-ray imaging, ultrasound imaging, and MRI (magnetic resonance imaging) can obtain accurate vocal tract areas, but these methods expose the subject to various kinds of radiation and electromagnetic waves, pose potential hazards to the human body, demand expensive equipment, and are complicated and inflexible to operate. Indirect estimation of the vocal tract shape only requires processing speech data and is simple and practical. The main current approaches to vocal tract area estimation are the formant method and the linear-prediction inverse-filtering method, where the inverse-filtering approach depends on assumptions about boundary conditions.
In research on estimating the vocal tract area by linear prediction, two different boundary conditions are used: either the glottis is fully closed, i.e., the glottal reflection coefficient is 1 and vocal tract loss is concentrated at the lips; or the lips are fully closed, i.e., the lip-end reflection coefficient is 1 and vocal tract loss is concentrated at the glottis.
In practice, neither assumption holds well, which hinders estimation of the vocal tract area function: during phonation the glottis opens and closes periodically, and only at very low frequencies can the lip-end radiation impedance be regarded as 0, so the boundary conditions cannot yield reasonable results; moreover, the articulation of certain vowels (such as /a/) violates the assumed conditions.
Deng H. proposed estimating the vocal tract area function during the glottal closed phase, but linked the closed phase only to the amplitude of the glottal wave: the portion of the glottal wave below half of its peak amplitude is regarded as the closed phase. This way of estimating is not strictly accurate, and it leaves insufficient data for autocorrelation analysis.
Summary of the invention
The technical problem to be solved by the invention is to overcome the above shortcomings of the prior art; the present invention proposes a new algorithm based on the glottal closed-phase method, so as to accurately estimate the vocal tract area of closed-phase speech.
The present invention adopts the following technical scheme to solve the above technical problem.
A method for estimating the vocal tract area during the glottal closed phase, comprising the following steps:
Step 1: determine the positions GCI1 and GCI2 of two adjacent glottal closure instants;
Step 2: from the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight excitation function Wn, as follows:
Take the interval between GCI1 and GCI2 as one cycle, and set Wn to d in the neighborhoods of GCI1 and GCI2. With GCI1 as the origin, Wn rises from d to 1 with a constant slope, falls from 1 back to d with a slope of equal absolute value, and remains d thereafter until GCI2, so that Wn forms a trapezoidal piecewise function,
where d is a positive constant less than 1, n indexes the n-th speech sample from the origin, N is the number of samples in one cycle, α and β are the proportions occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weighting function rises from d to 1;
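The trapezoidal weight function of step 2 can be sketched as follows. This is an illustrative reconstruction, not the patent's exact formula (the formula image is not reproduced in the text): in particular, how α and β place the rising edge and the end of the unit plateau is an assumption, since the patent only states that they are the proportions of the segments. The defaults match the preferred values stated later (d = 10^-4, α = 0.05, β = 0.7, NSlope = 7).

```python
import numpy as np

def trapezoid_weights(N, d=1e-4, n_slope=7, alpha=0.05, beta=0.7):
    """Attenuating weight excitation function W_n over one glottal cycle.

    N samples span one cycle from GCI1 (the origin) to GCI2.  W_n is d
    near both GCIs, rises from d to 1 over n_slope samples, stays at 1
    over the closed-phase plateau, and falls back to d with the same
    absolute slope.  The split points derived from alpha and beta are
    assumptions, not the patent's exact definition.
    """
    w = np.full(N, d)
    rise_start = int(alpha * N)           # assumed: offset of the rising edge
    plateau_end = int(beta * N)           # assumed: end of the unit plateau
    rise = np.linspace(d, 1.0, n_slope)   # constant-slope rise from d to 1
    w[rise_start:rise_start + n_slope] = rise
    w[rise_start + n_slope:plateau_end] = 1.0
    w[plateau_end:plateau_end + n_slope] = rise[::-1]  # mirrored fall back to d
    return w
```

The weights stay small near the GCIs, where the main vocal tract excitation occurs, and equal 1 over the closed-phase interior, which is the behavior described in step 2.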
Step 3: compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: iteratively compute the discrete vocal tract area function of the lossless tube model:
Using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless tube model (under the common lossless-tube convention, μm = (A(m+1) − Am)/(A(m+1) + Am), so that A(m+1) = Am·(1 + μm)/(1 − μm)),
where μm is the reflection coefficient, i.e. the m-th prediction coefficient am(m) of the m-th order linear prediction, and Am denotes the cross-sectional area of the m-th tube segment.
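The area recursion of step 4 can be illustrated with a minimal sketch. The sign convention of the lossless-tube relation and the starting area are assumptions here; since the area function is normalized afterwards, the starting area only fixes the overall scale.

```python
import numpy as np

def tract_areas(refl, a_start=1.0):
    """Discrete vocal tract area function of a lossless tube model.

    refl is the sequence of reflection coefficients mu_m (the PARCOR
    coefficients a_m(m) of the m-th order linear prediction).  Uses the
    standard lossless-tube relation
        mu_m = (A_{m+1} - A_m) / (A_{m+1} + A_m),
    i.e. A_{m+1} = A_m * (1 + mu_m) / (1 - mu_m).  The sign convention
    and the starting area a_start are assumptions.
    """
    areas = [a_start]
    for mu in refl:
        areas.append(areas[-1] * (1.0 + mu) / (1.0 - mu))
    return np.asarray(areas)
```

With a 12th-order predictor, as used in the embodiment, this yields 13 node areas from the glottis to the lips, which can then be normalized and interpolated for plotting.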
As a further preferred scheme of the estimation method of the present invention, in step 1, a sustained vowel is chosen from the test set of the database, and the DYPSA algorithm is applied to its speech waveform to determine the positions GCI1 and GCI2 of two adjacent glottal closure instants.
As a further preferred scheme of the estimation method of the present invention, step 3 comprises the following steps:
Step 3.1: compute the mean square error of the weighted linear prediction, E = Σn Wn·en², with en = sn − Σ(i=1..P) ai·s(n−i), where E is the weighted linear prediction mean square error, en is the prediction error, Wn is the weight excitation function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: differentiate the mean square error E computed in step 3.1 and set the partial derivative with respect to every ai to 0, giving the normal equations Σ(i=1..P) ai·Σn Wn·s(n−i)·s(n−j) = Σn Wn·sn·s(n−j), j = 1, ..., P;
Step 3.3: solve the above matrix equation to obtain all prediction coefficients ai.
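Steps 3.1–3.3 amount to solving a weighted least-squares system. The sketch below mirrors the normal equations directly with an explicit loop rather than aiming for efficiency; it is an illustration, not the patent's implementation.

```python
import numpy as np

def weighted_lp(s, w, p):
    """Solve the weighted linear prediction normal equations.

    Minimizes E = sum_n w[n] * (s[n] - sum_{i=1..p} a_i s[n-i])^2,
    i.e. solves  sum_i a_i sum_n w_n s_{n-i} s_{n-j}
                 = sum_n w_n s_n s_{n-j},   j = 1..p.
    s and w are the samples of one closed-phase analysis span (the
    signal outside [GCI1, GCI2] is taken as zero).  Returns the
    predictor coefficients a_1..a_p.
    """
    s = np.asarray(s, dtype=float)
    R = np.zeros((p, p))
    r = np.zeros(p)
    for n in range(p, len(s)):
        past = s[n - p:n][::-1]            # s[n-1], ..., s[n-p]
        R += w[n] * np.outer(past, past)   # weighted autocorrelation matrix
        r += w[n] * s[n] * past            # weighted cross-correlation vector
    return np.linalg.solve(R, r)
```

With uniform weights this reduces to ordinary covariance-method linear prediction; the trapezoidal weights of step 2 simply de-emphasize the samples near the GCIs.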
As a further preferred scheme of the estimation method of the present invention, in step 4, the vocal tract from the glottis to the lips is regarded as a slowly varying, lossless, uniform acoustic tube, and the vocal tract is modeled as a lossless tube model formed by concatenating multiple tube segments of equal length and different cross-sectional areas.
As a further preferred scheme of the estimation method of the present invention, in step 2, d is 10^-4.
As a further preferred scheme of the estimation method of the present invention, in step 2, α is 0.05.
As a further preferred scheme of the estimation method of the present invention, in step 2, β is 0.7.
As a further preferred scheme of the estimation method of the present invention, in step 2, NSlope is 7.
Compared with the prior art, the above technical scheme of the present invention has the following technical effects:
The present invention determines glottal closure instants with the DYPSA algorithm and constructs a weighting function that attenuates the main excitation, so that weighted linear prediction yields the vocal tract area during the glottal closed phase. Comparing the vocal tract model parameters obtained by inverse filtering, the average reflection-coefficient estimation error of the amplitude-defined closed-phase method is 2.66, while the weighted linear prediction algorithm proposed here reduces the estimation error to 2.01, a 24.3% improvement. The method also achieves up to 99% recognition of normal versus pathological voices and a 96% accuracy in distinguishing vocal cord polyps from vocal nodules.
Description of the drawings
Fig. 1 is the flow chart of the implementation of the present invention;
Fig. 2(a) is the normalized glottal wave waveform;
Fig. 2(b) is the normalized glottal flow derivative waveform;
Fig. 3 shows the LF model of the glottal flow derivative and its weighting function;
Fig. 4 compares the vocal tract area obtained by MRI (magnetic resonance imaging) with the vocal tract areas obtained by two glottal closed-phase methods;
Fig. 5(a) is the vocal tract area distribution of normal voice over different frames;
Fig. 5(b) is the vocal tract area distribution of vocal nodule voice over different frames;
Fig. 5(c) is the vocal tract area distribution of vocal cord polyp voice over different frames;
Fig. 5(d) is the vocal tract area distribution of hyperthyroid edema voice over different frames;
Fig. 6 is the table of glottal source parameter errors for the three methods;
Fig. 7 shows the recognition results under three recognition algorithms.
Specific embodiment
The technical scheme of the present invention is described in further detail below with reference to the accompanying drawings:
Embodiment 1
As shown in Fig. 1, the algorithm of the present invention first uses the DYPSA algorithm to obtain the GCI positions of the LF (Liljencrants-Fant) model. Within one vibration cycle, the vocal folds move from the open state to the closed state, forming the two main parts of the glottal wave pulse: the open phase and the closed phase. Fig. 2(a) shows the glottal wave signal, and Fig. 2(b) shows the LF model of the glottal flow derivative signal. The DYPSA algorithm uses a phase-slope (group-delay) function to obtain the GCIs automatically from the speech signal. The weighting function is then built from the detected GCIs, which also give the cycle T of the signal; weighted linear prediction analysis of the speech signal yields reflection coefficients equivalent to those of the tube model, and the discrete vocal tract area is finally computed by the iterative function.
The choice of the weighting function Wn is an essential part of weighted linear prediction. The attenuating weighting function chosen here (Fig. 3) is based mainly on the actual glottal flow derivative waveform and reduces the contribution of the speech samples near the GCIs, where the main vocal tract excitation occurs. As described in step 2, d is preferably 10^-4, α preferably 0.05, β preferably 0.7, and NSlope preferably 7, giving the weight vector Wn. In the matrix equation of formula (3), the positions GCI1 and GCI2 are known, the speech data sn are known, and the weights Wn are known, so all unknowns ai can be solved, the reflection coefficients μm obtained, and the discrete area function Am solved recursively.
The present invention uses the Soochow University voice database and the MEEI (Massachusetts Eye and Ear Infirmary) database. The test set of the database is the sustained vowel /a/; normal and pathological voices are chosen from the database, where the pathological voices comprise three kinds: vocal nodule voice, vocal cord polyp voice, and hyperthyroid voice. From these samples, 100 normal voices and 230 pathological voices (100 vocal nodules, 100 vocal cord polyps, 30 hyperthyroid voices) are selected. The sampling frequency of the speech samples is 25 kHz; a frame length of 60 ms and a frame shift of 30 ms are used. The estimation of the vocal tract shape is influenced by the linear prediction order and the vocal tract length; the literature gives, for the vowel /a/, an optimal adult vocal tract length of 17 cm and an optimal linear prediction order of 12.
Fig. 4 compares the vocal tract area obtained by MRI with the vocal tract areas of two glottal closed-phase methods, where VTA-MRI denotes the vocal tract area obtained by magnetic resonance imaging, VTA-WLP (vocal tract area - weighted linear prediction) denotes the area obtained by the weighted linear prediction closed-phase method proposed here, and VTA-HPV (vocal tract area - half peak value) denotes the area obtained by Deng H.'s half-peak-amplitude closed-phase method. The abscissa is the tract node index from the glottis to the lips, and the ordinate is the normalized cross-sectional area. Since the linear prediction order used here is 12, the 12 tube-node areas obtained are interpolated at equal intervals. The figure shows that VTA-WLP is closer to VTA-MRI than VTA-HPV and reveals more detail. Computing the mean square error of each method's area data against the MRI area data gives MSE_area = 0.1542 for the HPV method and MSE_area = 0.0341 for the WLP method, demonstrating that the vocal tract area calculated with this method is more accurate.
Below are the area distributions of the different classes of normal and pathological voices over different frames: Fig. 5(a) shows the VTAF (vocal tract area function) of normal voice; Fig. 5(b) the VTAF of vocal nodule voice; Fig. 5(c) the VTAF of vocal cord polyp voice; Fig. 5(d) the VTAF of hyperthyroid edema voice. The figures show that when /a/ is produced, the lip-region area is larger; compared with the areas obtained by magnetic resonance imaging, normal voice matches reality, while the other three voice types appear disordered and differ considerably from the reference shape.
The validity of the improved method is verified from the inverse-filtering perspective, and the commonly used iterative adaptive inverse filtering algorithm (IAIF) is chosen for joint comparison. The IAIF algorithm models the formant influence of the vocal tract model and removes it by inverse filtering; it estimates the vocal tract model accurately by linear prediction and discrete all-pole methods, and finally obtains the glottal signal by inverse filtering. Accurate assessment of an inverse-filtering method requires synthetic speech with a known glottal wave signal: the LF model of the glottal wave is constructed from glottal flow derivative parameters extracted from raw speech, the corresponding vocal tract parameters are then extracted, and the test speech is synthesized.
The specific assessment compares the errors between the predicted parameters and the actual parameters of the test speech, including the normalized amplitude quotient NAQ = Ugm/(|Ugc'|·T0), the quasi-open quotient QOQ = (tc − t0)/T0, and the slope ratio Sr = Ugc'/Ugr', where T0 is the pitch period and the positions of the variables are indicated in Fig. 2. The relative errors are shown in Fig. 6, where CP_WLP (closed phase - weighted linear prediction) denotes the weighted linear prediction closed-phase method proposed here, CP_HPV (closed phase - half peak value) denotes Deng H.'s half-peak-amplitude closed-phase method, and IAIF is the iterative adaptive inverse filtering algorithm. Test group A gives the mean error over ten synthetic voices with fundamental frequency below 150 Hz (average 130 Hz); test group C gives the error over ten synthetic voices with fundamental frequency above 250 Hz (average 280 Hz); test group B covers ten synthetic voices with fundamental frequency between 150 and 250 Hz (average 220 Hz). The results show that, except for the high-frequency band of the Sr parameter, CP_WLP performs best on all parameters in all other bands, while the IAIF algorithm has the largest overall error. The clear advantage of the two closed-phase methods over IAIF inverse filtering illustrates the necessity of closed-phase analysis and proves the superiority of the CP_WLP method.
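The three glottal-source parameters used in this evaluation can be computed directly from the LF timing and amplitude marks indicated in Fig. 2. A minimal sketch follows; the argument names are assumptions, since the patent text only gives the quotient definitions:

```python
def glottal_source_params(u_gm, du_gc, du_gr, t0, tc, t0_next):
    """Glottal-source parameters from LF-model marks (names assumed).

    u_gm    : peak amplitude of the glottal flow pulse
    du_gc   : (negative) peak of the flow derivative at closure, Ugc'
    du_gr   : return-phase slope of the flow derivative, Ugr'
    t0, tc  : opening and closing instants of the pulse
    t0_next : opening instant of the next pulse, so T0 = t0_next - t0
    """
    T0 = t0_next - t0
    naq = u_gm / (abs(du_gc) * T0)   # normalized amplitude quotient
    qoq = (tc - t0) / T0             # quasi-open quotient
    sr = du_gc / du_gr               # slope ratio
    return naq, qoq, sr
```

The relative error of each quotient between the inverse-filtered estimate and the known synthesis parameters is what Fig. 6 tabulates.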
Fig. 7 gives the recognition results of the three recognition methods for task A (normal versus pathological voice) and the subdivision results for task B (100 vocal nodule voices versus 100 vocal cord polyp voices), including the recognition rate, the AUC index, and the Kappa index. The AUC and Kappa indices describe recognition quality: the closer both are to 1, the better the recognition result. The table shows that the B subdivision results are slightly lower than the A recognition results, because the difference between normal and pathological voices is more pronounced than the difference between the two kinds of pathological voices. The highest recognition rate for A reaches 99%, and the highest subdivision result for B reaches 96%. The results show that this algorithm achieves a 7% improvement over the feature-fusion optimization algorithm on the same speech corpus.
Claims (8)
1. A method for estimating the vocal tract area during the glottal closed phase, characterized by comprising the following steps:
Step 1: determine the positions GCI1 and GCI2 of two adjacent glottal closure instants;
Step 2: from the two adjacent glottal closure instants GCI1 and GCI2, compute the attenuating weight excitation function Wn, as follows:
take the interval between GCI1 and GCI2 as one cycle, and set Wn to d in the neighborhoods of GCI1 and GCI2; with GCI1 as the origin, Wn rises from d to 1 with a constant slope, falls from 1 back to d with a slope of equal absolute value, and remains d thereafter until GCI2, so that Wn forms a trapezoidal piecewise function,
where d is a positive constant less than 1, n indexes the n-th speech sample from the origin, N is the number of samples in one cycle, α and β are the proportions occupied by the different segments of the piecewise function, and NSlope is the number of samples over which the weighting function rises from d to 1;
Step 3: compute the linear prediction coefficients of the closed-phase vocal tract under the condition that the weighted linear prediction mean square error is minimized;
Step 4: iteratively compute the discrete vocal tract area function of the lossless tube model:
using the reflection coefficients, recursively solve the discrete vocal tract area function of the lossless tube model,
where μm is the reflection coefficient, i.e. the m-th prediction coefficient am(m) of the m-th order linear prediction, and Am denotes the cross-sectional area of the m-th tube segment.
2. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 1, a sustained vowel is chosen from the test set of the database, and the DYPSA algorithm is applied to its speech waveform to determine the positions GCI1 and GCI2 of two adjacent glottal closure instants.
3. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that step 3 comprises the following steps:
Step 3.1: compute the mean square error E of the weighted linear prediction, where E is the weighted linear prediction mean square error, en is the prediction error, Wn is the weight excitation function of step 2, sn is the speech signal, ai are the prediction coefficients, P is the weighted linear prediction order, and the signal outside [GCI1, GCI2] is 0;
Step 3.2: differentiate the mean square error E computed in step 3.1 and set the partial derivative with respect to every ai to 0;
Step 3.3: solve the resulting matrix equation to obtain all prediction coefficients ai.
4. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 4, the vocal tract from the glottis to the lips is regarded as a slowly varying, lossless, uniform acoustic tube, and the vocal tract is modeled as a lossless tube model formed by concatenating multiple tube segments of equal length and different cross-sectional areas.
5. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, d is 10^-4.
6. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, α is 0.05.
7. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, β is 0.7.
8. The method for estimating the vocal tract area during the glottal closed phase according to claim 1, characterized in that: in step 2, NSlope is 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711206456.3A CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711206456.3A CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108133713A true CN108133713A (en) | 2018-06-08 |
CN108133713B CN108133713B (en) | 2020-10-02 |
Family
ID=62389887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711206456.3A Active CN108133713B (en) | 2017-11-27 | 2017-11-27 | Method for estimating sound channel area under glottic closed phase |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108133713B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830232A (en) * | 2018-06-21 | 2018-11-16 | 浙江中点人工智能科技有限公司 | A kind of voice signal period divisions method based on multiple dimensioned nonlinear energy operator |
CN109119094A (en) * | 2018-07-25 | 2019-01-01 | 苏州大学 | Voice classification method by utilizing vocal cord modeling inversion |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5744742A (en) * | 1995-11-07 | 1998-04-28 | Euphonics, Incorporated | Parametric signal modeling musical synthesizer |
CN101578659A (en) * | 2007-05-14 | 2009-11-11 | 松下电器产业株式会社 | Voice tone converting device and voice tone converting method |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN102799759A (en) * | 2012-06-14 | 2012-11-28 | 天津大学 | Vocal tract morphological standardization method during large-scale physiological pronunciation data processing |
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
US9263052B1 (en) * | 2013-01-25 | 2016-02-16 | Google Inc. | Simultaneous estimation of fundamental frequency, voicing state, and glottal closure instant |
CN105679333A (en) * | 2016-03-03 | 2016-06-15 | 河海大学常州校区 | Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5744742A (en) * | 1995-11-07 | 1998-04-28 | Euphonics, Incorporated | Parametric signal modeling musical synthesizer |
CN101578659A (en) * | 2007-05-14 | 2009-11-11 | 松下电器产业株式会社 | Voice tone converting device and voice tone converting method |
CN102610236A (en) * | 2012-02-29 | 2012-07-25 | 山东大学 | Method for improving voice quality of throat microphone |
CN102799759A (en) * | 2012-06-14 | 2012-11-28 | 天津大学 | Vocal tract morphological standardization method during large-scale physiological pronunciation data processing |
CN103117059A (en) * | 2012-12-27 | 2013-05-22 | 北京理工大学 | Voice signal characteristics extracting method based on tensor decomposition |
US9263052B1 (en) * | 2013-01-25 | 2016-02-16 | Google Inc. | Simultaneous estimation of fundamental frequency, voicing state, and glottal closure instant |
CN103778913A (en) * | 2014-01-22 | 2014-05-07 | 苏州大学 | Pathological voice recognition method |
CN105679333A (en) * | 2016-03-03 | 2016-06-15 | 河海大学常州校区 | Vocal cord-larynx ventricle-vocal track linked physical model and mental pressure detection method |
Non-Patent Citations (5)
Title |
---|
HUIQUN DENG等: "A New Method for Obtaining Accurate Estimates of Vocal-Tract Filters and Glottal Waves From Vowel Sounds", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 * |
HUIQUN DENG等: "ESTIMATING VOCAL-TRACT AREA FUNCTIONS FROM VOWEL SOUND SIGNALS OVER CLOSED GLOTTAL PHASES", 《2004 INTERNATIONAL CONFERENCE ON ACOUSTICS,SPEECH,AND SIGNAL PROCESSING》 * |
TALAL BIN AMIN等: "Glottal and Vocal Tract Characteristics of Voice Impersonators", 《IEEE TRANSACTIONS ON MULTIMEDIA》 * |
TAKAYUKI NAKAJIMA et al.: "Estimation of the Vocal Tract Area Function by Adaptive Inverse Filtering", 《Electronic Computer Reference Materials》 * |
ZHENLI YU et al.: "A Method for Estimating Vocal Tract Area Parameters from a Finite Number of Formant Frequencies of a Speech Signal", 《Acta Electronica Sinica》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108830232A (en) * | 2018-06-21 | 2018-11-16 | 浙江中点人工智能科技有限公司 | Voice signal period segmentation method based on multi-scale nonlinear energy operator |
CN108830232B (en) * | 2018-06-21 | 2021-06-15 | 浙江中点人工智能科技有限公司 | Voice signal period segmentation method based on multi-scale nonlinear energy operator |
CN109119094A (en) * | 2018-07-25 | 2019-01-01 | 苏州大学 | Voice classification method by utilizing vocal cord modeling inversion |
Also Published As
Publication number | Publication date |
---|---|
CN108133713B (en) | 2020-10-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ghosh et al. | A generalized smoothness criterion for acoustic-to-articulatory inversion | |
Uria et al. | A deep neural network for acoustic-articulatory speech inversion | |
Degottex et al. | Phase minimization for glottal model estimation | |
CN111048071B (en) | Voice data processing method, device, computer equipment and storage medium | |
CN104221018A (en) | Sound detecting apparatus, sound detecting method, sound feature value detecting apparatus, sound feature value detecting method, sound section detecting apparatus, sound section detecting method, and program | |
van Santen et al. | High-accuracy automatic segmentation. | |
EP2843659B1 (en) | Method and apparatus for detecting correctness of pitch period | |
CN108133713A (en) | Method for estimating sound channel area under glottic closed phase | |
Pruthi et al. | Simulation and analysis of nasalized vowels based on magnetic resonance imaging data | |
CN108369803B (en) | Method for forming an excitation signal for a parametric speech synthesis system based on a glottal pulse model | |
Greenwood et al. | Measurements of vocal tract shapes using magnetic resonance imaging | |
CA2947957C (en) | Method for forming the excitation signal for a glottal pulse model based parametric speech synthesis system | |
Xie et al. | Investigation of stacked deep neural networks and mixture density networks for acoustic-to-articulatory inversion | |
CN115116475B (en) | Voice depression automatic detection method and device based on time delay neural network | |
Rodriguez et al. | A fuzzy information space approach to speech signal non‐linear analysis | |
Airaksinen et al. | Automatic estimation of the lip radiation effect in glottal inverse filtering | |
Degottex et al. | Joint estimate of shape and time-synchronization of a glottal source model by phase flatness | |
Naikare et al. | Classification of voice disorders using i-vector analysis | |
Arroabarren et al. | Glottal source parameterization: a comparative study | |
Laprie | A concurrent curve strategy for formant tracking. | |
Wood et al. | Excitation synchronous formant analysis | |
Bous et al. | Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework | |
Vernekar et al. | Deep learning model for speech emotion classification based on GCI and GOI detection | |
WO2023242445A1 (en) | Glottal features extraction using neural networks | |
Rasilo | Estimation of vocal tract shape trajectory using lossy Kelly-Lochbaum model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||