WO1988007738A1 - An adaptive multivariate estimating apparatus - Google Patents

An adaptive multivariate estimating apparatus

Info

Publication number
WO1988007738A1
Authority
WO
WIPO (PCT)
Prior art keywords
classifiers
calculating
statistical
speech
frame
Prior art date
Application number
PCT/US1988/000030
Other languages
English (en)
Inventor
David Lynn Thomson
Original Assignee
American Telephone & Telegraph Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone & Telegraph Company filed Critical American Telephone & Telegraph Company
Priority to JP62506332A priority Critical patent/JPH01502779A/ja
Priority to DE8888901347T priority patent/DE3875894T2/de
Priority to AT88901347T priority patent/ATE82426T1/de
Publication of WO1988007738A1 publication Critical patent/WO1988007738A1/fr
Priority to SG598/93A priority patent/SG59893G/en
Priority to HK1066/93A priority patent/HK106693A/xx

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93: Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • This invention relates to classifying samples representing a real time process into groups with each group corresponding to a state of the real time process.
  • The classifying is done in real time, as each sample is generated, using statistical techniques.
  • One example of such a process is the generation of speech by the human vocal tract.
  • The sound produced by the vocal tract can have a fundamental frequency (the voiced state) or no fundamental frequency (the unvoiced state).
  • A third state, the silence state, may exist if no sound is being produced.
  • The problem of determining these three states is referred to as the voicing/silence decision.
  • Degradation of voice quality is often due to inaccurate voicing decisions.
  • The difficulty in correctly making these voicing decisions lies in the fact that no single speech parameter or classifier can reliably distinguish voiced speech from unvoiced speech.
  • This relationship may be expressed as a'x + b > 0, where "a" is a vector comprising the weights, "x" is a vector comprising the classifiers, and "b" is a scalar representing the threshold value.
  • The weights are chosen to maximize performance on a training set of speech where the voicing of each frame is known. These weights form a decision rule which provides significant speech quality improvements in speech coders compared to those using a single parameter.
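The fixed weighted sum rule described above can be sketched in a few lines. This is only an illustration: the function name `is_voiced` and the numeric weights are hypothetical, and treating a positive score as "voiced" follows the convention the text later gives for discriminant detector 102.

```python
def is_voiced(weights, classifiers, threshold):
    """Evaluate the fixed weighted sum rule a'x + b > 0.

    weights     -- vector a (illustrative values, not trained weights)
    classifiers -- vector x for one speech frame
    threshold   -- scalar b
    """
    if len(weights) != len(classifiers):
        raise ValueError("weights and classifiers must have the same length")
    score = sum(a * x for a, x in zip(weights, classifiers)) + threshold
    return score > 0


# Hypothetical weights and threshold, chosen only for demonstration.
a = [0.8, -0.3, 0.5]
b = -1.0
print(is_voiced(a, [2.0, 1.0, 1.0], b))
print(is_voiced(a, [0.0, 1.0, 0.0], b))
```

Because a and b are fixed after training, the rule cannot follow a change in the speech environment, which is the limitation the following paragraphs discuss.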
  • A problem associated with the fixed weighted sum method is that it does not perform well when the speech environment changes. Such changes in the speech environment may be a result of a telephone conversation being carried on in a car via a mobile telephone or may be due to different telephone transmitters.
  • The reason that the fixed weighted sum methods do not perform well in changing environments is that many speech classifiers are influenced by background noise, non-linear distortion, and filtering. If voicing is to be determined for speech with characteristics different from that of the training set, the weights, in general, will not yield satisfactory results.
  • The speech samples are processed by each set of weights and its threshold value, after which the result of one of these sets is chosen on the basis of the value of the signal-to-noise ratio (SNR).
  • The range of possible values that the SNR can have is subdivided into subranges, with each subrange being assigned to one of the sets.
  • The SNR is calculated; the subrange is determined; and then the detector associated with this subrange is used to determine whether the frame is unvoiced or voiced.
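A minimal sketch of the SNR-subrange selection just described follows. The subrange edges, the weight sets, and the names `SNR_EDGES_DB`, `DETECTORS`, `select_detector`, and `classify_frame` are all hypothetical; the text gives no numeric values. For brevity the sketch evaluates only the selected set, which gives the same decision as evaluating every set and then picking one result.

```python
from bisect import bisect_right

# Hypothetical subrange boundaries in dB; the text does not specify them.
SNR_EDGES_DB = [10.0, 20.0, 30.0]

# One (weights, threshold) pair per subrange; values are illustrative only.
DETECTORS = [
    ([1.0, 0.5], -2.0),  # SNR below 10 dB
    ([1.0, 0.5], -1.5),  # 10 dB up to 20 dB
    ([1.0, 0.5], -1.0),  # 20 dB up to 30 dB
    ([1.0, 0.5], -0.5),  # 30 dB and above
]


def select_detector(snr_db):
    """Return the (weights, threshold) pair assigned to the SNR subrange."""
    return DETECTORS[bisect_right(SNR_EDGES_DB, snr_db)]


def classify_frame(classifiers, snr_db):
    """Apply the weighted sum rule of the detector chosen by SNR."""
    weights, threshold = select_detector(snr_db)
    score = sum(a * x for a, x in zip(weights, classifiers)) + threshold
    return "voiced" if score > 0 else "unvoiced"


print(classify_frame([1.0, 1.0], 5.0))
print(classify_frame([1.0, 1.0], 40.0))
```

The sets are still fixed in advance, so the scheme only covers the conditions represented when the sets were trained.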
  • The problem with this method is that it is only valid for the training data plus white noise and cannot adapt to a wide range of speech environments and speakers. Therefore, there exists a need for a voiced detector that can reliably determine whether speech is unvoiced or voiced for a varying environment and different speakers.
  • The apparatus is responsive to real time samples from a physical process to determine statistical distributions for a plurality of process states and, from those distributions, to establish decision regions.
  • The latter regions are used to determine the present process state as each process sample is generated.
  • The apparatus adapts to a changing speech environment by utilizing the statistics of classifiers of the speech. Statistics are based on the classifiers and are used to modify the decision regions used in the voicing decision.
  • The apparatus estimates statistical distributions for both voiced and unvoiced frames and uses those statistical distributions for determining decision regions. The latter regions are then used to determine whether a present speech frame is voiced or unvoiced.
  • A voiced detector calculates the probability that the present speech frame is unvoiced, the probability that the present speech frame is voiced, and an overall probability that any frame will be unvoiced. Using these three probabilities, the detector then calculates the probability distribution of unvoiced frames and the probability distribution of voiced frames. In addition, the calculation for determining the probability that the present speech frame is voiced or unvoiced is performed by doing a maximum likelihood statistical operation. Also, the maximum likelihood statistical operation is responsive to a weight vector and a threshold value in addition to the probabilities. In another embodiment, the weight vector and threshold value are adaptively calculated for each frame. This adaptive calculation of the weight vector and the threshold value allows the detector to rapidly adapt to changing speech environments.
  • An apparatus for determining the presence of the fundamental frequency in frames of speech has a circuit, responsive to a set of classifiers representing the speech attributes of a speech frame, for calculating a set of statistical parameters.
  • A second circuit is responsive to the calculated set of parameters defining the statistical distributions to calculate a set of weights, each associated with one of the classifiers.
  • A third circuit, in response to the calculated set of weights and classifiers and the set of parameters, determines the presence of the fundamental frequency in the speech frame or, as it is commonly expressed, makes the unvoiced/voiced decision.
  • The second circuit also calculates a threshold value and a new weight vector and communicates these values to the first circuit, which is responsive to these values and a new set of classifiers for determining another set of statistical parameters.
  • This other set of statistical parameters is then used to determine the presence of the fundamental frequency for the next frame of speech.
  • The first circuit is responsive to the next set of classifiers and the new weight vector and threshold value to calculate the probability that the next frame is unvoiced, the probability that the next frame is voiced, and the overall probability that any frame will be unvoiced. These probabilities are then utilized with a set of values giving the average of classifiers for past and present frames to determine the other set of statistical parameters.
  • The method for determining a voicing decision is performed by the following steps: estimating statistical distributions for voiced and unvoiced frames, determining decision regions representing voiced and unvoiced speech in response to the statistical distributions, and making the voicing decision in response to the decision regions and a present speech frame.
  • The statistical distributions are calculated from the probability that the present speech frame is unvoiced, the probability that the present speech frame is voiced, and the overall probability that any frame will be unvoiced. These three probabilities are calculated as three sub-steps of the step of determining the statistical distributions.
  • FIG. 1 is a block diagram of an apparatus using the present invention;
  • FIG. 2 illustrates, in block diagram form, the present invention;
  • FIGS. 3 and 4 illustrate, in greater detail, the functions performed by statistical voiced detector 103 of FIG. 2; and
  • FIG. 5 illustrates, in greater detail, functions performed by block 340 of FIG. 4.
  • FIG. 1 illustrates an apparatus for performing the unvoiced/voiced decision operation using as one of the voiced detectors a statistical voiced detector which is the subject of this invention.
  • The apparatus of FIG. 1 utilizes two types of detectors: discriminant and statistical voiced detectors.
  • Statistical voiced detector 103 is an adaptive detector that detects changes in the voice environment and modifies the weights used to process classifiers coming from classifier generator 101 so as to more accurately make the unvoiced/voiced decision.
  • Discriminant voiced detector 102 is utilized during initial start up or rapidly changing voice environment conditions when statistical voiced detector 103 has not yet fully adapted to the initial or new voice environment.
  • Classifier generator 101 is responsive to each frame of speech to generate classifiers which advantageously may be the log of the speech energy, the log of the LPC gain, the log area ratio of the first reflection coefficient, and the squared correlation coefficient of two speech segments one frame long which are offset by one pitch period.
  • The calculation of these classifiers involves digitally sampling analog speech, forming frames of the digital samples, and processing those frames, and is well known in the art.
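Two of the classifiers named above, the log of the speech energy and the squared correlation coefficient of two segments offset by one pitch period, are simple enough to sketch directly. The function names, the small `floor` constant, and the assumption that the pitch period is already known in samples are illustrative choices, not details from the text; the LPC-based classifiers are omitted.

```python
import math


def log_energy(frame, floor=1e-10):
    """Log of the frame energy; `floor` (an assumption) avoids log(0)."""
    return math.log(sum(s * s for s in frame) + floor)


def squared_correlation(samples, start, frame_len, pitch_period):
    """Squared normalized correlation between the segment beginning at
    `start` and the segment one pitch period earlier, each frame_len long."""
    if start - pitch_period < 0 or start + frame_len > len(samples):
        raise ValueError("segments fall outside the sample buffer")
    seg_a = samples[start:start + frame_len]
    seg_b = samples[start - pitch_period:start - pitch_period + frame_len]
    cross = sum(p * q for p, q in zip(seg_a, seg_b))
    energy_a = sum(p * p for p in seg_a)
    energy_b = sum(q * q for q in seg_b)
    if energy_a == 0.0 or energy_b == 0.0:
        return 0.0
    return cross * cross / (energy_a * energy_b)


# A synthetic periodic signal with a 20-sample period stands in for voiced speech.
samples = [math.sin(2 * math.pi * k / 20) for k in range(200)]
print(log_energy(samples[100:180]))
print(squared_correlation(samples, 100, 80, 20))
```

For a strongly periodic frame, the correlation at the true pitch period approaches 1, which is why this quantity is informative for the voicing decision.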
  • Generator 101 transmits the classifiers to detectors 102 and 103 via path 106.
  • Detectors 102 and 103 are responsive to the classifiers received via path 106 to make unvoiced/voiced decisions and transmit these decisions via paths 107 and 110, respectively, to multiplexer 105.
  • The detectors determine a distance measure between voiced and unvoiced frames and transmit these distances via paths 108 and 109 to comparator 104.
  • These distances may be Mahalanobis distances or other generalized distances.
  • Comparator 104 is responsive to the distances received via paths 108 and 109 to control multiplexer 105 so that the latter multiplexer selects the output of the detector that is generating the largest distance.
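The role of comparator 104 and multiplexer 105 can be sketched as a single selection function. The name `select_decision` and the tie-breaking choice (ties go to the discriminant detector) are assumptions; the text only says that the output of the detector generating the largest distance is selected.

```python
def select_decision(discriminant_output, statistical_output):
    """Pick the decision of the detector reporting the larger distance.

    Each argument is a (decision, distance) pair, standing in for the
    decision on path 107 or 110 and the distance on path 108 or 109.
    On a tie the discriminant detector is kept, which is an assumption.
    """
    discriminant_decision, discriminant_distance = discriminant_output
    statistical_decision, statistical_distance = statistical_output
    if statistical_distance > discriminant_distance:
        return statistical_decision
    return discriminant_decision


# Illustrative values only.
print(select_decision(("unvoiced", 1.2), ("voiced", 3.4)))
print(select_decision(("unvoiced", 2.0), ("voiced", 0.7)))
```

A larger distance indicates better separation between the voiced and unvoiced populations as seen by that detector, so its decision is treated as the more trustworthy one.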
  • FIG. 2 illustrates, in greater detail, statistical voiced detector 103.
  • A set of classifiers, also referred to as a vector of classifiers, is received via path 106 from classifier generator 101.
  • Silence detector 201 is responsive to these classifiers to determine whether or not speech is present in the present frame. If speech is present, detector 201 transmits a signal via path 210. If no speech (silence) is present in the frame, then only subtracter 207 and U/V determinator 205 are operational for that particular frame. Whether speech is present or not, the unvoiced/voiced decision is made for every frame by determinator 205.
  • In response to the signal from detector 201, classifier averager 202 maintains an average of the individual classifiers received via path 106 by averaging in the classifiers for the present frame with the classifiers for previous frames. If speech (non-silence) is present in the frame, silence detector 201 signals statistical calculator 203, generator 206, and averager 202 via path 210.
  • Statistical calculator 203 calculates statistical distributions for voiced and unvoiced frames.
  • Calculator 203 is responsive to the signal received via path 210 to calculate the overall probability that any frame is unvoiced and the probability that any frame is voiced.
  • Statistical calculator 203 calculates the statistical value that each classifier would have if the frame was unvoiced and the statistical value that each classifier would have if the frame was voiced.
  • Calculator 203 calculates the covariance matrix of the classifiers.
  • That statistical value may be the mean. The calculations performed by calculator 203 are based not only on the present frame but on previous frames as well.
  • Statistical calculator 203 performs these calculations not only on the basis of the classifiers received for the present frame via path 106 and the average of the classifiers received via path 211, but also on the basis of the weight for each classifier and a threshold value defining whether a frame is unvoiced or voiced, received via path 213 from weights calculator 204.
  • Weights calculator 204 is responsive to the probabilities, covariance matrix, and statistical values of the classifiers for the present frame, as generated by calculator 203 and received via path 212, to recalculate the values used as weight vector a for each of the classifiers and the threshold value b for the present frame. Then, these new values of a and b are transmitted back to statistical calculator 203 via path 213.
  • Weights calculator 204 transmits the weights and the statistical values for the classifiers in both the unvoiced and voiced regions via path 214, determinator 205, and path 208 to generator 206.
  • The latter generator is responsive to this information to calculate the distance measure, which is subsequently transmitted via path 109 to comparator 104 as illustrated in FIG. 1.
  • U/V determinator 205 is responsive to the information transmitted via paths 214 and 215 to determine whether the frame is unvoiced or voiced and to transmit this decision via path 110 to multiplexer 105 of FIG. 1.
  • Averager 202, statistical calculator 203, and weights calculator 204 implement an improved EM algorithm similar to that suggested in the article by N. E. Day entitled "Estimating the Components of a Mixture of Normal Distributions", Biometrika, Vol. 56, No. 3, pp. 463-474, 1969.
  • Utilizing the concept of a decaying average, classifier averager 202 calculates the average of the classifiers for the present and previous frames by calculating the following equations 1, 2, and 3:
  • $n = n + 1 \quad \text{if } n < 2000 \qquad (1)$
  • $x_n$ is a vector representing the classifiers for the present frame, and $n$ is the number of frames that have been processed, up to 2000.
  • $z$ represents the decaying average coefficient.
  • $X_n$ represents the average of the classifiers over the present and past frames.
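A decaying average of this kind can be sketched as follows. Only the counter rule of equation 1 appears in this text, so the sketch assumes a common form for the other two steps: a coefficient z = 1/n and an update X_n = (1 - z) X_{n-1} + z x_n. Both are labeled assumptions, and the class name `ClassifierAverager` is hypothetical.

```python
class ClassifierAverager:
    """Decaying average of classifier vectors (a sketch of averager 202).

    Assumptions not confirmed by the text: z = 1/n, and
    X_n = (1 - z) * X_{n-1} + z * x_n.
    """

    def __init__(self, dimension, max_frames=2000):
        self.n = 0
        self.max_frames = max_frames
        self.mean = [0.0] * dimension

    def update(self, x):
        # Equation 1: the frame counter stops growing at max_frames, so
        # z stops shrinking and old frames decay exponentially.
        if self.n < self.max_frames:
            self.n += 1
        z = 1.0 / self.n  # assumed form of the decaying average coefficient
        self.mean = [(1.0 - z) * m + z * xi for m, xi in zip(self.mean, x)]
        return z, self.mean


averager = ClassifierAverager(dimension=2)
averager.update([2.0, 4.0])
z, mean = averager.update([4.0, 0.0])
print(z, mean)
```

Capping n keeps the coefficient from falling toward zero, so the average keeps tracking the environment however long the call lasts.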
  • Statistical calculator 203 is responsive to receipt of the $z$, $x_n$, and $X_n$ information to calculate the covariance matrix, T, by first calculating the matrix of sums of squares and products, $C_n$, as follows:
  • T is calculated as follows:
  • The means are subtracted from the classifiers as follows:
  • Calculator 203 determines the probability that the classifiers represent a voiced frame by solving the following:
  • Calculator 203 determines the overall probability that any frame will be unvoiced by solving equation 9 for $p_n$:
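The probability calculations can be illustrated with a short sketch. The equations themselves are not reproduced in this text, so the forms below are assumptions: a logistic posterior, which is what a two-component normal mixture with a shared covariance matrix yields, with the sign chosen so that a'x + b > 0 favors voiced, and a decaying average for the overall unvoiced probability. The function names are hypothetical.

```python
import math


def posterior_unvoiced(a, b, x):
    """Assumed form: P(u | x) = 1 / (1 + exp(a'x + b))."""
    score = sum(ai * xi for ai, xi in zip(a, x)) + b
    score = min(score, 700.0)  # keep math.exp from overflowing
    return 1.0 / (1.0 + math.exp(score))


def posterior_voiced(a, b, x):
    """The two posteriors sum to one."""
    return 1.0 - posterior_unvoiced(a, b, x)


def update_overall_unvoiced(p_previous, z, p_unvoiced_frame):
    """Assumed decaying-average form: p_n = (1 - z) p_{n-1} + z P(u | x_n)."""
    return (1.0 - z) * p_previous + z * p_unvoiced_frame


# Illustrative weights and classifier values only.
a, b = [1.0, 1.0], 0.0
p_u = posterior_unvoiced(a, b, [0.5, 1.5])
print(p_u, posterior_voiced(a, b, [0.5, 1.5]))
print(update_overall_unvoiced(0.5, 0.1, p_u))
```

These soft probabilities, rather than hard decisions, are what let the later mean-vector updates weight each frame by how likely it is to belong to each class.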
  • Calculator 203 determines two vectors, u and v, which give the mean values of each classifier for both unvoiced and voiced type frames.
  • Vectors u and v are the statistical averages for unvoiced and voiced frames, respectively.
  • Vector u, the statistical average unvoiced vector, contains the mean values of each classifier if a frame is unvoiced; and vector v, the statistical average voiced vector, gives the mean value for each classifier if a frame is voiced.
  • Vector u for the present frame is solved by calculating equation 10.
  • Vector v is determined for the present frame by calculating equation 11 as follows:
  • $v_n = (1 - z)\,v_{n-1} + z\,x_n\,P(v \mid x_n)/(1 - p_n) - z\,x_n \qquad (11)$
  • Calculator 203 now communicates the u and v vectors, T matrix, and probability p to weights calculator 204 via path 212.
  • Weights calculator 204 is responsive to this information to calculate new values for vector a and scalar b. These new values are then transmitted back to statistical calculator 203 via path 213. This allows detector 103 to adapt rapidly to changing environments.
  • Determinator 205 uses vectors u and v as well as vector a and scalar b to make the voicing decision. If n is greater than a threshold, advantageously 99, vector a and scalar b are calculated as follows.
  • Vector a is determined by solving the following equation:
  • Scalar b is determined by solving the following equation:
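The equations for a and b are not reproduced in this text. As a hedged illustration of how such weights can be obtained from u, v, T, and p, the sketch below uses the textbook linear discriminant for two normal classes with a shared covariance: a = T⁻¹(v - u) and b = log((1 - p)/p) - ½ a'(u + v), with a'x + b > 0 meaning voiced. This is a stand-in under stated assumptions, not necessarily the patent's own formula, and the helper names are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def discriminant_weights(u, v, T, p_unvoiced):
    """Textbook linear discriminant (an assumption, see lead-in)."""
    if not 0.0 < p_unvoiced < 1.0:
        raise ValueError("p_unvoiced must lie strictly between 0 and 1")
    a = solve(T, [vi - ui for ui, vi in zip(u, v)])
    midpoint_term = 0.5 * sum(ai * (ui + vi) for ai, ui, vi in zip(a, u, v))
    b = math.log((1.0 - p_unvoiced) / p_unvoiced) - midpoint_term
    return a, b


# Illustrative statistics only.
a, b = discriminant_weights([0.0, 0.0], [2.0, 1.0], [[2.0, 0.0], [0.0, 1.0]], 0.5)
print(a, b)
```

With equal class probabilities the boundary a'x + b = 0 passes through the midpoint of u and v; the log term shifts it toward the less probable class.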
  • After calculating equations 12 and 13, weights calculator 204 transmits vectors a, u, and v to block 205 via path 214. If the frame contained silence, only equation 6 is calculated.
  • Determinator 205 is responsive to this transmitted information to decide whether the present frame is voiced or unvoiced. If the element of vector
  • Equation 14 can also be rewritten as:
  • Equation 15 can also be rewritten as:
  • Equations 14 and 15 represent decision regions for making the voicing decision.
  • The log term of the rewritten forms of equations 14 and 15 can be eliminated with some change of performance.
  • The element corresponding to power is the log of the speech energy.
  • Generator 206 is responsive to the information received via path 214 from calculator 204 to calculate the distance measure, A, as follows.
  • The discriminant variable, d, is calculated by equation 16 as follows:
  • $P_d$ is initially set to 0.5.
  • Equations 21 through 24 are solved as follows:
  • The probability, $P_d$, that determinator 205 will declare a frame unvoiced is calculated by the following equation:
  • Equation 25 uses Hotelling's two-sample $T^2$ statistic to calculate the distance measure.
  • The distance measure can also be the Mahalanobis distance, which is given in the following equation:
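For orientation, the textbook Mahalanobis distance between two class means under a shared covariance matrix is sqrt((v - u)' T⁻¹ (v - u)). The sketch below applies that standard definition to the mean vectors u and v and the matrix T, which is an assumption: the patent's own equation is not reproduced in this text and may instead operate on the scalar discriminant variable d. T is assumed symmetric and positive definite, and the function names are hypothetical.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def mahalanobis_distance(u, v, T):
    """Standard Mahalanobis distance between mean vectors u and v.

    T is assumed symmetric positive definite, so the quadratic form
    under the square root is non-negative.
    """
    diff = [vi - ui for ui, vi in zip(u, v)]
    weighted = solve(T, diff)  # T^{-1} (v - u) without forming the inverse
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


# Illustrative statistics only.
print(mahalanobis_distance([0.0, 0.0], [3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))
```

Dividing by the covariance means that a classifier with large natural spread contributes less to the distance than one whose values are tightly clustered.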
  • Discriminant detector 102 makes the unvoiced/voiced decision by transmitting information to multiplexer 105 via path 107 indicating a voiced frame if a'x + b > 0. If this condition is not true, then detector 102 indicates an unvoiced frame.
  • The values for vector a and scalar b used by detector 102 are advantageously identical to the initial values of a and b for statistical voiced detector 103.
  • Detector 102 determines the distance measure in a manner similar to generator 206 by performing calculations similar to those given in equations 16 through 28.
  • FIGS. 3 and 4 illustrate, in greater detail, the operations performed by statistical voiced detector 103 of FIG. 2.
  • Blocks 302 and 300 implement blocks 202 and 201 of FIG. 2, respectively.
  • Blocks 304 through 318 implement statistical calculator 203.
  • Blocks 320 and 322 implement weights calculator 204, and blocks 326 through 338 implement block 205 of FIG. 2.
  • Generator 206 of FIG. 2 is implemented by block 340.
  • Subtracter 207 is implemented by block 308 or block 324.
  • Block 302 calculates the vector which represents the average of the classifiers for the present frame and all previous frames.
  • Block 300 determines whether speech or silence is present in the present frame; and if silence is present in the present frame, the mean for each classifier is subtracted from each classifier by block 324 before control is transferred to decision block 326. However, if speech is present in the present frame, then the statistical and weights calculations are performed by blocks 304 through 322.
  • First, the average vector is found in block 302.
  • Second, the sums of the squares and products matrix is calculated in block 304. The latter matrix along with the vector X representing the mean of the classifiers for the present and past frames is then utilized to calculate the covariance matrix, T, in block 306.
  • Block 310 calculates the probability that the present frame is unvoiced by utilizing the present weight vector a, the present threshold value b, and the classifier vector for the present frame, x n . After calculating the probability that the present frame is unvoiced, the probability that the present frame is voiced is calculated by block 312. Then, the overall probability, p n , that any frame will be unvoiced is calculated by block 314.
  • Blocks 316 and 318 calculate two vectors: u and v.
  • The values contained in vector u represent the statistical average values that each classifier would have if the frame were unvoiced.
  • Vector v contains values representing the statistical average values that each classifier would have if the frame were voiced.
  • The actual vectors of classifiers for the present and previous frames are clustered around either vector u or vector v.
  • The vectors representing the classifiers for the previous and present frames are clustered around vector u if these frames are found to be unvoiced; otherwise, the previous classifier vectors are clustered around vector v.
  • Control is transferred to decision block 320.
  • If the test of decision block 320 is satisfied, control is transferred to block 322; otherwise, control is transferred to block 326.
  • Upon receiving control, block 322 calculates a new weight vector a and a new threshold value b.
  • The vector a and value b are used in the next sequential frame by the preceding blocks in FIG. 3.
  • If N is required to be greater than infinity, vector a and scalar b will never be changed, and detector 103 will adapt solely in response to vectors v and u as illustrated in blocks 326 through 338.
  • Blocks 326 through 338 implement U/V determinator 205 of FIG. 2.
  • Block 326 determines whether the power term of vector v of the present frame is greater than or equal to the power term of vector u. If this condition is true, then decision block 328 is executed. The latter decision block determines whether the test for voiced or unvoiced is met. If the frame is found to be voiced in decision block 328, then the frame is marked as voiced by block 330; otherwise, the frame is marked as unvoiced by block 332. If the power term of vector v is less than the power term of vector u for the present frame, blocks 334 through 338 are executed and function in a similar manner. Finally, block 340 calculates the distance measure.
  • FIG. 5 illustrates, in greater detail, the operations performed by block 340 of FIG. 4.
  • Decision block 501 determines whether the frame has been indicated as unvoiced or voiced by examining the results of blocks 330, 332, 336, or 338. If the frame has been designated as voiced, path 507 is selected.
  • Block 510 calculates probability $P_d$, block 502 recalculates the mean, $m_1$, for voiced frames, and block 503 recalculates the variance, $k_1$, for voiced frames. If the frame was determined to be unvoiced, decision block 501 selects path 508.
  • Block 509 recalculates probability $P_d$, block 504 recalculates the mean, $m_0$, for unvoiced frames, and block 505 recalculates the variance, $k_0$, for unvoiced frames.
  • Block 506 calculates the distance measure by performing the calculations indicated.

Abstract

The apparatus described detects a fundamental frequency in speech in a changing speech environment by using adaptive statistical techniques. A statistical voiced detector (103) detects changes in the speech environment by means of classifiers that define certain attributes of the speech, in order to recalculate the weights used to combine the classifiers in making the unvoiced/voiced decision, which specifies whether or not the speech has a fundamental frequency. The detector is responsive to the classifiers first to calculate the average of the classifiers (202) and then to determine the overall probability that any frame will be unvoiced. In addition, the detector, using a statistical calculator (203), forms two vectors: one represents the statistical average of the values that the classifiers of an unvoiced frame would have, and the other represents the statistical average of the values of the classifiers for a voiced frame. These latter calculations are performed using not only the average value of the classifiers and the present classifiers but also a vector defining the weights used to determine whether a frame is voiced or unvoiced, plus a threshold value. A weights calculator (204) is responsive to the information generated in the statistical calculation to produce a new set of values for the weights vector and the threshold value, which are used by the statistical calculator when processing the next frame. An unvoiced/voiced determinator (205) is then responsive to the two statistical average vectors and the weights vector to make the unvoiced/voiced decision.
PCT/US1988/000030 1987-04-03 1988-01-12 An adaptive multivariate estimating apparatus WO1988007738A1 (fr)

Priority Applications (5)

Application Number Publication Number Priority Date Filing Date Title
JP62506332A JPH01502779A (ja) 1987-04-03 1988-01-12 Adaptive multivariate estimating apparatus
DE8888901347T DE3875894T2 (de) 1987-04-03 1988-01-12 Adaptive multivariate analysis apparatus
AT88901347T ATE82426T1 (de) 1987-04-03 1988-01-12 Adaptive multivariate analysis apparatus
SG598/93A SG59893G (en) 1987-04-03 1993-05-07 An adaptive multivariate estimating apparatus
HK1066/93A HK106693A (en) 1987-04-03 1993-10-07 An adaptive multivariate estimating apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US3429687A 1987-04-03 1987-04-03
US034,296 1987-04-03

Publications (1)

Publication Number Publication Date
WO1988007738A1 (fr) 1988-10-06

Family

ID=21875521

Family Applications (1)

Application Number Publication Number Priority Date Filing Date Title
PCT/US1988/000030 WO1988007738A1 (fr) 1987-04-03 1988-01-12 An adaptive multivariate estimating apparatus

Country Status (9)

Country Link
EP (1) EP0308433B1 (fr)
JP (1) JPH01502779A (fr)
AT (1) ATE82426T1 (fr)
AU (1) AU599459B2 (fr)
CA (2) CA1337708C (fr)
DE (1) DE3875894T2 (fr)
HK (1) HK106693A (fr)
SG (1) SG59893G (fr)
WO (1) WO1988007738A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0566131A3 (fr) * 1992-04-15 1994-03-30 Sony Corp
CN104517614A (zh) * 2013-09-30 2015-04-15 上海爱聊信息科技有限公司 Voiced/unvoiced decision device based on the characteristic parameter values of each subband, and decision method thereof

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0795239B2 (ja) * 1987-04-03 1995-10-11 アメリカン テレフォン アンド テレグラフ カムパニー Apparatus and method for detecting the presence of a fundamental frequency in a speech frame
US6202046B1 (en) 1997-01-23 2001-03-13 Kabushiki Kaisha Toshiba Background noise/speech classification method
JP3670217B2 (ja) * 2000-09-06 2005-07-13 国立大学法人名古屋大学 Noise encoding device, noise decoding device, noise encoding method, and noise decoding method
JP4517045B2 (ja) * 2005-04-01 2010-08-04 独立行政法人産業技術総合研究所 Pitch estimation method and apparatus, and pitch estimation program

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0795239B2 (ja) * 1987-04-03 1995-10-11 アメリカン テレフォン アンド テレグラフ カムパニー Apparatus and method for detecting the presence of a fundamental frequency in a speech frame

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
1978 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING, 10-12 April 1978, Tulsa, Oklahoma, IEEE, (New York, US), V.V.S. SARMA et al., "Studies on Pattern Recognition Approach to Voiced-Unvoiced-Silence Classification", pages 1-4. *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0566131A3 (fr) * 1992-04-15 1994-03-30 Sony Corp
US5664052A (en) * 1992-04-15 1997-09-02 Sony Corporation Method and device for discriminating voiced and unvoiced sounds
CN104517614A (zh) * 2013-09-30 2015-04-15 上海爱聊信息科技有限公司 Voiced/unvoiced decision device based on the characteristic parameter values of each subband, and decision method thereof

Also Published As

Publication number Publication date
EP0308433B1 (fr) 1992-11-11
SG59893G (en) 1993-07-09
AU599459B2 (en) 1990-07-19
DE3875894T2 (de) 1993-05-19
CA1337708C (fr) 1995-12-05
JPH01502779A (ja) 1989-09-21
EP0308433A1 (fr) 1989-03-29
JPH0795237B1 (fr) 1995-10-11
AU1222688A (en) 1988-11-02
DE3875894D1 (en) 1992-12-17
CA1338251C (fr) 1996-04-16
ATE82426T1 (de) 1992-11-15
HK106693A (en) 1993-10-15

Similar Documents

Publication Publication Date Title
US6993481B2 (en) Detection of speech activity using feature model adaptation
US5715372A (en) Method and apparatus for characterizing an input signal
EP0548054B1 (fr) Device for detecting the presence of a speech signal
KR100636317B1 (ko) Distributed speech recognition system and method thereof
US6314396B1 (en) Automatic gain control in a speech recognition system
CN108922513B (zh) Speech discrimination method and apparatus, computer device, and storage medium
JP2002140096A (ja) Signal processing system
EP1766614A2 (fr) Artificial bandwidth extension based on neuroevolution
US5046100A (en) Adaptive multivariate estimating apparatus
US5007093A (en) Adaptive threshold voiced detector
EP0308433B1 (fr) An adaptive multivariate estimating apparatus
US4972490A (en) Distance measurement control of a multiple detector system
JP2001520764A (ja) Speech analysis system
FI111572B (fi) Method for processing speech in the presence of acoustic interference
EP0309561B1 (fr) Voiced detector using adaptive threshold values
CN112786068B (zh) Audio source separation method, apparatus, and storage medium
EP0310636B1 (fr) Distance measurement control of a multiple detector system
Bertocco et al. In-service nonintrusive measurement of noise and active speech level in telephone-type networks
Grimaldi An improved procedure for QoS measurement in telecommunication systems
Yamazaki et al. An objective method for evaluating the quality of speech with code errors using pattern matching techniques
JPH11224097A (ja) Method and apparatus for speech/pause determination in voice

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AU JP

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LU NL SE

WWE Wipo information: entry into national phase

Ref document number: 1988901347

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1988901347

Country of ref document: EP

WWG Wipo information: grant in national office

Ref document number: 1988901347

Country of ref document: EP