CN1151490C - High-accuracy high-resolution base frequency extracting method for speech recognization - Google Patents

High-accuracy high-resolution base frequency extracting method for speech recognization

Info

Publication number
CN1151490C
CN1151490C CNB001247115A CN00124711A
Authority
CN
China
Prior art keywords
fundamental frequency
frequency
score
per
fundamental
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CNB001247115A
Other languages
Chinese (zh)
Other versions
CN1342968A (en)
Inventor
徐波
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CNB001247115A priority Critical patent/CN1151490C/en
Publication of CN1342968A publication Critical patent/CN1342968A/en
Application granted granted Critical
Publication of CN1151490C publication Critical patent/CN1151490C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Abstract

The present invention relates to a high-accuracy, high-resolution fundamental frequency extraction method for speech recognition that combines frequency-domain analysis, time-domain analysis, and dynamic programming (DP). The speech signal is first transformed by FFT in the frequency domain for harmonic analysis, and several fundamental frequency candidates are selected by peak detection. The candidates are then evaluated in the time domain with autocorrelation coefficients. Finally, a dynamic programming algorithm with variable-length backtracking combines the frequency-domain and time-domain analysis results with the fundamental frequency variation to determine an optimal fundamental frequency contour. To guarantee the resolution of the extracted fundamental frequency, the method employs downsampling, interpolation, and related measures.

Description

High-accuracy, high-resolution fundamental frequency extraction method for speech recognition
Technical field
The present invention provides a fundamental frequency extraction method that combines frequency-domain analysis, time-domain analysis, and dynamic programming (DP), and belongs to the fields of signal processing and speech recognition.
Background technology
At present, extracting a reliable fundamental frequency contour under continuous-speech conditions is the basis of Chinese tone recognition and tone modeling. The main difficulty of fundamental frequency extraction is that, under the influence of factors such as vocal-tract formants and noise, values at multiples and submultiples of the correct fundamental frequency sometimes show more prominent features than the fundamental frequency itself (for example, larger peaks). Traditional methods usually perform fundamental frequency extraction in the time domain or the frequency domain alone, and at every time point extract only a single fundamental frequency value as the basis for subsequent processing such as median smoothing. Using only a frequency-domain or only a time-domain method, and extracting the fundamental frequency by a single hard criterion (for example, taking only the largest peak), frequently produces errors and yields a fundamental frequency track with obvious discontinuous jumps, which clearly contradicts the facts; for example, measurements published by J. Sundberg in 1979 in the Journal of Phonetics show that the rate of change of the fundamental frequency is on the order of 1%/ms. Obvious discontinuities in the fundamental frequency can be partly remedied by a traditional DP (dynamic programming) method, but a traditional DP algorithm must process an entire segment of speech before it can output the fundamental frequency track; such a delay is very hard to accept in a real-time speech recognition system, and other applications of DP algorithms generally address this problem with fixed-length backtracking.
The main difficulties of fundamental frequency extraction are:
1. In the frequency domain, the spectrum of the speech signal obtained through the FFT (fast Fourier transform) often has large peaks at F0/2, F0, 2F0, 3F0, ... (where F0 denotes the correct fundamental frequency). As shown in Fig. 2, the spectrum of a voiced frame after the FFT has clear peaks at F0, 2F0, and 3F0, but the peak at the fundamental frequency F0 is noticeably lower than the one at 2F0. Some algorithms simply take the frequency of the largest peak as the fundamental frequency, or extract several peaks and average the spacing between them to estimate the fundamental frequency. However, some peaks may not be prominent, and the phenomenon of a "missing fundamental" can even occur, which causes great difficulty for traditional pitch detection methods based on spectral analysis.
2. In the time domain, a representative fundamental frequency extraction method is the short-time autocorrelation function method. The short-time autocorrelation function is computed as:
R[m] = (1/(N−m)) · Σ_{i=0}^{N−m−1} x_i·x_{i+m}    (1)
where x_i denotes the i-th sample in the frame, N is the total number of samples per frame, and R[m] is the autocorrelation value at a lag of m samples. By the properties of the autocorrelation function, the autocorrelation of a periodic signal is also periodic, with the same period as the speech signal; in general, if T0 denotes the pitch period (T0 = 1/F0), then apart from R[0] the largest peak occurs at T0. Fig. 4 shows the autocorrelation curve of a voiced frame: there are large peaks at T0, 2T0, and 3T0, but the height of the peak at T0 is very close to those at 2T0 and 3T0. Detecting the largest peak of the autocorrelation function can therefore serve as a way of estimating the fundamental frequency, but, similarly to the frequency-domain case, large peaks usually appear near T0/2, 2T0, 3T0, ..., and the largest peak sometimes falls on one of them. This limits the reliability of the algorithm.
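For illustration, equation (1) and the normalized coefficient R_per used later can be computed with the following minimal NumPy sketch (the function and variable names, such as frame and max_lag, are illustrative and not part of the patent text):

```python
import numpy as np

def short_time_autocorr(frame, max_lag):
    """Short-time autocorrelation R[m] of one frame, per equation (1)."""
    N = len(frame)
    R = np.zeros(max_lag + 1)
    for m in range(max_lag + 1):
        # R[m] = 1/(N-m) * sum_{i=0}^{N-m-1} x_i * x_{i+m}
        R[m] = np.dot(frame[:N - m], frame[m:]) / (N - m)
    return R

def autocorr_coefficient(R, lag):
    """Normalized autocorrelation coefficient R_per = R[lag] / R[0]."""
    return R[lag] / R[0]
```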
In addition, resolution and computational cost also pose difficulties for real-time fundamental frequency extraction.
For example, in methods based on FFT spectral analysis, the resolution of the extracted fundamental frequency is proportional to the number of FFT points, while the computation grows rapidly with that number, so an FFT with too many points cannot be used. Computing the autocorrelation and searching for its maximum over the whole effective frequency range (the fundamental frequency range of human speech) requires a large number of multiplications and is very time consuming, which is unsuitable for a speech recognition system that must run in real time. The number of candidate values also affects the decision speed of the DP algorithm: if every fundamental frequency in the effective range were taken as a candidate, the decision burden would inevitably increase greatly.
Summary of the invention
The object of the present invention is to propose a new fundamental frequency extraction method. Its basic idea is: first apply an FFT to the speech signal and perform harmonic analysis in the frequency domain; then select several fundamental frequency candidates by peak detection; evaluate these candidates in the time domain with autocorrelation coefficients; and finally determine an optimal fundamental frequency track with a dynamic programming algorithm.
The technical essentials of the present invention are: a high-accuracy, high-resolution fundamental frequency extraction method for speech recognition that combines time-domain and frequency-domain analysis. Several possible fundamental frequency candidates are extracted, and a dynamic programming search is performed with an objective function that combines the autocorrelation coefficient, the spectral analysis result, and the fundamental frequency variation. Before the autocorrelation is computed, the spectral analysis result already used by speech recognition is exploited: an FFT-based harmonic analysis gives a preliminary estimate of the fundamental frequency range of the speech, which narrows the range over which the autocorrelation must be computed without affecting the precision of the extracted fundamental frequency. The harmonic analysis comprises steps such as downsampling, pre-emphasis and windowing, FFT computation, and quadratic spline interpolation to improve resolution.
1. Harmonic analysis is applied to the spectrum obtained by the FFT: for each fundamental frequency in the effective range, the accumulated sum of several harmonics is computed. Let x(i) denote the speech signal of one frame, X(f) its FFT, and S(f) the resulting power spectrum, that is:
S(f) = |X(f)|²    (2)
The harmonic accumulation result is then:
H(f) = Σ_{n=1}^{HN} h_n·S(nf)    (3)
where HN is the number of harmonics and h_n is the weight of the n-th harmonic. In this method the power spectrum is not passed through a logarithm (Log) operation, nor is the linear frequency axis mapped to a logarithmic axis (i.e. Log(F0) is not computed), which reduces the computation to some extent. The largest peaks of the harmonic analysis result H(f) are then selected by a linear search (let P_1, P_2, ..., P_PN denote these peak values), and the corresponding frequencies are taken as fundamental frequency candidates. At the same time, the size of each peak relative to the largest peak P_max is computed as the scoring parameter H_per of the subsequent DP algorithm, that is:
H_per(i) = P_i / P_max,  i = 1, ..., PN    (4)
An example of fundamental frequency peak detection by harmonic analysis is shown in Fig. 3, which gives the harmonic accumulation curve of a voiced frame. After the harmonic computation there is a clear maximum peak at F0; this peak, together with several other peaks (marked with circles), is taken as the set of fundamental frequency candidates.
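For illustration, the harmonic accumulation of equation (3) and the relative peak height H_per of equation (4) might be sketched as follows, assuming the power spectrum is sampled on a uniform frequency grid; the helper names, the default of five harmonics, and the simple local-maximum peak picker are illustrative assumptions:

```python
import numpy as np

def harmonic_sum(power_spec, freq_step, f0_range, HN=5, weights=None):
    """H(f) = sum_{n=1..HN} h_n * S(n*f) over candidate fundamentals, per equation (3)."""
    if weights is None:
        weights = np.ones(HN)                      # h_n: harmonic weights
    lo, hi = f0_range
    cand_bins = np.arange(int(lo / freq_step), int(hi / freq_step) + 1)
    H = np.zeros(len(cand_bins))
    for k, b in enumerate(cand_bins):
        for n in range(1, HN + 1):
            if n * b < len(power_spec):
                H[k] += weights[n - 1] * power_spec[n * b]
    return cand_bins * freq_step, H

def pick_candidates(freqs, H, num_peaks=5):
    """Select the largest local maxima of H(f); return (f0, H_per) pairs, per equation (4)."""
    peaks = [i for i in range(1, len(H) - 1) if H[i] > H[i - 1] and H[i] >= H[i + 1]]
    peaks.sort(key=lambda i: H[i], reverse=True)
    peaks = peaks[:num_peaks]
    if not peaks:
        return []
    p_max = max(H[i] for i in peaks)
    return [(freqs[i], H[i] / p_max) for i in peaks]
```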
2. In the time domain, the autocorrelation values {R[i]} of the fundamental frequency candidates obtained in the frequency domain are computed and normalized by R[0] to give the autocorrelation coefficient R_per(i) = R[i]/R[0]. As noted above, the peak of the autocorrelation coefficient of a voiced frame is relatively large and close to 1, as shown in Fig. 4, whereas the autocorrelation peaks of an unvoiced frame, which has no fundamental frequency, are all very low, as shown in Fig. 5. The autocorrelation coefficient can therefore serve as a criterion for voiced/unvoiced discrimination and can filter out clearly unvoiced frames. Using the autocorrelation coefficient to score and screen the limited set of candidates obtained in the frequency domain both reduces the amount of autocorrelation computation and the load of the subsequent DP algorithm, and at the same time allows the DP algorithm to combine the frequency-domain and time-domain analysis results when selecting the fundamental frequency, which is more reasonable.
3. Dynamic programming is a widely used optimization algorithm. It decomposes a multi-stage decision process into a number of single-stage decision sub-processes, which simplifies the computation. At each decision step, all possible paths are scored by a scoring function, the paths are pruned according to their scores, and the path with the best (largest or smallest) score is finally output. In this DP algorithm, the fundamental frequency candidates of one frame are processed at a time: the existing fundamental frequency paths are extended, each extended path is scored, and only the highest-scoring paths are kept. The score formula is:
Score(i) = max_j { Score(i−1) − D(i,j) } + a·R_per(i) + b·H_per(i)    (5)
where Score(i) is the score of the fundamental frequency path at frame i, a and b are the weighting coefficients of R_per(i) and H_per(i), and D(i,j) is the distance between the fundamental frequency p_i of frame i and the j-th fundamental frequency candidate p_j of frame i−1:
D(i,j) = 2·|p_i − p_j| / (p_i + p_j)
The scoring function of this method takes three factors into account: the continuity of the fundamental frequency variation, maximizing the accumulated autocorrelation coefficient ΣR_per(i), and maximizing the accumulated relative harmonic peak ΣH_per(i).
All three are important evidence for fundamental frequency discrimination; the latter two factors were discussed above.
The continuity term scores the possible fundamental frequency paths according to the principle that the fundamental frequency track of human speech cannot change abruptly.
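For illustration, the per-frame score update of equation (5), with pruning to the highest-scoring paths, might be sketched as follows; the path representation and the default weights a, b and beam width are illustrative assumptions:

```python
def dp_step(paths, candidates, a=1.0, b=1.0, beam=5):
    """Extend each kept path by this frame's candidates and score it with equation (5).

    paths:      list of (score, [f0, ...]) kept from the previous frame
    candidates: list of (f0, R_per, H_per) for the current frame
    """
    new_paths = []
    for f0, r_per, h_per in candidates:
        if not paths:                              # first voiced frame of a run: no history term
            new_paths.append((a * r_per + b * h_per, [f0]))
            continue
        # max over previous paths of Score(i-1) - D(i,j), with D(i,j) = 2*|p_i - p_j|/(p_i + p_j)
        best_prev, best_hist = max(
            ((score - 2.0 * abs(f0 - hist[-1]) / (f0 + hist[-1]), hist)
             for score, hist in paths),
            key=lambda t: t[0],
        )
        new_paths.append((best_prev + a * r_per + b * h_per, best_hist + [f0]))
    # keep only the highest-scoring paths (pruning)
    new_paths.sort(key=lambda t: t[0], reverse=True)
    return new_paths[:beam]
```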
Because unvoiced speech has no clear fundamental frequency, this algorithm adopts variable-length backtracking. Concretely, the DP algorithm starts at the first frame of a run of consecutive voiced frames and ends at the last frame of that run, performing local, dynamically variable-length backtracking. Fundamental frequencies proposed for unvoiced frames are meaningless, and the continuity of the fundamental frequency cannot be carried across them into the next voiced segment; if dynamic programming were performed globally, too many interfering scores would inevitably be added, the fundamental frequency track would drift randomly, and the effect of the DP algorithm would sometimes be seriously degraded. In this method, runs of voiced frames are identified from the zero-crossing-rate detection, the autocorrelation coefficient, and the accumulated relative harmonic peak, with thresholds chosen statistically for good discrimination; frames with no fundamental frequency candidate surviving the threshold filtering are judged unvoiced and do not take part in the DP search for the fundamental frequency track. Removing most of the unvoiced frames, which have no fundamental frequency, greatly reduces their influence on the DP-based extraction and reduces errors; at the same time it preserves the continuity of the fundamental frequency track and significantly reduces the delay of the DP algorithm. The few unvoiced frames that are included do not have a large influence on the DP algorithm, and the few voiced frames that are excluded can be well compensated in the connection step.
4. Other measures.
(a) Zero-crossing rate detection. As mentioned above, to improve the discrimination between voiced and unvoiced speech and to speed up fundamental frequency extraction, zero-crossing rate detection is first applied to the speech signal. A relatively large threshold is set in advance; frames whose zero-crossing rate exceeds this value are judged unvoiced and no fundamental frequency extraction is performed on them; otherwise the fundamental frequency is extracted with the FFT/DP algorithm described below. The threshold is set relatively high to guarantee that voiced frames are not filtered out. To reduce the influence of waveform offset, the reference level for zero crossings is not zero but the mean of the frame data.
(b) Downsampling and interpolation. To improve the precision of fundamental frequency extraction, the speech signal is downsampled before the FFT, and the spectrum obtained from the FFT is interpolated. If the original sampling rate is SR, the sampling rate is reduced to 1/RDC of the original, an FFTLen-point FFT is performed, and the resulting power spectrum is interpolated with Inpl_N points (that is, Inpl_N−1 points are inserted between every pair of adjacent FFT points), then the resolution of the extracted fundamental frequency is:
SR / (RDC · FFTLen · Inpl_N)    (7)
In the present embodiment, the original signal sampling rate SR = 16000 Hz is reduced to 1/4, that is, RDC = 4; the FFT length is FFTLen = 512; and the interpolation factor is Inpl_N = 20. The extraction resolution is then 0.39 Hz. By comparison, the autocorrelation-based method with 384 samples per frame has an extraction resolution of 16000/N² (where N is the number of samples corresponding to one signal period), i.e. roughly from 15 Hz down to 0.4 Hz over the fundamental frequency range, so the 0.39 Hz resolution of the proposed method is clearly sufficient over the whole fundamental frequency range.
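For illustration, measures (a) and (b) might be sketched as follows with NumPy/SciPy, using the embodiment's values SR = 16000 Hz, RDC = 4, FFTLen = 512 and Inpl_N = 20; the zero-crossing threshold and the pre-emphasis coefficient are illustrative assumptions:

```python
import numpy as np
from scipy.signal import decimate
from scipy.interpolate import interp1d

def is_unvoiced_by_zcr(frame, zcr_threshold=0.4):
    """Measure (a): count crossings of the frame mean; high rates are judged unvoiced."""
    ref = frame - np.mean(frame)                   # reference level is the frame mean, not zero
    crossings = np.sum(ref[:-1] * ref[1:] < 0)
    return crossings / len(frame) > zcr_threshold

def interpolated_power_spectrum(frame, sr=16000, rdc=4, fft_len=512, inpl_n=20):
    """Measure (b): downsample, pre-emphasize, window, FFT, then quadratic-spline interpolation.

    The resulting frequency resolution is sr / (rdc * fft_len * inpl_n), equation (7);
    with the embodiment's values this is 16000 / (4 * 512 * 20), roughly 0.39 Hz.
    """
    x = decimate(frame, rdc)                        # downsample to sr / rdc
    x = np.append(x[0], x[1:] - 0.97 * x[:-1])      # pre-emphasis (coefficient assumed)
    x = x * np.hamming(len(x))                      # Hamming window
    spec = np.abs(np.fft.rfft(x, fft_len)) ** 2     # power spectrum, equation (2)
    bins = np.arange(len(spec))
    fine_bins = np.arange(0, len(spec) - 1, 1.0 / inpl_n)
    fine_spec = interp1d(bins, spec, kind='quadratic')(fine_bins)  # quadratic spline interpolation
    freq_step = (sr / rdc) / (fft_len * inpl_n)     # roughly 0.39 Hz per interpolated point
    return fine_spec, freq_step
```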
The advantages of the invention are as follows. Combining the time-domain and frequency-domain fundamental frequency extraction methods makes them effectively complementary: in the frequency domain the fundamental frequency candidates are determined from the harmonic sums, with no manually set thresholds in between, which reduces the uncertainty of the computation. The dynamic programming objective simultaneously takes into account the time-domain autocorrelation and the frequency-domain harmonic sum, two indicators closely related to the fundamental frequency, and maximizes them, which guarantees the continuity and reliability of the fundamental frequency. Variable-length backtracking reduces the theoretical delay of the speech recognition system, which is extremely important for applications with strong real-time requirements such as telephone speech recognition.
To verify the effectiveness of the present invention, 725 voiced frames were extracted from two difficult utterances with strong channel interference, for which fundamental frequency extraction is hard; obviously discontinuous fundamental frequencies, judged manually, were counted as recognition errors. Before the fundamental frequency candidates of the present invention were fed into the DP algorithm the error rate was 11.7%; after they were adopted it was 0.4%. The recognition accuracy is thus greatly improved.
In view of the foregoing description, the complete fundamental frequency extraction flow, shown in Fig. 6, is as follows:
(1) Signal segmentation: the input speech signal is first split into frames, with a certain overlap between adjacent frames; each frame is then processed as follows.
(2) Zero-crossing rate detection: the average number of zero crossings is computed as a rough voiced/unvoiced estimate; a frame whose zero-crossing count exceeds a given threshold is judged unvoiced and no fundamental frequency is extracted.
(3) Downsampling: this both improves the fundamental frequency extraction resolution and guarantees that no frequency content below 1250 Hz that is significant for fundamental frequency extraction is lost.
(4) Pre-emphasis and windowing: the purpose of this step is to reduce frequency aliasing. A Hamming window is used, with the formula:
h(n) = 0.54 − 0.46·cos(2πn/(N−1)),  0 ≤ n ≤ N−1
where h(n) is the window function and N is the window length.
(5) FFT and power spectrum: a 512-point FFT (fast Fourier transform) is used, and the power spectrum is computed with formula (2).
(6) Interpolation: to improve the precision of fundamental frequency extraction, the spectrum is interpolated by a factor of Inpl_N = 20 between every two FFT points, using quadratic spline interpolation.
(7) Harmonic accumulation: the harmonic sum is computed with formula (3).
(8) Fundamental frequency candidates: several peaks are selected from the harmonic accumulation spectrum; the frequencies of the peak points are taken as candidates, together with the height of each peak relative to the largest peak, i.e. H_per.
(9) For each fundamental frequency candidate, the corresponding autocorrelation coefficient R_per is computed in the time domain; candidates with low H_per or R_per values are filtered out to reduce the DP workload.
(10) The fundamental frequency path is found with the DP algorithm: the score of each path is computed with formula (5), and the several best-scoring paths are recorded.
(11) If a run of consecutive voiced frames has been completed, the optimal fundamental frequency path according to the DP score is output as the extraction result; otherwise the procedure returns to step (2) to perform zero-crossing detection on the next frame. After the whole input signal has been processed, normalization and connection follow: the processed speech is normalized by its average fundamental frequency, which eliminates speaker differences, and connection smoothly joins the consonant parts, which have no fundamental frequency, with the vowel parts, which do.
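Tying the sketches above together, a hypothetical per-frame driver for steps (1)-(11) might look as follows, reusing the illustrative helpers defined earlier; the framing parameters, the fundamental frequency search range, the filtering thresholds, and the assumption that the input is a NumPy array are not prescribed by the patent:

```python
def extract_f0(signal, sr=16000, frame_len=384, hop=160):
    """Sketch of the overall flow (1)-(11), reusing the helper sketches above."""
    contour, paths = [], []

    def flush_run():
        # End of a voiced run: variable-length backtracking outputs the best path.
        nonlocal paths
        if paths:
            contour.extend(max(paths, key=lambda t: t[0])[1])
            paths = []

    for start in range(0, len(signal) - frame_len, hop):           # (1) framing
        frame = signal[start:start + frame_len].astype(float)
        cands = []
        if not is_unvoiced_by_zcr(frame):                          # (2) zero-crossing pre-filter
            spec, step = interpolated_power_spectrum(frame, sr)    # (3)-(6)
            freqs, H = harmonic_sum(spec, step, f0_range=(60.0, 450.0))  # (7)
            R = short_time_autocorr(frame, max_lag=frame_len // 2)
            for f0, h_per in pick_candidates(freqs, H):            # (8) frequency-domain candidates
                lag = min(int(round(sr / f0)), len(R) - 1)         # lag clamped to analysed range
                r_per = autocorr_coefficient(R, lag)               # (9) time-domain score
                if r_per > 0.3 and h_per > 0.3:                    # filtering thresholds assumed
                    cands.append((f0, r_per, h_per))
        if cands:
            paths = dp_step(paths, cands)                          # (10) extend and prune paths
        else:
            flush_run()                                            # (11) end of a voiced run
            contour.append(0.0)                                    # unvoiced frame: no fundamental
    flush_run()
    return contour
```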
Description of drawings
Fig. 1. Example of fundamental frequency extraction results: the left side is a schematic of the fundamental frequency tracks corresponding to the four tones in isolated speech; the right side shows, from top to bottom, the waveform of continuous speech, the extracted fundamental frequency track, and the speech content;
Fig. 2. Spectrum of a voiced frame obtained by the FFT: there are clear peaks at F0, 2F0, and 3F0, but the peak at F0 is less prominent than the one at 2F0;
Fig. 3. Harmonic accumulation curve of a voiced frame: there is a clear maximum peak at F0, and this peak together with several other peaks (marked with circles) serves as the set of fundamental frequency candidates;
Fig. 4. Autocorrelation curve of a voiced frame: there are large peaks at T0, 2T0, and 3T0, but the height of the peak at T0 is very close to those at 2T0 and 3T0;
Fig. 5. Autocorrelation curve of an unvoiced frame: apart from R[0], all peak values are low and there is no obvious periodicity;
Fig. 6. Flow chart of fundamental frequency extraction.
Embodiment
One distinguishing feature of Chinese speech is its tones. Standard Chinese tones are generally divided into the first, second, third, and fourth tones, that is, the high level, rising, falling-rising, and falling tones, plus the neutral (light) tone. Syllables composed of the same initials and finals but carrying different tones have completely different meanings and usually correspond to different characters or words (for example: rise, week, prison term, departure date, odor). Tone is therefore of great significance in Chinese speech recognition.
Tones are realized as variations of the fundamental frequency (F0) track. The left side of Fig. 1 shows the correspondence between tone and fundamental frequency for isolated speech, and the right side shows it for continuous speech. The fundamental frequency extraction method proposed by the present invention can provide a fairly accurate fundamental frequency track for speech recognition and be used for tone recognition; in particular, the extracted fundamental frequency can serve as a parameter of the speech recognition system, or be used to build tone models, so as to reflect the phonetic features more completely, guide language processing, and improve recognition accuracy.
Accurate extraction of the fundamental frequency track can also serve as important evidence for recognizing information such as syntactic structure, word stress, and speaker intent in continuous speech. The tones of Chinese are a difficult point in Chinese-language teaching, and displaying the fundamental frequency curve allows the pupil's pronunciation to be taught interactively, so the method can also be used in the teaching of standard Chinese. Although tones are affected by context, tone information is less affected by the acoustic environment and the channel, so from some perspectives tone is significant for improving the reliability of speech information processing.

Claims (1)

1. A high-accuracy, high-resolution fundamental frequency extraction method for speech recognition, characterized in that the fundamental frequency extraction comprises the following steps:
(1) signal segmentation: the input speech signal is first split into frames, with a certain overlap between adjacent frames; each frame is then processed in turn according to the following steps:
(2) zero-crossing rate detection: the average number of zero crossings is computed as a rough voiced/unvoiced estimate; a frame whose zero-crossing count exceeds a given threshold is judged unvoiced and no fundamental frequency is extracted;
(3) downsampling: the sampling rate is reduced appropriately under the premise of not losing the frequency content below 1250 Hz that is significant for fundamental frequency extraction;
(4) pre-emphasis and windowing: a Hamming window is used, with the formula
h(n) = 0.54 − 0.46·cos(2πn/(N−1)),  0 ≤ n ≤ N−1
where h(n) is the window function and N is the window length;
(5) FFT computation and power spectrum: a multi-point FFT (fast Fourier transform) is used, and the power spectrum is computed with the formula
S(f) = |X(f)|²
where X(f) is the FFT of the signal and S(f) is the power spectrum;
(6) interpolation: Inpl_N values are inserted between every two FFT points of the spectrum, using quadratic spline interpolation;
(7) harmonic accumulation: the harmonic accumulation spectrum is computed as
H(f) = Σ_{n=1}^{HN} h_n·S(nf)
where S(f) is the interpolated power spectrum, HN is the maximum number of harmonics, h_n is the weight of the n-th harmonic, and H(f) is the harmonic accumulation value corresponding to frequency f;
(8) peak detection and determination of fundamental frequency candidates: several peaks are selected from the harmonic accumulation spectrum; the frequency of each peak point is taken as a fundamental frequency candidate, together with the height of the peak relative to the largest peak, i.e. H_per;
(9) for each of the fundamental frequency candidates, the corresponding autocorrelation coefficient R_per is computed in the time domain; candidates with low H_per or R_per values are filtered out to reduce the workload of the next step, dynamic programming (DP);
(10) the fundamental frequency track is found with the dynamic programming (DP) algorithm, which computes the score of every track; the score formula is Score(i) = max{Score(i−1) − D(i,j)} + a·R_per(i) + b·H_per(i); the several best-scoring tracks are recorded; Score(i) in the score formula is the score of the fundamental frequency path at frame i, a and b are the weighting coefficients of R_per(i) and H_per(i) respectively, and D(i,j) is the distance between the fundamental frequency p_i of frame i and the j-th fundamental frequency candidate p_j of frame i−1, computed as D(i,j) = 2·|p_i − p_j| / (p_i + p_j);
(11) when a run of consecutive voiced frames has been processed, the optimal fundamental frequency track according to the DP score is output as the extraction result; otherwise the procedure returns to step (2) to perform zero-crossing detection on the next frame; after the whole input signal has been processed, normalization and connection follow: the processed speech is normalized by its average fundamental frequency, and connection smoothly joins the unvoiced parts, which have no fundamental frequency, with the voiced parts, which do.
CNB001247115A 2000-09-13 2000-09-13 High-accuracy high-resolution base frequency extracting method for speech recognization Expired - Lifetime CN1151490C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB001247115A CN1151490C (en) 2000-09-13 2000-09-13 High-accuracy high-resolution base frequency extracting method for speech recognization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB001247115A CN1151490C (en) 2000-09-13 2000-09-13 High-accuracy high-resolution base frequency extracting method for speech recognization

Publications (2)

Publication Number Publication Date
CN1342968A CN1342968A (en) 2002-04-03
CN1151490C true CN1151490C (en) 2004-05-26

Family

ID=4590609

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB001247115A Expired - Lifetime CN1151490C (en) 2000-09-13 2000-09-13 High-accuracy high-resolution base frequency extracting method for speech recognization

Country Status (1)

Country Link
CN (1) CN1151490C (en)

Families Citing this family (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100580768C (en) * 2005-08-08 2010-01-13 中国科学院声学研究所 Voiced sound detection method based on harmonic characteristic
KR100762596B1 (en) * 2006-04-05 2007-10-01 삼성전자주식회사 Speech signal pre-processing system and speech signal feature information extracting method
CN1835075B (en) * 2006-04-07 2011-06-29 安徽中科大讯飞信息科技有限公司 Speech synthetizing method combined natural sample selection and acaustic parameter to build mould
CN101030375B (en) * 2007-04-13 2011-01-26 清华大学 Method for extracting base-sound period based on dynamic plan
CN101727902B (en) * 2008-10-29 2011-08-10 中国科学院自动化研究所 Method for estimating tone
CN102176313B (en) * 2009-10-10 2012-07-25 北京理工大学 Formant-frequency-based Mandarin single final vioce visualizing method
CN102163428A (en) * 2011-01-19 2011-08-24 无敌科技(西安)有限公司 Method for judging Chinese pronunciation
CN102783034B (en) * 2011-02-01 2014-12-17 华为技术有限公司 Method and apparatus for providing signal processing coefficients
CN102842305B (en) * 2011-06-22 2014-06-25 华为技术有限公司 Method and device for detecting keynote
CN103366736A (en) * 2012-03-29 2013-10-23 北京中传天籁数字技术有限公司 Phonetic tone identification method and phonetic tone identification apparatus
CN104251934B (en) * 2013-06-26 2018-08-14 华为技术有限公司 Harmonic analysis method and device and the method and apparatus for determining clutter between harmonic wave
CN103680518A (en) * 2013-12-20 2014-03-26 上海电机学院 Voice gender recognition method and system based on virtual instrument technology
CN103871417A (en) * 2014-03-25 2014-06-18 北京工业大学 Specific continuous voice filtering method and device of mobile phone
CN103943104B (en) * 2014-04-15 2017-03-01 海信集团有限公司 A kind of voice messaging knows method for distinguishing and terminal unit
JP6716466B2 (en) * 2014-04-28 2020-07-01 マサチューセッツ インスティテュート オブ テクノロジー Monitoring vital signs by radio reflection
CN104200818A (en) * 2014-08-06 2014-12-10 重庆邮电大学 Pitch detection method
CN104217722B (en) * 2014-08-22 2017-07-11 哈尔滨工程大学 A kind of dolphin whistle signal time-frequency spectrum contour extraction method
CN105551501B (en) * 2016-01-22 2019-03-15 大连民族大学 Harmonic signal fundamental frequency estimation algorithm and device
CN107045875B (en) * 2016-02-03 2019-12-06 重庆工商职业学院 fundamental tone frequency detection method based on genetic algorithm
CN106205638B (en) * 2016-06-16 2019-11-08 清华大学 A kind of double-deck fundamental tone feature extracting method towards audio event detection
US10989803B1 (en) 2017-08-21 2021-04-27 Massachusetts Institute Of Technology Security protocol for motion tracking systems
CN107833581B (en) * 2017-10-20 2021-04-13 广州酷狗计算机科技有限公司 Method, device and readable storage medium for extracting fundamental tone frequency of sound
CN108447505B (en) * 2018-05-25 2019-11-05 百度在线网络技术(北京)有限公司 Audio signal zero-crossing rate processing method, device and speech recognition apparatus
CN109346109B (en) * 2018-12-05 2020-02-07 百度在线网络技术(北京)有限公司 Fundamental frequency extraction method and device
CN110379438B (en) * 2019-07-24 2020-05-12 山东省计算中心(国家超级计算济南中心) Method and system for detecting and extracting fundamental frequency of voice signal

Also Published As

Publication number Publication date
CN1342968A (en) 2002-04-03

Similar Documents

Publication Publication Date Title
CN1151490C (en) High-accuracy high-resolution base frequency extracting method for speech recognization
Drugman et al. Glottal closure and opening instant detection from speech signals
CN105825852A (en) Oral English reading test scoring method
CN108896878B (en) Partial discharge detection method based on ultrasonic waves
CN103886871B (en) Detection method of speech endpoint and device thereof
Deshmukh et al. Use of temporal information: Detection of periodicity, aperiodicity, and pitch in speech
CN101872616B (en) Endpoint detection method and system using same
CN1991976A (en) Phoneme based voice recognition method and system
CN110599987A (en) Piano note recognition algorithm based on convolutional neural network
US20030088401A1 (en) Methods and apparatus for pitch determination
CN104143324B (en) A kind of musical tone recognition method
CN1527994A (en) Fast frequency-domain pitch estimation
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
CN103366735B (en) The mapping method of speech data and device
CN102054480A (en) Method for separating monaural overlapping speeches based on fractional Fourier transform (FrFT)
CN108847252B (en) Acoustic feature extraction method based on acoustic signal spectrogram texture distribution
CN103366759A (en) Speech data evaluation method and speech data evaluation device
CN108682432B (en) Speech emotion recognition device
CN110299141A (en) The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition
Li et al. A comparative study on physical and perceptual features for deepfake audio detection
CN114627892A (en) Deep learning-based polyphonic music and human voice melody extraction method
KR100393899B1 (en) 2-phase pitch detection method and apparatus
CN202758611U (en) Speech data evaluation device
US4982433A (en) Speech analysis method
CN111091816B (en) Data processing system and method based on voice evaluation

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20020403

Assignee: Beijing Zidong Ruiyi Voice Technology Co., Ltd.

Assignor: Institute of Automation, Chinese Academy of Sciences

Contract record no.: 2015110000014

Denomination of invention: High-accuracy high-resolution base frequency extracting method for speech recognization

Granted publication date: 20040526

License type: Common License

Record date: 20150519

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20020403

Assignee: Taro Technology (Hangzhou) Co., Ltd.

Assignor: Beijing Zidong Ruiyi Voice Technology Co., Ltd.

Contract record no.: 2015110000050

Denomination of invention: High-accuracy high-resolution base frequency extracting method for speech recognization

Granted publication date: 20040526

License type: Common License

Record date: 20151130

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
CX01 Expiry of patent term

Granted publication date: 20040526

CX01 Expiry of patent term