WO2012175054A1 - Method and apparatus for pitch detection - Google Patents

Method and apparatus for pitch detection

Info

Publication number
WO2012175054A1
Authority
WO
WIPO (PCT)
Prior art keywords
amplitude
ratio
frequency point
frequency
spectrum
Application number
PCT/CN2012/077456
Other languages
English (en)
French (fr)
Inventor
齐峰岩
苗磊
塔勒布•阿里斯
Original Assignee
华为技术有限公司
Application filed by 华为技术有限公司
Priority to EP12802425.4A (EP2662854A1)
Priority to JP2013556963A (JP2014507689A)
Priority to KR1020137021767A (KR20130117855A)
Publication of WO2012175054A1
Priority to US14/136,130 (US20140142931A1)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • The present invention relates to a method and apparatus for pitch detection, and more particularly to a pitch detection method and apparatus with high accuracy and low computational complexity.
  • Background
  • Pitch detection is one of the key technologies in practical speech and audio applications, such as speech coding, speech recognition, and pitch retrieval.
  • In these applications, pitch is an important extracted parameter, and the accuracy of pitch detection directly affects the performance of the final encoding.
  • In the prior art, two methods are generally used to detect the pitch period:
  • One method is the time-domain method. After the speech signal is preprocessed, the input signal is analyzed in the time domain to determine the pitch period.
  • The time-domain method mostly uses a correlation function. The correlation of the speech signal is evaluated only in the time domain, and the correlation is also large at integer multiples of the true pitch period, so the true period is difficult to distinguish accurately. Pitch period doubling errors are therefore prone to occur, which reduces the accuracy of pitch parameter detection.
  • The other method is the frequency-domain method, which converts the time-domain signal into the frequency domain and performs peak detection there. The pitch frequency is obtained from the detected peaks and a pitch tracking algorithm, and is then converted into the pitch period.
  • Embodiments of the present invention provide a pitch detection method and apparatus with high accuracy and low computational complexity.
  • A method of pitch detection, comprising:
  • performing fine pitch period detection based on the initial pitch period and the characteristic parameters to obtain a fine pitch period.
  • A pitch detection apparatus, comprising:
  • an initial pitch period acquisition module, configured to perform pitch detection on the speech signal in the time domain to obtain an initial pitch period;
  • a time-frequency conversion module, configured to convert the speech signal into the frequency domain to obtain a spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
  • a characteristic parameter extraction module, configured to extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal; and
  • a fine pitch period acquisition module, configured to perform fine pitch period detection based on the initial pitch period and the characteristic parameters to obtain a fine pitch period.
  • The method and apparatus detect the pitch period based on an initial pitch period acquired in the time domain and characteristic parameters extracted in the frequency domain, which avoids pitch period doubling errors and improves the accuracy of pitch period detection.
  • FIG. 1 is a flowchart of a pitch detection method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of applying analysis windows to the speech signal in a pitch detection method according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of time-frequency conversion in a pitch detection method according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of triple-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the average amplitude parameter value, according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of double-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the average amplitude parameter value, according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of triple-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and buffered data, according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of double-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and buffered data, according to an embodiment of the present invention.
  • FIG. 8 is a flowchart of interpolating the amplitude spectrum in a pitch detection method according to an embodiment of the present invention.
  • FIG. 9 is a flowchart of zero-padding the speech signal in a pitch detection method according to an embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a pitch detection apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a time-frequency conversion module of a pitch detection apparatus according to Embodiment 2 of the present invention.
  • FIG. 13 is a schematic structural diagram of a time-frequency conversion module of a pitch detection apparatus according to Embodiment 3 of the present invention.
  • Audio codecs and video codecs are widely used in various electronic devices, such as mobile phones, wireless devices, personal data assistants (PDAs), handheld or portable computers, GPS receivers/navigators, cameras, audio/video players, camcorders, video recorders, and surveillance equipment.
  • An audio encoder or decoder may be implemented directly by a digital circuit or a chip such as a DSP (digital signal processor), or by a processor executing the process defined in software code.
  • A method of pitch detection includes:
  • Step 100: Perform pitch detection on the speech signal in the time domain to obtain an initial pitch period.
  • Open-loop pitch detection may be performed on the perceptually weighted speech signal to obtain an initial pitch period T'.
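  • Step 100 can be illustrated with a minimal sketch, assuming a plain normalized autocorrelation search. The function name initial_pitch_period and the lag bounds are hypothetical, and the perceptual weighting mentioned above is omitted; this is not the detector the embodiment prescribes.

```python
import math

def initial_pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest normalized autocorrelation.

    Hypothetical helper: the source only says that open-loop pitch detection
    is performed in the time domain, not how.
    """
    best_lag, best_score = min_lag, float("-inf")
    n = len(signal)
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself delayed by `lag` samples.
        num = sum(signal[i] * signal[i - lag] for i in range(lag, n))
        e_now = sum(signal[i] ** 2 for i in range(lag, n))
        e_past = sum(signal[i - lag] ** 2 for i in range(lag, n))
        den = math.sqrt(e_now * e_past)
        score = num / den if den > 0.0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A pure tone with a period of 40 samples, searched over lags 20..60.
tone = [math.sin(2.0 * math.pi * i / 40.0) for i in range(400)]
print(initial_pitch_period(tone, 20, 60))
```

  • If the search range were widened to include a lag of 80, that lag would correlate essentially as well as 40, which is the doubling ambiguity the later frequency-domain steps are meant to resolve.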
  • Step 101: Pre-process the speech signal.
  • The speech signal s(n) is pre-processed, for example by pre-emphasis, to emphasize the high-frequency components of the speech signal and improve the accuracy of speech coding.
  • This yields the pre-processed speech signal s_pre(n). Pre-processing is needed so that the speech signal can be converted to the frequency domain and pitch detection becomes more accurate.
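  • Pre-emphasis is commonly realized as a first-order high-pass filter, s_pre(n) = s(n) - mu * s(n-1). The sketch below assumes that form; the coefficient mu = 0.68 and the function name pre_emphasis are illustrative assumptions, since the source does not give the filter.

```python
def pre_emphasis(s, mu=0.68, prev=0.0):
    """Apply s_pre(n) = s(n) - mu * s(n - 1).

    `mu` and `prev` (the last sample of the previous frame) are assumed
    parameters; the source only says that pre-emphasis boosts high frequencies.
    """
    out = []
    for x in s:
        out.append(x - mu * prev)
        prev = x
    return out

# A constant (purely low-frequency) input is attenuated after the first sample.
print(pre_emphasis([1.0, 1.0, 1.0], mu=0.5))
```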
  • Step 102: Apply analysis windows to the pre-processed frame signal.
  • Analysis windows are applied to the pre-processed frame signal. The analysis window function is:
  • The first analysis window is applied to the current frame, and the second analysis window is applied to the second half of the current frame and the first half of the future frame, as shown in FIG. 2.
  • One embodiment of this step includes:
  • Step 300: Perform a frequency-domain transform on the windowed speech signal to obtain spectral coefficients.
  • The framed speech signal is Fourier transformed. For example, with a frame length of 256,
  • a 256-point Fourier transform can be performed to obtain the corresponding spectral coefficients. The spectral coefficient function is:
  • Step 301: Calculate the energy spectrum from the spectral coefficients. The squares of the real and imaginary parts of the spectral coefficients are summed to obtain the energy spectrum.
  • The energy spectrum function E(k) is:
  • Step 302: Weight the energy spectrum according to the current frame and the previous frame to smooth it.
  • The energy spectrum can be weighted according to the current frame and the previous frame to obtain a smoothed energy spectrum.
  • The smoothed energy spectrum function is:
  • where $E^{[0]}(k)$ is the energy spectrum generated from the first analysis window,
  • $E^{[1]}(k)$ is the energy spectrum generated from the second analysis window, and the weighting coefficient represents the proportion of $E^{[0]}(k)$ and $E^{[1]}(k)$.
  • The weighting coefficient is selected based on experience; for example, it can be set to 0.5.
  • Step 303: Calculate the amplitude spectrum from the energy spectrum.
  • A square-root operation is performed on the energy spectrum function to obtain the amplitude spectrum function. To prevent the values of the amplitude spectrum function from becoming too large, a logarithm is applied to compress the amplitude range. When the smoothed energy spectrum is 0, the logarithm tends to negative infinity and overflows during computation, so a small positive number is set to prevent the logarithm from overflowing.
  • the amplitude is constant and can be set according to
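  • Steps 300 to 303 can be sketched as follows. This is a minimal illustration, not the embodiment's exact formulas (which are missing above): it uses a naive DFT for self-containment, assumes the two inputs are already windowed, and assumes the small positive constant is added inside the logarithm. The names energy_spectrum, log_amplitude_spectrum, weight, and floor are hypothetical.

```python
import cmath
import math

def energy_spectrum(frame):
    """E(k) = Re(X(k))^2 + Im(X(k))^2 over the first half of a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        x_k = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(x_k.real ** 2 + x_k.imag ** 2)
    return spectrum

def log_amplitude_spectrum(frame_win0, frame_win1, weight=0.5, floor=1e-10):
    """Smooth two energy spectra, take the square root, then compress with a log.

    `weight` is the empirical proportion (0.5 in the text); `floor` is the
    assumed small positive number that keeps log10 finite when the energy is 0.
    """
    e0 = energy_spectrum(frame_win0)
    e1 = energy_spectrum(frame_win1)
    smoothed = [weight * a + (1.0 - weight) * b for a, b in zip(e0, e1)]
    return [math.log10(math.sqrt(e) + floor) for e in smoothed]

# A cosine centred on bin 4 of a 32-point frame peaks at k = 4.
frame = [math.cos(2.0 * math.pi * 4 * t / 32) for t in range(32)]
spec = log_amplitude_spectrum(frame, frame)
print(max(range(len(spec)), key=lambda k: spec[k]))
```

  • A production implementation would use an FFT instead of the quadratic-time DFT shown here; the DFT is used only to keep the sketch dependency-free.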
  • Step 104: Extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal.
  • From the initial pitch period, a fundamental frequency f' can be obtained, and multiples of the fundamental frequency f' give the frequency multiples, such as 2f' and f'/2.
  • The characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency point amplitude, and a peak position parameter.
  • The functions are defined as:
  • where $\bar{S}(k)$ is the average amplitude function, $S(k)$ is the amplitude spectrum function, and $f'$ is the frequency in the frequency domain that corresponds to the initial pitch period $T'$. The value of $\bar{S}(k)$ is the average amplitude of the frequency points within the range around the frequency point $k$ under test. $r(k)$ is the ratio of the average amplitude to the amplitude of the frequency point under test.
  • The values of the fundamental frequency, the double frequency, and the triple frequency are substituted into these functions to obtain the fundamental frequency characteristic parameters $\bar{S}(f')$, $r(f')$, the double frequency characteristic parameters $\bar{S}(2f')$, $r(2f')$, and the triple frequency characteristic parameters $\bar{S}(3f')$, $r(3f')$.
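  • The two spectral parameters can be sketched as below. Because the defining equations are missing above, the averaging range is left as a parameter (half_width, a hypothetical name), and r(k) is taken as average amplitude divided by point amplitude, which matches the later statement that a larger point amplitude gives a smaller r(k).

```python
def average_amplitude(spec, k, half_width):
    """Mean of the amplitude spectrum over [k - half_width, k + half_width].

    The range is clipped at the spectrum edges; the exact range used by the
    embodiment is not recoverable from the text, so it is a parameter here.
    """
    lo = max(0, k - half_width)
    hi = min(len(spec), k + half_width + 1)
    window = spec[lo:hi]
    return sum(window) / len(window)

def amplitude_ratio(spec, k, half_width):
    """r(k): average amplitude around k divided by the amplitude at k.

    Assumes spec[k] is non-zero; a small r(k) indicates a peak at k.
    """
    return average_amplitude(spec, k, half_width) / spec[k]

spec = [1.0, 1.0, 5.0, 1.0, 1.0]
print(average_amplitude(spec, 2, 2), amplitude_ratio(spec, 2, 2))
```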
  • Step 105: Perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
  • Frequency multiplication detection is performed on the speech signal according to the initial pitch period and the characteristic parameters.
  • Most pitch period doubling errors occur at the fundamental frequency point, the double frequency point, and the triple frequency point of the frequency domain. When the required detection accuracy is not high, only the fundamental frequency, the double frequency, and the triple frequency are examined in order to reduce the complexity of the detection.
  • Triple-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the average amplitude parameter value, as shown in FIG. 4, includes:
  • Step 400: Determine whether the ratio of the fundamental frequency point's ratio parameter value (average amplitude to frequency point amplitude) to the triple frequency point's ratio parameter value is greater than the first default value.
  • From the average amplitude parameter $\bar{S}(k)$ and the ratio parameter $r(k)$ of the average amplitude to the frequency point amplitude, it can be seen that the larger the amplitude of the measured frequency point relative to the average amplitude parameter $\bar{S}(k)$, the smaller $r(k)$ is. A small $r(k)$ indicates a peak at this frequency, where the fluctuation of the amplitude spectrum is pronounced.
  • At such a peak, the amplitude value $S(k)$ at the frequency point is larger than the average amplitude parameter over the range of 2f'-1 around it, and the ratio parameter $r(k)$ is small. Therefore, based on $r(k)$ at the fundamental frequency point, the double frequency point, and the triple frequency point, it can be determined whether a pitch period doubling error has occurred in the acquired pitch period.
  • When the ratio is greater than the first default value, the 3f' position is probably the fine pitch frequency. The first default value can be set to 1.22 based on experience.
  • Step 401: If the ratio of the fundamental frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the first default value, determine whether the ratio of the double frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the second default value.
  • That is, if the ratio of $r(f')$ to $r(3f')$ is greater than the first default value, it is determined whether the ratio of $r(2f')$ to $r(3f')$ is greater than the second default value.
  • The second default value can be set to 1.22 based on experience.
  • Step 402: If the ratio of the double frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the second default value, determine whether the difference between the average amplitude parameter value of the triple frequency point and the average amplitude parameter value of the fundamental frequency point is greater than the third default value.
  • The third default value may be set to 0.6 based on experience.
  • Step 403: If the difference between the average amplitude parameter value of the triple frequency point and the average amplitude parameter value of the fundamental frequency point is greater than the third default value, determine that the triple frequency is the required fine pitch frequency.
  • Once the triple frequency is determined to be the fine pitch frequency,
  • the required fine pitch period can be determined from the fine pitch frequency.
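  • Steps 400 to 403 reduce to a chain of three threshold tests. The sketch below is one reading of them, assuming the ratios are formed as r(f')/r(3f') and r(2f')/r(3f') and the difference as the triple-frequency average minus the fundamental average. The function and argument names are hypothetical; the thresholds 1.22, 1.22, and 0.6 are the empirical values quoted above.

```python
def is_triple_frequency(r1, r2, r3, s1, s3, t1=1.22, t2=1.22, t3=0.6):
    """Return True if the triple frequency is taken as the fine pitch frequency.

    r1, r2, r3: ratio parameters r(f'), r(2f'), r(3f') (assumed non-zero r3).
    s1, s3: average amplitude parameters at f' and 3f'.
    """
    if r1 / r3 <= t1:        # Step 400: fundamental vs. triple ratio parameter
        return False
    if r2 / r3 <= t2:        # Step 401: double vs. triple ratio parameter
        return False
    return s3 - s1 > t3      # Steps 402-403: average amplitude difference

# A pronounced peak at 3f' (small r3, larger average amplitude) passes all tests.
print(is_triple_frequency(r1=1.0, r2=1.0, r3=0.5, s1=1.0, s3=2.0))
```

  • The double-frequency check of FIG. 5 has the same shape with the roles of the double and triple frequency points exchanged and with the seventh to ninth default values as thresholds.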
  • Double-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the average amplitude parameter value, as shown in FIG. 5, includes:
  • Step 500: Determine whether the ratio of the fundamental frequency point's ratio parameter value (average amplitude to frequency point amplitude) to the double frequency point's ratio parameter value is greater than the seventh default value.
  • Step 501: If the ratio of the fundamental frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the seventh default value, determine whether the ratio of the triple frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the eighth default value.
  • The eighth default value can be set to 1.22 based on experience.
  • Step 502: If the ratio of the triple frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the eighth default value, further determine whether the difference between the average amplitude parameter value of the double frequency point and the average amplitude parameter value of the fundamental frequency point is greater than the ninth default value.
  • Step 503: If the difference between the average amplitude parameter value of the double frequency point and the average amplitude parameter value of the fundamental frequency point is greater than the ninth default value, determine that the double frequency is the required fine pitch frequency.
  • Once the double frequency is determined to be the fine pitch frequency,
  • the required fine pitch period can be determined from the fine pitch frequency.
  • Step 600: Determine whether the ratio of the fundamental frequency point's ratio parameter value (average amplitude to frequency point amplitude) to the triple frequency point's ratio parameter value is greater than the fourth default value.
  • The fourth default value can be set according to experience as
  • Step 601: If the ratio of the fundamental frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the fourth default value, determine whether the ratio of the double frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the fifth default value.
  • That is, if the ratio of $r(f')$ to $r(3f')$ is greater than the fourth default value, it is determined whether the ratio of $r(2f')$ to $r(3f')$ is greater than the fifth default value. The fifth default value can be set to 1.05 based on experience.
  • Step 602: If the ratio of the double frequency point's ratio parameter value to the triple frequency point's ratio parameter value is greater than the fifth default value, determine whether a pitch period triple error occurred in the previous frame.
  • Step 603: If a pitch period triple error occurred in the previous frame, determine whether the number of pitch period triple errors before the current frame is greater than the sixth default value.
  • If it is determined that a pitch period triple error occurred in the previous frame, it is further determined whether the number of pitch period triple errors before the current frame is greater than the sixth default value. For example, the 10 frames preceding the current frame are examined to determine whether the number of consecutive pitch period triple errors is greater than the sixth default value.
  • The sixth default value can be set to 3 if the judgment is made per whole frame, and to 6 if it is made per half frame.
  • Step 604: If the number of pitch period triple errors before the current frame is greater than the sixth default value, determine that the triple frequency is the required fine pitch frequency.
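  • Steps 600 to 604 can be sketched as a second decision chain that replaces the amplitude-difference test with buffered history. The function and argument names below are hypothetical. The fourth default value is truncated in the text above, so it is a required argument (t4) rather than a default; 1.05 and 3 are the quoted fifth and sixth (whole-frame) default values.

```python
def is_triple_frequency_by_history(r1, r2, r3, prev_frame_tripled,
                                   triple_count, t4, t5=1.05, c1=3):
    """Return True if buffered history confirms the triple frequency.

    r1, r2, r3: ratio parameters r(f'), r(2f'), r(3f') (assumed non-zero r3).
    prev_frame_tripled: whether the previous frame was marked as a triple error.
    triple_count: number of triple errors recorded before the current frame.
    """
    if r1 / r3 <= t4:            # Step 600
        return False
    if r2 / r3 <= t5:            # Step 601
        return False
    if not prev_frame_tripled:   # Steps 602-603
        return False
    return triple_count > c1     # Step 604

# t4=1.1 is an arbitrary illustrative threshold, not a value from the source.
print(is_triple_frequency_by_history(1.0, 1.0, 0.8, True, 4, t4=1.1))
```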
  • Double-frequency detection based on the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the buffered data, as shown in FIG. 7, includes:
  • Step 700: Determine whether the ratio of the fundamental frequency point's ratio parameter value (average amplitude to frequency point amplitude) to the double frequency point's ratio parameter value is greater than the tenth default value.
  • The tenth default value can be set empirically as
  • Step 701: If the ratio of the fundamental frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the tenth default value, determine whether the ratio of the triple frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the eleventh default value.
  • The eleventh default value can be set to 1.05 based on experience.
  • Step 702: If the ratio of the triple frequency point's ratio parameter value to the double frequency point's ratio parameter value is greater than the eleventh default value, determine whether a pitch period doubling error occurred in the previous frame.
  • The cached previous-frame mark is used to determine whether a pitch period doubling error occurred in the previous frame.
  • Step 703: If a pitch period doubling error occurred in the previous frame, determine whether the number of pitch period doubling errors before the current frame is greater than the twelfth default value.
  • Step 704: If the number of pitch period doubling errors before the current frame is greater than the twelfth default value, determine that the double frequency is the required fine pitch frequency.
  • The detection result is saved in the cached previous-frame mark. For example, when a pitch period doubling error is detected in the current frame, the doubling error is recorded in the previous-frame mark together with the number of consecutive occurrences, and this record is used when detecting the next frame of data.
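  • The cached previous-frame mark and the consecutive-occurrence count can be kept in a small state object. The class below is a hypothetical sketch of that bookkeeping (the name MultipleErrorHistory and its fields are assumptions), assuming the count resets whenever a frame shows no multiple error.

```python
class MultipleErrorHistory:
    """Tracks the previous-frame mark and the run of consecutive multiple errors."""

    def __init__(self):
        self.prev_frame_flag = False  # was an error detected in the last frame?
        self.consecutive = 0          # how many frames in a row had the error

    def update(self, error_detected):
        """Record the result for the current frame before moving to the next one."""
        if error_detected:
            self.consecutive += 1
        else:
            self.consecutive = 0
        self.prev_frame_flag = error_detected

history = MultipleErrorHistory()
for detected in (True, True, True):
    history.update(detected)
print(history.prev_frame_flag, history.consecutive)
```

  • One instance per multiple (double, triple) would be kept, and its two fields feed the history-based checks of the next frame.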
  • The fine pitch frequency can be judged in two ways: according to the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the average amplitude parameter value, or according to
  • the ratio parameter value of the frequency point average amplitude to the frequency point amplitude and the buffered data.
  • The judgment conditions of the two modes are combined with OR logic.
  • If either condition is satisfied, the frequency point can be determined to be the required fine pitch frequency.
  • For example, when judging a pitch period triple error, the triple frequency can be determined to be the required fine pitch frequency as long as the judgment condition based on the ratio parameter value and the average amplitude parameter value is satisfied.
  • Alternatively, if the judgment condition based on the ratio parameter value and the buffered multiplication results preceding the current frame is satisfied, the triple frequency can also be determined to be the required fine pitch frequency.
  • After Step 303, interpolation is performed on the acquired amplitude spectrum, as shown in FIG. 8, including:
  • Step 800: Interpolate the amplitude spectrum to obtain a high-density amplitude spectrum of the speech signal.
  • Interpolation is performed between the existing frequency points in the frequency domain.
  • Cubic B-spline interpolation is used; that is, the original K frequency points are expanded to mK frequency points, where m is a positive integer. Because cubic B-spline interpolation has some deviation at the boundary, dummy data is appended at both ends of the data before interpolation to reduce this error. That is, the amplitude spectrum is extended by L points at each end, so that the boundary conditions do not affect the interpolation accuracy of the actual data.
  • The appended values are equal to the values of the points at the two ends of the spectrum.
  • The expanded amplitude spectrum is: $S(0), \ldots, S(0), \{S(k),\ k \in [0, K-1]\}, S(K-1), \ldots, S(K-1)$
  • f(x) represents the amplitude of the frequency point to be inserted,
  • k is an integer, and
  • the basis function is a cubic B-spline basis function whose expression is:
  • Step 801: Weight the high-density amplitude spectrum according to the current frame and the previous frame to smooth the high-density spectrum.
  • The smoothed high-density spectrum function is:
  • where $S'^{[-1]}(i)$ is the high-density spectrum of the previous frame and $S'^{[0]}(i)$ is that of the current frame. The weighting coefficient sets the proportion of $S'^{[-1]}(i)$ and $S'^{[0]}(i)$ and can, for example, be set to 0.4.
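  • The interpolation of Step 800 can be sketched as follows, assuming the standard uniform cubic B-spline basis and a tridiagonal solve for the spline coefficients. The function names, the pad argument (the L-point extension), and the end conditions of the solve are assumptions; the edge extension itself follows the text, repeating the end values so that boundary effects fall on dummy data.

```python
def bspline3(x):
    """Standard uniform cubic B-spline basis function."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + ax ** 3 / 2.0
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0

def spline_coefficients(y):
    """Solve (c[i-1] + 4 c[i] + c[i+1]) / 6 = y[i] with repeated end coefficients."""
    n = len(y)
    diag = [4.0] * n
    diag[0] = diag[-1] = 5.0          # c[-1] = c[0] and c[n] = c[n-1] (assumed)
    rhs = [6.0 * v for v in y]
    for i in range(1, n):             # forward elimination (Thomas algorithm)
        w = 1.0 / diag[i - 1]
        diag[i] -= w
        rhs[i] -= w * rhs[i - 1]
    c = [0.0] * n
    c[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):    # back substitution
        c[i] = (rhs[i] - c[i + 1]) / diag[i]
    return c

def densify_spectrum(spec, m, pad):
    """Expand K amplitude points to m*K points by cubic B-spline interpolation."""
    padded = [spec[0]] * pad + list(spec) + [spec[-1]] * pad
    c = spline_coefficients(padded)
    n = len(padded)
    dense = []
    for i in range(m * len(spec)):
        x = pad + i / m               # position in the padded index space
        base = int(x)
        value = 0.0
        for k in range(base - 1, base + 3):
            kk = min(max(k, 0), n - 1)
            value += c[kk] * bspline3(x - k)
        dense.append(value)
    return dense

dense = densify_spectrum([1.0, 4.0, 2.0, 7.0, 3.0], m=4, pad=3)
print(len(dense))
```

  • Every m-th point of the result reproduces an original amplitude, so the original frequency points are preserved and new points are inserted between them.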
  • The fine pitch frequency is detected based on the high-density amplitude spectrum,
  • and from it the fine pitch period.
  • During detection, because the number of frequency points is increased, the accuracy of the average amplitude improves, and the influence of jumps in the frequency point amplitude on the detection is reduced.
  • The detection steps are the same as those in Embodiment 1 and Embodiment 2 and are not described again.
  • Alternatively, the speech signal can be zero-padded in the time domain, as shown in FIG. 9, including:
  • Step 900: Append zeros to the tail of the speech signal and convert it to the frequency domain to obtain a high-density amplitude spectrum of the speech signal.
  • Points with zero amplitude are appended to the tail of the speech signal, and the zero-padded speech signal is converted to the frequency domain. Through the time-frequency transform, the points of the original speech signal and the appended zero-amplitude points
  • are converted to the frequency domain together, which has the effect of inserting frequency points between the frequency points of the original amplitude spectrum.
  • The amplitudes of the original frequency points are not affected by the appended zeros; that is, the original frequency points and their amplitude values are preserved in the amplitude spectrum. A high-density amplitude spectrum corresponding to the time-domain signal is thereby obtained in the frequency domain.
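  • The property described in Step 900 can be checked numerically. The sketch below uses a naive DFT (chosen only to stay dependency-free) with hypothetical function names: padding a length-N signal to length m*N yields m times as many frequency points, and every m-th point equals the corresponding point of the original spectrum.

```python
import cmath

def dft_magnitudes(x):
    """Magnitudes of the naive DFT of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]

def zero_padded_spectrum(x, m):
    """Append (m - 1) * len(x) zeros to the tail of x, then transform."""
    return dft_magnitudes(list(x) + [0.0] * ((m - 1) * len(x)))

signal = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, 0.0, -0.5]
original = dft_magnitudes(signal)
dense = zero_padded_spectrum(signal, 4)

# Every 4th point of the dense spectrum coincides with an original point.
print(all(abs(dense[4 * k] - original[k]) < 1e-9 for k in range(len(signal))))
```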
  • Step 901: Weight the high-density amplitude spectrum according to the current frame and the previous frame to smooth the high-density amplitude spectrum.
  • Smoothing is performed, and the smoothed high-density amplitude spectrum function is:
  • The fine pitch period is then detected.
  • During detection, because the number of frequency points is increased, the accuracy of the average amplitude improves, and the influence of jumps in the frequency point amplitude on the detection is reduced.
  • The detection steps are the same as those in Embodiment 1 and Embodiment 2 and are not described again.
  • In the above embodiments, the obtained fine pitch frequency is a multiple of the initial pitch frequency.
  • The search range covers only the fundamental frequency, double frequency, and triple frequency positions rather than the whole frequency domain, so the result is not accurate enough.
  • To improve accuracy, a peak search can be performed on the amplitudes of the high-density amplitude spectrum, and the fine pitch period is determined according to the corresponding characteristic parameter.
  • Performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period further includes:
  • Step 1000: In the high-density amplitude spectrum, compare the amplitude values within a certain range of the fundamental frequency point and of each frequency multiplication point, and determine the peak position within that range near the fundamental frequency point and each frequency multiplication point.
  • A peak search over the amplitude values determines the peak position within a certain range near the fundamental frequency point and each frequency multiplication point, where the fundamental frequency point and each frequency multiplication point each correspond to one peak position.
  • In this way, the amplitude peak corresponding to the fundamental frequency point and to each frequency multiplication point is obtained.
  • Step 1001: Determine whether, among the fundamental frequency point and the frequency multiplication points, there is one frequency point for which the ratios between its ratio parameter value (average amplitude to frequency point amplitude) and the ratio parameter values of the other frequency points are all greater than the thirteenth default value. Such a frequency point is called the target frequency point.
  • Using the ratio parameter values of the fundamental frequency point and each frequency multiplication point, it is determined whether the ratios between the ratio parameter value of one frequency point and the ratio parameter values of all other frequency points
  • are greater than the thirteenth default value. The thirteenth default value can be set empirically, for example to 1.22.
  • Step 1002: If such a target frequency point exists among the fundamental frequency point and the frequency multiplication points, that is, its ratios against the ratio parameter values of the other frequency points are all greater than
  • the thirteenth default value, determine whether the distance from the target frequency point to its corresponding peak position is smaller than the distance from each of the other frequency points to their corresponding peak positions.
  • Step 1003: If the distance from the target frequency point to its corresponding peak position is smaller than the distance from the other frequency points to their corresponding peak positions, determine that the period corresponding to the target frequency point is the fine pitch period.
  • The target frequency point is then the required fine pitch frequency.
  • A reciprocal operation is performed on the fine pitch frequency to obtain the fine pitch period.
  • the determined fine pitch frequency is the fundamental frequency or each doubling point, and the accuracy is relatively low.
  • further search can be performed according to the frequency points detected in Embodiment 1, Embodiment 2, and Embodiment 6.
  • the peak search is performed on the high-density spectrum by setting the three-frequency point 3 as a center and within a certain range around it (for example, 2 f ' - 2 between the double frequency point 2 and the quadruple frequency point 4 f ').
  • the peak search range can be set to be within the range of f′ center U (k is the frequency of the searched frequency point)
  • the peak value can be determined by determining the peak position as the fine pitch frequency, and performing a reciprocal operation on the fine pitch frequency to determine the required fine pitch period.
  • the frequency point corresponding to the peak obtained in this range is the required fine pitch frequency.
  • Corresponding to the pitch detection method above, a pitch detection apparatus, as shown in FIG. 11, includes:
  • An initial pitch period acquisition module, configured to perform pitch detection on the speech signal in the time domain to obtain an initial pitch period;
  • A time-frequency conversion module, configured to convert the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
  • A characteristic parameter extraction module, configured to extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
  • A fine pitch period acquisition module, configured to perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
  • The characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency-point amplitude, and a peak position parameter.
  • The fine pitch period acquisition module further includes:
  • A frequency-multiple detection module, configured to compare the characteristic parameters of the fundamental frequency point and the frequency-multiple points to determine the fine pitch frequency.
  • The frequency-multiple detection module further includes:
  • A peak search module, configured to search for the amplitude peak within a certain range around the fine pitch frequency and to perform a reciprocal operation on the frequency point corresponding to the peak to obtain the fine pitch period.
  • The pitch detection apparatus further includes:
  • A preprocessing module, configured to preprocess the speech signal;
  • A windowing module, configured to apply an analysis window to the preprocessed frame signal.
  • The time-frequency conversion module, as shown in FIG. 12, further includes:
  • A spectral coefficient acquisition module, configured to perform a frequency-domain transform on the windowed speech signal to obtain spectral coefficients;
  • An energy spectrum acquisition module, configured to calculate the energy spectrum from the spectral coefficients.
  • The pitch detection apparatus further includes:
  • An energy spectrum smoothing module, configured to weight the energy spectrum according to the current frame and the previous frame so that the energy spectrum is smoothed.
  • The pitch detection apparatus further includes:
  • An amplitude spectrum acquisition module, configured to calculate the amplitude spectrum of the spectrum from the energy spectrum.
  • The pitch detection apparatus further includes:
  • An amplitude spectrum interpolation module, configured to interpolate the amplitude spectrum of the spectrum to obtain the high-density amplitude spectrum of the speech signal.
  • The time-frequency conversion module, as shown in FIG. 13, further includes:
  • A speech signal interpolation module, configured to zero-pad the tail of the speech signal and convert it to the frequency domain to obtain the high-density amplitude spectrum of the speech signal.
  • The pitch detection apparatus further includes:
  • A high-density amplitude spectrum smoothing module, configured to weight the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density amplitude spectrum is smoothed.
  • The pitch detection method and apparatus provided by the embodiments detect the pitch period from the initial pitch period obtained in the time domain and the characteristic parameters extracted in the frequency domain, which avoids pitch-period multiple errors and improves the accuracy of pitch period detection.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

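The invention discloses a method and apparatus for pitch detection, belonging to the field of speech and audio. The pitch detection method includes: performing pitch detection on the speech signal in the time domain to obtain an initial pitch period; converting the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum; extracting characteristic parameters according to the initial pitch period and the spectrum of the speech signal; and performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.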

Description

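Method and apparatus for pitch detection
This application claims priority to Chinese Patent Application No. 201110170075.0, filed with the Chinese Patent Office on June 22, 2011 and entitled "Method and apparatus for pitch detection", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to a method and apparatus for pitch detection, and in particular to a pitch detection method and apparatus with high accuracy and low computational complexity.
Background
In digital communications, the transmission of speech, images, audio, and video has a very wide range of applications, such as mobile phone calls, audio and video conferencing, broadcast television, and multimedia entertainment. Audio and video compression coding emerged to reduce the resources these signals occupy during storage or transmission. In speech and audio signal processing, pitch detection is one of the key technologies in practical applications. Pitch is an important extracted parameter in speech coding, speech recognition, and melody retrieval, and the accuracy of pitch detection directly affects the final coding performance. The prior art generally detects the pitch period in one of two ways.
The first is the time-domain method: after the speech signal is preprocessed, the input signal is analyzed and computed in the time domain to determine the pitch period.
Time-domain pitch detection mostly uses the correlation-function method, which examines only the correlation values of the speech signal in the time domain. Because the correlation values at integer multiples of the true pitch period are also large, they are hard to tell apart, and pitch-period multiple errors occur easily, which reduces the accuracy of the detected pitch parameter.
The second is the frequency-domain method: the time-domain signal is converted to the frequency domain and peak detection is performed there; the pitch frequency is obtained from the detected peaks and a pitch tracking algorithm; and the pitch frequency is then converted to the pitch period.
In this process, converting the time-domain signal to the frequency domain and searching for the pitch there has high computational complexity, so the method is hard to adopt in practice.
Summary
Embodiments of the present invention provide a pitch detection method and apparatus with high accuracy and low computational complexity.
To achieve this, the embodiments adopt the following technical solutions.
A pitch detection method includes:
performing pitch detection on the speech signal in the time domain to obtain an initial pitch period;
converting the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
extracting characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
A pitch detection apparatus includes:
an initial pitch period acquisition module, configured to perform pitch detection on the speech signal in the time domain to obtain an initial pitch period;
a time-frequency conversion module, configured to convert the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
a characteristic parameter extraction module, configured to extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
a fine pitch period acquisition module, configured to perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
The pitch detection method and apparatus provided by the embodiments detect the pitch period from the initial pitch period obtained in the time domain and the characteristic parameters extracted in the frequency domain, which avoids pitch-period multiple errors and improves the accuracy of pitch period detection.
Brief Description of the Drawings
FIG. 1 is a flowchart of a pitch detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of windowing the speech signal in the pitch detection method;
FIG. 3 is a flowchart of the time-frequency conversion in the pitch detection method;
FIG. 4 is a flowchart of frequency-multiple detection of the triple frequency based on the ratio parameter value (average amplitude to frequency-point amplitude) and the average amplitude parameter value;
FIG. 5 is a flowchart of frequency-multiple detection of the double frequency based on the ratio parameter value and the average amplitude parameter value;
FIG. 6 is a flowchart of frequency-multiple detection of the triple frequency based on the ratio parameter value and buffered data;
FIG. 7 is a flowchart of frequency-multiple detection of the double frequency based on the ratio parameter value and buffered data;
FIG. 8 is a flowchart of interpolating the amplitude spectrum;
FIG. 9 is a flowchart of zero-padding the speech signal;
FIG. 10 is a flowchart of detection over the full frequency domain;
FIG. 11 is a schematic structural diagram of a pitch detection apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of the time-frequency conversion module of the pitch detection apparatus in Embodiment 2;
FIG. 13 is a schematic structural diagram of the time-frequency conversion module of the pitch detection apparatus in Embodiment 3.
Detailed Description
In digital signal processing, audio codecs and video codecs are widely used in electronic devices such as mobile phones, wireless devices, personal digital assistants (PDAs), handheld or portable computers, GPS receivers/navigators, cameras, audio/video players, camcorders, video recorders, and monitoring equipment. Such devices usually contain an audio encoder or decoder, which can be implemented directly in a digital circuit or chip such as a DSP (digital signal processor), or by a processor executing the procedures in software code. An audio encoder usually includes a pitch detection procedure. A pitch detection method according to the embodiments is described in detail below with reference to the drawings.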
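Embodiment 1
A pitch detection method, as shown in FIG. 1, includes:
Step 100: Perform pitch detection on the speech signal in the time domain to obtain an initial pitch period.
In the time domain, open-loop pitch detection can be performed on the perceptually weighted speech signal to obtain the initial pitch period T′.
Step 101: Preprocess the speech signal.
The speech signal s(n) is preprocessed, for example by pre-emphasis, which boosts the high-frequency components of the speech signal and improves the accuracy of speech coding. Preprocessing yields the preprocessed speech signal $s_{pre}(n)$. This preliminary processing is needed before the speech signal is converted to the frequency domain so that pitch detection is more accurate.
Step 102: Apply analysis windows to the preprocessed frame signal.
The analysis window applied to the preprocessed speech signal $s_{pre}(n)$ has the form
$w_{FFT}(n) = 0.5 - 0.5\cos(\dots)$ (the argument of the cosine is given in Figure imgf000006_0001),
where $L_{FFT}$ is the length of the analysis window.
The first analysis window is applied to the current frame, and the second analysis window is applied to the second half of the current frame and the first half of the future frame, as shown in FIG. 2.
The signal under the first analysis window is: $s^{[0]}_{wnd}(n) = w_{FFT}(n)\,s_{pre}(n),\quad n = 0, 1, 2, \dots, L_{FFT}-1$
The signal under the second analysis window is: $s^{[1]}_{wnd}(n) = w_{FFT}(n)\,s_{pre}(n + L_{FFT}/2),\quad n = 0, 1, 2, \dots, L_{FFT}-1$
Step 103: Convert the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum.
To examine the speech signal in the frequency domain, its spectrum, including the amplitude spectrum, must be obtained. As shown in FIG. 3, one embodiment of this step includes:
Step 300: Perform a frequency-domain transform on the windowed speech signal to obtain the spectral coefficients.
To obtain the spectral coefficients, a Fourier transform is applied to one windowed frame of the speech signal. For example, if the frame length $L_{FFT}$ is 256, a 256-point Fourier transform can be used in practice. The spectral coefficients are:
$X(k) = \sum_{n=0}^{L_{FFT}-1} s_{wnd}(n)\,e^{-j 2\pi k n / L_{FFT}},\quad k = 0, 1, 2, \dots, K-1$
where the spectral coefficient X(k) is complex, with a real part and an imaginary part.
Step 301: Calculate the energy spectrum from the spectral coefficients.
The energy spectrum is the sum of the squares of the real and imaginary parts of the spectral coefficients:
$E(k) = X_R^2(k) + X_I^2(k),\quad k = 0, 1, 2, \dots, K-1$, where $X_R(k)$ and $X_I(k)$ are the real and imaginary parts, respectively.
Step 302: Weight the energy spectrum according to the current frame and the previous frame so that the energy spectrum is smoothed.
To further improve the accuracy of pitch period detection, the energy spectrum can be weighted according to the current frame and the previous frame to obtain a smoothed energy spectrum:
$\bar{E}(k) = \alpha E^{[0]}(k) + (1-\alpha) E^{[1]}(k),\quad k = 0, 1, 2, \dots, K-1,\quad 0 < \alpha < 1$, where $E^{[0]}(k)$ is the energy spectrum generated from the first analysis window and $E^{[1]}(k)$ is the energy spectrum generated from the second analysis window. The value of $\alpha$ sets the proportions of $E^{[0]}(k)$ and $E^{[1]}(k)$ in $\bar{E}(k)$; it is chosen empirically and can be set to 0.5, for example.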
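Steps 102 to 302 can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the cosine argument 2πn/L (a Hann window) is assumed because the window formula is only partly legible in the source, keeping K = L/2 bins is assumed, and all function names are hypothetical.

```python
import cmath
import math


def hann_window(length):
    # Assumed window 0.5 - 0.5*cos(2*pi*n/L); the source formula is partly illegible.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def energy_spectrum(frame, num_bins=None):
    """Window one frame, take its DFT and return E(k) = Re^2 + Im^2."""
    length = len(frame)
    if num_bins is None:
        num_bins = length // 2  # assumption: keep only the non-redundant half
    window = hann_window(length)
    windowed = [w * s for w, s in zip(window, frame)]
    energies = []
    for k in range(num_bins):
        coeff = sum(
            windowed[n] * cmath.exp(-2j * cmath.pi * k * n / length)
            for n in range(length)
        )
        energies.append(coeff.real ** 2 + coeff.imag ** 2)
    return energies


def smooth_energy(e_first, e_second, alpha=0.5):
    """Weighted sum alpha*E[0](k) + (1 - alpha)*E[1](k) of the two window spectra."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(e_first, e_second)]
```

A direct DFT is used here only to keep the sketch dependency-free; a real codec would use an FFT.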
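Step 303: Calculate the amplitude spectrum of the spectrum from the energy spectrum.
Taking the square root of the energy spectrum gives the amplitude spectrum. To keep the amplitude values from becoming too large, a logarithm is applied to compress the amplitude range. When the smoothed energy spectrum is 0, its logarithm tends to negative infinity and causes overflow during computation, so a small positive number is added to prevent this. The amplitude spectrum function is given in Figure imgf000007_0001: it applies a base-10 logarithm to the square root of the smoothed energy spectrum plus the small positive number. The constants in the formula can be set to adjust the amplitude range of the spectrum.
Step 104: Extract the characteristic parameters according to the initial pitch period and the spectrum of the speech signal.
Taking the reciprocal of the initial pitch period T′ gives the fundamental frequency f′, and multiplying f′ by a factor gives its frequency multiples, for example 2f′ and f′/2.
The characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency-point amplitude, and a peak position parameter.
To detect the fine pitch period and avoid pitch-period multiple errors, functions are defined that capture the amplitude level and the fluctuation of the amplitude spectrum. For example:
$\bar{S}(k) = \dfrac{\sum_{i=-(f'-1)}^{f'-1} S(i+k)}{2f'-1},\quad k = f'/3,\ f'/2,\ f',\ 2f',\ 3f'$
$r(k) = \dfrac{\bar{S}(k)}{S(k)},\quad k = f'/3,\ f'/2,\ f',\ 2f',\ 3f'$
where $\bar{S}(k)$ is the average amplitude function, $S(k)$ is the amplitude spectrum function, and f′ is the frequency point in the frequency domain that corresponds to the initial pitch period T′. During detection, $\bar{S}(k)$ is the average amplitude of the frequency points within a range of 2f′ − 1 centered on the frequency point k under test, and r(k) is the ratio of the average amplitude to the amplitude of the frequency point under test.
During detection, the fundamental, double, and triple frequencies are substituted into these functions to obtain the fundamental characteristic parameters $\bar{S}(f')$ and $r(f')$, the double-frequency characteristic parameters $\bar{S}(2f')$ and $r(2f')$, and the triple-frequency characteristic parameters $\bar{S}(3f')$ and $r(3f')$.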
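The two characteristic parameters can be illustrated with a short Python sketch. It assumes that the amplitude spectrum is a list indexed by frequency bin and that f′ is given as an integer bin index. Clipping the window at the spectrum edges is an added assumption (the source does not say what happens there), and the function names are hypothetical.

```python
def average_amplitude(spectrum, k, f0_bin):
    """Mean of S(i + k) over the 2*f0_bin - 1 bins centered on bin k."""
    half = f0_bin - 1
    lo = max(0, k - half)                  # assumption: clip at the spectrum edges
    hi = min(len(spectrum) - 1, k + half)
    window = spectrum[lo:hi + 1]
    return sum(window) / len(window)


def amplitude_ratio(spectrum, k, f0_bin):
    """r(k): average amplitude around bin k divided by the amplitude at bin k."""
    return average_amplitude(spectrum, k, f0_bin) / spectrum[k]
```

In a spectrum with peaks at multiples of bin 10, the ratio at a peak bin comes out smaller than the ratio between peaks. Note that `amplitude_ratio` divides by `spectrum[k]`, so a caller would need to guard against a zero amplitude.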
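Step 105: Perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain the fine pitch period.
Frequency-multiple detection is performed on the speech signal according to the initial pitch period and the characteristic parameters. In practice, most pitch-period multiple errors occur at the fundamental, double-frequency, and triple-frequency points in the frequency domain. When high accuracy is not required, detection can therefore be limited to the fundamental, double, and triple frequencies to reduce complexity.
Triple-frequency detection based on the ratio parameter value (average amplitude to frequency-point amplitude) and the average amplitude parameter value, as shown in FIG. 4, includes:
Step 400: Determine whether the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the first default value.
From the average amplitude parameter $\bar{S}(k)$ and the ratio parameter $r(k)$, it follows that the larger the amplitude at the tested frequency point relative to $\bar{S}(k)$, the smaller $r(k)$ is. A small $r(k)$ indicates a peak at that frequency point and a pronounced fluctuation of the amplitude spectrum.
During detection, a peak appears at the position of the true pitch frequency. There the amplitude $S(k)$ is greater than the average amplitude $\bar{S}(k)$ over the surrounding range of 2f′ − 1, so the ratio parameter value $r(k)$ is small. The values of $\bar{S}(k)$ and $r(k)$ at the fundamental, double-frequency, and triple-frequency points therefore indicate whether a pitch-period multiple error has occurred in the obtained pitch period.
Frequency-multiple detection first checks whether 3f′ may be the fine pitch frequency. To make the detection more accurate, a first default value is set: 3f′ may be the fine pitch frequency only when the ratio of $r(f')$ to $r(3f')$ is greater than the first default value, which can be set empirically to 1.22.
Step 401: If the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the first default value, determine whether the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the second default value.
When the ratio of $r(f')$ to $r(3f')$ is greater than the first default value, determine whether the ratio of $r(2f')$ to $r(3f')$ is greater than the second default value, which can be set empirically to 1.22.
Step 402: If the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the second default value, determine whether the difference between the triple-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than the third default value.
When the ratio of $r(2f')$ to $r(3f')$ is greater than the second default value, determine whether the difference between $\bar{S}(3f')$ and $\bar{S}(f')$ is greater than the third default value, which can be set empirically to 0.6.
Step 403: If the difference between the triple-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than the third default value, determine that the triple frequency is the required fine pitch frequency.
When all three conditions are satisfied, the triple frequency is judged to be the fine pitch frequency among the fundamental, double, and triple frequencies, and the required fine pitch period can be determined from it.
If the triple frequency is not the required fine pitch frequency, double-frequency detection is performed based on the ratio parameter value and the average amplitude parameter value. As shown in FIG. 5, it includes:
Step 500: Determine whether the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the seventh default value.
As in the detection of a pitch-period tripling error, determine whether the ratio of $r(f')$ to $r(2f')$ is greater than the seventh default value, which can be set empirically to 1.22.
Step 501: If the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the seventh default value, determine whether the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the eighth default value.
When the ratio of $r(f')$ to $r(2f')$ is greater than the seventh default value, determine whether the ratio of $r(3f')$ to $r(2f')$ is greater than the eighth default value, which can be set empirically to 1.22.
Step 502: If the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the eighth default value, determine whether the difference between the double-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than the ninth default value.
When the ratio of $r(3f')$ to $r(2f')$ is greater than the eighth default value, determine whether the difference between $\bar{S}(2f')$ and $\bar{S}(f')$ is greater than the ninth default value, which can be set empirically to 0.4.
Step 503: If the difference between the double-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than the ninth default value, determine that the double frequency is the required fine pitch frequency.
When all three conditions are satisfied, the double frequency is judged to be the fine pitch frequency among the fundamental, double, and triple frequencies, and the required fine pitch period can be determined from it.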
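The decision chain of steps 400 to 403 and 500 to 503 can be sketched as follows. The sketch assumes the ratio parameters and average amplitudes have already been computed and are passed as dictionaries keyed by the multiple (1, 2, 3). Returning 1 when neither test passes is an assumption, since the text does not state the fallback explicitly, and the function name is hypothetical.

```python
def detect_multiple(r, s_avg, ratio_thr=1.22, diff_thr_triple=0.6, diff_thr_double=0.4):
    """Return 3, 2 or 1: which multiple of f' is taken as the fine pitch frequency.

    r[m] is the ratio parameter and s_avg[m] the average amplitude at m * f'.
    """
    # Steps 400-403: test the triple frequency first.
    if (r[1] / r[3] > ratio_thr
            and r[2] / r[3] > ratio_thr
            and s_avg[3] - s_avg[1] > diff_thr_triple):
        return 3
    # Steps 500-503: otherwise test the double frequency.
    if (r[1] / r[2] > ratio_thr
            and r[3] / r[2] > ratio_thr
            and s_avg[2] - s_avg[1] > diff_thr_double):
        return 2
    # Assumption: keep the fundamental when neither test passes.
    return 1
```

The thresholds default to the example values quoted in the text (1.22, 0.6 and 0.4); they are empirical and would be tuned in practice.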
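Embodiment 2
Frequency-multiple detection can also be based on the ratio parameter value (average amplitude to frequency-point amplitude) and on the frequency-multiple decisions for frames before the current frame, which are stored in a buffer. As shown in FIG. 6, triple-frequency detection includes:
Step 600: Determine whether the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the fourth default value.
That is, determine whether the ratio of $r(f')$ to $r(3f')$ is greater than the fourth default value, which can be set empirically to 1.05.
Step 601: If the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the fourth default value, determine whether the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the fifth default value.
When the ratio of $r(f')$ to $r(3f')$ is greater than the fourth default value, determine whether the ratio of $r(2f')$ to $r(3f')$ is greater than the fifth default value, which can be set empirically to 1.05.
Step 602: If the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than the fifth default value, determine whether a pitch-period tripling error occurred in the previous frame.
When this ratio is greater than the fifth default value, the previous-frame flag stored in the buffer is used to determine whether a tripling error has already occurred in the previous frame.
Step 603: If a pitch-period tripling error occurred in the previous frame, determine whether the number of pitch-period tripling errors before the current frame is greater than the sixth default value.
Once a tripling error is known to have occurred in the previous frame, determine whether the number of tripling errors before the current frame is greater than the sixth default value $c_1$. For example, the 10 frames before the current frame are examined to see whether the number of consecutive tripling errors is greater than $c_1$. The sixth default value can be set to 3 if the decision is made per whole frame, or to 6 if it is made per half frame.
Step 604: If the number of pitch-period tripling errors before the current frame is greater than the sixth default value, determine that the triple frequency is the required fine pitch frequency.
If a tripling error occurred in the frame before the frame containing the frequency point 3f′, and the buffer records 3 consecutive tripling errors in the 10 frames before that frame, a pitch-period tripling error is confirmed: the true pitch frequency lies near 3f′, and 3f′ is the required fine pitch frequency.
If the triple frequency is not the required fine pitch frequency, double-frequency detection is performed based on the ratio parameter value and the buffered data. As shown in FIG. 7, it includes:
Step 700: Determine whether the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the tenth default value.
That is, determine whether the ratio of $r(f')$ to $r(2f')$ is greater than the tenth default value, which can be set empirically to 1.05.
Step 701: If the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the tenth default value, determine whether the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the eleventh default value.
When the ratio of $r(f')$ to $r(2f')$ is greater than the tenth default value, determine whether the ratio of $r(3f')$ to $r(2f')$ is greater than the eleventh default value, which can be set empirically to 1.05.
Step 702: If the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than the eleventh default value, determine whether a pitch-period doubling error occurred in the previous frame.
When this ratio is greater than the eleventh default value, the previous-frame flag stored in the buffer is used to determine whether a doubling error has already occurred in the previous frame.
Step 703: If a pitch-period doubling error occurred in the previous frame, determine whether the number of pitch-period doubling errors before the current frame is greater than the twelfth default value.
Once a doubling error is known to have occurred in the previous frame, determine whether the number of doubling errors before the current frame is greater than the twelfth default value. For example, the 10 frames before the current frame are examined to see whether the number of consecutive doubling errors is greater than the twelfth default value. This value can be set to 3 if the decision is made per whole frame, or to 6 if it is made per half frame.
Step 704: If the number of pitch-period doubling errors before the current frame is greater than the twelfth default value, determine that the double frequency is the fine pitch frequency to be detected.
If a doubling error occurred in the frame before the frame containing the frequency point 2f′, and the buffer records 3 consecutive doubling errors in the 10 frames before that frame, a pitch-period doubling error is confirmed: the true pitch frequency lies near 2f′, and 2f′ is the required fine pitch frequency.
After frequency-multiple detection is complete, the result is saved in the previous-frame flag in the buffer. For example, when a pitch-period doubling error is found in the current frame, the flag records that a doubling error occurred, together with the number of consecutive occurrences, for use when the next frame is examined.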
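The buffer-based decision of Embodiment 2 can be sketched as below. The class and function names are hypothetical. The sketch follows the strict "greater than" wording of steps 603 and 703, which is a reading of the text: the worked example in the description speaks of 3 consecutive errors, so the comparison may need to be relaxed to "at least" in practice.

```python
from collections import deque


class MultipleHistory:
    """Buffer of per-frame decisions (1, 2 or 3) for the most recent frames."""

    def __init__(self, depth=10):
        self.flags = deque(maxlen=depth)  # depth of 10 frames follows the text's example

    def push(self, multiple):
        self.flags.append(multiple)

    def trailing_run(self, multiple):
        """Number of consecutive most-recent frames flagged with `multiple`."""
        run = 0
        for flag in reversed(self.flags):
            if flag != multiple:
                break
            run += 1
        return run


def detect_with_history(r, history, ratio_thr=1.05, count_thr=3):
    """Return 3, 2 or 1 using the ratio parameters r[m] and the frame history."""
    for target, other in ((3, 2), (2, 3)):
        # A run greater than count_thr also implies the previous frame was flagged.
        if (r[1] / r[target] > ratio_thr
                and r[other] / r[target] > ratio_thr
                and history.trailing_run(target) > count_thr):
            return target
    return 1  # assumption: keep the fundamental otherwise
```

After each frame the caller would store the decision with `history.push(...)`, which mirrors saving the result in the previous-frame flag.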
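Embodiment 3
As described in Embodiments 1 and 2, frequency-multiple detection of the pitch period can decide the fine pitch frequency in two ways: from the ratio parameter value (average amplitude to frequency-point amplitude) together with the average amplitude parameter value, or from the ratio parameter value together with the buffered data. In practice, the conditions of the two ways are combined with OR logic: when the conditions of either way are satisfied, the frequency point is determined to be the required fine pitch frequency.
For example, when checking for a pitch-period tripling error, the triple frequency is determined to be the required fine pitch frequency if the conditions based on the ratio parameter value and the average amplitude parameter value are satisfied, or if the conditions based on the ratio parameter value and the buffered frequency-multiple decisions for frames before the current frame are satisfied.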
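Embodiment 4
To make frequency-multiple detection more accurate, a high-density amplitude spectrum in the frequency domain is needed. For example, if the original amplitude spectrum has 256 frequency points, inserting frequency points between them yields the high-density amplitude spectrum.
After step 303, the obtained amplitude spectrum is interpolated. As shown in FIG. 8, this includes:
Step 800: Interpolate the amplitude spectrum of the spectrum to obtain the high-density amplitude spectrum of the speech signal.
An interpolation algorithm inserts points between the existing frequency points. The present invention uses cubic B-spline interpolation, which expands the original K frequency points to mK frequency points, where m is a positive integer. Because cubic B-spline interpolation deviates somewhat at the boundaries, pseudo-data are appended at both ends of the data before interpolation to reduce this error. That is, the amplitude spectrum is extended by L points at each end so that the boundary conditions do not affect the interpolation accuracy of the actual data. The extension values equal the values at the two ends of the spectrum, so the extended amplitude spectrum is:
$\underbrace{S(0), \dots, S(0)}_{L},\ \{S(k),\ k \in [0, K-1]\},\ \underbrace{S(K-1), \dots, S(K-1)}_{L}$
The cubic B-spline interpolation function is:
$f(x) = \sum_{k} c(k)\,\beta^{3}(x-k)$
where $f(x)$ is the amplitude of the frequency point to be inserted, k takes integer values, and $\beta^{3}(x)$ is the cubic B-spline basis function:
$\beta^{3}(x) = \begin{cases} 2/3 - |x|^{2} + |x|^{3}/2, & 0 \le |x| < 1 \\ (2-|x|)^{3}/6, & 1 \le |x| < 2 \\ 0, & |x| \ge 2 \end{cases}$
$c(k)$ are the cubic B-spline interpolation coefficients. Define $c^{-}(k) = c(k)/6$. For a given K-dimensional input vector $y = \{y(0), \dots, y(K-1)\}$, $c^{-}(k)$ can be obtained from the following two recursions:
$c^{+}(k) = y(k) + a\,c^{+}(k-1),\quad k = 1, 2, 3, \dots, K-1$, which corresponds to a causal filter;
$c^{-}(k) = a\,\bigl(c^{-}(k+1) - c^{+}(k)\bigr),\quad k = K-2, K-3, K-4, \dots, 0$, which corresponds to a non-causal filter.
Here $a = \sqrt{3} - 2$, and the initial values $c^{+}(0)$ and $c^{-}(K-1)$ of the two recursions are:
$c^{+}(0) = \sum_{k=0}^{k_0} y(k)\,a^{k}$
$c^{-}(K-1) = -\dfrac{a}{1-a^{2}}\,\bigl(c^{+}(K-1) + a\,c^{+}(K-2)\bigr)$
where $k_0 > \log A / \log|a|$ and A is a constant chosen to meet the accuracy requirement. Finally, the solved cubic B-spline coefficients $c(k)$ are substituted into the interpolation formula to obtain the interpolated sequence. The interpolated amplitude spectrum is $S'(i),\ i = 0, 1, 2, \dots, mK-1$.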
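The recursive coefficient computation and the evaluation of the interpolation function can be sketched in plain Python as follows. This sketch makes several assumptions: the sign and the squared term in the initial value of the anti-causal recursion follow the standard recursive B-spline filter, the edge padding of 4 points and the tolerance are arbitrary example values, and the function names are hypothetical.

```python
import math

A_POLE = math.sqrt(3.0) - 2.0  # a = sqrt(3) - 2


def bspline3(x):
    """Cubic B-spline basis function."""
    ax = abs(x)
    if ax < 1.0:
        return 2.0 / 3.0 - ax * ax + ax ** 3 / 2.0
    if ax < 2.0:
        return (2.0 - ax) ** 3 / 6.0
    return 0.0


def bspline_coefficients(y, tol=1e-9):
    """Solve for c(k) with the causal and anti-causal recursions (len(y) >= 2)."""
    n, a = len(y), A_POLE
    # Truncation index k0 from the accuracy constant (tol plays the role of A).
    k0 = min(n - 1, int(math.ceil(math.log(tol) / math.log(abs(a)))))
    c_plus = [0.0] * n
    c_plus[0] = sum(y[k] * a ** k for k in range(k0 + 1))
    for k in range(1, n):
        c_plus[k] = y[k] + a * c_plus[k - 1]
    c_minus = [0.0] * n
    # Assumed initial value, taken from the standard recursive B-spline filter.
    c_minus[n - 1] = -a / (1.0 - a * a) * (c_plus[n - 1] + a * c_plus[n - 2])
    for k in range(n - 2, -1, -1):
        c_minus[k] = a * (c_minus[k + 1] - c_plus[k])
    return [6.0 * v for v in c_minus]  # c(k) = 6 * c^-(k)


def interpolate_spectrum(s, m, pad=4):
    """Expand K amplitude values to m*K values after padding both ends."""
    ext = [s[0]] * pad + list(s) + [s[-1]] * pad  # edge extension by `pad` points
    c = bspline_coefficients(ext)
    out = []
    for i in range(m * len(s)):
        x = pad + i / m  # position in the extended index space
        first = int(math.floor(x)) - 1
        out.append(sum(c[k] * bspline3(x - k)
                       for k in range(first, first + 4) if 0 <= k < len(c)))
    return out
```

At the original sample positions the interpolated curve reproduces the input values, which gives a quick sanity check on the coefficients.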
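Step 801: Weight the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density spectrum is smoothed.
After interpolation, the high-density amplitude spectrum is smoothed to reduce abrupt changes. The smoothed high-density spectrum is:
$\tilde{S}(i) = \mu\,S'^{[-1]}(i) + (1-\mu)\,S'^{[0]}(i),\quad i = 0, 1, 2, \dots, mK-1,\quad 0 < \mu \le 1$, where $S'^{[-1]}(i)$ is the high-density spectrum of the previous frame. The value of $\mu$ sets the proportions of $S'^{[-1]}(i)$ and $S'^{[0]}(i)$ in $\tilde{S}(i)$ and can be set to 0.4, for example.
$\tilde{S}(i)$ is the required high-density amplitude spectrum, and the fine pitch frequency is detected from it.
Once the smoothed high-density amplitude spectrum is available, the fine pitch period is detected. Because the number of frequency points has increased, the average amplitude is more accurate and abrupt changes in frequency-point amplitude have less effect on detection. The detection steps are the same as in Embodiments 1 and 2 and are not repeated here.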
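Embodiment 5
Besides cubic B-spline interpolation of the amplitude spectrum, the speech signal can also be interpolated by zero-padding in the time domain. As shown in FIG. 9, this includes:
Step 900: Zero-pad the tail of the speech signal and convert it to the frequency domain to obtain the high-density amplitude spectrum of the speech signal.
Zero-valued points are appended to the tail of the speech signal, and the zero-padded signal is converted to the frequency domain. The time-frequency transform maps both the original samples and the appended zero-valued points to the frequency domain, which inserts frequency points between the frequency points of the original amplitude spectrum.
In the conversion from the time domain to the frequency domain, the amplitudes of the original frequency points are not affected by the appended zeros. The amplitude spectrum keeps the original frequency points and their amplitude values, so the result is the high-density amplitude spectrum of the time-domain signal in the frequency domain.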
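The claim that the original frequency points keep their amplitudes can be checked with a small Python sketch. It uses a direct DFT to stay dependency-free, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def zero_padded_amplitude(frame, m):
    """Append (m - 1) * len(frame) zeros and return the magnitude spectrum."""
    padded = list(frame) + [0.0] * ((m - 1) * len(frame))
    return [abs(v) for v in dft(padded)]
```

With m = 2, every second bin of the dense spectrum equals the corresponding bin of the original spectrum, and the bins in between are the inserted frequency points.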
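Step 901: Weight the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density amplitude spectrum is smoothed.
After the time-frequency transform has produced the high-density amplitude spectrum, it is smoothed to reduce abrupt changes. The smoothed high-density amplitude spectrum is:
$\tilde{S}(i) = \mu\,S'^{[-1]}(i) + (1-\mu)\,S'^{[0]}(i),\quad i = 0, \dots, mK-1,\quad 0 < \mu \le 1$, where $S'^{[-1]}(i)$ is the high-density amplitude spectrum of the previous frame. The value of $\mu$ sets the proportions of $S'^{[-1]}(i)$ and $S'^{[0]}(i)$ in $\tilde{S}(i)$ and can be set to 0.4, for example.
$\tilde{S}(i)$ is the required high-density amplitude spectrum, and the fine pitch frequency is detected from it.
Once the smoothed high-density amplitude spectrum is available, the fine pitch period is detected. Because the number of frequency points has increased, the average amplitude is more accurate and abrupt changes in frequency-point amplitude have less effect on detection. The detection steps are the same as in Embodiments 1 and 2 and are not repeated here.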
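Embodiment 6
When frequency-multiple detection is performed on the high-density amplitude spectrum, the fine pitch frequency obtained is a multiple of the initial pitch frequency. The search covers only the fundamental, double-frequency, and triple-frequency positions rather than the whole frequency domain, so it is not accurate enough. To obtain a more accurate fine pitch period, an amplitude peak search can also be performed on the high-density amplitude spectrum after it is obtained, and the fine pitch period is then determined from the corresponding characteristic parameters.
Performing fine pitch period detection according to the initial pitch period and the characteristic parameters, as shown in FIG. 10, further includes:
Step 1000: In the high-density amplitude spectrum, compare the amplitude values within a certain range around the fundamental frequency point and each frequency-multiple point to determine the peak position within that range.
After the amplitude spectrum is interpolated to obtain the high-density amplitude spectrum, an amplitude peak search is performed within a certain range around the fundamental frequency point and each frequency-multiple point, for example within a range of 2f′ − 2 centered on the fundamental frequency point f′. This determines the peak position near each of these points, with one peak position for the fundamental frequency point and one for each frequency-multiple point. The amplitude peak corresponding to the fundamental frequency point and to each frequency-multiple point can also be obtained.
Step 1001: Determine whether, among the fundamental frequency point and the frequency-multiple points, there is one frequency point for which the ratio between its ratio parameter value (average amplitude to frequency-point amplitude) and the ratio parameter value of every other frequency point is greater than the thirteenth default value. Such a frequency point is called the target frequency point.
The ratio parameter values of the fundamental frequency point and the frequency-multiple points are compared to find one frequency point whose ratio parameter value, taken against that of every other frequency point, gives a ratio greater than the thirteenth default value δ. The thirteenth default value can be set empirically, for example to 1.22.
Step 1002: If such a target frequency point exists, determine whether the distance from the target frequency point to its corresponding peak position is smaller than the distance from every other frequency point to its corresponding peak position.
In other words, determine whether the distance from the target frequency point to its corresponding peak position is the smallest among all the frequency points.
Step 1003: If the distance from the target frequency point to its corresponding peak position is smaller than the distance from the other frequency points to their corresponding peak positions, determine that the period corresponding to the target frequency point is the fine pitch period.
If both conditions are satisfied, the target frequency point is the required fine pitch frequency. A reciprocal operation is performed on the fine pitch frequency to obtain the fine pitch period.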
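Steps 1000 to 1003 can be sketched as follows. The direction of the ratio test is an assumption: as in Embodiment 1, the sketch treats the target as the frequency point whose ratio parameter is smaller than every other point's by the threshold factor, because the wording of step 1001 does not make the direction unambiguous. The function names are hypothetical.

```python
def local_peak(spectrum, center, width):
    """Index of the largest amplitude in a window of about `width` bins around center."""
    half = width // 2
    lo = max(0, center - half)
    hi = min(len(spectrum) - 1, center + half)
    return max(range(lo, hi + 1), key=lambda i: spectrum[i])


def select_target(spectrum, candidates, r, width, thr=1.22):
    """Return the candidate bin that passes both tests, or None if there is none.

    candidates: bins of f' and its multiples; r[c]: ratio parameter at bin c.
    """
    peaks = {c: local_peak(spectrum, c, width) for c in candidates}
    for c in candidates:
        others = [o for o in candidates if o != c]
        # Assumed direction: the target has the smallest ratio parameter.
        ratio_ok = all(r[o] / r[c] > thr for o in others)
        # The target must lie closer to its own peak than any other candidate does.
        closest = all(abs(c - peaks[c]) < abs(o - peaks[o]) for o in others)
        if ratio_ok and closest:
            return c
    return None
```

For a fundamental at bin 20, the search width from the text's example would be 2 * 20 - 2 = 38 bins.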
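Embodiment 7
As described in Embodiments 1, 2, and 6, when frequency-multiple detection is performed on the high-density amplitude spectrum, the determined fine pitch frequency is the fundamental frequency or one of its multiples, so the accuracy is relatively low. When a more accurate fine pitch period is needed, a further search can be performed around the frequency point detected in Embodiments 1, 2, and 6.
The steps for detecting pitch-period multiple errors are the same as in Embodiments 1, 2, and 6 and are not repeated here.
After detection, a frequency-multiple point is determined, for example the triple-frequency point 3f′, whose coefficient is an integer multiple. A peak search is then performed on the high-density spectrum within a certain range centered on 3f′ (for example, a range of 2f′ − 2 between the double-frequency point 2f′ and the quadruple-frequency point 4f′). When the determined point has a fractional coefficient, such as the half-frequency point, the peak search range can be set to a range centered on that point and expressed in terms of k, the frequency of the searched frequency point. The peak position found is the fine pitch frequency, and a reciprocal operation on the fine pitch frequency gives the required fine pitch period.
The frequency point corresponding to the peak found in this range is the required fine pitch frequency.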
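The refinement step can be sketched as a peak search followed by a reciprocal. The sketch assumes the high-density spectrum is indexed so that bin k corresponds to a frequency of k times the sampling rate divided by `fft_size`, which makes the period in samples equal to `fft_size / k`. This mapping and the function name are assumptions, not part of the source.

```python
def refine_pitch(spectrum, coarse_bin, width, fft_size):
    """Search around coarse_bin for the amplitude peak and return (bin, period).

    The period is expressed in samples and assumes bin k maps to k / fft_size
    cycles per sample.
    """
    half = width // 2
    lo = max(1, coarse_bin - half)          # skip bin 0 to avoid dividing by zero
    hi = min(len(spectrum) - 1, coarse_bin + half)
    peak_bin = max(range(lo, hi + 1), key=lambda i: spectrum[i])
    period_in_samples = fft_size / peak_bin  # reciprocal of the fine pitch frequency
    return peak_bin, period_in_samples
```

For example, a coarse estimate at bin 30 with a true peak at bin 32 of a 1024-point spectrum would be refined to bin 32, a period of 32 samples.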
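Corresponding to the pitch detection method above, the present invention also provides a pitch detection apparatus.
A pitch detection apparatus, as shown in FIG. 11, includes:
an initial pitch period acquisition module, configured to perform pitch detection on the speech signal in the time domain to obtain an initial pitch period;
a time-frequency conversion module, configured to convert the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
a characteristic parameter extraction module, configured to extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
a fine pitch period acquisition module, configured to perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
The characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency-point amplitude, and a peak position parameter.
The fine pitch period acquisition module further includes:
a frequency-multiple detection module, configured to compare the characteristic parameters of the fundamental frequency point and the frequency-multiple points to determine the fine pitch frequency.
The frequency-multiple detection module further includes:
a peak search module, configured to search for the amplitude peak within a certain range around the fine pitch frequency and to perform a reciprocal operation on the frequency point corresponding to the peak to obtain the fine pitch period.
The pitch detection apparatus further includes:
a preprocessing module, configured to preprocess the speech signal;
a windowing module, configured to apply an analysis window to the preprocessed frame signal.
The time-frequency conversion module, as shown in FIG. 12, further includes:
a spectral coefficient acquisition module, configured to perform a frequency-domain transform on the windowed speech signal to obtain spectral coefficients;
an energy spectrum acquisition module, configured to calculate the energy spectrum from the spectral coefficients.
The pitch detection apparatus further includes:
an energy spectrum smoothing module, configured to weight the energy spectrum according to the current frame and the previous frame so that the energy spectrum is smoothed.
The pitch detection apparatus further includes:
an amplitude spectrum acquisition module, configured to calculate the amplitude spectrum of the spectrum from the energy spectrum.
The pitch detection apparatus further includes:
an amplitude spectrum interpolation module, configured to interpolate the amplitude spectrum of the spectrum to obtain the high-density amplitude spectrum of the speech signal.
The time-frequency conversion module, as shown in FIG. 13, further includes:
a speech signal interpolation module, configured to zero-pad the tail of the speech signal and convert it to the frequency domain to obtain the high-density amplitude spectrum of the speech signal.
The pitch detection apparatus further includes:
a high-density amplitude spectrum smoothing module, configured to weight the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density amplitude spectrum is smoothed.
The pitch detection method and apparatus provided by the embodiments detect the pitch period from the initial pitch period obtained in the time domain and the characteristic parameters extracted in the frequency domain, which avoids pitch-period multiple errors and improves the accuracy of pitch period detection.
The foregoing describes only specific embodiments of the present invention, and the protection scope of the present invention is not limited to them. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within its protection scope. The protection scope of the present invention is therefore defined by the claims.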

Claims

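1. A pitch detection method, characterized by comprising:
performing pitch detection on the speech signal in the time domain to obtain an initial pitch period;
converting the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
extracting characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
2. The pitch detection method according to claim 1, characterized in that the characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency-point amplitude, and a peak position parameter.
3. The pitch detection method according to claim 1, characterized in that performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain the fine pitch period further includes: making a decision according to the ratio parameter value (average amplitude to frequency-point amplitude) and the average amplitude parameter value, or making a decision according to the ratio parameter value and the frequency-multiple decisions for frames before the current frame stored in a buffer.
4. The pitch detection method according to claim 3, characterized in that making a decision according to the ratio parameter value and the average amplitude parameter value includes:
determining whether the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than a first default value;
if it is greater than the first default value, determining whether the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than a second default value;
if it is greater than the second default value, determining whether the difference between the triple-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than a third default value;
if the difference is greater than the third default value, determining that the triple frequency is the required fine pitch frequency.
5. The pitch detection method according to claim 3, characterized in that making a decision according to the ratio parameter value and the frequency-multiple decisions for frames before the current frame stored in the buffer includes:
determining whether the ratio of the fundamental frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than a fourth default value;
if it is greater than the fourth default value, determining whether the ratio of the double-frequency point's ratio parameter value to the triple-frequency point's ratio parameter value is greater than a fifth default value;
if it is greater than the fifth default value, determining whether a pitch-period tripling error occurred in the previous frame;
if a pitch-period tripling error occurred in the previous frame, determining whether the number of pitch-period tripling errors before the current frame is greater than a sixth default value;
if the number of pitch-period tripling errors before the current frame is greater than the sixth default value, determining that the triple frequency gives the required fine pitch period.
6. The pitch detection method according to claim 3, characterized in that making a decision according to the ratio parameter value and the average amplitude parameter value further includes:
determining whether the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than a seventh default value;
if it is greater than the seventh default value, determining whether the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than an eighth default value;
if it is greater than the eighth default value, determining whether the difference between the double-frequency point's average amplitude parameter value and the fundamental frequency point's average amplitude parameter value is greater than a ninth default value;
if the difference is greater than the ninth default value, determining that the double frequency is the required fine pitch frequency.
7. The pitch detection method according to claim 3, characterized in that making a decision according to the ratio parameter value and the frequency-multiple decisions for frames before the current frame stored in the buffer further includes:
determining whether the ratio of the fundamental frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than a tenth default value;
if it is greater than the tenth default value, determining whether the ratio of the triple-frequency point's ratio parameter value to the double-frequency point's ratio parameter value is greater than an eleventh default value;
if it is greater than the eleventh default value, determining whether a pitch-period doubling error occurred in the previous frame;
if a pitch-period doubling error occurred in the previous frame, determining whether the number of pitch-period doubling errors before the current frame is greater than a twelfth default value;
if the number of pitch-period doubling errors before the current frame is greater than the twelfth default value, determining that the double frequency is the fine pitch frequency to be detected.
8. The pitch detection method according to claim 1, characterized in that, before extracting the characteristic parameters according to the initial pitch period and the spectrum of the speech signal, the method includes:
interpolating the amplitude spectrum of the spectrum to obtain the high-density amplitude spectrum of the speech signal.
9. The pitch detection method according to claim 8, characterized in that the interpolation includes cubic B-spline interpolation:
$f(x) = \sum_{k} c(k)\,\beta^{3}(x-k)$, where $f(x)$ is the signal to be interpolated, $c(k)$ are the cubic B-spline interpolation coefficients, and $\beta^{3}(x)$ is the cubic B-spline basis function.
10. The pitch detection method according to claim 9, characterized in that, before the cubic B-spline interpolation, the method further includes:
inserting L extension points at each end of the amplitude spectrum, where the values of the extension points equal the values of the respective end points.
11. The pitch detection method according to claim 1, characterized in that converting the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum, further includes: zero-padding the tail of the speech signal and converting it to the frequency domain to obtain the high-density amplitude spectrum of the speech signal.
12. The pitch detection method according to claim 8 or 11, characterized in that, after the high-density amplitude spectrum of the speech signal is obtained, the method includes:
weighting the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density amplitude spectrum is smoothed.
13. The pitch detection method according to claim 12, characterized in that performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain the fine pitch period further includes:
in the high-density amplitude spectrum, comparing the amplitude values within a certain range around the fundamental frequency point and each frequency-multiple point to determine the peak position within that range;
determining whether, among the fundamental frequency point and the frequency-multiple points, there is one frequency point for which the ratio between its ratio parameter value (average amplitude to frequency-point amplitude) and the ratio parameter value of every other frequency point is greater than a thirteenth default value, this frequency point being called the target frequency point;
if such a target frequency point exists, determining whether the distance from the target frequency point to its corresponding peak position is smaller than the distance from the other frequency points to their corresponding peak positions;
if the distance from the target frequency point to its corresponding peak position is smaller than the distance from the other frequency points to their corresponding peak positions, determining that the period corresponding to the target frequency point is the fine pitch period.
14. The pitch detection method according to claim 1, characterized in that performing fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain the fine pitch period further includes:
searching for the amplitude peak within a certain range around the fine pitch frequency, and performing a reciprocal operation on the frequency point corresponding to the peak to obtain the fine pitch period.
15. The pitch detection method according to claim 1, characterized in that, before converting the speech signal to the frequency domain to obtain the spectrum of the speech signal, the method includes:
preprocessing the speech signal;
applying an analysis window to the preprocessed frame signal.
16. The pitch detection method according to claim 15, characterized in that converting the speech signal to the frequency domain includes:
performing a frequency-domain transform on the windowed speech signal to obtain spectral coefficients;
calculating the energy spectrum from the spectral coefficients.
17. The pitch detection method according to claim 16, characterized in that, before calculating the amplitude spectrum from the energy spectrum, the method includes:
weighting the energy spectrum according to the current frame and the previous frame so that the energy spectrum is smoothed.
18. The pitch detection method according to claim 17, characterized in that, after the energy spectrum is smoothed to obtain the smoothed energy spectrum, the method includes:
calculating the amplitude spectrum of the spectrum from the energy spectrum as a base-10 logarithm of the square root of the energy spectrum E(k), with the constants of the formula applied, for $k = 0, \dots, K-1$, where S(k) is the amplitude spectrum function.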
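19. A pitch detection apparatus, characterized by comprising:
an initial pitch period acquisition module, configured to perform pitch detection on the speech signal in the time domain to obtain an initial pitch period;
a time-frequency conversion module, configured to convert the speech signal to the frequency domain to obtain the spectrum of the speech signal, where the spectrum includes an amplitude spectrum;
a characteristic parameter extraction module, configured to extract characteristic parameters according to the initial pitch period and the spectrum of the speech signal;
a fine pitch period acquisition module, configured to perform fine pitch period detection according to the initial pitch period and the characteristic parameters to obtain a fine pitch period.
20. The pitch detection apparatus according to claim 19, characterized in that the characteristic parameters include: an average amplitude parameter, a ratio parameter of the average amplitude to the frequency-point amplitude, and a peak position parameter.
21. The pitch detection apparatus according to claim 19, characterized in that the fine pitch period acquisition module further includes:
a frequency-multiple detection module, configured to compare the characteristic parameters of the fundamental frequency point and the frequency-multiple points to determine the fine pitch frequency, and to perform a reciprocal operation on the fine pitch frequency to obtain the fine pitch period.
22. The pitch detection apparatus according to claim 19, characterized in that the frequency-multiple detection module further includes:
a peak search module, configured to search for the amplitude peak within a certain range around the fine pitch frequency and to perform a reciprocal operation on the frequency point corresponding to the peak to obtain the fine pitch period.
23. The pitch detection apparatus according to claim 19, characterized by including:
a preprocessing module, configured to preprocess the speech signal;
a windowing module, configured to apply an analysis window to the preprocessed frame signal.
24. The pitch detection apparatus according to claim 19, characterized in that the time-frequency conversion module further includes:
a spectral coefficient acquisition module, configured to perform a frequency-domain transform on the windowed speech signal to obtain spectral coefficients;
an energy spectrum acquisition module, configured to calculate the energy spectrum from the spectral coefficients.
25. The pitch detection apparatus according to claim 24, characterized by further including:
an energy spectrum smoothing module, configured to weight the energy spectrum according to the current frame and the previous frame so that the energy spectrum is smoothed.
26. The pitch detection apparatus according to claim 25, characterized by further including:
an amplitude spectrum acquisition module, configured to calculate the amplitude spectrum of the spectrum from the energy spectrum.
27. The pitch detection apparatus according to claim 26, characterized by further including:
an amplitude spectrum interpolation module, configured to interpolate the amplitude spectrum of the spectrum to obtain the high-density amplitude spectrum of the speech signal.
28. The pitch detection apparatus according to claim 19, characterized in that the time-frequency conversion module further includes:
a speech signal interpolation module, configured to zero-pad the tail of the speech signal and convert it to the frequency domain to obtain the high-density amplitude spectrum of the speech signal.
29. The pitch detection apparatus according to claim 27 or 28, characterized by further including:
a high-density amplitude spectrum smoothing module, configured to weight the high-density amplitude spectrum according to the current frame and the previous frame so that the high-density amplitude spectrum is smoothed.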
PCT/CN2012/077456 2011-06-22 2012-06-25 一种基音检测的方法和装置 WO2012175054A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP12802425.4A EP2662854A1 (en) 2011-06-22 2012-06-25 Method and device for detecting fundamental tone
JP2013556963A JP2014507689A (ja) 2011-06-22 2012-06-25 ピッチ検出方法及び装置
KR1020137021767A KR20130117855A (ko) 2011-06-22 2012-06-25 피치 검출 방법 및 장치
US14/136,130 US20140142931A1 (en) 2011-06-22 2013-12-20 Pitch detection method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201110170075.0A CN102842305B (zh) 2011-06-22 2011-06-22 一种基音检测的方法和装置
CN201110170075.0 2011-06-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/136,130 Continuation US20140142931A1 (en) 2011-06-22 2013-12-20 Pitch detection method and apparatus

Publications (1)

Publication Number Publication Date
WO2012175054A1 true WO2012175054A1 (zh) 2012-12-27

Family

ID=47369591

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2012/077456 WO2012175054A1 (zh) 2011-06-22 2012-06-25 一种基音检测的方法和装置

Country Status (6)

Country Link
US (1) US20140142931A1 (zh)
EP (1) EP2662854A1 (zh)
JP (1) JP2014507689A (zh)
KR (1) KR20130117855A (zh)
CN (1) CN102842305B (zh)
WO (1) WO2012175054A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110728990A (zh) * 2019-09-24 2020-01-24 维沃移动通信有限公司 基音检测方法、装置、终端设备和介质

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103426441B (zh) 2012-05-18 2016-03-02 华为技术有限公司 检测基音周期的正确性的方法和装置
CN103915099B (zh) * 2012-12-29 2016-12-28 北京百度网讯科技有限公司 语音基音周期检测方法和装置
CN105338148B (zh) * 2014-07-18 2018-11-06 华为技术有限公司 一种根据频域能量对音频信号进行检测的方法和装置
CN105448297A (zh) * 2014-08-28 2016-03-30 中国移动通信集团公司 一种获取基音周期的方法及装置
CN104599682A (zh) * 2015-01-13 2015-05-06 清华大学 电话线质量语音的基音周期提取方法
JP6904198B2 (ja) * 2017-09-25 2021-07-14 富士通株式会社 音声処理プログラム、音声処理方法および音声処理装置
CN109243479B (zh) * 2018-09-20 2022-06-28 广州酷狗计算机科技有限公司 音频信号处理方法、装置、电子设备及存储介质
CN110176242A (zh) * 2019-07-10 2019-08-27 广州荔支网络技术有限公司 一种音色的识别方法、装置、计算机设备和存储介质
CN110379438B (zh) * 2019-07-24 2020-05-12 山东省计算中心(国家超级计算济南中心) 一种语音信号基频检测与提取方法及系统
CN110853671B (zh) * 2019-10-31 2022-05-06 普联技术有限公司 一种音频特征提取方法和装置、训练方法及音频分类方法
CN111223491B (zh) * 2020-01-22 2022-11-15 深圳市倍轻松科技股份有限公司 一种提取音乐信号主旋律的方法、装置及终端设备
CN113096670B (zh) * 2021-03-30 2024-05-14 北京字节跳动网络技术有限公司 音频数据的处理方法、装置、设备及存储介质
CN113113052B (zh) * 2021-04-08 2024-04-05 深圳市品索科技有限公司 一种离散点的语音基音识别装置及计算机存储介质
CN114299994B (zh) * 2022-01-04 2024-06-18 中南大学 激光多普勒远距离侦听语音的爆音检测方法、设备及介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4696038A (en) * 1983-04-13 1987-09-22 Texas Instruments Incorporated Voice messaging system with unified pitch and voice tracking
CN1342968A (zh) * 2000-09-13 2002-04-03 中国科学院自动化研究所 用于语音识别的高精度高分辨率基频提取方法
US20060080088A1 (en) * 2004-10-12 2006-04-13 Samsung Electronics Co., Ltd. Method and apparatus for estimating pitch of signal
CN1826632A (zh) * 2003-03-31 2006-08-30 国际商业机器公司 用于语音信号的组合频域和时域音高提取的系统和方法
CN101325631A (zh) * 2007-06-14 2008-12-17 华为技术有限公司 一种实现丢包隐藏的方法和装置
CN102016530A (zh) * 2009-02-13 2011-04-13 华为技术有限公司 一种基音周期检测方法和装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8400552A (nl) * 1984-02-22 1985-09-16 Philips Nv Systeem voor het analyseren van menselijke spraak.
JP4502246B2 (ja) * 2003-04-24 2010-07-14 株式会社河合楽器製作所 音程判定装置

Also Published As

Publication number Publication date
JP2014507689A (ja) 2014-03-27
US20140142931A1 (en) 2014-05-22
CN102842305B (zh) 2014-06-25
EP2662854A1 (en) 2013-11-13
CN102842305A (zh) 2012-12-26
KR20130117855A (ko) 2013-10-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12802425

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2012802425

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 20137021767

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2013556963

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE