CN107430850A - Determining features of harmonic signals - Google Patents

Determining features of harmonic signals

Info

Publication number
CN107430850A
Authority
CN
China
Prior art keywords
frequency
signal
pitch
estimation
chirp rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201680017664.6A
Other languages
Chinese (zh)
Inventor
大卫·卡尔森·布拉德利
黄瑶
马西莫·马斯卡洛
贾尼斯·I·印托尼
肖恩·迈克尔·欧康纳
以利沙·纳塔利·马罗格力
罗伯特·尼古拉斯·希尔顿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Crossbow Ltd By Share Ltd
Original Assignee
Crossbow Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/969,029 (US9870785B2)
Priority claimed from US14/969,038 (US9842611B2)
Priority claimed from US14/969,022 (US9548067B2)
Priority claimed from US14/969,036 (US9922668B2)
Application filed by Crossbow Ltd By Share Ltd
Priority claimed from PCT/US2016/016261 (WO2016126753A1)
Publication of CN107430850A

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90: Pitch determination of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Auxiliary Devices For Music (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

Features that may be computed from a harmonic signal include a fractional chirp rate, a pitch, and the amplitudes of the harmonics. A fractional chirp rate may be estimated, for example, by computing scores corresponding to different fractional chirp rates and selecting the highest score. A first pitch may be computed from a frequency representation that is itself computed using the estimated fractional chirp rate, for example, by using peak-to-peak distances in the frequency representation. A second pitch may be computed using the first pitch and a frequency representation of the signal, for example, by using correlations of portions of the frequency representation. The amplitudes of the harmonics of the signal may be determined using the estimated fractional chirp rate and the second pitch. Any of the estimated fractional chirp rate, the second pitch, and the harmonic amplitudes may be used for further processing, such as speech recognition, speaker verification, speaker identification, or signal reconstruction.

Description

Determining features of harmonic signals
Claim of priority
This application is based on and claims priority to the following applications: U.S. Provisional Patent Application No. 62/112,836, entitled "Spectral Motion Transform," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112,796, entitled "Pitch Velocity Estimation," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112,832, entitled "Peak Interval Pitch Estimation," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112,850, entitled "Pitch from Symmetry Characteristics," filed February 6, 2015; U.S. Non-Provisional Patent Application No. 14/969,029, entitled "Determining Features of Harmonic Signals," filed December 15, 2015; U.S. Non-Provisional Patent Application No. 14/969,022, entitled "Estimating Pitch Using Symmetry Characteristics," filed December 15, 2015; U.S. Non-Provisional Patent Application No. 14/969,036, entitled "Estimating Fractional Chirp Rate with Multiple Frequency Representations," filed December 15, 2015; and U.S. Non-Provisional Patent Application No. 14/969,038, entitled "Estimating Pitch Using Peak-to-Peak Distances," filed December 15, 2015. The contents of these applications are incorporated herein by reference in their entirety.
Background
A harmonic signal may have a fundamental frequency and one or more overtones. Harmonic signals include, for example, speech and music. The fundamental frequency of a harmonic signal may be referred to as the first harmonic. A harmonic signal may include other harmonics, which may occur at multiples of the first harmonic. For example, if the fundamental frequency is f at a certain time, then the frequencies of the other harmonics may be 2f, 3f, and so on.
The fundamental frequency of a harmonic signal may change over time. For example, when a person is speaking, the fundamental frequency of the speech may increase at the end of a question. The rate of change of a signal's frequency may be referred to as its chirp rate. The chirp rate of a harmonic signal may be different for different harmonics. For example, if the first harmonic has a chirp rate of c, then the other harmonics may have chirp rates of 2c, 3c, and so on.
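The relationship between harmonic frequencies and chirp rates described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python with NumPy; the numeric values are illustrative, and reading the ratio of chirp rate to frequency as the "fractional chirp rate" used later in this document is an assumption, since the background does not define the term.

```python
import numpy as np

f = 200.0   # fundamental frequency in Hz (illustrative value)
c = 400.0   # chirp rate of the first harmonic in Hz/s (illustrative value)

k = np.arange(1, 6)            # harmonic numbers 1..5
harmonic_freqs = k * f         # f, 2f, 3f, ...
harmonic_chirps = k * c        # c, 2c, 3c, ...

# Assumption: "fractional chirp rate" means chirp rate divided by frequency.
# Under that reading it is the same for every harmonic.
fractional = harmonic_chirps / harmonic_freqs
print(fractional)
```

Because the ratio is shared by all harmonics, a single number can describe how the whole harmonic stack is moving in frequency.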
In applications such as speech recognition, signal reconstruction, and speaker recognition, it may be desirable to determine properties of a harmonic signal over time. For example, it may be desirable to determine the pitch of the signal, the rate of change of the pitch over time, or the frequency, chirp rate, or amplitude of different harmonics.
Summary of the invention
In one embodiment, inventive features can include:
1. A computer-implemented method for estimating pitch, the method comprising:
obtaining a frequency representation of a first portion of a signal;
obtaining a first pitch estimate of the first portion of the signal;
identifying a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions including a first frequency portion and a second frequency portion;
computing a plurality of correlations using the plurality of frequency portions, the plurality of correlations including a first correlation between the first frequency portion and the second frequency portion;
computing a first score using the plurality of correlations; and
computing a second pitch estimate using the first score.
2. The method of clause 1, wherein the plurality of correlations further includes (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
3. The method of clause 1, wherein the plurality of frequency portions partition the frequency representation.
4. The method of clause 1, wherein computing the first score comprises computing a likelihood or a log likelihood of each correlation of the plurality of correlations.
5. The method of clause 1, wherein computing the second pitch estimate comprises performing a golden section search or gradient descent using the first score.
6. The method of clause 1, wherein each frequency portion of the plurality of frequency portions is centered on a multiple of the first pitch estimate.
7. The method of clause 1, further comprising normalizing each frequency portion of the plurality of frequency portions before computing the plurality of correlations.
8. The method of clause 1, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the second pitch estimate.
9. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a frequency representation of a first portion of a signal;
obtain a first pitch estimate of the first portion of the signal;
identify a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions including a first frequency portion and a second frequency portion;
compute a plurality of correlations using the plurality of frequency portions, the plurality of correlations including a first correlation between the first frequency portion and the second frequency portion;
compute a first score using the plurality of correlations; and
compute a second pitch estimate using the first score.
10. The system of clause 9, wherein the plurality of correlations further includes (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
11. The system of clause 9, wherein the plurality of frequency portions partition the frequency representation.
12. The system of clause 9, wherein computing the first score comprises computing a Fisher transformation of each correlation of the plurality of correlations.
13. The system of clause 9, wherein each frequency portion of the plurality of frequency portions is centered on a multiple of the first pitch estimate.
14. The system of clause 9, wherein the one or more computing devices are further configured to normalize each frequency portion of the plurality of frequency portions before computing the plurality of correlations.
15. The system of clause 9, wherein the one or more computing devices are further configured to:
identify a second plurality of frequency portions of the frequency representation using the second pitch estimate, the second plurality of frequency portions including a third frequency portion and a fourth frequency portion;
compute a second plurality of correlations using the second plurality of frequency portions;
compute a second score using the second plurality of correlations; and
compute a third pitch estimate using the second score.
16. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions, the actions comprising:
obtaining a frequency representation of a first portion of a signal;
obtaining a first pitch estimate of the first portion of the signal;
identifying a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions including a first frequency portion and a second frequency portion;
computing a plurality of correlations using the plurality of frequency portions, the plurality of correlations including a first correlation between the first frequency portion and the second frequency portion;
computing a first score using the plurality of correlations; and
computing a second pitch estimate using the first score.
17. The one or more non-transitory computer-readable media of clause 16, wherein the first pitch estimate is computed using a plurality of peak-to-peak distances.
18. The one or more non-transitory computer-readable media of clause 16, wherein the frequency representation is computed using an estimated fractional chirp rate.
19. The one or more non-transitory computer-readable media of clause 16, wherein the plurality of correlations further includes (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
20. The one or more non-transitory computer-readable media of clause 16, wherein the plurality of correlations further includes (i) a correlation between each pair of frequency portions of the plurality of frequency portions, (ii) a correlation between each pair of frequency portions of the plurality of frequency portions in which one frequency portion of the pair is reversed, and (iii) a correlation between each frequency portion and a reversed version of itself.
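The clauses above describe refining a pitch estimate by correlating portions of a frequency representation. The following Python sketch, using NumPy, illustrates one way this could look under stated assumptions: each portion is a window of width equal to the candidate pitch, centered on a multiple of that pitch; portions are mean-removed and scaled to unit norm; the score is the sum of Fisher transformations of the correlations; and a plain grid search stands in for the golden section search or gradient descent mentioned in the clauses. All function names, the window layout, and the clipping value are hypothetical choices for illustration.

```python
import numpy as np

def extract_portions(freqs, spectrum, pitch, n_harmonics=5, n_points=65):
    """Cut one normalized window of width `pitch` around each multiple of `pitch`."""
    rel = np.linspace(-0.5, 0.5, n_points) * pitch  # symmetric offsets in Hz
    portions = []
    for k in range(1, n_harmonics + 1):
        window = np.interp(k * pitch + rel, freqs, spectrum)
        window = window - window.mean()              # normalize: zero mean
        norm = np.linalg.norm(window)
        portions.append(window / norm if norm > 0 else window)  # unit norm
    return portions

def pitch_score(freqs, spectrum, pitch, n_harmonics=5):
    """Sum of Fisher-transformed correlations between frequency portions."""
    portions = extract_portions(freqs, spectrum, pitch, n_harmonics)
    corrs = []
    for i, a in enumerate(portions):
        corrs.append(np.dot(a, a[::-1]))        # portion vs. its own reversal
        for b in portions[i + 1:]:
            corrs.append(np.dot(a, b))          # pair of portions
            corrs.append(np.dot(a, b[::-1]))    # pair with one portion reversed
    # Clip so the Fisher transformation (arctanh) stays finite; the bound is arbitrary.
    corrs = np.clip(corrs, -0.999, 0.999)
    return float(np.sum(np.arctanh(corrs)))

def refine_pitch(freqs, spectrum, first_pitch, half_range=10.0, step=1.0):
    """Grid search around the first pitch estimate (a stand-in for golden section search)."""
    offsets = np.arange(-half_range, half_range + step / 2, step)
    candidates = first_pitch + offsets
    scores = [pitch_score(freqs, spectrum, p) for p in candidates]
    return float(candidates[int(np.argmax(scores))])

# Toy spectrum: Gaussian bumps at multiples of 200 Hz on a 1 Hz grid.
freqs = np.arange(0.0, 1500.0)
spectrum = sum(np.exp(-0.5 * ((freqs - 200.0 * k) / 10.0) ** 2) for k in range(1, 7))
second_pitch = refine_pitch(freqs, spectrum, first_pitch=195.0)
print(second_pitch)
```

When the candidate pitch matches the true pitch, every window holds a bump in the same centered position, so the direct, reversed, and self-reversed correlations are all close to one and the score peaks.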
In another embodiment, inventive features can include:
1. A computer-implemented method for estimating a fractional chirp rate, the method comprising:
obtaining a portion of a signal;
computing a first frequency representation of the portion of the signal using a first value of fractional chirp rate;
computing a first score using the first frequency representation;
computing a second frequency representation of the portion of the signal using a second value of fractional chirp rate;
computing a second score using the second frequency representation; and
computing an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
2. The method of clause 1, wherein the first frequency representation is computed using a frequency-chirp distribution, a pitch-velocity transform, or inner products of the portion of the signal with chirplets.
3. The method of clause 1, further comprising computing log likelihood ratios for a plurality of frequencies of the first frequency representation, wherein each log likelihood ratio is a ratio of a log likelihood that a harmonic is present at a frequency to a log likelihood that no harmonic is present at the frequency.
4. The method of clause 1, wherein the first score is computed using an autocorrelation of the first frequency representation.
5. The method of clause 4, wherein the first score is computed using the Fisher information of the autocorrelation of the first frequency representation.
6. The method of clause 1, wherein computing the estimated fractional chirp rate comprises selecting the fractional chirp rate corresponding to the highest score.
7. The method of clause 1, further comprising estimating a pitch of the portion of the signal using the estimated fractional chirp rate.
8. The method of clause 7, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using at least one of the estimated fractional chirp rate or the estimated pitch.
9. A system for estimating a fractional chirp rate, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a portion of a signal;
compute a first frequency representation of the portion of the signal using a first value of fractional chirp rate;
compute a first score using the first frequency representation;
compute a second frequency representation of the portion of the signal using a second value of fractional chirp rate;
compute a second score using the second frequency representation; and
compute an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
10. The system of clause 9, wherein the one or more computing devices are further configured to compute log likelihood ratios for a plurality of frequencies of the first frequency representation, wherein each log likelihood ratio is a ratio of a log likelihood that a harmonic is present at a frequency to a log likelihood that no harmonic is present at the frequency.
11. The system of clause 9, wherein the first score is computed using an autocorrelation of the first frequency representation.
12. The system of clause 11, wherein the first score is computed using the Fisher information of the autocorrelation of the first frequency representation.
13. The system of clause 9, wherein the first score indicates a match between the first value of fractional chirp rate and the fractional chirp rate of the portion of the signal.
14. The system of clause 9, wherein the one or more computing devices are further configured to estimate a pitch of the portion of the signal using the estimated fractional chirp rate.
15. The system of clause 14, wherein the one or more computing devices are further configured to perform at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using at least one of the estimated fractional chirp rate or the estimated pitch.
16. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions, the actions comprising:
obtaining a portion of a signal;
computing a first frequency representation of the portion of the signal using a first value of fractional chirp rate;
computing a first score using the first frequency representation;
computing a second frequency representation of the portion of the signal using a second value of fractional chirp rate;
computing a second score using the second frequency representation; and
computing an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
17. The one or more non-transitory computer-readable media of clause 16, wherein computing the first score comprises computing a function of the first frequency representation.
18. The one or more non-transitory computer-readable media of clause 16, wherein the actions further comprise:
computing a third frequency representation of the portion of the signal using a third value of fractional chirp rate; and
computing a third score using the third frequency representation;
wherein computing the estimated fractional chirp rate of the portion of the signal further comprises using the third score.
19. The one or more non-transitory computer-readable media of clause 16, wherein:
the first frequency representation is created by transforming a third frequency representation using the first value of fractional chirp rate; and
the second frequency representation is created by transforming the third frequency representation using the second value of fractional chirp rate.
20. The one or more non-transitory computer-readable media of clause 19, wherein the third frequency representation corresponds to a Fourier transform of the portion of the signal.
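The clauses above compute one frequency representation per candidate fractional chirp rate, score each, and keep the best candidate. The Python sketch below (NumPy) illustrates that loop under stated assumptions: the frequency representation is a set of inner products with Gaussian-windowed chirplets whose chirp rate is the candidate value times the frequency, and the score is a simple spectral sharpness measure. The sharpness score is a stand-in chosen for brevity, not the autocorrelation or Fisher information score named in the clauses, and all names and parameter values are hypothetical.

```python
import numpy as np

def chirplet_spectrum(x, t, freqs, chi, sigma=0.01):
    """Inner products of x with Gaussian-windowed chirplets.

    The chirplet at frequency f has instantaneous frequency f * (1 + chi * t),
    so its chirp rate chi * f grows in proportion to frequency.
    """
    window = np.exp(-0.5 * (t / sigma) ** 2)
    warped_t = t + 0.5 * chi * t ** 2
    basis = np.exp(-2j * np.pi * np.outer(freqs, warped_t))
    return basis @ (window * x)

def sharpness_score(spec):
    """Assumed stand-in score: large when energy is concentrated in narrow peaks."""
    power = np.abs(spec) ** 2
    return float(np.sum(power ** 2) / np.sum(power) ** 2)

def estimate_fractional_chirp_rate(x, t, freqs, candidates):
    """Score every candidate value and select the one with the highest score."""
    scores = [sharpness_score(chirplet_spectrum(x, t, freqs, chi)) for chi in candidates]
    return float(candidates[int(np.argmax(scores))]), scores

# Synthetic harmonic signal: 200 Hz fundamental rising with fractional chirp rate 2 per second.
fs = 8000.0
t = np.arange(-200, 200) / fs                      # 50 ms frame centered at t = 0
f0, chi_true = 200.0, 2.0
x = sum(np.cos(2 * np.pi * k * f0 * (t + 0.5 * chi_true * t ** 2)) for k in range(1, 9))

freqs = np.arange(50.0, 2000.0, 5.0)
candidates = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
chi_hat, scores = estimate_fractional_chirp_rate(x, t, freqs, candidates)
print(chi_hat)
```

With the matching candidate, each harmonic collapses to a narrow peak; mismatched candidates smear the higher harmonics, which lowers the score.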
In another embodiment, inventive features can include:
1. A computer-implemented method for estimating pitch, the method comprising:
obtaining a first portion of a signal;
computing a first frequency representation of the first portion of the signal;
identifying a first plurality of peaks in the first frequency representation using a first threshold;
computing a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimating a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
2. The method of clause 1, wherein estimating the pitch of the first portion comprises estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
3. The method of clause 1, further comprising computing a histogram using the first plurality of peak-to-peak distances, wherein estimating the pitch of the first portion of the signal comprises estimating the pitch using the histogram.
4. The method of clause 1, wherein the first frequency representation is computed using an estimated fractional chirp rate of the first portion of the signal.
5. The method of clause 1, wherein computing the first frequency representation comprises using a first smoothing kernel.
6. The method of clause 1, wherein the first frequency representation comprises a log likelihood ratio (LLR) spectrum.
7. The method of clause 1, wherein the first frequency representation comprises a stationary spectrum.
8. The method of clause 1, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the estimated pitch.
9. A system for estimating pitch, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a first portion of a signal;
compute a first frequency representation of the first portion of the signal;
identify a first plurality of peaks in the first frequency representation using a first threshold;
compute a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimate a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
10. The system of clause 9, wherein the one or more computing devices are further configured to estimate the pitch of the first portion by estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
11. The system of clause 9, wherein the one or more computing devices are further configured to compute a histogram using the first plurality of peak-to-peak distances and to estimate the pitch of the first portion of the signal using the histogram.
12. The system of clause 9, wherein the one or more computing devices are further configured to compute the first frequency representation using a first smoothing kernel.
13. The system of clause 9, wherein the first frequency representation comprises a log likelihood ratio (LLR) spectrum.
14. The system of clause 9, wherein the one or more computing devices are further configured to:
identify a second plurality of peaks in the first frequency representation using a second threshold;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
15. The system of clause 9, wherein the one or more computing devices are further configured to:
obtain a second portion of the signal;
compute a second frequency representation of the second portion of the signal;
identify a second plurality of peaks in the second frequency representation;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
16. The system of clause 12, wherein the one or more computing devices are further configured to:
compute a second frequency representation of the first portion of the signal using a second smoothing kernel;
identify a second plurality of peaks in the second frequency representation;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions, the actions comprising:
obtaining a first portion of a signal;
computing a first frequency representation of the first portion of the signal;
identifying a first plurality of peaks in the first frequency representation;
computing a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimating a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
18. The one or more non-transitory computer-readable media of clause 17, wherein estimating the pitch of the first portion comprises estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
19. The one or more non-transitory computer-readable media of clause 17, wherein the actions further comprise computing a histogram using the first plurality of peak-to-peak distances, and wherein estimating the pitch of the first portion of the signal comprises estimating the pitch using the histogram.
20. The one or more non-transitory computer-readable media of clause 17, wherein the first frequency representation comprises a log likelihood ratio (LLR) spectrum.
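The peak-to-peak approach in the clauses above can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions: peaks are local maxima above a threshold, distances are taken between adjacent peaks only, and the pitch is the mean of the distances falling in the fullest histogram bin. The function names, the bin width, and the adjacent-peak restriction are hypothetical choices.

```python
import numpy as np

def find_peaks_above(spectrum, threshold):
    """Indices of local maxima whose value exceeds `threshold`."""
    s = np.asarray(spectrum, dtype=float)
    interior = s[1:-1]
    is_peak = (interior > s[:-2]) & (interior >= s[2:]) & (interior > threshold)
    return np.nonzero(is_peak)[0] + 1

def pitch_from_peak_distances(freqs, spectrum, threshold, bin_width=5.0):
    """Estimate pitch as the most common distance between adjacent peaks."""
    freqs = np.asarray(freqs, dtype=float)
    peaks = find_peaks_above(spectrum, threshold)
    if len(peaks) < 2:
        return None                                   # not enough peaks to measure
    distances = np.diff(freqs[peaks])                 # peak-to-peak distances in Hz
    edges = np.arange(0.0, distances.max() + 2 * bin_width, bin_width)
    counts, edges = np.histogram(distances, bins=edges)
    best = int(np.argmax(counts))                     # fullest histogram bin
    in_bin = (distances >= edges[best]) & (distances < edges[best + 1])
    return float(distances[in_bin].mean())

# Toy spectrum: harmonics of 200 Hz with the 6th harmonic missing.
freqs = np.arange(0.0, 2500.0, 5.0)
harmonics = [k for k in range(1, 11) if k != 6]
spectrum = sum(np.exp(-0.5 * ((freqs - 200.0 * k) / 10.0) ** 2) for k in harmonics)
pitch = pitch_from_peak_distances(freqs, spectrum, threshold=0.5)
print(pitch)
```

The missing harmonic produces one distance of twice the pitch, and the histogram keeps that outlier from shifting the estimate, which is one reason to prefer a histogram or cumulative distribution over a plain average.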
In yet another embodiment, inventive features can include:
1. A computer-implemented method for estimating features of a harmonic signal, the method comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
2. The method of clause 1, further comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
3. The method of clause 1, wherein the second frequency representation is the first frequency representation.
4. The method of clause 1, wherein computing the estimated fractional chirp rate comprises computing a plurality of scores, the plurality of scores including a first score computed using a first fractional chirp rate and a second score computed using a second fractional chirp rate, and computing the estimated fractional chirp rate by selecting the highest score.
5. The method of clause 4, wherein the first score is computed using an autocorrelation of a frequency representation, and the frequency representation is computed using the first fractional chirp rate.
6. The method of clause 1, wherein the first frequency representation is computed by performing inner products of the portion of the signal with functions of frequency and chirp rate, and wherein the chirp rate of the functions increases with frequency.
7. The method of clause 1, wherein the first pitch estimate is computed using an estimated cumulative distribution function of the plurality of peak-to-peak distances.
8. The method of clause 1, wherein the first frequency portion corresponds to a first multiple of the first pitch estimate and the second frequency portion corresponds to a second multiple of the first pitch estimate.
9. The method of clause 2, further comprising:
computing a feature vector using the amplitudes of the plurality of harmonics; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector.
10. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a portion of a signal;
compute an estimated fractional chirp rate of the portion of the signal;
compute a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
compute a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation of the portion of the signal; and
compute a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
11. The system of clause 10, wherein the one or more computing devices are further configured to compute amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
12. The system of clause 10, wherein the first frequency representation is computed using the estimated fractional chirp rate.
13. The system of clause 10, wherein the second frequency representation is different from the first frequency representation.
14. The system of clause 10, wherein the first frequency representation is computed using a pitch-velocity transform.
15. The system of clause 10, wherein the first pitch estimate is computed using a histogram of the plurality of peak-to-peak distances.
16. The system of clause 10, wherein the one or more computing devices are further configured to compute the second pitch estimate by computing a correlation using a reversed version of the first frequency portion.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions, the actions comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in a first frequency representation of the portion of the signal; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
18. The one or more non-transitory computer-readable media of clause 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
19. The one or more non-transitory computer-readable media of clause 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
Obtaining a second portion of the signal;
Computing a second estimated fractional chirp rate of the second portion of the signal;
Computing a third pitch estimate of the second portion of the signal; and
Computing a fourth pitch estimate of the second portion of the signal using the third pitch estimate.
20. The one or more non-transitory computer-readable media of clause 19, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
Computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate;
Computing a feature vector using the amplitudes;
Computing second amplitudes of a second plurality of harmonics of the second portion of the signal using the second estimated fractional chirp rate and the fourth pitch estimate;
Computing a second feature vector using the second amplitudes; and
Performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector and the second feature vector.
Brief description of the drawings
The invention and the following detailed description of certain embodiments thereof may be understood by reference to the following figures:
Fig. 1 shows examples of harmonic signals with different fractional chirp rates.
Fig. 2 shows a spectrogram of a portion of a speech signal.
Fig. 3 shows a representation of harmonic signals in terms of frequency and chirp rate.
Fig. 4 shows a representation of harmonic signals in terms of frequency and fractional chirp rate.
Fig. 5 shows two examples of generalized spectra of a signal.
Fig. 6 shows a pitch-velocity transform of a speech signal.
Fig. 7 shows two examples of generalized spectra of a speech signal.
Fig. 8 shows an LLR spectrum of a speech signal.
Fig. 9A shows peak-to-peak distances for a single threshold in an LLR spectrum of a speech signal.
Fig. 9B shows peak-to-peak distances for multiple thresholds in an LLR spectrum of a speech signal.
Fig. 10A shows frequency portions of a frequency representation of a speech signal used for a first pitch estimate.
Fig. 10B shows frequency portions of a frequency representation of a speech signal used for a second pitch estimate.
Fig. 11 is a flowchart of an example implementation of computing features of a signal.
Fig. 12 is a flowchart of an example implementation of estimating a fractional chirp rate of a signal.
Fig. 13 is a flowchart of an example implementation of estimating a pitch of a signal using peak-to-peak distances.
Fig. 14 is a flowchart of an example implementation of estimating a pitch of a signal using correlations.
Fig. 15 shows an exemplary computing device that may be used to estimate features of a signal.
Detailed description
Described herein are techniques for determining properties of a harmonic signal over time. For example, the properties of a harmonic signal may be determined at regular intervals (e.g., every 10 milliseconds). These properties may be used for processing speech or other signals, for example as features for performing automatic speech recognition or speaker verification or identification. These properties may also be used to perform signal reconstruction to reduce the noise level of the harmonic signal.
The relationships between the harmonics of a harmonic signal may be used to improve the estimation of the properties of the harmonic signal. For example, if the first harmonic of a harmonic signal has a frequency f and a chirp rate c, then the higher harmonics are expected to have frequencies at multiples of f and chirp rates at multiples of c. Techniques that take advantage of these relationships may provide better results than other techniques.
A harmonic signal may have a pitch. For some harmonic signals, the pitch may correspond to the frequency of the first harmonic. For some harmonic signals, the first harmonic may be absent or not visible (e.g., it may be covered by noise), and the pitch may be determined from the frequency difference between the second and third harmonics. For some harmonic signals, multiple harmonics may be absent or not visible, and the pitch may be determined from the frequencies of the visible harmonics.
The pitch of a harmonic signal may change over time. For example, the pitch of a voice or the note of a musical instrument may change over time. As the pitch of a harmonic signal changes, each of the harmonics will have a chirp rate, and the chirp rate of each harmonic may be different. The rate of change of the pitch may be referred to as pitch velocity or described by a fractional chirp rate. In some implementations, the fractional chirp rate may be computed as χ = c_n/f_n, where χ is the fractional chirp rate, c_n is the chirp rate of the n-th harmonic, and f_n is the frequency of the n-th harmonic.
In some implementations, it may be desirable to compute the pitch and/or fractional chirp rate of a harmonic signal at regular intervals. For example, it may be desirable to compute the pitch and/or fractional chirp rate every 10 milliseconds by performing computations on a portion of the signal, which may be obtained by applying a time window (e.g., a Gaussian, Hamming, or Hann window) to the signal. Successive portions of the signal may be referred to as frames, and frames may overlap. For example, a frame may be created every 10 milliseconds, and each frame may be 50 milliseconds long.
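As a non-limiting illustration, the framing described above may be sketched in Python as follows. The function names, the choice of a Hann window, and the default hop and frame lengths (10 and 50 milliseconds, taken from the example above) are assumptions made for illustration only.

```python
import math

def hann_window(length):
    """Hann window of the given length (length must be at least 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]

def frames(signal, sample_rate, hop_ms=10, frame_ms=50):
    """Split a signal into overlapping, windowed frames.

    A new frame starts every hop_ms milliseconds and spans frame_ms
    milliseconds, so successive frames overlap.
    """
    hop = sample_rate * hop_ms // 1000
    size = sample_rate * frame_ms // 1000
    window = hann_window(size)
    out = []
    for start in range(0, len(signal) - size + 1, hop):
        segment = signal[start:start + size]
        out.append([s * w for s, w in zip(segment, window)])
    return out
```

With an 8 kHz sample rate, for instance, each frame holds 400 samples and a new frame begins every 80 samples.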
Fig. 1 shows examples of four harmonic signals with different fractional chirp rates as a function of time and frequency. Fig. 1 does not represent actual signals but provides a conceptual illustration of how chirplets (Gaussian signals with a specified time, frequency, chirp rate, and duration) would appear in a time-frequency representation, such as a spectrogram.
Harmonic signal 110 is centered at time t1 and has four harmonics. The first harmonic has frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f, respectively. Because the frequencies of the harmonics do not change over time, the chirp rate of each harmonic is 0. Accordingly, the fractional chirp rate of harmonic signal 110 is 0.
Harmonic signal 120 is centered at time t2 and has four harmonics. The first harmonic has frequency 2f, and the second, third, and fourth harmonics have frequencies 4f, 6f, and 8f, respectively. The chirp rate c of the first harmonic is positive because its frequency increases over time. The second, third, and fourth harmonics have chirp rates 2c, 3c, and 4c, respectively. Accordingly, the fractional chirp rate of harmonic signal 120 is c/(2f).
Harmonic signal 130 is centered at time t3 and has four harmonics. The first harmonic has frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f, respectively. The chirp rate of the first harmonic is also c, and the second, third, and fourth harmonics have chirp rates 2c, 3c, and 4c, respectively. Accordingly, the fractional chirp rate of harmonic signal 130 is c/f, which is twice that of harmonic signal 120.
Harmonic signal 140 is centered at time t4 and has four harmonics. The first harmonic has frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f, respectively. The chirp rate of the first harmonic is 2c because its frequency changes at twice the rate of harmonic signal 130. The second, third, and fourth harmonics have chirp rates 4c, 6c, and 8c, respectively. Accordingly, the fractional chirp rate of harmonic signal 140 is 2c/f, which is twice that of harmonic signal 130.
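The arithmetic for the four signals of Fig. 1 may be checked with the short Python sketch below. The numeric values chosen for f and c are hypothetical and serve only to make the ratios concrete; the relationship χ = c_n/f_n is the one given above.

```python
from fractions import Fraction

def fractional_chirp_rate(chirp_rate, frequency):
    """Return chi = c_n / f_n for a single harmonic."""
    return Fraction(chirp_rate, frequency)

# Hypothetical values for the symbols of Fig. 1: f = 100 Hz, c = 20 Hz/s.
f, c = 100, 20

# Each signal is a list of (frequency, chirp rate) pairs for harmonics 1..4.
signals = {
    110: [(n * f, 0) for n in range(1, 5)],
    120: [(2 * n * f, n * c) for n in range(1, 5)],
    130: [(n * f, n * c) for n in range(1, 5)],
    140: [(n * f, 2 * n * c) for n in range(1, 5)],
}

chi = {}
for label, harmonics in signals.items():
    rates = {fractional_chirp_rate(cn, fn) for fn, cn in harmonics}
    assert len(rates) == 1  # all harmonics of one signal share one value
    chi[label] = rates.pop()
```

Every harmonic of a given signal yields the same ratio, which is why a single fractional chirp rate can describe the whole harmonic signal.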
Fig. 2 shows a spectrogram of a portion of a speech signal. Multiple harmonics are visible in the spectrogram. At each instant in the spectrogram, the harmonics have the relationships described above. For example, at each instant, the frequency and chirp rate of the second harmonic are twice the frequency and chirp rate of the first harmonic.
Fig. 3 shows an example of the four harmonic signals as a function of frequency and chirp rate, which will be referred to herein as a frequency-chirp distribution or representation. Fig. 3 does not represent actual signals but provides a conceptual illustration of how the harmonic signals of Fig. 1 would appear in a representation of frequency and chirp rate. A frequency-chirp representation may have no time variable, so a frequency-chirp distribution may represent an entire signal rather than a portion of the signal at a particular time. In some implementations, it may be desirable to compute frequency-chirp distributions for portions of the signal corresponding to different times. For example, it may be desirable to compute a frequency-chirp distribution every 10 milliseconds by applying a sliding window to the signal.
Fig. 3 may be constructed conceptually by examining the frequencies and chirp rates of the harmonics of the harmonic signals shown in Fig. 1. For example, for harmonic signal 110, each chirp rate is 0 and the frequencies of the four harmonics are f, 2f, 3f, and 4f, respectively. Accordingly, the four harmonics of harmonic signal 110 are shown at these locations in Fig. 3. Similarly, the harmonics of harmonic signals 120, 130, and 140 are shown in Fig. 3 according to their respective frequencies and chirp rates from Fig. 1.
A frequency-chirp distribution may be computed using techniques similar to those used to compute a time-frequency distribution (such as a spectrogram). For example, in some implementations, a frequency-chirp distribution may be computed using an inner product. Let FC(f, c) denote the frequency-chirp distribution, where f is the frequency variable and c is the chirp rate variable. The frequency-chirp distribution may be computed using an inner product as:
FC(f, c) = <x, ψ(f, c)>
where x is the signal being processed (or a windowed portion of it) and ψ(f, c) is a function parameterized by frequency f and chirp rate c. In some implementations, ψ(f, c) may represent a chirplet, such as:
ψ(f, c) = (1/sqrt(2πσ^2)) exp(-(1/2)((t - t0)/σ)^2 + i f (t - t0) + i (c/2)(t - t0)^2)
where σ corresponds to the duration or spread of the chirplet and t0 is the location of the chirplet in time. To compute a distribution of frequency and chirp rate, an appropriate function ψ(f, c) may be selected, such as a chirplet, and FC(f, c) may be computed for multiple values of f and c. A frequency-chirp distribution is not limited to the above example and may be computed in other ways. For example, a frequency-chirp distribution may be computed as the real part, imaginary part, magnitude, or magnitude squared of an inner product, may be computed using measures of similarity other than an inner product, or may be computed using non-linear functions of the signal.
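A minimal Python sketch of one point of a frequency-chirp distribution follows, assuming a Gaussian-windowed complex exponential as the chirplet. The function names are hypothetical, the normalization constant is omitted, and frequency is expressed in Hz and chirp rate in Hz per second, which is why a factor of 2π appears in the phase. These conventions are assumptions of the sketch rather than requirements of the technique.

```python
import cmath
import math

def chirplet(f, c, times, t0, sigma):
    """Sampled chirplet centered at time t0 with frequency f and chirp rate c."""
    samples = []
    for t in times:
        d = t - t0
        envelope = math.exp(-0.5 * (d / sigma) ** 2)
        phase = 2 * math.pi * (f * d + 0.5 * c * d * d)
        samples.append(envelope * cmath.exp(1j * phase))
    return samples

def frequency_chirp(x, times, f, c, t0, sigma):
    """FC(f, c) = <x, psi(f, c)>, computed as sum of x[n] * conj(psi[n])."""
    psi = chirplet(f, c, times, t0, sigma)
    return sum(xn * pn.conjugate() for xn, pn in zip(x, psi))
```

Evaluating `abs(frequency_chirp(...))` over a grid of f and c values gives a magnitude version of the distribution; the magnitude is largest where f and c match a component of the signal.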
The four harmonic signals in Fig. 3 each have a different fractional chirp rate. The fractional chirp rate of harmonic signal 110 is 0, the fractional chirp rate of harmonic signal 120 is c/(2f), the fractional chirp rate of harmonic signal 130 is c/f, and the fractional chirp rate of harmonic signal 140 is 2c/f. The dashed and dotted lines in Fig. 3 therefore indicate lines of constant fractional chirp rate. Harmonics centered on the dash-dotted line have a fractional chirp rate of c/(2f), harmonics centered on the dotted line have a fractional chirp rate of c/f, and harmonics centered on the dashed line have a fractional chirp rate of 2c/f.
Accordingly, any radial line in Fig. 3 corresponds to a constant fractional chirp rate. Based on this observation, a distribution of frequency and fractional chirp rate may be generated, which may be referred to as a pitch-velocity transform (PVT) or a chirprum. The PVT may be denoted P(f, χ), where f is the frequency variable and χ is the fractional chirp rate variable. Conceptually, a PVT may be constructed by warping a frequency-chirp distribution so that the radial lines of the frequency-chirp distribution become horizontal lines of the PVT. Fig. 4 shows a conceptual example of a PVT created from the frequency-chirp distribution of Fig. 3. Because each harmonic of a harmonic signal has the same fractional chirp rate, the harmonics are aligned horizontally, as shown in Fig. 4.
In some implementations, a PVT may be computed from a frequency-chirp distribution. For example, a PVT may be computed as:
P(f, χ) = FC(f, χf)
because, as described above, c = χf. A PVT need not, however, be computed from a frequency-chirp distribution.
A PVT may also be computed using techniques similar to those used to compute a time-frequency distribution (e.g., a spectrogram). For example, in some implementations, a PVT may be computed using an inner product. The PVT may be computed as:
P(f, χ) = <x, ψ(f, χf)>
where ψ() is a function as described above. To compute a PVT, an appropriate function ψ() may be selected, such as a chirplet, and P(f, χ) may be computed for multiple values of f and χ. A PVT is not limited to the above example and may be computed in other ways. For example, a PVT may be computed as the real part, imaginary part, magnitude, or magnitude squared of an inner product, may be computed using measures of similarity other than an inner product, or may be computed using non-linear functions of the signal.
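By way of illustration only, a magnitude PVT may be sketched in Python as follows. The helper and function names, the centering of the chirplet in the middle of the frame, the default spread, and the use of Hz and Hz per second (with the resulting 2π factor) are assumptions of this sketch.

```python
import cmath
import math

def chirplet_inner_product(x, sample_rate, f, c, sigma):
    """<x, psi(f, c)> for a chirplet centered in the middle of the frame x."""
    t0 = 0.5 * (len(x) - 1) / sample_rate
    total = 0j
    for n, xn in enumerate(x):
        d = n / sample_rate - t0
        envelope = math.exp(-0.5 * (d / sigma) ** 2)
        phase = 2 * math.pi * (f * d + 0.5 * c * d * d)
        total += xn * envelope * cmath.exp(-1j * phase)  # conjugated chirplet
    return total

def pvt(x, sample_rate, freqs, chis, sigma=0.008):
    """Magnitude of P(f, chi) = <x, psi(f, chi * f)>.

    Returns one row per fractional chirp rate in chis and one column
    per frequency in freqs.
    """
    return [[abs(chirplet_inner_product(x, sample_rate, f, chi * f, sigma))
             for f in freqs]
            for chi in chis]
```

Each row of the returned table uses a chirp rate that grows in proportion to frequency, which is what distinguishes the PVT from a frequency-chirp distribution evaluated at a fixed chirp rate.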
The PVT for a specified value of the fractional chirp rate is a function of frequency and may be considered a spectrum or generalized spectrum of the signal. Accordingly, for each value of the fractional chirp rate, a generalized spectrum may be determined from the PVT associated with that particular fractional chirp rate. A generalized spectrum may be denoted X_χ(f). As described below, these generalized spectra need not be computed from a PVT and may be computed in other ways. The PVT for a specified fractional chirp rate corresponds to a slice of the PVT, which will be referred to herein as a row of the PVT (if the PVT were presented in a different orientation, this could be referred to as a column; the orientation of the PVT is not a limiting feature of the techniques described herein). For clarity of explanation, a chirplet will be used for the function ψ() in the following discussion, but any appropriate function may be used for ψ().
For a fractional chirp rate of 0, the PVT corresponds to
P(f, 0) = <x, ψ(f, 0)>
which is the inner product of the signal with a Gaussian that has a chirp rate of zero and is modulated to the corresponding frequency f of the PVT. This may be the same as computing a short-time Fourier transform of the signal with a Gaussian window.
For a non-zero fractional chirp rate, the PVT corresponds to an inner product of the signal with a Gaussian whose chirp rate increases as the frequency of the Gaussian increases. In particular, the chirp rate may be the product of the fractional chirp rate and the frequency. For a non-zero fractional chirp rate, the PVT may have an effect similar to slowing down or reducing the fractional chirp rate of the signal (or, conversely, speeding up or increasing the fractional chirp rate of the signal). Accordingly, each row of the PVT corresponds to a generalized spectrum in which the fractional chirp rate of the signal has been modified by the value corresponding to that row of the PVT.
When the fractional chirp rate of the generalized spectrum (or row of the PVT) is equal to the fractional chirp rate of the signal, the generalized spectrum may correspond to removing the fractional chirp rate of the signal, and the generalized spectrum for this value of the fractional chirp rate may be referred to as the stationary spectrum of the signal or the best row of the PVT.
Fig. 5 shows hypothetical generalized spectra (or rows of a PVT) generated for harmonic signal 140 of Fig. 1 using two different values of the fractional chirp rate. The four peaks (511, 512, 513, 514) show the generalized spectrum whose fractional chirp rate matches the fractional chirp rate of the signal, which may be referred to as the stationary spectrum. Because the fractional chirp rate of this generalized spectrum matches the fractional chirp rate of the signal, (i) the four peaks may be narrower than in the generalized spectra for other values of the fractional chirp rate, and (ii) the four peaks may be higher than in the generalized spectra for other values of the fractional chirp rate. Because the peaks may be narrower and higher, they may be easier to detect than the peaks of other generalized spectra. The peaks of the stationary spectrum may be narrower and higher because the stationary spectrum has the effect of removing the fractional chirp rate of the signal.
The four peaks (521, 522, 523, 524) show a generalized spectrum for a fractional chirp rate that differs from the fractional chirp rate of the signal. Because the fractional chirp rate of this generalized spectrum does not match that of the signal, the peaks may be shorter and wider.
Fig. 6 shows a PVT of the signal of Fig. 2 at approximately 0.21 seconds. At this time, the signal has a pitch of approximately 230 Hz and a fractional chirp rate of approximately 4. The PVT shows features of the signal for each harmonic. For example, the PVT shows the first harmonic at approximately 230 Hz on the frequency axis and 4 on the fractional chirp rate axis. Similarly, the PVT shows the second harmonic at approximately 460 Hz on the frequency axis and 4 on the fractional chirp rate axis, and so forth. At frequencies between the harmonics, the PVT has lower values because the signal energy is lower in those regions. At fractional chirp rates different from 4, the PVT has lower values because the fractional chirp rate of the PVT does not match the fractional chirp rate of the signal.
Fig. 7 shows two generalized spectra corresponding to rows of the PVT of Fig. 6. The solid line corresponds to the generalized spectrum whose fractional chirp rate matches the fractional chirp rate of the signal (a fractional chirp rate of approximately 4), that is, the stationary spectrum. The dotted line corresponds to the generalized spectrum with a fractional chirp rate of zero, which will be referred to as the zero generalized spectrum (and may correspond to a short-time Fourier transform of the signal). The peaks of the stationary spectrum are higher and narrower than the peaks of the zero generalized spectrum. For the first harmonic, peak 711 of the stationary spectrum is about twice the height and one third the width of peak 721 of the zero generalized spectrum. For the third harmonic, the difference between peak 712 of the stationary spectrum and peak 722 of the zero generalized spectrum is even greater. For the seventh harmonic, peak 713 of the stationary spectrum is clearly visible, but the corresponding peak of the zero generalized spectrum is not visible.
The features of the different generalized spectra (or rows of the PVT) may be used to determine the fractional chirp rate of the signal. As described above, the peaks of the generalized spectrum may be narrower and higher for the correct value of the fractional chirp rate. Techniques for measuring narrower and higher peaks of a signal may therefore be used to estimate the fractional chirp rate of the signal.
To estimate the fractional chirp rate, a function may be used that takes a vector (e.g., a spectrum) as input and outputs one or more scores according to some criterion. Let g() be a function that takes a vector as input (such as a generalized spectrum or a row of a PVT) and outputs one or more values or scores corresponding to the input. In some implementations, the output of g() may be a number that indicates the peakiness of the input. For example, g() may correspond to entropy, Fisher information, Kullback-Leibler divergence, or the magnitude of the input raised to the fourth or a higher power. Using the function g(), the fractional chirp rate of the signal may be estimated from the PVT as:
χ̂ = argmax_χ g(P(f, χ))
where χ̂ is the estimate of the fractional chirp rate. The function g() may be computed for multiple rows of the PVT, and the row producing the highest value of g() may be selected as corresponding to the estimated fractional chirp rate of the signal.
The estimate of the fractional chirp rate may also be computed from a frequency-chirp distribution (e.g., the frequency-chirp distribution described above):
χ̂ = argmax_χ g(FC(f, χf))
The estimate of the fractional chirp rate may also be computed from generalized spectra:
χ̂ = argmax_χ g(X_χ(f))
The estimate of the fractional chirp rate may also be computed using inner products of the signal with the function ψ():
χ̂ = argmax_χ g(<x, ψ(f, χf)>)
As described above, each of the PVT, the frequency-chirp distribution, and the generalized spectra may be computed using a variety of techniques. In some implementations, these quantities may be determined by computing an inner product of the signal with a chirplet, but the techniques described herein are not limited to that particular implementation. For example, functions other than chirplets may be used, and measures of similarity other than an inner product may be used.
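The selection of the best row may be sketched in Python as follows. This is a minimal illustration that assumes the rows (generalized spectra) have already been computed; the function names are hypothetical, and negative entropy is used here as one possible peakiness score because a peakier row has lower entropy.

```python
import math

def negative_entropy(row):
    """Peakiness score: negative entropy of the normalized power in a row."""
    power = [abs(v) ** 2 for v in row]
    total = sum(power)
    if total == 0:
        return float("-inf")  # an all-zero row carries no peaks
    return sum((p / total) * math.log(p / total) for p in power if p > 0)

def estimate_fractional_chirp_rate(rows, chis, score=negative_entropy):
    """Return the value in chis whose row maximizes the score function g()."""
    scores = [score(row) for row in rows]
    best = max(range(len(chis)), key=scores.__getitem__)
    return chis[best]
```

Any other score function that takes a row and returns a single number can be passed in through the `score` parameter.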
In some implementations, the generalized spectrum may be modified before the fractional chirp rate of the signal is determined. For example, a log-likelihood ratio (LLR) spectrum may be computed from the generalized spectrum, and the LLR spectrum may be denoted LLR_χ(f). An LLR spectrum may use hypothesis testing techniques to improve the determination of whether a harmonic is present at a frequency of a spectrum. For example, to determine whether a harmonic is present at a frequency of the stationary spectrum shown in Fig. 7, the value of the spectrum could be compared with a threshold. An LLR spectrum may improve this determination.
An LLR spectrum may be computed using a log-likelihood ratio of two hypotheses: (1) a harmonic is present at a frequency of the signal, and (2) a harmonic is not present at a frequency of the signal. A likelihood may be computed for each of the two hypotheses. The two likelihoods may be compared to determine whether a harmonic is present, for example by computing the logarithm of the ratio of the two likelihoods.
In some implementations, the log likelihood that a harmonic is present at a frequency of the signal may be computed by fitting a Gaussian to the signal spectrum at that frequency and then computing the residual sum of squares between the Gaussian and the signal. To fit a Gaussian to the spectrum at a frequency, the Gaussian may be centered at the frequency, and the amplitude of the Gaussian may then be computed using any technique suitable for estimating such a parameter. In some implementations, the spread of the Gaussian in frequency or duration may be matched to the window used to compute the signal spectrum, or the spread of the Gaussian may also be determined during the fitting process. For example, when fitting a Gaussian to peak 711 of the stationary spectrum shown in Fig. 7, the amplitude of the Gaussian may be approximately 0.12, and the duration of the Gaussian may correspond approximately to the duration of the peak (or to the window used to compute the spectrum). The log likelihood may then be computed from the residual sum of squares between the Gaussian and the signal spectrum within a window of frequencies around the frequency for which the likelihood is being computed.
In some implementations, the log likelihood that no harmonic is present at a frequency may be computed from the residual sum of squares between a zero spectrum (a spectrum that is zero at all frequencies) and the signal spectrum within a window around the frequency for which the likelihood is being computed.
The LLR spectrum may be determined by computing the two likelihoods for each frequency of the signal spectrum (e.g., a generalized spectrum) and then computing the logarithm (e.g., the natural logarithm) of the ratio of the two likelihoods. Other steps may also be performed, such as estimating the noise variance in the signal and normalizing the log likelihoods using the estimated noise variance. In some implementations, the LLR spectrum for a frequency f may be computed as:
LLR(f) = (1/(2σ²_noise)) (X^h X - (X - Ĝ_f)^h (X - Ĝ_f))
where σ²_noise is the estimated noise variance, X is the spectrum, h denotes the Hermitian transpose, and Ĝ_f is the best-fit Gaussian to the spectrum at frequency f.
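A minimal Python sketch of an LLR spectrum along these lines is shown below. It assumes a real-valued, non-negative spectrum sampled on a uniform grid, a Gaussian of fixed spread measured in bins, a least-squares amplitude fit clamped at zero, and a known noise variance. The function name and the parameters `half_width` and `spread` are hypothetical.

```python
import math

def llr_spectrum(spectrum, noise_variance, half_width=5, spread=2.0):
    """Log-likelihood ratio of 'harmonic present' vs 'no harmonic' per bin."""
    offsets = range(-half_width, half_width + 1)
    shape = [math.exp(-0.5 * (k / spread) ** 2) for k in offsets]
    llr = []
    for center in range(len(spectrum)):
        # Pair each spectrum value in the local window with the Gaussian shape.
        pairs = [(spectrum[center + k], s) for k, s in zip(offsets, shape)
                 if 0 <= center + k < len(spectrum)]
        # Least-squares amplitude of a Gaussian centered at this bin.
        amplitude = max(0.0, sum(v * s for v, s in pairs)
                        / sum(s * s for _, s in pairs))
        rss_harmonic = sum((v - amplitude * s) ** 2 for v, s in pairs)
        rss_no_harmonic = sum(v * v for v, _ in pairs)
        llr.append((rss_no_harmonic - rss_harmonic) / (2.0 * noise_variance))
    return llr
```

The returned value is large where a Gaussian-shaped peak explains the local spectrum much better than a zero spectrum does, and close to zero elsewhere.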
Fig. 8 shows an example of an LLR spectrum corresponding to the stationary spectrum shown in Fig. 7. For each frequency, the LLR spectrum has a high value where a harmonic is present and a low value where no harmonic is present. Compared with other spectra (such as generalized or stationary spectra), the LLR spectrum may provide a better determination of whether harmonics are present at different frequencies.
The estimate of the fractional chirp rate may also be computed using LLR spectra:
χ̂ = argmax_χ g(LLR_χ(f))
To illustrate some possible implementations of estimating the fractional chirp rate, examples of the function g() will be provided. The examples below use generalized spectra, but other spectra, such as LLR spectra, may also be used.
In some implementations, the fractional chirp rate may be estimated using the magnitude of the generalized spectrum raised to the fourth power:
g(X_χ(f)) = ∫ |X_χ(f)|^4 df
In some implementations, the function g() may include at least some of the following sequence of operations: (1) compute |X_χ(f)|^2 (which may be normalized by dividing by the total energy of the signal or some other normalization value); (2) compute the autocorrelation of |X_χ(f)|^2, denoted r_X(τ); and (3) compute the Fisher information, entropy, Kullback-Leibler divergence, sum of the squared (or magnitude squared) values of r_X(τ), or sum of the squares of the second derivative of r_X(τ). The above examples are not limiting, and other variations are possible. For example, in step (1), X_χ(f), or its magnitude, real part, or imaginary part, may be used in place of |X_χ(f)|^2.
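The two example score functions above may be sketched in Python as follows. The sketch assumes a spectrum sampled on a uniform frequency grid, approximates the integral by a sum, normalizes by the total energy in step (1), and uses the sum of squared autocorrelation values in step (3). The function names and the `df` parameter are hypothetical.

```python
def fourth_power_score(spectrum, df=1.0):
    """Approximate the integral of |X(f)|^4 df on a uniform grid of spacing df."""
    return sum(abs(v) ** 4 for v in spectrum) * df

def autocorrelation_score(spectrum):
    """Steps (1) to (3): normalized power, autocorrelation, sum of squares."""
    power = [abs(v) ** 2 for v in spectrum]
    total = sum(power)
    if total == 0:
        return 0.0
    power = [p / total for p in power]                  # step (1)
    n = len(power)
    r = [sum(power[i] * power[i + lag] for i in range(n - lag))
         for lag in range(n)]                           # step (2)
    return sum(v * v for v in r)                        # step (3)
```

Both functions return larger values for a spectrum with a few tall, narrow peaks than for a flat spectrum of the same total energy.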
Accordingly, the fractional chirp rate of a signal may be determined using any combination of the above techniques or any similar techniques known to one of skill in the art.
In addition to estimating the fractional chirp rate of the signal, the pitch of the signal may also be estimated. In some implementations, the fractional chirp rate may be estimated first, and the estimated fractional chirp rate may then be used to estimate the pitch. For example, after estimating the fractional chirp rate (denoted χ̂), the pitch may be estimated using the generalized spectrum corresponding to the estimated fractional chirp rate.
When estimating a pitch, the pitch estimate may differ from the true pitch by an octave, which may be referred to as an octave error. For example, if the true pitch is 300 Hz, the pitch estimate may be 150 Hz or 600 Hz. To avoid octave errors, the pitch may be estimated using a two-step approach. First, a coarse pitch estimate may be determined to obtain an estimate that may be less accurate but is less susceptible to octave errors. Second, a precise pitch estimate may be used to refine the coarse pitch estimate.
A coarse pitch estimate may be determined by computing peak-to-peak distances of a spectrum, such as a generalized spectrum or an LLR spectrum (corresponding to the estimate of the fractional chirp rate). For clarity in the following description, the LLR spectrum will be used as an example spectrum, but the techniques described herein are not limited to the LLR spectrum, and any appropriate spectrum may be used.
When computing peak-to-peak distances in a spectrum, it may not always be clear which peaks correspond to signal and which peaks correspond to noise. Including too many peaks that correspond to noise, or excluding too many peaks that correspond to signal, may reduce the accuracy of the coarse pitch estimate. Although the example LLR spectrum of Fig. 8 has low noise, additional peaks caused by noise may be present for signals with higher noise levels.
In some implementations, peaks may be selected from the LLR spectrum using a threshold. For example, the standard deviation (or variance) of the noise in the spectrum may be determined, and a threshold may be computed or selected using the standard deviation of the noise, such as by setting the threshold to a multiple or fraction of the standard deviation (e.g., setting the threshold to twice the standard deviation of the noise). After the threshold is selected, peak-to-peak distances may be determined. For example, Fig. 9A shows peak-to-peak distances for a threshold of approximately 0.3. At this threshold, the first five peak-to-peak distances are approximately 230 Hz, the sixth is approximately 460 Hz, the seventh and eighth are approximately 230 Hz, and the ninth is approximately 690 Hz. After the peak-to-peak distances are determined, the most frequently occurring peak-to-peak distance may be selected as the coarse pitch estimate. For example, a histogram may be computed using bins with a width of 2 to 5 Hz, and the histogram bin with the largest count may be selected as the coarse pitch estimate.
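The single-threshold procedure may be sketched in Python as follows. The function names, the simple local-maximum peak detector, and the choice to report the center of the winning histogram bin are assumptions made for illustration.

```python
from collections import Counter

def find_peaks(spectrum, threshold):
    """Indices of local maxima whose value exceeds the threshold."""
    return [i for i in range(1, len(spectrum) - 1)
            if spectrum[i] > threshold
            and spectrum[i] > spectrum[i - 1]
            and spectrum[i] >= spectrum[i + 1]]

def peak_to_peak_distances(freqs, spectrum, threshold):
    """Frequency differences between consecutive peaks above the threshold."""
    peaks = find_peaks(spectrum, threshold)
    return [freqs[b] - freqs[a] for a, b in zip(peaks, peaks[1:])]

def coarse_pitch(distances, bin_width=5.0):
    """Center of the histogram bin holding the most peak-to-peak distances."""
    counts = Counter(int(d // bin_width) for d in distances)
    best_bin = max(counts, key=counts.get)
    return (best_bin + 0.5) * bin_width
```

For distances like those described for Fig. 9A (seven near 230 Hz, one near 460 Hz, and one near 690 Hz), the bin containing 230 Hz wins, so the outliers at two and three times the pitch do not affect the coarse estimate.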
In some implementations, multiple thresholds may be used, as shown in Fig. 9B. For example, the thresholds may be selected using the heights of the peaks in the LLR spectrum, such as the ten highest peaks or all peaks above a second threshold (e.g., above twice the standard deviation of the noise). Peak-to-peak distances may be computed for each of the thresholds. In Fig. 9B, peak-to-peak distance 901 is determined using the highest peak as a threshold, peak-to-peak distances 911 and 912 are determined using the second highest peak as a threshold, peak-to-peak distances 921, 922, and 923 are determined using the third highest peak as a threshold, and so forth. As described above, the most frequently occurring peak-to-peak distance may be selected as the coarse pitch estimate, for example by using a histogram.
In some implementations, multiple time frames can be directed to and calculates peak to peak distance, to determine that rough pitch is estimated.Example Such as, in order to determine that the rough pitch of particular frame is estimated, can be directed to present frame, first five frame and follow-up five frame calculating peak to peak away from From.The peak to peak distance of all frames may be incorporated in together to determine that rough pitch is estimated, such as calculate all peak to peak distances Histogram.
In some implementations, peak-to-peak distances may be computed by applying different smoothing kernels to the spectrum. Applying a smoothing kernel to the spectrum may reduce peaks caused by noise, but may also reduce peaks caused by the signal. For noisy signals a wider kernel may perform better, and for less noisy signals a narrower kernel may perform better. Because it may not be known in advance how to select an appropriate kernel width, peak-to-peak distances may be computed from the spectrum for each kernel width in a specified set of kernel widths. As above, the peak-to-peak distances for all of the smoothing kernels may be pooled when determining the rough pitch estimate.
Accordingly, peak-to-peak distances may be computed in a variety of ways, including but not limited to using different thresholds, different times (e.g., frames), and different smoothing kernels. A rough pitch estimate may be determined from these peak-to-peak distances. In some implementations, the rough pitch estimate may be determined as the frequency corresponding to the mode of the histogram of all the computed peak-to-peak distances.
In some implementations, the rough pitch estimate may be determined by estimating a cumulative distribution function (CDF) and/or a probability density function (PDF) of the peak-to-peak distances rather than by using a histogram. For example, the CDF of the pitch may be estimated as follows. For any pitch value less than the smallest peak-to-peak distance, the CDF is zero, and for any pitch value greater than the largest peak-to-peak distance, the CDF is one. For a pitch value between these two bounds, the CDF may be estimated as the number of peak-to-peak distances less than the pitch value divided by the total number of peak-to-peak distances. For example, consider the peak-to-peak distances shown in Fig. 9A. Fig. 9A shows a total of 9 peak-to-peak distances: 7 of 230 Hz, 1 of 460 Hz, and 1 of 690 Hz. For frequencies below 230 Hz the estimated CDF is 0, for frequencies between 230 Hz and 460 Hz it is 7/9, for frequencies between 460 Hz and 690 Hz it is 8/9, and for frequencies above 690 Hz it is 1.
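The Fig. 9A example can be checked with a short sketch of the empirical CDF. The function name `empirical_cdf` is hypothetical, and exact fractions are used only to make the 7/9 and 8/9 values easy to see.

```python
from fractions import Fraction

def empirical_cdf(distances_hz, pitch_value_hz):
    """Fraction of peak-to-peak distances that are less than the given pitch value."""
    below = sum(1 for d in distances_hz if d < pitch_value_hz)
    return Fraction(below, len(distances_hz))

# Seven distances of 230 Hz, one of 460 Hz, and one of 690 Hz, as in Fig. 9A.
fig_9a_distances = [230] * 7 + [460, 690]
cdf_values = [empirical_cdf(fig_9a_distances, f) for f in (200, 300, 500, 700)]
```

The four sample frequencies fall in the four regions named in the text, so `cdf_values` holds 0, 7/9, 8/9, and 1 in that order.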
The estimated CDF may resemble a step function, and accordingly the CDF may be smoothed using any appropriate smoothing technique, such as spline interpolation, low-pass filtering, or locally weighted scatterplot smoothing (LOWESS). The rough pitch estimate may be determined as the pitch value corresponding to the largest slope of the CDF.
In some implementations, the PDF may be estimated from the CDF by computing the derivative of the CDF, and any appropriate technique may be used to compute the derivative. The rough pitch estimate may then be determined as the pitch value corresponding to the peak of the PDF.
In some implementations, multiple preliminary rough pitch estimates may be determined, and the preliminary estimates may be used to determine an actual rough pitch estimate. For example, the average of the preliminary rough pitch estimates, or the most common preliminary rough pitch estimate, may be selected as the actual rough pitch estimate. For example, a rough pitch estimate may be computed for each threshold in a set of thresholds. For high thresholds the rough pitch estimate may be too high, and for low thresholds it may be too low. For thresholds in between, the rough pitch estimate may be more accurate. To determine the actual rough pitch estimate, a histogram of the multiple preliminary rough pitch estimates may be computed, and the actual rough pitch estimate may correspond to the frequency of the mode of the histogram. In some implementations, outliers may be removed from the histogram to improve the actual rough pitch estimate.
After the rough pitch estimate is obtained, it may be used as a starting point for obtaining an accurate pitch estimate. The accurate pitch estimate may be determined using the shape of each harmonic in a spectrum (again, any appropriate spectrum may be used, such as a generalized spectrum, a stationary spectrum, or an LLR spectrum). To compare the shapes of the harmonics in the spectrum, portions of the spectrum may be extracted as shown in Fig. 10A and Fig. 10B.
Fig. 10A shows portions of a spectrum for a first pitch estimate, where the pitch estimate is very close to the true pitch of the signal. Suppose the true pitch of the signal is about 230 Hz and the pitch estimate is also about 230 Hz. A portion of the spectrum for each harmonic may be identified using multiples of the estimated pitch. In Fig. 10A, portion 1010 is at about 230 Hz, portion 1011 is at about 460 Hz, and portions 1012-1017 are each at higher multiples of 230 Hz. Because the pitch estimate is accurate, each harmonic is approximately centered in the middle of its portion. Some examples of estimating pitch in an audio signal based on symmetry characteristics are described in U.S. Patent Application No. 14/502,844, filed September 30, 2014, entitled "Systems and Methods for Estimating Pitch in Audio Signals Based on Symmetry Characteristics Independent of Harmonic Amplitudes," the entire contents of which are incorporated herein by reference.
Fig. 10B shows portions of a spectrum for a second pitch estimate, where the pitch estimate is slightly lower than the true pitch of the signal. For example, the pitch estimate may be 228 Hz and the actual pitch may be 230 Hz. Again, a portion of the spectrum for each harmonic may be identified using multiples of the pitch estimate. For each harmonic, the portion lies slightly to the left of the true position of the harmonic, and the offset increases as the harmonic number increases. Portion 1020 is about 2 Hz to the left of the true position of the first harmonic, portion 1021 is about 4 Hz to the left of the true position of the second harmonic, and portions 1022-1027 are each increasingly further to the left as the harmonic number increases. For example, portion 1027 is about 16 Hz to the left of the true position of the eighth harmonic.
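The growth of the offset with harmonic number follows from the portions being centered on multiples of the estimate: the k-th portion misses the k-th harmonic by k times the pitch error. A small sketch, with a hypothetical helper name, reproduces the 2 Hz to 16 Hz figures quoted for Fig. 10B.

```python
def portion_offsets(true_pitch_hz, estimated_pitch_hz, num_harmonics):
    """Offset (Hz) between each true harmonic and the centre of its frequency portion."""
    error = true_pitch_hz - estimated_pitch_hz
    return [k * error for k in range(1, num_harmonics + 1)]

# True pitch 230 Hz, estimate 228 Hz, eight harmonics as in Fig. 10B.
offsets = portion_offsets(230.0, 228.0, 8)
```

Here `offsets` runs from 2.0 Hz for the first harmonic to 16.0 Hz for the eighth, while an exact estimate would give zero offset for every harmonic.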
The frequency portions of Figs. 10A and 10B may be used to determine the accuracy of a pitch estimate. When the pitch estimate is accurate, as in Fig. 10A, each harmonic is centered in its frequency portion, and the frequency portions therefore all have similar shapes. When the pitch estimate is inaccurate, as in Fig. 10B, each harmonic is not centered in its frequency portion and is increasingly off-center as the harmonic number increases. Accordingly, when the pitch estimate is less accurate, the frequency portions are less similar to one another.
In addition to comparing the shape of a first frequency portion with that of a second frequency portion, a frequency portion may be compared with a reversed version of itself, because the shape of a harmonic is generally symmetric. For an accurate pitch estimate, the harmonic will be centered in the frequency portion, and reversing the portion will therefore give a similar shape. For an inaccurate pitch estimate, the harmonic will not be centered in the frequency portion, and reversing the portion will give a different shape. Similarly, a first frequency portion may be compared with a reversed version of a second frequency portion.
The frequency portions may have any appropriate width. In some implementations, the frequency portions may partition the spectrum, may overlap adjacent portions, or may have gaps between them (as shown in Figs. 10A and 10B). The frequency portions may be taken from any frequency representation, such as the spectrum of the signal, or the real part, imaginary part, magnitude, or magnitude squared of the spectrum of the signal. The frequency portions may also be normalized to remove differences that are less relevant to determining pitch. For example, for each frequency portion, a mean and a standard deviation may be determined, and the frequency portion may be normalized by subtracting the mean and then dividing by the standard deviation (e.g., a z-score).
Correlations may be used to measure whether two frequency portions have similar shapes and to determine whether a harmonic is centered at its expected frequency. The frequency portions for a pitch estimate may be determined as described above, and a correlation may be performed by computing the inner product of two frequency portions. The correlations that may be performed include the correlation of a first frequency portion with a second frequency portion, the correlation of a first frequency portion with a reversed version of itself, and the correlation of a first frequency portion with a reversed version of a second frequency portion.
The correlations may have higher values for more accurate pitch estimates and lower values for less accurate pitch estimates. For a more accurate pitch estimate, the frequency portions will be more similar to one another and to reversed versions of one another (e.g., each harmonic is centered in its frequency portion), so the correlations may be higher. For a less accurate pitch estimate, the frequency portions will be less similar to one another and to reversed versions of one another (e.g., each harmonic is off-center by an amount corresponding to its harmonic number), so the correlations may be lower.
Each correlation may be computed, for example, by taking the inner product of two frequency portions (or of a frequency portion and a reversed version of another frequency portion). The correlation may also be normalized by dividing by N-1, where N is the number of samples in each frequency portion. In some implementations, the Pearson product-moment correlation coefficient may be used.
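A minimal sketch of the normalization and inner-product correlation described above follows. It assumes the sample standard deviation (divisor N-1) for the z-score, under which the normalized inner product equals the Pearson coefficient; the toy portions are invented for illustration.

```python
import math

def zscore(portion):
    """Subtract the mean and divide by the sample standard deviation (divisor N-1)."""
    n = len(portion)
    mean = sum(portion) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in portion) / (n - 1))
    return [(v - mean) / std for v in portion]

def correlation(a, b):
    """Inner product of two z-scored portions, normalized by N-1."""
    za, zb = zscore(a), zscore(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

centred = [0.0, 1.0, 4.0, 1.0, 0.0]   # harmonic peak in the middle of the portion
shifted = [1.0, 4.0, 1.0, 0.0, 0.0]   # same peak, off-centre

symmetry_centred = correlation(centred, centred[::-1])  # portion vs. its own reversal
symmetry_shifted = correlation(shifted, shifted[::-1])
```

The centered portion correlates almost perfectly with its reversal, whereas the shifted portion gives a negative value, which is the behavior the text relies on to score pitch estimates.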
Some or all of the above correlations may be used to determine a score for the accuracy of a pitch estimate. For example, for eight harmonics, eight correlations may be computed between each frequency portion and its own reversed version, 28 correlations may be computed between a frequency portion and another frequency portion, and 28 correlations may be computed between a frequency portion and the reversed version of another frequency portion. These correlations may be combined in any appropriate way to obtain an overall score for the accuracy of the pitch estimate. For example, the correlations may be added or multiplied to obtain the overall score.
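The counts of 8, 28, and 28 follow from choosing unordered pairs among eight portions. The enumeration below is only a sketch of how the comparisons could be listed; the variable names are hypothetical.

```python
from itertools import combinations

harmonics = range(1, 9)                               # eight harmonics

self_reversed = [(k, k) for k in harmonics]           # portion vs. its own reversal
pairs = list(combinations(harmonics, 2))              # portion vs. another portion
reversed_pairs = list(combinations(harmonics, 2))     # portion vs. reversal of another portion

total_correlations = len(self_reversed) + len(pairs) + len(reversed_pairs)
```

In this enumeration a pitch estimate is scored from 64 correlations in total when all three kinds are used.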
In some implementations, the Fisher transformation may be used to combine the correlations. The Fisher transformation of an individual correlation r may be computed as:

$$F(r) = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$

In the region of interest of an individual correlation, the Fisher transformation may be approximated as:

$$F(r) \approx r$$

The Fisher transformation of an individual correlation may have a probability density function that is approximately Gaussian, with a standard deviation of $1/\sqrt{N-3}$, where N is the number of samples in each portion. Accordingly, using the above approximation, the probability density function f(r) of the Fisher transformation of an individual correlation may be expressed as:

[equation not reproduced in the source text]

An overall score may then be computed by evaluating f(r) for each correlation and multiplying the results. Accordingly, if there are M correlations, the overall score s may be computed as a likelihood:

$$s = \prod_{i=1}^{M} f(r_i)$$

Alternatively, the score S may be computed as a log-likelihood:

$$S = \sum_{i=1}^{M} \log f(r_i)$$
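The following sketch illustrates the Fisher transformation, its small-r approximation, and the equivalence between multiplying densities and summing their logarithms. The density is passed in as a function because its exact form is not available here; the Gaussian centered on a correlation of 1 (`ASSUMED_MEAN`) is purely an illustrative assumption, as are the sample count and the correlation values.

```python
import math

def fisher(r):
    """Fisher transformation, 0.5 * ln((1 + r) / (1 - r)), for -1 < r < 1."""
    return math.atanh(r)

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def likelihood(correlations, pdf):
    """Product of the per-correlation densities."""
    return math.prod(pdf(r) for r in correlations)

def log_likelihood(correlations, pdf):
    """Sum of the per-correlation log densities."""
    return sum(math.log(pdf(r)) for r in correlations)

N = 20                              # samples per frequency portion (illustrative)
STD = 1.0 / math.sqrt(N - 3)        # usual standard deviation of a Fisher-transformed correlation
ASSUMED_MEAN = 1.0                  # assumption: density centred on perfect correlation

def pdf(r):
    return gaussian_pdf(r, ASSUMED_MEAN, STD)

good = [0.90, 0.85, 0.95]           # correlations for a well-centred pitch estimate (made up)
poor = [0.40, 0.20, 0.50]           # correlations for a poorly centred estimate (made up)
```

Under this assumed density the well-centered set scores higher than the poorly centered one, and the log-likelihood form avoids multiplying many small numbers when M is large.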
These scores may be used to obtain an accurate pitch estimate through an iterative process, such as a golden section search or any kind of gradient descent algorithm. For example, the accurate pitch estimate may be initialized with the rough pitch estimate. Scores may be computed for the current accurate pitch estimate and for other pitch values near it. If the score of another pitch value is higher than the score of the current pitch estimate, the current pitch estimate may be set to that other pitch value. This process may be repeated until an appropriate stopping condition is reached.
In some implementations, the process of determining the accurate pitch estimate may be constrained, such as by requiring the accurate pitch estimate to lie within a range of the rough pitch estimate. The range may be determined using any appropriate technique. For example, the range may be determined from a variance or a confidence interval of the rough pitch estimate, such as by determining the confidence interval of the rough pitch estimate using a bootstrap technique. The range may be determined from the confidence interval, such as a multiple of the confidence interval. When determining the accurate pitch estimate, the search may be limited so that the accurate pitch estimate does not fall outside the specified range.
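A golden section search naturally respects a range constraint, because it only ever evaluates points inside its bracket. The sketch below maximizes a score over a bracket around the rough estimate; the quadratic `toy_score`, the 228 Hz rough estimate, and the 5 Hz half-range are stand-ins chosen for illustration, not values from the text.

```python
import math

def golden_section_maximize(score, low, high, tol=1e-3):
    """Return the approximate maximizer of a unimodal score on [low, high]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = low, high
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if score(c) > score(d):
            b = d            # maximum lies in [a, d]
        else:
            a = c            # maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

def toy_score(pitch_hz):
    """Stand-in for the correlation-based score, peaking at 230 Hz."""
    return -(pitch_hz - 230.0) ** 2

rough_pitch = 228.0
half_range = 5.0             # e.g., derived from a confidence interval
accurate_pitch = golden_section_maximize(
    toy_score, rough_pitch - half_range, rough_pitch + half_range
)
```

A real score built from correlations may have several local maxima, which is one reason a tight bracket around a good rough estimate is helpful.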
In some implementations, after the fractional chirp rate and the pitch have been determined, it may be desirable to estimate the amplitudes of the harmonics of the signal (which may be complex-valued and include phase information). Each harmonic may be modeled as a chirplet, where the frequency and chirp rate of the chirplet are set using the estimated pitch and the estimated fractional chirp rate. For example, for the k-th harmonic, the frequency of the harmonic may be k times the estimated pitch, and the chirp rate of the harmonic may be the fractional chirp rate multiplied by the frequency of the chirplet. Any appropriate duration may be used for the chirplet.
The amplitudes of the harmonics may be estimated using any appropriate technique, including, for example, maximum likelihood estimation. In some implementations, the vector of harmonic amplitudes $\hat{a}$ may be estimated as:

$$\hat{a} = (M M^{h})^{-1} M x$$

where M is a matrix in which each row corresponds to the chirplet for one harmonic with the parameters described above, the number of rows of M corresponds to the number of harmonic amplitudes to be estimated, h denotes the Hermitian transpose, and x is the time-series representation of the signal. The estimates of the harmonic amplitudes may be complex-valued, and in some implementations other functions of the amplitudes may be used, such as the magnitude, magnitude squared, real part, or imaginary part.
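One way to read the amplitude estimate is as an ordinary least-squares fit of the signal to a set of chirplets, which coincides with maximum likelihood if the noise is assumed white and Gaussian. The sketch below builds unit-amplitude chirplets for each harmonic, forms the normal equations, and solves them in pure Python. The chirplet definition (no window), the sample rate, the conjugation convention, and all numeric values are assumptions made for illustration.

```python
import cmath

def chirplet(freq_hz, chirp_rate, times):
    """Unit-amplitude complex chirp: frequency freq_hz at t=0, changing by chirp_rate Hz/s."""
    return [cmath.exp(2j * cmath.pi * (freq_hz * t + 0.5 * chirp_rate * t * t)) for t in times]

def solve(matrix, rhs):
    """Solve a small complex linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]

def estimate_amplitudes(signal, pitch_hz, fractional_chirp_rate, num_harmonics, times):
    """Least-squares complex amplitudes of the first num_harmonics chirplets."""
    basis = [chirplet(k * pitch_hz, fractional_chirp_rate * k * pitch_hz, times)
             for k in range(1, num_harmonics + 1)]
    gram = [[sum(a.conjugate() * b for a, b in zip(bj, bk)) for bk in basis] for bj in basis]
    proj = [sum(a.conjugate() * s for a, s in zip(bj, signal)) for bj in basis]
    return solve(gram, proj)

# Synthetic check: two harmonics with known complex amplitudes, 25 ms at 8 kHz.
sample_rate = 8000.0
times = [n / sample_rate for n in range(200)]
true_amplitudes = [1.0 + 0.5j, 0.3 - 0.2j]
harmonics = [chirplet(k * 230.0, 2.0 * k * 230.0, times) for k in (1, 2)]
signal = [sum(a * h[n] for a, h in zip(true_amplitudes, harmonics)) for n in range(len(times))]
estimated = estimate_amplitudes(signal, 230.0, 2.0, 2, times)
```

On this noise-free synthetic signal the estimated amplitudes match the ones used to build it, including their phases; with real data the fit would only approximate them.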
In some implementations, the amplitudes may have been computed in a previous step and need not be explicitly computed again. For example, where an LLR spectrum is used in a previous processing step, the amplitudes may be computed while computing the LLR spectrum. The LLR spectrum is computed by fitting Gaussians to a spectrum, and one fitting parameter of each Gaussian is its amplitude. The amplitudes of the Gaussians may be saved while computing the LLR spectrum and recalled later rather than being recomputed. In some implementations, the amplitudes determined from the LLR spectrum may serve as a starting point, and the amplitudes may be refined, for example, by using an iterative technique.
The above techniques may be performed on successive portions of the signal to be processed, such as a frame of the signal every ten milliseconds. For each portion of the signal that is processed, a fractional chirp rate, a pitch, and harmonic amplitudes may be determined. Some or all of the fractional chirp rate, pitch, and harmonic amplitudes may be referred to as HAM (harmonic amplitude matrix) features, and a feature vector that includes the HAM features may be created. The feature vector of HAM features may be used in addition to, or in place of, any other features used for processing harmonic signals. For example, HAM features may be used in addition to, or in place of, mel-frequency cepstral coefficients, perceptual linear prediction features, or neural network features. HAM features may be applied to any application of harmonic signals, including but not limited to speech recognition, word recognition, speaker recognition, speaker verification, noise reduction, or signal reconstruction.
Figs. 11-14 are flowcharts showing example implementations of the processes described above. Note that, for the flowcharts described below, the order of the steps is exemplary and other orders are possible, not all steps are required, and in some implementations some steps may be omitted or other steps may be added. The processes of the flowcharts may be implemented by one or more computers (e.g., the computers described below).
Fig. 11 is a flowchart of an example implementation of computing features for a portion of a signal. At step 1110, a portion of a signal is obtained. The signal may be any signal for which estimating features may be useful, including but not limited to a speech signal or a music signal. The portion may be any relevant portion of the signal, and the portion may be, for example, a frame of the signal extracted at regular intervals (such as every 10 milliseconds).
At step 1120, the fractional chirp rate of the portion of the signal is estimated. The fractional chirp rate may be estimated using any of the techniques described above. For example, multiple possible fractional chirp rates may be identified, and a score may be computed for each possible fractional chirp rate. The score may be computed using a function, such as any of the functions g() described above. The estimate of the fractional chirp rate may be determined by selecting the fractional chirp rate corresponding to the highest score. In some implementations, an iterative process may be used to determine a more accurate estimate of the fractional chirp rate, such as by selecting additional possible fractional chirp rates and iterating with a golden section search or gradient descent. The function g() may take as input any of the frequency representations of the portion described above, including but not limited to a spectrum of the portion, an LLR spectrum of the portion, a generalized spectrum of the portion, a frequency-chirp distribution of the portion, or a PVT of the portion.
At step 1130, a frequency representation of the portion of the signal is computed using the estimated fractional chirp rate. The frequency representation may be any representation of the portion of the signal as a function of frequency. The frequency representation may be, for example, a stationary spectrum, a generalized spectrum, an LLR spectrum, or a row of a PVT. The frequency representation may be computed during the processing of step 1120 and need not be a separate step. For example, the frequency representation may be computed during other processing that determines the estimate of the fractional chirp rate.
At step 1140, a rough pitch estimate is computed from the portion of the signal using the frequency representation. The rough pitch estimate may be determined using any of the techniques described above. For example, peak-to-peak distances may be determined for any of the types of spectrum described above, for various parameters (such as different thresholds or different smoothing kernels), and for other portions of the signal. The rough pitch estimate may then be computed from the peak-to-peak distances using a histogram or any of the other techniques described above.
At step 1150, an accurate pitch estimate is computed from the portion of the signal using the frequency representation and the rough pitch estimate. The accurate pitch estimate may be initialized with the rough pitch estimate and then refined with an iterative process. For each candidate value of the accurate pitch estimate, a score such as a likelihood or a log-likelihood may be computed, and the accurate pitch estimate may be determined by maximizing the score. The score may be determined using a combination of correlations as described above. The score may be maximized using any appropriate procedure, such as a golden section search or gradient descent.
At step 1160, harmonic amplitudes are computed using the estimated fractional chirp rate and the estimated pitch. For example, the harmonic amplitudes may be computed by modeling each harmonic as a chirplet and performing maximum likelihood estimation.
The process of Fig. 11 may be repeated for successive portions or time intervals of the signal. For example, a fractional chirp rate, a pitch, and harmonic amplitudes may be computed every 10 milliseconds. The fractional chirp rate, pitch, and harmonic amplitudes may be used for a variety of applications, including but not limited to pitch tracking, signal reconstruction, speech recognition, and speaker verification or recognition.
Fig. 12 is a flowchart of an example implementation of computing the fractional chirp rate of a portion of a signal. At step 1210, a portion of a signal is obtained, as described above.
At step 1220, multiple frequency representations of the portion of the signal are computed, and any of the techniques described above may be used to compute them. Each frequency representation may correspond to a fractional chirp rate. In some implementations, a frequency representation may be computed (i) from a row of a PVT, (ii) from a radial slice of a frequency-chirp distribution, or (iii) using inner products of the portion of the signal with chirplets (where the chirp rate of the chirplets increases with frequency).
At step 1230, a score is computed for each frequency representation, and each score corresponds to a fractional chirp rate. A score may indicate how well the fractional chirp rate corresponding to the score matches the fractional chirp rate of the portion of the signal. The scores may be computed using any of the techniques described above. In some implementations, a score may be computed using an autocorrelation of the frequency representation, such as an autocorrelation of the magnitude squared of the frequency representation. The score may be computed from the autocorrelation using Fisher information, entropy, Kullback-Leibler divergence, a sum of squared (or magnitude-squared) values of the autocorrelation, and/or a sum of squares of the second derivative of the autocorrelation.
At step 1240, the fractional chirp rate of the portion of the signal is estimated. In some implementations, the fractional chirp rate is estimated by selecting the fractional chirp rate corresponding to the highest score. In some implementations, the estimate of the fractional chirp rate may be refined using an iterative technique, such as a golden section search or gradient descent. The estimated fractional chirp rate may then be used for further processing of the signal as described above, such as speech recognition or speaker recognition.
Fig. 13 is a flowchart of an example implementation of computing a pitch estimate for a portion of a signal. At step 1310, a first portion of a signal is obtained as described above, and at step 1320, a frequency representation of the first portion of the signal is computed using any of the techniques described above.
At step 1330, a threshold is selected using any of the techniques described above. For example, the threshold may be selected using a signal-to-noise ratio, or the threshold may be selected using the heights of peaks in the frequency representation of the first portion of the signal.
At step 1340, multiple peaks are identified in the frequency representation of the first portion of the signal. The peaks may be identified using any appropriate technique. For example, the values of the frequency representation may be compared with the threshold to identify contiguous portions of the frequency representation (each a frequency portion) that remain above the threshold. A peak may be identified, for example, by selecting the highest point of the frequency portion, by selecting the midpoint between the start and the end of the frequency portion, or by fitting a curve (such as a Gaussian) to the frequency portion and selecting the peak using the fit. Accordingly, the frequency representation may be processed to identify the frequency portions that are above the threshold and to identify a peak for each frequency portion.
At step 1350, multiple peak-to-peak distances are computed in the frequency representation of the first portion of the signal. Each peak may be associated with the frequency value corresponding to the peak. A peak-to-peak distance may be computed as the difference between the frequency values of adjacent peaks. For example, if peaks are present at 230 Hz, 690 Hz, 920 Hz, and 1840 Hz (e.g., similar to peaks 931, 932, 933, and 934 in Fig. 9B), the peak-to-peak distances would be 460 Hz, 230 Hz, and 920 Hz.
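Steps 1340 and 1350 can be sketched as follows: find contiguous runs above the threshold, pick one peak per run (highest point or midpoint), and difference adjacent peaks. The function names, the 10 Hz grid, and the triangular bumps are assumptions for illustration; curve fitting is omitted.

```python
def regions_above(values, threshold):
    """Return (start, end) index pairs of contiguous runs above threshold; end is exclusive."""
    regions, start = [], None
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(values)))
    return regions

def peak_frequencies(freqs, values, threshold, method="max"):
    """One peak per above-threshold region, by highest point or by midpoint."""
    peaks = []
    for start, end in regions_above(values, threshold):
        if method == "max":
            idx = max(range(start, end), key=lambda i: values[i])
            peaks.append(freqs[idx])
        else:  # "midpoint"
            peaks.append((freqs[start] + freqs[end - 1]) / 2)
    return peaks

# Synthetic frequency representation on a 10 Hz grid with bumps at the peaks named in the text.
freqs = list(range(0, 2000, 10))
values = [0.0] * len(freqs)
for centre in (230, 690, 920, 1840):
    i = centre // 10
    values[i - 1], values[i], values[i + 1] = 0.5, 1.0, 0.5

peaks = peak_frequencies(freqs, values, threshold=0.3)
distances = [hi - lo for lo, hi in zip(peaks, peaks[1:])]
```

For these symmetric bumps the highest-point and midpoint rules agree; on asymmetric or noisy peaks they can differ, which is one reason the text leaves the choice open.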
Steps 1330, 1340, and 1350 may be repeated for other thresholds, for other settings with the same threshold, or for other settings with other thresholds. For example, as described above, multiple thresholds may be selected using the heights of multiple peaks in the frequency representation; the same threshold or other thresholds may be used with a second frequency representation corresponding to a second portion of the signal (e.g., where the second portion is before or after the first portion); and the same threshold or other thresholds may be used with different smoothing kernels.
At step 1360, a histogram of the peak-to-peak distances is computed. The histogram may use some or all of the peak-to-peak distances described above. Any appropriate bin width may be used, such as a bin width of 2 to 5 Hz.
At step 1370, a pitch estimate is determined using the histogram of peak-to-peak distances. In some implementations, the pitch estimate may correspond to the mode of the histogram. In some implementations, multiple histograms may be used to determine the pitch estimate. For example, multiple histograms may be computed for multiple thresholds (or for combinations of multiple thresholds and other parameters, such as times or smoothing kernels), and a preliminary pitch estimate may be determined for each of the histograms. A final pitch estimate may be determined from the multiple preliminary pitch estimates, for example, by selecting the most common preliminary pitch estimate.
Fig. 14 is a flowchart of an example implementation of computing a pitch estimate for a portion of a signal. At step 1410, a frequency representation of a portion of a signal is obtained, as described above.
At step 1420, a pitch estimate for the portion of the signal is obtained. The obtained pitch estimate may have been computed using any technique for estimating pitch, including but not limited to the rough pitch estimation techniques described above. The obtained pitch estimate may be regarded as an initial pitch estimate to be updated, or as a running pitch estimate that is being updated by an iterative process.
At step 1430, multiple frequency portions of the frequency representation are obtained. Each frequency portion may be centered on a multiple of the pitch estimate. For example, a first frequency portion may be centered on the pitch estimate, a second frequency portion may be centered on twice the pitch estimate, and so on. Any appropriate width may be used for the frequency portions. For example, the frequency portions may partition the frequency representation, may overlap, or may have gaps between them.
At step 1440, multiple correlations are computed using the multiple frequency portions of the frequency representation. The frequency portions may be further processed before the correlations are computed. For example, each frequency portion may be extracted from the frequency representation and stored in a vector of length N, where the start of the vector corresponds to the start of the frequency portion and the end of the vector corresponds to the end of the frequency portion. The frequency portions may be shifted by a sub-sample amount so that the frequency portions line up exactly. For example, the pitch estimate may lie between frequency bins of the frequency representation (e.g., a pitch estimate of 230 Hz may lie between frequency bin 37 and frequency bin 38, at a position of about 37.3). Accordingly, the start, center, and end of a frequency portion may be defined by fractional sample values. A frequency portion may be shifted by a sub-sample amount so that one or more of its start, center, and end correspond to an integer sample of the frequency representation. In some implementations, the frequency portions may also be normalized by subtracting the mean and dividing by the standard deviation of the frequency portion.
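The sub-sample alignment can be sketched by resampling the frequency representation at fractional bin positions. Linear interpolation is used below purely as an assumed, simple way to apply the shift; the text does not say which interpolation is used, and the ramp-shaped `spectrum` is a stand-in that makes the result easy to check.

```python
import math

def extract_portion(spectrum, centre_bin, half_width):
    """Sample spectrum at centre_bin + offset for integer offsets, by linear interpolation.

    Assumes every sampled position lies strictly inside the spectrum.
    """
    portion = []
    for offset in range(-half_width, half_width + 1):
        position = centre_bin + offset
        lower = math.floor(position)
        frac = position - lower
        portion.append((1.0 - frac) * spectrum[lower] + frac * spectrum[lower + 1])
    return portion

# A ramp spectrum whose value equals the bin index, so interpolated values equal positions.
spectrum = [float(i) for i in range(100)]
pitch_bin = 37.3                                   # e.g., a 230 Hz pitch between bins 37 and 38

first_portion = extract_portion(spectrum, 1 * pitch_bin, 2)
second_portion = extract_portion(spectrum, 2 * pitch_bin, 2)
```

Each returned vector has its middle element exactly at the harmonic position, so portions taken for different harmonics line up sample for sample before they are correlated.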
The correlations may include any of the following: a correlation between a first frequency portion and a second frequency portion, a correlation between a first frequency portion and a reversed second frequency portion, and a correlation between a first frequency portion and a reversed first frequency portion. The correlations may be computed using any appropriate technique. For example, the frequency portions may be extracted from the frequency representation and stored in vectors as described above, and a correlation may be computed by taking the inner product of two vectors (or the inner product of one vector with a reversed version of another vector).
At step 1450, the correlations are combined to obtain a score for the pitch estimate. The score may be generated using any appropriate technique, including, for example, computing a product of the correlations, a sum of the correlations, a combination of the Fisher transformations of the correlations, or a likelihood or log-likelihood of the correlations or of the Fisher transformations of the correlations, as described above.
At step 1460, the pitch estimate is updated. For example, a first score for a first pitch estimate may be compared with a second score for a second pitch estimate, and the pitch estimate may be determined by selecting the pitch estimate with the highest score. Steps 1420 to 1460 may be repeated to continually update the pitch estimate using a technique such as a golden section search or gradient descent. Steps 1420 to 1460 may be repeated until some appropriate stopping condition is reached, such as a maximum number of iterations or the improvement of the pitch estimate over the previous estimate falling below a threshold.
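The update loop with its stopping conditions can be sketched as a simple neighbor comparison with step halving. This is a plain hill climb chosen for brevity rather than one of the named search methods, and the score function, step sizes, and iteration limit are illustrative assumptions.

```python
def refine_pitch(score, initial_pitch, step=1.0, min_step=0.01, max_iterations=100):
    """Move to a neighbouring pitch while it scores higher; halve the step otherwise."""
    pitch, best = initial_pitch, score(initial_pitch)
    for _ in range(max_iterations):                 # stopping condition: iteration limit
        candidate = max((pitch - step, pitch + step), key=score)
        candidate_score = score(candidate)
        if candidate_score > best:
            pitch, best = candidate, candidate_score
        else:
            step /= 2.0
            if step < min_step:                     # stopping condition: negligible change
                break
    return pitch

def toy_score(pitch_hz):
    """Stand-in for the combined correlation score, peaking at 230.4 Hz."""
    return -(pitch_hz - 230.4) ** 2

refined_pitch = refine_pitch(toy_score, initial_pitch=228.0)
```

Starting from a rough estimate of 228 Hz, the loop settles within a few hundredths of a hertz of the score's peak and then stops because the step has become smaller than `min_step`.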
Fig. 15 shows components of one implementation of a computing device 1510 for implementing any of the techniques described above. In Fig. 15, the components are shown as being on a single computing device 1510, but the components may be distributed among multiple computing devices, such as a system of computing devices, including, for example, end-user computing devices (e.g., a smartphone or a tablet) and/or server computing devices (e.g., cloud computing). For example, the collection of audio data and the pre-processing of the audio data may be performed by an end-user computing device, and other operations may be performed by a server.
Computing device 1510 may include any components typical of a computing device, such as volatile or non-volatile memory 1520, one or more processors 1521, and one or more network interfaces 1522. Computing device 1510 may also include any input and output components, such as a display, a keyboard, and a touch screen. Computing device 1510 may also include a variety of components or modules providing specific functionality, and these components or modules may be implemented in software, hardware, or a combination thereof. Below, several examples of components are described for one example implementation, and other implementations may include additional components or exclude some of the components described below.
Computing device 1510 may have a signal processing component 1530 for performing any needed operations on an input signal (for example, analog-to-digital conversion, encoding, decoding, subsampling, windowing, or computing frequency representations). Computing device 1510 may have a fractional chirp rate estimation component 1531 that estimates the fractional chirp rate of a signal using any of the techniques described above. Computing device 1510 may have a coarse pitch estimation component 1532 that estimates the pitch of a signal using peak-to-peak distances as described above. Computing device 1510 may have a precise pitch estimation component 1533 that estimates the pitch of a signal using correlations as described above. Computing device 1510 may have a HAM feature generation component 1534 that determines the amplitudes of harmonics as described above.
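To make the role of the coarse pitch estimation component more concrete, the sketch below estimates a pitch from the distances between adjacent peaks of a magnitude spectrum. It is a simplified stand-in rather than the method described above: the peak picker, the `threshold` parameter, the use of adjacent peaks only, and the choice of the median (the 0.5 quantile of the empirical distribution of distances) are all assumptions made for this example, as are the names `find_peaks` and `coarse_pitch`.

```python
import statistics


def find_peaks(magnitudes, threshold):
    """Indices of local maxima of the spectrum that exceed threshold."""
    return [
        i
        for i in range(1, len(magnitudes) - 1)
        if magnitudes[i] > threshold
        and magnitudes[i] > magnitudes[i - 1]
        and magnitudes[i] >= magnitudes[i + 1]
    ]


def coarse_pitch(magnitudes, bin_hz, threshold=0.0):
    """Median distance in Hz between adjacent spectral peaks, or None."""
    peaks = find_peaks(magnitudes, threshold)
    if len(peaks) < 2:
        return None  # not enough peaks to form a distance
    distances = [(b - a) * bin_hz for a, b in zip(peaks, peaks[1:])]
    return statistics.median(distances)
```

For a harmonic signal the peaks fall near multiples of the pitch, so most adjacent-peak distances are close to the pitch itself; the median makes the estimate less sensitive to a few spurious or missing peaks.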
Computing device 1510 may also have components for applying the above techniques to particular applications. For example, computing device 1510 may have any of a speech recognition component 1540, a speaker verification component 1541, a speaker identification component 1542, a signal reconstruction component 1543, and a word identification component 1544. For example, the estimated fractional chirp rate, the estimated pitch, and the estimated harmonic amplitudes may be used as input to any of these applications, in addition to or in place of other features or parameters used by these applications.
Depending on the implementation, the steps of any of the techniques described above may be performed in a different order, may be combined, may be split into multiple steps, or may not be performed at all. The steps may be performed by a general-purpose computer or by a computer dedicated to a particular application, may be performed by a single computer or processor or by multiple computers or processors, and may be performed sequentially or simultaneously.
The techniques described above may be implemented in hardware, in software, or in a combination of hardware and software. The choice of implementing any portion of the above techniques in hardware or in software may depend on the requirements of a particular implementation. A software module or program code may reside in volatile memory, non-volatile memory, RAM, flash memory, ROM, EPROM, or any other form of non-transitory computer-readable storage medium.
Conditional language used herein, such as "can," "could," "might," "may," and "e.g.," is intended to convey that certain implementations include, while other implementations do not include, certain features, elements, and/or steps. Thus, such conditional language indicates that the features, elements, and/or steps are not required for some implementations. The terms "comprising," "including," "having," and the like are synonymous, are used in an open-ended fashion, and do not exclude additional elements, features, acts, or operations. The term "or" is used in its inclusive sense (and not in its exclusive sense), so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list.
Unless specifically stated otherwise, conjunctive language such as the phrase "at least one of X, Y and Z" is to be understood to convey that an item, term, etc. may be X, Y, or Z, or a combination thereof. Thus, such conjunctive language is not intended to imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
While the above detailed description has shown, described, and pointed out novel features as applied to various embodiments, it will be understood that various omissions, substitutions, and changes in the form and details of the devices or techniques illustrated may be made without departing from the spirit of the invention. The scope of the inventions disclosed herein is indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims (20)

1. A computer-implemented method for estimating features of a harmonic signal, the method comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
computing a first pitch estimate of the first portion of the signal using a plurality of peak-to-peak distances in the first frequency representation; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
2. The method of claim 1, further comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
3. The method of claim 1, wherein the second frequency representation is the first frequency representation.
4. The method of claim 1, wherein computing the estimated fractional chirp rate comprises computing a plurality of scores, wherein the plurality of scores comprises a first score and a second score, the first score is computed using a first fractional chirp rate, the second score is computed using a second fractional chirp rate, and the estimated fractional chirp rate is computed by selecting a highest score.
5. The method of claim 4, wherein the first score is computed using an autocorrelation of a frequency representation, and the frequency representation is computed using the first fractional chirp rate.
6. The method of claim 1, wherein the first frequency representation is computed by performing inner products of the portion of the signal with functions of frequency and chirp rate, and wherein the chirp rate of the functions increases with frequency.
7. The method of claim 1, wherein the first pitch estimate is computed using an estimated cumulative distribution function of the plurality of peak-to-peak distances.
8. The method of claim 1, wherein the first frequency portion corresponds to a first multiple of the first pitch estimate, and the second frequency portion corresponds to a second multiple of the first pitch estimate.
9. The method of claim 11, further comprising:
computing a feature vector using the amplitudes of the plurality of harmonics; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector.
10. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a portion of a signal;
compute an estimated fractional chirp rate of the portion of the signal;
compute a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
compute a first pitch estimate of the first portion of the signal using a plurality of peak-to-peak distances in the first frequency representation of the portion of the signal; and
compute a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
11. The system of claim 10, wherein the one or more computing devices are further configured to compute amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
12. The system of claim 10, wherein the first frequency representation is computed using the estimated fractional chirp rate.
13. The system of claim 10, wherein the second frequency representation is different from the first frequency representation.
14. The system of claim 10, wherein the first frequency representation is computed using a pitch velocity transform.
15. The system of claim 10, wherein the first pitch estimate is computed using a histogram of the plurality of peak-to-peak distances.
16. The system of claim 10, wherein the one or more computing devices are further configured to compute the second pitch estimate by computing the correlation using a reversed version of the first frequency portion.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions, the actions comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first pitch estimate of the first portion of the signal using a plurality of peak-to-peak distances in a first frequency representation of the portion of the signal; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
18. The one or more non-transitory computer-readable media of claim 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
19. The one or more non-transitory computer-readable media of claim 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
obtaining a second portion of the signal;
computing a second estimated fractional chirp rate of the second portion of the signal;
computing a third pitch estimate of the second portion of the signal; and
computing a fourth pitch estimate of the second portion of the signal using the third pitch estimate.
20. The one or more non-transitory computer-readable media of claim 19, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate;
computing a feature vector using the amplitudes;
computing second amplitudes of a second plurality of harmonics of the second portion of the signal using the second estimated fractional chirp rate and the fourth pitch estimate;
computing a second feature vector using the second amplitudes; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector and the second feature vector.
CN201680017664.6A 2015-02-06 2016-02-03 Determine the feature of harmonic signal Pending CN107430850A (en)

Applications Claiming Priority (17)

Application Number Priority Date Filing Date Title
US201562112836P 2015-02-06 2015-02-06
US201562112832P 2015-02-06 2015-02-06
US201562112796P 2015-02-06 2015-02-06
US201562112850P 2015-02-06 2015-02-06
US62/112,850 2015-02-06
US62/112,832 2015-02-06
US62/112,796 2015-02-06
US62/112,836 2015-02-06
US14/969,029 US9870785B2 (en) 2015-02-06 2015-12-15 Determining features of harmonic signals
US14/969,029 2015-12-15
US14/969,038 2015-12-15
US14/969,022 2015-12-15
US14/969,038 US9842611B2 (en) 2015-02-06 2015-12-15 Estimating pitch using peak-to-peak distances
US14/969,036 2015-12-15
US14/969,022 US9548067B2 (en) 2014-09-30 2015-12-15 Estimating pitch using symmetry characteristics
US14/969,036 US9922668B2 (en) 2015-02-06 2015-12-15 Estimating fractional chirp rate with multiple frequency representations
PCT/US2016/016261 WO2016126753A1 (en) 2015-02-06 2016-02-03 Determining features of harmonic signals

Publications (1)

Publication Number Publication Date
CN107430850A true CN107430850A (en) 2017-12-01

Family

ID=60239707

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680017664.6A Pending CN107430850A (en) 2015-02-06 2016-02-03 Determine the feature of harmonic signal

Country Status (2)

Country Link
EP (1) EP3254282A1 (en)
CN (1) CN107430850A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1525435A (en) * 2003-02-24 2004-09-01 国际商业机器公司 Method and apparatus for estimating pitch frequency of voice signal
CN102197423A (en) * 2008-10-30 2011-09-21 高通股份有限公司 Coding of transitional speech frames for low-bit-rate applications
US20130041656A1 (en) * 2011-08-08 2013-02-14 The Intellisis Corporation System and method for tracking sound pitch across an audio signal
CN103718242A (en) * 2011-03-25 2014-04-09 英特里斯伊斯公司 System and method for processing sound signals implementing a spectral motion transform
CN103999076A (en) * 2011-08-08 2014-08-20 英特里斯伊斯公司 System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain
CN104200818A (en) * 2014-08-06 2014-12-10 重庆邮电大学 Pitch detection method


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108389575A (en) * 2018-01-11 2018-08-10 苏州思必驰信息科技有限公司 Audio data recognition methods and system
CN108389575B (en) * 2018-01-11 2020-06-26 苏州思必驰信息科技有限公司 Audio data identification method and system
CN108399923A (en) * 2018-02-01 2018-08-14 深圳市鹰硕技术有限公司 More human hairs call the turn spokesman's recognition methods and device
WO2019148586A1 (en) * 2018-02-01 2019-08-08 深圳市鹰硕技术有限公司 Method and device for speaker recognition during multi-person speech
CN108510991A (en) * 2018-03-30 2018-09-07 厦门大学 Utilize the method for identifying speaker of harmonic series
CN110931035A (en) * 2019-12-09 2020-03-27 广州酷狗计算机科技有限公司 Audio processing method, device, equipment and storage medium
CN110931035B (en) * 2019-12-09 2023-10-10 广州酷狗计算机科技有限公司 Audio processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
EP3254282A1 (en) 2017-12-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171201
