CN107430850A - Determining features of harmonic signals - Google Patents
- Publication number
- CN107430850A (application number CN201680017664.6A)
- Authority
- CN
- China
- Prior art keywords
- frequency
- signal
- pitch
- estimation
- chirp rate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Signal Processing (AREA)
- Auxiliary Devices For Music (AREA)
- Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
Abstract
Features that can be computed from a harmonic signal include the fractional chirp rate, the pitch, and the amplitudes of the harmonics. A fractional chirp rate may be estimated, for example, by computing scores corresponding to different fractional chirp rates and selecting the highest score. A first pitch may be computed from a frequency representation that is itself computed using the estimated fractional chirp rate, for example, by using peak-to-peak distances in the frequency representation. A second pitch may be computed using the first pitch and a frequency representation of the signal, for example, by using correlations of portions of the frequency representation. The amplitudes of the harmonics of the signal may be determined using the estimated fractional chirp rate and the second pitch. Any of the estimated fractional chirp rate, the second pitch, and the harmonic amplitudes may be used for further processing, such as speech recognition, speaker verification, speaker identification, or signal reconstruction.
Description
Claim of priority
This application is based on and claims priority to the following applications: U.S. Provisional Patent Application No. 62/112836, entitled "Spectral Motion Transform," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112796, entitled "Pitch Velocity Estimation," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112832, entitled "Peak Interval Pitch Estimation," filed February 6, 2015; U.S. Provisional Patent Application No. 62/112850, entitled "Pitch from Symmetry Characteristics," filed February 6, 2015; U.S. Non-Provisional Patent Application No. 14/969029, entitled "Determining Features of Harmonic Signals," filed December 15, 2015; U.S. Non-Provisional Patent Application No. 14/969022, entitled "Estimating Pitch Using Symmetry Characteristics," filed December 15, 2015; U.S. Non-Provisional Patent Application No. 14/969036, entitled "Estimating Fractional Chirp Rate with Multiple Frequency Representations," filed December 15, 2015; and U.S. Non-Provisional Patent Application No. 14/969038, entitled "Estimating Pitch Using Peak-to-Peak Distances," filed December 15, 2015. The contents of each of these applications are incorporated herein by reference in their entirety.
Background
A harmonic signal may have a fundamental frequency and one or more overtones. Harmonic signals include, for example, speech and music. The fundamental frequency of a harmonic signal may be referred to as the first harmonic. A harmonic signal may include other harmonics, which may occur at multiples of the first harmonic. For example, if the fundamental frequency is f at a certain time, the frequencies of the other harmonics may be 2f, 3f, and so on.
The fundamental frequency of a harmonic signal may change over time. For example, when a person is speaking, the fundamental frequency of the speech may rise at the end of a question. A change in the frequency of a signal may be referred to as a chirp rate. The chirp rate of a harmonic signal may be different for different harmonics. For example, if the first harmonic has a chirp rate of c, the other harmonics may have chirp rates of 2c, 3c, and so on.
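The relationship between harmonic number, frequency, and chirp rate described above can be illustrated with a short Python sketch. The function name `harmonic_tracks` is hypothetical, and the sketch assumes that the fractional chirp rate is the chirp rate divided by the frequency, which is why it comes out the same for every harmonic.

```python
def harmonic_tracks(f0, c0, num_harmonics):
    """List (frequency, chirp rate, fractional chirp rate) for each harmonic.

    f0 is the fundamental frequency in Hz and c0 is the chirp rate of the
    first harmonic in Hz per second. The fractional chirp rate is taken to be
    chirp rate divided by frequency (an assumption made for this sketch).
    """
    tracks = []
    for k in range(1, num_harmonics + 1):
        freq = k * f0    # the k-th harmonic sits at k times the fundamental
        chirp = k * c0   # and its frequency changes k times as fast
        tracks.append((freq, chirp, chirp / freq))
    return tracks


for freq, chirp, frac in harmonic_tracks(200.0, 50.0, 3):
    print(f"{freq:6.1f} Hz  {chirp:6.1f} Hz/s  fractional {frac:.3f} 1/s")
```

Because the ratio is shared by all harmonics, a single value can describe how the whole harmonic signal is changing at a given time.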
In applications such as speech recognition, signal reconstruction, and speaker identification, it may be desirable to determine properties of a harmonic signal over time. For example, it may be desirable to determine the pitch of the signal, the rate of change of the pitch over time, or the frequencies, chirp rates, or amplitudes of the different harmonics.
Summary
In one embodiment, the inventive features may include:
1. A computer-implemented method for estimating a pitch, the method comprising:
obtaining a frequency representation of a first portion of a signal;
obtaining a first pitch estimate of the first portion of the signal;
identifying a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions comprising a first frequency portion and a second frequency portion;
computing a plurality of correlations using the plurality of frequency portions, the plurality of correlations comprising a first correlation between the first frequency portion and the second frequency portion;
computing a first score using the plurality of correlations; and
computing a second pitch estimate using the first score.
2. The method according to clause 1, wherein the plurality of correlations further comprises (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
3. The method according to clause 1, wherein the plurality of frequency portions partitions the frequency representation.
4. The method according to clause 1, wherein computing the first score comprises computing a likelihood or a log likelihood of each correlation of the plurality of correlations.
5. The method according to clause 1, wherein computing the second pitch estimate comprises performing a golden section search or gradient descent using the first score.
6. The method according to clause 1, wherein each frequency portion of the plurality of frequency portions is centered at a multiple of the first pitch estimate.
7. The method according to clause 1, further comprising normalizing each frequency portion of the plurality of frequency portions before computing the plurality of correlations.
8. The method according to clause 1, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the second pitch estimate.
9. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a frequency representation of a first portion of a signal;
obtain a first pitch estimate of the first portion of the signal;
identify a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions comprising a first frequency portion and a second frequency portion;
compute a plurality of correlations using the plurality of frequency portions, the plurality of correlations comprising a first correlation between the first frequency portion and the second frequency portion;
compute a first score using the plurality of correlations; and
compute a second pitch estimate using the first score.
10. The system according to clause 9, wherein the plurality of correlations further comprises (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
11. The system according to clause 9, wherein the plurality of frequency portions partitions the frequency representation.
12. The system according to clause 9, wherein computing the first score comprises computing a Fisher transformation of each correlation of the plurality of correlations.
13. The system according to clause 9, wherein each frequency portion of the plurality of frequency portions is centered at a multiple of the first pitch estimate.
14. The system according to clause 9, wherein the one or more computing devices are further configured to normalize each frequency portion of the plurality of frequency portions before computing the plurality of correlations.
15. The system according to clause 9, wherein the one or more computing devices are further configured to:
identify a second plurality of frequency portions of the frequency representation using the second pitch estimate, the second plurality of frequency portions comprising a third frequency portion and a fourth frequency portion;
compute a second plurality of correlations using the second plurality of frequency portions;
compute a second score using the second plurality of correlations; and
compute a third pitch estimate using the second score.
16. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions comprising:
obtaining a frequency representation of a first portion of a signal;
obtaining a first pitch estimate of the first portion of the signal;
identifying a plurality of frequency portions of the frequency representation using the first pitch estimate, the plurality of frequency portions comprising a first frequency portion and a second frequency portion;
computing a plurality of correlations using the plurality of frequency portions, the plurality of correlations comprising a first correlation between the first frequency portion and the second frequency portion;
computing a first score using the plurality of correlations; and
computing a second pitch estimate using the first score.
17. The one or more non-transitory computer-readable media according to clause 16, wherein the first pitch estimate is computed using a plurality of peak-to-peak distances.
18. The one or more non-transitory computer-readable media according to clause 16, wherein the frequency representation is computed using an estimated fractional chirp rate.
19. The one or more non-transitory computer-readable media according to clause 16, wherein the plurality of correlations further comprises (i) a second correlation between the first frequency portion and a reversed version of the second frequency portion, and (ii) a third correlation between the first frequency portion and a reversed version of the first frequency portion.
20. The one or more non-transitory computer-readable media according to clause 16, wherein the plurality of correlations further comprises (i) a correlation between each pair of frequency portions of the plurality of frequency portions, (ii) a correlation between each pair of frequency portions of the plurality of frequency portions in which one frequency portion of the pair is reversed, and (iii) a correlation between each frequency portion and a reversed version of itself.
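The pitch refinement recited in the clauses above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names (`interp`, `portion`, `score`, `golden_section_max`), the synthetic Gaussian-peak spectrum, the 41-point resampling of each portion, and the 3% search bracket around the first estimate are all hypothetical choices. Portions are centered at multiples of the candidate pitch and are one pitch wide, each portion is normalized, correlations are taken between portions and reversed portions, the score sums Fisher transformations of the correlations, and a golden section search maximizes the score.

```python
import math


def interp(spectrum, df, freq):
    """Linearly interpolate a sampled spectrum (bin spacing df, in Hz) at freq."""
    pos = freq / df
    i = int(math.floor(pos))
    if i < 0 or i + 1 >= len(spectrum):
        return 0.0
    frac = pos - i
    return (1.0 - frac) * spectrum[i] + frac * spectrum[i + 1]


def portion(spectrum, df, pitch, k, n=41):
    """Sample the portion centered at k * pitch, one pitch wide, and normalize it."""
    vals = [interp(spectrum, df, (k + j / (n - 1) - 0.5) * pitch) for j in range(n)]
    mean = sum(vals) / n
    vals = [v - mean for v in vals]
    norm = math.sqrt(sum(v * v for v in vals))
    return [v / norm for v in vals] if norm > 0 else vals


def dot(a, b):
    """Dot product; equals the correlation for zero-mean, unit-norm portions."""
    return sum(x * y for x, y in zip(a, b))


def score(spectrum, df, pitch, num_harmonics=5):
    """Sum of Fisher-transformed correlations between portions and reversed portions."""
    parts = [portion(spectrum, df, pitch, k) for k in range(1, num_harmonics + 1)]
    total = 0.0
    for idx, a in enumerate(parts):
        for b in parts[idx:]:
            corrs = [dot(a, b[::-1])]      # portion against a reversed portion
            if b is not a:
                corrs.append(dot(a, b))    # portion against another portion
            for r in corrs:
                total += math.atanh(max(-0.999999, min(0.999999, r)))
    return total


def golden_section_max(f, lo, hi, tol=1e-3):
    """Golden section search for the maximizer of f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0


# Synthetic spectrum (assumption): Gaussian peaks at multiples of the true pitch.
TRUE_PITCH = 203.0
DF = 1.0
spectrum = [
    sum(math.exp(-0.5 * ((i * DF - k * TRUE_PITCH) / 8.0) ** 2) for k in range(1, 7))
    for i in range(1400)
]

first_estimate = 200.0  # a coarse first pitch estimate, here simply assumed
refined = golden_section_max(
    lambda p: score(spectrum, DF, p), 0.97 * first_estimate, 1.03 * first_estimate
)
print(f"first estimate {first_estimate:.1f} Hz, second estimate {refined:.2f} Hz")
```

When the candidate pitch matches the true pitch, every portion has its peak at its center, so the portions resemble each other and their own mirror images. A mismatched candidate displaces the peak by a different amount in each portion, which lowers the correlations and therefore the score.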
In another embodiment, the inventive features may include:
1. A computer-implemented method for estimating a fractional chirp rate, the method comprising:
obtaining a portion of a signal;
computing a first frequency representation of the portion of the signal using a first value of the fractional chirp rate;
computing a first score using the first frequency representation;
computing a second frequency representation of the portion of the signal using a second value of the fractional chirp rate;
computing a second score using the second frequency representation; and
computing an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
2. The method according to clause 1, wherein the first frequency representation is computed using a frequency-chirp distribution, a pitch velocity transform, or inner products of the portion of the signal with chirplets.
3. The method according to clause 1, further comprising computing a log-likelihood ratio for a plurality of frequencies of the first frequency representation, wherein the log-likelihood ratio is a ratio of a log likelihood that a harmonic is present at a frequency to a log likelihood that a harmonic is not present at the frequency.
4. The method according to clause 1, wherein the first score is computed using an autocorrelation of the first frequency representation.
5. The method according to clause 4, wherein the first score is computed using Fisher information of the autocorrelation of the first frequency representation.
6. The method according to clause 1, wherein computing the estimated fractional chirp rate comprises selecting the fractional chirp rate corresponding to the highest score.
7. The method according to clause 1, further comprising estimating a pitch of the portion of the signal using the estimated fractional chirp rate.
8. The method according to clause 7, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using at least one of the estimated fractional chirp rate or the estimated pitch.
9. A system for estimating a fractional chirp rate, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a portion of a signal;
compute a first frequency representation of the portion of the signal using a first value of the fractional chirp rate;
compute a first score using the first frequency representation;
compute a second frequency representation of the portion of the signal using a second value of the fractional chirp rate;
compute a second score using the second frequency representation; and
compute an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
10. The system according to clause 9, wherein the one or more computing devices are further configured to compute a log-likelihood ratio for a plurality of frequencies of the first frequency representation, and wherein the log-likelihood ratio is a ratio of a log likelihood that a harmonic is present at a frequency to a log likelihood that a harmonic is not present at the frequency.
11. The system according to clause 9, wherein the first score is computed using an autocorrelation of the first frequency representation.
12. The system according to clause 11, wherein the first score is computed using Fisher information of the autocorrelation of the first frequency representation.
13. The system according to clause 9, wherein the first score indicates a match between the first value of the fractional chirp rate and the fractional chirp rate of the portion of the signal.
14. The system according to clause 9, wherein the one or more computing devices are further configured to estimate a pitch of the portion of the signal using the estimated fractional chirp rate.
15. The system according to clause 14, wherein the one or more computing devices are further configured to perform at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using at least one of the estimated fractional chirp rate or the estimated pitch.
16. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions comprising:
obtaining a portion of a signal;
computing a first frequency representation of the portion of the signal using a first value of the fractional chirp rate;
computing a first score using the first frequency representation;
computing a second frequency representation of the portion of the signal using a second value of the fractional chirp rate;
computing a second score using the second frequency representation; and
computing an estimated fractional chirp rate of the portion of the signal using the first score and the second score.
17. The one or more non-transitory computer-readable media according to clause 16, wherein computing the first score comprises computing a function of the first frequency representation.
18. The one or more non-transitory computer-readable media according to clause 16, wherein the actions further comprise:
computing a third frequency representation of the portion of the signal using a third value of the fractional chirp rate; and
computing a third score using the third frequency representation;
wherein computing the estimated fractional chirp rate of the portion of the signal further comprises using the third score.
19. The one or more non-transitory computer-readable media according to clause 16, wherein:
the first frequency representation is created by modifying a third frequency representation using the first value of the fractional chirp rate; and
the second frequency representation is created by modifying the third frequency representation using the second value of the fractional chirp rate.
20. The one or more non-transitory computer-readable media according to clause 19, wherein the third frequency representation corresponds to a Fourier transform of the portion of the signal.
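The score-and-select procedure in the clauses above can be sketched in Python as follows. This is an illustration under stated assumptions: the frequency representation is computed as inner products of the signal with Hann-windowed chirplets whose chirp rate is the candidate fractional chirp rate times the frequency, and the score is a simple energy-concentration measure. The names `chirp_spectrum`, `sharpness`, and `estimate_fractional_chirp_rate`, the score itself, and the synthetic test signal are hypothetical and are not taken from the source, which mentions scores based on autocorrelation among other options.

```python
import cmath
import math


def chirp_spectrum(signal, sample_rate, chi, freqs):
    """Magnitudes of inner products of the signal with Hann-windowed chirplets.

    The chirplet at frequency f sweeps at chi * f Hz per second, so its chirp
    rate grows with frequency, as it does for the harmonics of a chirping signal.
    """
    n = len(signal)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(signal)
    ]
    # Time axis centered on the window; the chirplet phase is 2*pi*f*warped.
    times = [(i - n / 2) / sample_rate for i in range(n)]
    warped = [t + 0.5 * chi * t * t for t in times]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * u) for x, u in zip(windowed, warped)))
        for f in freqs
    ]


def sharpness(spectrum):
    """Hypothetical score: how concentrated the energy of the spectrum is."""
    energy = sum(v * v for v in spectrum)
    if energy == 0.0:
        return 0.0
    return sum(v ** 4 for v in spectrum) / (energy * energy)


def estimate_fractional_chirp_rate(signal, sample_rate, candidates, freqs):
    """Score each candidate value and return the one with the highest score."""
    scores = {
        chi: sharpness(chirp_spectrum(signal, sample_rate, chi, freqs))
        for chi in candidates
    }
    return max(scores, key=scores.get), scores


# Synthetic harmonic signal (assumption): four harmonics of 200 Hz whose
# frequencies all change with a fractional chirp rate of 2.0 per second.
SAMPLE_RATE = 8000
N = 400
F0, TRUE_CHI = 200.0, 2.0
signal = []
for i in range(N):
    t = (i - N / 2) / SAMPLE_RATE
    signal.append(
        sum(math.cos(2.0 * math.pi * k * F0 * (t + 0.5 * TRUE_CHI * t * t))
            for k in range(1, 5))
    )

freqs = [100.0 + 10.0 * j for j in range(91)]   # 100 Hz to 1000 Hz
candidates = [-4.0, -2.0, 0.0, 2.0, 4.0]
best, scores = estimate_fractional_chirp_rate(signal, SAMPLE_RATE, candidates, freqs)
print("estimated fractional chirp rate:", best)
```

A candidate value that matches the signal removes the sweep from every harmonic at once, so each harmonic collapses to a narrow peak. Mismatched candidates leave the higher harmonics smeared across frequency, which the concentration score penalizes.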
In another embodiment, the inventive features may include:
1. A computer-implemented method for estimating a pitch, the method comprising:
obtaining a first portion of a signal;
computing a first frequency representation of the first portion of the signal;
identifying a first plurality of peaks in the first frequency representation using a first threshold;
computing a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimating a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
2. The method according to clause 1, wherein estimating the pitch of the first portion comprises estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
3. The method according to clause 1, further comprising computing a histogram using the plurality of peak-to-peak distances, wherein estimating the pitch of the first portion of the signal comprises estimating the pitch using the histogram.
4. The method according to clause 1, wherein the first frequency representation is computed using an estimated fractional chirp rate of the first portion of the signal.
5. The method according to clause 1, wherein computing the first frequency representation comprises using a first smoothing kernel.
6. The method according to clause 1, wherein the first frequency representation comprises a log-likelihood ratio (LLR) spectrum.
7. The method according to clause 1, wherein the first frequency representation comprises a stationary spectrum.
8. The method according to clause 1, further comprising performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the estimated pitch.
9. A system for estimating a pitch, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a first portion of a signal;
compute a first frequency representation of the first portion of the signal;
identify a first plurality of peaks in the first frequency representation using a first threshold;
compute a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimate a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
10. The system according to clause 9, wherein the one or more computing devices are further configured to estimate the pitch of the first portion by estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
11. The system according to clause 9, wherein the one or more computing devices are further configured to compute a histogram using the plurality of peak-to-peak distances and to estimate the pitch of the first portion of the signal using the histogram.
12. The system according to clause 9, wherein the one or more computing devices are further configured to compute the first frequency representation using a first smoothing kernel.
13. The system according to clause 9, wherein the first frequency representation comprises a log-likelihood ratio (LLR) spectrum.
14. The system according to clause 9, wherein the one or more computing devices are further configured to:
identify a second plurality of peaks in the first frequency representation using a second threshold;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
15. The system according to clause 9, wherein the one or more computing devices are further configured to:
obtain a second portion of the signal;
compute a second frequency representation of the second portion of the signal;
identify a second plurality of peaks in the second frequency representation;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
16. The system according to clause 12, wherein the one or more computing devices are further configured to:
compute a second frequency representation of the first portion of the signal using a second smoothing kernel;
identify a second plurality of peaks in the second frequency representation;
compute a second plurality of peak-to-peak distances using the locations in frequency of the second plurality of peaks; and
estimate the pitch of the first portion of the signal using the second plurality of peak-to-peak distances.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions comprising:
obtaining a first portion of a signal;
computing a first frequency representation of the first portion of the signal;
identifying a first plurality of peaks in the first frequency representation;
computing a first plurality of peak-to-peak distances using the locations in frequency of the first plurality of peaks; and
estimating a pitch of the first portion of the signal using the first plurality of peak-to-peak distances.
18. The one or more non-transitory computer-readable media according to clause 17, wherein estimating the pitch of the first portion comprises estimating a cumulative distribution function of the first plurality of peak-to-peak distances.
19. The one or more non-transitory computer-readable media according to clause 17, wherein the actions further comprise computing a histogram using the plurality of peak-to-peak distances, and wherein estimating the pitch of the first portion of the signal comprises estimating the pitch using the histogram.
20. The one or more non-transitory computer-readable media according to clause 17, wherein the first frequency representation comprises a log-likelihood ratio (LLR) spectrum.
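The peak-to-peak approach in the clauses above can be sketched in Python as follows. The sketch assumes a sampled frequency representation with a known bin spacing, treats a peak as a local maximum above a threshold, pools the distances between adjacent peaks over several thresholds, and takes the most common distance from a histogram as the pitch. The function names and the toy spectrum are hypothetical, and choosing the histogram mode is only one of the options the clauses mention (a cumulative distribution function is another).

```python
from collections import Counter


def find_peaks(spectrum, threshold):
    """Indices of local maxima whose value exceeds the threshold."""
    return [
        i for i in range(1, len(spectrum) - 1)
        if spectrum[i] > threshold
        and spectrum[i] > spectrum[i - 1]
        and spectrum[i] >= spectrum[i + 1]
    ]


def peak_to_peak_distances(spectrum, df, threshold):
    """Distances in Hz between adjacent peaks that exceed the threshold."""
    peaks = find_peaks(spectrum, threshold)
    return [(b - a) * df for a, b in zip(peaks, peaks[1:])]


def estimate_pitch(spectrum, df, thresholds):
    """Pool distances over all thresholds and return the histogram mode in Hz."""
    distances = []
    for threshold in thresholds:
        distances.extend(peak_to_peak_distances(spectrum, df, threshold))
    if not distances:
        return None
    histogram = Counter(round(d / df) for d in distances)  # one bin per df
    bin_index, _count = histogram.most_common(1)[0]
    return bin_index * df


# Toy spectrum (assumption): 5 Hz bins with harmonics of 200 Hz of varying height.
DF = 5.0
spectrum = [0.0] * 300
for center, height in {40: 1.0, 80: 0.8, 120: 0.3, 160: 0.6, 200: 0.5}.items():
    spectrum[center] = height
    spectrum[center - 1] = spectrum[center + 1] = 0.5 * height

pitch = estimate_pitch(spectrum, DF, thresholds=[0.2, 0.4])
print("estimated pitch:", pitch, "Hz")
```

With the higher threshold the weak third harmonic is missed, which produces one distance of twice the pitch. Pooling distances across thresholds keeps the true spacing as the dominant histogram bin.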
In yet another embodiment, the inventive features may include:
1. A computer-implemented method for estimating features of a harmonic signal, the method comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
2. The method according to clause 1, further comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
3. The method according to clause 1, wherein the second frequency representation is the first frequency representation.
4. The method according to clause 1, wherein computing the estimated fractional chirp rate comprises computing a plurality of scores, the plurality of scores comprising a first score computed using a first fractional chirp rate and a second score computed using a second fractional chirp rate, and wherein the estimated fractional chirp rate is computed by selecting the highest score.
5. The method according to clause 4, wherein the first score is computed using an autocorrelation of a frequency representation, and the frequency representation is computed using the first fractional chirp rate.
6. The method according to clause 1, wherein the first frequency representation is computed by performing inner products of the portion of the signal with functions of frequency and chirp rate, and wherein the chirp rate of the functions increases with frequency.
7. The method according to clause 1, wherein the first pitch estimate is computed using an estimated cumulative distribution function of the plurality of peak-to-peak distances.
8. The method according to clause 1, wherein the first frequency portion corresponds to a first multiple of the first pitch estimate, and the second frequency portion corresponds to a second multiple of the first pitch estimate.
9. The method according to clause 2, further comprising:
computing a feature vector using the amplitudes of the plurality of harmonics; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector.
10. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices being configured to:
obtain a portion of a signal;
compute an estimated fractional chirp rate of the portion of the signal;
compute a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
compute a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation of the portion of the signal; and
compute a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
11. The system according to clause 10, wherein the one or more computing devices are further configured to compute amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
12. The system according to clause 10, wherein the first frequency representation is computed using the estimated fractional chirp rate.
13. The system according to clause 10, wherein the second frequency representation is different from the first frequency representation.
14. The system according to clause 10, wherein the first frequency representation is computed using a pitch velocity transform.
15. The system according to clause 10, wherein the first pitch estimate is computed using a histogram of the plurality of peak-to-peak distances.
16. The system according to clause 10, wherein the one or more computing devices are further configured to compute the second pitch estimate by computing a correlation using a reversed version of the first frequency portion.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in a first frequency representation of the portion of the signal; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
18. The one or more non-transitory computer-readable media according to clause 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
19. The one or more non-transitory computer-readable media according to clause 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
obtaining a second portion of the signal;
computing a second estimated fractional chirp rate of the second portion of the signal;
computing a third pitch estimate of the second portion of the signal; and
computing a fourth pitch estimate of the second portion of the signal using the third pitch estimate.
20. The one or more non-transitory computer-readable media according to clause 19, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate;
computing a feature vector using the amplitudes;
computing second amplitudes of a second plurality of harmonics of the second portion of the signal using the second estimated fractional chirp rate and the fourth pitch estimate;
computing a second feature vector using the second amplitudes; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector and the second feature vector.
Brief description of the drawings
The invention and the following detailed description of certain embodiments thereof may be
understood with reference to the following drawings:
Fig. 1 shows examples of harmonic signals with different fractional chirp rates.
Fig. 2 shows a spectrogram of a portion of a speech signal.
Fig. 3 shows a representation of harmonic signals in frequency and chirp rate.
Fig. 4 shows a representation of harmonic signals in frequency and fractional chirp rate.
Fig. 5 shows two examples of generalized spectra of a signal.
Fig. 6 shows a pitch-velocity transform of a speech signal.
Fig. 7 shows two examples of generalized spectra of a speech signal.
Fig. 8 shows an LLR spectrum of a speech signal.
Fig. 9A shows peak-to-peak distances for a single threshold in the LLR spectrum of a speech signal.
Fig. 9B shows peak-to-peak distances for multiple thresholds in the LLR spectrum of a speech signal.
Fig. 10A shows frequency portions of a frequency representation of a speech signal used for a first pitch estimate.
Fig. 10B shows frequency portions of a frequency representation of a speech signal used for a second pitch estimate.
Fig. 11 is a flowchart of an example implementation of computing features of a signal.
Fig. 12 is a flowchart of an example implementation of estimating the fractional chirp rate of a signal.
Fig. 13 is a flowchart of an example implementation of estimating the pitch of a signal using peak-to-peak distances.
Fig. 14 is a flowchart of an example implementation of estimating the pitch of a signal using correlations.
Fig. 15 shows an exemplary computing device that may be used to estimate features of a signal.
Detailed description
Described herein are techniques for determining properties of a harmonic signal over time. For
example, properties of the harmonic signal may be determined at regular intervals (e.g., every 10
milliseconds). These properties may be used for processing speech or other signals, for example as
features for performing automatic speech recognition or speaker verification or identification.
These properties may also be used to perform signal reconstruction to reduce the noise level of
the harmonic signal.
The estimation of the properties of a harmonic signal may be improved by using the relationship
between the harmonics of the signal. For example, if the first harmonic of a harmonic signal has
frequency f and chirp rate c, then the higher harmonics are expected to have frequencies that are
multiples of f and chirp rates that are multiples of c. Techniques that use these relationships
may provide better results than other techniques.
A harmonic signal may have a pitch. For some harmonic signals, the pitch may correspond to the
frequency of the first harmonic. For some harmonic signals, the first harmonic may be absent or
not visible (e.g., it may be masked by noise), and the pitch may be determined from the frequency
difference between the second and third harmonics. For some harmonic signals, several harmonics
may be absent or not visible, and the pitch may be determined from the frequencies of the visible
harmonics.
The pitch of a harmonic signal may change over time. For example, the pitch of a voice or of a
note played on a musical instrument may change over time. As the pitch of a harmonic signal
changes, each harmonic has a chirp rate, and the chirp rate may differ from harmonic to harmonic.
The rate of change of the pitch may be referred to as the pitch velocity or described by a
fractional chirp rate. In some implementations, the fractional chirp rate may be computed as
χ = c_n / f_n, where χ denotes the fractional chirp rate, c_n denotes the chirp rate of the n-th
harmonic, and f_n denotes the frequency of the n-th harmonic.
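The relationship χ = c_n / f_n can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the numeric values (a 230 Hz first harmonic rising at 920 Hz per second) are assumptions chosen for the example, not values prescribed by the text.

```python
def fractional_chirp_rate(chirp_rate, frequency):
    """Return chi = c_n / f_n for a single harmonic."""
    return chirp_rate / frequency

# Hypothetical first harmonic: 230 Hz, rising at 920 Hz per second.
f1, c1 = 230.0, 920.0

# The n-th harmonic has frequency n * f1 and chirp rate n * c1.
harmonics = [(n * f1, n * c1) for n in range(1, 5)]
chis = [fractional_chirp_rate(c, f) for f, c in harmonics]
print(chis)
```

Every harmonic yields the same ratio, which is why a single fractional chirp rate can describe the whole harmonic signal.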
In some implementations, it may be desirable to compute the pitch and/or fractional chirp rate of
a harmonic signal at regular intervals. For example, it may be desirable to compute the pitch
and/or fractional chirp rate every 10 milliseconds by performing the computation on a portion of
the signal, where the portion may be obtained by applying a time window (e.g., a Gaussian,
Hamming, or Hann window) to the signal. Successive portions of the signal may be referred to as
frames, and frames may overlap. For example, a frame may be created every 10 milliseconds, and
each frame may be 50 milliseconds long.
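A minimal sketch of this framing step is shown below, assuming NumPy, a Hann window, and an 8 kHz sample rate. The function name `frame_signal`, the sample rate, and the random test signal are assumptions made for illustration only.

```python
import numpy as np

def frame_signal(x, sample_rate, hop_s=0.010, frame_s=0.050):
    """Split x into overlapping, Hann-windowed frames (one frame per hop)."""
    hop = int(round(hop_s * sample_rate))
    length = int(round(frame_s * sample_rate))
    window = np.hanning(length)
    starts = range(0, len(x) - length + 1, hop)
    return np.array([x[s:s + length] * window for s in starts])

sample_rate = 8000                                           # assumed sample rate in Hz
x = np.random.default_rng(0).standard_normal(sample_rate)   # 1 second of placeholder audio
frames = frame_signal(x, sample_rate)
print(frames.shape)
```

With a 10 ms hop and 50 ms frames, consecutive frames share 40 ms of samples, so each sample contributes to several frames.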
Fig. 1 shows an example of four harmonic signals with different fractional chirp rates as a
function of time and frequency. Fig. 1 does not represent actual signals; it provides a conceptual
illustration of how chirplets (Gaussian signals with a specified time, frequency, chirp rate, and
duration) would appear in a time-frequency representation such as a spectrogram.
Harmonic signal 110 is centered at time t1 and has four harmonics. The first harmonic has
frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f,
respectively. Because the frequencies of the harmonics are constant over time, the chirp rate of
each harmonic is 0. The fractional chirp rate of harmonic signal 110 is therefore 0.
Harmonic signal 120 is centered at time t2 and has four harmonics. The first harmonic has
frequency 2f, and the second, third, and fourth harmonics have frequencies 4f, 6f, and 8f,
respectively. The chirp rate c of the first harmonic is positive because its frequency increases
over time. The second, third, and fourth harmonics have chirp rates 2c, 3c, and 4c, respectively.
The fractional chirp rate of harmonic signal 120 is therefore c/(2f).
Harmonic signal 130 is centered at time t3 and has four harmonics. The first harmonic has
frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f,
respectively. The chirp rate of the first harmonic is also c, and the second, third, and fourth
harmonics have chirp rates 2c, 3c, and 4c, respectively. The fractional chirp rate of harmonic
signal 130 is therefore c/f, which is twice that of harmonic signal 120.
Harmonic signal 140 is centered at time t4 and has four harmonics. The first harmonic has
frequency f, and the second, third, and fourth harmonics have frequencies 2f, 3f, and 4f,
respectively. The chirp rate of the first harmonic is 2c because its frequency changes twice as
fast as that of harmonic signal 130. The second, third, and fourth harmonics have chirp rates 4c,
6c, and 8c, respectively. The fractional chirp rate of harmonic signal 140 is therefore 2c/f,
which is twice that of harmonic signal 130.
Fig. 2 shows a spectrogram of a portion of a speech signal. Multiple harmonics are visible in the
spectrogram. At each point in time in the spectrogram, the harmonics have the relationship
described above. For example, at each point in time, the frequency and chirp rate of the second
harmonic are twice the frequency and chirp rate of the first harmonic.
Fig. 3 shows an example of four harmonic signals as a function of frequency and chirp rate, which
is referred to herein as a frequency-chirp distribution or representation. Fig. 3 does not
represent actual signals; it provides a conceptual illustration of how the harmonic signals of
Fig. 1 would appear in a representation of frequency and chirp rate. A frequency-chirp
representation has no time variable, so a frequency-chirp distribution may represent an entire
signal rather than the portion of the signal at a particular time. In some implementations, it may
be desirable to compute frequency-chirp distributions for portions of the signal at different
times. For example, a frequency-chirp distribution may be computed every 10 milliseconds by
applying a sliding window to the signal.
Fig. 3 may be constructed conceptually by examining the frequency and chirp rate of each harmonic
of the harmonic signals shown in Fig. 1. For example, for harmonic signal 110, every chirp rate is
0 and the frequencies of the four harmonics are f, 2f, 3f, and 4f. The four harmonics of harmonic
signal 110 are therefore shown at these positions in Fig. 3. Similarly, the harmonics of harmonic
signals 120, 130, and 140 are shown in Fig. 3 according to their respective frequencies and chirp
rates from Fig. 1.
A frequency-chirp distribution may be computed using techniques similar to those used for
computing a time-frequency distribution such as a spectrogram. For example, in some
implementations, a frequency-chirp distribution may be computed using inner products. Let FC(f, c)
denote the frequency-chirp distribution, where f corresponds to the frequency variable and c
corresponds to the chirp rate variable. The frequency-chirp distribution may be computed using
inner products as:
FC(f, c) = ⟨x, ψ(f, c)⟩
where x is the signal being processed (or a windowed portion of it) and ψ(f, c) is a function
parameterized by frequency f and chirp rate c. In some implementations, ψ(f, c) may represent a
chirplet, such as:
ψ(f, c)(t) = (1/√(2πσ²)) exp(−(1/2)((t − t0)/σ)² + i f (t − t0) + i (c/2)(t − t0)²)
where σ corresponds to the duration or spread of the chirplet and t0 is the location of the
chirplet in time. To compute a distribution of frequency and chirp rate, an appropriate function
ψ(f, c) may be selected, such as a chirplet, and FC(f, c) may be computed for multiple values of f
and c. A frequency-chirp distribution is not limited to the above example and may be computed in
other ways. For example, a frequency-chirp distribution may be computed as the real part,
imaginary part, magnitude, or magnitude squared of the inner product, may be computed using a
measure of similarity other than an inner product, or may be computed using a non-linear function
of the signal.
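The inner-product computation described above can be sketched as follows. This is an illustrative sketch rather than a definitive implementation: the chirplet form (a Gaussian envelope times a linear chirp, with frequency in Hz and chirp rate in Hz per second, hence the factor 2π), the function names, the grids, and the test signal are all assumptions of this example. The magnitude of the inner product is used as the distribution value.

```python
import numpy as np

def chirplet(t, f, c, t0=0.0, sigma=0.01):
    """Gaussian-windowed linear chirp; this exact form is an assumption of the sketch."""
    tau = t - t0
    envelope = np.exp(-0.5 * (tau / sigma) ** 2)
    return envelope * np.exp(2j * np.pi * (f * tau + 0.5 * c * tau ** 2))

def frequency_chirp(x, t, freqs, chirps, t0=0.0, sigma=0.01):
    """FC[i, j] = |<x, psi(freqs[j], chirps[i])>| for every grid point."""
    fc = np.empty((len(chirps), len(freqs)))
    for i, c in enumerate(chirps):
        for j, f in enumerate(freqs):
            # np.vdot conjugates its first argument, giving sum(x * conj(psi)).
            fc[i, j] = np.abs(np.vdot(chirplet(t, f, c, t0, sigma), x))
    return fc

fs = 8000                                      # assumed sample rate in Hz
t = np.arange(-0.05, 0.05, 1 / fs)
x = chirplet(t, 200.0, 3000.0).real            # hypothetical test component
freqs = np.arange(100.0, 310.0, 10.0)
chirps = np.arange(-6000.0, 7000.0, 1000.0)
fc = frequency_chirp(x, t, freqs, chirps)
i, j = np.unravel_index(np.argmax(fc), fc.shape)
print(freqs[j], chirps[i])
```

The largest value of the distribution falls at the grid point whose frequency and chirp rate match the test component, which is the behavior Fig. 3 depicts conceptually.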
The four harmonic signals in Fig. 3 each have a different fractional chirp rate. The fractional
chirp rate of harmonic signal 110 is 0, that of harmonic signal 120 is c/(2f), that of harmonic
signal 130 is c/f, and that of harmonic signal 140 is 2c/f. The dashed and dotted lines in Fig. 3
therefore represent lines of constant fractional chirp rate. Harmonics centered on the dash-dotted
line have a fractional chirp rate of c/(2f), harmonics centered on the dotted line have a
fractional chirp rate of c/f, and harmonics centered on the dashed line have a fractional chirp
rate of 2c/f.
Any radial line in Fig. 3 therefore corresponds to a constant fractional chirp rate. Based on this
observation, a distribution of frequency and fractional chirp rate may be generated, which may be
referred to as a pitch-velocity transform (PVT) or chirprum. The PVT may be denoted P(f, χ), where
f corresponds to the frequency variable and χ corresponds to the fractional chirp rate variable.
Conceptually, a PVT may be constructed by warping a frequency-chirp distribution so that the
radial lines of the frequency-chirp distribution become horizontal lines of the PVT. Fig. 4 shows
a conceptual example of a PVT created from the frequency-chirp distribution of Fig. 3. Because
every harmonic of a harmonic signal has the same fractional chirp rate, the harmonics are aligned
horizontally, as shown in Fig. 4.
In some implementations, the PVT may be computed from a frequency-chirp distribution. For example,
the PVT may be computed as:
P(f, χ) = FC(f, χf)
because, as noted above, c = χf. The PVT need not, however, be computed from a frequency-chirp
distribution.
The PVT may also be computed using techniques similar to those used for computing a time-frequency
distribution (e.g., a spectrogram). For example, in some implementations, the PVT may be computed
using inner products. The PVT may be computed as:
P(f, χ) = ⟨x, ψ(f, χf)⟩
where ψ() is a function as described above. To compute the PVT, an appropriate function ψ() may be
selected, such as a chirplet, and P(f, χ) may be computed for multiple values of f and χ. The PVT
is not limited to the above example and may be computed in other ways. For example, the PVT may be
computed as the real part, imaginary part, magnitude, or magnitude squared of the inner product,
may be computed using a measure of similarity other than an inner product, or may be computed
using a non-linear function of the signal.
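As a sketch of the inner-product form of the PVT, the code below evaluates the magnitude of ⟨x, ψ(f, χf)⟩ on a small grid for a synthetic signal with two harmonics of a 230 Hz pitch and a fractional chirp rate of 4. The chirplet form, the units (Hz and Hz per second), the grids, and the function names are assumptions of this example and are not taken from the text.

```python
import numpy as np

def chirplet(t, f, c, sigma):
    # Gaussian-windowed linear chirp centred at t = 0 (assumed form).
    return np.exp(-0.5 * (t / sigma) ** 2 + 2j * np.pi * (f * t + 0.5 * c * t ** 2))

def pvt(x, t, freqs, chis, sigma=0.01):
    """P[i, j] = |<x, psi(f_j, chi_i * f_j)>|; rows index chi, columns index f."""
    out = np.empty((len(chis), len(freqs)))
    for i, chi in enumerate(chis):
        for j, f in enumerate(freqs):
            out[i, j] = np.abs(np.vdot(chirplet(t, f, chi * f, sigma), x))
    return out

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(-0.05, 0.05, 1 / fs)
pitch, chi_true = 230.0, 4.0                # hypothetical pitch and fractional chirp rate
x = sum(np.cos(2 * np.pi * (n * pitch * t + 0.5 * chi_true * n * pitch * t ** 2))
        for n in (1, 2))
freqs = np.arange(100.0, 600.0, 10.0)
chis = np.arange(-8.0, 9.0, 2.0)
P = pvt(x, t, freqs, chis)
best_row = int(np.argmax(P.max(axis=1)))
print(chis[best_row])
```

Because the chirp rate of each analysis function is tied to its frequency through c = χf, both harmonics are matched by the same row of the result, which is the horizontal alignment shown conceptually in Fig. 4.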
The PVT for a specified value of fractional chirp rate is a function of frequency and may be
considered a spectrum or a generalized spectrum of the signal. Accordingly, for each value of
fractional chirp rate, a generalized spectrum may be determined from the PVT that is associated
with that fractional chirp rate. A generalized spectrum may be denoted X_χ(f). As described below,
these generalized spectra need not be computed from the PVT and may be computed in other ways. The
PVT for a specified fractional chirp rate corresponds to a slice of the PVT, which is referred to
herein as a row of the PVT (if the PVT were presented in a different orientation, it could be
referred to as a column; the orientation of the PVT is not a limiting feature of the techniques
described herein). For clarity of explanation, a chirplet is used for the function ψ() in the
following discussion, but any appropriate function may be used for ψ().
For a fractional chirp rate of 0, the PVT corresponds to
P(f, 0) = ⟨x, ψ(f, 0)⟩
which is the inner product of the signal with a Gaussian, where the Gaussian has a chirp rate of
zero and is modulated to the corresponding frequency f of the PVT. This may be the same as
computing a short-time Fourier transform of the signal with a Gaussian window.
For a non-zero fractional chirp rate, the PVT corresponds to the inner product of the signal with
a Gaussian whose chirp rate increases as the frequency of the Gaussian increases. In particular,
the chirp rate may be the product of the fractional chirp rate and the frequency. For non-zero
fractional chirp rates, the PVT may have an effect similar to slowing down or reducing the
fractional chirp rate of the signal (or, conversely, speeding up or increasing the fractional
chirp rate of the signal). Each row of the PVT therefore corresponds to a generalized spectrum in
which the fractional chirp rate of the signal has been modified by the value corresponding to that
row of the PVT.
When the fractional chirp rate of a generalized spectrum (or row of the PVT) equals the fractional
chirp rate of the signal, the generalized spectrum may correspond to removing the fractional chirp
rate of the signal, and the generalized spectrum for that value of fractional chirp rate may be
referred to as the stationary spectrum of the signal or the best row of the PVT.
Fig. 5 shows hypothetical generalized spectra (or rows of a PVT) produced using two different
values of fractional chirp rate for harmonic signal 140 shown in Fig. 1. The four peaks (511, 512,
513, 514) show the generalized spectrum whose fractional chirp rate matches the fractional chirp
rate of the signal, which may be referred to as the stationary spectrum. Because the fractional
chirp rate of this generalized spectrum matches the fractional chirp rate of the signal, (i) the
four peaks may be narrower than those of generalized spectra for other values of fractional chirp
rate, and (ii) the four peaks may be higher than those of generalized spectra for other values of
fractional chirp rate. Because the peaks may be narrower and higher, they may be easier to detect
than the peaks of other generalized spectra. The peaks of the stationary spectrum may be narrower
and higher because the stationary spectrum has the effect of removing the fractional chirp rate of
the signal.
The four peaks (521, 522, 523, 524) show a generalized spectrum for a fractional chirp rate that
differs from the fractional chirp rate of the signal. Because the fractional chirp rate of this
generalized spectrum does not match that of the signal, the peaks may be shorter and wider.
Fig. 6 shows a PVT of the signal of Fig. 2 at approximately 0.21 seconds. At that time, the signal
has a pitch of approximately 230 Hz and a fractional chirp rate of approximately 4. The PVT shows
the signal features of each harmonic. For example, the PVT shows the first harmonic at
approximately 230 Hz on the frequency axis and at 4 on the fractional chirp rate axis. Similarly,
the PVT shows the second harmonic at approximately 460 Hz on the frequency axis and at 4 on the
fractional chirp rate axis, and so on. At frequencies between the harmonics, the PVT has lower
values because the signal energy in those regions is lower. At fractional chirp rates other than
4, the PVT has lower values because the fractional chirp rate of the PVT does not match the
fractional chirp rate of the signal.
Fig. 7 shows two generalized spectra corresponding to rows of the PVT of Fig. 6. The solid line
corresponds to the generalized spectrum whose fractional chirp rate matches the fractional chirp
rate of the signal (a fractional chirp rate of approximately 4), that is, the stationary spectrum.
The dotted line corresponds to the generalized spectrum with a fractional chirp rate of zero, which
is referred to as the zero generalized spectrum (and may correspond to a short-time Fourier
transform of the signal). The peaks of the stationary spectrum are higher and narrower than the
peaks of the zero generalized spectrum. For the first harmonic, peak 711 of the stationary
spectrum is twice the height and one third the width of peak 721 of the zero generalized spectrum.
For the third harmonic, the difference between peak 712 of the stationary spectrum and peak 722 of
the zero generalized spectrum is even greater. For the seventh harmonic, peak 713 of the
stationary spectrum is clearly visible, but the corresponding peak of the zero generalized
spectrum is not visible.
The features of the different generalized spectra (or rows of the PVT) may be used to determine
the fractional chirp rate of the signal. As described above, the peaks of the generalized spectrum
may be narrower and higher for the correct value of fractional chirp rate. Techniques for
measuring whether a spectrum has narrower and higher peaks may therefore be used to estimate the
fractional chirp rate of the signal.
To estimate the fractional chirp rate, a function may be used that takes a vector (e.g., a
spectrum) as input and outputs one or more scores according to some criteria. Let g() be a
function that takes a vector as input (such as a generalized spectrum or a row of the PVT) and
outputs one or more values or scores corresponding to the input. In some implementations, the
output of g() may be a number that indicates the peakiness of the input. For example, g() may
correspond to entropy, Fisher information, Kullback-Leibler divergence, or the magnitude of the
input raised to the fourth or a higher power. Using the function g(), the fractional chirp rate of
the signal may be estimated from the PVT as follows:
χ̂ = argmax_χ g(P(f, χ))
where χ̂ is the estimate of the fractional chirp rate. The function g() may be computed for
multiple rows of the PVT, and the row producing the highest value of g() may be selected as
corresponding to the estimated fractional chirp rate of the signal.
The estimate of the fractional chirp rate may also be computed from a frequency-chirp distribution
(e.g., the frequency-chirp distribution described above):
χ̂ = argmax_χ g(FC(f, χf))
The estimate of the fractional chirp rate may also be computed from the generalized spectra:
χ̂ = argmax_χ g(X_χ(f))
The estimate of the fractional chirp rate may also be computed using inner products of the signal
with the function ψ():
χ̂ = argmax_χ g(⟨x, ψ(f, χf)⟩)
As described above, each of the PVT, the frequency-chirp distribution, and the generalized
spectrum may be computed using a variety of techniques. In some implementations, these quantities
may be determined by computing inner products of the signal with chirplets, but the techniques
described herein are not limited to that particular implementation. For example, functions other
than chirplets may be used, and measures of similarity other than an inner product may be used.
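The selection of the fractional chirp rate by maximizing a score over rows can be sketched as below. The score used here is the sum of the fourth power of the magnitude, one of the options the text names for g(). The rows are a synthetic stand-in for generalized spectra (peaks that become lower and wider as the row's fractional chirp rate moves away from an assumed true value of 4); they are not computed from a real signal, and all names and values are assumptions of this example.

```python
import numpy as np

def fourth_power_score(spectrum):
    """g(): discrete analogue of the integral of |X(f)|^4 over frequency."""
    return float(np.sum(np.abs(spectrum) ** 4))

def estimate_fractional_chirp_rate(rows, chis, g=fourth_power_score):
    """Score every row with g() and return the chi of the best-scoring row."""
    scores = np.array([g(row) for row in rows])
    return chis[int(np.argmax(scores))], scores

# Synthetic stand-in for PVT rows: mismatch in chi widens and lowers the peaks.
freqs = np.arange(0.0, 1000.0, 2.0)
chis = np.arange(-8.0, 9.0, 1.0)
true_chi = 4.0
rows = []
for chi in chis:
    width = 6.0 * (1.0 + abs(chi - true_chi))
    rows.append(sum(np.exp(-0.5 * ((freqs - n * 230.0) / width) ** 2) / width
                    for n in (1, 2, 3)))
chi_hat, scores = estimate_fractional_chirp_rate(np.array(rows), chis)
print(chi_hat)
```

Any other peakiness measure can be passed as `g` without changing the selection logic.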
In some implementations, the generalized spectrum may be modified before it is used to determine
the fractional chirp rate of the signal. For example, a log-likelihood ratio (LLR) spectrum may be
computed from the generalized spectrum, and the LLR spectrum may be denoted LLR_χ(f). An LLR
spectrum may use hypothesis testing techniques to improve the determination of whether a harmonic
is present at a frequency of a spectrum. For example, to determine whether a harmonic is present
at a frequency of the stationary spectrum shown in Fig. 7, the value of the spectrum could be
compared with a threshold. Using an LLR spectrum may improve this determination.
An LLR spectrum may be computed using a log-likelihood ratio of two hypotheses: (1) a harmonic is
present at a frequency of the signal, and (2) a harmonic is not present at a frequency of the
signal. A likelihood may be computed for each of the two hypotheses. The two likelihoods may be
compared to determine whether a harmonic is present, such as by computing the logarithm of the
ratio of the two likelihoods.
In some implementations, the log likelihood of a harmonic being present at a frequency of the
signal may be computed by fitting a Gaussian to the signal spectrum at that frequency and then
computing a residual sum of squares between the Gaussian and the signal. To fit a Gaussian to the
spectrum at a frequency, the Gaussian may be centered at the frequency, and the amplitude of the
Gaussian may then be computed using any technique suitable for estimating these parameters. In
some implementations, the spread in frequency or the duration of the Gaussian may match the window
used to compute the signal spectrum, or the spread of the Gaussian may also be determined during
the fitting process. For example, when fitting a Gaussian to peak 711 of the stationary spectrum
shown in Fig. 7, the amplitude of the Gaussian may be approximately 0.12, and the duration of the
Gaussian may correspond approximately to the duration of the peak (or of the window used to
compute the spectrum). The log likelihood may then be computed from the residual sum of squares
between the Gaussian and the signal spectrum in a window around the frequency for which the
likelihood is being computed.
In some implementations, the log likelihood of a harmonic not being present at a frequency may be
computed from the residual sum of squares between a zero spectrum (a spectrum that is zero at all
frequencies) and the signal spectrum in a window around the frequency for which the likelihood is
being computed.
The LLR spectrum may be determined by computing the two likelihoods for each frequency of the
signal spectrum (e.g., a generalized spectrum) and then computing the logarithm (e.g., the natural
logarithm) of the ratio of the two likelihoods. Other steps may also be performed, such as
estimating the noise variance in the signal and using the estimated noise variance to normalize
the log likelihoods. In some implementations, the LLR spectrum for a frequency f may be computed
as:
LLR(f) = (1/(2σ²)) (X^h X − (X − Ĝ_f)^h (X − Ĝ_f))
where σ² is the estimated noise variance, X is the spectrum, h denotes the Hermitian transpose,
and Ĝ_f is the best-fit Gaussian to the spectrum at frequency f.
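The two residual sums of squares described above can be turned into an LLR value per frequency as in the following sketch. Several choices here are assumptions made for illustration and are not specified by the text: the spectrum is treated as a real-valued magnitude spectrum, the Gaussian has a fixed spread, its amplitude is fitted by least squares and clipped at zero, the local window spans three spreads on each side, and the noise variance is supplied by the caller.

```python
import numpy as np

def llr_spectrum(spectrum, freqs, width_hz, noise_var):
    """For each frequency: (RSS without harmonic - RSS with fitted Gaussian) / (2 * noise_var)."""
    llr = np.zeros(len(freqs))
    for k, f0 in enumerate(freqs):
        sel = np.abs(freqs - f0) <= 3.0 * width_hz       # local window around f0
        X = spectrum[sel]
        G = np.exp(-0.5 * ((freqs[sel] - f0) / width_hz) ** 2)
        amp = max(0.0, float(G @ X) / float(G @ G))      # least-squares amplitude, clipped at 0
        rss_harmonic = np.sum((X - amp * G) ** 2)        # hypothesis 1: harmonic present
        rss_none = np.sum(X ** 2)                        # hypothesis 2: zero spectrum
        llr[k] = (rss_none - rss_harmonic) / (2.0 * noise_var)
    return llr

# Hypothetical spectrum: two harmonic peaks of height 0.12 plus a little noise.
rng = np.random.default_rng(0)
freqs = np.arange(0.0, 800.0, 2.0)
noise_std = 0.005
spectrum = sum(0.12 * np.exp(-0.5 * ((freqs - f) / 8.0) ** 2) for f in (230.0, 460.0))
spectrum = spectrum + noise_std * rng.standard_normal(len(freqs))
llr = llr_spectrum(spectrum, freqs, width_hz=8.0, noise_var=noise_std ** 2)
peak_freq = freqs[np.argmax(llr)]
print(peak_freq)
```

The LLR is large where a Gaussian explains the local spectrum much better than a zero spectrum does, and close to zero between harmonics.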
Fig. 8 shows an example of an LLR spectrum corresponding to the stationary spectrum shown in
Fig. 7. At each frequency, the LLR spectrum has a high value where a harmonic is present and a low
value where no harmonic is present. The LLR spectrum may provide a better determination of whether
a harmonic is present at a given frequency than other spectra (such as a generalized spectrum or
the stationary spectrum).
The estimate of the fractional chirp rate may also be computed using the LLR spectrum:
χ̂ = argmax_χ g(LLR_χ(f))
To illustrate some possible implementations of estimating the fractional chirp rate, examples of
the function g() are provided. The following examples use the generalized spectrum, but other
spectra, such as the LLR spectrum, may also be used.
In some implementations, the fractional chirp rate may be estimated using the magnitude of the
generalized spectrum raised to the fourth power:
g(X_χ(f)) = ∫ |X_χ(f)|⁴ df
In some implementations, the function g() may include at least some of the following sequence of
operations: (1) computing |X_χ(f)|² (which may be normalized by dividing by the total energy of
the signal or some other normalization value); (2) computing the autocorrelation of |X_χ(f)|²,
denoted r_X(τ); and (3) computing the Fisher information, the entropy, the Kullback-Leibler
divergence, the sum of the squared (or magnitude-squared) values of r_X(τ), or the sum of the
squared second derivatives of r_X(τ). The above examples are not limiting, and other variations
are possible. For example, in step (1), X_χ(f), its magnitude, or its real or imaginary part may
be used in place of |X_χ(f)|².
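One way to realize the three-step sequence above is sketched below, using normalization by total energy in step (1) and the sum of squared autocorrelation values in step (3). These particular choices, the function names, and the synthetic spectra are assumptions of this example; the other options listed in the text would slot into the same structure.

```python
import numpy as np

def autocorr_score(spectrum):
    """g(): sum of squared autocorrelation values of the normalized power spectrum."""
    power = np.abs(spectrum) ** 2
    power = power / power.sum()                     # step (1), normalized by total energy
    r = np.correlate(power, power, mode="full")     # step (2), autocorrelation r_X(tau)
    return float(np.sum(r ** 2))                    # step (3), sum of squared values

freqs = np.arange(0.0, 1000.0, 2.0)

def harmonic_spectrum(width_hz):
    # Hypothetical spectrum with harmonics of 230 Hz and a given peak width.
    return sum(np.exp(-0.5 * ((freqs - n * 230.0) / width_hz) ** 2) for n in (1, 2, 3))

narrow = harmonic_spectrum(5.0)    # stands in for a well-matched generalized spectrum
wide = harmonic_spectrum(20.0)     # stands in for a mismatched generalized spectrum
print(autocorr_score(narrow) > autocorr_score(wide))
```

Because the power spectrum is normalized before the autocorrelation, the score depends on how concentrated the peaks are rather than on the overall signal level.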
Accordingly, the fractional chirp rate of a signal may be determined using any combination of the
above techniques or any similar techniques known to those of skill in the art.
In addition to estimating the fractional chirp rate of the signal, the pitch of the signal may
also be estimated. In some implementations, the fractional chirp rate may be estimated first, and
the estimated fractional chirp rate may then be used to estimate the pitch. For example, after
estimating the fractional chirp rate (denoted χ̂), the pitch may be estimated using the
generalized spectrum corresponding to the estimated fractional chirp rate.
When estimating pitch, the pitch estimate may differ from the true pitch by an octave, which may
be referred to as an octave error. For example, if the true pitch is 300 Hz, the pitch estimate
may be 150 Hz or 600 Hz. To avoid octave errors, a two-step approach may be used to estimate the
pitch. First, a coarse pitch estimate may be determined to obtain an estimate that may be less
accurate but is less susceptible to octave errors. Second, a precise pitch estimate may be used to
refine the coarse pitch estimate.
A coarse pitch estimate may be determined by computing peak-to-peak distances in a spectrum, such
as a generalized spectrum or an LLR spectrum (corresponding to the estimate of the fractional
chirp rate). For clarity, the LLR spectrum is used as the example spectrum in the following
description, but the techniques described herein are not limited to the LLR spectrum, and any
appropriate spectrum may be used.
When computing peak-to-peak distances in a spectrum, it may not always be clear which peaks
correspond to the signal and which peaks correspond to noise. Including too many peaks that
correspond to noise or excluding too many peaks that correspond to the signal may reduce the
accuracy of the coarse pitch estimate. Although the example LLR spectrum in Fig. 8 has little
noise, additional peaks caused by noise may be present for signals with higher noise levels.
In some implementations, peaks may be selected from the LLR spectrum using a threshold. For
example, the standard deviation (or variance) of the noise in the spectrum may be determined, and
a threshold may be computed or selected using the standard deviation of the noise, such as by
setting the threshold to a multiple or fraction of the standard deviation (e.g., setting the
threshold to twice the standard deviation of the noise). After the threshold is selected, the
peak-to-peak distances may be determined. For example, Fig. 9A shows peak-to-peak distances for a
threshold of approximately 0.3. At this threshold, the first five peak-to-peak distances are
approximately 230 Hz, the sixth is approximately 460 Hz, the seventh and eighth are approximately
230 Hz, and the ninth is approximately 690 Hz.
After the peak-to-peak distances have been determined, the most frequently occurring peak-to-peak
distance may be selected as the coarse pitch estimate. For example, a histogram may be computed
using bins with a width of 2 to 5 Hz, and the histogram bin with the largest count may be selected
as the coarse pitch estimate.
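The single-threshold procedure can be sketched as follows. The peak positions below are chosen so that the resulting distances reproduce the pattern described for Fig. 9A (five distances of 230 Hz, one of 460 Hz, two of 230 Hz, one of 690 Hz); the spectrum itself is synthetic, and the function names, the simple local-maximum peak picker, and the use of the mean distance inside the winning bin are assumptions of this example.

```python
import numpy as np

def find_peaks_above(spectrum, threshold):
    """Indices of local maxima whose value exceeds the threshold."""
    s = np.asarray(spectrum, dtype=float)
    mid = s[1:-1]
    is_peak = (mid > s[:-2]) & (mid >= s[2:]) & (mid > threshold)
    return np.flatnonzero(is_peak) + 1

def coarse_pitch(freqs, spectrum, threshold, bin_hz=5.0):
    """Most frequent peak-to-peak distance (needs at least two peaks above threshold)."""
    idx = find_peaks_above(spectrum, threshold)
    distances = np.diff(freqs[idx])
    edges = np.arange(0.0, distances.max() + 2 * bin_hz, bin_hz)
    counts, _ = np.histogram(distances, bins=edges)
    k = int(np.argmax(counts))
    in_bin = (distances >= edges[k]) & (distances < edges[k + 1])
    return float(distances[in_bin].mean()), distances

# Synthetic spectrum with some harmonics of 230 Hz missing.
freqs = np.arange(0.0, 3200.0, 2.0)
peak_freqs = [230, 460, 690, 920, 1150, 1380, 1840, 2070, 2300, 2990]
spectrum = sum(np.exp(-0.5 * ((freqs - p) / 6.0) ** 2) for p in peak_freqs)
estimate, distances = coarse_pitch(freqs, spectrum, threshold=0.3)
print(estimate, distances)
```

The missing harmonics produce distances of 460 Hz and 690 Hz, but these are outvoted by the seven distances of 230 Hz, which illustrates why the histogram mode resists octave errors.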
In some implementations, multiple thresholds may be used, as shown in Fig. 9B. For example, the
thresholds may be selected using the heights of the peaks in the LLR spectrum, such as the ten
highest peaks or all peaks above a second threshold (e.g., above twice the standard deviation of
the noise). Peak-to-peak distances may be computed for each threshold. In Fig. 9B, peak-to-peak
distance 901 is determined using the highest peak as the threshold, peak-to-peak distances 911 and
912 are determined using the second-highest peak as the threshold, peak-to-peak distances 921, 922,
and 923 are determined using the third-highest peak as the threshold, and so on. As described
above, the most frequently occurring peak-to-peak distance may be selected as the coarse pitch
estimate, such as by using a histogram.
In some implementations, peak-to-peak distances may be computed for multiple time frames to
determine the coarse pitch estimate. For example, to determine the coarse pitch estimate for a
particular frame, peak-to-peak distances may be computed for the current frame, the five previous
frames, and the five subsequent frames. The peak-to-peak distances of all of these frames may be
pooled to determine the coarse pitch estimate, such as by computing a histogram of all of the
peak-to-peak distances.
In some implementations, peak-to-peak distances may be computed after applying different smoothing
kernels to the spectrum. Applying a smoothing kernel to the spectrum may reduce peaks caused by
noise but may also reduce peaks caused by the signal. For noisy signals, a wider kernel may
perform better, and for less noisy signals, a narrower kernel may perform better. It may not be
known in advance how to select an appropriate kernel width, so peak-to-peak distances may be
computed from the spectrum for each kernel width in a specified set of kernel widths. As described
above, the peak-to-peak distances for all of the smoothing kernels may be pooled when determining
the coarse pitch estimate.
Accordingly, peak-to-peak distances can be computed in a variety of ways, including but not limited to using different thresholds, different time instances (e.g., frames), and different smoothing kernels. The coarse pitch estimate can be determined from these peak-to-peak distances. In some implementations, the coarse pitch estimate can be determined as the frequency corresponding to the mode of the histogram of all the computed peak-to-peak distances.
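As an illustration only, the histogram-mode selection can be sketched in a few lines of Python. The function name, the 5 Hz bin width, the tie-breaking rule (lowest bin wins), and the sample distances are assumptions made for this sketch and are not taken from the description.

```python
from collections import Counter

def coarse_pitch_from_distances(distances_hz, bin_width_hz=5.0):
    """Return the center of the most populated histogram bin (the mode)."""
    if not distances_hz:
        raise ValueError("need at least one peak-to-peak distance")
    # Bin k covers the half-open interval [k * width, (k + 1) * width).
    counts = Counter(int(d // bin_width_hz) for d in distances_hz)
    # Highest count wins; ties are broken in favor of the lowest bin.
    best_bin, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return (best_bin + 0.5) * bin_width_hz

# Hypothetical distances pooled over thresholds, frames, and kernels.
pooled = [229.0, 231.5, 230.2, 460.8, 228.9, 231.0, 689.5, 230.4, 232.1]
estimate = coarse_pitch_from_distances(pooled)
```

The estimate is only as fine as the bin width, which is one reason a separate precise pitch estimate is computed afterwards.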
In some implementations, the coarse pitch estimate can be determined by estimating a cumulative distribution function (CDF) and/or a probability density function (PDF) of the peak-to-peak distances instead of using a histogram. For example, the CDF of the pitch can be estimated as follows. For any pitch value less than the smallest peak-to-peak distance, the CDF is zero, and for any pitch value greater than the largest peak-to-peak distance, the CDF is one. For a pitch value between these two bounds, the CDF can be estimated as the number of peak-to-peak distances less than the pitch value divided by the total number of peak-to-peak distances. For example, consider the peak-to-peak distances shown in Fig. 9A. Fig. 9A shows a total of 9 peak-to-peak distances: 7 distances of 230 Hz, 1 distance of 460 Hz, and 1 distance of 690 Hz. The estimated CDF has a value of 0 for frequencies below 230 Hz, 7/9 for frequencies between 230 Hz and 460 Hz, 8/9 for frequencies between 460 Hz and 690 Hz, and 1 for frequencies above 690 Hz.
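The worked example for Fig. 9A can be reproduced with a short sketch. The function name is hypothetical; exact fractions are used so that the values 7/9 and 8/9 appear without rounding.

```python
from fractions import Fraction

def estimated_cdf(distances_hz, pitch_hz):
    """Fraction of peak-to-peak distances strictly below pitch_hz."""
    below = sum(1 for d in distances_hz if d < pitch_hz)
    return Fraction(below, len(distances_hz))

# The nine distances described for Fig. 9A.
distances = [230] * 7 + [460, 690]
cdf_values = [estimated_cdf(distances, p) for p in (200, 300, 500, 700)]
```

Evaluating the function at 200, 300, 500 and 700 Hz gives one value from each of the four regions listed above.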
The estimated CDF may resemble a step function, so it can be smoothed using any appropriate smoothing technique, such as spline interpolation, low-pass filtering, or locally weighted scatterplot smoothing (LOWESS). The coarse pitch estimate can be determined as the pitch value corresponding to the maximum slope of the CDF.
In some implementations, the PDF can be estimated from the CDF by computing the derivative of the CDF, and any appropriate technique can be used to compute the derivative. The coarse pitch estimate can then be determined as the pitch value corresponding to the peak of the PDF.
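A minimal sketch of this CDF-to-PDF route follows. It assumes a moving average as the low-pass smoother and a central finite difference as the derivative; both are choices made for this sketch, as are the function names and the 1 Hz grid.

```python
def empirical_cdf(distances, grid):
    """Step-function CDF evaluated on a grid of candidate pitch values."""
    n = len(distances)
    return [sum(1 for d in distances if d < g) / n for g in grid]

def moving_average(values, half_width):
    """Simple low-pass filter; the window is truncated at the edges."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def pitch_at_pdf_peak(distances, grid, half_width=10):
    smooth = moving_average(empirical_cdf(distances, grid), half_width)
    # A central difference approximates the PDF at interior grid points.
    pdf = [(smooth[i + 1] - smooth[i - 1]) / (grid[i + 1] - grid[i - 1])
           for i in range(1, len(grid) - 1)]
    peak = max(range(len(pdf)), key=pdf.__getitem__)
    return grid[peak + 1]   # pdf[j] belongs to grid[j + 1]

grid = list(range(0, 801))               # candidate pitch values in Hz
distances = [230] * 7 + [460, 690]
estimate = pitch_at_pdf_peak(distances, grid)
```

With a box-shaped smoother the PDF has a flat top, so the peak is located only to within the smoothing half-width; a smoother with a rounded shape would localize it more tightly.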
In some implementations, multiple preliminary coarse pitch estimates can be determined, and the actual coarse pitch estimate can be determined from the preliminary estimates. For example, the average of the preliminary coarse pitch estimates or the most common preliminary coarse pitch estimate can be selected as the actual coarse pitch estimate. For example, a coarse pitch estimate can be computed for each threshold in a set of thresholds. For high thresholds, the coarse pitch estimate may be too high, and for low thresholds, the coarse pitch estimate may be too low. For thresholds in between, the coarse pitch estimate may be more accurate. To determine the actual coarse pitch estimate, a histogram of the preliminary coarse pitch estimates can be computed, and the actual coarse pitch estimate can correspond to the frequency of the mode of the histogram. In some implementations, outliers can be removed from the histogram to improve the actual coarse pitch estimate.
After the coarse pitch estimate is obtained, it can be used as a starting point for obtaining a precise pitch estimate. The precise pitch estimate can be determined using the shape of each harmonic in the spectrum (again, any appropriate spectrum can be used, such as a generalized spectrum, a fixed spectrum, or an LLR spectrum). To compare the shapes of the harmonics in the spectrum, portions of the spectrum can be extracted as shown in Figs. 10A and 10B.
Fig. 10A shows portions of the spectrum for a first pitch estimate, where the pitch estimate is very close to the true pitch of the signal. Suppose the true pitch of the signal is approximately 230 Hz and the pitch estimate is also approximately 230 Hz. A portion of the spectrum can be identified for each harmonic by using multiples of the estimated pitch. In Fig. 10A, portion 1010 is at approximately 230 Hz, portion 1011 is at approximately 460 Hz, and portions 1012-1017 are each at higher multiples of 230 Hz. Because the pitch estimate is accurate, each harmonic is approximately centered in the middle of its portion. Some examples of estimating pitch in an audio signal based on symmetry characteristics are described in U.S. Patent Application No. 14/502,844, entitled "Systems and Methods for Estimating Pitch in Audio Signals Based on Symmetry Characteristics Independent of Harmonic Amplitudes," filed on September 30, 2014, the entire contents of which are incorporated herein by reference.
Fig. 10B shows portions of the spectrum for a second pitch estimate, where the pitch estimate is slightly lower than the true pitch of the signal. For example, the pitch estimate may be 228 Hz while the actual pitch is 230 Hz. Again, a portion of the spectrum can be identified for each harmonic by using multiples of the pitch estimate. For each harmonic, the portion lies slightly to the left of the true position of the harmonic, and the offset increases with the harmonic number. Portion 1020 is about 2 Hz to the left of the true position of the first harmonic, portion 1021 is about 4 Hz to the left of the true position of the second harmonic, and portions 1022-1027 are progressively further to the left as the harmonic number increases. For example, portion 1027 is about 16 Hz to the left of the true position of the eighth harmonic.
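The offsets quoted for Fig. 10B follow from simple arithmetic: the k-th portion is centered at k times the estimate, while the k-th harmonic sits at k times the true pitch. A small sketch (the function name is hypothetical):

```python
def portion_offsets_hz(true_pitch_hz, estimated_pitch_hz, num_harmonics):
    """Offset of each harmonic from the center of its frequency portion."""
    return [k * (true_pitch_hz - estimated_pitch_hz)
            for k in range(1, num_harmonics + 1)]

# True pitch 230 Hz, estimate 228 Hz, eight harmonics.
offsets = portion_offsets_hz(230, 228, 8)
```

A 2 Hz error in the pitch estimate therefore grows to a 16 Hz misalignment at the eighth harmonic, which is why the higher harmonics are the most sensitive indicators of pitch accuracy.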
The frequency portions of Figs. 10A and 10B can be used to determine the accuracy of a pitch estimate. When the pitch estimate is accurate, as in Fig. 10A, each harmonic is centered in its frequency portion, so the frequency portions all have similar shapes. When the pitch estimate is inaccurate, as in Fig. 10B, each harmonic is off-center in its frequency portion, and increasingly so as the harmonic number increases. Accordingly, the less accurate the pitch estimate, the less similar the frequency portions are to one another.
In addition to comparing the shape of a first frequency portion with that of a second frequency portion, a frequency portion can be compared with a reversed version of itself, because the shape of a harmonic is generally symmetric. For an accurate pitch estimate, the harmonic is centered in the frequency portion, so reversing the portion yields a similar shape. For an inaccurate pitch estimate, the harmonic is not centered in the frequency portion, so reversing the portion yields a different shape. Similarly, a first frequency portion can be compared with a reversed version of a second frequency portion.
Frequency portions can have any appropriate width. In some implementations, the frequency portions may partition the spectrum, may overlap adjacent portions, or may have gaps between them (as shown in Figs. 10A and 10B). The frequency portions can be taken from any frequency representation, such as the spectrum of the signal, or the real part, imaginary part, magnitude, or magnitude squared of the spectrum of the signal. The frequency portions can also be normalized to remove differences that are less relevant to determining pitch. For example, for each frequency portion, a mean and a standard deviation can be determined, and the frequency portion can be normalized by subtracting the mean and then dividing by the standard deviation (e.g., a z-score).
Correlations can be used to measure whether two frequency portions have similar shapes and to determine whether a harmonic is centered at its expected frequency. The frequency portions for a pitch estimate can be determined as described above, and a correlation can be performed by computing the inner product of two frequency portions. The correlations that can be performed include a correlation of a first frequency portion with a second frequency portion, a correlation of a first frequency portion with a reversed version of itself, and a correlation of a first frequency portion with a reversed version of a second frequency portion.
Correlations may have higher values for more accurate pitch estimates and lower values for less accurate pitch estimates. For a more accurate pitch estimate, the frequency portions are more similar to one another and to reversed versions of one another (e.g., each harmonic is centered in its frequency portion), so the correlations may be higher. For a less accurate pitch estimate, the frequency portions are less similar to one another and to reversed versions of one another (e.g., each harmonic is off-center by an amount that depends on the harmonic number), so the correlations may be lower.
Each correlation can be computed, for example, as the inner product of two frequency portions (or of a frequency portion and a reversed version of another frequency portion). The correlation can also be normalized by dividing by N-1, where N is the number of samples in each frequency portion. In some implementations, the Pearson product-moment correlation coefficient can be used.
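A minimal sketch of the normalized correlation follows. It assumes real-valued portions of equal length with nonzero variance and uses the sample standard deviation, in which case the inner product of z-scores divided by N-1 equals the Pearson coefficient. The function names and the toy portions are illustrative only.

```python
import math

def z_score(portion):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(portion)
    mean = sum(portion) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in portion) / (n - 1))
    return [(v - mean) / std for v in portion]

def correlation(a, b):
    """Inner product of z-scored portions, normalized by N - 1."""
    za, zb = z_score(a), z_score(b)
    return sum(x * y for x, y in zip(za, zb)) / (len(a) - 1)

centered = [0.0, 1.0, 4.0, 1.0, 0.0]     # harmonic centered in the portion
off_center = [0.0, 0.0, 1.0, 4.0, 1.0]   # same harmonic shifted by one bin

corr_centered = correlation(centered, centered[::-1])
corr_off_center = correlation(off_center, off_center[::-1])
```

The centered portion matches its own reversal, while the shifted portion does not, which is the behavior the score relies on.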
Some or all of the correlations described above can be used to determine a score for the accuracy of a pitch estimate. For example, for eight harmonics, eight correlations can be computed between each frequency portion and a reversed version of itself, 28 correlations can be computed between one frequency portion and another frequency portion, and 28 correlations can be computed between one frequency portion and a reversed version of another frequency portion. These correlations can be combined in any appropriate manner to obtain an overall score for the accuracy of the pitch estimate. For example, the correlations can be added or multiplied to obtain the overall score.
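The counts for eight harmonics follow from the number of unordered pairs, 8 * 7 / 2 = 28. The sketch below enumerates the index pairs; the function name is hypothetical.

```python
from itertools import combinations

def correlation_pairs(num_harmonics):
    """Index pairs for the three families of correlations."""
    indices = range(1, num_harmonics + 1)
    self_reversed = [(k, k) for k in indices]          # portion vs. own reversal
    cross = list(combinations(indices, 2))             # portion vs. other portion
    cross_reversed = list(combinations(indices, 2))    # portion vs. other reversed
    return self_reversed, cross, cross_reversed

self_rev, cross, cross_rev = correlation_pairs(8)
total = len(self_rev) + len(cross) + len(cross_rev)
```

For eight harmonics this gives 8 + 28 + 28 = 64 correlations in total.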
In some implementations, the correlations can be combined using the Fisher transformation. The Fisher transformation of an individual correlation r can be computed as:
F(r) = (1/2) ln((1 + r) / (1 - r))
In the region of interest for an individual correlation, the Fisher transformation can be approximated as:
F(r)≈r
The Fisher transformation of an individual correlation can have an approximately Gaussian probability density function with a standard deviation of 1/√(N - 3), where N is the number of samples in each portion. Accordingly, using the above approximation, the probability density function f(r) of the Fisher transformation of an individual correlation can be expressed as:
[equation not reproduced]
The overall score can then be computed by evaluating f(r) for each correlation and multiplying the results. Accordingly, if there are M correlations r_1, ..., r_M, the overall score s can be computed as a likelihood:
s = f(r_1) · f(r_2) · ... · f(r_M)
Alternatively, the score S can be computed as a log likelihood:
S = log f(r_1) + log f(r_2) + ... + log f(r_M)
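For reference, the Fisher transformation of a correlation r is the inverse hyperbolic tangent, F(r) = (1/2) ln((1 + r) / (1 - r)), and its standard textbook standard deviation is 1/√(N - 3). The sketch below evaluates both. The function names are hypothetical, and the density f(r) itself is deliberately left out because its exact form is not recoverable here.

```python
import math

def fisher_transform(r):
    """Fisher transformation of a correlation r with -1 < r < 1."""
    # Equivalent to math.atanh(r); written out to show the formula.
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def fisher_std(num_samples):
    """Approximate standard deviation of the transformed correlation."""
    return 1.0 / math.sqrt(num_samples - 3)

small_r = fisher_transform(0.1)    # close to 0.1, where F(r) is nearly r
sigma = fisher_std(28)             # 1 / sqrt(25)
```

The approximation F(r) ≈ r is tight for small |r| and degrades as |r| approaches 1, where the transformation grows without bound.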
These scores can be used to obtain a precise pitch estimate through an iterative procedure, such as a golden section search or any kind of gradient descent algorithm. For example, the precise pitch estimate can be initialized with the coarse pitch estimate. Scores can be computed for the current precise pitch estimate and for other pitch values near it. If the score of another pitch value is higher than the score of the current pitch estimate, the current pitch estimate can be set to that pitch value. This process can be repeated until an appropriate stopping condition is reached.
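One way such an iteration could look is a golden section search over a bracket around the coarse estimate. This is a sketch under stated assumptions: the score is unimodal inside the bracket, and `toy_score` is a hypothetical stand-in for the correlation-based score, peaking at 230 Hz by construction.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0   # about 0.618

def golden_section_maximize(score, low, high, tol=1e-4):
    """Return the argument in [low, high] that maximizes a unimodal score."""
    a, b = low, high
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = score(c), score(d)
    while (b - a) > tol:                  # stopping condition: bracket width
        if fc > fd:
            # Maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = score(c)
        else:
            # Maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = score(d)
    return (a + b) / 2.0

def toy_score(pitch_hz):
    # Hypothetical score with its maximum at 230 Hz.
    return -(pitch_hz - 230.0) ** 2

coarse = 228.0
precise = golden_section_maximize(toy_score, coarse - 5.0, coarse + 5.0)
```

Each iteration reuses one of the two interior evaluations, so only one new score has to be computed per step, which matters when every score requires many correlations.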
In some implementations, the process of determining the precise pitch estimate can be constrained, for example by requiring the precise pitch estimate to lie within a range around the coarse pitch estimate. The range can be determined using any appropriate technique. For example, the range can be determined from a variance or a confidence interval of the coarse pitch estimate, where the confidence interval may be determined using bootstrapping techniques. The range can be determined from the confidence interval, for example as a multiple of the confidence interval. When determining the precise pitch estimate, the search can be restricted so that the precise pitch estimate does not fall outside the specified range.
In some implementations, after the fractional chirp rate and the pitch have been determined, it may be desired to estimate the amplitudes of the harmonics of the signal (which may be complex valued and include phase information). Each harmonic can be modeled as a chirplet, where the frequency and chirp rate of the chirplet are set using the estimated pitch and the estimated fractional chirp rate. For example, for the k-th harmonic, the frequency of the harmonic can be k times the estimated pitch, and the chirp rate of the harmonic can be the fractional chirp rate multiplied by the frequency of the chirplet. Any appropriate duration can be used for the chirplet.
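The parameter rule for the k-th harmonic can be written out directly. In this sketch the function name, the example values, and the units (pitch in Hz, fractional chirp rate in 1/s, chirp rate in Hz/s) are assumptions made for illustration.

```python
def harmonic_chirplet_parameters(pitch_hz, fractional_chirp_rate, num_harmonics):
    """Return (k, frequency, chirp rate) for each harmonic chirplet."""
    params = []
    for k in range(1, num_harmonics + 1):
        frequency_hz = k * pitch_hz
        # Chirp rate scales with the harmonic's own frequency.
        chirp_rate = fractional_chirp_rate * frequency_hz
        params.append((k, frequency_hz, chirp_rate))
    return params

params = harmonic_chirplet_parameters(230.0, 0.5, 3)
```

Because the chirp rate is proportional to frequency, all harmonics share a single fractional chirp rate even though their absolute chirp rates differ.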
The amplitudes of the harmonics can be estimated using any appropriate technique, including, for example, maximum likelihood estimation. In some implementations, the vector of harmonic amplitudes â can be estimated as:
â = (M M^h)^(-1) M x
where M is a matrix in which each row corresponds to the chirplet for one harmonic with the parameters described above, the number of rows of M corresponds to the number of harmonic amplitudes to be estimated, h denotes the Hermitian transpose, and x is the time-series representation of the signal. The estimates of the harmonic amplitudes can be complex valued, and in some implementations other functions of the amplitudes can be used, such as the magnitude, magnitude squared, real part, or imaginary part.
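A least-squares sketch of the amplitude estimate is given below, assuming the form â = (M M^h)^(-1) M x. Several conventions here are assumptions rather than statements from the description: row k of M holds the complex conjugate of the k-th chirplet (so that M x is a vector of inner products), the chirplets are unwindowed, and the small linear system is solved with a hand-written Gaussian elimination to keep the example free of external libraries.

```python
import cmath
import math

def chirplet(freq_hz, chirp_rate, times):
    """Unwindowed complex chirp with linearly increasing frequency."""
    return [cmath.exp(2j * math.pi * (freq_hz * t + 0.5 * chirp_rate * t * t))
            for t in times]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0j] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = acc / aug[i][i]
    return z

def estimate_amplitudes(x, pitch_hz, frac_chirp, num_harmonics, times):
    # Row k of M: conjugated chirplet of harmonic k (assumed convention).
    M = [[v.conjugate()
          for v in chirplet(k * pitch_hz, frac_chirp * k * pitch_hz, times)]
         for k in range(1, num_harmonics + 1)]
    gram = [[sum(p * q.conjugate() for p, q in zip(ri, rj)) for rj in M]
            for ri in M]                                        # M M^h
    rhs = [sum(p * v for p, v in zip(row, x)) for row in M]     # M x
    return solve(gram, rhs)

# Synthetic check: 50 ms of signal at 8 kHz with three known amplitudes.
times = [n / 8000.0 for n in range(400)]
true_amps = [1 + 0j, 0.5j, -0.25 + 0.25j]
chirps = [chirplet(k * 230.0, 0.5 * k * 230.0, times) for k in (1, 2, 3)]
signal = [sum(a * c[i] for a, c in zip(true_amps, chirps))
          for i in range(len(times))]
estimated = estimate_amplitudes(signal, 230.0, 0.5, 3, times)
```

On this noise-free synthetic signal the estimate recovers the known complex amplitudes, including their phases; with noise it becomes a least-squares fit.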
In some implementations, the amplitudes may have been computed in a previous step and need not be explicitly computed again. For example, where an LLR spectrum is used in a previous processing step, the amplitudes can be computed while computing the LLR spectrum. The LLR spectrum is computed by fitting Gaussians to the spectrum, and one of the fitting parameters of a Gaussian is its amplitude. The amplitudes of the Gaussians can be saved while computing the LLR spectrum and recalled later rather than recomputed. In some implementations, the amplitudes determined from the LLR spectrum can be a starting point, and the amplitudes can be refined, for example by using an iterative technique.
The techniques described above can be applied to successive portions of the signal to be processed, such as a frame of the signal every ten milliseconds. For each processed portion of the signal, a fractional chirp rate, a pitch, and harmonic amplitudes can be determined. Some or all of the fractional chirp rate, pitch, and harmonic amplitudes may be referred to as HAM (harmonic amplitude matrix) features, and a feature vector can be created that includes the HAM features. The feature vector of HAM features can be used in addition to or in place of any other features used for processing harmonic signals. For example, the HAM features can be used in addition to or in place of mel-frequency cepstral coefficients, perceptual linear prediction features, or neural network features. The HAM features can be applied to any application of harmonic signals, including but not limited to speech recognition, word recognition, speaker recognition, speaker verification, noise reduction, and signal reconstruction.
Figs. 11-14 are flowcharts illustrating example implementations of the processes described above. Note that, for the flowcharts described below, the ordering of the steps is exemplary and other orders are possible, not all steps are required, and, in some implementations, some steps may be omitted or other steps may be added. The processes of the flowcharts can be implemented by one or more computers (e.g., the computers described below).
Fig. 11 is a flowchart of an example implementation of computing features for a first portion of a signal. At step 1110, a portion of a signal is obtained. The signal can be any signal for which estimating features may be useful, including but not limited to a speech signal or a music signal. The portion can be any relevant portion of the signal, and can be, for example, a frame of the signal extracted at regular intervals, such as every 10 milliseconds.
At step 1120, a fractional chirp rate of the portion of the signal is estimated. The fractional chirp rate can be estimated using any of the techniques described above. For example, multiple possible fractional chirp rates can be identified, and a score can be computed for each possible fractional chirp rate. The score can be computed using a function such as any of the functions g() described above. The estimate of the fractional chirp rate can be determined by selecting the fractional chirp rate corresponding to the highest score. In some implementations, a more precise estimate of the fractional chirp rate can be determined using an iterative procedure, for example by selecting additional possible fractional chirp rates and iterating with a golden section search or gradient descent. The function g() can take as input any frequency representation of the first portion described above, including but not limited to a spectrum of the first portion, an LLR spectrum of the first portion, a generalized spectrum of the first portion, a frequency-chirp distribution of the first portion, or a PVT of the first portion.
At step 1130, a frequency representation of the portion of the signal is computed using the estimated fractional chirp rate. The frequency representation can be any representation of the portion of the signal as a function of frequency. The frequency representation can be, for example, a fixed spectrum, a generalized spectrum, an LLR spectrum, or a row of a PVT. The frequency representation may be computed during the processing of step 1120 and need not be a separate step. For example, the frequency representation may be computed during other processing that determines the estimate of the fractional chirp rate.
At step 1140, a coarse pitch estimate is computed from the portion of the signal using the frequency representation. The coarse pitch estimate can be determined using any of the techniques described above. For example, peak-to-peak distances can be determined for any of the types of spectra described above and for a variety of parameters (such as different thresholds and different smoothing kernels), as well as for other portions of the signal. The coarse pitch estimate can then be computed from the peak-to-peak distances using a histogram or any of the other techniques described above.
At step 1150, a precise pitch estimate is computed from the portion of the signal using the frequency representation and the coarse pitch estimate. The precise pitch estimate can be initialized with the coarse pitch estimate and then refined with an iterative procedure. For each candidate value of the precise pitch estimate, a score such as a likelihood or a log likelihood can be computed, and the precise pitch estimate can be determined by maximizing the score. The score can be determined using combinations of correlations as described above. The score can be maximized using any appropriate procedure, such as a golden section search or gradient descent.
At step 1160, harmonic amplitudes are computed using the estimated fractional chirp rate and the estimated pitch. For example, the harmonic amplitudes can be computed by modeling each harmonic as a chirplet and performing maximum likelihood estimation.
The process of Fig. 11 can be repeated for successive portions or time intervals of the signal. For example, a fractional chirp rate, a pitch, and harmonic amplitudes can be computed every 10 milliseconds. The fractional chirp rate, pitch, and harmonic amplitudes can be used for a variety of applications, including but not limited to pitch tracking, signal reconstruction, speech recognition, and speaker verification or recognition.
Fig. 12 is a flowchart of an example implementation of computing a fractional chirp rate of a portion of a signal. At step 1210, a portion of a signal is obtained, as described above.
At step 1220, multiple frequency representations of the portion of the signal are computed, and any of the techniques described above can be used to compute the frequency representations. Each frequency representation can correspond to a fractional chirp rate. In some implementations, a frequency representation can be computed (i) from a row of a PVT, (ii) from a radial slice of a frequency-chirp distribution, or (iii) using inner products of the portion of the signal with chirplets, where the chirp rate of the chirplets increases with frequency.
At step 1230, a score is computed for each frequency representation, and each score corresponds to a fractional chirp rate. The score can indicate a match between the fractional chirp rate corresponding to the score and the fractional chirp rate of the portion of the signal. The score can be computed using any of the techniques described above. In some implementations, the score can be computed using an autocorrelation of the frequency representation, such as an autocorrelation of the magnitude squared of the frequency representation. The score can be computed from the autocorrelation using a Fisher information, an entropy, a Kullback-Leibler divergence, a sum of squared (or magnitude squared) values of the autocorrelation, and/or a sum of squares of the second derivative of the autocorrelation.
At step 1240, the fractional chirp rate of the portion of the signal is estimated. In some implementations, the fractional chirp rate is estimated by selecting the fractional chirp rate corresponding to the highest score. In some implementations, the estimate of the fractional chirp rate can be refined using an iterative technique, such as a golden section search or gradient descent. The estimated fractional chirp rate can then be used for further processing of the signal as described above, such as speech recognition or speaker recognition.
Fig. 13 is a flowchart of an example implementation of computing a pitch estimate of a portion of a signal. At step 1310, a first portion of a signal is obtained as described above, and at step 1320, a frequency representation of the first portion of the signal is computed using any of the techniques described above.
At step 1330, a threshold is selected using any of the techniques described above. For example, the threshold can be selected using a signal-to-noise ratio, or the threshold can be selected using the heights of peaks in the frequency representation of the first portion of the signal.
At step 1340, multiple peaks are identified in the frequency representation of the first portion of the signal. The peaks can be identified using any appropriate technique. For example, the values of the frequency representation can be compared with the threshold to identify contiguous portions of the frequency representation (each a frequency portion) that are consistently above the threshold. A peak can be identified in each frequency portion by, for example, selecting the highest point of the frequency portion, selecting the midpoint between the beginning and the end of the frequency portion, or fitting a curve (such as a Gaussian) to the frequency portion and selecting the peak using the fit. Accordingly, the frequency representation can be processed to identify the frequency portions above the threshold and to identify a peak in each frequency portion.
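The run-based peak identification can be sketched as follows, using the "highest point of each run" option. The function name, the toy spectrum, and the threshold value are assumptions made for this illustration.

```python
def find_peaks(values, frequencies_hz, threshold):
    """Return the frequency of the highest point in each run above threshold."""
    peaks = []
    start = None
    last = len(values) - 1
    for i, v in enumerate(values):
        if v > threshold and start is None:
            start = i                       # a run above the threshold begins
        if start is not None and (v <= threshold or i == last):
            # Exclusive end of the run: i if v fell below, i + 1 at the end.
            stop = i if v <= threshold else i + 1
            best = max(range(start, stop), key=values.__getitem__)
            peaks.append(frequencies_hz[best])
            start = None
    return peaks

freqs = [10 * i for i in range(12)]          # 0, 10, ..., 110 Hz
spectrum = [0, 1, 5, 2, 0, 0, 3, 7, 4, 0, 1, 6]
peaks = find_peaks(spectrum, freqs, threshold=1.5)
```

Raising the threshold merges fewer bins into each run and drops weaker peaks, which is why repeating the procedure over several thresholds yields different sets of peak-to-peak distances.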
At step 1350, multiple peak-to-peak distances are computed in the frequency representation of the first portion of the signal. Each peak can be associated with a frequency value corresponding to the peak. A peak-to-peak distance can be computed as the difference between the frequency values of adjacent peaks. For example, if peaks are present at 230 Hz, 690 Hz, 920 Hz, and 1840 Hz (e.g., similar to 931, 932, 933 and 934 in Fig. 9B), the peak-to-peak distances are 460 Hz, 230 Hz and 920 Hz.
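The adjacent-difference computation for this example can be checked with a short sketch (the function name is hypothetical):

```python
def peak_to_peak_distances(peak_frequencies_hz):
    """Differences between the frequencies of adjacent peaks."""
    ordered = sorted(peak_frequencies_hz)
    return [hi - lo for lo, hi in zip(ordered, ordered[1:])]

distances = peak_to_peak_distances([230, 690, 920, 1840])
```

Only one of the three distances equals the pitch in this example, which illustrates why distances from several thresholds, frames, or kernels are pooled before the most common value is chosen.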
Steps 1330, 1340 and 1350 can be repeated for other thresholds, for other settings with the same threshold, or for other settings with other thresholds. For example, as described above, multiple thresholds can be selected using the heights of multiple peaks in the frequency representation, the same threshold or other thresholds can be used with a second frequency representation corresponding to a second portion of the signal (e.g., where the second portion is before or after the first portion), and the same or other thresholds can be used with different smoothing kernels.
At step 1360, a histogram of the peak-to-peak distances is computed. The histogram can use some or all of the peak-to-peak distances described above. Any appropriate bin width can be used, such as a bin width of 2 to 5 Hz.
At step 1370, a pitch estimate is determined using the histogram of peak-to-peak distances. In some implementations, the pitch estimate can correspond to the mode of the histogram. In some implementations, the pitch estimate can be determined using multiple histograms. For example, multiple histograms can be computed for multiple thresholds (or for combinations of multiple thresholds and other parameters, such as time instances or smoothing kernels), and a preliminary pitch estimate can be determined for each of the histograms. The final pitch estimate can be determined from the preliminary pitch estimates, for example by selecting the most common preliminary pitch estimate.
Fig. 14 is a flowchart of an example implementation of computing a pitch estimate of a portion of a signal. At step 1410, a frequency representation of a portion of a signal is obtained, as described above.
At step 1420, a pitch estimate of the portion of the signal is obtained. The obtained pitch estimate may have been computed using any technique for estimating pitch, including but not limited to the coarse pitch estimation techniques described above. The obtained pitch estimate can be considered an initial pitch estimate to be updated, or a running pitch estimate that is updated by an iterative procedure.
At step 1430, multiple frequency portions of the frequency representation are obtained. Each frequency portion can be centered at a multiple of the pitch estimate. For example, a first frequency portion can be centered at the pitch estimate, a second frequency portion can be centered at twice the pitch estimate, and so on. Any appropriate width can be used for the frequency portions. For example, the frequency portions may partition the frequency representation, may overlap, or may have gaps between them.
At step 1440, multiple correlations are computed using the frequency portions of the frequency representation. The frequency portions can be processed further before the correlations are computed. For example, each frequency portion can be extracted from the frequency representation and stored in a vector of length N, where the beginning of the vector corresponds to the beginning of the frequency portion and the end of the vector corresponds to the end of the frequency portion. The frequency portions can be shifted by sub-sample amounts so that the frequency portions line up accurately. For example, the pitch estimate may lie between frequency bins of the frequency representation (e.g., a pitch estimate of 230 Hz may lie between frequency bin 37 and frequency bin 38, at an approximate position of 37.3). Accordingly, the beginning, center, and end of a frequency portion may be defined by fractional sample values. The frequency portions can be shifted by a sub-sample amount so that one or more of the beginning, center, and end of each frequency portion correspond to an integer sample of the frequency representation. In some implementations, the frequency portions can also be normalized by subtracting the mean and dividing by the standard deviation of the frequency portion.
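One simple way to realize a sub-sample shift is to resample the frequency representation at fractional bin positions. The sketch below uses linear interpolation, which is an assumption made for illustration; the description does not state which interpolation is used. The function names and the test ramp are likewise hypothetical, and the caller is assumed to keep all positions inside the array.

```python
import math

def sample_at(values, position):
    """Linearly interpolate `values` at a fractional bin position."""
    lower = math.floor(position)
    frac = position - lower
    if frac == 0.0:
        return float(values[lower])
    return (1.0 - frac) * values[lower] + frac * values[lower + 1]

def extract_portion(values, center_bin, half_width):
    """Return 2 * half_width + 1 samples centered on a fractional bin."""
    return [sample_at(values, center_bin + offset)
            for offset in range(-half_width, half_width + 1)]

# A linear ramp makes the interpolated values easy to check by hand.
ramp = [2.0 * i for i in range(50)]
portion = extract_portion(ramp, 37.3, 2)
```

After this resampling, the middle element of every extracted vector corresponds to the exact multiple of the pitch estimate, so the vectors for different harmonics line up sample for sample.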
The correlations can include any of the following: a correlation between a first frequency portion and a second frequency portion, a correlation between a first frequency portion and a reversed second frequency portion, and a correlation between a first frequency portion and a reversed first frequency portion. The correlations can be computed using any appropriate technique. For example, the frequency portions can be extracted from the frequency representation and stored in vectors as described above, and a correlation can be computed by performing an inner product of two vectors (or an inner product of one vector with a reversed version of another vector).
At step 1450, the correlations are combined to obtain a score for the pitch estimate. The score can be generated using any appropriate technique, including, for example, computing a product of the correlations, a sum of the correlations, a combination of the Fisher transformations of the correlations, or a combination of the likelihoods or log likelihoods of the correlations or of the Fisher transformations of the correlations, as described above.
At step 1460, the pitch estimate is updated. For example, a first score for a first pitch estimate can be compared with a second score for a second pitch estimate, and the pitch estimate can be determined by selecting the pitch estimate with the highest score. Steps 1420 to 1460 can be repeated to successively update the pitch estimate using a technique such as a golden section search or gradient descent. Steps 1420 to 1460 can be repeated until an appropriate stopping condition is reached, such as a maximum number of iterations or the improvement of the pitch estimate over the previous estimate falling below a threshold.
Fig. 15 shows components of one implementation of a computing device 1510 for implementing any of the techniques described above. In Fig. 15, the components are shown as being on a single computing device 1510, but the components may be distributed among multiple computing devices, such as a system of computing devices, including, for example, an end-user computing device (e.g., a smartphone or a tablet) and/or a server computing device (e.g., cloud computing). For example, the collection of audio data and the pre-processing of the audio data may be performed by an end-user computing device, and other operations may be performed by a server.
Computing device 1510 can include any components typical of a computing device, such as volatile or non-volatile memory 1520, one or more processors 1521, and one or more network interfaces 1522. Computing device 1510 can also include any input and output components, such as a display, a keyboard, and a touch screen. Computing device 1510 can also include a variety of components or modules providing specific functionality, and these components or modules can be implemented in software, hardware, or a combination thereof. Below, several examples of components are described for one example implementation, and other implementations may include additional components or exclude some of the components described below.
Computing device 1510 can have a signal processing component 1530 for performing any needed operations on an input signal (e.g., analog-to-digital conversion, encoding, decoding, subsampling, windowing, or computing frequency representations). Computing device 1510 can have a fractional chirp rate estimation component 1531 that estimates the fractional chirp rate of a signal using any of the techniques described above. Computing device 1510 can have a coarse pitch estimation component 1532 that estimates the pitch of a signal using peak-to-peak distances as described above. Computing device 1510 can have a precise pitch estimation component 1533 that estimates the pitch of a signal using correlations as described above. Computing device 1510 can have a HAM feature generation component 1534 that determines harmonic amplitudes as described above.
Computing device 1510 may also have components for applying the above techniques to particular applications. For example, computing device 1510 may have any of a speech recognition component 1540, a speaker verification component 1541, a speaker identification component 1542, a signal reconstruction component 1543, and a word identification component 1544. For example, the estimated fractional chirp rate, the estimated pitch, and the estimated harmonic amplitudes may be used as input to any of the applications, and may be used in addition to, or in place of, other features or parameters used by these applications.
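One way the three estimates might be packaged as input for such an application is sketched below. This is a sketch under stated assumptions: the container `FrameEstimates`, the function `to_feature_vector`, the fixed number of harmonics, and the use of log amplitudes are all hypothetical choices and are not specified in the description.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FrameEstimates:
    """Per-frame estimates (hypothetical container)."""
    fractional_chirp_rate: float
    pitch_hz: float
    harmonic_amplitudes: List[float]


def to_feature_vector(frame, num_harmonics=4, floor=1e-10):
    """Concatenate chirp rate, pitch, and log harmonic amplitudes.

    Assumption: amplitudes are truncated or zero-padded to `num_harmonics`
    and floored before the logarithm so that the vector has a fixed length.
    """
    amps = list(frame.harmonic_amplitudes[:num_harmonics])
    amps += [0.0] * (num_harmonics - len(amps))  # pad missing harmonics
    log_amps = [math.log(max(a, floor)) for a in amps]
    return [frame.fractional_chirp_rate, frame.pitch_hz] + log_amps


frame = FrameEstimates(0.02, 100.0, [1.0, 0.5, 0.25])
features = to_feature_vector(frame)
```

A downstream component such as a recognizer could consume `features` alone or append it to whatever features it already uses.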
Depending on the implementation, the steps of any of the techniques described above may be performed in a different order, may be combined, may be split into multiple steps, or may not be performed at all. The steps may be performed by a general-purpose computer, may be performed by a computer specialized for a particular application, may be performed by a single computer or processor, may be performed by multiple computers or processors, may be performed sequentially, or may be performed simultaneously.
The techniques described above may be implemented in hardware, in software, or in a combination of hardware and software. The choice of implementing any portion of the above techniques in hardware or in software may depend on the requirements of a particular implementation. A software module or program code may reside in volatile memory, non-volatile memory, RAM, flash memory, ROM, EPROM, or any other form of non-transitory computer-readable storage medium.
Conditional language used herein, such as "can," "could," "might," "may," and "e.g.," is intended to convey that certain implementations include, while other implementations do not include, certain features, elements, and/or steps. Thus, such conditional language indicates that features, elements, and/or steps are not required for some implementations. The terms "comprising," "including," "having," and the like are synonymous, are used in an open-ended fashion, and do not exclude additional elements, features, acts, or operations. The term "or" is used in its inclusive sense (and not in its exclusive sense), so that when used, for example, to connect a list of elements, the term "or" means one, some, or all of the elements in the list.
Unless expressly stated otherwise, conjunctive language such as the phrase "at least one of X, Y and Z" is to be understood to convey that an item, term, etc. may be X, Y, or Z, or a combination thereof. Thus, such conjunctive language is not intended to imply that certain embodiments require at least one of X, at least one of Y, and at least one of Z to each be present.
While the above detailed description has shown, described, and pointed out novel features as applied to various implementations, it can be understood that various omissions, substitutions, and changes in the form and details of the devices or techniques illustrated may be made without departing from the spirit of the disclosure. The scope of the inventions disclosed herein is indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.
Claims (20)
1. A computer-implemented method for estimating features of a harmonic signal, the method comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
2. The method of claim 1, further comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
3. The method of claim 1, wherein the second frequency representation is the first frequency representation.
4. The method of claim 1, wherein computing the estimated fractional chirp rate comprises computing a plurality of scores, wherein the plurality of scores includes a first score and a second score, the first score is computed using a first fractional chirp rate, and the second score is computed using a second fractional chirp rate, and wherein the estimated fractional chirp rate is computed by selecting a highest score.
5. The method of claim 4, wherein the first score is computed using an auto-correlation of a frequency representation, and the frequency representation is computed using the first fractional chirp rate.
6. The method of claim 1, wherein the first frequency representation is computed by performing inner products of the portion of the signal with functions of frequency and chirp rate, and wherein the chirp rate of the functions increases with frequency.
7. The method of claim 1, wherein the first pitch estimate is computed using an estimated cumulative distribution function of the plurality of peak-to-peak distances.
8. The method of claim 1, wherein the first frequency portion corresponds to a first multiple of the first pitch estimate, and the second frequency portion corresponds to a second multiple of the first pitch estimate.
9. The method of claim 2, further comprising:
computing a feature vector using the amplitudes of the plurality of harmonics; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector.
10. A system for estimating features of a harmonic signal, the system comprising one or more computing devices, the one or more computing devices comprising at least one processor and at least one memory, the one or more computing devices configured to:
obtain a portion of a signal;
compute an estimated fractional chirp rate of the portion of the signal;
compute a first frequency representation of the portion of the signal using the estimated fractional chirp rate;
compute a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in the first frequency representation of the portion of the signal; and
compute a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
11. The system of claim 10, wherein the one or more computing devices are further configured to compute amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
12. The system of claim 10, wherein the first frequency representation is computed using the estimated fractional chirp rate.
13. The system of claim 10, wherein the second frequency representation is different from the first frequency representation.
14. The system of claim 10, wherein the first frequency representation is computed using a pitch velocity transform.
15. The system of claim 10, wherein the first pitch estimate is computed using a histogram of the plurality of peak-to-peak distances.
16. The system of claim 10, wherein the one or more computing devices are further configured to compute the second pitch estimate by computing the correlation using a reversed version of the first frequency portion.
17. One or more non-transitory computer-readable media comprising computer-executable instructions that, when executed, cause at least one computing device to perform actions comprising:
obtaining a portion of a signal;
computing an estimated fractional chirp rate of the portion of the signal;
computing a first pitch estimate of the portion of the signal using a plurality of peak-to-peak distances in a first frequency representation of the portion of the signal; and
computing a second pitch estimate of the portion of the signal using the first pitch estimate and a correlation between a first frequency portion of a second frequency representation of the portion of the signal and a second frequency portion of the second frequency representation.
18. The one or more non-transitory computer-readable media of claim 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate.
19. The one or more non-transitory computer-readable media of claim 17, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
obtaining a second portion of the signal;
computing a second estimated fractional chirp rate of the second portion of the signal;
computing a third pitch estimate of the second portion of the signal; and
computing a fourth pitch estimate of the second portion of the signal using the third pitch estimate.
20. The one or more non-transitory computer-readable media of claim 19, further comprising computer-executable instructions that, when executed, cause the at least one computing device to perform actions comprising:
computing amplitudes of a plurality of harmonics of the portion of the signal using the estimated fractional chirp rate and the second pitch estimate;
computing a feature vector using the amplitudes;
computing second amplitudes of a second plurality of harmonics of the second portion of the signal using the second estimated fractional chirp rate and the fourth pitch estimate;
computing a second feature vector using the second amplitudes; and
performing at least one of speech recognition, speaker verification, speaker identification, or signal reconstruction using the feature vector and the second feature vector.
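The second pitch estimate recited in claims 1 and 8 can be illustrated with a minimal sketch. It assumes a magnitude spectrum on a uniform frequency grid, scans candidate pitches near the first (coarse) estimate, and scores each candidate by the inner product between the spectrum portion centred on one multiple of the candidate and the portion centred on the next multiple. The names `portion` and `refine_pitch`, the search range, the window width, and the unnormalized inner product as the correlation measure are all assumptions made for illustration and are not taken from the claims.

```python
import math


def portion(spectrum, centre_bin, half_width):
    """Slice of `spectrum` centred on `centre_bin`, or None if out of range."""
    lo, hi = centre_bin - half_width, centre_bin + half_width + 1
    if lo < 0 or hi > len(spectrum):
        return None
    return spectrum[lo:hi]


def refine_pitch(spectrum, bin_hz, coarse_hz, search_hz=5.0, step_hz=1.0,
                 num_harmonics=4, half_width=10):
    """Pick the candidate pitch whose adjacent harmonic portions align best."""
    best_pitch, best_score = coarse_hz, float("-inf")
    steps = int(round(search_hz / step_hz))
    for k in range(-steps, steps + 1):
        pitch = coarse_hz + k * step_hz
        score = 0.0
        for h in range(1, num_harmonics):
            # Portions around the h-th and (h+1)-th multiples of the candidate.
            first = portion(spectrum, round(h * pitch / bin_hz), half_width)
            second = portion(spectrum, round((h + 1) * pitch / bin_hz), half_width)
            if first is not None and second is not None:
                # Assumption: unnormalized inner product as the correlation.
                score += sum(a * b for a, b in zip(first, second))
        if score > best_score:
            best_pitch, best_score = pitch, score
    return best_pitch


# Synthetic magnitude spectrum (1 Hz per bin) with harmonics of a 100 Hz pitch.
spectrum = [
    sum(math.exp(-((f - h * 100.0) ** 2) / (2 * 3.0 ** 2)) for h in range(1, 5))
    for f in range(500)
]
refined = refine_pitch(spectrum, bin_hz=1.0, coarse_hz=97.0)
```

When the candidate equals the true pitch, every portion is centred on a harmonic peak and the portions line up, so the summed inner product is largest. A finer `step_hz` together with a finer frequency grid would give a correspondingly finer estimate.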
Applications Claiming Priority (17)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562112836P | 2015-02-06 | 2015-02-06 | |
US201562112832P | 2015-02-06 | 2015-02-06 | |
US201562112796P | 2015-02-06 | 2015-02-06 | |
US201562112850P | 2015-02-06 | 2015-02-06 | |
US62/112,850 | 2015-02-06 | ||
US62/112,832 | 2015-02-06 | ||
US62/112,796 | 2015-02-06 | ||
US62/112,836 | 2015-02-06 | ||
US14/969,029 US9870785B2 (en) | 2015-02-06 | 2015-12-15 | Determining features of harmonic signals |
US14/969,029 | 2015-12-15 | ||
US14/969,038 | 2015-12-15 | ||
US14/969,022 | 2015-12-15 | ||
US14/969,038 US9842611B2 (en) | 2015-02-06 | 2015-12-15 | Estimating pitch using peak-to-peak distances |
US14/969,036 | 2015-12-15 | ||
US14/969,022 US9548067B2 (en) | 2014-09-30 | 2015-12-15 | Estimating pitch using symmetry characteristics |
US14/969,036 US9922668B2 (en) | 2015-02-06 | 2015-12-15 | Estimating fractional chirp rate with multiple frequency representations |
PCT/US2016/016261 WO2016126753A1 (en) | 2015-02-06 | 2016-02-03 | Determining features of harmonic signals |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107430850A (en) | 2017-12-01 |
Family
ID=60239707
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201680017664.6A (Pending; published as CN107430850A) | 2015-02-06 | 2016-02-03 | Determining features of harmonic signals |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP3254282A1 (en) |
CN (1) | CN107430850A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1525435A (en) * | 2003-02-24 | 2004-09-01 | 国际商业机器公司 | Method and apparatus for estimating pitch frequency of voice signal |
CN102197423A (en) * | 2008-10-30 | 2011-09-21 | 高通股份有限公司 | Coding of transitional speech frames for low-bit-rate applications |
US20130041656A1 (en) * | 2011-08-08 | 2013-02-14 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal |
CN103718242A (en) * | 2011-03-25 | 2014-04-09 | 英特里斯伊斯公司 | System and method for processing sound signals implementing a spectral motion transform |
CN103999076A (en) * | 2011-08-08 | 2014-08-20 | 英特里斯伊斯公司 | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
CN104200818A (en) * | 2014-08-06 | 2014-12-10 | 重庆邮电大学 | Pitch detection method |
2016
- 2016-02-03 EP EP16706703.2A patent/EP3254282A1/en not_active Withdrawn
- 2016-02-03 CN CN201680017664.6A patent/CN107430850A/en active Pending
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108389575A (en) * | 2018-01-11 | 2018-08-10 | 苏州思必驰信息科技有限公司 | Audio data recognition methods and system |
CN108389575B (en) * | 2018-01-11 | 2020-06-26 | 苏州思必驰信息科技有限公司 | Audio data identification method and system |
CN108399923A (en) * | 2018-02-01 | 2018-08-14 | 深圳市鹰硕技术有限公司 | Method and device for speaker recognition during multi-person speech |
WO2019148586A1 (en) * | 2018-02-01 | 2019-08-08 | 深圳市鹰硕技术有限公司 | Method and device for speaker recognition during multi-person speech |
CN108510991A (en) * | 2018-03-30 | 2018-09-07 | 厦门大学 | Speaker identification method using harmonic series |
CN110931035A (en) * | 2019-12-09 | 2020-03-27 | 广州酷狗计算机科技有限公司 | Audio processing method, device, equipment and storage medium |
CN110931035B (en) * | 2019-12-09 | 2023-10-10 | 广州酷狗计算机科技有限公司 | Audio processing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
EP3254282A1 (en) | 2017-12-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3440672B1 (en) | Estimating pitch of harmonic signals | |
CN102124518B (en) | Apparatus and method for processing an audio signal for speech enhancement using a feature extraction | |
US9485597B2 (en) | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain | |
US9870785B2 (en) | Determining features of harmonic signals | |
US9922668B2 (en) | Estimating fractional chirp rate with multiple frequency representations | |
CN107430850A (en) | Determine the feature of harmonic signal | |
US20040199382A1 (en) | Method and apparatus for formant tracking using a residual model | |
US20080189109A1 (en) | Segmentation posterior based boundary point determination | |
CN112116922B (en) | Noise blind source signal separation method, terminal equipment and storage medium | |
Karthikeyan et al. | Hybrid machine learning classification scheme for speaker identification | |
US9548067B2 (en) | Estimating pitch using symmetry characteristics | |
Kumar et al. | A new pitch detection scheme based on ACF and AMDF | |
US20210256970A1 (en) | Speech feature extraction apparatus, speech feature extraction method, and computer-readable storage medium | |
Kwon et al. | Speech enhancement combining statistical models and NMF with update of speech and noise bases | |
US11929086B2 (en) | Systems and methods for audio source separation via multi-scale feature learning | |
Singh et al. | Application of different filters in mel frequency cepstral coefficients feature extraction and fuzzy vector quantization approach in speaker recognition | |
US10235993B1 (en) | Classifying signals using correlations of segments | |
Ming et al. | An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion | |
CN113593604A (en) | Method, device and storage medium for detecting audio quality | |
CN112786068A (en) | Audio source separation method and device and storage medium | |
US9842611B2 (en) | Estimating pitch using peak-to-peak distances | |
KR101524848B1 (en) | audio type recognizer | |
Ahuja et al. | A complex matrix factorization approach to joint modeling of magnitude and phase for source separation | |
CN116665698A (en) | Pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform | |
Sharma et al. | Reduced feature sets for vowel recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171201 |