CN101149924A - Method and device for implementing open-loop pitch search - Google Patents


Info

Publication number
CN101149924A
Authority
CN
China
Prior art keywords
pitch period
value
autocorrelation function
global reference
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2006101397038A
Other languages
Chinese (zh)
Other versions
CN100541609C (en)
Inventor
胡瑞敏
刘霖
杨玉红
张勇
王庭红
马付伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Wuhan University WHU
Original Assignee
Huawei Technologies Co Ltd
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd and Wuhan University WHU
Priority to CNB2006101397038A (patent CN100541609C)
Publication of CN101149924A
Application granted
Publication of CN100541609C

Abstract

The device for implementing open-loop pitch search comprises an autocorrelation function calculation unit, a pitch period global reference calculation unit and a pitch period determination unit. The method includes: calculating the autocorrelation function of the speech signal; determining the current pitch period global reference according to the calculation result of the autocorrelation function; and determining the pitch period of the speech signal according to the current pitch period global reference. The invention reduces the computational complexity and storage cost of the open-loop pitch search process.

Description

Method and device for realizing open-loop pitch search
Technical Field
The present invention relates to speech coding technology, and in particular, to a method and apparatus for implementing open-loop pitch search.
Background
The pitch period is the period at which the vocal cords vibrate when a person produces voiced sound. Estimating the pitch period is an important problem in speech coding, and its accuracy directly affects the coding quality and efficiency of the speech coder. Accurate pitch period analysis effectively removes redundancy in the speech coding process, reduces the number of coded bits, and enables low-bit-rate, high-quality speech coding.
A number of pitch detection algorithms exist for determining the pitch period accurately. In the time domain, conventional algorithms include pitch estimation based on the average magnitude difference function (AMDF) and pitch detection based on the short-time autocorrelation function (ACF). In the frequency domain, there is a pitch period estimation scheme for the multi-band excitation (MBE) speech coding algorithm, which uses closed-loop analysis-by-synthesis to match the frequency-domain waveform of the signal and obtain the optimal pitch period estimate.
In practical applications, time-domain pitch search algorithms are widely used because they are simple and perform well. For example, the wideband speech coding standard AMR-WB+ adopts an improved time-domain short-time autocorrelation function (ACF) pitch detection algorithm. AMR-WB+ searches for the pitch period with a weighted correlation function, and the implementation mainly comprises signal preprocessing, an open-loop pitch search and a closed-loop pitch search.
In the open-loop pitch search process, AMR-WB+ uses a traditional time-domain correlation function to obtain the pitch period. The correlation function is calculated by equation 1.1.
In equation 1.1, corr represents the correlation function value, s(n) is the perceptually weighted speech signal, delay represents the speech pitch period candidate in the search, and T_0 is the delay at which the correlation function reaches its maximum. The correlation function of equation 1.1 reflects the degree of similarity between the signal sequence s(n), whose subframe length is 64 samples, and the delayed signal sequence s(n-delay). The T_0 corresponding to the maximum of corr is taken as the pitch period value of the open-loop pitch search.
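As an illustration of the search described above, the following Python sketch evaluates a non-normalized correlation over a 64-sample subframe and picks the delay with the largest value. Equation 1.1 is not reproduced in this text, so the plain sum of products, the search bounds, the function names and the synthetic test signal are all assumptions made for the example.

```python
SUBFRAME_LEN = 64  # subframe length stated in the text


def correlation(history, delay, length=SUBFRAME_LEN):
    """Non-normalized correlation between the newest `length` samples of
    `history` and the same window shifted back by `delay` samples
    (assumed form of equation 1.1)."""
    start = len(history) - length
    return sum(history[start + n] * history[start + n - delay]
               for n in range(length))


def open_loop_search(history, min_delay, max_delay):
    """Return the delay in [min_delay, max_delay] with the largest correlation."""
    return max(range(min_delay, max_delay + 1),
               key=lambda d: correlation(history, d))


# Synthetic stand-in for speech: a short decaying pulse repeated every 40 samples.
PULSE = [8, 4, 2, 1]
signal = [PULSE[n % 40] if n % 40 < len(PULSE) else 0 for n in range(256)]

t0 = open_loop_search(signal, min_delay=20, max_delay=70)
print(t0)
```

The search range is kept below twice the true period here, so the multiples of the period that cause the problems discussed next do not yet compete with it.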
Because the human voice is not a perfectly periodic signal, and the speech signal is subject to interference such as vocal tract formants and external noise, the pitch period obtained directly from the correlation function deviates from the true value, and both a period doubling problem and a pitch smoothness problem often occur.
For the period doubling problem, referring to fig. 1A, in the ideal case the input speech signal is a strictly periodic signal, so the correlation function takes the same value, its maximum corr_max, at the pitch period candidate values delay = T_0, 2T_0, 3T_0. However, referring to fig. 1B, the actual speech signal is not strictly periodic, so the true pitch period may be T_0 while the pitch period obtained by equation 1.1 is 3T_0; this is the period doubling problem.
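The effect can be reproduced with a small Python sketch. It assumes a plain sum-of-products correlation over 64 samples and uses two hypothetical pulse trains: a strictly periodic one, and one whose pulse heights alternate slightly, as a stand-in for real speech. The signals, names and search range are illustrative assumptions, not data from the patent.

```python
def correlation(x, delay, length=64):
    """Non-normalized correlation of the newest `length` samples with the
    window shifted back by `delay` samples."""
    start = len(x) - length
    return sum(x[start + n] * x[start + n - delay] for n in range(length))


T0 = 20  # true pitch period of both test signals, in samples


def pulse_train(height_of_pulse, total=220):
    """One pulse every T0 samples; height_of_pulse(k) gives the k-th height."""
    return [height_of_pulse(n // T0) if n % T0 == 0 else 0 for n in range(total)]


# Strictly periodic signal: the correlation ties at T0, 2*T0 and 3*T0.
ideal = pulse_train(lambda k: 8)
ideal_values = [correlation(ideal, k * T0) for k in (1, 2, 3)]

# Slightly irregular signal: heights alternate 8, 7, 8, 7, ...
irregular = pulse_train(lambda k: 8 if k % 2 == 0 else 7)
picked = max(range(10, 71), key=lambda d: correlation(irregular, d))

print(ideal_values, picked)
```

With the irregular train the largest correlation falls on a multiple of the true period rather than on T0 itself, which is the failure mode the weighting and period-doubling removal steps are meant to counter.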
For the pitch smoothness problem, the pitch period variation between adjacent frames is limited in voiced segments of the speech signal. Generally, the pitch period of two adjacent voiced frames varies by no more than 10%, and almost never by more than 25%. Therefore, pitch smoothness must be considered in the pitch period search so that noise or other disturbances do not cause the pitch period to change significantly between adjacent frames.
At present, AMR-WB+ addresses the period doubling and pitch smoothness problems by weighting the correlation function. That is, the correlation function of equation 1.1 is multiplied by a weighting function:
Figure A20061013970300081
Figure A20061013970300082
where s(n) represents the preprocessed input speech signal, delay represents the pitch period candidate value of the speech signal in the search, and w(delay) represents a weighting function of delay; that is, the correlation function of equation (1.1) is multiplied by the weighting function w(delay).
The weighting function w(delay) is the product of two parts:
$$w(delay) = w_1(delay)\, w_n(delay) \tag{1.4}$$
where $w_1(delay)$, which addresses the period doubling problem, is set to:
$$w_1(delay) = cw(delay) \tag{1.5}$$
and $w_n(delay)$ is set to:
Figure A20061013970300091
where cw(delay) represents the above-mentioned weighting function, T_old represents the average of the pitch periods over the past 5 frames, and v represents the decision on the open-loop gain in the weighting function; v is set to:
Figure A20061013970300092
wherein, gain represents open loop gain, and the gain calculation formula is:
Figure A20061013970300093
As can be seen from the above description, in the prior art every autocorrelation function value must be weighted in order to solve the period doubling and pitch smoothness problems. For an actual speech signal, however, not every pitch period candidate corresponding to an autocorrelation function value suffers from the period doubling problem, and the candidates often include values far from the true pitch period. Weighting all autocorrelation function values therefore adds unnecessary computation.
In addition, in the prior art the open-loop pitch search uses a non-normalized autocorrelation function, as shown in equation 1.1:
Figure A20061013970300094
However, the main purpose of the open-loop pitch search is to determine a narrower pitch period search range for the closed-loop pitch search, and the criterion for the best delay in the closed-loop pitch search is:
Figure A20061013970300095
where x(n) is the target signal and y_k(n) is the past-frame excitation at delay k. Equation 1.9 expresses the minimum mean square error criterion between the delayed signal and the target signal. In the open-loop pitch search, the non-normalized autocorrelation function does not match the minimum mean square error criterion required by the subsequent closed-loop pitch search well, so the pitch period determined by the open-loop search deviates more, and the efficiency of the open-loop pitch search is greatly reduced.
Disclosure of Invention
In view of the above, the main object of the present invention is to provide a method for implementing an open-loop pitch search, and another object of the present invention is to provide an apparatus for implementing an open-loop pitch search, so as to reduce the complexity of the open-loop pitch search process.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
a method of implementing an open-loop pitch search, the method comprising:
calculating an autocorrelation function of the speech signal;
determining a current pitch period global reference according to a calculation result of the autocorrelation function;
and determining the pitch period of the voice signal according to the current pitch period global reference.
The step of calculating the autocorrelation function of the speech signal specifically comprises: a normalized autocorrelation function of the speech signal is calculated.
The step of calculating the normalized autocorrelation function specifically includes: computing
Figure A20061013970300101
Where M is the frame length of the subframe, corr is the normalized autocorrelation function value, s (n) is the perceptually weighted speech signal, and delay is the speech pitch period candidate value in the search.
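A minimal Python sketch of a normalized correlation follows. The figure carrying the patent's exact formula is not reproduced in this text, so the form used here (cross-correlation divided by the square root of the delayed signal's energy) is an assumption chosen to match the description of energy normalization; the function name and the zero-energy guard are also illustrative.

```python
import math


def normalized_correlation(history, delay, frame_len):
    """Cross-correlation of the newest `frame_len` samples with the window
    shifted back by `delay`, divided by the square root of the delayed
    window's energy (assumed normalization)."""
    start = len(history) - frame_len
    cross = 0.0
    energy = 0.0
    for n in range(frame_len):
        current = history[start + n]
        past = history[start + n - delay]
        cross += current * past
        energy += past * past
    if energy == 0.0:
        return 0.0  # silent delayed window: no usable similarity measure
    return cross / math.sqrt(energy)


# Example: a pattern that repeats every 4 samples.
x = [3.0, 0.0, 4.0, 0.0] * 8
at_period = normalized_correlation(x, delay=4, frame_len=8)
off_period = normalized_correlation(x, delay=1, frame_len=8)
print(at_period, off_period)
```

Dividing by the delayed energy keeps a loud stretch of past signal from winning the search merely because it is loud.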
The step of determining the current pitch period global reference specifically includes: obtaining an optimal pitch period candidate value according to a calculation result of the autocorrelation function; and judging whether a reliable pitch period global reference can be determined in the current frame or not according to the obtained optimal pitch period candidate value, if so, determining the obtained optimal pitch period candidate value as a current pitch period global reference, otherwise, judging whether the pitch period global reference determined in the previous frame is invalid or not, if so, determining the current pitch period global reference to be zero, and if not, determining the pitch period global reference determined in the previous frame as the current pitch period global reference.
After determining that the pitch period reference is not invalid, and before determining the pitch period global reference determined in the previous frame as the current pitch period global reference, further comprising: and judging whether the maximum value of the autocorrelation function of two continuous frames of the previous frame and the current frame is smaller than a set threshold value, if so, determining that the current pitch period global reference is zero, otherwise, continuously executing the step of determining the pitch period global reference determined by the previous frame as the current pitch period global reference.
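The decision flow of the two preceding paragraphs can be sketched in Python as follows. How reliability and invalidation are judged is treated here as given input, and the function name, parameter names and threshold are hypothetical.

```python
def update_global_reference(best_candidate, reliable,
                            prev_reference, prev_invalid,
                            prev_max_corr, cur_max_corr, low_corr_threshold):
    """Return the current pitch period global reference (0 means none)."""
    if reliable:
        # A reliable reference can be determined in the current frame.
        return best_candidate
    if prev_invalid:
        # The previous frame's reference has become invalid.
        return 0
    if prev_max_corr < low_corr_threshold and cur_max_corr < low_corr_threshold:
        # Two consecutive frames with a weak autocorrelation maximum.
        return 0
    # Otherwise carry the previous frame's reference forward.
    return prev_reference


# Example: unreliable frame, valid previous reference, strong correlation.
print(update_global_reference(57, False, 60, False, 0.8, 0.7, 0.3))
```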
The step of obtaining the optimal pitch period candidate value according to the calculation result of the autocorrelation function specifically includes: selecting a plurality of maximum autocorrelation function values and corresponding pitch period candidate values in the upper and lower limit intervals of the pitch search period according to the calculation result of the autocorrelation function; and weighting the selected autocorrelation function value sequence, and obtaining an optimal pitch period candidate value according to the autocorrelation function value sequence after weighting.
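The selection step might look like the following Python sketch, which keeps only the largest autocorrelation values inside the search interval. The default count of 6 mirrors the example given in the detailed description; the dictionary input and the function name are assumptions.

```python
import heapq


def select_top_candidates(corr_by_delay, min_delay, max_delay, count=6):
    """Keep the `count` largest autocorrelation values whose delay lies in
    [min_delay, max_delay]. Returns (values, delays), largest value first."""
    in_range = [(corr, delay) for delay, corr in corr_by_delay.items()
                if min_delay <= delay <= max_delay]
    top = heapq.nlargest(count, in_range)  # sorted in descending order
    values = [corr for corr, _ in top]
    delays = [delay for _, delay in top]
    return values, delays


# Hypothetical autocorrelation values; delay 200 lies outside the interval.
corr_by_delay = {20: 0.1, 30: 0.9, 40: 0.5, 50: 0.7, 200: 2.0}
values, delays = select_top_candidates(corr_by_delay, 20, 100, count=3)
print(values, delays)
```

Only this short list is passed to the later weighting and period-doubling steps, which is where the complexity saving claimed by the invention comes from.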
The step of performing weighting processing on the selected autocorrelation function value specifically includes: and judging whether the difference value between each selected pitch period candidate value and a pitch period global reference determined in a previous frame is smaller than a preset threshold value or not for each selected pitch period candidate value, if so, fixedly weighting the autocorrelation function value corresponding to the selected pitch period candidate value, and otherwise, not fixedly weighting the autocorrelation function value corresponding to the pitch period candidate value.
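A possible Python rendering of this weighting step is shown below. The text does not say how the fixed weight is applied, so multiplying by a constant greater than 1 is an assumption, as are all names and numeric values. The descending re-sort follows the detailed embodiment.

```python
def weight_and_sort(candidates, corrs, prev_reference, near_threshold, fixed_weight):
    """Apply a fixed weight to autocorrelation values whose candidate lies
    close to the previous frame's global reference, then re-sort both
    sequences together in descending order of autocorrelation value."""
    weighted = []
    for candidate, corr in zip(candidates, corrs):
        if abs(candidate - prev_reference) < near_threshold:
            corr *= fixed_weight  # assumed multiplicative weighting
        weighted.append((corr, candidate))
    weighted.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in weighted], [v for v, _ in weighted]


# Example: the candidate 60 is close to the previous reference 58.
new_candidates, new_corrs = weight_and_sort(
    [120, 60, 45], [1.0, 0.9, 0.5],
    prev_reference=58, near_threshold=5, fixed_weight=1.2)
print(new_candidates, new_corrs)
```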
After the weighting processing is performed and before the optimal pitch period candidate value is obtained, the method further includes: performing period-doubling removal processing on the weighted autocorrelation function value sequence;
and obtaining the optimal pitch period candidate value according to the autocorrelation function value sequence after the period-doubling removal processing.
The step of performing the period-doubling removal processing to obtain the optimal pitch period candidate value specifically includes: scaling the autocorrelation function value corresponding to each selected pitch period candidate value; obtaining a currently pending optimal pitch period candidate value according to the scaled autocorrelation function values; and judging whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the currently pending optimal pitch period candidate value. If so, the currently pending optimal pitch period candidate value is determined as the optimal pitch period candidate value; otherwise, the pitch period candidate value corresponding to the current autocorrelation function maximum is determined as the optimal pitch period candidate value.
Before the scaling processing is performed, the method further comprises the following steps: setting a pitch period candidate value corresponding to the maximum value of the current autocorrelation function as a currently pending optimal pitch period candidate value;
the step of obtaining the undetermined optimal pitch period candidate value according to the scaled autocorrelation function value specifically includes: and sequentially inspecting each selected pitch period candidate value, and if the currently inspected pitch period candidate value is smaller than the currently undetermined optimal pitch period candidate value and the autocorrelation function value corresponding to the currently inspected pitch period candidate value is larger than the maximum value of the autocorrelation function before scaling processing divided by the scaling factor, setting the currently inspected pitch period candidate value as the currently undetermined optimal pitch period candidate value.
The step of scaling the autocorrelation function corresponding to each selected pitch period candidate value specifically includes: and judging whether the pitch period candidate value is larger than a set threshold value or not for each selected pitch period candidate value, if so, dividing the autocorrelation function value corresponding to the pitch period candidate value by a larger scaling factor, otherwise, dividing the autocorrelation function value corresponding to the pitch period candidate value by a smaller scaling factor.
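The period-doubling removal described in the preceding paragraphs admits more than one reading. The Python sketch below implements one of them, as an assumption: a shorter candidate replaces the pending best candidate when its autocorrelation value exceeds the maximum divided by the scaling factor chosen for that candidate. The multiple test with a tolerance in samples, and all names and numeric values, are likewise hypothetical.

```python
def scaling_factor(candidate, lag_threshold, large_factor, small_factor):
    """Larger factor for candidates above the lag threshold, else smaller."""
    return large_factor if candidate > lag_threshold else small_factor


def is_multiple(longer, shorter, tolerance):
    """True if `longer` is within `tolerance` samples of k * shorter, k >= 2."""
    if shorter <= 0:
        return False
    k = round(longer / shorter)
    return k >= 2 and abs(longer - k * shorter) <= tolerance


def remove_period_doubling(candidates, corrs, lag_threshold,
                           large_factor, small_factor, tolerance):
    """Return the optimal pitch period candidate after doubling removal."""
    top = max(range(len(corrs)), key=lambda i: corrs[i])
    max_corr, top_candidate = corrs[top], candidates[top]
    pending = top_candidate  # start from the autocorrelation maximum
    for candidate, corr in zip(candidates, corrs):
        factor = scaling_factor(candidate, lag_threshold,
                                large_factor, small_factor)
        # Assumed reading: accept a shorter candidate that is nearly as strong.
        if candidate < pending and corr > max_corr / factor:
            pending = candidate
    # Keep the shorter candidate only if the maximum sits on one of its multiples.
    if pending != top_candidate and is_multiple(top_candidate, pending, tolerance):
        return pending
    return top_candidate


print(remove_period_doubling([120, 60, 45], [1.0, 0.9, 0.5], 100, 1.5, 1.2, 2))
```

In the example the maximum sits at 120 samples, but 60 is almost as strong and 120 is twice 60, so 60 is returned.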
The step of judging whether the reliable pitch period global reference can be determined in the current frame specifically comprises:
judging whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the optimal pitch period candidate value, and whether the difference between the obtained optimal pitch period candidate value and the pitch period global reference determined in the previous frame is smaller than a set threshold; if the former is not a multiple of the optimal pitch period candidate value and the difference is smaller than the set threshold, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the difference between the autocorrelation function maximum and any other autocorrelation function value in the current autocorrelation function value sequence is larger than a set threshold, and if so, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the current pitch period candidate value sequence contains one or more pitch period candidate values that are multiples of the optimal pitch period candidate value, and if so, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the pitch period global reference determined in the previous frame is a multiple of the currently determined optimal pitch period candidate value and whether the current autocorrelation function maximum is greater than a set threshold, and if so, determining that a reliable pitch period global reference can be determined in the current frame.
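The four alternative conditions above can be combined as in the following Python sketch. The thresholds, the tolerance in the multiple test and all names are assumptions. The second condition is read as requiring the margin over every other autocorrelation value.

```python
def is_multiple(longer, shorter, tolerance=2):
    """True if `longer` is within `tolerance` samples of k * shorter, k >= 2."""
    if shorter <= 0 or longer <= shorter:
        return False
    k = round(longer / shorter)
    return k >= 2 and abs(longer - k * shorter) <= tolerance


def reference_is_reliable(best, top_candidate, candidates, corrs, prev_reference,
                          near_threshold, margin_threshold, strong_threshold):
    """True if any of the four conditions for a reliable reference holds."""
    max_corr = max(corrs)
    others = sorted(corrs, reverse=True)[1:]
    # (a) maximum is not on a multiple of the best candidate, and the best
    #     candidate is close to the previous frame's global reference
    cond_a = (not is_multiple(top_candidate, best)
              and abs(best - prev_reference) < near_threshold)
    # (b) the maximum clearly dominates every other autocorrelation value
    cond_b = bool(others) and all(max_corr - c > margin_threshold for c in others)
    # (c) some candidate is a multiple of the best candidate
    cond_c = any(is_multiple(c, best) for c in candidates)
    # (d) previous reference is a multiple of the best candidate and the
    #     maximum is strong
    cond_d = is_multiple(prev_reference, best) and max_corr > strong_threshold
    return cond_a or cond_b or cond_c or cond_d


print(reference_is_reliable(60, 120, [120, 60, 45], [1.0, 0.9, 0.5],
                            0, 5, 0.3, 0.8))
```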
The step of determining the pitch period of the speech signal according to the current global reference pitch period specifically comprises: judging whether the difference value between the obtained optimal pitch period candidate value and the current pitch period global reference is smaller than a set threshold value, if so, determining the optimal pitch period candidate value as the pitch period of the voice signal;
or, judging whether the current autocorrelation function maximum value is smaller than a set threshold value in the autocorrelation function value sequence, if so, determining a pitch period candidate value corresponding to the current autocorrelation function maximum value as the pitch period of the voice signal;
the step of determining the pitch period of the speech signal according to the current pitch period global reference specifically comprises:
the method comprises the steps of determining a pitch period determination reference value according to a pitch period global reference, selecting a pitch period candidate value with the smallest difference value with the pitch period determination reference value from a current pitch period candidate value sequence, doubling an autocorrelation function value corresponding to the selected pitch period candidate value, selecting a current autocorrelation function maximum value from the autocorrelation function value sequence, and determining the pitch period candidate value corresponding to the selected autocorrelation function maximum value as the pitch period of the voice signal.
The step of determining the pitch period determination reference value according to the pitch period global reference specifically includes:
and judging whether the determined pitch period global reference is not zero, if so, determining the pitch period global reference as a pitch period determination reference value, otherwise, judging whether the previous non-zero pitch period global reference is invalid, if so, determining the pitch period determination reference value as zero, and if not, determining the previous non-zero pitch period global reference as the pitch period determination reference value.
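The reference-based determination of the last two steps can be sketched in Python as follows. The function names are hypothetical, and skipping the doubling when the reference value is zero is an assumption, since the text does not state what happens in that case.

```python
def determination_reference(global_ref, last_nonzero_ref, last_nonzero_invalid):
    """Pick the reference value used for the final pitch decision."""
    if global_ref != 0:
        return global_ref
    if last_nonzero_invalid:
        return 0
    return last_nonzero_ref


def select_pitch(candidates, corrs, reference):
    """Double the autocorrelation value of the candidate nearest to the
    reference, then return the candidate with the largest value."""
    boosted = list(corrs)
    if reference != 0:  # assumption: no boost without a usable reference
        nearest = min(range(len(candidates)),
                      key=lambda i: abs(candidates[i] - reference))
        boosted[nearest] *= 2
    winner = max(range(len(boosted)), key=lambda i: boosted[i])
    return candidates[winner]


# Example: the current reference is zero, but the last non-zero one is valid.
ref = determination_reference(0, 58, False)
print(select_pitch([120, 60, 45], [1.0, 0.9, 0.5], ref))
```

Doubling gives the candidate closest to the reference a strong but not absolute preference: a candidate elsewhere still wins if its autocorrelation value is more than twice as large.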
An apparatus for implementing an open-loop pitch search, the apparatus comprising: an autocorrelation function calculation unit, a pitch period global reference calculation unit and a pitch period determination unit, wherein,
the autocorrelation function calculation unit is used for calculating the autocorrelation function of the voice signal and outputting the calculation result of the autocorrelation function to the pitch period global reference calculation unit;
the pitch period global reference calculation unit is used for determining the current pitch period global reference according to the calculation result of the received autocorrelation function and outputting the determined current pitch period global reference to the pitch period determination unit;
and the pitch period determining unit is used for determining the pitch period of the voice signal according to the received current pitch period global reference.
The autocorrelation function calculating unit calculates a normalized autocorrelation function and outputs the calculation result of the normalized autocorrelation function to the pitch period global reference calculating unit;
and the pitch period global reference calculating unit determines the current pitch period global reference according to the calculation result of the normalized autocorrelation function.
The pitch period global reference calculation unit performs weighting processing on a plurality of larger autocorrelation function values in the received autocorrelation function values, obtains an optimal pitch period candidate value according to the autocorrelation function value sequence after weighting processing, and determines the current pitch period global reference according to the optimal pitch period candidate value.
The pitch period global reference calculation unit performs period-doubling removal processing on several of the larger autocorrelation function values among the received autocorrelation function values, obtains an optimal pitch period candidate value according to the autocorrelation function value sequence after the period-doubling removal processing, and determines the current pitch period global reference according to the optimal pitch period candidate value.
It can be seen that the present invention has the following advantages:
1. for the autocorrelation function employed by the present invention, there are two advantages:
(a) Maximizing the autocorrelation function yields a pitch period candidate that is statistically accurate and consistent with the integer pitch search of the closed-loop pitch search, because the candidate is solved under the minimum mean square error criterion between the original signal and the delayed signal;
(b) The autocorrelation function is normalized. Through classified analysis of the autocorrelation function values, the period doubling problem is solved, pitch period smoothness is weighted, the strength of the speech periodicity is judged, and the pitch period is finally determined.
2. In the present invention, the pitch period global reference contour is set as a measure of the global variation of the pitch period, so that the pitch period can be smoothed.
3. Classified pitch period analysis is adopted, so that the determination of the pitch period adapts to the signal.
4. After the autocorrelation function values are calculated, a subsequent series of processing is not carried out on all autocorrelation function values, such as period doubling processing, pitch period smoothness processing and the like, but partial autocorrelation function values which are most beneficial to determining the pitch period are selected, so that the processing process is greatly simplified, the algorithm operation complexity and the storage cost in the open-loop pitch searching process are reduced, and the open-loop pitch searching process is simplified.
Drawings
FIG. 1A is a graph of a correlation function for a standard periodic signal.
FIG. 1B is a graph of a correlation function of an actual speech signal.
Fig. 2 is a schematic structural diagram of an apparatus for implementing open-loop pitch search in the present invention.
Fig. 3 is a flow chart for implementing the open-loop pitch search procedure in an embodiment of the invention.
FIG. 4 is a flow chart of determining a current pitch period global reference in an embodiment of the invention.
Fig. 5 is a flow chart of removing the effect of doubling cycles in an embodiment of the present invention.
Detailed Description
The invention provides a method for realizing open-loop pitch search, which has the core idea that: calculating an autocorrelation function of the speech signal; determining the current pitch period global reference according to the calculation result of the autocorrelation function; and determining the pitch period of the voice signal according to the current pitch period global reference.
Correspondingly, the invention also provides a device for realizing the open-loop pitch search. Fig. 2 is a schematic structural diagram of an apparatus for implementing open-loop pitch search according to the present invention. Referring to fig. 2, in the present invention, an apparatus for implementing an open-loop pitch search includes: an autocorrelation function calculation unit, a pitch period global reference calculation unit and a pitch period determination unit, wherein,
the autocorrelation function calculation unit is used for calculating the autocorrelation function of the voice signal and outputting the calculation result of the autocorrelation function to the pitch period global reference calculation unit;
the pitch period global reference calculation unit is used for determining the current pitch period global reference according to the calculation result of the received autocorrelation function and outputting the determined current pitch period global reference to the pitch period determination unit;
and the pitch period determining unit is used for determining the pitch period of the voice signal according to the received current pitch period global reference.
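One way to picture the three units and the data passed between them is the Python sketch below. The class name, the callable signatures and the decision to pass the candidate list on to the determination unit are assumptions for illustration; the patent only names the units and their inputs and outputs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (pitch period candidate, autocorrelation value) pairs
Candidates = List[Tuple[int, float]]


@dataclass
class OpenLoopPitchSearcher:
    """Wires the three units together; each unit is injected as a function."""
    compute_autocorrelation: Callable[[List[float]], Candidates]
    compute_global_reference: Callable[[Candidates, int], int]
    determine_pitch: Callable[[Candidates, int], int]
    global_reference: int = 0  # reference carried from frame to frame

    def process_frame(self, frame: List[float]) -> int:
        candidates = self.compute_autocorrelation(frame)
        self.global_reference = self.compute_global_reference(
            candidates, self.global_reference)
        return self.determine_pitch(candidates, self.global_reference)


# Usage with trivial stand-in units.
searcher = OpenLoopPitchSearcher(
    compute_autocorrelation=lambda frame: [(60, 0.9), (120, 1.0)],
    compute_global_reference=lambda cands, prev: 60,
    determine_pitch=lambda cands, ref: ref,
)
print(searcher.process_frame([0.0] * 64))
```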
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and specific embodiments.
Fig. 3 is a flow chart for implementing the open-loop pitch search procedure in an embodiment of the invention. Referring to fig. 2 and fig. 3, the process of implementing open-loop pitch search by the method of the present invention using the apparatus of the present invention includes the following steps:
step 301: a normalized autocorrelation function is set.
In order to make the open-loop pitch search process better conform to the minimum mean square error criterion required in the subsequent closed-loop pitch search process, in this step 301, a normalized autocorrelation function may be preferably set, and may be represented by the following equation 2.1:
wherein, M is the frame length of the subframe, corr represents the normalized correlation function value, s (n) is the perceptually weighted speech signal, and delay represents the candidate value of the pitch period of the speech in the search. Equation (2.1) is determined according to the criterion of minimum mean square error, and is derived as follows:
the minimum mean square error of the input speech signal and the delayed speech signal is calculated as follows:
Figure A20061013970300162
where error represents the mean square error value of the input speech signal and the delayed signal, gain represents the open-loop gain, s (n) is the perceptually weighted speech signal, and delay represents the speech pitch period candidate in the search.
To find the minimum of equation (2.2), delay is held constant and the optimal open-loop gain value gain is solved for. Let
Figure A20061013970300171
Solving to obtain:
Figure A20061013970300172
substituting the result of formula (2.3) into formula (2.2) yields:
Formula (2.4) can be converted to:
Figure A20061013970300174
Equation (2.5) is the decision criterion of the correlation function in this scheme. The subsequent closed-loop pitch search uses the decision formula of equation (1.9). Comparing that formula with equation (2.5) shows that the two are similar: both use energy in the denominator for normalization, which ensures that the open-loop search provides a finer search range for the closed-loop pitch search.
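The equation images of this derivation are not reproduced in this text. The following LaTeX block is a sketch of the standard least-squares argument that the surrounding description follows, written with the symbols defined above (error, gain, s(n), delay, M). The summation limits and the mapping of each line to equations (2.2) through (2.5) are assumptions.

```latex
% Squared error between the signal and its gain-scaled delayed copy (cf. 2.2)
\mathrm{error} = \sum_{n=0}^{M-1} \bigl[\, s(n) - \mathrm{gain}\cdot s(n-\mathrm{delay}) \,\bigr]^{2}

% Setting d(error)/d(gain) = 0 with delay held constant (cf. 2.3)
\mathrm{gain} = \frac{\sum_{n=0}^{M-1} s(n)\, s(n-\mathrm{delay})}
                     {\sum_{n=0}^{M-1} s^{2}(n-\mathrm{delay})}

% Substituting the optimal gain back into the error (cf. 2.4)
\mathrm{error}_{\min} = \sum_{n=0}^{M-1} s^{2}(n)
  - \frac{\Bigl[\sum_{n=0}^{M-1} s(n)\, s(n-\mathrm{delay})\Bigr]^{2}}
         {\sum_{n=0}^{M-1} s^{2}(n-\mathrm{delay})}

% The first term does not depend on delay, so minimizing the error is
% equivalent to maximizing the energy-normalized correlation (cf. 2.5)
\max_{\mathrm{delay}} \;
  \frac{\Bigl[\sum_{n=0}^{M-1} s(n)\, s(n-\mathrm{delay})\Bigr]^{2}}
       {\sum_{n=0}^{M-1} s^{2}(n-\mathrm{delay})}
```

The last line makes the link to the closed-loop criterion visible: both divide a cross-correlation term by the energy of the delayed signal.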
Step 302: the autocorrelation function calculation unit calculates the autocorrelation function value of the normalization of the speech signal according to formula 2.1.
Step 303: the autocorrelation function calculation unit selects a plurality of autocorrelation function values and corresponding pitch period candidate values in the upper and lower limit intervals of the pitch search period.
Here, in consideration of the continuity of speech and in order to reduce computational complexity, step 303 preferably selects only several of the larger autocorrelation function values within the upper and lower limits of the pitch search period, together with their corresponding pitch period candidate values, rather than all calculated autocorrelation function values and candidates. The number selected may be determined by the application; for example, the 6 largest autocorrelation function values and their corresponding pitch period candidate values are selected.
This step yields an autocorrelation function value sequence of the selected length and the corresponding pitch period candidate value sequence.
Step 304: the autocorrelation function calculation unit outputs the selected large autocorrelation function values and the corresponding pitch period candidate values to the pitch period global reference calculation unit.
Step 305: the pitch period global reference calculation unit determines the current pitch period global reference according to the received autocorrelation function value and the corresponding pitch period candidate value.
For speech signals, especially voiced segments, the pitch period of adjacent frames varies little, typically by no more than 10%. In view of this, the present invention proposes the concept of a pitch period global reference (global_pitch), whose main purpose is to use the pitch period of the previous frame or frames to constrain the determination of the pitch period of the current frame, so as to avoid interference from noise and vocal tract characteristics on the pitch period estimate. In this scheme, the pitch period global reference is judged from the autocorrelation function values sorted in descending order and the corresponding pitch period candidate value sequence.
FIG. 4 is a flow chart of determining a current pitch period global reference in an embodiment of the invention. Referring to fig. 4, the specific implementation process of this step 305 mainly includes:
step 401: and obtaining the optimal pitch period candidate value according to the calculation result of the autocorrelation function, namely the selected autocorrelation function value and the corresponding pitch period candidate value.
Here, in step 401, both the period doubling problem and the pitch smoothness problem are addressed in the course of obtaining the optimal pitch period candidate value.
The specific implementation process of step 401 includes:
First, weighting is performed to address the pitch period smoothness problem.
The selected autocorrelation function value sequence is sorted in descending order beforehand. For each selected pitch period candidate value, it is judged whether the difference between that candidate and the pitch period global reference determined in the previous frame is smaller than a preset threshold; if so, a fixed weight is applied to the autocorrelation function value corresponding to that candidate; otherwise, no fixed weight is applied. After weighting, the autocorrelation function value sequence may be re-sorted in descending order, with the corresponding pitch period candidate value sequence reordered in step.
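The smoothness weighting can be sketched as below. The text does not state the form of the fixed weight or the threshold, so the multiplicative factor `weight=1.2`, the threshold `max_diff=5` samples, and the function name are assumptions of this sketch.

```python
def weight_for_smoothness(values, lags, prev_global, max_diff=5, weight=1.2):
    """Boost candidates close to the previous frame's global reference.

    values/lags are the selected autocorrelation values and their pitch
    period candidates. max_diff and weight are assumed, illustrative values.
    Returns the weighted values and lags, re-sorted in descending order.
    """
    weighted = [
        value * weight if abs(lag - prev_global) < max_diff else value
        for value, lag in zip(values, lags)
    ]
    # Re-sort by weighted value and move the lags along with the values.
    order = sorted(range(len(weighted)), key=lambda i: weighted[i], reverse=True)
    return [weighted[i] for i in order], [lags[i] for i in order]


# The previous global reference (42) favours the candidates at lags 40 and 41.
new_values, new_lags = weight_for_smoothness([0.9, 0.8, 0.5], [80, 40, 41], 42)
```

In this example the candidate at lag 40 overtakes the one at lag 80 after weighting, which is the intended smoothing effect.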
Second, the influence of pitch period multiples is removed.
FIG. 5 is a flow chart for removing the influence of pitch period multiples in an embodiment of the present invention. Referring to FIG. 5, in step 401, the process of removing the influence of pitch period multiples includes the following steps:
Step 501: set the pitch period candidate value corresponding to the current autocorrelation function maximum as the currently pending optimal pitch period candidate value.
Step 502: judge whether the current pitch period candidate value sequence contains a candidate that has not yet been examined; if so, execute step 503; otherwise, execute step 508.
Step 503: select a pitch period candidate value that has not yet been examined and judge whether it is larger than a set threshold; if so, execute step 504; otherwise, execute step 505.
Step 504: divide the autocorrelation function value corresponding to the pitch period candidate value by scaling factor 1, the larger factor, and proceed to step 506.
Step 505: divide the autocorrelation function value corresponding to the pitch period candidate value by scaling factor 2, the smaller factor, and proceed to step 506.
Step 506: judge whether the pitch period candidate value currently under examination is smaller than the currently pending optimal pitch period candidate value, and whether the autocorrelation function value corresponding to the candidate under examination is larger than the maximum value of the autocorrelation function before scaling divided by the scaling factor; if both conditions hold, execute step 507; otherwise, return directly to step 502.
Step 507: set the pitch period candidate value currently under examination as the currently pending optimal pitch period candidate value, and return to step 502.
Step 508: judge whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the currently pending optimal pitch period candidate value; if so, execute step 509; otherwise, execute step 510.
Step 509: determine the currently pending optimal pitch period candidate value as the optimal pitch period candidate value, and end the current process.
Step 510: determine the pitch period candidate value corresponding to the current autocorrelation function maximum as the optimal pitch period candidate value.
The optimal pitch period candidate value described in step 401 is thus obtained.
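Steps 501 to 510 can be sketched as follows, under one possible reading of the text: a shorter candidate replaces the pending one when its autocorrelation value exceeds the frame maximum divided by the scaling factor that applies to its lag. The lag threshold, both scaling factors, the tolerance in `is_multiple`, and all names are assumptions of this sketch and are not given in the source.

```python
def is_multiple(lag, base, tol=2):
    """True if lag is roughly k * base for some integer k >= 2 (assumed test)."""
    if base <= 0:
        return False
    k = round(lag / base)
    return k >= 2 and abs(lag - k * base) <= tol


def remove_multiples(values, lags, lag_threshold=60,
                     factor_long=1.3, factor_short=1.15):
    """Return the optimal pitch period candidate (steps 501 to 510).

    values must be sorted in descending order, with lags aligned to it.
    """
    max_value, max_lag = values[0], lags[0]
    pending = max_lag                                   # step 501
    for value, lag in zip(values, lags):                # steps 502 and 503
        # Steps 504 and 505: larger factor for lags above the threshold.
        factor = factor_long if lag > lag_threshold else factor_short
        # Step 506: shorter lag whose value is close enough to the maximum.
        if lag < pending and value > max_value / factor:
            pending = lag                               # step 507
    # Steps 508 to 510: accept the pending value only if the lag of the
    # maximum is a multiple of it.
    return pending if is_multiple(max_lag, pending) else max_lag


best = remove_multiples([0.9, 0.85, 0.5], [80, 40, 57])
```

Here the maximum sits at lag 80, but the candidate at lag 40 is nearly as strong and 80 is twice 40, so the sketch returns 40 as the optimal candidate.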
Step 402: judge, according to the obtained optimal pitch period candidate value, whether a reliable pitch period global reference can be determined in the current frame; if so, execute step 403; otherwise, execute step 404.
Here, the ways of judging whether a reliable pitch period global reference can be determined in the current frame include, but are not limited to, the following:
The first method is to judge whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the optimal pitch period candidate value, and whether the difference between the obtained optimal pitch period candidate value and the pitch period global reference determined in the previous frame is smaller than a set threshold; if the candidate corresponding to the current autocorrelation function maximum is not such a multiple and the difference is smaller than the set threshold, a reliable pitch period global reference can be determined in the current frame.
The second method is to judge whether the difference between the autocorrelation function maximum and any other autocorrelation function value in the current autocorrelation function value sequence is larger than a set threshold; if so, a reliable pitch period global reference can be determined in the current frame.
The third method is to judge whether the current pitch period candidate value sequence contains one or more candidates that are multiples of the optimal pitch period candidate value; if so, a reliable pitch period global reference can be determined in the current frame.
The fourth method is to judge whether the pitch period global reference determined in the previous frame is a multiple of the currently determined optimal pitch period candidate value, and whether the current autocorrelation function maximum is larger than a set threshold; if it is such a multiple and the maximum is larger than the threshold, a reliable pitch period global reference can be determined in the current frame.
In step 402, if it is judged that a reliable pitch period global reference can be determined in the current frame, a new pitch period global reference can be established, and step 403 below is executed; if not, it must be decided whether the previous reliable pitch period global reference can be carried over into the current frame, and step 404 below is executed.
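The four reliability criteria of step 402 can be sketched as a single predicate. The thresholds `lag_tol`, `margin`, and `strong`, the tolerance used in `is_multiple`, and the reading of the second criterion as "the maximum exceeds every other value by the margin" are assumptions of this sketch.

```python
def is_multiple(lag, base, tol=2):
    """True if lag is roughly k * base for some integer k >= 2 (assumed test)."""
    if base <= 0:
        return False
    k = round(lag / base)
    return k >= 2 and abs(lag - k * base) <= tol


def is_reliable(values, lags, best_lag, prev_global,
                lag_tol=5, margin=0.2, strong=0.7):
    """Return True if any of the four criteria of step 402 holds.

    values is sorted in descending order and lags is aligned to it.
    """
    max_value, max_lag = values[0], lags[0]
    # Method 1: lag of the maximum is not a multiple of the optimal
    # candidate, and the optimal candidate is close to the previous reference.
    c1 = (not is_multiple(max_lag, best_lag)
          and abs(best_lag - prev_global) < lag_tol)
    # Method 2: the maximum clearly stands out from all other values.
    c2 = len(values) > 1 and all(max_value - v > margin for v in values[1:])
    # Method 3: some candidate is a multiple of the optimal candidate.
    c3 = any(is_multiple(lag, best_lag) for lag in lags)
    # Method 4: previous reference is a multiple of the optimal candidate
    # and the current maximum is strong.
    c4 = is_multiple(prev_global, best_lag) and max_value > strong
    return c1 or c2 or c3 or c4


reliable = is_reliable([0.9, 0.85, 0.5], [80, 40, 57], best_lag=40, prev_global=0)
```

The criteria are combined with a logical OR because the text presents them as alternative ways of reaching the same judgment.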
Step 403: determine the obtained optimal pitch period candidate value as the current pitch period global reference, and end the current process.
Step 404: judge whether the pitch period global reference determined in the previous frame has expired; if so, execute step 405; if not, execute step 406.
Here, the number of frames for which a pitch period global reference may be held can be preset, for example to three frames. Judging whether the pitch period global reference determined in the previous frame has expired then amounts to judging whether the number of frames for which it has been held exceeds three; if so, the pitch period global reference has expired; otherwise, it has not.
Here, if the pitch period global reference determined in the previous frame has expired, this means that the speech segment is not voiced and lacks good pitch period continuity, and step 405 below is executed.
Step 405: determine the current pitch period global reference to be 0, and end the current process.
Step 406: judge whether the autocorrelation function maxima of both the previous frame and the current frame are smaller than a set threshold; if so, go to step 405; otherwise, execute step 407.
Step 407: determine the pitch period global reference determined in the previous frame as the current pitch period global reference.
This completes the process of determining the current pitch period global reference described in step 305 of FIG. 3.
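Steps 403 to 407 amount to a small per-frame state update, sketched below. The state layout (`GlobalRefState`), the way held frames are counted, and the threshold `weak_corr` are assumptions of this sketch; the three-frame hold limit follows the example in the text.

```python
from dataclasses import dataclass


@dataclass
class GlobalRefState:
    value: int = 0               # pitch period global reference, 0 means none
    held_frames: int = 0         # frames the reference has been carried over
    prev_max_corr: float = 0.0   # autocorrelation maximum of the previous frame


def update_global_reference(state, best_lag, reliable, max_corr,
                            max_hold=3, weak_corr=0.3):
    """Return the state after processing the current frame."""
    if reliable:
        # Step 403: adopt the optimal candidate as the new reference.
        return GlobalRefState(best_lag, 0, max_corr)
    if state.value == 0 or state.held_frames >= max_hold:
        # Steps 404 and 405: the previous reference has expired.
        return GlobalRefState(0, 0, max_corr)
    if state.prev_max_corr < weak_corr and max_corr < weak_corr:
        # Steps 406 and 405: two consecutive weakly correlated frames.
        return GlobalRefState(0, 0, max_corr)
    # Step 407: carry the previous reference into the current frame.
    return GlobalRefState(state.value, state.held_frames + 1, max_corr)


state = update_global_reference(GlobalRefState(), best_lag=40,
                                reliable=True, max_corr=0.9)
```

Calling the function once per frame with the result of the reliability judgment keeps the reference for at most `max_hold` frames without a fresh reliable estimate.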
Step 306: the pitch period global reference calculation unit outputs the determined current pitch period global reference to the pitch period determination unit.
Step 307: the pitch period determining unit finally determines the pitch period of the voice signal according to the received current pitch period global reference.
Here, the final determination of the pitch period can be considered in three cases:
(1) Determination based on whether the optimal pitch period candidate value is close to the current pitch period global reference.
In this case, either the pitch period global reference was determined from the optimal pitch period candidate value of the current frame, so that the two values are close, the optimal candidate is considered reliable, and it can be output directly as the pitch period; or the optimal candidate of the current frame satisfies the pitch period smoothness criterion but, owing to interference such as noise and vocal tract effects, its autocorrelation function value was too small to establish a reliable pitch period global reference, so that the current frame carries over the pitch period global reference of the previous frame. Correspondingly, the specific implementation of step 307 includes:
Judge whether the difference between the obtained optimal pitch period candidate value and the current pitch period global reference is smaller than a set threshold, that is, whether the two are close to each other; if so, determine the optimal pitch period candidate value as the pitch period of the speech signal.
(2) Determination based on the magnitude of the current autocorrelation function maximum.
In this case, the correlation of the speech segment is relatively low, so no obvious pitch period can easily be identified. The current pitch period search then has little practical significance and serves only to give the subsequent closed-loop pitch search a reference that removes as much long-term correlation as possible. The pitch period candidate value corresponding to the autocorrelation function maximum can therefore be output directly as the pitch period. Accordingly, the specific implementation of step 307 includes:
Judge whether the current maximum in the autocorrelation function value sequence is smaller than a set threshold; if so, determine the pitch period candidate value corresponding to the current autocorrelation function maximum as the pitch period of the speech signal.
(3) Determination when no obvious pitch period can be identified.
In this case, the concept of a pitch period determination reference value (trkp) is introduced, which serves as the reference for the final determination of the pitch period.
First, the pitch period determination reference value (trkp) is determined as follows: judge whether the determined pitch period global reference is non-zero; if so, take the pitch period global reference as the pitch period determination reference value; otherwise, judge whether the most recent non-zero pitch period global reference has expired; if it has, set the pitch period determination reference value to zero; if it has not, take the most recent non-zero pitch period global reference as the pitch period determination reference value.
Then, the pitch period is finally determined from the pitch period determination reference value (trkp) as follows: from the current pitch period candidate value sequence, select the candidate whose difference from the pitch period determination reference value is smallest and double its corresponding autocorrelation function value; then select the current maximum from the autocorrelation function value sequence and determine the pitch period candidate value corresponding to that maximum as the pitch period of the speech signal.
This completes the determination of the pitch period in the open-loop pitch search.
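The three cases of step 307 can be sketched as below. Evaluating the cases in the listed order, the thresholds `close_tol` and `weak_corr`, and all function and parameter names are assumptions of this sketch. As in the text, a reference value of zero is not treated specially in case (3).

```python
def reference_value(global_ref, last_nonzero_ref, last_ref_expired):
    """Pitch period determination reference value (trkp)."""
    if global_ref != 0:
        return global_ref
    return 0 if last_ref_expired else last_nonzero_ref


def determine_pitch(values, lags, best_lag, global_ref, trkp,
                    close_tol=5, weak_corr=0.3):
    """Return the pitch period chosen for the current frame."""
    # Case (1): the optimal candidate is close to the global reference.
    if abs(best_lag - global_ref) < close_tol:
        return best_lag
    # Case (2): weak correlation, output the lag of the maximum.
    top = max(range(len(values)), key=lambda i: values[i])
    if values[top] < weak_corr:
        return lags[top]
    # Case (3): double the value of the candidate nearest to trkp,
    # then output the lag of the resulting maximum.
    nearest = min(range(len(lags)), key=lambda i: abs(lags[i] - trkp))
    boosted = list(values)
    boosted[nearest] *= 2.0
    winner = max(range(len(boosted)), key=lambda i: boosted[i])
    return lags[winner]


trkp = reference_value(global_ref=0, last_nonzero_ref=43, last_ref_expired=False)
pitch = determine_pitch([0.9, 0.6, 0.5], [80, 57, 41],
                        best_lag=80, global_ref=0, trkp=trkp)
```

In the usage example the candidate at lag 41 is nearest to the reference value 43, so its doubled autocorrelation value (1.0) exceeds the original maximum (0.9) and lag 41 is returned.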
It should be noted that the method and apparatus for implementing open-loop pitch search according to the present invention can be applied to any speech codec.
In short, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made without departing from the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. A method for implementing an open-loop pitch search, the method comprising:
calculating an autocorrelation function of the speech signal;
determining a current pitch period global reference according to a calculation result of the autocorrelation function;
and determining the pitch period of the voice signal according to the current pitch period global reference.
2. The method according to claim 1, wherein the step of calculating the autocorrelation function of the speech signal specifically comprises: calculating the normalized autocorrelation function of the speech signal.
3. The method according to claim 2, wherein the step of computing the normalized autocorrelation function is specifically: computing
Figure A2006101397030002C1
Wherein M is the frame length of the subframe, corr is the normalized autocorrelation function value, s (n) is the speech signal after perceptual weighting, and delay is the speech pitch period candidate value in the search.
4. The method according to claim 1, wherein the step of determining the current pitch period global reference specifically comprises: obtaining an optimal pitch period candidate value according to the calculation result of the autocorrelation function; and judging, according to the obtained optimal pitch period candidate value, whether a reliable pitch period global reference can be determined in the current frame; if so, determining the obtained optimal pitch period candidate value as the current pitch period global reference; otherwise, judging whether the pitch period global reference determined in the previous frame has expired; if it has, determining the current pitch period global reference to be zero; and if it has not, determining the pitch period global reference determined in the previous frame as the current pitch period global reference.
5. The method of claim 4, wherein after judging that the reference has not expired, and before determining the pitch period global reference determined in the previous frame as the current pitch period global reference, the method further comprises: judging whether the autocorrelation function maxima of both the previous frame and the current frame are smaller than a set threshold; if so, determining the current pitch period global reference to be zero; otherwise, continuing with the step of determining the pitch period global reference determined in the previous frame as the current pitch period global reference.
6. The method according to claim 4, wherein the step of obtaining the optimal pitch period candidate value according to the calculation result of the autocorrelation function comprises: selecting a plurality of maximum autocorrelation function values and corresponding pitch period candidate values in the upper and lower limit intervals of the pitch search period according to the calculation result of the autocorrelation function; and weighting the selected autocorrelation function value sequence, and obtaining an optimal pitch period candidate value according to the autocorrelation function value sequence after weighting.
7. The method according to claim 6, wherein the step of weighting the selected autocorrelation function values comprises: for each selected pitch period candidate value, judging whether the difference between that candidate and the pitch period global reference determined in the previous frame is smaller than a preset threshold; if so, applying a fixed weight to the autocorrelation function value corresponding to that candidate; otherwise, applying no fixed weight to the autocorrelation function value corresponding to that candidate.
8. The method of claim 6, further comprising, after the weighting and before obtaining the optimal pitch period candidate value: performing pitch period multiple removal on the autocorrelation function value sequence after weighting;
and obtaining the optimal pitch period candidate value according to the autocorrelation function value sequence after pitch period multiple removal.
9. The method according to claim 8, wherein the step of performing the pitch period multiple removal to obtain the optimal pitch period candidate value comprises: scaling the autocorrelation function value corresponding to each selected pitch period candidate value; obtaining a currently pending optimal pitch period candidate value according to the scaled autocorrelation function values; and judging whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the currently pending optimal pitch period candidate value; if so, determining the currently pending optimal pitch period candidate value as the optimal pitch period candidate value; otherwise, determining the pitch period candidate value corresponding to the current autocorrelation function maximum as the optimal pitch period candidate value.
10. The method of claim 9, further comprising, prior to performing the scaling process: setting a pitch period candidate value corresponding to the maximum value of the current autocorrelation function as a currently pending optimal pitch period candidate value;
the step of obtaining the undetermined optimal pitch period candidate value according to the scaled autocorrelation function value specifically includes: and sequentially inspecting each selected pitch period candidate value, and if the currently inspected pitch period candidate value is smaller than the currently undetermined optimal pitch period candidate value and the autocorrelation function value corresponding to the currently inspected pitch period candidate value is larger than the maximum value of the autocorrelation function before scaling processing divided by the scaling factor, setting the currently inspected pitch period candidate value as the currently undetermined optimal pitch period candidate value.
11. The method of claim 9, wherein the step of scaling the autocorrelation function corresponding to each of the selected pitch period candidates comprises: for each selected pitch period candidate value, judging whether the pitch period candidate value is larger than a set threshold value, if so, dividing the autocorrelation function value corresponding to the pitch period candidate value by a larger scaling factor, otherwise, dividing the autocorrelation function value corresponding to the pitch period candidate value by a smaller scaling factor.
12. The method according to claim 4, wherein the step of determining whether a reliable pitch period global reference can be determined in the current frame is specifically:
judging whether the pitch period candidate value corresponding to the current autocorrelation function maximum is a multiple of the optimal pitch period candidate value, and whether the difference between the obtained optimal pitch period candidate value and the pitch period global reference determined in the previous frame is smaller than a set threshold; if the candidate corresponding to the current autocorrelation function maximum is not such a multiple and the difference is smaller than the set threshold, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the difference between the autocorrelation function maximum and any other autocorrelation function value in the current autocorrelation function value sequence is larger than a set threshold; if so, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the current pitch period candidate value sequence contains one or more candidates that are multiples of the optimal pitch period candidate value; if so, determining that a reliable pitch period global reference can be determined in the current frame;
or, judging whether the pitch period global reference determined in the previous frame is a multiple of the currently determined optimal pitch period candidate value, and whether the current autocorrelation function maximum is larger than a set threshold; if it is such a multiple and the maximum is larger than the threshold, determining that a reliable pitch period global reference can be determined in the current frame.
13. The method according to any of claims 1 to 12, wherein the step of determining the pitch period of the speech signal based on the current pitch period global reference is specifically: judging whether the difference value between the obtained optimal pitch period candidate value and the current pitch period global reference is smaller than a set threshold value, if so, determining the optimal pitch period candidate value as the pitch period of the voice signal;
or, judging whether the current maximum in the autocorrelation function value sequence is smaller than a set threshold; if so, determining the pitch period candidate value corresponding to the current autocorrelation function maximum as the pitch period of the speech signal.
14. The method according to any of claims 1 to 12, wherein the step of determining the pitch period of the speech signal based on the current pitch period global reference is specifically:
determining a pitch period determination reference value according to a pitch period global reference, selecting a pitch period candidate value with the minimum difference value with the pitch period determination reference value from a current pitch period candidate value sequence, doubling an autocorrelation function value corresponding to the selected pitch period candidate value, selecting a current autocorrelation function maximum value from the autocorrelation function value sequence, and determining the pitch period candidate value corresponding to the selected autocorrelation function maximum value as the pitch period of the voice signal.
15. The method according to claim 14, wherein the step of determining the pitch period determination reference value based on the pitch period global reference comprises:
judging whether the determined pitch period global reference is non-zero; if so, determining the pitch period global reference as the pitch period determination reference value; otherwise, judging whether the previous non-zero pitch period global reference has expired; if it has, determining the pitch period determination reference value to be zero; and if it has not, determining the previous non-zero pitch period global reference as the pitch period determination reference value.
16. An apparatus for implementing an open-loop pitch search, the apparatus comprising: an autocorrelation function calculation unit, a pitch period global reference calculation unit and a pitch period determination unit, wherein,
the autocorrelation function calculation unit is used for calculating the autocorrelation function of the voice signal and outputting the calculation result of the autocorrelation function to the pitch period global reference calculation unit;
the pitch period global reference calculation unit determines the current pitch period global reference according to the calculation result of the received autocorrelation function, and outputs the determined current pitch period global reference to the pitch period determination unit;
and the pitch period determining unit is used for determining the pitch period of the voice signal according to the received current pitch period global reference.
17. The apparatus according to claim 16, wherein said autocorrelation function calculation unit calculates a normalized autocorrelation function, and outputs the calculation result of the normalized autocorrelation function to the pitch period global reference calculation unit;
and the pitch period global reference calculating unit determines the current pitch period global reference according to the calculation result of the normalized autocorrelation function.
18. The apparatus according to claim 16, wherein said pitch period global reference calculating means performs weighting processing on a plurality of autocorrelation function values which are larger among the received autocorrelation function values, obtains an optimal pitch period candidate value from the autocorrelation function value sequence after the weighting processing, and determines the current pitch period global reference based on the optimal pitch period candidate value.
19. The apparatus according to claim 16 or 18, wherein said pitch period global reference calculation unit performs pitch period multiple removal on the several largest of the received autocorrelation function values, obtains an optimal pitch period candidate value based on the autocorrelation function value sequence after pitch period multiple removal, and determines the current pitch period global reference based on the optimal pitch period candidate value.
CNB2006101397038A 2006-09-18 2006-09-18 A kind of method and apparatus of realizing open-loop pitch search Active CN100541609C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2006101397038A CN100541609C (en) 2006-09-18 2006-09-18 A kind of method and apparatus of realizing open-loop pitch search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2006101397038A CN100541609C (en) 2006-09-18 2006-09-18 A kind of method and apparatus of realizing open-loop pitch search

Publications (2)

Publication Number Publication Date
CN101149924A true CN101149924A (en) 2008-03-26
CN100541609C CN100541609C (en) 2009-09-16

Family

ID=39250413

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2006101397038A Active CN100541609C (en) 2006-09-18 2006-09-18 A kind of method and apparatus of realizing open-loop pitch search

Country Status (1)

Country Link
CN (1) CN100541609C (en)

Also Published As

Publication number Publication date
CN100541609C (en) 2009-09-16


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant