CN108831509A - Pitch period determination method and apparatus, computer device, and storage medium - Google Patents


Info

Publication number
CN108831509A
Authority
CN
China
Prior art keywords
pitch period
audio signal
measured
cost value
target
Prior art date
Legal status
Granted
Application number
CN201810607513.7A
Other languages
Chinese (zh)
Other versions
CN108831509B (en)
Inventor
袁念德
邵明绪
田姣
Current Assignee
Xi'an Bee Language Mdt Infotech Ltd
Original Assignee
Xi'an Bee Language Mdt Infotech Ltd
Priority date
Filing date
Publication date
Application filed by Xi'an Bee Language Mdt Infotech Ltd
Priority to CN201810607513.7A
Publication of CN108831509A
Application granted
Publication of CN108831509B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Display Devices Of Pinball Game Machines (AREA)

Abstract

This application relates to a pitch period determination method and apparatus, a computer device, and a storage medium. The method includes: when the audio signal under test is voiced in the current frame, obtaining, according to a preset cost function, a target cost value for each first pitch period of the audio signal in the current frame, where the target cost value includes the cost values between each first pitch period and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and a set of lookahead frames following the current frame; and determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal in the current frame. The method improves the accuracy of the pitch period.

Description

Pitch period determination method and apparatus, computer device, and storage medium
Technical field
The present invention relates to the field of communication technology, and in particular to a pitch period determination method and apparatus, a computer device, and a storage medium.
Background
The pitch period is the duration of one opening-and-closing cycle of the vocal cords. As a feature of audio signals, it is widely used in fields such as speech coding and speech recognition.
During pitch period extraction, errors can arise from various kinds of interference. For example, a doubled or halved value of the true pitch period may be taken as the pitch period of individual frames, or isolated abrupt points may appear in the pitch track. To reduce the extraction error rate, the track formed by the extracted pitch periods is usually smoothed after extraction to remove these abrupt points. A common smoothing method is median filtering, which takes the median of several consecutive candidate pitch periods within a sliding window as the final output pitch period.
However, smoothing the pitch track with median filtering yields pitch periods of relatively low accuracy.
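The median-filtering baseline described in the background can be sketched as follows. The window length and the edge policy (repeating the first and last values so every window has odd length) are illustrative assumptions, not taken from the source; pitch periods are assumed to be in samples.

```python
from statistics import median


def median_smooth(track, window=5):
    """Median-filter a pitch track (one pitch period per frame).

    Hypothetical helper: the edges are padded by repeating the first and
    last values, so each output is the median of exactly `window` values.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    # Slide the window one frame at a time and keep the middle value.
    return [median(padded[i:i + window]) for i in range(len(track))]
```

For example, `median_smooth([80, 81, 160, 82, 83], window=3)` removes the doubled value 160, but each output depends only on a fixed local window and not on how plausible the transitions between frames are.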
Summary of the invention
In view of the above technical problem, it is necessary to provide a pitch period determination method and apparatus, a computer device, and a storage medium that can improve the accuracy of the pitch period.
A pitch period determination method, the method including:
when the audio signal under test is voiced in the current frame, obtaining, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and the set of lookahead frames located after the current frame; and
determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame.
In one embodiment, obtaining, according to the preset cost function, the target cost value corresponding to each first pitch period of the audio signal under test in the current frame includes:
obtaining, according to the cost function, a first cost value between each first pitch period of the audio signal under test and the second pitch period in the history frame, and a second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target lookahead frame, where the target lookahead frame is the last lookahead frame in time order within the lookahead frame set; and
obtaining the target cost value according to the first cost value and the second cost value.
In one embodiment, the lookahead frame set includes a first lookahead frame and a second lookahead frame, and the second lookahead frame is the target lookahead frame; obtaining the second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target lookahead frame includes:
obtaining, according to the cost function, a third cost value between each first pitch period of the audio signal under test and each second pitch period in the first lookahead frame, and a fourth cost value between each second pitch period of the audio signal under test in the first lookahead frame and each second pitch period in the second lookahead frame; and
obtaining the second cost value according to the third cost value and the fourth cost value.
In one embodiment, if the audio signal under test is unvoiced in the associated frame, the cost function is constructed from the error value between the audio signal under test at the first pitch period and an offset audio signal, where the offset audio signal is the audio signal under test shifted by the first pitch period.
In one embodiment, the cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
In one embodiment, if the audio signal under test is voiced in the associated frame, the cost function is constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, where the offset audio signal is the audio signal under test shifted by the first pitch period.
In one embodiment, the cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
In one embodiment, determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame includes:
determining the first pitch period corresponding to the smallest of the target cost values as the target pitch period.
A pitch period determination apparatus, the apparatus including:
an obtaining module, configured to, when the audio signal under test is voiced in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and the set of lookahead frames located after the current frame; and
a determining module, configured to determine, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame.
A computer device, including a memory and a processor, the memory storing a computer program, where the processor implements the following steps when executing the computer program:
when the audio signal under test is voiced in the current frame, obtaining, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and the set of lookahead frames located after the current frame; and
determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame.
A computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the following steps:
when the audio signal under test is voiced in the current frame, obtaining, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and the set of lookahead frames located after the current frame; and
determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame.
With the above pitch period determination method and apparatus, computer device, and storage medium, when the audio signal under test is voiced in the current frame, the target cost value of each first pitch period in the current frame is obtained according to the preset cost function, and the target pitch period in the current frame is determined from the first pitch periods according to these target cost values. Because the target cost value includes the cost values between each first pitch period and each second pitch period in the associated frames, which comprise the adjacent history frame and the lookahead frame set after the current frame, the target pitch period of the current frame is chosen by taking into account how the pitch period changes between the current frame and both the history and lookahead frames. This effectively removes abrupt pitch periods, achieves a good smoothing effect, and improves the accuracy of the pitch period.
Brief description of the drawings
Fig. 1 is a flowchart of a pitch period determination method provided by one embodiment;
Fig. 2 is a schematic diagram of a frame structure provided by one embodiment;
Fig. 3 is a flowchart of one possible implementation of step 101 in Fig. 1;
Fig. 4 is a flowchart of one possible implementation of step 201 in Fig. 3;
Fig. 5 is a schematic diagram of a pitch period determination apparatus provided by one embodiment;
Fig. 6 is a schematic diagram of a pitch period determination apparatus provided by another embodiment;
Fig. 7 is a schematic structural diagram of a computer device provided by one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application, not to limit it.
The pitch period determination method provided by this application can be applied in sound detection scenarios to smooth the pitch periods of a speech signal, thereby filtering out abrupt points in the pitch track. The method may be executed by a terminal, a server, or the like. The terminal may be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device; the server may be implemented as an independent server or as a cluster of multiple servers.
Fig. 1 is a flowchart of a pitch period determination method provided by one embodiment. As shown in Fig. 1, the method includes the following steps:
Step 101: when the audio signal under test is voiced in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the history frame adjacent to the current frame and the lookahead frame set located after the current frame.
Here, the cost function is constructed from the variation of the pitch period between adjacent frames and is used to calculate the cost value between pitch periods in adjacent frames; the cost value therefore represents the discrepancy between pitch periods in adjacent frames. The larger the pitch period change between adjacent frames, the larger the cost value given by the cost function. The target cost value of a first pitch period includes the cost values between each first pitch period of the audio signal under test and the second pitch periods in the history frame, as well as the cost values between each first pitch period and the second pitch periods in the lookahead frames. The first and second pitch periods are candidate pitch periods of the audio signal under test in each frame, determined by methods such as autocorrelation detection or amplitude difference.
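One way to obtain the candidate pitch periods of a frame by autocorrelation detection is sketched below. The lag range, the normalization by the energies of the two overlapping segments, the restriction to local peaks, and the helper name are all assumptions for illustration; the amplitude-difference alternative is not shown.

```python
import math


def autocorr_candidates(frame, min_lag, max_lag, num_candidates=5):
    """Return up to num_candidates candidate pitch periods (lags in samples).

    Hypothetical helper: scores each lag by normalized autocorrelation,
    keeps only local peaks, and returns the highest-scoring ones first.
    """
    lags = list(range(min_lag, max_lag + 1))
    scores = []
    for lag in lags:
        x, y = frame[:-lag], frame[lag:]  # frame and its copy shifted by `lag`
        num = sum(a * b for a, b in zip(x, y))
        den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
        scores.append(num / den if den > 0 else 0.0)
    # Keep local peaks only, so neighbouring lags of one peak are not all returned.
    peaks = [
        (scores[i], lags[i])
        for i in range(1, len(lags) - 1)
        if scores[i] > scores[i - 1] and scores[i] >= scores[i + 1]
    ]
    peaks.sort(key=lambda p: p[0], reverse=True)
    return [lag for _, lag in peaks[:num_candidates]]
```

For a periodic frame, multiples of the true period (here 80 and 120 for a true period of 40) score as high as the true period. This ambiguity is why the method resolves the choice using neighbouring frames rather than each frame alone.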
As shown in Fig. 2, the current frame is Frm(0), the history frame is Frm(-1), and the lookahead frame set contains the lookahead frames Frm(1) and Frm(2). The history frame Frm(-1) contains the second pitch period already obtained by smoothing, labelled A, while the current frame Frm(0) and the lookahead frames Frm(1) and Frm(2) each contain 5 candidate pitch periods, labelled B1 to B5, C1 to C5, and D1 to D5 respectively. Note that the lookahead frame set may contain a single lookahead frame adjacent to the current frame, or several lookahead frames after the current frame, for example 3, 4, or more; the lookahead frames in the set are consecutive in time. The number of candidate pitch periods in the current frame and in each lookahead frame is not limited in this application.
According to the cost function, the cost value W1 between each first pitch period of the audio signal under test in the current frame and the second pitch period in the history frame can be calculated, followed by the cost value W2 between each first pitch period in the current frame and the second pitch periods in the lookahead frames. The W1 and W2 corresponding to each first pitch period are added to obtain the target cost value, or a weighted sum of W1 and W2 is used as the target cost value. Because the lookahead frames contain multiple second pitch periods, each first pitch period in the current frame has multiple W2 values and therefore multiple target cost values.
Step 102: according to the target cost values, determine the target pitch period of the audio signal under test in the current frame from the first pitch periods.
In this embodiment, a larger cost value means a larger variation between the pitch periods of adjacent frames, so the first pitch period corresponding to a smaller target cost value can be selected as the target pitch period of the audio signal under test in the current frame. For example, the target cost values are sorted in ascending order, and the first pitch period corresponding to one or more of the smallest target cost values is taken as the target pitch period of the audio signal under test in the current frame.
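A minimal sketch of combining W1 and W2 per first pitch period and ranking the resulting target cost values in ascending order. The dictionary layout, the weights, and the function name are hypothetical; setting both weights to 1 gives the plain sum.

```python
def rank_candidates(w1, w2_lists, weight1=1.0, weight2=1.0):
    """Rank first pitch periods by target cost W = weight1 * W1 + weight2 * W2.

    w1:       {first pitch period: W1 to the history frame}
    w2_lists: {first pitch period: [W2 for each path into the lookahead frames]}
    Returns (target_cost, first_pitch_period) pairs, smallest cost first.
    """
    scored = [
        (weight1 * w1[k] + weight2 * w2, k)
        for k, w2_values in w2_lists.items()
        for w2 in w2_values  # one target cost value per W2 path
    ]
    scored.sort()
    return scored
```

The target pitch period is then `rank_candidates(w1, w2_lists)[0][1]`, the first pitch period paired with the smallest target cost value.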
With the pitch period determination method provided by the embodiments of this application, when the audio signal under test is voiced in the current frame, the target cost value of each first pitch period in the current frame is obtained according to the preset cost function, and the target pitch period in the current frame is determined from the first pitch periods according to these target cost values. Because the target cost value includes the cost values between each first pitch period and each second pitch period in the associated frames, which comprise the adjacent history frame and the lookahead frame set after the current frame, the target pitch period of the current frame is chosen by taking into account how the pitch period changes between the current frame and both the history and lookahead frames. This effectively removes abrupt pitch periods, achieves a good smoothing effect, and improves the accuracy of the pitch period.
Optionally, step 102, "determining, from the first pitch periods and according to the target cost values, the target pitch period of the audio signal under test in the current frame", includes: determining the first pitch period corresponding to the smallest of the target cost values as the target pitch period.
In this embodiment, because a larger pitch period change between adjacent frames produces a larger cost value, taking the first pitch period that corresponds to the smallest target cost value as the target pitch period effectively filters out abrupt pitch periods, keeps the pitch period change between frames small, and ensures the accuracy and reliability of the pitch period.
Optionally, on the basis of the embodiment shown in Fig. 1, if the audio signal under test is unvoiced in the associated frame, the cost function is constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal, where the offset audio signal is the audio signal under test shifted by the first pitch period.
In this embodiment, the cost function is used to calculate the cost value between two adjacent frames, and the associated frame may be the history frame or a lookahead frame. When the history frame or lookahead frame is unvoiced, the cost function can be constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal. This error value can be obtained with a single method such as the normalized autocorrelation error or the normalized energy difference.
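The source names the normalized autocorrelation error and the normalized energy difference without giving formulas. The sketch below uses one common form of each as an assumption: one minus the normalized autocorrelation, and the energy of the difference between the frame and its shifted copy divided by their total energy. The function names are hypothetical.

```python
import math


def norm_autocorr_error(frame, k):
    """Assumed E(k) = 1 - r(k), r(k) being the normalized autocorrelation
    between the frame and its copy shifted by k samples (range 0 to 2)."""
    x, y = frame[:-k], frame[k:]
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return 1.0 - (num / den if den > 0 else 0.0)


def norm_energy_diff(frame, k):
    """Assumed E(k) = sum((x - y)^2) / (sum(x^2) + sum(y^2)), where y is the
    frame shifted by k samples; 0 means a perfect match (range 0 to 2)."""
    x, y = frame[:-k], frame[k:]
    diff = sum((a - b) ** 2 for a, b in zip(x, y))
    total = sum(a * a for a in x) + sum(b * b for b in y)
    return diff / total if total > 0 else 1.0
```

Both functions return 0 when shifting by `k` reproduces the signal exactly, so a small error indicates that `k` is a plausible pitch period for the frame.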
Optionally, if the audio signal under test is unvoiced in the associated frame, the cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is the smoothing factor with $0 < \alpha \le 256$, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
In this embodiment, when the audio signal under test is unvoiced in the associated frame, the cost value can be calculated with $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n-1$ is the index of the history frame and $n+1$ is the index of the lookahead frame. Taking Fig. 2 as an example, if the history frame Frm(-1) is unvoiced, $W(0, -1) = \alpha \cdot E_0(k_0)$ is used to calculate the cost value between the current frame Frm(0) and the history frame Frm(-1); if the lookahead frame Frm(1) is unvoiced, $W(0, 1) = \alpha \cdot E_0(k_0)$ is used to calculate the cost value between the current frame Frm(0) and the lookahead frame Frm(1).
Optionally, on the basis of the embodiment shown in Fig. 1, if the audio signal under test is voiced in the associated frame, the cost function is constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, where the offset audio signal is the audio signal under test shifted by the first pitch period.
In this embodiment, when the history frame or lookahead frame is voiced, the cost function can be constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, where the second pitch period is the second pitch period in the history frame or in the lookahead frame.
Optionally, on the basis of the embodiment shown in Fig. 1, if the audio signal under test is voiced in the associated frame, the cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is the smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
In this embodiment, if the audio signal under test is voiced in the associated frame, the cost value is calculated with $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n-1$ is the index of the history frame, $n+1$ is the index of the lookahead frame, $k_{n-1}$ is the second pitch period in the history frame, and $k_{n+1}$ is the second pitch period in the lookahead frame. Taking Fig. 2 as an example, if the history frame Frm(-1) is voiced, $W(0, -1) = |k_0 - k_{-1}| + \alpha \cdot E_0(k_0)$ is used to calculate the cost value between the current frame Frm(0) and the history frame Frm(-1); if the lookahead frame Frm(1) is voiced, $W(0, 1) = |k_0 - k_1| + \alpha \cdot E_0(k_0)$ is used to calculate the cost value between the current frame Frm(0) and the lookahead frame Frm(1).
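The two cases of the cost function can be written as one helper, sketched below. It assumes pitch periods in samples and that `err_cur` = $E_n(k_n)$ has already been computed with some error function. The source states the range $0 < \alpha \le 256$ for the unvoiced case; applying the same check to the voiced case is an assumption. The function name and signature are hypothetical.

```python
def pitch_cost(k_cur, err_cur, k_assoc=None, assoc_voiced=True, alpha=1.0):
    """Cost W(n, n±1) between pitch period k_cur of frame n and an associated frame.

    Voiced associated frame:   |k_n - k_{n±1}| + alpha * E_n(k_n)
    Unvoiced associated frame: alpha * E_n(k_n)   (k_assoc is not used)
    """
    if not 0 < alpha <= 256:
        raise ValueError("alpha must satisfy 0 < alpha <= 256")
    if assoc_voiced:
        if k_assoc is None:
            raise ValueError("a voiced associated frame needs its pitch period")
        # Penalize the jump between adjacent frames plus the frame's own error.
        return abs(k_cur - k_assoc) + alpha * err_cur
    return alpha * err_cur
```

With `alpha=10`, a candidate of 160 next to a voiced neighbour at 82 costs 78.5 despite its small error of 0.05, while a candidate of 80 with error 0.1 costs 3.0. The jump term is what penalizes a doubled pitch period.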
Fig. 3 is a flowchart of one possible implementation of step 101 in Fig. 1. On the basis of the embodiment shown in Fig. 1, as shown in Fig. 3, the step "obtaining, according to the preset cost function, the target cost value corresponding to each first pitch period of the audio signal under test in the current frame" includes:
Step 201: according to the cost function, obtain the first cost value between each first pitch period of the audio signal under test and the second pitch period in the history frame, and obtain the second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target lookahead frame, where the target lookahead frame is the last lookahead frame in time order within the lookahead frame set.
Here, the target lookahead frame is the lookahead frame that comes last in time within the lookahead frame set. Referring to Fig. 2, if the lookahead frame set contains only Frm(1), the target lookahead frame is Frm(1); if it contains Frm(1) and Frm(2), the target lookahead frame is Frm(2); if it contains Frm(1), Frm(2), and Frm(3), the target lookahead frame is Frm(3); and so on.
In this embodiment, the first cost value between each first pitch period of the audio signal under test and the second pitch period in the history frame can be calculated according to the cost function, as can the second cost value between each first pitch period and the second pitch periods in the target lookahead frame. As shown in Fig. 2, for the first pitch period B1, the first cost value between B1 in the current frame Frm(0) and the second pitch period A in the history frame Frm(-1) is denoted W(B1, A), and the second cost values between B1 and the second pitch periods D1 to D5 in the target lookahead frame Frm(2) are denoted W(B1, D1), W(B1, D2), W(B1, D3), W(B1, D4), and W(B1, D5). The path corresponding to W(B1, D1) may be B1-C1-D1, B1-C2-D1, B1-C3-D1, and so on; the remaining cases are not repeated here.
Optionally, if the lookahead frame set includes a first lookahead frame and a second lookahead frame, and the second lookahead frame is the target lookahead frame, then, as shown in Fig. 4, one possible implementation of step 201 may include:
Step 301: according to the cost function, obtain the third cost value between each first pitch period of the audio signal under test and each second pitch period in the first lookahead frame, and obtain the fourth cost value between each second pitch period of the audio signal under test in the first lookahead frame and each second pitch period in the second lookahead frame.
Taking Fig. 2 as an example, the first lookahead frame is Frm(1) and the second lookahead frame is Frm(2). For the first pitch period B1, the third cost values between B1 and each second pitch period in the first lookahead frame Frm(1) are obtained according to the cost function: W(B1, C1), W(B1, C2), W(B1, C3), W(B1, C4), and W(B1, C5). For the second pitch period C1 in the first lookahead frame Frm(1), the fourth cost values between C1 and each second pitch period in the second lookahead frame Frm(2) are obtained: W(C1, D1), W(C1, D2), W(C1, D3), W(C1, D4), and W(C1, D5). Proceeding in the same way yields multiple third cost values and multiple fourth cost values.
When calculating the fourth cost value between each second pitch period in the first lookahead frame and each second pitch period in the second lookahead frame, the cost functions above, $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$ or $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, can also be used. For example, if the second lookahead frame is unvoiced, the fourth cost value is calculated with $W(1, 2) = \alpha \cdot E_1(k_1)$; if the second lookahead frame is voiced, it is calculated with $W(1, 2) = |k_1 - k_2| + \alpha \cdot E_1(k_1)$.
Optionally, for each second pitch period of the audio signal under test in the first lookahead frame, the minimum cost value to the second pitch periods in the second lookahead frame can be obtained first and used as the fourth cost value of that second pitch period. The third cost values between each first pitch period of the audio signal under test and each second pitch period in the first lookahead frame are then combined and added with these fourth cost values to obtain the second cost value. This reduces the number of calculation steps and improves efficiency.
Taking Fig. 2 as an example:
1) Compute the minimum cost value from each second pitch period in the first lookahead frame Frm(1) to the pitch periods in the second lookahead frame Frm(2). This traverses all 25 (5 × 5) cases and yields a minimum cost value for each second pitch period in Frm(1).
2) When computing the cost from Frm(0) through Frm(1) to Frm(2), the cases from Frm(1) to Frm(2) other than the known minimum cost values no longer need to be computed; compute and record, for each of the five first pitch periods in Frm(0), the minimum cost value from Frm(0) through Frm(1) to Frm(2).
3) Because Frm(-1) holds the pitch period already obtained by the previous smoothing pass, only the paths from Frm(-1) to the 5 minimum cost values computed in step 2) need to be traversed, giving the minimum cost value of each of the five first pitch periods in Frm(0) as its target cost value.
4) Update Frm(0) to be the history frame, with the first pitch period corresponding to the smallest target cost value as the second pitch period in the history frame; update Frm(1) to be the current frame and Frm(2) to be the first lookahead frame, and start smoothing the next frame.
For example, for the second pitch period C1 in the first lookahead frame Frm(1), the cost values W(C1, D1), W(C1, D2), W(C1, D3), W(C1, D4), and W(C1, D5) between C1 and each second pitch period in the second lookahead frame Frm(2) are obtained. If W(C1, D2) is the smallest, it is taken as the fourth cost value of C1. The fourth cost values of the other second pitch periods C2 to C5 in Frm(1) are calculated in the same way.
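Steps 1) to 4) amount to a small dynamic program over four frames, sketched below. Assumptions: each frame's candidates are given as `(pitch_period, error)` pairs with `error` = $E_n(k)$ precomputed; the voicing flags of the history and lookahead frames are known; and, in the sliding driver, every frame is treated as voiced. The function names are hypothetical, and this is not presented as the patent's reference implementation.

```python
def smooth_current_frame(k_hist, hist_voiced, cur, la1, la2,
                         la1_voiced=True, la2_voiced=True, alpha=1.0):
    """Pick the target pitch period of Frm(0) from history Frm(-1) and
    lookahead frames Frm(1), Frm(2). Returns (pitch_period, target_cost)."""
    def w(k_a, err_a, k_b, b_voiced):
        # Voiced neighbour: |k_n - k_{n±1}| + alpha*E_n(k_n); unvoiced: alpha*E_n(k_n).
        return abs(k_a - k_b) + alpha * err_a if b_voiced else alpha * err_a

    # 1) Fourth cost value: cheapest link from each C in Frm(1) into Frm(2).
    fourth = [min(w(kc, ec, kd, la2_voiced) for kd, _ in la2) for kc, ec in la1]
    # 2) Cheapest path from each B in Frm(0) through Frm(1) to Frm(2).
    second = [min(w(kb, eb, kc, la1_voiced) + f
                  for (kc, _), f in zip(la1, fourth))
              for kb, eb in cur]
    # 3) Add the first cost value back to the already-smoothed history frame.
    target = [w(kb, eb, k_hist, hist_voiced) + s
              for (kb, eb), s in zip(cur, second)]
    best = min(range(len(cur)), key=target.__getitem__)
    return cur[best][0], target[best]


def smooth_track(frames, k_first, alpha=1.0):
    """4) Slide the window: each chosen period becomes the next history frame.
    All frames are assumed voiced; the last two frames lack a full
    lookahead and are left unsmoothed in this sketch."""
    k_hist, out = k_first, []
    for n in range(len(frames) - 2):
        k_hist, _ = smooth_current_frame(k_hist, True, frames[n],
                                         frames[n + 1], frames[n + 2], alpha=alpha)
        out.append(k_hist)
    return out
```

With history period 80, current candidates `[(80, 0.1), (160, 0.05), (40, 0.2)]`, and lookahead candidates near 81 and 82 plus their doubles, the sketch returns 80 with target cost 2.3. Picking the smallest frame-local error alone would have chosen the doubled value 160.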
Step 302: Obtain the second cost value according to the third cost values and the fourth cost values.
In this embodiment, the third cost values are added to the fourth cost values, or a weighted sum of the third and fourth cost values is computed, to obtain the second cost value.
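Step 302, combined with the shortcut from step 2), can be sketched as follows under the same voiced-case cost, with hypothetical names and data. Each third cost value W(B_i, C_j) is combined with the fourth cost value of C_j, either as a plain sum (`w3 = w4 = 1`) or as a weighted sum, and the smallest combination over the Frm(1) candidates is kept as the second cost value of B_i. The weights `w3` and `w4` are placeholders; the text does not give values for them.

```python
def second_costs(b_lags, b_errs, c_lags, fourth, alpha=1.0, w3=1.0, w4=1.0):
    """Second cost value of each first pitch period B_i in Frm(0).

    third(B_i, C_j) = |k_B - k_C| + alpha * E_0(k_B)            (voiced-case cost)
    second(B_i)     = min over j of w3 * third(B_i, C_j) + w4 * fourth(C_j)
    """
    result = []
    for k_b, e_b in zip(b_lags, b_errs):
        combined = [w3 * (abs(k_b - k_c) + alpha * e_b) + w4 * f
                    for k_c, f in zip(c_lags, fourth)]
        result.append(min(combined))
    return result


# fourth holds the fourth cost values of the Frm(1) candidates [80, 100, 120].
print(second_costs([98, 150], [0.5, 0.25], [80, 100, 120], [15.5, 2.25, 18.75]))
# [4.75, 49.0]
```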
In the method of this embodiment, the third cost values between each first pitch period of the audio signal under test and each second pitch period in the first look-ahead frame are obtained according to the cost function, as are the fourth cost values between each second pitch period of the audio signal under test in the first look-ahead frame and each second pitch period in the second look-ahead frame; the second cost value is then obtained from the third and fourth cost values. Because the pitch periods of at least four adjacent time frames are used to evaluate the target cost value of each first pitch period in the current frame, the target cost value better reflects how the pitch changes across time frames, which makes the smoothing of the pitch period more reliable and accurate.
Step 202: Obtain the target cost value according to the first cost value and the second cost value.
In this embodiment, the sum of the first cost value and the second cost value can be used as the target cost value; alternatively, the first and second cost values can be assigned different weights and their weighted sum used as the target cost value. For example, as shown in Fig. 2, the target cost values of the first pitch period B1 can be W(B1, A) + W(B1, D1), W(B1, A) + W(B1, D2), W(B1, A) + W(B1, D3), and so on.
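The following sketch shows one way the target cost value of each first pitch period could be formed from the first cost value (against the smoothed pitch period A of Frm(-1)) and the second cost value, and how the first pitch period with the minimum target cost value is then selected as the target pitch period. It assumes the voiced-case cost; the function name and the weights `w1` and `w2` are illustrative, and with `w1 = w2 = 1` the target cost value is the plain sum.

```python
def select_target_pitch(a_lag, b_lags, b_errs, second, alpha=1.0, w1=1.0, w2=1.0):
    """Return (target pitch period, target cost values) for the Frm(0) candidates.

    first(B_i)  = |k_B - k_A| + alpha * E_0(k_B), with k_A the smoothed pitch of Frm(-1)
    target(B_i) = w1 * first(B_i) + w2 * second(B_i)
    """
    targets = [w1 * (abs(k_b - a_lag) + alpha * e_b) + w2 * s
               for k_b, e_b, s in zip(b_lags, b_errs, second)]
    best = min(range(len(targets)), key=targets.__getitem__)  # minimum target cost value
    return b_lags[best], targets


print(select_target_pitch(100, [98, 150], [0.5, 0.25], [4.75, 49.0]))
# (98, [7.25, 99.25])
```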
Illustratively, in Fig. 2, denote the first cost value between each first pitch period in the current frame Frm(0) and the second pitch period in the historical frame Frm(-1) as W(0, -1); the third cost value between each first pitch period in Frm(0) and each second pitch period in the first look-ahead frame Frm(1) as W(0, 1); and the fourth cost value between each second pitch period in Frm(1) and each second pitch period of the audio signal under test in the second look-ahead frame Frm(2) as W(1, 2). The target cost value is then W = W(0, -1) + W(0, 1) + W(1, 2). With five candidates in each of Frm(0), Frm(1) and Frm(2), there can in theory be 5 × 5 × 5 = 125 target cost values, and the first pitch period corresponding to the smallest of these 125 values can be taken as the target pitch period.
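For checking a faster implementation, the exhaustive form of this example can be written directly: every path (B_i, C_j, D_m) is scored as W(0, -1) + W(0, 1) + W(1, 2), and the first pitch period on the cheapest path is returned. This is a minimal sketch assuming all frames are voiced and each candidate is a hypothetical (lag, error) pair. With five candidates per frame it visits the 125 paths mentioned above, whereas the precomputed-minimum variant of steps 1) to 3) needs only 25 + 25 + 5 cost evaluations.

```python
from itertools import product


def brute_force_target(a_lag, cur, la1, la2, alpha=1.0):
    """Exhaustively score every path through Frm(0), Frm(1) and Frm(2).

    cur, la1, la2: lists of (lag, error) pairs for the current frame and the two
    look-ahead frames; a_lag is the smoothed pitch period of the historical frame.
    Returns (first pitch period on the cheapest path, cost of that path).
    """
    def w(lag, err, adj_lag):
        return abs(lag - adj_lag) + alpha * err  # voiced-case W(n, n±1)

    best_lag, best_cost = None, float("inf")
    for (kb, eb), (kc, ec), (kd, _) in product(cur, la1, la2):
        total = w(kb, eb, a_lag) + w(kb, eb, kc) + w(kc, ec, kd)
        if total < best_cost:
            best_lag, best_cost = kb, total
    return best_lag, best_cost


cur = [(98, 0.5), (150, 0.25)]
la1 = [(80, 0.5), (100, 0.25), (120, 0.75)]
la2 = [(95, 0.0), (102, 0.0), (150, 0.0)]
print(brute_force_target(100, cur, la1, la2))  # (98, 7.25)
```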
In the pitch period determination method provided in this embodiment, the first cost value between each first pitch period of the audio signal under test and the second pitch period in the historical frame, and the second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target look-ahead frame, are obtained according to the cost function, and the target cost value is obtained from the first and second cost values. Because the target cost value is determined from the cost values between pitch periods in at least three time frames, the target cost value of each pitch period in the current frame serves as a more accurate reference, so that pitch periods that deviate abruptly can be filtered out and the accuracy of the pitch period is improved.
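Putting steps 1) to 4) together, a frame-by-frame smoothing loop might look like the sketch below. Each iteration treats frame n as Frm(0), frames n + 1 and n + 2 as Frm(1) and Frm(2), and the previously selected pitch period as Frm(-1); the winner becomes the new historical pitch period and the window moves on by one frame. This is an illustrative sketch assuming voiced frames throughout and hypothetical (lag, error) candidate pairs. The text does not specify how the last two frames, which lack two look-ahead frames, are handled, so the sketch simply stops before them.

```python
def smooth_track(frames, initial_pitch, alpha=1.0):
    """Sliding-window pitch smoothing over a list of per-frame candidate lists.

    frames: list of frames, each a list of (lag, error) pairs.
    initial_pitch: smoothed pitch period of the frame preceding frames[0].
    """
    def w(lag, err, adj_lag):
        return abs(lag - adj_lag) + alpha * err  # voiced-case W(n, n±1)

    history = initial_pitch
    track = []
    for n in range(len(frames) - 2):
        cur, la1, la2 = frames[n], frames[n + 1], frames[n + 2]
        # Step 1: fourth cost of each Frm(1) candidate (best continuation into Frm(2)).
        fourth = [min(w(kc, ec, kd) for kd, _ in la2) for kc, ec in la1]
        # Steps 2-3: first cost plus second cost gives each Frm(0) candidate's target cost.
        targets = [w(kb, eb, history)
                   + min(w(kb, eb, kc) + f for (kc, _), f in zip(la1, fourth))
                   for kb, eb in cur]
        # Step 4: the winner becomes the historical pitch and the window slides on.
        history = cur[min(range(len(cur)), key=targets.__getitem__)][0]
        track.append(history)
    return track


# Errors set to zero so that only the lag-jump term matters in this toy example.
frames = [
    [(100, 0.0), (200, 0.0)],
    [(101, 0.0), (50, 0.0)],
    [(200, 0.0), (102, 0.0)],  # a doubled-period candidate listed first
    [(103, 0.0), (51, 0.0)],
    [(104, 0.0)],
]
print(smooth_track(frames, initial_pitch=100))  # [100, 101, 102]
```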
It should be understood that although the steps in the flowcharts of Figs. 1-4 are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 1-4 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor are these sub-steps or stages necessarily executed sequentially, but they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Fig. 5 shows a pitch period determination apparatus provided by one embodiment. As shown in Fig. 5, the apparatus includes an acquisition module 11 and a determination module 12.
The acquisition module 11 is configured to, when the audio signal under test is a voiced signal in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the historical frame adjacent to the current frame and the look-ahead frame set following the current frame;
The determination module 12 is configured to determine, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods.
In one embodiment, as shown in Fig. 6, the acquisition module 11 includes a first acquisition submodule 111 and a second acquisition submodule 112. The first acquisition submodule 111 is configured to obtain, according to the cost function, the first cost value between each first pitch period of the audio signal under test and the second pitch period in the historical frame, and the second cost value between each first pitch period of the audio signal under test and the second pitch periods in a target look-ahead frame, the target look-ahead frame being the temporally last frame in the look-ahead frame set. The second acquisition submodule 112 is configured to obtain the target cost value according to the first cost value and the second cost value.
In one embodiment, the look-ahead frame set includes a first look-ahead frame and a second look-ahead frame, the second look-ahead frame being the target look-ahead frame. The first acquisition submodule 111 obtains the second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target look-ahead frame as follows: according to the cost function, it obtains the third cost values between each first pitch period of the audio signal under test and each second pitch period in the first look-ahead frame, and the fourth cost values between each second pitch period of the audio signal under test in the first look-ahead frame and each second pitch period of the audio signal under test in the second look-ahead frame; it then obtains the second cost value according to the third cost values and the fourth cost values.
In one embodiment, if the audio signal under test is an unvoiced signal in the associated frame, the cost function is a function constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
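The text does not spell out the error function E_n(k). One plausible choice, assumed here purely for illustration, is the normalized squared difference between the frame and its copy offset by k samples: it is 0 for a signal that repeats exactly every k samples and never exceeds 2. The function names `shift_error` and `unvoiced_cost` and the normalization are assumptions; the unvoiced-case cost α·E_n(k_n) is then just a scaled version of this error.

```python
import math


def shift_error(x, k):
    """Hypothetical E_n(k): normalized squared error between a frame and itself offset by k samples.

    Returns a value in [0, 2]; it is 0 when x repeats exactly every k samples over the overlap.
    """
    if not 0 < k < len(x):
        raise ValueError("lag k must satisfy 0 < k < len(x)")
    original, offset = x[:-k], x[k:]  # overlapping parts of the signal and its offset copy
    num = sum((u - v) ** 2 for u, v in zip(original, offset))
    den = sum(u * u + v * v for u, v in zip(original, offset))
    return num / den if den > 0 else 0.0


def unvoiced_cost(x, k, alpha=1.0):
    """W(n, n±1) = alpha * E_n(k_n), used when the associated frame is unvoiced."""
    return alpha * shift_error(x, k)


# A sine with a 50-sample period: the true period gives an error near 0,
# half the period (a sign-inverted copy) gives an error near 2.
frame = [math.sin(2 * math.pi * i / 50) for i in range(400)]
print(round(shift_error(frame, 50), 6), round(shift_error(frame, 25), 6))
```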
In one embodiment, if the audio signal under test is a voiced signal in the associated frame, the cost function is a function constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
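The two cost functions can be expressed compactly as a single function that switches on whether the associated frame is voiced: the lag-difference term penalizes pitch jumps between adjacent frames, while α weights the waveform error E_n(k_n). This is a sketch only; the function name `pitch_cost` and the `adj_voiced` flag are hypothetical. Because the lag term is measured in samples while E_n is dimensionless, the choice of α effectively sets the balance between the two terms.

```python
def pitch_cost(k_n, e_n, k_adj, alpha, adj_voiced):
    """W(n, n±1) for candidate k_n of frame n versus candidate k_adj of associated frame n±1.

    e_n is E_n(k_n). Voiced associated frame:   |k_n - k_adj| + alpha * E_n(k_n).
    Unvoiced associated frame: alpha * E_n(k_n), and k_adj is ignored.
    """
    if adj_voiced:
        return abs(k_n - k_adj) + alpha * e_n
    return alpha * e_n


print(pitch_cost(100, 0.5, 104, 2.0, True))    # 5.0
print(pitch_cost(100, 0.5, None, 2.0, False))  # 1.0
```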
In one embodiment, the determination module 12 is specifically configured to determine the first pitch period corresponding to the minimum of the target cost values as the target pitch period.
For specific limitations of the pitch period determination apparatus, refer to the limitations of the pitch period determination method above; details are not repeated here. Each module of the above apparatus may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke them to perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 7. The computer device includes a processor, a memory, a network interface and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database; the internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the cost value data. The network interface of the computer device communicates with external terminals over a network connection. The computer program, when executed by the processor, implements a pitch period determination method.
Those skilled in the art will understand that the structure shown in Fig. 7 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor implements the following steps when executing the computer program:
When the audio signal under test is a voiced signal in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the historical frame adjacent to the current frame and the look-ahead frame set following the current frame;
Determine, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods.
In one embodiment, the processor further implements the following steps when executing the computer program:
According to the cost function, obtain the first cost value between each first pitch period of the audio signal under test and the second pitch period in the historical frame, and obtain the second cost value between each first pitch period of the audio signal under test and the second pitch periods in a target look-ahead frame, the target look-ahead frame being the temporally last frame in the look-ahead frame set; then obtain the target cost value according to the first cost value and the second cost value.
In one embodiment, the processor further implements the following steps when executing the computer program:
According to the cost function, obtain the third cost values between each first pitch period of the audio signal under test and each second pitch period in the first look-ahead frame, and obtain the fourth cost values between each second pitch period of the audio signal under test in the first look-ahead frame and each second pitch period of the audio signal under test in the second look-ahead frame; then obtain the second cost value according to the third cost values and the fourth cost values.
In one embodiment, the processor further implements the following step when executing the computer program: if the audio signal under test is an unvoiced signal in the associated frame, the cost function is a function constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the processor further implements the following step when executing the computer program:
The cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
In one embodiment, the processor further implements the following step when executing the computer program:
If the audio signal under test is a voiced signal in the associated frame, the cost function is a function constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the processor further implements the following step when executing the computer program:
The cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
In one embodiment, the processor further implements the following step when executing the computer program:
Determine the first pitch period corresponding to the minimum of the target cost values as the target pitch period.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the following steps:
When the audio signal under test is a voiced signal in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, where the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in the associated frames, and the associated frames include the historical frame adjacent to the current frame and the look-ahead frame set following the current frame;
Determine, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
According to the cost function, obtain the first cost value between each first pitch period of the audio signal under test and the second pitch period in the historical frame, and obtain the second cost value between each first pitch period of the audio signal under test and the second pitch periods in a target look-ahead frame, the target look-ahead frame being the temporally last frame in the look-ahead frame set;
Obtain the target cost value according to the first cost value and the second cost value.
In one embodiment, the computer program, when executed by the processor, further implements the following steps:
According to the cost function, obtain the third cost values between each first pitch period of the audio signal under test and each second pitch period in the first look-ahead frame, and obtain the fourth cost values between each second pitch period of the audio signal under test in the first look-ahead frame and each second pitch period of the audio signal under test in the second look-ahead frame; then obtain the second cost value according to the third cost values and the fourth cost values.
In one embodiment, the computer program, when executed by the processor, further implements the following step:
If the audio signal under test is an unvoiced signal in the associated frame, the cost function is a function constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the computer program, when executed by the processor, further implements the following step:
The cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
In one embodiment, the computer program, when executed by the processor, further implements the following step:
If the audio signal under test is a voiced signal in the associated frame, the cost function is a function constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
In one embodiment, the computer program, when executed by the processor, further implements the following step:
The cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
In one embodiment, the computer program, when executed by the processor, further implements the following step:
Determine the first pitch period corresponding to the minimum of the target cost values as the target pitch period.
A person of ordinary skill in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods above. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The above embodiments express only several implementations of the present invention and are described in relative detail, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (11)

1. A pitch period determination method, characterized in that the method comprises:
when the audio signal under test is a voiced signal in the current frame, obtaining, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, wherein the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in associated frames, and the associated frames include the historical frame adjacent to the current frame and the look-ahead frame set following the current frame;
determining, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods.
2. The method according to claim 1, characterized in that obtaining, according to the preset cost function, the target cost value corresponding to each first pitch period of the audio signal under test in the current frame comprises:
obtaining, according to the cost function, the first cost value between each first pitch period of the audio signal under test and the second pitch period in the historical frame, and obtaining the second cost value between each first pitch period of the audio signal under test and the second pitch periods in a target look-ahead frame, the target look-ahead frame being the temporally last frame in the look-ahead frame set;
obtaining the target cost value according to the first cost value and the second cost value.
3. The method according to claim 2, characterized in that the look-ahead frame set includes a first look-ahead frame and a second look-ahead frame, the second look-ahead frame being the target look-ahead frame; and obtaining the second cost value between each first pitch period of the audio signal under test and the second pitch periods in the target look-ahead frame comprises:
obtaining, according to the cost function, the third cost values between each first pitch period of the audio signal under test and each second pitch period in the first look-ahead frame, and obtaining the fourth cost values between each second pitch period of the audio signal under test in the first look-ahead frame and each second pitch period of the audio signal under test in the second look-ahead frame;
obtaining the second cost value according to the third cost values and the fourth cost values.
4. The method according to any one of claims 1 to 3, characterized in that if the audio signal under test is an unvoiced signal in the associated frame, the cost function is a function constructed from the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
5. The method according to claim 4, characterized in that the cost function is $W(n, n \pm 1) = \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, and $k_n$ is the first pitch period.
6. The method according to any one of claims 1 to 3, characterized in that if the audio signal under test is a voiced signal in the associated frame, the cost function is a function constructed from the first pitch period, the second pitch period, and the error value between the audio signal under test at the first pitch period and the offset audio signal, the offset audio signal being the audio signal under test shifted by the first pitch period.
7. The method according to claim 6, characterized in that the cost function is $W(n, n \pm 1) = |k_n - k_{n \pm 1}| + \alpha \cdot E_n(k_n)$, where $n$ is the index of the current frame, $n \pm 1$ is the index of the associated frame, $\alpha$ is a smoothing factor, $E_n(k_n)$ is the error function corresponding to the first pitch period, $k_n$ is the first pitch period, and $k_{n \pm 1}$ is the second pitch period.
8. The method according to any one of claims 1 to 3, characterized in that determining, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods comprises:
determining the first pitch period corresponding to the minimum of the target cost values as the target pitch period.
9. A pitch period determination apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to, when the audio signal under test is a voiced signal in the current frame, obtain, according to a preset cost function, the target cost value of each first pitch period of the audio signal under test in the current frame, wherein the target cost value includes the cost values between each first pitch period of the audio signal under test and each second pitch period in associated frames, and the associated frames include the historical frame adjacent to the current frame and the look-ahead frame set following the current frame;
a determination module, configured to determine, according to the target cost values, the target pitch period of the audio signal under test in the current frame from among the first pitch periods.
10. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 8 when executing the computer program.
11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN201810607513.7A 2018-06-13 2018-06-13 Method and device for determining pitch period, computer equipment and storage medium Active CN108831509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810607513.7A CN108831509B (en) 2018-06-13 2018-06-13 Method and device for determining pitch period, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810607513.7A CN108831509B (en) 2018-06-13 2018-06-13 Method and device for determining pitch period, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN108831509A true CN108831509A (en) 2018-11-16
CN108831509B CN108831509B (en) 2020-12-04

Family

ID=64144995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810607513.7A Active CN108831509B (en) 2018-06-13 2018-06-13 Method and device for determining pitch period, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN108831509B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1412742A (en) * 2002-12-19 2003-04-23 北京工业大学 Pitch period detection method for speech signals based on waveform correlation
CN1975861A (en) * 2006-12-15 2007-06-06 清华大学 Method for resisting channel bit errors in vocoder pitch period parameters
CN101030375A (en) * 2007-04-13 2007-09-05 清华大学 Pitch period extraction method based on dynamic programming
KR20100022894A (en) * 2008-08-20 2010-03-03 인하대학교 산학협력단 A voiced/unvoiced decision method for the smv of 3gpp2 using gaussian mixture model
CN101887723A (en) * 2007-06-14 2010-11-17 华为终端有限公司 Fine tuning method and device for pitch period
CN103915099A (en) * 2012-12-29 2014-07-09 北京百度网讯科技有限公司 Speech pitch period detection method and device
US9135923B1 (en) * 2014-03-17 2015-09-15 Chengjun Julian Chen Pitch synchronous speech coding based on timbre vectors

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
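Wei Xuan (魏旋): "Low-delay pitch extraction algorithm based on dynamic programming", Journal of Tsinghua University (Science and Technology) *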

Also Published As

Publication number Publication date
CN108831509B (en) 2020-12-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant