Detailed description of the invention
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 shows the multi-channel voice signal synchronization method provided by an embodiment of the present invention, which specifically includes the following steps:
S101: Select a channel as the template channel and generate a corresponding voice signal energy envelope template.
The executing entity of the embodiments may be any device capable of processing voice signals, including but not limited to a personal computer, smartphone, tablet computer, smart TV, smart watch, smart band, vehicle-mounted terminal, medium or large computer, or computer cluster. The executing entity does not limit the present invention.
In the embodiments, voice signals from the same sound source may be collected over multiple channels, which include the selected template channel and at least one other channel. The energy envelope template may be extracted from part or all of the template channel's voice signal, and may consist of a single energy-envelope feature or a combination of features. Besides the energy envelope, features such as volume, frequency, timbre, or waveform shape may also be extracted from the template channel's voice signal to serve as the template for subsequent matching.
S102: Match the voice signal of each other channel against the energy envelope template, so as to determine, for each other channel, the offset between its voice signal and the voice signal of the template channel.
In the embodiments, the voice signal of the template channel and the voice signals of the other channels are the multi-channel voice signals to be synchronized.
After the energy envelope template has been generated, the voice signal of each other channel can be processed in the same way as was used to generate the template. This makes it possible to locate, in each other channel's voice signal, the portion that corresponds in time to the template channel's voice signal, and to determine the offset between these corresponding portions on the time axis for use in subsequent synchronization.
S103: According to the offsets, synchronize the voice signal of each other channel with the voice signal of the template channel.
In the embodiments, the offsets determine how far apart on the time axis the voice signals of any two channels (the template channel or any other channel) are; this difference can be taken as the difference between the time-axis positions of corresponding waveform segments in the two signals. The voice signals of the template channel and of each other channel can therefore be aligned on the time axis by shifting and/or trimming, which achieves multi-channel voice signal synchronization.
With the above method, a device can synchronize multi-channel voice signals automatically, which saves labor and improves efficiency. This solves the problem in the prior art that synchronizing multi-channel voice signals by manual adjustment wastes human resources and is very inefficient.
In the embodiments, generating the voice signal energy envelope template in step S101 specifically includes: intercepting a waveform segment from the template channel's voice signal, and computing the energy envelope of this waveform segment as the generated energy envelope template.
The intercepted waveform segment may be a part of the template channel's voice signal in which the waveform changes markedly, or a part whose waveform differs strongly from the rest of the signal. Such a segment is easier to match later, and the matching and synchronization results are more reliable. The present invention does not limit the length of the intercepted segment. In general, a longer segment gives a more reliable matching result but also takes longer to process; in most application scenarios, the segment length can be set to about 5 seconds.
In addition, when intercepting the waveform segment, the estimated maximum offset between the voice signal of any other channel and that of the template channel (hereinafter the estimated maximum offset) must also be considered. Suppose the template channel's voice signal leads those of the other channels on the time axis, that is, the template channel started capturing its voice signal earlier than the other channels. If the segment were intercepted starting right at the beginning of the template channel's voice signal, the other channels' voice signals might contain no part corresponding to it, which would compromise the reliability of the synchronization result.
To prevent this problem, the waveform segment is not intercepted from the very beginning of the template channel's voice signal, but starting at the time point that lies one estimated maximum offset after the beginning of the signal, or at a later time point. For example, if the estimated maximum offset is 10 seconds, the waveform segment can be intercepted starting at, or after, the 10th second of the template channel's voice signal.
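As an illustration of where the template segment begins, the following is a minimal sketch that assumes the template channel's voice signal is held as a list of digital samples at a known sample rate. The function name `template_segment_bounds` and its `margin_s` parameter (an optional extra delay beyond the estimated maximum offset) are hypothetical.

```python
def template_segment_bounds(num_samples, sample_rate_hz, max_offset_s,
                            segment_s, margin_s=0.0):
    """Return (start, end) sample indices of the template waveform segment.

    The segment starts no earlier than the estimated maximum offset
    (plus an optional, hypothetical extra margin), so that every other
    channel should still contain a corresponding portion.
    """
    start = round((max_offset_s + margin_s) * sample_rate_hz)
    end = start + round(segment_s * sample_rate_hz)
    if end > num_samples:
        raise ValueError("template channel signal is too short for this segment")
    return start, end


# 60 s of audio at 8 kHz, estimated maximum offset 10 s, 5 s segment
print(template_segment_bounds(60 * 8000, 8000, 10.0, 5.0))  # (80000, 120000)
```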
Further, the template channel's voice signal may be a discrete digital audio signal or a continuous analog voice signal. To reduce the amount of computation and speed up the energy envelope calculation, the template channel's voice signal can be sampled before the energy envelope is computed. Accordingly, computing the energy envelope of the waveform segment specifically includes: sampling the waveform segment to obtain a first set number of sample points; sliding a selected sliding window over the waveform segment in a set manner; and computing, from the sample points contained in the selected window during the sliding, the energy vector of the waveform segment as its energy envelope.
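The sampling and sliding-window steps can be sketched as follows. This is a sketch under stated assumptions: the input is a list of digital samples; sampling is plain decimation (keeping every `factor`-th sample, with no anti-aliasing filter); and window positions start every `step` points, so the last few windows may be shorter than `window`. The names `decimate` and `energy_vector` are hypothetical.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample (plain decimation, no low-pass filter)."""
    return samples[::factor]


def energy_vector(points, window, step):
    """Average energy (mean squared value) of the points at each window position."""
    vector = []
    for start in range(0, len(points), step):
        win = points[start:start + window]  # may be truncated near the end
        vector.append(sum(p * p for p in win) / len(win))
    return vector


segment = [1, 2, 3, 4, 5, 6, 7, 8]
points = decimate(segment, 2)                     # [1, 3, 5, 7]
print(energy_vector(points, window=2, step=2))    # [5.0, 37.0]
```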
Further, sliding the selected window over the waveform segment in a set manner and computing the energy vector of the waveform segment from the sample points contained in the window during the sliding specifically includes: sliding the selected window m times over the waveform segment with a set sliding step to generate an m-dimensional energy vector of the waveform segment, where the value of the i-th dimension of the m-dimensional energy vector is the average energy of the sample points contained in the selected window after the i-th slide; m and i are positive integers, and i is less than or equal to m.
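Written out as a formula, with s_j denoting the value of the j-th sample point, L the window length and Δ the sliding step (both in sample points; these symbols are introduced here for illustration), and assuming the first window position starts at the first sample point, the i-th component of the energy vector is:

$$
x_i = \frac{1}{L}\sum_{j=(i-1)\Delta + 1}^{(i-1)\Delta + L} s_j^{2}, \qquad i = 1, 2, \dots, m
$$

If windows near the end of the segment are allowed to be truncated, L in the denominator is replaced by the number of sample points actually inside the window.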
An example is shown in Fig. 2. The horizontal axis is the time axis and the vertical axis is the y-axis, which can represent volume. Suppose the sliding window slides m times over the intercepted waveform segment and the resulting m-dimensional energy vector is denoted [x_1, x_2, x_3, ..., x_m]. Then x_1 is the average energy of the sample points contained in the window after the 1st slide, x_2 is the average energy of the sample points contained in the window after the 2nd slide, x_3 is the average energy of the sample points contained in the window after the 3rd slide, and so on; windows after the 3rd slide are omitted from Fig. 2.
In addition, to speed up computing the average energy of the sample points contained in the sliding window, the sample points can be pre-processed once they have been determined: the sample values of each group of consecutive sample points (for example, every 16 or every 8 points) are averaged, and the resulting average is assigned as the new sample value of every point in that group. Here, a sample point's sample value is its value on the y-axis, and its energy equals the square of its sample value. The energy vector of the waveform segment is then computed with the sliding window on the pre-processed sample points. This not only speeds up the average-energy computation but also removes high-frequency random perturbations from the voice signal.
In practical applications, each of the above variable parameters can be assigned a suitable value before use. For example, Fig. 3 shows, for one chosen set of parameter values, the process of generating the energy envelope template from the template channel's voice signal, where the template channel is channel 1 and the other channels are channels 2, 3 and 4. The process may specifically include the following steps:
S301: Starting at the 10th second of channel 1's voice signal, intercept a waveform segment 5 seconds long; here it is assumed that the estimated maximum offset is no more than 10 seconds.
S302: Sample the intercepted waveform segment at a sampling interval of 1 millisecond, obtaining 5000 sample points.
S303: Starting from the 1st sample point, average the sample values of every 16 sample points and reassign that average as the sample value of each of those 16 points, until the sample values of all sample points have been reassigned.
Performing S303 pre-processes the sample points. For ease of description, sample points whose sample values have been reassigned are referred to below as pre-processed sample points.
S304: Slide a sliding window 32 milliseconds long over the sampled waveform segment with a sliding step of 16 milliseconds, 313 times in succession, to generate a 313-dimensional energy vector, where the value of the i-th dimension is the average energy of the pre-processed sample points contained in the window after the i-th slide; i is a positive integer and i is less than or equal to 313.
S305: Take the generated 313-dimensional energy vector as the energy envelope of the intercepted waveform segment, that is, as the energy envelope template.
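Steps S301 to S305 can be strung together into one sketch. It assumes a digital input whose sample rate `fs` is an integer multiple of 1000 Hz, so that sampling at a 1 ms interval is plain decimation. It also assumes window positions start every 16 points up to the end of the 5000 points, with the last windows truncated; this yields the 313 dimensions stated in S304, whereas keeping only full-length 32-point windows would give 311. The function name `envelope_template` and the test signal are hypothetical.

```python
import math


def envelope_template(signal, fs, start_s=10.0, length_s=5.0, point_ms=1,
                      group=16, window_ms=32, step_ms=16):
    """Sketch of S301-S305: return the energy envelope template of `signal`."""
    # S301: intercept a 5 s segment starting at the 10th second
    begin = round(start_s * fs)
    end = begin + round(length_s * fs)
    if end > len(signal):
        raise ValueError("signal too short for the requested segment")
    segment = signal[begin:end]
    # S302: sample at a 1 ms interval (assumes fs is a multiple of 1000 / point_ms)
    factor = round(fs * point_ms / 1000)
    points = segment[::factor]
    # S303: replace each run of 16 points by the run's mean value
    smoothed = []
    for i in range(0, len(points), group):
        chunk = points[i:i + group]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    # S304: 32 ms window, 16 ms step, average energy at each window position
    window = window_ms // point_ms
    step = step_ms // point_ms
    vector = []
    for i in range(0, len(smoothed), step):
        win = smoothed[i:i + window]  # windows near the end are truncated
        vector.append(sum(v * v for v in win) / len(win))
    # S305: the energy vector is the energy envelope template
    return vector


fs = 8000
signal = [math.sin(2 * math.pi * 20 * n / fs) for n in range(16 * fs)]  # 16 s test tone
template = envelope_template(signal, fs)
print(len(template))  # 313
```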
In repeated practical tests with the above parameter values, the synchronization accuracy for multi-channel voice signals reached 100%; the theoretical error in aligning the voice signals during synchronization is 16 milliseconds, and the measured error is about 100 milliseconds.
In the embodiments, for step S102, the offset between the voice signal of each other channel and the voice signal of the template channel can be determined as shown in Fig. 4, which specifically includes the following steps:
S401: Starting at the beginning of the other channel's voice signal, and using the same method as was used to intercept the waveform segment from the template channel's voice signal, successively intercept a second set number of waveform segments whose length equals that of the segment intercepted from the template channel's voice signal.
S402: Using the same sampling and energy envelope computation as for the template channel's waveform segment, sample each of the second set number of waveform segments and compute its energy envelope.
S403: Among the second set number of waveform segments, determine the segment whose energy envelope best matches the energy envelope of the segment intercepted from the template channel's voice signal.
In the embodiments, the m-dimensional energy vector of the waveform segment intercepted from the template channel's voice signal can be denoted [x_1, x_2, ..., x_m], and the m-dimensional energy vector of the n-th of the second set number of waveform segments can be denoted [y_{n1}, y_{n2}, ..., y_{nm}], where n ranges from 1 to the second set number;
compute the distance between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m], where k_n is an energy gain coefficient; and
determine the waveform segment with the smallest computed distance as the segment whose energy envelope best matches the energy envelope of the segment intercepted from the template channel's voice signal.
It should be noted that, in practical applications, the distance between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m] can be measured in various ways, including but not limited to the mean square deviation or the Euclidean distance.
For example, when the distance is measured by the mean square deviation, computing the distance between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m] may specifically include: computing the mean square deviation $\mathrm{MSE}_n = \frac{1}{m}\sum_{i=1}^{m}\left(y_{ni} - k_n x_i\right)^2$ between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m], and taking it as the distance between them.
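One possible way to choose k_n, assumed here for illustration rather than taken from the method as described, is the value that minimises the mean square deviation (1/m) Σ (y_{ni} − k_n x_i)^2 for the n-th segment. Setting the derivative with respect to k_n to zero gives a least-squares gain:

$$
\frac{\partial}{\partial k_n}\,\frac{1}{m}\sum_{i=1}^{m}\left(y_{ni}-k_n x_i\right)^2
= -\frac{2}{m}\sum_{i=1}^{m} x_i\left(y_{ni}-k_n x_i\right) = 0
\;\Longrightarrow\;
k_n = \frac{\sum_{i=1}^{m} x_i\, y_{ni}}{\sum_{i=1}^{m} x_i^{2}}
$$

With this choice, a candidate whose envelope is an exact scaled copy of the template (y_{ni} = c x_i for every i) receives k_n = c and a mean square deviation of 0, whatever the volume ratio c.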
The coefficient k_n is explained as follows. Even for corresponding waveform segments in the multi-channel voice signals, the volumes of these segments may differ considerably from one another, so the amplitudes of their energy envelopes may also differ considerably, and the matching degree may turn out to be low. To solve this problem, k_n can be used, with the amplitude of the template channel's energy envelope as the reference, to bring the energy envelope of each other channel's voice signal and that of the template channel to roughly the same amplitude level. The energy envelope of the template channel's voice signal can then be used to locate the corresponding waveform segment in each other channel's voice signal more reliably.
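The selection in S403 can be sketched as follows. The sketch assumes the energy envelopes are equal-length Python lists and, as an assumed choice, takes k_n as the least-squares gain Σ x_i y_{ni} / Σ x_i^2; the function names are hypothetical. The example shows why a gain is needed: the best candidate is three times louder than the template yet still matches exactly.

```python
def gain(x, y):
    """Least-squares energy gain k such that k * x is closest to y (an assumed choice of k_n)."""
    denom = sum(a * a for a in x)
    return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def mean_square_deviation(x, y, k):
    """(1/m) * sum over i of (y_i - k * x_i)^2 for equal-length vectors x and y."""
    return sum((b - k * a) ** 2 for a, b in zip(x, y)) / len(x)


def best_match(template, candidates):
    """Return (index, distances) of the candidate envelope closest to the scaled template."""
    distances = [mean_square_deviation(template, y, gain(template, y)) for y in candidates]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances


template = [1.0, 4.0, 2.0, 0.5]
candidates = [
    [2.0, 2.0, 2.0, 2.0],   # flat envelope
    [3.0, 12.0, 6.0, 1.5],  # same shape as the template, three times louder
    [4.0, 1.0, 0.5, 2.0],   # different shape
]
index, distances = best_match(template, candidates)
print(index)  # 1
```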
S404: Determine the time-axis difference between the best-matching waveform segment and the segment intercepted from the template channel's voice signal, and take it as the offset between the other channel's voice signal and the template channel's voice signal.
Here, the difference is the time-axis difference between the start of the best-matching waveform segment and the start of the segment intercepted from the template channel's voice signal. For example, Fig. 5 is a schematic diagram of the offset between the voice signal of one other channel and the template channel's voice signal of Fig. 2. As can be seen, if the start of the best-matching segment in this other channel's voice signal lies at t on the time axis, and the start of the segment intercepted from the template channel's voice signal lies at γ, then the offset τ between this other channel's voice signal and the template channel's voice signal is τ = t − γ.
In the embodiments, a channel with a larger offset evidently started capturing earlier, and the channel with the smallest offset started the latest. Therefore, relative to the channel with the smallest offset, the voice signal of every other channel must have a portion trimmed from its beginning before it can be aligned with the voice signal of the channel with the smallest offset.
Based on this analysis, for step S103, synchronizing the voice signal of each other channel with the voice signal of the template channel according to the offsets specifically includes: determining the smallest of the offsets corresponding to the voice signals of the other channels; and, for the voice signal of each other channel, performing the following operation: starting from the beginning of the voice signal, trimming off a waveform segment whose length is the difference between that voice signal's offset and the smallest offset, and aligning the trimmed voice signal with the voice signal corresponding to the smallest offset.
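A minimal sketch of this trimming-based alignment follows. It assumes offsets in seconds, signals as lists of samples at one common sample rate, and, as an added assumption, lets the template channel take part with an offset of 0, so the reference is the smallest offset over all channels. The function `align_by_trimming` and the channel names are hypothetical.

```python
def align_by_trimming(signals, offsets_s, fs):
    """Trim each channel's start by (offset - smallest offset) so all channels start together.

    signals:   dict mapping channel name -> list of samples
    offsets_s: dict mapping channel name -> offset tau in seconds
    fs:        common sample rate in Hz
    """
    min_offset = min(offsets_s.values())
    aligned = {}
    for name, samples in signals.items():
        cut = round((offsets_s[name] - min_offset) * fs)  # samples to drop at the start
        aligned[name] = samples[cut:]
    return aligned


fs = 4  # a toy sample rate keeps the example readable
signals = {name: list(range(10)) for name in ("template", "ch2", "ch3")}
offsets = {"template": 0.0, "ch2": 0.5, "ch3": -0.25}
aligned = align_by_trimming(signals, offsets, fs)
print({name: s[:3] for name, s in aligned.items()})
# {'template': [1, 2, 3], 'ch2': [3, 4, 5], 'ch3': [0, 1, 2]}
```

Here ch3 has the smallest offset (it started capturing last), so it is left untouched while the template channel loses 0.25 s and ch2 loses 0.75 s from the start.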
Of course, besides aligning the multi-channel voice signals with the channel of the smallest offset as the reference, the signals can also be aligned with the voice signal of any other channel as the reference. For example, with the template channel's voice signal as the reference, each other channel's voice signal can be aligned with the template channel's voice signal by translating it along the time axis by a distance equal to its offset: to the left when the offset is positive, to the right when the offset is negative, and not at all when the offset is 0.
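The template-referenced variant can be sketched as below, assuming offsets in seconds and signals as lists of samples at sample rate `fs`. Shifting right is modelled by prepending `fill` samples (silence by default), which is an assumed padding choice; the name `shift_to_template` is hypothetical.

```python
def shift_to_template(samples, offset_s, fs, fill=0.0):
    """Translate a channel along the time axis by its offset relative to the template.

    Positive offset: shift left (drop the leading samples).
    Negative offset: shift right (prepend `fill` samples; an assumed padding choice).
    Zero offset: unchanged.
    """
    n = round(abs(offset_s) * fs)
    if offset_s > 0:
        return samples[n:]
    if offset_s < 0:
        return [fill] * n + samples
    return list(samples)


print(shift_to_template([1, 2, 3, 4], 0.5, fs=4))    # [3, 4]
print(shift_to_template([1, 2, 3, 4], -0.25, fs=4))  # [0.0, 1, 2, 3, 4]
```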
In practical applications, the multi-channel voice signals can be processed in parallel, and synchronization performed after all offsets have been determined. Fig. 6 is a sketch of the process of processing and synchronizing multi-channel voice signals in parallel as described above. The topmost channel is the template channel, and below it are 3 other channels. When matching the voice signal of each other channel, the energy envelope of each waveform segment intercepted from that channel can be matched in turn against the energy envelope of the segment intercepted from the template channel's voice signal (matching can be performed each time a segment is intercepted, or after several segments have been intercepted; Fig. 6 uses the former). For ease of description, this process is referred to as matching scanning. Matching scanning generates a sequence of mean square deviations, and the waveform segment corresponding to the smallest mean square deviation in the sequence is identified as the segment of this other channel that corresponds to the segment intercepted from the template channel's voice signal. The offset between this other channel's voice signal and the template channel's voice signal can then be determined, and synchronization performed based on this offset.
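The matching scan can be sketched at the level of energy envelopes. As a simplifying assumption, candidate segments of the other channel start one envelope step apart, so instead of re-intercepting each segment, the sketch slides the m-dimensional template envelope along one long envelope of the other channel and reads the n-th candidate as the m consecutive values starting at position n. The gain k_n is again taken as the least-squares value (an assumed choice), and the offset follows from τ = t − γ with t = n × step. The function `match_scan` and the toy data are hypothetical.

```python
import random


def match_scan(template_env, other_env, step_s, template_start_s):
    """Return (tau, mse_sequence): the offset in seconds and the mean-square-deviation sequence."""
    m = len(template_env)
    xx = sum(x * x for x in template_env)
    if xx == 0:
        raise ValueError("template envelope is all zeros")
    mse_sequence = []
    for n in range(len(other_env) - m + 1):
        y = other_env[n:n + m]
        k = sum(x * v for x, v in zip(template_env, y)) / xx  # least-squares gain k_n
        mse_sequence.append(sum((v - k * x) ** 2 for x, v in zip(template_env, y)) / m)
    best = min(range(len(mse_sequence)), key=mse_sequence.__getitem__)
    t = best * step_s  # start of the best-matching segment on the time axis
    return t - template_start_s, mse_sequence


rng = random.Random(0)
other_env = [rng.uniform(0.1, 1.0) for _ in range(1200)]  # toy envelope, 16 ms per value
# Template: a quieter copy of the other channel's envelope from position 700 onward
template_env = [v / 2 for v in other_env[700:1013]]
tau, mses = match_scan(template_env, other_env, step_s=0.016, template_start_s=10.0)
print(round(tau, 3))  # 1.2
```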
The above is the multi-channel voice signal synchronization method provided by the embodiments of the present invention. Based on the same idea, the embodiments also provide a corresponding multi-channel voice signal synchronization device, as shown in Fig. 7.
Fig. 7 is a schematic structural diagram of the multi-channel voice signal synchronization device provided by an embodiment of the present invention, which specifically includes:
A generation module 701, configured to select a channel as the template channel and generate a corresponding voice signal energy envelope template;
A determination module 702, configured to match the voice signal of each other channel against the energy envelope template, so as to determine the offset between the voice signal of each other channel and the voice signal of the template channel;
A synchronization module 703, configured to synchronize the voice signal of each other channel with the voice signal of the template channel according to the offsets.
The generation module 701 is specifically configured to: intercept a waveform segment from the voice signal of the template channel; sample the waveform segment to obtain a first set number of sample points; slide a selected sliding window over the waveform segment in a set manner; and compute, from the sample points contained in the selected window during the sliding, the energy vector of the waveform segment as the generated voice signal energy envelope template.
The determination module 702 is specifically configured to: starting at the beginning of the other channel's voice signal, and using the same method as was used to intercept the waveform segment from the template channel's voice signal, successively intercept a second set number of waveform segments whose length equals that of the segment intercepted from the template channel's voice signal;
using the same sampling and energy envelope computation as for the template channel's waveform segment, sample each of the second set number of waveform segments and compute its energy envelope;
among the second set number of waveform segments, determine the segment whose energy envelope best matches the energy envelope of the segment intercepted from the template channel's voice signal; and
determine the time-axis difference between the best-matching waveform segment and the segment intercepted from the template channel's voice signal, as the offset between the other channel's voice signal and the template channel's voice signal.
The determination module 702 is specifically configured to slide the selected sliding window m times over the waveform segment with a set sliding step to generate an m-dimensional energy vector of the waveform segment, where the value of the i-th dimension of the m-dimensional energy vector is the average energy of the sample points contained in the selected window after the i-th slide; m and i are positive integers, and i is less than or equal to m.
The determination module 702 is specifically configured to: denote the m-dimensional energy vector of the waveform segment intercepted from the template channel's voice signal as [x_1, x_2, ..., x_m], and the m-dimensional energy vector of the n-th of the second set number of waveform segments as [y_{n1}, y_{n2}, ..., y_{nm}], where n ranges from 1 to the second set number;
compute the distance between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m], where k_n is an energy gain coefficient; and
determine the waveform segment with the smallest computed distance as the segment whose energy envelope best matches the energy envelope of the segment intercepted from the template channel's voice signal.
The determination module 702 is specifically configured to compute, using the formula $\mathrm{MSE}_n = \frac{1}{m}\sum_{i=1}^{m}\left(y_{ni} - k_n x_i\right)^2$, the mean square deviation between [y_{n1}, y_{n2}, ..., y_{nm}] and k_n × [x_1, x_2, ..., x_m], and take it as the distance between them.
The synchronization module 703 is specifically configured to determine the smallest of the offsets corresponding to the voice signals of the other channels and, for the voice signal of each other channel, perform the following operation: starting from the beginning of the voice signal, trim off a waveform segment whose length is the difference between that voice signal's offset and the smallest offset, and align the trimmed voice signal with the voice signal corresponding to the smallest offset.
Specifically, the device shown in Fig. 7 may be located on any device capable of processing voice signals.
In the embodiments of the present invention, the above functional modules may be implemented by a hardware processor.
With the multi-channel voice signal synchronization method and device provided by the embodiments of the present invention, the energy envelopes of waveform segments intercepted from each channel are matched against an energy envelope template generated from a waveform segment intercepted from the template channel, which determines the offset between each channel's voice signal and the template channel's voice signal. Each channel's voice signal is then trimmed according to its offset relative to the template channel's voice signal, achieving synchronization of the multi-channel voice signals. This saves labor and improves efficiency, and solves the problem in the prior art that synchronizing multi-channel voice signals by manual adjustment wastes human resources and is very inefficient.
The device embodiment described above is merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment. Persons of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, persons skilled in the art can clearly understand that each implementation may be realized by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on such an understanding, the above technical solutions, or the part of them that contributes over the prior art, may essentially be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the method described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.