CN105989846A

CN105989846A - Multi-channel speech signal synchronization method and device

Info

Publication number: CN105989846A
Application number: CN201510321268.XA
Authority: CN
Inventors: 王育军
Original assignee: Leshi Zhixin Electronic Technology Tianjin Co Ltd
Current assignee: Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date: 2015-06-12
Filing date: 2015-06-12
Publication date: 2016-10-05
Anticipated expiration: 2035-06-12
Also published as: CN105989846B

Abstract

The invention provides a multi-channel speech signal synchronization method and device. The method includes: selecting one channel as the template channel and generating the corresponding speech signal energy envelope template; matching the speech signal of each other channel against the energy envelope template to determine the offset between that channel's speech signal and the speech signal of the template channel; and synchronizing the speech signal of each other channel with the speech signal of the template channel according to the offsets. The method and device address the wasted labor and low efficiency of the prior-art approach, in which multi-channel speech signals are synchronized by manual adjustment.

Description

Multi-channel speech signal synchronization method and device

Technical field

Embodiments of the present invention relate to the field of speech signal processing, and in particular to a multi-channel speech signal synchronization method and device.

Background art

At present, in the field of speech signal processing, speech signals often need to be collected separately from multiple channels for research on noise suppression, speech recognition, and similar topics. Each of the multiple channels may be the speech signal input or output channel provided by any speech capture device.

In practice, however, the speech signals collected separately from multiple channels (hereinafter: multi-channel speech signals) may not be synchronized with one another, that is, they may be misaligned on the time axis. For example, to study how the same sound source is perceived in the far field and in the near field, one speech capture device (such as a mobile phone) can record close to the sound source while another (such as a microphone) records farther away. Because the mobile phone and the microphone may not start recording at the same moment, the speech signals collected from the mobile phone's channel and from the microphone's channel may be out of sync. Using unsynchronized multi-channel speech signals such as these in subsequent research may reduce the reliability of the results.

To address this problem, the prior art generally synchronizes unsynchronized multi-channel speech signals by manual adjustment. Specifically, a researcher inspects the waveform of each channel's speech signal and then aligns the channels by hand according to the shapes of the waveforms. This approach wastes labor and is very inefficient.

Summary of the invention

Embodiments of the present invention provide a multi-channel speech signal synchronization method and device, to solve the problem that the prior-art approach of synchronizing multi-channel speech signals by manual adjustment wastes labor and has low efficiency and accuracy.

An embodiment of the present invention provides a multi-channel speech signal synchronization method, including:

selecting a channel as the template channel and generating the corresponding speech signal energy envelope template;

matching the energy envelope of the speech signal of each other channel against the energy envelope template, to determine the offset between the speech signal of each other channel and the speech signal of the template channel;

synchronizing the speech signal of each other channel with the speech signal of the template channel according to the offsets.

An embodiment of the present invention also provides a multi-channel speech signal synchronization device, including:

a generation module, configured to select a channel as the template channel and generate the corresponding speech signal energy envelope template;

a determining module, configured to match the speech signal of each other channel against the energy envelope template, to determine the offset between the speech signal of each other channel and the speech signal of the template channel;

a synchronization module, configured to synchronize the speech signal of each other channel with the speech signal of the template channel according to the offsets.

The multi-channel speech signal synchronization method and device provided by the embodiments of the present invention match the energy envelopes of waveform segments cut from each channel against the energy envelope template generated from a waveform segment cut from the template channel, determine the offset between each channel's speech signal and the template channel's speech signal, and trim each channel's speech signal according to its offset relative to the template channel. This synchronizes the multi-channel speech signals automatically, saving labor and improving efficiency, and solves the problem that the prior-art manual adjustment approach wastes labor and is very inefficient.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 shows the multi-channel speech signal synchronization process provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram, provided by an embodiment of the present invention, of calculating the energy vector of a waveform segment by sliding a selected sliding window along it;

Fig. 3 shows the process, provided by an embodiment of the present invention, of generating the energy envelope template from the speech signal of the template channel using selected parameter values in a practical application;

Fig. 4 shows the process, provided by an embodiment of the present invention, of determining, for the speech signal of each other channel, the offset between that speech signal and the speech signal of the template channel;

Fig. 5 is a schematic diagram, provided by an embodiment of the present invention, of the offset between the speech signal of another channel and the speech signal of the template channel;

Fig. 6 is a sketch, provided by an embodiment of the present invention, of processing multi-channel speech signals in parallel and synchronizing them;

Fig. 7 is a schematic structural diagram of the multi-channel speech signal synchronization device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 shows the multi-channel speech signal synchronization process provided by an embodiment of the present invention, which specifically includes the following steps:

S101: Select a channel as the template channel and generate the corresponding speech signal energy envelope template.

The embodiments of the present invention may be executed by any device capable of processing speech signals, including but not limited to: personal computers, smartphones, tablet computers, smart TVs, smart watches, smart bands, vehicle-mounted terminals, medium and large computers, computer clusters, and so on. The executing device does not limit the invention.

In the embodiments of the present invention, speech signals may be collected from the same sound source over multiple channels, which include the selected template channel and at least one other channel. The energy envelope template may be a single feature or a combination of features, relating to the energy envelope, extracted from part or all of the speech signal of the template channel. Besides the energy envelope, features such as volume, frequency, timbre, or waveform shape may also be extracted from the speech signal of the template channel and used as the template for subsequent matching.

S102: Match the speech signal of each other channel against the energy envelope template, to determine the offset between the speech signal of each other channel and the speech signal of the template channel.

In the embodiments of the present invention, the speech signal of the template channel and the speech signals of the other channels together are the multi-channel speech signals to be synchronized.

After the energy envelope template is generated, the speech signal of each other channel can be processed in a similar way, using the same method that generated the template. This makes it possible to identify, in the speech signal of each other channel, the part that corresponds to the template channel's speech signal on the time axis, and to determine the offset on the time axis between the corresponding parts, for use in the subsequent synchronization.

S103: Synchronize the speech signal of each other channel with the speech signal of the template channel according to the offsets.

In the embodiments of the present invention, the offsets determine the time difference on the time axis between the speech signals of any two channels among the template channel and the other channels (this can be the difference between the time-axis positions of the corresponding waveform segments in the two speech signals). The speech signal of the template channel and the speech signals of the other channels can therefore be aligned on the time axis by shifting and/or trimming, which synchronizes the multi-channel speech signals.

With the above method, a device can synchronize multi-channel speech signals automatically, saving labor and improving efficiency. It therefore solves the problem that the prior-art manual adjustment approach wastes labor and is very inefficient.

In the embodiments of the present invention, generating the corresponding speech signal energy envelope template in step S101 specifically includes: cutting a waveform segment from the speech signal of the template channel and calculating the energy envelope of that waveform segment, which serves as the speech signal energy envelope template.

The waveform segment may be a part of the template channel's speech signal where the waveform changes markedly, or a part whose waveform differs strongly from the rest, and so on. This makes subsequent matching easier and makes the matching and synchronization results more reliable. The invention does not limit the length of the waveform segment. In general, the longer the segment, the more reliable the subsequent matching result, but the longer the processing time. In most application scenarios the segment length can be set to about 5 seconds.

In addition, cutting the waveform segment must take into account the estimated largest offset between the speech signal of any other channel and the speech signal of the template channel (hereinafter: the estimated maximum offset). Suppose the speech signal of the template channel leads that of another channel on the time axis (the template channel started collecting the speech signal earlier than the other channel did). If the waveform segment is cut right from the start of the template channel's speech signal, the other channel's speech signal may contain no part corresponding to it, which harms the reliability of the subsequent synchronization result.

To avoid this problem, the waveform segment is not cut right from the start of the template channel's speech signal. Instead, cutting starts at the time point that lies the estimated maximum offset after the start of the signal, or at any later time point. For example, if the estimated maximum offset is 10 seconds, cutting can start at or after the 10th second of the template channel's speech signal.
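
The choice of cutting position can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `cut_template_segment`, its parameters, and the sample rate are assumptions introduced here, and the 10-second and 5-second defaults simply repeat the example values in the text.

```python
def cut_template_segment(signal, sample_rate_hz, max_offset_s=10.0, length_s=5.0):
    """Return (start_index, segment) for the template waveform segment.

    Hypothetical helper: cutting starts at the estimated maximum offset,
    so every other channel should still contain the same material.
    """
    start = int(round(max_offset_s * sample_rate_hz))
    stop = start + int(round(length_s * sample_rate_hz))
    if stop > len(signal):
        raise ValueError("signal is too short for the requested template segment")
    return start, signal[start:stop]


# Toy signal: 20 s at an assumed 1000 samples per second.
signal = list(range(20_000))
start, segment = cut_template_segment(signal, sample_rate_hz=1000)
print(start, len(segment))  # 10000 5000
```

Starting later than the estimated maximum offset is also allowed by the text; the sketch uses the earliest permitted position only to keep the example short.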

Further, the speech signal of the template channel may be a discrete digital speech signal or a continuous analog speech signal. To reduce the amount of computation and speed up the calculation of the energy envelope, the speech signal of the template channel can be sampled before the energy envelope is calculated. Accordingly, calculating the energy envelope of the waveform segment specifically includes: sampling the waveform segment to obtain a first set number of sampling points; sliding a selected sliding window along the waveform segment in a set manner; and calculating the energy vector of the waveform segment from the sampling points contained in the sliding window during the sliding process, the energy vector serving as the energy envelope of the waveform segment.

Further, sliding the selected sliding window along the waveform segment in a set manner and calculating the energy vector of the waveform segment from the sampling points contained in the sliding window specifically includes: sliding the selected sliding window m times along the waveform segment with a set sliding step to generate an m-dimensional energy vector of the waveform segment, where the value of the i-th dimension is the average energy of the sampling points contained in the sliding window after the i-th slide; m and i are positive integers, and i is less than or equal to m.

For example, see Fig. 2. The horizontal axis is the time axis and the vertical axis is the y-axis, which can represent volume. Suppose the sliding window slides m times along the cut waveform segment and the resulting m-dimensional energy vector is denoted [x₁, x₂, x₃, ..., x_m]. Then x₁ is the average energy of the sampling points contained in the sliding window after the 1st slide, x₂ is the average energy after the 2nd slide, x₃ is the average energy after the 3rd slide, and so on. Sliding windows after the 3rd are omitted from Fig. 2.
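
The energy vector described above can be sketched as follows. The function name `energy_vector` and the toy data are assumptions made for illustration. The sketch also assumes that the window position "after the 1st slide" coincides with the start of the segment, which the text does not state explicitly, and it takes the energy of a sampling point to be the square of its sampled value.

```python
def energy_vector(samples, window, step, m):
    """m-dimensional energy vector: mean of squared samples per window position."""
    vector = []
    for i in range(m):
        chunk = samples[i * step : i * step + window]
        if not chunk:
            raise ValueError("window slid past the end of the segment")
        vector.append(sum(s * s for s in chunk) / len(chunk))
    return vector


samples = [1, 1, 2, 2, 3, 3, 4, 4]
print(energy_vector(samples, window=4, step=2, m=3))  # [2.5, 6.5, 12.5]
```

With a step smaller than the window length, consecutive windows overlap, so the vector follows the envelope more smoothly than non-overlapping windows would.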

In addition, to speed up calculating the average energy of the sampling points in a sliding window, the sampling points can be preprocessed once they have been determined: the sampled values of every few consecutive sampling points (every 16, every 8, and so on) are averaged, and the resulting average is assigned as the sampled value of each of those sampling points. Here the sampled value of a sampling point can be its value on the y-axis, and the energy of a sampling point equals the square of its sampled value. The energy vector of the waveform segment is then calculated with the sliding window on the preprocessed sampling points. This not only speeds up the average-energy calculation but also removes high-frequency random disturbances from the speech signal.
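
A minimal sketch of this preprocessing step is shown below. The function name `block_average` is an assumption, and so is the handling of a final block shorter than the block size, which the text does not specify; here the short block is simply averaged over the points it has.

```python
def block_average(samples, block=16):
    """Replace each run of `block` consecutive samples by the run's mean."""
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        mean = sum(chunk) / len(chunk)
        out.extend([mean] * len(chunk))  # every point in the run gets the mean
    return out


print(block_average([1, 3, 5, 7, 10], block=2))  # [2.0, 2.0, 6.0, 6.0, 10.0]
```

The output has the same length as the input, so the sliding window in the next step can be applied unchanged.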

In practice, each of the variable parameters above is given a suitable value before use. For example, Fig. 3 shows the process of generating the energy envelope template from the speech signal of the template channel after one set of values has been assigned to the parameters. Here the template channel is channel 1 and the other channels are channels 2, 3, and 4. The process may include the following steps:

S301: Starting at the 10th second of the speech signal of channel 1, cut a waveform segment 5 seconds long, assuming the estimated maximum offset is no more than 10 seconds.

S302: Sample the cut waveform segment at a sampling interval of 1 millisecond, obtaining 5000 sampling points.

S303: Starting from the 1st sampling point, compute the average of the sampled values of every 16 sampling points and assign that average as the sampled value of those 16 sampling points, until the sampled values of all sampling points have been reassigned.

Performing S303 preprocesses the sampling points. For ease of description, sampling points whose sampled values have been reassigned are referred to below as preprocessed sampling points.

S304: Slide a sliding window 32 milliseconds long along the sampled waveform segment 313 times in succession with a sliding step of 16 milliseconds, generating a 313-dimensional energy vector, where the value of the i-th dimension is the average energy of the preprocessed sampling points contained in the sliding window after the i-th slide; i is a positive integer less than or equal to 313.

S305: Take the generated 313-dimensional energy vector as the energy envelope of the cut waveform segment, that is, as the energy envelope template.
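
Steps S301 to S305 can be put together in one short sketch. Several details are assumptions made here rather than statements of the patent: the toy signal and its 8000 Hz rate, sampling by keeping one raw sample per millisecond, and letting the last window positions be shorter than 32 points. The last assumption is what yields 313 positions from 5000 points with a 16-point step, since 5000 / 16 rounds up to 313.

```python
import math

RATE_HZ = 8000  # assumed recording rate of the toy signal


def make_template(signal, rate_hz=RATE_HZ):
    # S301: cut 5 s starting at the 10th second.
    segment = signal[10 * rate_hz : 15 * rate_hz]
    # S302: keep one sample per millisecond, giving 5000 points.
    points = segment[:: rate_hz // 1000]
    # S303: replace each run of 16 points by the run's mean.
    smoothed = []
    for s in range(0, len(points), 16):
        chunk = points[s:s + 16]
        smoothed.extend([sum(chunk) / len(chunk)] * len(chunk))
    # S304: 32-point window, 16-point step; trailing windows are truncated.
    template = []
    for s in range(0, len(smoothed), 16):
        chunk = smoothed[s:s + 32]
        template.append(sum(v * v for v in chunk) / len(chunk))
    # S305: the energy vector is the energy envelope template.
    return template


signal = [math.sin(0.001 * n) * math.sin(0.37 * n) for n in range(20 * RATE_HZ)]
template = make_template(signal)
print(len(template))  # 313
```

A production version would normally low-pass filter before reducing the sample rate; the sketch skips that to stay close to the steps as written.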

In repeated practical tests with the above parameter values, the synchronization accuracy for multi-channel speech signals reached 100%. The theoretical error in aligning the speech signals during synchronization is 16 milliseconds, and the measured error is about 100 milliseconds.

In the embodiments of the present invention, for step S102, the offset between the speech signal of each other channel and the speech signal of the template channel can be determined for each other channel in turn. As shown in Fig. 4, this specifically includes the following steps:

S401: Starting from the start of the speech signal of the other channel, and using the same method used to cut the waveform segment from the speech signal of the template channel, successively cut a second set number of waveform segments, each the same length as the waveform segment cut from the speech signal of the template channel.

S402: Using the same sampling and energy envelope calculation methods applied to the template channel's waveform segment, sample each of the second set number of waveform segments and calculate its energy envelope.

S403: Among the second set number of waveform segments, determine the waveform segment whose energy envelope best matches the energy envelope of the waveform segment cut from the speech signal of the template channel.

In the embodiments of the present invention, the m-dimensional energy vector of the waveform segment cut from the speech signal of the template channel can be denoted [x₁, x₂, ..., x_m], and the m-dimensional energy vector of the n-th of the second set number of waveform segments can be denoted [y_n1, y_n2, ..., y_nm], where n is a positive integer no greater than the second set number;

Calculate the distance between [y_n1, y_n2, ..., y_nm] and k_n × [x₁, x₂, ..., x_m], where k_n is the energy gain coefficient,

k_{n} = \frac{\sum_{i=1}^{m} x_{i} y_{ni}}{\sum_{i=1}^{m} x_{i}^{2}};

The waveform segment with the smallest calculated distance is determined to be the waveform segment whose energy envelope best matches the energy envelope of the waveform segment cut from the speech signal of the template channel.

Note that in practice the distance between [y_n1, y_n2, ..., y_nm] and k_n × [x₁, x₂, ..., x_m] can be measured in various ways, including but not limited to the mean square deviation or the Euclidean distance.

For example, when the distance is measured by the mean square deviation, calculating the distance between [y_n1, y_n2, ..., y_nm] and k_n × [x₁, x₂, ..., x_m] may specifically include: calculating the mean square deviation between [y_n1, y_n2, ..., y_nm] and k_n × [x₁, x₂, ..., x_m], which in its standard form is \frac{1}{m} \sum_{i=1}^{m} (y_{ni} - k_{n} x_{i})^{2}, and taking it as the distance between [y_n1, y_n2, ..., y_nm] and k_n × [x₁, x₂, ..., x_m].
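
The matching in S403 can be sketched as below, using the gain coefficient k_n and a mean-square distance. The function names `gain`, `distance`, and `best_match` are assumptions, as is the exact normalization of the distance (division by m); changing the normalization would not change which candidate wins.

```python
def gain(x, y):
    """Energy gain coefficient k_n = sum(x_i * y_ni) / sum(x_i ** 2)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def distance(x, y):
    """Mean square deviation between y and k_n * x."""
    k = gain(x, y)
    return sum((b - k * a) ** 2 for a, b in zip(x, y)) / len(x)


def best_match(template, candidates):
    """Index n and distance of the candidate envelope closest to the template."""
    dists = [distance(template, y) for y in candidates]
    n = min(range(len(dists)), key=dists.__getitem__)
    return n, dists[n]


template = [1.0, 4.0, 2.0]
candidates = [[3.0, 3.0, 3.0], [2.0, 8.0, 4.0], [4.0, 1.0, 2.0]]
print(best_match(template, candidates))  # (1, 0.0)
```

The second candidate is exactly twice the template, so its gain is 2 and its distance is zero: a louder copy of the same envelope still counts as a perfect match.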

A note on k_n: even for waveform segments that correspond to one another across the multi-channel speech signals, the volumes may differ considerably, so the amplitudes of their energy envelopes differ considerably and the measured matching degree may turn out low. To solve this problem, k_n can be used, with the amplitude of the energy envelope of the template channel's speech signal as the reference, to bring the energy envelope amplitude of each other channel's speech signal and that of the template channel's speech signal to roughly the same level. The corresponding waveform segment can then be located more reliably in the speech signal of each other channel from the energy envelope of the template channel's speech signal.
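
One way to read the gain coefficient, offered here as an interpretation rather than as something the source states, is that k_n is the least-squares scale factor: it is the value of k that minimizes the squared deviation between the candidate envelope and the scaled template envelope. Setting the derivative of the squared deviation to zero gives the expression for k_n:

```latex
\begin{align}
E(k) &= \sum_{i=1}^{m} \left( y_{ni} - k\,x_{i} \right)^{2} \\
\frac{dE}{dk} &= -2 \sum_{i=1}^{m} x_{i} \left( y_{ni} - k\,x_{i} \right) = 0 \\
k_{n} &= \frac{\sum_{i=1}^{m} x_{i}\, y_{ni}}{\sum_{i=1}^{m} x_{i}^{2}}
\end{align}
```

Under this reading, the distance for each candidate segment is evaluated at its own best-fitting gain, so a difference in recording volume alone does not penalize the true match.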

S404: Determine the difference on the time axis between the waveform segment whose energy envelope matches best and the waveform segment cut from the speech signal of the template channel, and take it as the offset between the speech signal of the other channel and the speech signal of the template channel.

Here the difference is the time-axis difference between the start of the best-matching waveform segment and the start of the waveform segment cut from the template channel's speech signal. For example, Fig. 5 is a schematic diagram of the offset between the speech signal of one other channel and the speech signal of the template channel from Fig. 2. As the figure shows, if the start of the best-matching waveform segment lies at t on the time axis of the other channel's speech signal, and the start of the waveform segment cut from the template channel's speech signal lies at γ on the time axis, then the offset τ between the other channel's speech signal and the template channel's speech signal is: τ = t - γ.
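
Steps S401 to S404 can be sketched end to end on toy data. Everything named here is an assumption for illustration: the helper names, the tiny window and step, the use of full windows only, and the hop of one sample between candidate segments (the text does not say how far apart consecutive candidate segments are cut). The other channel is built as a half-volume copy of the template signal delayed by 7 samples.

```python
import random


def envelope(samples, window, step):
    """Mean energy of each full window position."""
    out = []
    for s in range(0, len(samples) - window + 1, step):
        chunk = samples[s:s + window]
        out.append(sum(v * v for v in chunk) / window)
    return out


def mismatch(x, y):
    """Mean square deviation between y and k * x, with k the gain coefficient."""
    k = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
    return sum((b - k * a) ** 2 for a, b in zip(x, y)) / len(x)


def find_offset(template_sig, other_sig, gamma, length, window=4, step=2, hop=1):
    x = envelope(template_sig[gamma:gamma + length], window, step)
    starts = range(0, len(other_sig) - length + 1, hop)
    t = min(starts,
            key=lambda s: mismatch(x, envelope(other_sig[s:s + length], window, step)))
    return t - gamma  # tau = t - gamma


rng = random.Random(0)
template_sig = [rng.uniform(-1, 1) for _ in range(200)]
other_sig = [rng.uniform(-1, 1) for _ in range(7)] + [0.5 * v for v in template_sig]
print(find_offset(template_sig, other_sig, gamma=50, length=40))  # 7
```

The template segment starts at γ = 50 and its best match in the other channel starts at t = 57, so τ = 7 even though the other channel is recorded at half the amplitude.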

It follows that, in the embodiments of the present invention, the larger a channel's offset, the earlier that channel started, and the channel with the smallest offset started last. Therefore, the speech signal of every other channel must have a portion trimmed from its start before it can be aligned with the speech signal of the channel with the smallest offset.

Based on this analysis, synchronizing the speech signal of each other channel with the speech signal of the template channel according to the offsets in step S103 specifically includes: determining the smallest offset among the offsets of the speech signals of the other channels, and, for the speech signal of each other channel, performing the following operation: starting from the start of the speech signal, trim off a waveform segment whose length is the difference between that speech signal's offset and the smallest offset, and align the trimmed speech signal with the speech signal corresponding to the smallest offset.
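
The trimming rule can be sketched as follows, with offsets expressed in samples. The function name `trim_to_latest_start` and the toy strings are assumptions. The sketch also includes the template channel itself with an offset of 0, which is an assumption made here so that all channels end up aligned; the text describes the operation for the other channels only.

```python
def trim_to_latest_start(signals, offsets):
    """Trim (offset - smallest offset) samples from the start of each signal."""
    smallest = min(offsets.values())
    return {name: sig[offsets[name] - smallest:] for name, sig in signals.items()}


signals = {
    "template": list("xxxabcdef"),     # offset 0 by definition
    "channel 2": list("abcdef"),       # started 3 samples later than the template
    "channel 3": list("xxxxxabcdef"),  # started 2 samples earlier than the template
}
offsets = {"template": 0, "channel 2": -3, "channel 3": 2}
aligned = trim_to_latest_start(signals, offsets)
print(["".join(v) for v in aligned.values()])  # ['abcdef', 'abcdef', 'abcdef']
```

Channel 2 has the smallest offset, so it is left untouched and every other signal is cut back to the moment at which channel 2 began recording.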

Of course, besides aligning the multi-channel speech signals with the channel that has the smallest offset as the reference, the speech signal of any other channel can serve as the reference. For example, with the template channel's speech signal as the reference, each other channel's speech signal can be shifted along the time axis by a distance equal to its offset, which aligns it with the template channel's speech signal. When the offset is positive, the signal is shifted left; when the offset is negative, it is shifted right; when the offset is 0, no shift is needed.
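
Alignment against the template channel can be sketched as a shift by the offset. The function name `shift_to_template` is an assumption, and so is the choice to fill the gap created by a right shift with silence (zeros); the text only specifies the direction of the shift.

```python
def shift_to_template(signal, offset, fill=0):
    """Shift a signal so that it lines up with the template channel's time axis."""
    if offset >= 0:
        return signal[offset:]            # shift left: drop the first `offset` samples
    return [fill] * (-offset) + signal    # shift right: pad the start with silence


# Template channel content for comparison: [1, 2, 3]
print(shift_to_template([9, 9, 1, 2, 3], 2))  # [1, 2, 3]
print(shift_to_template([2, 3], -1))          # [0, 2, 3]
```

Unlike trimming to the latest start, this keeps the template channel's signal intact and adjusts only the other channels.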

In practice, the multi-channel speech signals can be processed in parallel, and synchronization is performed after all the offsets have been determined. Fig. 6 is a sketch of processing multi-channel speech signals in parallel and synchronizing them as described above. The topmost channel is the template channel, and below it are the 3 other channels. When matching the speech signal of each other channel, the energy envelope of each waveform segment cut from that channel is matched in turn against the energy envelope of the waveform segment cut from the template channel's speech signal (a match can be performed each time a waveform segment is cut, or several waveform segments can be cut first and then matched one by one; Fig. 6 uses the former). For ease of description, this process is called a matching scan. The matching scan produces a sequence of mean square deviations, and the waveform segment corresponding to the smallest value in the sequence is identified as the waveform segment in that channel that corresponds to the waveform segment cut from the template channel's speech signal. The offset between that channel's speech signal and the template channel's speech signal can then be determined, and synchronization is performed based on this offset.
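
The parallel structure can be sketched with Python's standard `concurrent.futures` module. To keep the example short, `estimate_offset` is a deliberately trivial stand-in that searches for an exact copy of the template segment; it is not the energy envelope match described above, and all names and data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def estimate_offset(template, other, gamma, length):
    """Stand-in matcher: exact search for the template segment."""
    seg = template[gamma:gamma + length]
    for t in range(len(other) - length + 1):
        if other[t:t + length] == seg:
            return t - gamma
    raise ValueError("no match found")


template = [0, 0, 1, 5, 2, 7, 0, 0]
others = {
    "channel 2": [0, 0, 0, 0, 1, 5, 2, 7, 0],
    "channel 3": [1, 5, 2, 7, 0, 0, 0],
    "channel 4": [0, 0, 1, 5, 2, 7, 0, 0],
}

# One matching scan per channel; synchronize only after all offsets are known.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(estimate_offset, template, sig, 2, 4)
               for name, sig in others.items()}
    offsets = {name: f.result() for name, f in futures.items()}

print(offsets)  # {'channel 2': 2, 'channel 3': -2, 'channel 4': 0}
```

In CPython, threads do not run pure-Python computation on several cores at once, so for a real matching scan `ProcessPoolExecutor` from the same module may be the better fit; the submit-then-collect structure stays the same.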

The above is the multi-channel speech signal synchronization method provided by the embodiments of the present invention. Based on the same idea, the embodiments of the present invention also provide a corresponding multi-channel speech signal synchronization device, shown in Fig. 7.

Fig. 7 is a schematic structural diagram of the multi-channel speech signal synchronization device provided by an embodiment of the present invention, which specifically includes:

a generation module 701, configured to select a channel as the template channel and generate the corresponding speech signal energy envelope template;

a determining module 702, configured to match the speech signal of each other channel against the energy envelope template, to determine the offset between the speech signal of each other channel and the speech signal of the template channel;

a synchronization module 703, configured to synchronize the speech signal of each other channel with the speech signal of the template channel according to the offsets.

The generation module 701 is specifically configured to: cut a waveform segment from the speech signal of the template channel; sample the waveform segment to obtain a first set number of sampling points; slide a selected sliding window along the waveform segment in a set manner; and calculate the energy vector of the waveform segment from the sampling points contained in the sliding window during the sliding process, the energy vector serving as the corresponding speech signal energy envelope template.

The determining module 702 is specifically configured to: starting from the start of the speech signal of the other channel, and using the same method used to cut the waveform segment from the speech signal of the template channel, successively cut a second set number of waveform segments, each the same length as the waveform segment cut from the speech signal of the template channel;

using the same sampling and energy envelope calculation methods applied to the template channel's waveform segment, sample each of the second set number of waveform segments and calculate its energy envelope;

among the second set number of waveform segments, determine the waveform segment whose energy envelope best matches the energy envelope of the waveform segment cut from the speech signal of the template channel;

determine the difference on the time axis between the waveform segment whose energy envelope matches best and the waveform segment cut from the speech signal of the template channel, and take it as the offset between the speech signal of the other channel and the speech signal of the template channel.

The determining module 702 is specifically configured to slide the selected sliding window m times along the waveform segment with a set sliding step, generating an m-dimensional energy vector of the waveform segment, where the value of the i-th dimension is the average energy of the sampling points contained in the sliding window after the i-th slide; m and i are positive integers, and i is less than or equal to m.

The determining module 702 is specifically configured to denote the m-dimensional energy vector corresponding to the waveform segment intercepted from the speech signal of the template channel as [x₁,x₂,...,x_m], and to denote the m-dimensional energy vector corresponding to the n-th waveform segment among the second set number of waveform segments as [y_n1,y_n2,...,y_nm], where n is equal to the second set number;

k_{n} = \frac{\sum_{i=1}^{m} x_{i} y_{ni}}{\sum_{i=1}^{m} x_{i}^{2}};
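One way to read this formula is that k_n is the least-squares scale factor, that is, the value of k that minimizes the squared difference between [y_n1,y_n2,...,y_nm] and k×[x₁,x₂,...,x_m]. The short derivation below is an interpretation rather than text from the source; the symbol E(k) is introduced here only for illustration.

```latex
E(k) = \sum_{i=1}^{m} \left( y_{ni} - k\,x_{i} \right)^{2},
\qquad
\frac{dE}{dk} = -2 \sum_{i=1}^{m} x_{i} \left( y_{ni} - k\,x_{i} \right) = 0
\;\Longrightarrow\;
k_{n} = \frac{\sum_{i=1}^{m} x_{i} y_{ni}}{\sum_{i=1}^{m} x_{i}^{2}}
```

Under this reading, scaling the template envelope by k_n before comparison removes the effect of an overall gain difference between the two channels.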

The determining module 702 is specifically configured to use a formula to calculate the mean square error between [y_n1,y_n2,...,y_nm] and k_n×[x₁,x₂,...,x_m], as the distance between [y_n1,y_n2,...,y_nm] and k_n×[x₁,x₂,...,x_m].
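The distance formula itself is not reproduced in this text. The sketch below therefore assumes the usual definition of mean square error, the sum of squared differences divided by m; that normalization, the function names, and the rule that the best match is the candidate with the smallest distance are assumptions made for illustration.

```python
from typing import List, Sequence


def scale_factor(x: Sequence[float], y: Sequence[float]) -> float:
    """k_n = sum(x_i * y_ni) / sum(x_i ** 2), as given in the description."""
    denom = sum(xi * xi for xi in x)
    if denom == 0:
        raise ValueError("template energy vector is all zeros")
    return sum(xi * yi for xi, yi in zip(x, y)) / denom


def scaled_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean square error between y and k_n * x (assumed 1/m normalization)."""
    if len(x) != len(y):
        raise ValueError("energy vectors must have the same dimension")
    k = scale_factor(x, y)
    return sum((yi - k * xi) ** 2 for xi, yi in zip(x, y)) / len(x)


def best_match(x: Sequence[float], candidates: List[Sequence[float]]) -> int:
    """Index of the candidate envelope closest to the scaled template envelope."""
    distances = [scaled_distance(x, y) for y in candidates]
    # Assumption: the best-matching segment is the one with the smallest distance.
    return min(range(len(distances)), key=distances.__getitem__)
```

Because the template vector is rescaled by k_n for each candidate, a candidate that is an exact multiple of the template yields a distance of zero.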

The synchronization module 703 is specifically configured to determine the minimum offset value among the offset values corresponding to the speech signals of the other channels and, for the speech signal of each other channel, to perform the following operation: starting from the beginning of the speech signal, cut off a waveform segment whose length is the difference between the offset value corresponding to that speech signal and the minimum offset value, and align the trimmed speech signal with the speech signal corresponding to the minimum offset value.
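The trimming step above can be sketched as follows. This is a minimal illustration: the function name `synchronize`, the dictionary layout, and the use of offsets measured in whole samples are assumptions, and only the other channels are handled, as in the description.

```python
from typing import Dict, List, Sequence


def synchronize(
    signals: Dict[str, Sequence[float]],
    offsets: Dict[str, int],
) -> Dict[str, List[float]]:
    """Trim the start of each channel so that all channels line up with the
    channel that has the minimum offset value."""
    if not signals:
        return {}
    min_offset = min(offsets[name] for name in signals)
    aligned: Dict[str, List[float]] = {}
    for name, signal in signals.items():
        # Length to cut off = this channel's offset minus the minimum offset.
        cut = offsets[name] - min_offset
        aligned[name] = list(signal[cut:])
    return aligned
```

The channel with the minimum offset value is left untouched, and every other channel loses only the leading samples by which it lags that channel.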

Specifically, the device shown in Figure 7 may reside on any equipment capable of processing speech signals.

The embodiments of the present invention may implement the above functional modules by means of a hardware processor.

The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the essence of the above technical solution, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in each embodiment or in certain parts of an embodiment.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A multi-channel speech signal synchronization method, characterized by comprising:

2. The method according to claim 1, characterized in that generating the corresponding speech signal energy envelope template specifically comprises:

intercepting a waveform segment from the speech signal of the template channel;

performing sample extraction on the waveform segment to determine a first set number of sampling points;

sliding a selected sliding window over the waveform segment in a set manner and, based on the sampling points contained in the selected sliding window during sliding, calculating the energy vector of the waveform segment as the generated speech signal energy envelope template.

3. The method according to claim 2, characterized in that, for the speech signal of each other channel, the offset value between the speech signal of the other channel and the speech signal of the template channel is determined as follows:

starting from the beginning of the speech signal of the other channel, and using the method used to intercept the waveform segment from the speech signal of the template channel, successively intercepting a second set number of waveform segments, each of the same length as the waveform segment intercepted from the speech signal of the template channel;

4. The method according to claim 2, characterized in that sliding the selected sliding window over the waveform segment in a set manner and, based on the sampling points contained in the selected sliding window during sliding, calculating the energy vector of the waveform segment specifically comprises:

sliding the selected sliding window over the waveform segment m times according to a set sliding step, generating an m-dimensional energy vector of the waveform segment, where the value of the i-th dimension of the m-dimensional energy vector is the average energy of the sampling points contained in the selected sliding window after the i-th slide; m and i are positive integers, and i is less than or equal to m.

5. The method according to claim 4, characterized in that determining, among the second set number of waveform segments, the waveform segment whose energy envelope best matches the energy envelope of the waveform segment intercepted from the speech signal of the template channel specifically comprises:

denoting the m-dimensional energy vector corresponding to the waveform segment intercepted from the speech signal of the template channel as [x₁,x₂,...,x_m], and denoting the m-dimensional energy vector corresponding to the n-th waveform segment among the second set number of waveform segments as [y_n1,y_n2,...,y_nm], where n is equal to the second set number;

k_{n} = \frac{\sum_{i=1}^{m} x_{i} y_{ni}}{\sum_{i=1}^{m} x_{i}^{2}};

6. The method according to claim 5, characterized in that calculating the distance between [y_n1,y_n2,...,y_nm] and k_n×[x₁,x₂,...,x_m] specifically comprises:

using a formula to calculate the mean square error between [y_n1,y_n2,...,y_nm] and k_n×[x₁,x₂,...,x_m], as the distance between [y_n1,y_n2,...,y_nm] and k_n×[x₁,x₂,...,x_m].

7. The method according to claim 1, characterized in that synchronizing, according to the offset values, the speech signal of each other channel with the speech signal of the template channel specifically comprises:

determining the minimum offset value among the offset values corresponding to the speech signals of the other channels;

for the speech signal of each other channel, performing the following operation: starting from the beginning of the speech signal, cutting off a waveform segment whose length is the difference between the offset value corresponding to that speech signal and the minimum offset value, and aligning the trimmed speech signal with the speech signal corresponding to the minimum offset value.

8. A multi-channel speech signal synchronization device, characterized by comprising:

9. The device according to claim 8, characterized in that the generation module is specifically configured to: intercept a waveform segment from the speech signal of the template channel; perform sample extraction on the waveform segment to determine a first set number of sampling points; slide a selected sliding window over the waveform segment in a set manner; and, based on the sampling points contained in the selected sliding window during sliding, calculate the energy vector of the waveform segment as the generated speech signal energy envelope template.

10. The device according to claim 8, characterized in that the synchronization module is specifically configured to determine the minimum offset value among the offset values corresponding to the speech signals of the other channels and, for the speech signal of each other channel, to perform the following operation: starting from the beginning of the speech signal, cut off a waveform segment whose length is the difference between the offset value corresponding to that speech signal and the minimum offset value, and align the trimmed speech signal with the speech signal corresponding to the minimum offset value.