CN104361889B

CN104361889B - Method and device for processing audio file

Info

Publication number: CN104361889B
Application number: CN201410589685.8A
Authority: CN
Inventors: 王徽蓉
Original assignee: Beijing Yinzhibang Culture Technology Co ltd
Current assignee: Shenzhen Taile Culture Technology Co ltd
Priority date: 2014-10-28
Filing date: 2014-10-28
Publication date: 2018-03-16
Anticipated expiration: 2034-10-28
Also published as: CN104361889A

Abstract

The invention provides a method and a device for processing audio files. The method comprises the following steps: acquiring a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file, wherein the first audio file and the second audio file are audio files to be associated with each other; acquiring an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram; and performing association-related processing on the first audio file and the second audio file using the offset time. Embodiments of the invention enable seamless switching between the two audio files by means of the offset time, thereby improving the user's entertainment experience.

Description

Method and device for processing audio files

Technical field

The present invention relates to the field of audio signal processing, and in particular to a method and a device for processing audio files.

Background art

With the growing demand for entertainment, internet operators also provide karaoke services to users in the form of online karaoke rooms. Because the accompaniment track and the original vocal track are two separate audio files, they may be out of sync when the user switches between them, so the accompaniment track and the original vocal track need to be synchronized. If playback cannot switch to the original vocal at the instant the user wants to hear it, the two audio signals will not be fully aligned at the switch, which degrades the user's entertainment experience.

Summary of the invention

Embodiments of the present invention provide a method and a device for processing audio files that allow a user to switch seamlessly between two audio files as needed, improving the user's entertainment experience.

To achieve the above object, embodiments of the present invention adopt the following technical solutions:

A method for processing audio files, the method comprising:

acquiring a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file;

acquiring an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;

performing association-related processing on the first audio file and the second audio file using the offset time.

A device for processing audio files, the device comprising:

a spectrogram acquisition module, configured to acquire a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file;

an offset time acquisition module, configured to acquire an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;

an audio processing module, configured to perform association-related processing on the first audio file and the second audio file using the offset time.

With the method and device for processing audio files provided by embodiments of the present invention, once the offset time that synchronizes the first audio file with the second audio file has been acquired, association-related processing is performed on the two files using that offset time, achieving seamless alignment, synthesis and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the method for processing audio files provided by Embodiment one of the present invention.

Fig. 2 is a schematic flowchart of the method for processing audio files provided by Embodiment two of the present invention.

Fig. 3 is a schematic flowchart of a specific implementation of step 230 in Embodiment two.

Fig. 4 is a schematic diagram of the principle of step 230 in Embodiment two.

Fig. 5 is a schematic diagram of the spectral energy obtained in Embodiment two after the processing of step 233.

Fig. 6 is a schematic structural diagram of the device for processing audio files provided by Embodiment three of the present invention.

Fig. 7 is a schematic structural diagram of the device for processing audio files provided by Embodiment four of the present invention.

Detailed description of the embodiments

The method and device for processing audio files provided by embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Embodiment one：

Fig. 1 is a schematic flowchart of the method for processing audio files provided by Embodiment one of the present invention. As shown in Fig. 1, the method for processing audio files of this embodiment comprises the following steps:

Step 110: acquire a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file, wherein the first audio file and the second audio file are audio files that need to be associated with each other.

Step 120: simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram.

Step 130: acquire the offset time between the first audio file and the second audio file according to the third spectrogram and the fourth spectrogram.

Step 140: perform association-related processing on the first audio file and the second audio file using the offset time.

In step 110, the first and second spectrograms corresponding respectively to the first and second audio files may be acquired as follows: decode the first audio file and the second audio file separately; resample each decoded signal at a predetermined sampling frequency (for example, 8000 Hz); merge the channels of each resampled audio into a single mono channel; divide each merged audio into frames and apply a Hanning window; and apply the Fourier transform to each of the two processed audios, obtaining the first spectrogram and the second spectrogram corresponding respectively to the first audio file and the second audio file.
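The framing, windowing and transform stages of step 110 can be sketched as follows. This is a minimal illustration, not the patented implementation: decoding and resampling are omitted, the function names, the frame length of 64 samples and the hop of 32 samples are assumptions made here, and a direct DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math


def to_mono(left, right):
    """Merge two channels into one by averaging sample pairs."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def hann_window(n):
    """Hanning window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame is a list of per-bin magnitudes.

    frame_len and hop are illustrative values, not taken from the patent.
    """
    window = hann_window(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames


# Usage: a 1000 Hz tone sampled at 8000 Hz falls in bin 1000 / (8000 / 64) = 8.
rate = 8000
tone = [math.sin(2.0 * math.pi * 1000 * t / rate) for t in range(256)]
spec = spectrogram(to_mono(tone, tone))
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

In practice a fast Fourier transform would replace the direct DFT; the structure of the result (one magnitude vector per frame) stays the same.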

In step 120, the first and second spectrograms may be simplified as follows: obtain the frequency values whose spectral energy ranks within a set number of top positions (for example, the top 20) in the first spectrogram; extract these 20 frequency values from both the first spectrogram and the second spectrogram; and obtain the simplified third spectrogram and fourth spectrogram from these 20 frequency values. Because these 20 frequency values are the ones with the highest spectral energy in the first spectrogram, they capture the key frequency bands of the first and second spectrograms. The key frequency bands carry the most useful audio signal information in the two spectrograms and keep useless audio signal out of subsequent processing, so working with the third and fourth spectrograms significantly reduces the likelihood of missed processing and thus improves the efficiency of processing the two audio files. Those skilled in the art will understand that the set number may be chosen according to the actual frequency distribution of the first audio file; the specific number does not limit the embodiments of the present invention.
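The simplification of step 120 can be illustrated with a short sketch. It assumes that "spectral energy" is ranked by summing squared magnitudes over all frames of the first spectrogram, which the text does not state explicitly; the function names and the toy data are hypothetical.

```python
def top_energy_bins(spec, count):
    """Indices of the `count` frequency bins with the largest summed energy.

    `spec` is a list of frames; each frame is a list of per-bin magnitudes.
    Ranking by energy summed over frames is an assumption of this sketch.
    """
    n_bins = len(spec[0])
    energy = [sum(frame[k] ** 2 for frame in spec) for k in range(n_bins)]
    ranked = sorted(range(n_bins), key=lambda k: energy[k], reverse=True)
    return sorted(ranked[:count])


def simplify(spec, bins):
    """Keep only the given bins; the result has one row per bin, one column per frame."""
    return [[frame[k] for frame in spec] for k in bins]


# Toy spectrograms: 3 frames, 4 frequency bins each.
first = [[0.1, 5.0, 0.2, 3.0], [0.2, 4.0, 0.1, 2.5], [0.1, 6.0, 0.3, 2.0]]
second = [[0.3, 1.0, 0.2, 4.0], [0.1, 2.0, 0.4, 3.0], [0.2, 1.5, 0.1, 3.5]]

bins = top_energy_bins(first, 2)   # bins chosen from the first spectrogram only
third = simplify(first, bins)      # simplified first spectrogram
fourth = simplify(second, bins)    # same bins extracted from the second
```

The same bin indices are applied to both spectrograms so that corresponding rows of the third and fourth spectrograms refer to the same frequencies.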

In step 140, the association-related processing performed on the first audio file and the second audio file using the offset time may include at least one of the following:

synchronizing the first audio file and the second audio file according to the offset time;

aligning the first audio file with the second audio file according to the offset time;

synthesizing the first audio file and the second audio file according to the offset time;

separating the first audio file from the second audio file according to the offset time.

Aligning the first audio file with the second audio file according to the offset time may specifically be: according to the offset time, align the two audio signals corresponding to the first audio file and the second audio file; where alignment leaves part of an aligned audio signal blank, set the spectral energy values of the blank part to 0. Correspondingly, synthesizing the first audio file and the second audio file according to the offset time may specifically be: use the two aligned audio signals as the left and right channels of one audio, merging them into a single audio file. In this way, when the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, both channels are decoded while the user sings karaoke, but during playback only the data of one channel is sent to the sound card according to the user's request; when the user switches between the accompaniment track and the original vocal, the data being played is switched to the other channel.
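The alignment, synthesis and channel switching described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the offset is expressed in samples, a positive offset means the same event occurs later in the second signal, blank parts are filled with zero-valued samples in the time domain, and all function names are hypothetical.

```python
def align(first, second, offset):
    """Pad with zeros so that equal indices refer to the same instant.

    Assumption: a positive `offset` means an event occurs `offset` samples
    later in `second` than in `first`, so `first` is delayed by `offset`.
    """
    if offset >= 0:
        first = [0.0] * offset + list(first)
        second = list(second)
    else:
        first = list(first)
        second = [0.0] * (-offset) + list(second)
    length = max(len(first), len(second))
    first += [0.0] * (length - len(first))      # blank tail set to 0
    second += [0.0] * (length - len(second))
    return first, second


def synthesize(first, second):
    """Combine the aligned signals as (left, right) stereo sample pairs."""
    return list(zip(first, second))


def select_channel(stereo, use_original):
    """Pick the channel to play: left = accompaniment, right = original vocal."""
    channel = 1 if use_original else 0
    return [pair[channel] for pair in stereo]


accompaniment = [1.0, 2.0, 3.0, 4.0]
original = [0.0, 0.0, 1.5, 2.5, 3.5, 4.5]   # same events, two samples later
left, right = align(accompaniment, original, 2)
stereo = synthesize(left, right)
```

Because both channels share one timeline after alignment, switching between them at any sample index does not introduce a jump in playback position.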

The method for processing audio files provided by this embodiment acquires, according to the third spectrogram and the fourth spectrogram, the offset time that synchronizes the first audio file with the second audio file, and performs association-related processing on the two files using that offset time, achieving seamless alignment, synthesis and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.

Embodiment two：

Fig. 2 is a schematic flowchart of the method for processing audio files provided by Embodiment two of the present invention. As shown in Fig. 2, the audio synchronization method of this embodiment comprises the following steps:

Step 210: acquire a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file, wherein the first audio file and the second audio file are audio files to be associated with each other.

Step 220: simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram.

Step 230: acquire the frame position with maximum energy according to the cross-power spectrum of the third spectrogram and the fourth spectrogram.

Step 240: acquire the offset time between the first audio file and the second audio file according to the frame position with maximum energy.

Step 250: perform association-related processing on the first audio file and the second audio file using the offset time.

For the specific processing of steps 210 to 220, refer to steps 110 to 120 of Embodiment one; details are not repeated here.

For the specific processing of step 230, refer to the embodiment shown in Fig. 3.

In step 240, the offset time that synchronizes the first audio file with the second audio file may be obtained from the frame position with maximum energy as follows: the offset time between the first audio file and the second audio file is the product of the frame position with maximum energy and the time represented by each frame. The precision of the offset time equals the interval between frames; the frame interval can be reduced as the required precision demands, so as to reach higher registration accuracy.

For the specific processing of step 250, refer to step 140 of Embodiment one; details are not repeated here.

Before the specific processing of step 230 is described in detail with reference to Fig. 3, the principle of obtaining the offset time between two audios from the cross-power spectrum of their spectrograms is first introduced with reference to Fig. 4, so that those skilled in the art can understand the technical solution of step 230 more clearly.

Let f₂(x) be the curve obtained by translating f₁(x) by x₀ in the x direction, i.e.:

f₂(x) = f₁(x − x₀) (1)

If the Fourier transforms corresponding to f₁(x) and f₂(x) are F₁(u) and F₂(u) respectively, the following relation holds between them:

F₂(u) = e^(−j2πux₀) F₁(u) (2)

The cross-power spectrum of f₁(x) and f₂(x) is then:

F₁(u)F₂*(u) / |F₁(u)F₂*(u)| = e^(j2πux₀) (3)

where F₂*(u) denotes the complex conjugate of F₂(u). Applying the inverse Fourier transform to formula (3) produces a unit impulse function in the x space; the position of the pulse gives the relative translation x₀ between the two registered curves (the sign of the position depends on which of the two spectra is conjugated). Fig. 4 shows the spectral energy obtained after applying the inverse Fourier transform to formula (3): a unit impulse function is visible in Fig. 4, and the pulse position represents the relative translation x₀ between the two registered curves.

Therefore, embodiments of the present invention can obtain the offset time between two audios by applying the inverse Fourier transform to the cross-power spectrum of the spectrograms of the two audio files. The specific processing of step 230 is described in detail below with reference to Fig. 3.
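The principle can be checked numerically on a small discrete example. The sketch below is an illustration only, assuming a circular shift and a direct DFT; the function names are hypothetical. It conjugates the second spectrum, as in the cross-power spectrum above, so the pulse appears at index (n − x₀) mod n and the shift is recovered by negating the peak index modulo n.

```python
import cmath


def dft(values, inverse=False):
    """Direct discrete Fourier transform (or its inverse) of a sequence."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def phase_correlation(f1, f2):
    """Magnitude of the inverse transform of the normalized cross-power spectrum."""
    F1, F2 = dft(f1), dft(f2)
    cross = []
    for a, b in zip(F1, F2):
        c = a * b.conjugate()                    # F1(u) times conjugate of F2(u)
        cross.append(c / abs(c) if abs(c) > 1e-12 else 0j)
    return [abs(v) for v in dft(cross, inverse=True)]


n = 16
f1 = [1.0, 2.0, 3.0, 4.0] + [0.0] * 12
x0 = 3
f2 = [f1[(x - x0) % n] for x in range(n)]        # f2(x) = f1(x - x0), circularly

energy = phase_correlation(f1, f2)
peak = max(range(n), key=lambda x: energy[x])    # index of the unit pulse
shift = (-peak) % n                              # recovered translation
```

Conjugating the first spectrum instead of the second would place the pulse directly at index x₀; either choice works as long as the peak is interpreted consistently.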

Fig. 3 is a schematic flowchart of a specific implementation of step 230. Referring to Fig. 3, the specific implementation of step 230 may comprise the following steps:

Step 231: reshape the third spectrogram and the fourth spectrogram.

Step 232: obtain the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram.

Step 233: apply the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector.

Step 234: obtain the frame position with maximum energy from the energy value vector.

In step 231, the third spectrogram and the fourth spectrogram may be reshaped as follows: the two-dimensional matrices corresponding to the third spectrogram and the fourth spectrogram are each stretched, in order, into a first one-dimensional vector and a second one-dimensional vector respectively. Specifically, the rows of each two-dimensional matrix are connected in sequence, i.e. the first-column element of each row is placed directly after the last-column element of the preceding row, forming the first one-dimensional vector and the second one-dimensional vector. Reshaping the two-dimensional spectrograms into one-dimensional vectors increases the processing speed for the two audio files and accelerates the analysis of the audio files.

Those skilled in the art will understand that, when a two-dimensional matrix is stretched into a one-dimensional vector, the rows may be connected in order from the first row to the last row, starting from any row, or from the last row upwards. The connection order may be chosen according to circumstances, provided that corresponding frequencies of the third spectrogram and the fourth spectrogram remain in correspondence; the sequential order given above as an example does not limit the embodiments of the present invention.
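The reshaping of step 231 amounts to joining matrix rows end to end. The sketch below is a minimal illustration; the function name and the optional `order` parameter are hypothetical, and the matrices are assumed to hold one row per frequency value and one column per frame.

```python
def flatten_rows(matrix, order=None):
    """Join the rows of a 2-D matrix end to end into one 1-D list.

    `order` optionally lists the row indices to use; the same order must be
    applied to both spectrograms so that their frequencies stay matched.
    """
    rows = range(len(matrix)) if order is None else order
    flat = []
    for r in rows:
        flat.extend(matrix[r])
    return flat


third = [[5.0, 4.0, 6.0], [3.0, 2.5, 2.0]]
fourth = [[1.0, 2.0, 1.5], [4.0, 3.0, 3.5]]

v1 = flatten_rows(third)                    # first one-dimensional vector
v2 = flatten_rows(fourth)                   # second one-dimensional vector
v1_reversed = flatten_rows(third, [1, 0])   # alternative row order
```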

In step 232, the Fourier transform is applied to the first one-dimensional vector and the second one-dimensional vector respectively, and the cross-power spectrum of the first one-dimensional vector and the second one-dimensional vector is then obtained. For an illustration of the cross-power spectrum, refer to the description of the principle shown in Fig. 4; details are not repeated here.

In step 233, the energy value vector obtained by applying the inverse Fourier transform to the cross-power spectrum is shown in Fig. 5. In Fig. 5 the abscissa represents the frame number and the ordinate represents the energy value. As can be seen from Fig. 5, an impulse function appears at a frame position of about 600 frames, so the energy value at that frame position is the maximum.

In step 234, the frame position with maximum energy is obtained from the energy value vector produced in step 233. Taking Fig. 5 as an example, the frame position corresponding to the impulse function with the maximum energy value is found at the 600th frame.

Through steps 230 to 240, if each frame represents a time dt and the frame position with maximum energy is the 600th frame, the offset time is dt × 600. An accurate offset time between the first audio file and the second audio file is thereby obtained. The frame interval can be reduced as the required precision demands, so as to achieve more precise synchronization, alignment and so on.
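Steps 234 and 240 reduce to locating the largest entry of the energy value vector and multiplying its index by the frame duration. The sketch below uses a synthetic energy vector; the function name and the frame duration of 0.004 s (a 32-sample frame interval at 8000 Hz) are assumptions made for illustration, not values given in the text.

```python
def offset_seconds(energy, frame_seconds):
    """Return (peak frame index, offset time) for an energy value vector.

    The offset time is the peak frame index times the time per frame.
    """
    peak = max(range(len(energy)), key=lambda i: energy[i])
    return peak, peak * frame_seconds


# Synthetic energy vector with a single pulse at frame 600, as in Fig. 5.
energy = [0.01] * 1000
energy[600] = 1.0

# Assumed frame duration: 32 samples at 8000 Hz = 0.004 s per frame.
peak, offset = offset_seconds(energy, 0.004)
```

Halving the frame interval halves the smallest offset step that can be resolved, at the cost of a longer energy vector to search.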

Embodiment three：

Fig. 6 is a schematic structural diagram of the device for processing audio files provided by Embodiment three of the present invention. As shown in Fig. 6, the device for processing audio files of this embodiment includes:

A spectrogram acquisition module 610, configured to acquire a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file, wherein the first audio file and the second audio file are audio files to be associated with each other.

An offset time acquisition module 620, configured to acquire the offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram.

An audio processing module 630, configured to perform association-related processing on the first audio file and the second audio file using the offset time.

For the processing of the spectrogram acquisition module 610, refer to step 110 of Embodiment one; details are not repeated here.

The offset time acquisition module 620 may include:

A spectrogram simplification unit 621, configured to simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram.

An offset time acquiring unit 622, configured to acquire the offset time between the first audio file and the second audio file according to the third spectrogram and the fourth spectrogram.

For the specific processing of the spectrogram simplification unit 621, refer to step 120 of Embodiment one; details are not repeated here.

For the specific processing of the offset time acquiring unit 622, refer to step 130 of Embodiment one; details are not repeated here.

The audio processing module 630 may include at least one of the following units:

An audio synchronization unit (not shown), configured to synchronize the first audio file and the second audio file according to the offset time;

An audio alignment unit (not shown), configured to align the first audio file with the second audio file according to the offset time;

An audio synthesis unit (not shown), configured to synthesize the first audio file and the second audio file according to the offset time;

An audio separation unit (not shown), configured to separate the first audio file from the second audio file according to the offset time.

The specific processing of the audio alignment unit may be: according to the offset time, align the two audio signals corresponding to the first audio file and the second audio file; where alignment leaves part of an aligned audio signal blank, set the spectral energy values of the blank part to 0. Correspondingly, the specific processing of the audio synthesis unit may be: use the two aligned audio signals as the left and right channels of one audio, merging them into a single audio. In this way, when the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, both channels are decoded while the user sings karaoke, but during playback only the data of one channel is sent to the sound card according to the user's request; when the user switches between the accompaniment track and the original vocal, the data being played is switched to the other channel.

In the device for processing audio files provided by this embodiment, the offset time acquisition module 620 acquires, according to the first spectrogram and the second spectrogram, the offset time that synchronizes the first audio file with the second audio file, and the audio processing module 630 performs association-related processing on the two files using that offset time, achieving seamless alignment, synthesis and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.

Embodiment four：

Fig. 7 is a schematic structural diagram of the device for processing audio files provided by Embodiment four of the present invention. As shown in Fig. 7, the device for processing audio files of this embodiment is structured as follows.

As described in Embodiment three, the offset time acquisition module 620 may include the spectrogram simplification unit 621 and the offset time acquiring unit 622. The offset time acquiring unit 622 may further include:

A maximum-energy frame acquiring unit 6221, configured to acquire the frame position with maximum energy according to the cross-power spectrum of the third spectrogram and the fourth spectrogram.

An offset time acquiring subunit 6222, configured to acquire the offset time between the first audio file and the second audio file according to the frame position with maximum energy.

Further, the maximum-energy frame acquiring unit 6221 includes:

A reshaping unit 62211, configured to reshape the third spectrogram and the fourth spectrogram.

A cross-power spectrum acquiring unit 62212, configured to obtain the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram.

An energy value vector acquiring unit 62213, configured to apply the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector.

A maximum-energy frame acquiring subunit 62214, configured to obtain the frame position with maximum energy from the energy value vector.

The reshaping performed by the reshaping unit 62211 on the third spectrogram and the fourth spectrogram may specifically be: stretching the two-dimensional matrices corresponding to the third spectrogram and the fourth spectrogram, in order, into a first one-dimensional vector and a second one-dimensional vector respectively. For the specific processing of the reshaping unit 62211 and its advantageous effects, refer to the description of step 231 in Embodiment two; details are not repeated here.

For the specific processing of the cross-power spectrum acquiring unit 62212, refer to step 232 in Embodiment two; details are not repeated here.

For the specific processing of the energy value vector acquiring unit 62213, refer to step 233 in Embodiment two; details are not repeated here.

For the specific processing of the maximum-energy frame acquiring subunit 62214, refer to step 234 in Embodiment two; details are not repeated here.

The offset time acquiring subunit 6222 obtains the offset time between the first audio file and the second audio file as the product of the frame position with maximum energy and the time represented by each frame.

In summary, the method and device for processing audio files provided by embodiments of the present invention acquire, according to the first spectrogram and the second spectrogram, the offset time that synchronizes the first audio file with the second audio file, and process the two files using that offset time, achieving seamless alignment, synthesis and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.

The foregoing is merely a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for processing audio files, characterized in that the method comprises:

acquiring a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file;

acquiring an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;

performing association-related processing on the first audio file and the second audio file using the offset time,

wherein the step of acquiring the offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram comprises:

simplifying the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram;

acquiring the offset time between the first audio file and the second audio file according to the third spectrogram and the fourth spectrogram,

wherein the step of acquiring the offset time between the first audio file and the second audio file according to the third spectrogram and the fourth spectrogram comprises:

acquiring the frame position with maximum energy according to the cross-power spectrum of the third spectrogram and the fourth spectrogram;

acquiring the offset time between the first audio file and the second audio file according to the frame position with maximum energy.

2. The method according to claim 1, characterized in that the step of simplifying the first spectrogram and the second spectrogram comprises:

obtaining a plurality of frequency values whose spectral energy ranks within a set number of top positions in the first spectrogram;

extracting the plurality of frequency values from the first spectrogram and the second spectrogram, and obtaining the simplified third spectrogram and fourth spectrogram respectively from the plurality of frequency values.

3. The method according to claim 1, characterized in that the step of acquiring the frame position with maximum energy according to the cross-power spectrum of the third spectrogram and the fourth spectrogram comprises:

reshaping the third spectrogram and the fourth spectrogram;

obtaining the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram;

applying the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector;

obtaining the frame position with maximum energy from the energy value vector.

4. The method according to claim 3, characterized in that the step of reshaping the third spectrogram and the fourth spectrogram comprises:

stretching the two-dimensional matrices corresponding respectively to the third spectrogram and the fourth spectrogram, in order, into a first one-dimensional vector and a second one-dimensional vector respectively.

5. The method according to claim 1, characterized in that the step of acquiring the offset time between the first audio file and the second audio file according to the frame position with maximum energy comprises:

obtaining the offset time between the first audio file and the second audio file as the product of the frame position with maximum energy and the time represented by each frame.

6. The method according to any one of claims 1 to 5, characterized in that the step of performing association-related processing on the first audio file and the second audio file using the offset time comprises:

synchronizing the first audio file and the second audio file according to the offset time; or

aligning the first audio file with the second audio file according to the offset time; or

synthesizing the first audio file and the second audio file according to the offset time; or

separating the first audio file from the second audio file according to the offset time.

7. A device for processing audio files, characterized in that the device comprises:

a spectrogram acquisition module, configured to acquire a first spectrogram and a second spectrogram corresponding respectively to a first audio file and a second audio file;

an offset time acquisition module, configured to acquire an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;

an audio processing module, configured to perform association-related processing on the first audio file and the second audio file using the offset time,

wherein the offset time acquisition module comprises:

a spectrogram simplification unit, configured to simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram;

an offset time acquiring unit, configured to acquire the offset time between the first audio file and the second audio file according to the third spectrogram and the fourth spectrogram,

wherein the offset time acquiring unit comprises:

a maximum-energy frame position acquiring unit, configured to acquire the frame position with maximum energy according to the cross-power spectrum of the third spectrogram and the fourth spectrogram;

an offset time acquiring subunit, configured to acquire the offset time between the first audio file and the second audio file according to the frame position with maximum energy.

8. The device according to claim 7, characterized in that the spectrogram simplification unit comprises:

a frequency value acquiring unit, configured to obtain a plurality of frequency values whose spectral energy ranks within a set number of top positions in the first spectrogram;

a spectrogram simplification subunit, configured to extract the plurality of frequency values from the first spectrogram and the second spectrogram, and to obtain the simplified third spectrogram and fourth spectrogram respectively from the plurality of frequency values.

9. The device according to claim 7, characterized in that the maximum-energy frame position acquiring unit comprises:

a reshaping unit, configured to reshape the third spectrogram and the fourth spectrogram;

a cross-power spectrum acquiring unit, configured to obtain the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram;

an energy value vector acquiring unit, configured to apply the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector;

a maximum-energy frame position acquiring subunit, configured to obtain the frame position with maximum energy from the energy value vector.

10. The device according to claim 9, characterized in that the reshaping unit stretches the two-dimensional matrices corresponding respectively to the third spectrogram and the fourth spectrogram, in order, into a first one-dimensional vector and a second one-dimensional vector respectively.

11. The device according to claim 7, characterized in that the offset time acquiring subunit obtains the offset time between the first audio file and the second audio file as the product of the frame position with maximum energy and the time represented by each frame.

12. The device according to any one of claims 7 to 11, characterized in that the audio processing module comprises at least one of the following units:

an audio synchronization unit, configured to synchronize the first audio file and the second audio file according to the offset time;

an audio alignment unit, configured to align the first audio file with the second audio file according to the offset time;

an audio synthesis unit, configured to synthesize the first audio file and the second audio file according to the offset time;

an audio separation unit, configured to separate the first audio file from the second audio file according to the offset time.