Method and device for processing audio files
Technical field
The present invention relates to the technical field of audio signal processing, and in particular to a method and device for processing audio files.
Background art
As demand for entertainment grows, internet operators also provide karaoke services to users in the form of online karaoke rooms. Because the accompaniment track and the original vocal are two separate audio files, they can be out of sync when the user switches between them, so the accompaniment track and the original vocal need to be synchronized. If the user cannot switch to the original vocal at the moment they want to hear it, the two audio signals are not fully aligned at the switch, which degrades the user's entertainment experience.
Summary of the invention
Embodiments of the present invention provide a method and device for processing audio files that allow the user to switch seamlessly between two audio files as needed, improving the user's entertainment experience.
To achieve the above purpose, the embodiments of the present invention adopt the following technical solutions:
A method for processing audio files, the method comprising:
obtaining a first spectrogram and a second spectrogram corresponding to a first audio file and a second audio file, respectively;
obtaining an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;
performing association-related processing on the first audio file and the second audio file using the offset time.
A device for processing audio files, the device comprising:
a spectrogram acquisition module, configured to obtain a first spectrogram and a second spectrogram corresponding to a first audio file and a second audio file, respectively;
an offset time acquisition module, configured to obtain an offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;
an audio processing module, configured to perform association-related processing on the first audio file and the second audio file using the offset time.
With the method and device for processing audio files provided by the embodiments of the present invention, once the offset time needed to synchronize the first audio file and the second audio file has been obtained, association-related processing is performed on the two audio files using that offset time, achieving seamless alignment, synthesis, and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the method for processing audio files provided by Embodiment one of the present invention.
Fig. 2 is a schematic flowchart of the method for processing audio files provided by Embodiment two of the present invention.
Fig. 3 is a schematic flowchart of a specific implementation of step 230 in Embodiment two.
Fig. 4 is a schematic diagram of the principle of step 230 in Embodiment two.
Fig. 5 is a schematic diagram of the spectral energy obtained after step 233 in Embodiment two.
Fig. 6 is a schematic structural diagram of the device for processing audio files provided by Embodiment three of the present invention.
Fig. 7 is a schematic structural diagram of the device for processing audio files provided by Embodiment four of the present invention.
Detailed description of the embodiments
The method and device for processing audio files provided by the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one:
Fig. 1 is a schematic flowchart of the method for processing audio files provided by Embodiment one of the present invention. As shown in Fig. 1, the method for processing audio files of this embodiment comprises the following steps:
Step 110: obtain a first spectrogram and a second spectrogram corresponding to the first audio file and the second audio file, respectively, where the first audio file and the second audio file are audio files that need association-related processing.
Step 120: simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram.
Step 130: obtain the offset time between the first audio file and the second audio file from the third spectrogram and the fourth spectrogram.
Step 140: perform association-related processing on the first audio file and the second audio file using the offset time.
In step 110, the first spectrogram and the second spectrogram corresponding to the first audio file and the second audio file may be obtained as follows: decode the first audio file and the second audio file separately; resample each decoded signal at a predetermined sampling frequency (for example, 8000 Hz); merge the resampled audio into mono; divide the merged audio into frames and apply a Hanning window; and apply a Fourier transform to each of the two processed audio signals, which yields the first spectrogram and the second spectrogram corresponding to the first audio file and the second audio file, respectively.
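As a non-limiting illustration, the framing, windowing, and Fourier transform of step 110 might be sketched in Python as follows. The sketch assumes NumPy, assumes that decoding and resampling to the predetermined sampling frequency have already been done, and uses hypothetical values for the frame length (256 samples) and frame shift (128 samples); the function name and these values are illustrative and do not come from the embodiment.

```python
import numpy as np

def spectrogram(samples, frame_len=256, hop=128):
    """Magnitude spectrogram with shape (frequency bins, frames).

    samples: decoded PCM already resampled to the common rate,
             shape (n,) for mono or (n, channels) for multichannel.
    frame_len and hop are hypothetical values chosen for illustration.
    """
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:                       # merge all channels into mono
        x = x.mean(axis=1)
    if len(x) < frame_len:                # pad very short input to one frame
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)        # Hanning window applied to every frame
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Fourier transform of each frame; transpose so that rows are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T
```

With this layout, each row of the result is the time series of one frequency bin and each column is one frame.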
In step 120, the first spectrogram and the second spectrogram may be simplified as follows: obtain the frequency values whose spectral energy ranks within a set number (for example, the top 20) in the first spectrogram, extract these 20 frequency values from both the first spectrogram and the second spectrogram, and obtain the simplified third spectrogram and fourth spectrogram from them. Because these 20 frequency values are the ones with the highest spectral energy in the first spectrogram, they pick out the key frequency bands of the first and second spectrograms. The key frequency bands carry the more useful audio signal information in the two spectrograms, and useless audio signal is kept out of subsequent processing. Working on the third and fourth spectrograms therefore significantly reduces the processing overhead and improves the efficiency of processing the two audio files. Those skilled in the art will understand that the set number may be chosen according to the specific frequency distribution of the first audio file, and that any particular set number does not limit the embodiments of the present invention.
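A minimal sketch of this simplification, assuming NumPy and spectrograms stored as arrays whose rows are frequency bins and whose columns are frames, is given below. The function name `simplify` and the use of summed squared magnitude as the ranking energy are assumptions made for illustration.

```python
import numpy as np

def simplify(spec1, spec2, top_n=20):
    """Keep, in both spectrograms, the top_n rows with the highest energy in spec1."""
    spec1 = np.asarray(spec1, dtype=np.float64)
    spec2 = np.asarray(spec2, dtype=np.float64)
    energy = (spec1 ** 2).sum(axis=1)             # total energy of each frequency bin
    top_n = min(top_n, len(energy))
    keep = np.sort(np.argsort(energy)[-top_n:])   # strongest bins, in ascending bin order
    # The same bins are taken from both spectrograms so that rows stay comparable
    return spec1[keep], spec2[keep], keep
```

Selecting the bins from the first spectrogram only and reusing them for the second keeps the rows of the third and fourth spectrograms at matching frequencies.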
In step 140, the association-related processing performed on the first audio file and the second audio file using the offset time may include at least one of the following:
synchronizing the first audio file and the second audio file according to the offset time;
aligning the first audio file with the second audio file according to the offset time;
synthesizing the first audio file and the second audio file according to the offset time;
separating the first audio file from the second audio file according to the offset time.
Aligning the first audio file with the second audio file according to the offset time may specifically be: align the two audio signals corresponding to the first audio file and the second audio file according to the offset time; where alignment leaves part of an aligned audio signal blank, set the spectral energy values of the blank part to 0. Correspondingly, synthesizing the first audio file and the second audio file according to the offset time may specifically be: use the two aligned audio signals as the left and right channels of one audio stream and merge them into a single audio file. In this way, when the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, both channels are decoded while the user sings karaoke, but during playback only the data of one channel is sent to the sound card, according to the user's request; when the user switches between the accompaniment track and the original vocal, the data being played is switched to the other channel.
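The alignment and synthesis described above might be sketched as follows. This is an illustration under stated assumptions: NumPy is used, both signals are mono sample arrays at the same sampling rate, the offset is given in samples with a positive value meaning that the second signal lags the first, and blank parts are filled with zero-valued samples in the time domain as a stand-in for setting their spectral energy to 0. The function names are hypothetical.

```python
import numpy as np

def align_and_merge(first, second, offset_samples):
    """Align two mono signals and merge them as the left/right channels of one signal.

    offset_samples > 0 means the content of `second` lags `first` by that many
    samples, so `first` is delayed; a negative value delays `second` instead.
    """
    first = np.asarray(first, dtype=np.float64)
    second = np.asarray(second, dtype=np.float64)
    if offset_samples >= 0:
        first = np.concatenate([np.zeros(offset_samples), first])
    else:
        second = np.concatenate([np.zeros(-offset_samples), second])
    n = max(len(first), len(second))
    first = np.pad(first, (0, n - len(first)))     # blank tail filled with zeros
    second = np.pad(second, (0, n - len(second)))
    return np.stack([first, second], axis=1)       # column 0 = left, column 1 = right

def select_channel(stereo, want_original):
    """Return the channel to send to the sound card (left: accompaniment, right: original)."""
    return stereo[:, 1 if want_original else 0]
```

Because both channels share one sample index after merging, switching the channel passed to playback does not change the playback position.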
The method for processing audio files provided by this embodiment obtains, from the third spectrogram and the fourth spectrogram, the offset time needed to synchronize the first audio file and the second audio file, and uses that offset time to perform association-related processing on the two audio files, achieving seamless alignment, synthesis, and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.
Embodiment two:
Fig. 2 is a schematic flowchart of the method for processing audio files provided by Embodiment two of the present invention. As shown in Fig. 2, the audio synchronization method of this embodiment comprises the following steps:
Step 210: obtain a first spectrogram and a second spectrogram corresponding to the first audio file and the second audio file, respectively, where the first audio file and the second audio file are audio files that need association-related processing.
Step 220: simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram.
Step 230: obtain the frame position with maximum energy from the cross-power spectrum of the third spectrogram and the fourth spectrogram.
Step 240: obtain the offset time between the first audio file and the second audio file from the maximum-energy frame position.
Step 250: perform association-related processing on the first audio file and the second audio file using the offset time.
For the specific processing of steps 210 and 220, refer to steps 110 and 120 of Embodiment one; the details are not repeated here.
For the specific processing of step 230, refer to the embodiment shown in Fig. 3.
In step 240, the offset time needed to synchronize the first audio file and the second audio file may be obtained from the maximum-energy frame position as follows: the offset time between the first audio file and the second audio file is the product of the maximum-energy frame position and the time represented by each frame. The precision of the offset time equals the time interval between frames; depending on the required precision, the frame interval can be reduced appropriately to achieve higher alignment accuracy.
For the specific processing of step 250, refer to step 140 of Embodiment one; the details are not repeated here.
Before the specific processing of step 230 is described in detail with reference to Fig. 3, the principle of obtaining the offset time between two audio signals from the cross-power spectrum of their two spectrograms is first introduced with reference to Fig. 4, so that those skilled in the art can understand the technical solution of step 230 more clearly.
Let $f_2(x)$ be the curve obtained by translating $f_1(x)$ by $x_0$ along the $x$ direction, that is:
$f_2(x) = f_1(x - x_0)$ (1)
Let $F_1(u)$ and $F_2(u)$ be the Fourier transforms of $f_1(x)$ and $f_2(x)$, respectively. They are related as follows:
$F_2(u) = F_1(u)\, e^{-j 2\pi u x_0}$ (2)
The cross-power spectrum of $f_1(x)$ and $f_2(x)$ is then:
$\dfrac{F_1(u)\, F_2^{*}(u)}{\left| F_1(u)\, F_2^{*}(u) \right|} = e^{j 2\pi u x_0}$ (3)
where $F_2^{*}$ denotes the complex conjugate of $F_2$. Applying the inverse Fourier transform to formula (3) produces a unit impulse function in $x$ space located at a distance $x_0$ from the origin (at $x = -x_0$ under this sign convention), so the pulse position gives the relative translation $x_0$ between the two registered curves. Fig. 4 shows the spectral energy obtained after applying the inverse Fourier transform to formula (3): it contains a single unit impulse function, and the pulse position represents the relative translation $x_0$ between the two registered curves.
Therefore, the embodiments of the present invention can obtain the offset time between two audio signals by applying the inverse Fourier transform to the cross-power spectrum of the spectrograms of the two audio files. The specific processing of step 230 is described in detail below with reference to Fig. 3.
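The principle can be checked numerically on a one-dimensional example. The sketch below is an illustration only, assuming NumPy and a circular (wrap-around) translation; the function name is hypothetical. The conjugate is placed on the first spectrum so that the peak index is positive when the second signal is delayed; conjugating the second spectrum instead mirrors the peak to index N - x0, where N is the signal length.

```python
import numpy as np

def phase_correlation_shift(f1, f2):
    """Estimate the circular shift x0 such that f2[n] ~= f1[n - x0]."""
    F1 = np.fft.fft(f1)
    F2 = np.fft.fft(f2)
    cross = F2 * np.conj(F1)                 # cross-power spectrum (unnormalized)
    cross /= np.abs(cross) + 1e-12           # keep only the phase term
    response = np.real(np.fft.ifft(cross))   # close to a unit impulse at x0
    return int(np.argmax(response)), response

# Usage: a random curve and a copy translated by 37 samples
rng = np.random.default_rng(0)
curve = rng.standard_normal(256)
shifted = np.roll(curve, 37)
shift, response = phase_correlation_shift(curve, shifted)
```

For an exact circular shift the response is essentially a single impulse; with real audio the peak is lower and surrounded by noise, but its position is read in the same way.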
Fig. 3 is a schematic flowchart of a specific implementation of step 230. Referring to Fig. 3, the specific implementation of step 230 may comprise the following steps:
Step 231: reshape the third spectrogram and the fourth spectrogram.
Step 232: obtain the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram.
Step 233: apply the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector.
Step 234: obtain the maximum-energy frame position from the energy value vector.
In step 231, the third spectrogram and the fourth spectrogram may be reshaped as follows: the two-dimensional matrices corresponding to the third spectrogram and the fourth spectrogram are each stretched, in order, into a first one-dimensional vector and a second one-dimensional vector, respectively. Specifically, the rows of each two-dimensional matrix may be connected in sequence, that is, the first-column datum of each row is appended directly after the last-column datum of the previous row, forming the first one-dimensional vector and the second one-dimensional vector. Reshaping the two-dimensional spectrograms into one-dimensional vectors increases the processing speed for the two audio files and accelerates the analysis of the audio files.
Those skilled in the art will understand that when a two-dimensional matrix is stretched into a one-dimensional vector, the rows may be connected in order from the first row to the last row, in order starting from any row, or upward starting from the last row. The connection order can be chosen as circumstances require, provided that corresponding frequencies of the third spectrogram and the fourth spectrogram line up with each other; the sequential order given above as an example does not limit the embodiments of the present invention.
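The row-by-row stretching can be illustrated with a short sketch, assuming NumPy and a spectrogram stored as a two-dimensional array whose rows are frequency bins; the function name is hypothetical.

```python
import numpy as np

def flatten_rows(spec):
    """Concatenate the rows of a 2-D matrix into one 1-D vector (row-major order)."""
    return np.asarray(spec).reshape(-1)

# Usage: the first element of each row follows the last element of the previous row
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
vector = flatten_rows(matrix)
```

Whatever order is chosen, applying the same function to both the third and the fourth spectrogram keeps corresponding frequencies at corresponding positions.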
In step 232, a Fourier transform is applied to the first one-dimensional vector and to the second one-dimensional vector, and the cross-power spectrum of the two vectors is then obtained. For details of the cross-power spectrum, refer to the description given with reference to Fig. 4; they are not repeated here.
In step 233, the energy value vector obtained by applying the inverse Fourier transform to the cross-power spectrum is shown in Fig. 5, where the abscissa represents the frame number and the ordinate represents the energy value. Fig. 5 shows an impulse function at a frame position of about frame 600; the energy value is greatest at that frame position.
In step 234, the maximum-energy frame position is obtained from the energy value vector produced in step 233. Taking Fig. 5 as an example, the frame position corresponding to the impulse function with the maximum energy value is found at frame 600.
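Steps 231 to 234 might be combined as in the following sketch. It is an illustration under assumptions rather than the implementation of the embodiment: NumPy is used, both simplified spectrograms are arrays of identical shape with rows as frequency bins and columns as frames, and the conjugate is placed on the first spectrum so that a delay of the second audio gives a positive peak index. The function name is hypothetical.

```python
import numpy as np

def max_energy_frame(spec3, spec4):
    """Frame position at which the energy value vector of two spectrograms peaks.

    spec3, spec4: arrays of shape (frequency bins, frames) with identical shapes
    (an assumption of this sketch).
    """
    v1 = np.asarray(spec3, dtype=np.float64).reshape(-1)   # step 231: stretch rows
    v2 = np.asarray(spec4, dtype=np.float64).reshape(-1)
    cross = np.fft.fft(v2) * np.conj(np.fft.fft(v1))       # step 232: cross-power spectrum
    cross /= np.abs(cross) + 1e-12                         # normalize to unit magnitude
    energy = np.real(np.fft.ifft(cross))                   # step 233: energy value vector
    return int(np.argmax(energy)), energy                  # step 234: peak position

# Usage: the second spectrogram is the first one delayed by 25 frames
rng = np.random.default_rng(0)
first = rng.random((20, 300))
second = np.zeros_like(first)
second[:, 25:] = first[:, :-25]
frame, energy = max_energy_frame(first, second)
```

Because each row is the time series of one frequency bin, a delay of d frames shifts almost every element of the stretched vector by d positions, so the peak index can be read directly as a frame count.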
Through steps 230 and 240, if the time represented by each frame is dt and the maximum-energy frame position is frame 600, the offset time is dt × 600, which gives an accurate offset time between the first audio file and the second audio file. Depending on the required precision, the frame interval can be reduced appropriately to achieve more precise synchronization, alignment, and so on.
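As a worked illustration of this product, the sketch below assumes that the time represented by each frame equals the frame shift divided by the sampling frequency. The frame shift of 80 samples is a hypothetical value chosen for the example; only the 8000 Hz sampling frequency and the frame position 600 come from the description.

```python
def offset_seconds(peak_frame, hop_samples, sample_rate):
    """Offset time = frame position x time represented by one frame (hop / rate)."""
    return peak_frame * hop_samples / sample_rate

# Usage: frame 600 with a hypothetical 80-sample frame shift at 8000 Hz,
# so dt = 80 / 8000 = 0.01 s per frame
offset = offset_seconds(600, 80, 8000)
```

Halving the frame shift halves dt and therefore doubles the time resolution of the estimated offset, at the cost of twice as many frames to process.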
Embodiment three:
Fig. 6 is a schematic structural diagram of the device for processing audio files provided by Embodiment three of the present invention. As shown in Fig. 6, the device for processing audio files of this embodiment comprises:
a spectrogram acquisition module 610, configured to obtain a first spectrogram and a second spectrogram corresponding to a first audio file and a second audio file, respectively, where the first audio file and the second audio file are audio files that need association-related processing;
an offset time acquisition module 620, configured to obtain the offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;
an audio processing module 630, configured to perform association-related processing on the first audio file and the second audio file using the offset time.
For the processing of the spectrogram acquisition module 610, refer to step 110 of Embodiment one; the details are not repeated here.
The offset time acquisition module 620 may include:
a spectrogram simplification unit 621, configured to simplify the first spectrogram and the second spectrogram to obtain a third spectrogram corresponding to the first spectrogram and a fourth spectrogram corresponding to the second spectrogram;
an offset time acquisition unit 622, configured to obtain the offset time between the first audio file and the second audio file from the third spectrogram and the fourth spectrogram.
For the specific processing of the spectrogram simplification unit 621, refer to step 120 of Embodiment one; the details are not repeated here.
For the specific processing of the offset time acquisition unit 622, refer to step 130 of Embodiment one; the details are not repeated here.
The audio processing module 630 may include at least one of the following units:
an audio synchronization unit (not shown), configured to synchronize the first audio file and the second audio file according to the offset time;
an audio alignment unit (not shown), configured to align the first audio file with the second audio file according to the offset time;
an audio synthesis unit (not shown), configured to synthesize the first audio file and the second audio file according to the offset time;
an audio separation unit (not shown), configured to separate the first audio file from the second audio file according to the offset time.
The specific processing of the audio alignment unit may be: align the two audio signals corresponding to the first audio file and the second audio file according to the offset time; where alignment leaves part of an aligned audio signal blank, set the spectral energy values of the blank part to 0. Correspondingly, the specific processing of the audio synthesis unit may be: use the two aligned audio signals as the left and right channels of one audio stream and merge them into a single audio file. In this way, when the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, both channels are decoded while the user sings karaoke, but during playback only the data of one channel is sent to the sound card, according to the user's request; when the user switches between the accompaniment track and the original vocal, the data being played is switched to the other channel.
In the device for processing audio files provided by this embodiment, the offset time acquisition module 620 obtains, from the first spectrogram and the second spectrogram, the offset time needed to synchronize the first audio file and the second audio file, and the audio processing module 630 uses that offset time to perform association-related processing on the two audio files, achieving seamless alignment, synthesis, and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.
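The cooperation of the three modules might be pictured with the following structural sketch. It is an assumption-laden illustration, not the device itself: the class name, field names, and the choice of plain Python callables for the modules are all hypothetical, and the module bodies are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple

@dataclass
class AudioFileDevice:
    """Wires together the three modules of the device (610, 620, 630)."""
    get_spectrograms: Callable[[Any, Any], Tuple[Any, Any]]  # module 610
    get_offset_time: Callable[[Any, Any], float]             # module 620
    process_audio: Callable[[Any, Any, float], Any]          # module 630

    def run(self, first_audio: Any, second_audio: Any) -> Any:
        # Module 610: one spectrogram per audio file
        spec1, spec2 = self.get_spectrograms(first_audio, second_audio)
        # Module 620: offset time derived from the two spectrograms
        offset = self.get_offset_time(spec1, spec2)
        # Module 630: association-related processing using the offset time
        return self.process_audio(first_audio, second_audio, offset)

# Usage with placeholder modules that only pass data through
device = AudioFileDevice(
    get_spectrograms=lambda a, b: (a, b),
    get_offset_time=lambda s1, s2: 1.5,
    process_audio=lambda a, b, offset: (a, b, offset),
)
result = device.run("accompaniment", "original")
```

Keeping each module as a separate callable mirrors the description, in which module 620 may itself be composed of further units without changing modules 610 and 630.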
Embodiment four:
Fig. 7 is a schematic structural diagram of the device for processing audio files provided by Embodiment four of the present invention. As shown in Fig. 7, the device for processing audio files of this embodiment comprises:
a spectrogram acquisition module 610, configured to obtain a first spectrogram and a second spectrogram corresponding to a first audio file and a second audio file, respectively, where the first audio file and the second audio file are audio files that need association-related processing;
an offset time acquisition module 620, configured to obtain the offset time between the first audio file and the second audio file from the first spectrogram and the second spectrogram;
an audio processing module 630, configured to perform association-related processing on the first audio file and the second audio file using the offset time.
As described in Embodiment three, the offset time acquisition module 620 may include a spectrogram simplification unit 621 and an offset time acquisition unit 622. Further, the offset time acquisition unit 622 may include:
a maximum-energy frame acquisition unit 6221, configured to obtain the maximum-energy frame position from the cross-power spectrum of the third spectrogram and the fourth spectrogram;
an offset time acquisition subunit 6222, configured to obtain the offset time between the first audio file and the second audio file from the maximum-energy frame position.
Further, the maximum-energy frame acquisition unit 6221 includes:
a reshaping unit 62211, configured to reshape the third spectrogram and the fourth spectrogram;
a cross-power spectrum acquisition unit 62212, configured to obtain the cross-power spectrum between the reshaped third spectrogram and the reshaped fourth spectrogram;
an energy value vector acquisition unit 62213, configured to apply the inverse Fourier transform to the cross-power spectrum to obtain the corresponding energy value vector;
a maximum-energy frame acquisition subunit 62214, configured to obtain the maximum-energy frame position from the energy value vector.
The reshaping performed by the reshaping unit 62211 on the third spectrogram and the fourth spectrogram may specifically be: the two-dimensional matrices corresponding to the third spectrogram and the fourth spectrogram are each stretched, in order, into a first one-dimensional vector and a second one-dimensional vector, respectively. For the specific processing of the reshaping unit 62211 and its beneficial technical effects, refer to the description of step 231 in Embodiment two; the details are not repeated here.
For the specific processing of the cross-power spectrum acquisition unit 62212, refer to step 232 in Embodiment two; the details are not repeated here.
For the specific processing of the energy value vector acquisition unit 62213, refer to step 233 in Embodiment two; the details are not repeated here.
For the specific processing of the maximum-energy frame acquisition subunit 62214, refer to step 234 in Embodiment two; the details are not repeated here.
The offset time acquisition subunit 6222 obtains the offset time between the first audio file and the second audio file as the product of the maximum-energy frame position and the time represented by each frame.
In summary, the method and device for processing audio files provided by the embodiments of the present invention obtain, from the first spectrogram and the second spectrogram, the offset time needed to synchronize the first audio file and the second audio file, and process the two audio files using that offset time, achieving seamless alignment, synthesis, and similar processing of the two audio files. When the first audio file is the accompaniment track of a song and the second audio file is the original vocal plus accompaniment of the song, the user can switch seamlessly between the accompaniment track and the original vocal as needed while singing karaoke, which improves the user's entertainment experience.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.