CN103943113B

CN103943113B - Method and apparatus for removing accompaniment from a song

Info

Publication number: CN103943113B
Application number: CN201410151551.8A
Authority: CN
Inventors: 王子亮; 陈凤
Original assignee: Fujian Star Net eVideo Information Systems Co Ltd
Current assignee: Fujian Star Net eVideo Information Systems Co Ltd
Priority date: 2014-04-15
Filing date: 2014-04-15
Publication date: 2017-11-07
Anticipated expiration: 2034-04-15
Also published as: CN103943113A

Abstract

The present invention provides a method for removing accompaniment from a song, comprising the steps of: obtaining an accompaniment audio signal and a song audio signal; performing an FFT on the song audio signal and the accompaniment audio signal respectively to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal; enhancing the accompaniment audio signal amplitude spectrum; and subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, performing an inverse FFT to obtain the accompaniment-removed audio signal. The present invention also provides a device for removing accompaniment from a song. The present invention can improve the purity of the separated singing voice; the separation executes efficiently, and the algorithm is simple and easy to implement.

Description

Method and apparatus for removing accompaniment from a song

Technical field

The present invention relates to the field of audio signal processing, and in particular to a method and apparatus for removing accompaniment from a song.

Background

Singing voice separation systems are widely used in several fields, such as automatic lyrics recognition and correction. Automatic lyrics recognition usually requires the system input to be a pure singing voice, that is, singing only, without accompaniment. For almost all songs this is impractical, because most songs contain both the singing voice and instrumental accompaniment.

There is still little research on separating the singing voice from music. This kind of sound separation task is easy for a person but very difficult for a machine. Speech separation has been studied extensively, but music is an extremely complex signal in which the singing voice and several different instruments are mixed together, and the instrument sounds are also correlated with the singing voice, so it is difficult to separate a pure singing voice using blind speech signal separation techniques.

The master's thesis《Song separation based on time frequency analysis》proposes singing voice separation based on time-frequency (TF) analysis. Its separation process depends mainly on pitch features. In many cases the fundamentals and overtones of the singing voice and of the instruments overlap, so it is difficult to obtain the complete TF information of the singing voice from pitch alone, and the singing voice often cannot be separated from the accompaniment using pitch. This method also suffers from a complex algorithm, heavy computation, and low execution efficiency.

Summary of the invention

The present invention provides a method for removing accompaniment from a song that can improve the purity of the separated singing voice, executes the separation efficiently, and uses an algorithm that is simple and easy to implement.

A method for removing accompaniment from a song, comprising the steps of:

Obtaining an accompaniment audio signal and a song audio signal;

Preprocessing the song audio signal and the accompaniment audio signal respectively and performing an FFT to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;

Enhancing the accompaniment audio signal amplitude spectrum;

Subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, performing an inverse FFT to obtain the accompaniment-removed audio signal.

The present invention also provides a device for removing accompaniment from a song, the device comprising:

An accompaniment audio signal and song audio signal acquisition module, used to obtain the accompaniment audio signal and the song audio signal;

A preprocessing and FFT module, used to preprocess the song audio signal and the accompaniment audio signal respectively and perform an FFT to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;

An accompaniment amplitude spectrum enhancement module, used to enhance the accompaniment audio signal amplitude spectrum;

A spectral subtraction and inverse FFT module, used to subtract the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, perform an inverse FFT to obtain the accompaniment-removed audio signal.

The beneficial effects of the present invention are as follows: the present invention performs an FFT on the song audio signal and the accompaniment audio signal to obtain their amplitude spectra, enhances the amplitude spectrum of the accompaniment audio signal, subtracts the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, and performs an inverse FFT using the phase of the song audio signal to obtain the accompaniment-removed audio signal. The present invention can improve the purity of the separated singing voice; the separation executes efficiently, and the algorithm is simple and easy to implement.

Brief description of the drawings

Fig. 1 is a flow chart of a method for removing accompaniment from a song in an embodiment of the present invention;

Fig. 2 is a functional block diagram of a device for removing accompaniment from a song in an embodiment of the present invention;

Fig. 3 is the time-domain waveform of the song audio of the example song《Meet》in an embodiment of the present invention;

Fig. 4 is the time-domain waveform of the accompaniment audio of the example song《Meet》in an embodiment of the present invention;

Fig. 5 is the time-domain waveform of the audio of the example song《Meet》after accompaniment removal in an embodiment of the present invention;

Description of main reference numerals:

10 - accompaniment audio signal and song audio signal acquisition module; 20 - preprocessing and FFT module; 30 - accompaniment amplitude spectrum enhancement module; 40 - spectral subtraction and inverse FFT module.

Detailed description

The present invention subtracts the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, thereby improving the purity of the separated singing voice while keeping the separation efficient.

To explain the technical content, structural features, objects, and effects of the present invention in detail, a detailed description is given below with reference to the embodiments and the accompanying drawings.

Referring to Fig. 1, a flow chart of the method for removing accompaniment from a song according to this embodiment is shown. The method comprises the steps of:

S1: obtaining an accompaniment audio signal and a song audio signal;

S2: preprocessing the song audio signal and the accompaniment audio signal respectively and performing an FFT to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;

S3: enhancing the accompaniment audio signal amplitude spectrum;

S4: subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, performing an inverse FFT to obtain the accompaniment-removed audio signal.

By enhancing the accompaniment audio signal amplitude spectrum, subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum, and performing an inverse FFT using the phase of the song audio signal to obtain the accompaniment-removed audio signal, the present invention improves the purity of the separated singing voice. The algorithm of this embodiment is simple and executes efficiently.

In this embodiment, step S1, "obtaining an accompaniment audio signal and a song audio signal", is carried out by obtaining the accompaniment audio signal and the song audio signal from stereo song audio, specifically:

Inverting the left-channel signal of the stereo song audio to obtain a left inverted signal;

Adding the left inverted signal to the right-channel signal to obtain the accompaniment audio signal;

And taking the right-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

The stereo song audio comprises a left-channel signal and a right-channel signal. The left-channel signal is a mixture of the vocal and the left-channel accompaniment, and the right-channel signal is a mixture of the vocal and the right-channel accompaniment. Under this model the vocal is common to both channels, so it cancels when the inverted left channel is added to the right channel, leaving the difference between the two accompaniment channels.
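The channel arithmetic of step S1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `split_stereo` and the `(num_samples, 2)` array layout with the left channel in column 0 are hypothetical choices, not something the patent specifies.

```python
import numpy as np

def split_stereo(stereo):
    """Derive the accompaniment and song signals from stereo samples.

    `stereo` is assumed to have shape (num_samples, 2), with the left
    channel in column 0 and the right channel in column 1.
    """
    stereo = np.asarray(stereo, dtype=np.float64)
    left = stereo[:, 0]
    right = stereo[:, 1]
    left_inverted = -left                   # phase inversion of the left channel
    accompaniment = left_inverted + right   # the common vocal cancels here
    song = right                            # vocal + right-channel accompaniment
    return accompaniment, song
```

With a vocal `v` mixed equally into both channels, the returned accompaniment equals the right accompaniment minus the left accompaniment, and `v` no longer appears in it.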

In another embodiment, step S1, "obtaining an accompaniment audio signal and a song audio signal", is carried out by obtaining the accompaniment audio signal and the song audio signal from stereo song audio, specifically:

Inverting the right-channel signal of the stereo song audio to obtain a right inverted signal;

Adding the right inverted signal to the left-channel signal to obtain the accompaniment audio signal;

And taking the left-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

In yet another embodiment, step S1 can also be realized by other methods. As long as the song audio and the accompaniment audio corresponding to it are available, the subsequent processing can be carried out.

In this embodiment, to facilitate processing of the song audio signal, the "preprocessing the song audio signal and the accompaniment audio signal respectively" in step S2 is implemented as follows:

Step S20: normalizing the song audio signal and the accompaniment audio signal respectively, wherein the normalization finds the maximum value of the song audio signal and of the accompaniment audio signal and divides each signal by its corresponding maximum value;

Dividing the normalized song audio signal and accompaniment audio signal into N frames each, where N is a positive integer; each song frame and accompaniment frame contains 1024 sampling points, and every two adjacent song frames or accompaniment frames share 512 overlapping sampling points.

Normalization limits the amplitudes of the song audio signal and the accompaniment audio signal to between -1 and +1, which simplifies subsequent processing. The signals are divided into frames with 512 overlapping sampling points between adjacent song frames or accompaniment frames so that the transition from frame to frame is smooth.
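The normalization and framing described above might look as follows in NumPy. The names `normalize` and `split_frames` are hypothetical. Two assumptions are made that the text does not state outright: the "maximum value" is taken as the maximum absolute value (which is what bounds the result to the range -1 to +1), and trailing samples that do not fill a whole frame are dropped.

```python
import numpy as np

FRAME_LEN = 1024  # sampling points per frame, as in the embodiment
HOP = 512         # adjacent frames share FRAME_LEN - HOP = 512 points

def normalize(signal):
    """Scale a signal into [-1, +1] by its peak absolute value (assumed)."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Cut a 1-D signal into overlapping frames, one frame per row."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Assumption: samples after the last full frame are discarded.
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[k * hop : k * hop + frame_len]
                     for k in range(n_frames)])
```

For a 2048-sample signal this yields N = 3 frames starting at samples 0, 512, and 1024.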

In this embodiment, to reduce the spectral leakage caused by the subsequent transformation to the frequency domain, each song frame and accompaniment frame is also filtered with a Hanning window before the "performing an FFT on the song audio signal and the accompaniment audio signal respectively" in step S2.
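Windowing and the forward transform can be sketched as below. The function name `amplitude_and_phase` is hypothetical, and using `np.fft.rfft` (which returns only points 0 to 512 of a 1024-point transform) is an implementation choice made here because the remaining points are redundant for real input; the patent speaks only of a 1024-point FFT.

```python
import numpy as np

def amplitude_and_phase(frames):
    """Apply a Hanning window to each frame and take its FFT.

    `frames` has shape (N, 1024). Returns the amplitude spectrum and
    the phase, each of shape (N, 513), covering points 0..512.
    """
    frames = np.asarray(frames, dtype=np.float64)
    window = np.hanning(frames.shape[1])       # symmetric Hanning window
    spectrum = np.fft.rfft(frames * window, axis=1)
    return np.abs(spectrum), np.angle(spectrum)
```

Applied to the song frames this gives S_n(i) and the phase used later for reconstruction; applied to the accompaniment frames it gives M_n(i), whose phase is not needed.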

In this embodiment, the "enhancing the accompaniment audio signal amplitude spectrum" in step S3 is implemented as follows:

Step S30: traversing all frames of the accompaniment audio signal amplitude spectrum M_n(i) (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1); for each point, finding the maximum over the corresponding amplitude spectrum points of the 2m+1 frames formed by the current frame, the m frames before it, and the m frames after it, and taking that maximum as the new value of the corresponding point of the current frame, where m is a preset positive integer.

For example, m is 2 in one embodiment. All frames of the accompaniment audio signal amplitude spectrum are traversed (excluding the first 2 and the last 2 frames); for each, the current frame, the 2 frames before it, and the 2 frames after it are located, compared, and assigned. If the current frame is frame 2, its 2 preceding frames are frames 0 and 1 and its 2 following frames are frames 3 and 4. These 5 frames are traversed from point 0 to point 512; at each point the maximum over the 5 frames is found and assigned to the corresponding point of the current frame. If, say, the maximum at point 0 among the 5 frames is the value in frame 3, then that value is assigned to point 0 of the current frame, frame 2. The values of the 5 frames at points 1 to 512 are then compared in turn and assigned to the corresponding points of the current frame. Next, frame 3 becomes the current frame: its 2 preceding frames are frames 1 and 2 and its 2 following frames are frames 4 and 5, and the comparison and assignment proceed in the same way.

The formula is M_n(i) = max(MM_{n-2}(i), MM_{n-1}(i), MM_n(i), MM_{n+1}(i), MM_{n+2}(i)), i = 0, 1, 2, ..., 512, n = 2, 3, 4, ..., N-3, where MM_n(i) = M_n(i), i = 0, 1, 2, ..., 512, n = 0, 1, 2, ..., N-1, is a cached copy of the amplitude spectrum of the accompaniment audio signal. In other embodiments, m can be set to a positive integer other than 2, such as 1, 3, or 4.

The step of enhancing the accompaniment audio signal amplitude spectrum strengthens the amplitude spectrum of the accompaniment audio signal, so that the spectral subtraction and inverse FFT step can remove the accompaniment components in the song audio signal to a large extent.
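The enhancement of step S30 amounts to a running maximum along the frame axis. A minimal sketch follows; the function name `enhance_accompaniment` is hypothetical, the array `M` is assumed to hold one frame per row, and the first and last m frames are left unchanged, as in the m = 2 example.

```python
import numpy as np

def enhance_accompaniment(M, m=2):
    """Replace each point of frame n by the maximum over frames n-m..n+m.

    `M` has shape (N, 513). Frames 0..m-1 and N-m..N-1 keep their values.
    """
    M = np.asarray(M, dtype=np.float64)
    MM = M.copy()    # cached copy, so already-updated frames are not reused
    out = M.copy()
    n_frames = M.shape[0]
    for n in range(m, n_frames - m):
        out[n] = MM[n - m : n + m + 1].max(axis=0)
    return out
```

Reading from the cached copy `MM` rather than from the array being written is what keeps a single large value from propagating across more than m frames in either direction.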

In this embodiment, step S4 specifically comprises:

S41: according to the formula [formula not reproduced in the source text] (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1),

all song audio frames are traversed, and each frame is traversed from point 0 to point 512; the corresponding amplitude spectrum of the enhanced accompaniment audio frame is subtracted from the amplitude spectrum of the song audio frame, giving the amplitude spectra of all frames of the accompaniment-removed audio. Here S_n(i) is the song audio signal amplitude spectrum, M_n(i) is the enhanced accompaniment audio signal amplitude spectrum, Y_n(i) is the amplitude spectrum of the accompaniment-removed audio signal, a is the signal-to-noise-ratio adjustment factor, and b is the accompaniment adjustment factor;

In this embodiment a is 2 and b is 4; in other embodiments a and b can be set to other values. Increasing a improves the signal-to-noise ratio of the accompaniment-removed audio signal, and increasing b increases the amount of accompaniment removed.

S42: according to the formula k_n(i) = Y_n(i) / S_n(i) (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1), the amplitude spectrum of each accompaniment-removed audio frame is divided by the corresponding amplitude spectrum of the song audio frame to obtain the proportionality coefficient k_n(i);

The real and imaginary parts of the FFT of every song audio frame are each multiplied by the corresponding proportionality coefficient k_n(i), giving the real and imaginary parts of the FFT of the accompaniment-removed audio frame at points 0 to 512. By the symmetry of the FFT, the values at symmetric points are complex conjugates of each other, that is, their real parts are equal and their imaginary parts are opposite, so the real and imaginary parts at points 513 to 1023 can be obtained; a 1024-point inverse FFT is then performed;

The frames obtained from the inverse FFT are stitched together (taking the inter-frame overlap into account) to obtain the accompaniment-removed audio signal.
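Step S42 and the stitching can be sketched as follows, taking the subtracted amplitude spectrum Y_n(i) as a given input. The function name `reconstruct` is hypothetical. Two assumptions are made: points where S_n(i) is zero get k_n(i) = 0, and "stitching with inter-frame overlap" is implemented as plain overlap-add, which the text does not spell out. `np.fft.irfft` supplies points 513 to 1023 through the conjugate symmetry described above.

```python
import numpy as np

def reconstruct(song_spectrum, Y, frame_len=1024, hop=512):
    """Rescale the song FFT by k = Y / S, invert it, and overlap-add.

    `song_spectrum` is the complex FFT of the windowed song frames at
    points 0..512, shape (N, 513); `Y` is the accompaniment-removed
    amplitude spectrum of the same shape.
    """
    song_spectrum = np.asarray(song_spectrum, dtype=np.complex128)
    Y = np.asarray(Y, dtype=np.float64)
    S = np.abs(song_spectrum)
    # k_n(i) = Y_n(i) / S_n(i); assumed to be 0 where S_n(i) is 0
    k = np.divide(Y, S, out=np.zeros(S.shape), where=S > 0)
    # Multiplying real and imaginary parts by k keeps the song's phase
    scaled = song_spectrum * k
    frames = np.fft.irfft(scaled, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for n, frame in enumerate(frames):
        out[n * hop : n * hop + frame_len] += frame   # overlap-add (assumed)
    return out
```

Because the scaling factor is real, the phase of each point is exactly that of the original song frame, which is how the method "combines the phase of the song audio signal" without computing the phase explicitly.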

Referring to Fig. 2, the present invention also provides a device for removing accompaniment from a song, shown as a functional block diagram. The device comprises an accompaniment audio signal and song audio signal acquisition module 10, a preprocessing and FFT module 20, an accompaniment amplitude spectrum enhancement module 30, and a spectral subtraction and inverse FFT module 40;

The preprocessing and FFT module is used to preprocess the song audio signal and the accompaniment audio signal respectively and perform an FFT to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;

The accompaniment amplitude spectrum enhancement module is used to enhance the amplitude spectrum of the accompaniment audio signal;

The present invention enhances the amplitude spectrum of the accompaniment audio signal through the accompaniment amplitude spectrum enhancement module, so that the spectral subtraction and inverse FFT module can remove the accompaniment components in the song audio to a large extent, thereby improving the purity of the separated singing voice.

In this embodiment, the accompaniment audio signal and song audio signal acquisition module comprises an accompaniment audio signal acquisition unit and a song audio signal acquisition unit.

The accompaniment audio signal acquisition unit is used to invert the left-channel signal of the stereo song audio to obtain a left inverted signal, and to add the left inverted signal to the right-channel signal to obtain the accompaniment audio signal.

The song audio signal acquisition unit is used to take the right-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

In another embodiment, the accompaniment audio signal and song audio signal acquisition module comprises an accompaniment audio signal acquisition unit and a song audio signal acquisition unit.

The accompaniment audio signal acquisition unit is used to invert the right-channel signal of the stereo song audio to obtain a right inverted signal, and to add the right inverted signal to the left-channel signal to obtain the accompaniment audio signal.

The song audio signal acquisition unit is used to take the left-channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

In yet another embodiment, the accompaniment audio signal and song audio signal acquisition module can also be realized by other methods. As long as the song audio and the accompaniment audio corresponding to it are available, the subsequent processing can be carried out.

In the above embodiment, the preprocessing and FFT module further comprises a normalization unit, a framing unit, and a windowing unit;

The normalization unit is used to normalize the song audio signal and the accompaniment audio signal respectively, wherein the normalization finds the maximum value of the song audio signal and of the accompaniment audio signal and divides each signal by its corresponding maximum value;

The framing unit is used to divide the normalized song audio signal and accompaniment audio signal into N frames each, where N is a positive integer; each song frame and accompaniment frame contains 1024 sampling points, and every two adjacent song frames or accompaniment frames share 512 overlapping sampling points.

The windowing unit is used to filter each song frame and accompaniment frame with a Hanning window.

In the above embodiment, the accompaniment amplitude spectrum enhancement module is used to traverse all frames of the accompaniment audio signal amplitude spectrum; for each point, it finds the maximum over the corresponding amplitude spectrum points of the 2m+1 frames formed by the current frame, the m frames before it, and the m frames after it, and takes that maximum as the new value of the corresponding point of the current frame, where m is a preset positive integer.

The spectral subtraction and inverse FFT module comprises a spectral subtraction unit, an inverse FFT unit, and a stitching unit;

The spectral subtraction unit is used to subtract the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal to obtain the amplitude spectrum of the accompaniment-removed audio signal, according to the formula [formula not reproduced in the source text] (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1). Here S_n(i) is the song audio signal amplitude spectrum, M_n(i) is the enhanced accompaniment audio signal amplitude spectrum, Y_n(i) is the amplitude spectrum of the accompaniment-removed audio signal, a is the signal-to-noise-ratio adjustment factor, and b is the accompaniment adjustment factor;

The inverse FFT unit is used to perform an inverse FFT on the amplitude spectrum of the accompaniment-removed audio signal. Specifically, according to the formula k_n(i) = Y_n(i) / S_n(i) (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1), the amplitude spectrum of the accompaniment-removed audio signal is divided by the song audio signal amplitude spectrum to obtain the proportionality coefficient k_n(i); the real and imaginary parts of the FFT of the song audio signal are then each multiplied by k_n(i), and a 1024-point inverse FFT is performed;

The stitching unit is used to stitch together the frames obtained from the inverse FFT to obtain the accompaniment-removed audio signal.

In summary, the method and apparatus of the present invention for removing accompaniment from a song enhance the accompaniment audio signal amplitude spectrum, subtract the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal, and perform an inverse FFT using the phase of the song audio signal to obtain the accompaniment-removed audio. This improves the purity of the separated singing voice, and the algorithm of this embodiment is simple and executes efficiently.

Example

The present invention is illustrated below with a specific example of removing the accompaniment from a song.

The song is《Meet》by Sun Yanzi; its audio format is two-channel stereo.

The left channel of the stereo song《Meet》is inverted to obtain an inverted signal; the inverted signal is added to the right-channel signal of the stereo song audio to obtain the accompaniment audio signal of《Meet》; and the right-channel signal of the stereo song audio is taken as the song audio signal of《Meet》.

These steps yield 2 audio files: Meet_original.wav and Meet_accompaniment.wav.

The audio data of Meet_original.wav and Meet_accompaniment.wav are read and preprocessed, and a 1024-point FFT is performed, giving the song audio signal amplitude spectrum of Meet_original and the accompaniment audio signal amplitude spectrum of Meet_accompaniment. The accompaniment audio signal amplitude spectrum is then enhanced according to the following formula:

M_n(i) = max(MM_{n-2}(i), MM_{n-1}(i), MM_n(i), MM_{n+1}(i), MM_{n+2}(i)), i = 0, 1, 2, ..., 512, n = 2, 3, 4, ..., N-3, where MM_n(i) = M_n(i), i = 0, 1, 2, ..., 512, n = 0, 1, 2, ..., N-1, is the cached copy of the accompaniment audio signal amplitude spectrum and N is the number of frames.

According to the formula [formula not reproduced in the source text] (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1), the enhanced accompaniment audio signal amplitude spectrum of Meet_accompaniment is subtracted from the song audio signal amplitude spectrum of Meet_original to obtain the amplitude spectrum of the accompaniment-removed audio signal, with a set to 2 and b set to 4.

According to the formula k_n(i) = Y_n(i) / S_n(i) (i = 0, 1, 2, ..., 512; n = 0, 1, 2, ..., N-1), the amplitude spectrum of the accompaniment-removed audio signal is divided by the song audio signal amplitude spectrum of Meet_original to obtain the proportionality coefficient k_n(i);

The real and imaginary parts of the FFT of the song audio signal of Meet_original are each multiplied by the proportionality coefficient k_n(i), and a 1024-point inverse FFT is performed;

The frames obtained from the inverse FFT are stitched together to obtain the accompaniment-removed audio Meet_vocal.wav.

Figs. 3 to 5 show the time-domain waveforms of the song audio, the accompaniment audio, and the accompaniment-removed audio of《Meet》, respectively. When Meet_vocal.wav is played in an audio player, the accompaniment is heard to be essentially removed; although the amplitude of the vocal is reduced, its sound quality is close to that of the vocal in the original audio.

The foregoing describes only embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of protection of the present invention.

Claims

1. A method for removing accompaniment from a song, characterized by comprising the steps of:

Obtaining an accompaniment audio signal and a song audio signal;

Enhancing the accompaniment audio signal amplitude spectrum, wherein "enhancing the accompaniment audio signal amplitude spectrum" specifically comprises the step of: traversing all frames of the accompaniment audio signal amplitude spectrum; for each point, finding the maximum over the corresponding amplitude spectrum points of the 2m+1 frames formed by the current frame, the m frames before it, and the m frames after it, and taking that maximum as the new value of the corresponding point of the current frame, where m is a preset positive integer;

Subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, performing an inverse FFT to obtain the accompaniment-removed audio signal.

2. The method for removing accompaniment from a song according to claim 1, characterized in that the step "obtaining an accompaniment audio signal and a song audio signal" is carried out by obtaining the accompaniment audio signal and the song audio signal from stereo song audio, specifically:

Inverting the left or right channel signal of the stereo song audio to obtain a left or right inverted signal;

Adding the left or right inverted signal to the right or left channel signal to obtain the accompaniment audio signal;

And taking the right or left channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

3. The method for removing accompaniment from a song according to claim 1, characterized in that the "preprocessing the song audio signal and the accompaniment audio signal respectively" is implemented by the steps of:

Normalizing the song audio signal and the accompaniment audio signal respectively, wherein the normalization finds the maximum value of the song audio signal and of the accompaniment audio signal and divides each signal by its corresponding maximum value;

Dividing the normalized song audio signal and accompaniment audio signal into N frames each, where N is a positive integer;

Filtering each song frame and accompaniment frame with a Hanning window.

4. The method for removing accompaniment from a song according to claim 1, characterized in that the "subtracting the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, performing an inverse FFT to obtain the accompaniment-removed audio signal" specifically comprises the steps of:

According to the formula [formula not reproduced in the source text], subtracting the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal to obtain the amplitude spectrum of the accompaniment-removed audio signal, where S_n(i) is the song audio signal amplitude spectrum, M_n(i) is the enhanced accompaniment audio signal amplitude spectrum, Y_n(i) is the amplitude spectrum of the accompaniment-removed audio signal, FN is the number of FFT points, a is the signal-to-noise-ratio adjustment factor, b is the accompaniment adjustment factor, and N is a positive integer;

According to the formula k_n(i) = Y_n(i) / S_n(i) (i = 0, 1, 2, ..., FN/2; n = 0, 1, 2, ..., N-1), dividing the amplitude spectrum of the accompaniment-removed audio signal by the song audio signal amplitude spectrum to obtain the proportionality coefficient k_n(i);

Multiplying the real and imaginary parts of the FFT of the song audio signal each by the proportionality coefficient k_n(i), and performing an FN-point inverse FFT;

Stitching together the frames obtained from the inverse FFT to obtain the accompaniment-removed audio signal.

5. A device for removing accompaniment from a song, characterized by comprising:

An accompaniment audio signal and song audio signal acquisition module, used to obtain an accompaniment audio signal and a song audio signal;

A preprocessing and FFT module, used to preprocess the song audio signal and the accompaniment audio signal respectively and perform an FFT to obtain the song audio signal amplitude spectrum, the accompaniment audio signal amplitude spectrum, and the phase of the song audio signal;

An accompaniment amplitude spectrum enhancement module, used to enhance the accompaniment audio signal amplitude spectrum, specifically by traversing all frames of the accompaniment audio signal amplitude spectrum and, for each point, finding the maximum over the corresponding amplitude spectrum points of the 2m+1 frames formed by the current frame, the m frames before it, and the m frames after it, and taking that maximum as the new value of the corresponding point of the current frame, where m is a preset positive integer;

A spectral subtraction and inverse FFT module, used to subtract the enhanced accompaniment audio signal amplitude spectrum from the song audio signal amplitude spectrum and, using the phase of the song audio signal, perform an inverse FFT to obtain the accompaniment-removed audio signal.

6. The device for removing accompaniment from a song according to claim 5, characterized in that the accompaniment audio signal and song audio signal acquisition module comprises an accompaniment audio signal acquisition unit and a song audio signal acquisition unit;

The accompaniment audio signal acquisition unit is used to invert the left or right channel signal of the stereo song audio to obtain a left or right inverted signal, and to add the left or right inverted signal to the right or left channel signal to obtain the accompaniment audio signal;

The song audio signal acquisition unit is used to take the right or left channel signal of the stereo song audio as the song audio signal from which the accompaniment is to be removed.

7. The device for removing accompaniment from a song according to claim 5, characterized in that the preprocessing and FFT module further comprises a normalization unit, a framing unit, and a windowing unit;

The normalization unit is used to normalize the song audio signal and the accompaniment audio signal respectively, wherein the normalization finds the maximum value of the song audio signal and of the accompaniment audio signal and divides each signal by its corresponding maximum value;

The framing unit is used to divide the normalized song audio signal and accompaniment audio signal into N frames each, where N is a positive integer;

The windowing unit is used to filter each song frame and accompaniment frame with a Hanning window.

8. The device for removing accompaniment from a song according to claim 5, characterized in that the spectral subtraction and inverse FFT module comprises a spectral subtraction unit, an inverse FFT unit, and a stitching unit;

The spectral subtraction unit is used to subtract the enhanced accompaniment audio signal amplitude spectrum from the amplitude spectrum of the song audio signal to obtain the amplitude spectrum of the accompaniment-removed audio signal, according to the formula [formula not reproduced in the source text], where S_n(i) is the song audio signal amplitude spectrum, M_n(i) is the enhanced accompaniment audio signal amplitude spectrum, Y_n(i) is the amplitude spectrum of the accompaniment-removed audio signal, FN is the number of FFT points, a is the signal-to-noise-ratio adjustment factor, b is the accompaniment adjustment factor, and N is a positive integer;

The inverse FFT unit is used to perform an inverse FFT on the amplitude spectrum of the accompaniment-removed audio signal; specifically, according to the formula k_n(i) = Y_n(i) / S_n(i) (i = 0, 1, 2, ..., FN/2; n = 0, 1, 2, ..., N-1), the amplitude spectrum of the accompaniment-removed audio signal is divided by the song audio signal amplitude spectrum to obtain the proportionality coefficient k_n(i); the real and imaginary parts of the FFT of the song audio signal are then each multiplied by k_n(i), and an FN-point inverse FFT is performed;

The stitching unit is used to stitch together the frames obtained from the inverse FFT to obtain the accompaniment-removed audio signal.