CN103680517A - Method, device and equipment for processing audio signals - Google Patents

Method, device and equipment for processing audio signals Download PDF

Info

Publication number
CN103680517A
CN103680517A (application CN201310587304.8A)
Authority
CN
China
Prior art keywords
frequency
frequency-domain signal
signal
frame
repeats
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201310587304.8A
Other languages
Chinese (zh)
Inventor
徐德著
顾凤香
赵翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201310587304.8A priority Critical patent/CN103680517A/en
Publication of CN103680517A publication Critical patent/CN103680517A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031: Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/041: Musical analysis based on MFCC [mel-frequency spectral coefficients]
    • G10H2210/056: Musical analysis for extraction or identification of individual instrumental parts, e.g. melody, chords, bass; identification or separation of instrumental parts by their characteristic voices or timbres
    • G10H2250/00: Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/131: Mathematical functions for musical analysis, processing, synthesis or composition
    • G10H2250/215: Transforms, i.e. mathematical transforms into domains appropriate for musical signal processing, coding or compression
    • G10H2250/235: Fourier transform; Discrete Fourier Transform [DFT]; Fast Fourier Transform [FFT]

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a method, an apparatus, and a device for processing audio signals, belonging to the field of audio processing. The method includes: converting a single-channel signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal comprising a harmonic music component, a percussive music component, and a human voice component; separating a second frequency-domain signal, comprising the percussive music component and the human voice component, from the first frequency-domain signal by the HPSS (harmonic/percussive sound separation) algorithm; and extracting the human voice component from the second frequency-domain signal by the NNMF (nearest neighbours and median filtering) algorithm. The apparatus comprises a converting unit, a separating unit, and an extracting unit; the device is used to implement the method. The method, apparatus, and device improve the quality of human voices extracted from songs.

Description

Method, apparatus, and device for processing audio signals
Technical field
The present invention relates to the field of audio processing, and in particular to a method, an apparatus, and a device for processing audio signals.
Background
Each channel signal of a mono or two-channel song generally contains two kinds of audio signals: vocals and accompaniment. If a user wants to extract the vocals or the accompaniment from a song, a vocal or accompaniment separation technique can be used to extract them.
Taking vocal separation as an example, an existing separation method comprises the following steps: first, converting the left and right channel signals of the song from the time domain to the frequency domain; second, calculating the normalized cross-correlation value of each corresponding frequency-bin pair of the left and right channel signals; third, applying a vocal gain to the mean signal of each corresponding frequency-bin pair of the left and right channel signals, the vocal gain being proportional to the normalized cross-correlation value of the current bin pair; fourth, converting the weighted mean signal of the left and right channel signals from the frequency domain back to the time domain to extract the vocals.
This existing method assigns different gains to the music components according to the correlation of the left and right channel signals of the song. The gains of different frequency bins differ and are mutually independent, with no enforced correlation between them; applying such gains alters the timbre and distorts the vocals. The extracted vocals are therefore of poor quality and cannot meet the requirements of high-quality vocal extraction.
Summary of the invention
To solve the problems of the prior art, embodiments of the present invention provide a method, an apparatus, and a device for processing audio signals. The technical solutions are as follows:
In a first aspect, an embodiment of the present invention provides a method for processing an audio signal, the method comprising:
converting a single-channel signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
separating a second frequency-domain signal from the first frequency-domain signal using the harmonic/percussive sound separation (HPSS) algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
extracting the vocal component from the second frequency-domain signal using the nearest-neighbours and median filtering (NNMF) algorithm.
In a first implementation of the first aspect, separating the second frequency-domain signal from the first frequency-domain signal using the HPSS algorithm comprises:
taking the magnitude of each frequency bin in the first frequency-domain signal to obtain a first matrix;
median-filtering each row of the first matrix to obtain a second matrix, and median-filtering each column of the first matrix to obtain a third matrix;
separating the second frequency-domain signal from the first frequency-domain signal according to the second matrix and the third matrix by the following formula:
((P.*P)./((H.*H)+(P.*P))).*X
where H denotes the second matrix, P denotes the third matrix, and X denotes the first matrix; ./ denotes element-wise division and .* denotes element-wise multiplication.
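As an illustrative sketch only (not part of the disclosure), the median-filtering separation above can be written in Python with NumPy and SciPy; the function name `hpss_mask`, the kernel lengths, and the small epsilon guard are assumptions not specified in the text:

```python
import numpy as np
from scipy.signal import medfilt2d

def hpss_mask(X_mag, h_kernel=17, p_kernel=17):
    """Sketch of the median-filtering HPSS step.

    X_mag is the first matrix: magnitudes of the first frequency-domain
    signal, frequency bins along rows and frames along columns.
    Kernel lengths are illustrative; the text does not specify them.
    """
    # Median-filter each row (across time) to obtain the second matrix H:
    # harmonic components vary slowly along time.
    H = medfilt2d(X_mag, kernel_size=(1, h_kernel))
    # Median-filter each column (across frequency) to obtain the third
    # matrix P: percussive components are broadband within a frame.
    P = medfilt2d(X_mag, kernel_size=(p_kernel, 1))
    # ((P.*P)./((H.*H)+(P.*P))).*X : element-wise soft mask keeping the
    # percussive-plus-vocal part; eps avoids division by zero.
    eps = 1e-12
    return (P * P) / (H * H + P * P + eps) * X_mag
```

Applying the same mask to the complex first frequency-domain signal instead of its magnitude would retain the phase needed for later resynthesis.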
With reference to the first aspect or the first implementation of the first aspect, in a second implementation, converting the single-channel signal of the song from the time domain to the frequency domain to obtain the first frequency-domain signal comprises:
converting the single-channel signal of the song from the time domain to the frequency domain using the fast Fourier transform (FFT) to obtain the first frequency-domain signal, where the sampling rate of the FFT is 44.1 kHz, the frame length is no less than 8192 points, and the frame shift is half the frame length.
With reference to the first aspect or the first or second implementation of the first aspect, in a third implementation, before extracting the vocal component from the second frequency-domain signal using the NNMF algorithm, the method further comprises:
converting the second frequency-domain signal from the frequency domain to the time domain using the inverse fast Fourier transform (IFFT), and then converting it back from the time domain to the frequency domain using the FFT to obtain a re-transformed second frequency-domain signal, where the sampling rate of this FFT is 44.1 kHz, the frame length is no more than 4096 points, and the frame shift is one quarter of that frame length;
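A minimal sketch of this re-transformation, assuming scipy's `stft`/`istft` as the FFT framework (the window choice and the function name `retransform` are assumptions; the text specifies only the frame lengths and shifts):

```python
import numpy as np
from scipy.signal import stft, istft

def retransform(second_spec, fs=44100, big=8192, small=4096):
    """Inverse-transform the second frequency-domain signal (analysed
    with the long frame and half-frame shift), then re-analyse it with
    a shorter frame (no more than 4096 points) and a quarter-frame
    shift, yielding the re-transformed second frequency-domain signal."""
    # Back to the time domain with the original long-frame settings.
    _, x = istft(second_spec, fs=fs, nperseg=big, noverlap=big // 2)
    # Forward again with the shorter frame and quarter-frame shift.
    _, _, PP = stft(x, fs=fs, nperseg=small, noverlap=small - small // 4)
    return PP
```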
extracting the vocal component from the second frequency-domain signal using the NNMF algorithm then comprises:
extracting the vocal component from the re-transformed second frequency-domain signal using the NNMF algorithm.
With reference to the third implementation of the first aspect, in a fourth implementation, extracting the vocal component from the re-transformed second frequency-domain signal using the NNMF algorithm comprises:
taking the magnitude of each frequency bin in the re-transformed second frequency-domain signal;
traversing each frame of the re-transformed second frequency-domain signal and calculating the similarity between each frame and every other frame of the re-transformed second frequency-domain signal;
obtaining a frequency-domain spectral estimate of the re-transformed second frequency-domain signal according to the similarities;
calculating, by the following formula, the difference in exponential normalized cross-correlation between corresponding frequency-bin pairs of the re-transformed second frequency-domain signal and its frequency-domain spectral estimate,
Q(i, j) = \left( \exp\!\left( -\frac{\left( \log PP(i, j) - \log Y(i, j) \right)^2}{2\lambda^2} \right) \right)^2
and calculating, by the following formula, the weight of the percussive music component according to the difference:
W(i, j) = \begin{cases} 0, & Q(i, j) < 0.85 \\ 1, & Q(i, j) \geq 0.85 \end{cases}
where PP(i, j) denotes the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th frequency bin of the j-th frame of the frequency-domain spectral estimate of the re-transformed second frequency-domain signal; Q(i, j) denotes the difference in exponential normalized cross-correlation between the two; W(i, j) denotes the weight of the percussive music component at the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; and λ (rendered "namda" in the original) is a weight factor, λ = 3;
extracting, by the following formula, the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component:
P1=(1-W).*PP
where P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
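For illustration, the two formulas above can be sketched as follows; the function name `extract_vocals` is an assumption, and PP and Y are assumed to be strictly positive magnitude matrices so that the logarithms are defined:

```python
import numpy as np

def extract_vocals(PP, Y, lam=3.0, thresh=0.85):
    """Sketch of the Q/W weighting and vocal extraction above.

    PP: magnitudes of the re-transformed second frequency-domain signal
    (bins x frames); Y: its frequency-domain spectral estimate; lam is
    the weight factor (namda = 3 in the text)."""
    # Q(i, j): squared Gaussian of the log-magnitude difference. Where
    # a bin closely matches its similar-frame estimate, Q is near 1.
    Q = np.exp(-(np.log(PP) - np.log(Y)) ** 2 / (2 * lam * lam)) ** 2
    # W(i, j): binary percussive weight, 1 where Q >= 0.85, else 0.
    W = (Q >= thresh).astype(float)
    # P1 = (1 - W) .* PP keeps the vocal component
    # (P2 = W .* PP would give the percussive component).
    return (1.0 - W) * PP
```

Bins that repeat across similar frames (Q high) are attributed to the strongly periodic accompaniment, while varying bins are kept as vocals, matching the reasoning stated in the summary.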
With reference to the fourth implementation of the first aspect, in a fifth implementation, obtaining the frequency-domain spectral estimate of the re-transformed second frequency-domain signal according to the similarities comprises:
obtaining, according to the similarities, a predetermined number of similar frames for each frame;
calculating a spectral estimate for each frame from its predetermined number of similar frames;
assembling the calculated per-frame spectral estimates into the frequency-domain spectral estimate of the re-transformed second frequency-domain signal.
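A sketch of the three steps above; cosine similarity between magnitude spectra and a per-bin median over the k most similar frames are assumptions, since the text says only "similarity" and a "predetermined number" of similar frames:

```python
import numpy as np

def spectral_estimate(PP, k=5):
    """Build the frequency-domain spectral estimate Y frame by frame.

    PP: magnitudes (bins x frames). For each frame, its k most similar
    other frames are found and their per-bin median is taken."""
    n_frames = PP.shape[1]
    unit = PP / (np.linalg.norm(PP, axis=0) + 1e-12)  # normalise frames
    sim = unit.T @ unit               # cosine similarity, frame x frame
    np.fill_diagonal(sim, -np.inf)    # exclude each frame itself
    Y = np.empty_like(PP)
    for j in range(n_frames):
        nearest = np.argsort(sim[j])[-k:]  # k most similar frames
        Y[:, j] = np.median(PP[:, nearest], axis=1)
    return Y
```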
With reference to the fourth implementation of the first aspect, in a sixth implementation, after extracting the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component, the method further comprises:
separating the percussive music component from the re-transformed second frequency-domain signal by the following formula,
P2=W.*PP
where P2 denotes the percussive music component.
With reference to the sixth implementation of the first aspect, in a seventh implementation, after separating the percussive music component from the re-transformed second frequency-domain signal, the method further comprises:
separating the harmonic music component from the first frequency-domain signal;
converting the separated harmonic music component from the frequency domain to the time domain, converting the percussive music component from the frequency domain to the time domain, and synthesizing the converted harmonic music component and the converted percussive music component to obtain the accompaniment component.
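A sketch of this synthesis step under the FFT settings stated earlier (44.1 kHz, 8192-point frames, half-frame shift); scipy's `istft` stands in for the inverse FFT framework, and the function name is an assumption:

```python
import numpy as np
from scipy.signal import stft, istft

def synthesize_accompaniment(harmonic_spec, percussive_spec,
                             fs=44100, nperseg=8192):
    """Convert the separated harmonic and percussive components back to
    the time domain and sum them to obtain the accompaniment."""
    _, h_time = istft(harmonic_spec, fs=fs, nperseg=nperseg,
                      noverlap=nperseg // 2)
    _, p_time = istft(percussive_spec, fs=fs, nperseg=nperseg,
                      noverlap=nperseg // 2)
    # Trim to a common length before summing the two components.
    n = min(len(h_time), len(p_time))
    return h_time[:n] + p_time[:n]
```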
In a second aspect, an embodiment of the present invention provides an apparatus for processing an audio signal, the apparatus comprising:
a converting unit, configured to convert a single-channel signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
a separating unit, configured to separate a second frequency-domain signal from the first frequency-domain signal obtained by the converting unit using the HPSS algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
an extracting unit, configured to extract the vocal component from the second frequency-domain signal separated by the separating unit using the NNMF algorithm.
In a first implementation of the second aspect, the separating unit is specifically configured to:
take the magnitude of each frequency bin in the first frequency-domain signal obtained by the converting unit to obtain a first matrix;
median-filter each row of the first matrix to obtain a second matrix, and median-filter each column of the first matrix to obtain a third matrix;
separate the second frequency-domain signal from the first frequency-domain signal according to the second matrix and the third matrix by the following formula:
((P.*P)./((H.*H)+(P.*P))).*X
where H denotes the second matrix, P denotes the third matrix, and X denotes the first matrix; ./ denotes element-wise division and .* denotes element-wise multiplication.
With reference to the second aspect or the first implementation of the second aspect, in a second implementation, the converting unit is specifically configured to:
convert the single-channel signal of the song from the time domain to the frequency domain using the fast Fourier transform (FFT) to obtain the first frequency-domain signal, where the sampling rate of the FFT is 44.1 kHz, the frame length is no less than 8192 points, and the frame shift is half the frame length.
With reference to the second aspect or the first or second implementation of the second aspect, in a third implementation, the separating unit is further configured to:
convert the second frequency-domain signal from the frequency domain to the time domain using the inverse fast Fourier transform (IFFT), and then convert it back from the time domain to the frequency domain using the FFT to obtain a re-transformed second frequency-domain signal, where the sampling rate of this FFT is 44.1 kHz, the frame length is no more than 4096 points, and the frame shift is one quarter of that frame length;
the extracting unit being specifically configured to:
extract the vocal component, using the NNMF algorithm, from the re-transformed second frequency-domain signal obtained by the separating unit.
With reference to the third implementation of the second aspect, in a fourth implementation, the extracting unit comprises:
a first obtaining subunit, configured to take the magnitude of each frequency bin in the re-transformed second frequency-domain signal obtained by the separating unit;
a first calculating subunit, configured to traverse each frame of the re-transformed second frequency-domain signal and calculate the similarity between each frame and every other frame of the re-transformed second frequency-domain signal;
a second obtaining subunit, configured to obtain the frequency-domain spectral estimate of the re-transformed second frequency-domain signal according to the similarities calculated by the first calculating subunit;
a second calculating subunit, configured to calculate, by the following formula, the difference in exponential normalized cross-correlation between corresponding frequency-bin pairs of the re-transformed second frequency-domain signal and the frequency-domain spectral estimate obtained by the second obtaining subunit,
Q(i, j) = \left( \exp\!\left( -\frac{\left( \log PP(i, j) - \log Y(i, j) \right)^2}{2\lambda^2} \right) \right)^2
and to calculate, by the following formula, the weight of the percussive music component according to the difference:
W(i, j) = \begin{cases} 0, & Q(i, j) < 0.85 \\ 1, & Q(i, j) \geq 0.85 \end{cases}
where PP(i, j) denotes the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th frequency bin of the j-th frame of the frequency-domain spectral estimate of the re-transformed second frequency-domain signal; Q(i, j) denotes the difference in exponential normalized cross-correlation between the two; W(i, j) denotes the weight of the percussive music component at the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; and λ (rendered "namda" in the original) is a weight factor, λ = 3;
an extracting subunit, configured to extract, by the following formula, the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component calculated by the second calculating subunit:
P1=(1-W).*PP
where P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
With reference to the fourth implementation of the second aspect, in a fifth implementation, the second obtaining subunit is specifically configured to:
obtain, according to the similarities, a predetermined number of similar frames for each frame;
calculate a spectral estimate for each frame from its predetermined number of similar frames;
assemble the calculated per-frame spectral estimates into the frequency-domain spectral estimate of the re-transformed second frequency-domain signal.
With reference to the fourth implementation of the second aspect, in a sixth implementation, the extracting unit is further configured to:
separate the percussive music component from the re-transformed second frequency-domain signal obtained by the separating unit by the following formula,
P2=W.*PP
where P2 denotes the percussive music component.
With reference to the sixth implementation of the second aspect, in a seventh implementation, the apparatus further comprises:
a synthesizing unit, configured to separate the harmonic music component from the first frequency-domain signal obtained by the converting unit, convert the separated harmonic music component from the frequency domain to the time domain, convert the percussive music component from the frequency domain to the time domain, and synthesize the converted harmonic music component and the converted percussive music component to obtain the accompaniment component.
In a third aspect, an embodiment of the present invention provides a device for processing an audio signal, the device comprising a processor and a memory, the processor being configured to execute the following instructions:
converting a single-channel signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
separating a second frequency-domain signal from the first frequency-domain signal using the HPSS algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
extracting the vocal component from the second frequency-domain signal using the NNMF algorithm.
The technical solutions provided by the embodiments of the present invention bring the following beneficial effects. The accompaniment of a song is divided into a harmonic music component and a percussive music component: the HPSS algorithm is first used to separate from the song a second frequency-domain signal comprising the percussive music component and the vocal component, and the NNMF algorithm is then used to separate the vocal component from the percussive music component, so that the separated vocal component is cleaner and large accompaniment residues are avoided. Furthermore, extracting the vocal component from the single-channel signal of the song with the NNMF algorithm takes into account the frequency distribution across similar frames, making full use of the strong periodicity of the accompaniment and the rich variability of the vocals; this avoids the damage done to the vocal component when vocals are extracted from the frequency distribution of individual frequency-bin pairs alone. The method is widely applicable, the extracted vocal component is of better quality, and the requirements of high-quality vocal extraction can be met.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed to describe the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for processing an audio signal provided by Embodiment One of the present invention;
Fig. 2 is a flowchart of another method for processing an audio signal provided by Embodiment Two of the present invention;
Fig. 3 is a schematic diagram of a KTV application scenario provided by Embodiment Two of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for processing an audio signal provided by Embodiment Three of the present invention;
Fig. 5 is a schematic structural diagram of another apparatus for processing an audio signal provided by Embodiment Four of the present invention;
Fig. 6 is a schematic structural diagram of a device for processing an audio signal provided by Embodiment Five of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment One
Referring to Fig. 1, an embodiment of the present invention provides a method for processing an audio signal, the method comprising:
Step 101: convert a single-channel signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component.
The single-channel signal of the song may be the signal of a mono song or the left or right channel signal of a two-channel song. Two-channel songs include two-channel stereo songs.
The harmonic music component and the percussive music component together form the instrumental accompaniment of the song. The harmonic music component comprises sounds produced by instruments such as the piano, and the percussive music component comprises drum beats and striking sounds.
The single-channel signal of the song may be converted from the time domain to the frequency domain using the fast Fourier transform (FFT). Optionally, the sampling rate for the FFT is 44.1 kHz, the frame length is no less than 8192 points, and the frame shift may be half the frame length; for example, for a song sampled at 44.1 kHz, an 8192-point (185.7 ms) frame length and a 4096-point frame shift are used for the FFT.
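The framing just described can be illustrated with scipy's `stft` (an assumption; the text specifies only the FFT parameters). Note that scipy returns a one-sided spectrum of 8192 // 2 + 1 = 4097 bins, whereas the text counts all 8192 FFT bands:

```python
import numpy as np
from scipy.signal import stft

fs = 44100               # 44.1 kHz sampling rate
mono = np.zeros(fs * 2)  # stand-in for a 2-second single-channel signal
# 8192-point (about 185.7 ms) frames with a 4096-point (half-frame) shift.
freqs, times, X = stft(mono, fs=fs, nperseg=8192, noverlap=8192 - 4096)
# X is the two-dimensional "time-frequency" first frequency-domain signal;
# each element is a complex number carrying amplitude and phase.
```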
Step 102: separate a second frequency-domain signal from the first frequency-domain signal using the harmonic/percussive sound separation (HPSS) algorithm.
The second frequency-domain signal comprises the percussive music component and the vocal component.
HPSS algorithms include the median-filtering HPSS algorithm and the spectral diffusion method (complementary spectral diffusion).
Step 103: extract the vocal component from the second frequency-domain signal using the nearest-neighbours and median filtering (NNMF) algorithm.
In the embodiment of the present invention, the accompaniment of the song is divided into a harmonic music component and a percussive music component: the HPSS algorithm is first used to separate from the song a second frequency-domain signal comprising the percussive music component and the vocal component, and the NNMF algorithm is then used to separate the vocal component from the percussive music component, so that the separated vocal component is cleaner and large accompaniment residues are avoided. Furthermore, extracting the vocal component from the single-channel signal of the song with the NNMF algorithm takes into account the frequency distribution across similar frames, making full use of the strong periodicity of the accompaniment and the rich variability of the vocals; this avoids the damage done to the vocal component when vocals are extracted from the frequency distribution of individual frequency-bin pairs alone. The method is widely applicable, the extracted vocal component is of better quality, and the requirements of high-quality vocal extraction can be met.
Embodiment bis-
Referring to Fig. 2, an embodiment of the present invention provides a method for processing an audio signal. The method includes the following steps:
Step 201: Convert the monaural signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, where the first frequency-domain signal contains a harmonic music component, a percussive music component, and a vocal component.
Generally, songs include monaural songs and stereo songs. The monaural signal of a song is either the signal of a monaural song or the left/right channel signal of a stereo song.
The harmonic music component and the percussive music component together form the instrumental accompaniment of the song. The harmonic music component includes sounds produced by instruments such as the piano, and the percussive music component includes drum and beat sounds.
Optionally, an FFT may be used to convert the monaural signal of the song from the time domain to the frequency domain to obtain the first frequency-domain signal. The sampling rate for this FFT may be 44.1 kHz, the frame length is not less than 8192 samples, and the frame shift may be one half of the frame length. For example, for a song sampled at 44.1 kHz, an FFT with a frame length of 8192 samples (185.7 ms) and a frame shift of 4096 samples converts the time domain to the frequency domain, yielding two-dimensional time-frequency information. In the first frequency-domain signal, the harmonic music component and the percussive music component differ significantly in spectral characteristics, while the spectral characteristics of the percussive music component and the vocal component are similar.
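As a rough illustration of step 201, the framing and FFT just described can be sketched with NumPy. The Hann window and the real-input FFT are our assumptions; the patent only fixes the sampling rate, frame length, and frame shift.

```python
import numpy as np

def stft(x, frame_len=8192, hop=4096):
    """Split x into overlapping frames and FFT each one.

    The frame length (8192 samples, about 185.7 ms at 44.1 kHz) and the
    half-frame shift follow the text; the Hann window is an assumption.
    Returns the two-dimensional "time-frequency" matrix X(F, N):
    rows are frequency bins, columns are frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T
```

For one second of 44.1 kHz audio this yields a 4097 × 9 complex matrix: 4097 non-redundant bins of an 8192-point real FFT, and 9 frames at a 4096-sample shift.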
Step 202: Use the HPSS algorithm to separate a second frequency-domain signal from the first frequency-domain signal.
Because the HPSS algorithm can separate signals whose spectra differ greatly, it can be used to separate the harmonic music component from the song. Note that, owing to the characteristics of the HPSS algorithm itself, the FFT frame length used in step 201 must be large (not less than 8192 samples) so that the separated percussive music component and vocal component are relatively clean; at the same time, to keep the computational load acceptable, the FFT frame shift should not be too large (one half of the frame length is chosen).
The second frequency-domain signal contains the percussive music component and the vocal component. Optionally, step 202 includes:
Step 2021: Take the magnitude of each frequency bin in the first frequency-domain signal to obtain a first matrix.
The first frequency-domain signal is two-dimensional time-frequency information. Suppose this time-frequency information is X(F, N), where N is the number of frames in the time dimension and F is the number of frequency bands in the frequency dimension (equal to the frame length). Each element of X(F, N) (one frequency bin of the frequency-domain signal) is a complex number containing the magnitude and phase information of that bin.
Suppose the first matrix obtained by taking the magnitude of each bin of X(F, N) is XX(F, N).
Step 2022: Apply median filtering to each column of the first matrix to obtain a second matrix, and apply median filtering to each row of the first matrix to obtain a third matrix.
Taking the column-wise median filtering of the first matrix as an example, the median filtering process is as follows. Each column of the first matrix is an F-dimensional vector. Suppose the k-th column of the first matrix is the vector x(k) = (x(k1), x(k2), ..., x(kF)). Median filtering x(k) yields an F-dimensional output vector y(k), which is the k-th column of the second matrix: y(k) = (y(k1), y(k2), ..., y(kF)), where
y(ki) = median{x(k, i-l), ..., x(k, i+l)}, l = (order - 1)/2, i = 1, ..., F;
median denotes taking the median, and order is the filter order, which may be 17.
Suppose the second matrix is H(F, N) and the third matrix is P(F, N).
Step 2023: According to the second matrix and the third matrix, separate the second frequency-domain signal from the first frequency-domain signal by the following formula (1):
((P.*P)./((H.*H)+(P.*P))).*X (1)
Here H denotes the second matrix, P denotes the third matrix, and X denotes the first matrix; ./ denotes element-wise division and .* denotes element-wise multiplication (the matrices are multiplied element by element).
The harmonic music component signal separated from the first frequency-domain signal is expressed as ((H.*H)./((H.*H)+(P.*P))).*X.
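A compact sketch of steps 2021-2023 (magnitudes, median filtering, soft masks). The column/row axis assignment mirrors the text's convention; the edge padding of the median window and the small epsilon guarding the division are our additions.

```python
import numpy as np

def median_filter_1d(v, order=17):
    """Centred median filter of the given order; edges are padded by
    repeating the boundary value (edge handling is an assumption)."""
    half = (order - 1) // 2
    padded = np.pad(v, half, mode="edge")
    return np.array([np.median(padded[i:i + order]) for i in range(len(v))])

def hpss_separate(X, order=17):
    """Steps 2021-2023: first matrix XX = |X|, second matrix H from
    column-wise filtering, third matrix P from row-wise filtering, then
    the soft mask of formula (1). Returns the second frequency-domain
    signal (percussive + vocal) and the complementary harmonic signal."""
    XX = np.abs(X)                                           # first matrix
    H = np.apply_along_axis(median_filter_1d, 0, XX, order)  # second matrix
    P = np.apply_along_axis(median_filter_1d, 1, XX, order)  # third matrix
    mask = (P * P) / (H * H + P * P + 1e-12)                 # formula (1) mask
    return mask * X, (1.0 - mask) * X
```

Because the two masks sum to one, the separated signals add back to the original spectrogram exactly.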
Step 203: Use the inverse fast Fourier transform (IFFT) to convert the second frequency-domain signal from the frequency domain to the time domain, then apply an FFT to convert it from the time domain back to the frequency domain, obtaining a re-transformed second frequency-domain signal.
Optionally, the FFT used to obtain the re-transformed second frequency-domain signal has a sampling rate of 44.1 kHz, a frame length of not more than 4096 samples, and a frame shift that may be one quarter of the frame length. For example, for a second frequency-domain signal sampled at 44.1 kHz, an FFT with a frame length of 4096 samples (92.8 ms) and a frame shift of 1024 samples converts the second frequency-domain signal from the time domain to the frequency domain, yielding two-dimensional time-frequency information.
Note that in the embodiment of the present invention, step 203 is optional. In other embodiments, step 204 may be performed directly after step 202, using the NNMF algorithm to extract the vocal component from the second frequency-domain signal.
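The optional re-analysis of step 203 (back to the time domain, then a finer-grained FFT) might be sketched as follows. The Hann analysis/synthesis windows and the overlap-add normalisation are our assumptions; only the frame sizes and shifts come from the text.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Windowed framing followed by a real FFT (as in step 201)."""
    w = np.hanning(frame_len)
    n = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * w for i in range(n)])
    return np.fft.rfft(frames, axis=1).T

def istft(X, hop):
    """Inverse FFT of each frame, then weighted overlap-add."""
    frames = np.fft.irfft(X.T, axis=1)
    frame_len = frames.shape[1]
    w = np.hanning(frame_len)
    n = hop * (len(frames) - 1) + frame_len
    out, norm = np.zeros(n), np.zeros(n)
    for i, f in enumerate(frames):
        out[i * hop : i * hop + frame_len] += f * w
        norm[i * hop : i * hop + frame_len] += w * w
    return out / np.maximum(norm, 1e-12)

def retransform(X2, hop_in=4096, frame_out=4096, hop_out=1024):
    """Step 203: IFFT the second frequency-domain signal back to the time
    domain, then re-analyse with a 4096-sample frame and 1024-sample hop."""
    return stft(istft(X2, hop_in), frame_out, hop_out)
```

The window-sum normalisation makes the round trip exact away from the signal edges, so the re-analysis only changes the time-frequency resolution, not the content.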
Step 204: Use the NNMF algorithm to extract the vocal component from the re-transformed second frequency-domain signal.
Optionally, step 204 includes:
Step 2041: Take the magnitude of each frequency bin in the re-transformed second frequency-domain signal.
The re-transformed second frequency-domain signal is two-dimensional time-frequency information. Suppose the re-transformed second frequency-domain signal is PP(F, N); taking the magnitude of each of its bins yields a fourth matrix Z(F, N), where N is the number of frames in the time dimension and F is the number of frequency bands in the frequency dimension (F may be half the frame length).
Step 2042: Traverse each frame of the re-transformed second frequency-domain signal and compute the similarity between that frame and every other frame of the re-transformed second frequency-domain signal.
The similarities between each frame and all the other frames can be represented by a fifth matrix. The fifth matrix is a symmetric N×N matrix; its diagonal elements are set to 0 (each frame's similarity to itself), and apart from the diagonal, each row or column of the fifth matrix stores, in frame order, the similarities between one frame and all the other frames of the re-transformed second frequency-domain signal. Suppose the fifth matrix is D and the element in the k-th column and l-th row of D is D(k, l), k, l = 1, ..., N. Then
D(k, l) = ||Z(:, k) - Z(:, l)||^2
Here Z(:, k) denotes the k-th column of the fourth matrix (the frequency-domain information of the k-th frame) and Z(:, l) denotes the l-th column of the fourth matrix. D(k, l) represents the similarity between the k-th and l-th columns of the fourth matrix: the higher the similarity, the smaller the value of D(k, l).
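The fifth matrix of step 2042 can be computed in one vectorised pass, using the standard expansion of the squared Euclidean distance (a sketch under the definitions above):

```python
import numpy as np

def frame_distance_matrix(Z):
    """Fifth matrix D: D(k, l) = ||Z(:, k) - Z(:, l)||^2 for every pair of
    frame spectra (columns of the magnitude matrix Z). Smaller values mean
    more similar frames; the diagonal (each frame against itself) is 0."""
    sq = np.sum(Z * Z, axis=0)                       # ||z_k||^2 per frame
    D = sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z)  # expand the squared norm
    return np.maximum(D, 0.0)                        # clip rounding negatives
```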
Step 2043: From the computed similarities between each frame and all the other frames of the re-transformed second frequency-domain signal, obtain a frequency-domain spectral estimate of the re-transformed second frequency-domain signal.
Optionally, step 2043 includes the following. First, according to the similarities, obtain a predetermined number of similar frames for each frame, where each selected similar frame has a higher similarity to the given frame than any of the frames not selected. Specifically, the computed similarities between the given frame and the other frames can be sorted from highest to lowest, and the top predetermined number of them selected; the similar frames are those corresponding to the selected similarities. Then, from the predetermined number of similar frames, compute the spectral estimate of each frame: the spectral estimate of each bin of a frame is the median of the corresponding bins across the predetermined number of similar frames determined for that frame. Finally, the spectral estimates of all the frames together form the frequency-domain spectral estimate of the re-transformed second frequency-domain signal.
For example, suppose the current frame of the traversal is frame i and the predetermined number is 20. First, sort the i-th column of the fifth matrix (the similarities between each frame and all the other frames of the re-transformed second frequency-domain signal) in ascending order (smaller values mean higher similarity) and take the first 20 entries; the row indices of these 20 entries give, in the fourth matrix (the magnitudes of the re-transformed second frequency-domain signal), the column indices of the 20 frames most similar to frame i. Next, extract these 20 frames from the fourth matrix to form a sixth matrix, in which each row of bins corresponds to the same row of bins of frame i. Then take the median of each row of bins of the sixth matrix to obtain the spectral estimate of frame i.
Suppose the frequency-domain spectral estimate of the re-transformed second frequency-domain signal is a seventh matrix Y(F, N).
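Step 2043's per-frame estimate (the per-bin median over the most similar frames) might look as follows; excluding each frame from its own neighbour set is our reading of the example above.

```python
import numpy as np

def spectral_estimate(Z, D, k=20):
    """Seventh matrix Y: for every frame j, pick the k frames with the
    smallest distances D(:, j) (self excluded) and take the per-bin median
    of their spectra in the fourth matrix Z."""
    F, N = Z.shape
    Y = np.empty_like(Z)
    for j in range(N):
        d = D[:, j].astype(float)
        d[j] = np.inf                  # a frame is not its own neighbour
        nearest = np.argsort(d)[:k]    # the k most similar frames
        Y[:, j] = np.median(Z[:, nearest], axis=1)
    return Y
```

For a strictly repeating accompaniment the neighbour frames are near-identical, so Y reproduces the accompaniment spectrum while the median suppresses the non-repeating vocal energy.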
Step 2044: By the following formula (2), compute for each pair of corresponding bins the exponentially normalized cross-correlation value between the re-transformed second frequency-domain signal and its frequency-domain spectral estimate, and, by the following formula (3), compute from this value the weight of the percussive music component signal:
Q(i, j) = ( exp( -( log(PP(i, j)) - log(Y(i, j)) )^2 / (2*namda*namda) ) )^2    (2)
W(i, j) = 0 if Q(i, j) < 0.85; W(i, j) = 1 if Q(i, j) >= 0.85    (3)
Here PP(i, j) denotes the i-th bin of the j-th frame of the re-transformed second frequency-domain signal (the fourth matrix); Y(i, j) denotes the i-th bin of the j-th frame of its frequency-domain spectral estimate (the seventh matrix); Q(i, j) denotes the exponentially normalized cross-correlation value between these two bins; and W(i, j) denotes the weight of the percussive music component at the i-th bin of the j-th frame of the re-transformed second frequency-domain signal. namda is a weighting factor and may be set to 3.
Step 2045: By the following formula (4), extract the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component signal:
P1=(1-W).*PP (4)
Here P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
Optionally, the vocal component is converted from the frequency domain to the time domain by the inverse fast Fourier transform, and a vocal time-domain signal is output. The vocal component extracted from the re-transformed second frequency-domain signal by the NNMF algorithm retains a slight drum residue from the percussive music component, which can serve as a beat-keeping guide for the vocals.
Step 2046: By the following formula (5), separate the percussive music component signal from the re-transformed second frequency-domain signal:
P2=W.*PP (5)
Here P2 denotes the percussive music component signal.
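Steps 2044-2046 can be sketched directly from formulas (2)-(5); namda = 3 and the 0.85 threshold follow the text, while the small epsilon guarding log(0) is our addition.

```python
import numpy as np

def separate_vocals(PP, Y, namda=3.0, threshold=0.85):
    """Compare each bin of the magnitude spectrogram PP with the
    repeating-spectrum estimate Y, build the binary percussive weight W,
    and split PP into the vocal part P1 and the percussive part P2."""
    eps = 1e-12                                          # guard log(0)
    Q = np.exp(-(np.log(PP + eps) - np.log(Y + eps)) ** 2
               / (2.0 * namda * namda)) ** 2             # formula (2)
    W = (Q >= threshold).astype(float)                   # formula (3)
    P1 = (1.0 - W) * PP                                  # formula (4): vocals
    P2 = W * PP                                          # formula (5): percussion
    return P1, P2, W
```

Bins whose magnitude closely tracks the repeating estimate (Q near 1) are treated as percussive accompaniment; the remaining bins are kept as the vocal component.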
Step 205: Separate the harmonic music component from the first frequency-domain signal, convert the separated harmonic music component from the frequency domain to the time domain, convert the percussive music component from the frequency domain to the time domain, and synthesize the converted harmonic music component with the converted percussive music component to obtain the accompaniment.
Optionally, the process of separating the harmonic music component from the first frequency-domain signal may refer to step 2023 and is not repeated here.
Optionally, the separated harmonic music component and the percussive music component are each converted from the frequency domain to the time domain by the inverse fast Fourier transform, and the two are synthesized into an accompaniment time-domain signal.
Note that the applicable scenarios of the method provided by this embodiment include KTV (karaoke) scenarios. Referring to Fig. 3, the method provided by this embodiment is used in advance to split a song file into a vocal signal and an accompaniment signal, and three playback modes are provided for the song: a lead-sing/teaching mode, an original mode, and an accompaniment mode. In the lead-sing/teaching mode, the user can adjust the accompaniment volume and the vocal volume separately, so that the loudspeaker plays only the vocals or plays the vocals with a faint accompaniment; in this mode the user can sing along while the original singer softly guides in the background, experiencing the singer's unaccompanied performance. In the accompaniment mode, the vocal signal is muted and the loudspeaker plays only the accompaniment signal; in the original mode, the loudspeaker plays both the vocal signal and the accompaniment signal.
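The three playback modes reduce to gain control over the two separated signals; a minimal sketch (function name, mode names, and gain parameters are illustrative, not from the patent):

```python
import numpy as np

def play_mix(vocal, accomp, mode, vocal_gain=1.0, accomp_gain=1.0):
    """Mix the separated vocal and accompaniment signals for one of the
    three KTV playback modes described in the text."""
    if mode == "accompaniment":   # vocal signal muted
        return accomp
    if mode == "original":        # both signals played back
        return vocal + accomp
    if mode == "guide":           # lead-sing/teaching: independent volumes
        return vocal_gain * vocal + accomp_gain * accomp
    raise ValueError("unknown mode: " + mode)
```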
In the embodiment of the present invention, the accompaniment of a song is divided into a harmonic music component and a percussive music component. The HPSS algorithm is first used to separate a second frequency-domain signal from the song, where the second frequency-domain signal contains the percussive music component and the vocal component; the NNMF algorithm is then used to extract the vocal component from the percussive music component, so that the separated vocal component is relatively clean. Moreover, because the NNMF algorithm extracts the vocal component from the monaural signal of the song by considering the frequency-distribution characteristics shared between similar frames, it fully exploits the strong periodicity of the accompaniment and the rich variability of the vocals, avoiding the damage to the vocal component that results from extracting vocals on the basis of the frequency distribution of individual bins alone. The method is widely applicable, the extracted vocal component is of good quality, and it can meet the requirements of high-quality vocal extraction.
Embodiment 3
Referring to Fig. 4, an embodiment of the present invention provides an apparatus for processing an audio signal, the apparatus comprising:
A conversion unit 401, configured to convert the monaural signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal.
The first frequency-domain signal contains a harmonic music component, a percussive music component, and a vocal component.
The monaural signal of a song may be the left/right channel signal of a stereo song or the signal of a monaural song.
A separation unit 402, configured to use the HPSS algorithm to separate a second frequency-domain signal from the first frequency-domain signal converted by the conversion unit 401, where the second frequency-domain signal contains the percussive music component and the vocal component.
An extraction unit 403, configured to use the NNMF algorithm to extract the vocal component from the second frequency-domain signal separated by the separation unit 402.
In the embodiment of the present invention, the accompaniment of a song is divided into a harmonic music component and a percussive music component. The HPSS algorithm is first used to separate a second frequency-domain signal from the song, where the second frequency-domain signal contains the percussive music component and the vocal component; the NNMF algorithm is then used to extract the vocal component from the percussive music component, so that the separated vocal component is relatively clean and a large accompaniment residue is avoided. Moreover, because the NNMF algorithm extracts the vocal component from the monaural signal of the song by considering the frequency-distribution characteristics shared between similar frames, it fully exploits the strong periodicity of the accompaniment and the rich variability of the vocals, avoiding the damage to the vocal component that results from extracting vocals on the basis of the frequency distribution of individual bins alone. The method is widely applicable, the extracted vocal component is of good quality, and it can meet the requirements of high-quality vocal extraction.
Embodiment 4
Referring to Fig. 5, an embodiment of the present invention provides an apparatus for processing an audio signal, the apparatus comprising:
A conversion unit 501, configured to convert the monaural signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal.
Optionally, the conversion unit 501 is configured to use an FFT to convert the monaural signal of the song from the time domain to the frequency domain to obtain the first frequency-domain signal; the sampling rate of the FFT is 44.1 kHz, the frame length is not less than 8192 samples, and the frame shift may be one half of the frame length.
The first frequency-domain signal contains a harmonic music component, a percussive music component, and a vocal component.
The monaural signal of a song may be the left/right channel signal of a stereo song or the signal of a monaural song.
A separation unit 502, configured to use the HPSS algorithm to separate a second frequency-domain signal from the first frequency-domain signal obtained by the conversion unit 501, where the second frequency-domain signal contains the percussive music component and the vocal component.
Optionally, the separation unit 502 is configured to: take the magnitude of each frequency bin of the first frequency-domain signal obtained by the conversion unit 501 to obtain a first matrix; apply median filtering to each column of the first matrix to obtain a second matrix, and apply median filtering to each row of the first matrix to obtain a third matrix; and, according to the second matrix and the third matrix, separate the second frequency-domain signal from the first frequency-domain signal by the following formula (1), where the second frequency-domain signal contains the percussive music component and the vocal component:
((P.*P)./((H.*H)+(P.*P))).*X (1)
Here H denotes the second matrix, P denotes the third matrix, and X denotes the first matrix; ./ denotes element-wise division and .* denotes element-wise multiplication.
An extraction unit 503, configured to use the NNMF algorithm to extract the vocal component from the second frequency-domain signal separated by the separation unit 502.
Optionally, the separation unit 502 is further configured to use the inverse fast Fourier transform to convert the second frequency-domain signal from the frequency domain to the time domain and then apply an FFT to convert it from the time domain back to the frequency domain, obtaining a re-transformed second frequency-domain signal; the FFT used to obtain the re-transformed second frequency-domain signal has a sampling rate of 44.1 kHz, a frame length of not more than 4096 samples, and a frame shift of one quarter of its frame length.
Optionally, the extraction unit 503 is configured to use the NNMF algorithm to extract the vocal component from the re-transformed second frequency-domain signal obtained by the separation unit 502.
Optionally, the extraction unit 503 includes:
A first obtaining subunit 5031, configured to take the magnitude of each frequency bin of the re-transformed second frequency-domain signal obtained by the separation unit 502.
A first computing subunit 5032, configured to traverse each frame of the re-transformed second frequency-domain signal and compute the similarity between that frame and every other frame of the re-transformed second frequency-domain signal.
A second obtaining subunit 5033, configured to obtain the frequency-domain spectral estimate of the second frequency-domain signal according to the similarities computed by the first computing subunit 5032.
Optionally, the second obtaining subunit 5033 is configured to: obtain, from the similarities computed by the first computing subunit 5032, a predetermined number of similar frames for each frame; compute the spectral estimate of each frame from its predetermined number of similar frames; and form the frequency-domain spectral estimate of the re-transformed second frequency-domain signal from the spectral estimates of all the frames.
A second computing subunit 5034, configured to compute, by the following formula (2), the exponentially normalized cross-correlation value between corresponding bins of the re-transformed second frequency-domain signal and the frequency-domain spectral estimate obtained by the second obtaining subunit 5033, and to compute, by the following formula (3), the weight of the percussive music component signal from this value:
Q(i, j) = ( exp( -( log(PP(i, j)) - log(Y(i, j)) )^2 / (2*namda*namda) ) )^2    (2)
W(i, j) = 0 if Q(i, j) < 0.85; W(i, j) = 1 if Q(i, j) >= 0.85    (3)
Here PP(i, j) denotes the i-th bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th bin of the j-th frame of its frequency-domain spectral estimate; Q(i, j) denotes the exponentially normalized cross-correlation value between these two bins; W(i, j) denotes the weight of the percussive music component at the i-th bin of the j-th frame of the re-transformed second frequency-domain signal; and namda is a weighting factor, namda = 3.
An extraction subunit 5035, configured to extract, by the following formula (4), the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component computed by the second computing subunit 5034:
P1=(1-W).*PP (4)
P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
Optionally, the extraction subunit 5035 is further configured to separate, by the following formula (5), the percussive music component from the re-transformed second frequency-domain signal according to the weight of the percussive music component:
P2=W.*PP (5)
P2 denotes the percussive music component signal.
Optionally, the apparatus further includes:
A synthesis unit 504, configured to: separate the harmonic music component from the first frequency-domain signal obtained by the conversion unit 501; convert the separated harmonic music component from the frequency domain to the time domain; convert the percussive music component from the frequency domain to the time domain; and synthesize the converted harmonic music component with the converted percussive music component to obtain the accompaniment.
In the embodiment of the present invention, the accompaniment of a song is divided into a harmonic music component and a percussive music component. The HPSS algorithm is first used to separate a second frequency-domain signal from the song, where the second frequency-domain signal contains the percussive music component and the vocal component; the NNMF algorithm is then used to extract the vocal component from the percussive music component, so that the separated vocal component is relatively clean and a large accompaniment residue is avoided. Moreover, because the NNMF algorithm extracts the vocal component from the monaural signal of the song by considering the frequency-distribution characteristics shared between similar frames, it fully exploits the strong periodicity of the accompaniment and the rich variability of the vocals, avoiding the damage to the vocal component that results from extracting vocals on the basis of the frequency distribution of individual bins alone. The method is widely applicable, the extracted vocal component is of good quality, and it can meet the requirements of high-quality vocal extraction.
Embodiment 5
An embodiment of the present invention provides a device for processing an audio signal, as shown in Fig. 6. The device may be a computer (including hand-held computer systems such as smartphones and tablets) or a server. It generally includes at least one processor 10 (for example, a CPU), a user interface 11, at least one network interface 12 or other communication interface, a memory 13, and at least one communication bus 14. Those skilled in the art will understand that the structure shown in Fig. 6 does not limit the computer, which may include more or fewer parts than illustrated, combine certain parts, or arrange the parts differently.
Each component of the device is described below with reference to Fig. 6:
The communication bus 14 implements the communication connections between the processor 10, the memory 13, and the communication interface.
The at least one network interface 12 (wired or wireless) implements the communication connection between this device and at least one other computer or server, using, for example, the Internet, a wide area network, a local area network, or a metropolitan area network.
The memory 13 may be used to store software programs and application modules; by running the software programs and application modules stored in the memory 13, the processor 10 executes the various functional applications and data processing of the device. The memory 13 may mainly comprise a program storage area and a data storage area, where the program storage area may store the operating system and the applications required by at least one function (such as running the HPSS algorithm), and the data storage area may store data created through use of the device (such as cached frequency-domain signals). In addition, the memory 13 may include high-speed RAM (Random Access Memory) and may also include non-volatile memory, for example at least one magnetic disk memory, flash memory device, or other non-volatile solid-state memory part.
The user interface 11 includes, but is not limited to, output devices and input devices. Input devices generally include a keyboard and a pointing device (for example, a mouse, trackball, touch pad, or touch-sensitive display screen). Output devices generally include devices that present computer information, such as a display, printer, or projector. The display may be used to show information entered by the user or files provided to the user; the keyboard and pointing device may be used to receive entered numeric or character information and to generate signal inputs related to user settings and function control of the device, such as obtaining operating instructions issued by the user in response to operation prompts.
The processor 10 is the control center of the device. It connects all parts of the whole device through various interfaces and lines and, by running or executing the software programs and/or application modules stored in the memory 13 and calling the data stored in the memory 13, executes the various functions of the device and processes data, thereby monitoring the device as a whole.
Specifically, by running or executing the software programs and/or application modules stored in the memory 13 and calling the data stored in the memory 13, the processor 10 can: convert the monaural signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, where the first frequency-domain signal contains a harmonic music component, a percussive music component, and a vocal component; use the HPSS algorithm to separate a second frequency-domain signal from the first frequency-domain signal, where the second frequency-domain signal contains the percussive music component and the vocal component; and use the NNMF algorithm to extract the vocal component from the second frequency-domain signal.
Optionally, the processor 10 can: take the magnitude of each frequency bin of the first frequency-domain signal to obtain a first matrix; apply median filtering to each column of the first matrix to obtain a second matrix, and apply median filtering to each row of the first matrix to obtain a third matrix; and, according to the second matrix and the third matrix, separate the second frequency-domain signal from the first frequency-domain signal by the following formula (1):
((P.*P)./((H.*H)+(P.*P))).*X (1)
Here H denotes the second matrix, P denotes the third matrix, and X denotes the first matrix; ./ denotes element-wise division and .* denotes element-wise multiplication (the matrices are multiplied element by element).
Optionally, the processor 10 can use an FFT to convert the monaural signal of the song from the time domain to the frequency domain to obtain the first frequency-domain signal; the sampling rate of the FFT is 44.1 kHz, the frame length is not less than 8192 samples, and the frame shift is one half of the frame length.
Optionally, the processor 10 can use the inverse fast Fourier transform to convert the second frequency-domain signal from the frequency domain to the time domain and then apply an FFT to convert it from the time domain back to the frequency domain, obtaining a re-transformed second frequency-domain signal; the FFT used to obtain the re-transformed second frequency-domain signal has a sampling rate of 44.1 kHz, a frame length of not more than 4096 samples, and a frame shift of one quarter of its frame length. The processor 10 can then use the NNMF algorithm to extract the vocal component from the re-transformed second frequency-domain signal.
Optionally, the processor 10 can: take the magnitude of each frequency bin of the re-transformed second frequency-domain signal; traverse each frame of the re-transformed second frequency-domain signal and compute the similarity between that frame and every other frame; obtain the frequency-domain spectral estimate of the re-transformed second frequency-domain signal according to the similarities; compute, by the following formula (2), the exponentially normalized cross-correlation value between corresponding bins of the re-transformed second frequency-domain signal and its frequency-domain spectral estimate, and, by the following formula (3), compute from this value the weight of the percussive music component signal; and extract, by the following formula (4), the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component signal:
Q(i,j) = ( exp( -(log(PP(i,j)) - log(Y(i,j)))^2 / (2 * namda * namda) ) )^2    (2)
W(i,j) = 0 if Q(i,j) < 0.85, and W(i,j) = 1 if Q(i,j) >= 0.85    (3)
P1 = (1 - W) .* PP    (4)
Here, PP(i, j) denotes the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th frequency bin of the j-th frame of the frequency-spectrum estimate of that signal; Q(i, j) denotes the difference in exponential normalized cross-correlation between the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal and the i-th frequency bin of the j-th frame of its frequency-spectrum estimate; W(i, j) denotes the weight of the percussive music component at the i-th frequency bin of the j-th frame; and namda is a weighting factor, namda = 3. P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
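A minimal sketch of formulas (2) through (4), assuming PP and Y are magnitude spectrograms of equal shape. The small epsilon guarding log(0) is an addition not present in the original formulas; the toy magnitudes in the example are invented for illustration.

```python
import numpy as np

NAMDA = 3.0    # weighting factor namda from formula (2)
THRESH = 0.85  # threshold from formula (3)

def extract_vocal(PP, Y, namda=NAMDA, thresh=THRESH):
    """Apply formulas (2)-(4): Gaussian-shaped similarity Q between the
    magnitude spectrogram PP and its frequency-spectrum estimate Y, binary
    percussive weight W, and vocal component P1 = (1 - W) .* PP."""
    eps = 1e-12  # guard against log(0); not part of the original formulas
    d = np.log(PP + eps) - np.log(Y + eps)
    Q = np.exp(-(d ** 2) / (2 * namda * namda)) ** 2  # formula (2)
    W = (Q >= thresh).astype(float)                   # formula (3)
    P1 = (1 - W) * PP                                 # formula (4)
    return P1, W, Q

# Toy magnitudes: bins where PP tracks Y (repeating accompaniment) get W = 1;
# the bin where PP departs strongly from Y (vocal energy) gets W = 0.
PP = np.array([[1.0, 50.0],
               [2.0, 2.0]])
Y = np.array([[1.0, 1.0],
              [2.0, 2.0]])
P1, W, Q = extract_vocal(PP, Y)
```

Bins matching the repeating estimate are attributed to the accompaniment and zeroed in P1, while the outlier bin survives as the vocal component.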
Optionally, processor 10 may obtain, from the similarities, a predetermined number of similar frames for each frame; compute the frequency-spectrum estimate of each frame from that predetermined number of similar frames; and assemble the computed per-frame estimates into the frequency-spectrum estimate of the re-transformed second frequency-domain signal.
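One possible realization of this per-frame estimate, under the assumption that frame similarity is cosine similarity and the per-frame estimate is an element-wise median over the k most similar frames; both choices are illustrative, since the text only specifies "a predetermined number of similar frames".

```python
import numpy as np

def spectrum_estimate(PP, k=2):
    """For each frame (column) of the magnitude spectrogram PP, find its k
    most similar other frames by cosine similarity and take the element-wise
    median over them as that frame's frequency-spectrum estimate."""
    n_bins, n_frames = PP.shape
    norms = np.linalg.norm(PP, axis=0) + 1e-12
    unit = PP / norms
    sim = unit.T @ unit              # frame-to-frame cosine similarity
    np.fill_diagonal(sim, -np.inf)   # exclude each frame from its own search
    Y = np.empty_like(PP)
    for j in range(n_frames):
        nearest = np.argsort(sim[:, j])[-k:]   # k most similar frames
        Y[:, j] = np.median(PP[:, nearest], axis=1)
    return Y

# Toy spectrogram: frame 2 carries a vocal-like outlier in bin 0; its
# estimate is built from the repeating frames around it.
PP = np.array([[1.0, 1.0, 9.0, 1.0],
               [2.0, 2.0, 2.0, 2.0]])
Y = spectrum_estimate(PP, k=2)
```

Because the estimate of each frame is drawn only from other, similar frames, repeating accompaniment survives the median while one-off vocal energy does not.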
Optionally, processor 10 may separate the percussive music component from the re-transformed second frequency-domain signal according to the weight of the percussive music component, by formula (5) below:
P2 = W .* PP    (5)
Here, P2 denotes the percussive music component signal.
Optionally, processor 10 may separate the harmonic music component from the first frequency-domain signal; convert the separated harmonic music component from the frequency domain to the time domain; convert the percussive music component from the frequency domain to the time domain; and synthesize the converted harmonic music component with the converted percussive music component to obtain the accompaniment.
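A sketch of this final synthesis step, under the simplifying assumption that both components share one analysis resolution (in the document the percussive component actually comes from the finer re-transform, so it would need its own inverse parameters); all signal and parameter names below are illustrative.

```python
import numpy as np
from scipy.signal import stft, istft

FS = 44100

def synthesize_accompaniment(H_harm, P_perc, frame_len, hop, fs=FS):
    """Convert the separated harmonic and percussive components back to the
    time domain and sum them to obtain the accompaniment."""
    _, h = istft(H_harm, fs=fs, nperseg=frame_len, noverlap=frame_len - hop)
    _, p = istft(P_perc, fs=fs, nperseg=frame_len, noverlap=frame_len - hop)
    n = min(len(h), len(p))  # align lengths before mixing
    return h[:n] + p[:n]

# Example: a tone stands in for the harmonic part, noise for the percussive.
t = np.arange(FS) / FS
tone = np.sin(2 * np.pi * 220 * t)
noise = 0.1 * np.random.default_rng(1).standard_normal(FS)
_, _, Zh = stft(tone, fs=FS, nperseg=1024, noverlap=512)
_, _, Zp = stft(noise, fs=FS, nperseg=1024, noverlap=512)
acc = synthesize_accompaniment(Zh, Zp, frame_len=1024, hop=512)
```

With a Hann window and half-frame overlap the STFT/ISTFT round trip reconstructs each component essentially exactly, so the sum equals the mixed accompaniment.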
In the embodiments of the present invention, the accompaniment of a song is divided into a harmonic music component and a percussive music component. The HPSS algorithm is first used to separate the second frequency-domain signal from the song, the second frequency-domain signal comprising the percussive music component and the vocal component; the NNMF algorithm is then used to separate the vocal component from the percussive music component, so that the extracted vocal component is comparatively clean and large accompaniment residue is avoided. Moreover, because the NNMF algorithm extracts the vocal component from the mono signal of the song by considering the frequency distribution across similar frames, it fully exploits the strong periodicity of the accompaniment and the high variability of the vocals, avoiding the damage done to the vocal component when vocals are extracted from the frequency distribution of individual bins alone. The method is widely applicable, the extracted vocal component is of better quality, and it can meet the requirements of high-quality vocal extraction.
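For concreteness, the HPSS step summarized above admits a short median-filtering sketch. The row/column filtering of the magnitude matrix and the mask ((P.*P)./((H.*H)+(P.*P))).*X follow the formula recited in claim 2; the filter kernel length, the epsilon in the denominator, and the toy spectrogram are illustrative additions.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_second_signal(X, kernel=9):
    """HPSS as in claim 2: median filter each row of the magnitude
    spectrogram (time direction) to enhance harmonics -> H, each column
    (frequency direction) to enhance percussion -> P, then apply the soft
    mask ((P.*P)./((H.*H)+(P.*P))).*X to keep the percussive-plus-vocal
    part as the second frequency-domain signal."""
    mag = np.abs(X)                           # first matrix
    H = median_filter(mag, size=(1, kernel))  # second matrix (row-wise)
    P = median_filter(mag, size=(kernel, 1))  # third matrix (column-wise)
    eps = 1e-12                               # avoid division by zero
    mask = (P * P) / (H * H + P * P + eps)
    return mask * X

# Toy spectrogram: a horizontal line is a sustained harmonic, a vertical
# line is a broadband percussive hit at one instant.
X = np.zeros((32, 32))
X[10, :] = 1.0  # harmonic: flat across time
X[:, 5] = 1.0   # percussive: flat across frequency
S2 = hpss_second_signal(X, kernel=9)
```

The mask suppresses the time-smooth harmonic line and passes the frequency-smooth percussive line, which is exactly the split the method relies on before vocal extraction.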
It should be noted that when the audio signal processing apparatus provided by the above embodiments extracts vocals, the division into the functional modules described above is merely illustrative; in practice, the above functions may be assigned to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the audio signal processing apparatus of the above embodiments and the embodiments of the audio signal processing method belong to the same conception; for the specific implementation, refer to the method embodiments, which is not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
A person of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
The foregoing is only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (17)

1. A method for processing an audio signal, characterized in that the method comprises:
converting a mono signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
separating a second frequency-domain signal from the first frequency-domain signal using a harmonic/percussive source separation (HPSS) algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
extracting the vocal component from the second frequency-domain signal using a nearest-neighbor median filtering (NNMF) algorithm over the most similar frames.
2. The method according to claim 1, characterized in that separating the second frequency-domain signal from the first frequency-domain signal using the HPSS algorithm comprises:
taking the magnitude of each frequency bin in the first frequency-domain signal to obtain a first matrix;
median filtering each row of the first matrix to obtain a second matrix;
median filtering each column of the first matrix to obtain a third matrix;
separating the second frequency-domain signal from the first frequency-domain signal according to the second matrix and the third matrix by the following formula:
((P.*P)./((H.*H)+(P.*P))).*X
where H denotes the second matrix, P denotes the third matrix, X denotes the first matrix, ./ denotes element-wise division, and .* denotes element-wise multiplication.
3. The method according to claim 1 or 2, characterized in that converting the mono signal of the song from the time domain to the frequency domain to obtain the first frequency-domain signal comprises:
converting the mono signal of the song from the time domain to the frequency domain using a fast Fourier transform (FFT) to obtain the first frequency-domain signal, wherein the sampling rate of the FFT is 44.1 kHz, the frame length is not less than 8192 points, and the frame shift is half the frame length.
4. The method according to any one of claims 1 to 3, characterized in that, before extracting the vocal component from the second frequency-domain signal using the NNMF algorithm over the most similar frames, the method further comprises:
converting the second frequency-domain signal from the frequency domain to the time domain using an inverse fast Fourier transform (IFFT), and then applying an FFT to convert it from the time domain to the frequency domain to obtain a re-transformed second frequency-domain signal, wherein the FFT used to obtain the re-transformed second frequency-domain signal has a sampling rate of 44.1 kHz, a frame length of not more than 4096 points, and a frame shift of one quarter of that frame length;
and extracting the vocal component from the second frequency-domain signal using the NNMF algorithm over the most similar frames comprises:
extracting the vocal component from the re-transformed second frequency-domain signal using the NNMF algorithm.
5. The method according to claim 4, characterized in that extracting the vocal component from the re-transformed second frequency-domain signal using the NNMF algorithm comprises:
taking the magnitude of each frequency bin in the re-transformed second frequency-domain signal;
traversing each frame of the re-transformed second frequency-domain signal and computing the similarity between that frame and every other frame of the re-transformed second frequency-domain signal;
obtaining, from the similarities, a frequency-spectrum estimate of the re-transformed second frequency-domain signal;
computing, by the following formula, the difference in exponential normalized cross-correlation between corresponding frequency bins of the re-transformed second frequency-domain signal and its frequency-spectrum estimate:
Q(i,j) = ( exp( -(log(PP(i,j)) - log(Y(i,j)))^2 / (2 * namda * namda) ) )^2
and computing the weight of the percussive music component from the difference by the following formula:
W(i,j) = 0 if Q(i,j) < 0.85, and W(i,j) = 1 if Q(i,j) >= 0.85
where PP(i, j) denotes the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th frequency bin of the j-th frame of the frequency-spectrum estimate of the re-transformed second frequency-domain signal; Q(i, j) denotes the difference in exponential normalized cross-correlation between the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal and the i-th frequency bin of the j-th frame of its frequency-spectrum estimate; W(i, j) denotes the weight of the percussive music component at the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; and namda is a weighting factor, namda = 3;
and extracting the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component by the following formula:
P1 = (1 - W) .* PP
where P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
6. The method according to claim 5, characterized in that obtaining, from the similarities, the frequency-spectrum estimate of the re-transformed second frequency-domain signal comprises:
obtaining, from the similarities, a predetermined number of similar frames for each frame;
computing the frequency-spectrum estimate of each frame from the predetermined number of similar frames;
and assembling the computed per-frame frequency-spectrum estimates into the frequency-spectrum estimate of the re-transformed second frequency-domain signal.
7. The method according to claim 5, characterized in that, after extracting the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component, the method further comprises:
separating the percussive music component from the re-transformed second frequency-domain signal by the following formula:
P2=W.*PP
where P2 denotes the percussive music component.
8. The method according to claim 7, characterized in that, after separating the percussive music component from the re-transformed second frequency-domain signal, the method further comprises:
separating the harmonic music component from the first frequency-domain signal;
and converting the separated harmonic music component from the frequency domain to the time domain, converting the percussive music component from the frequency domain to the time domain, and synthesizing the converted harmonic music component with the converted percussive music component to obtain the accompaniment.
9. An apparatus for processing an audio signal, characterized in that the apparatus comprises:
a converting unit, configured to convert a mono signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
a separating unit, configured to separate a second frequency-domain signal from the first frequency-domain signal obtained by the converting unit using a harmonic/percussive source separation (HPSS) algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
an extracting unit, configured to extract the vocal component from the second frequency-domain signal separated by the separating unit using a nearest-neighbor median filtering (NNMF) algorithm over the most similar frames.
10. The apparatus according to claim 9, characterized in that the separating unit is specifically configured to:
take the magnitude of each frequency bin in the first frequency-domain signal obtained by the converting unit to obtain a first matrix;
median filter each row of the first matrix to obtain a second matrix, and median filter each column of the first matrix to obtain a third matrix;
and separate the second frequency-domain signal from the first frequency-domain signal according to the second matrix and the third matrix by the following formula:
((P.*P)./((H.*H)+(P.*P))).*X
where H denotes the second matrix, P denotes the third matrix, X denotes the first matrix, ./ denotes element-wise division, and .* denotes element-wise multiplication.
11. The apparatus according to claim 9 or 10, characterized in that the converting unit is specifically configured to:
convert the mono signal of the song from the time domain to the frequency domain using a fast Fourier transform (FFT) to obtain the first frequency-domain signal, wherein the sampling rate of the FFT is 44.1 kHz, the frame length is not less than 8192 points, and the frame shift is half the frame length.
12. The apparatus according to any one of claims 9 to 11, characterized in that the separating unit is further configured to:
convert the second frequency-domain signal from the frequency domain to the time domain using an inverse fast Fourier transform (IFFT), and then apply an FFT to convert it from the time domain to the frequency domain to obtain a re-transformed second frequency-domain signal, wherein the FFT used to obtain the re-transformed second frequency-domain signal has a sampling rate of 44.1 kHz, a frame length of not more than 4096 points, and a frame shift of one quarter of that frame length;
and the extracting unit is specifically configured to:
extract the vocal component, using the NNMF algorithm, from the re-transformed second frequency-domain signal obtained by the separating unit.
13. The apparatus according to claim 12, characterized in that the extracting unit comprises:
a first obtaining subunit, configured to take the magnitude of each frequency bin in the re-transformed second frequency-domain signal obtained by the separating unit;
a first computing subunit, configured to traverse each frame of the re-transformed second frequency-domain signal and compute the similarity between that frame and every other frame of the re-transformed second frequency-domain signal;
a second obtaining subunit, configured to obtain, from the similarities computed by the first computing subunit, a frequency-spectrum estimate of the re-transformed second frequency-domain signal;
a second computing subunit, configured to compute, by the following formula, the difference in exponential normalized cross-correlation between corresponding frequency bins of the re-transformed second frequency-domain signal and the frequency-spectrum estimate obtained by the second obtaining subunit:
Q(i,j) = ( exp( -(log(PP(i,j)) - log(Y(i,j)))^2 / (2 * namda * namda) ) )^2
and to compute the weight of the percussive music component from the difference by the following formula:
W(i,j) = 0 if Q(i,j) < 0.85, and W(i,j) = 1 if Q(i,j) >= 0.85
where PP(i, j) denotes the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; Y(i, j) denotes the i-th frequency bin of the j-th frame of the frequency-spectrum estimate of the re-transformed second frequency-domain signal; Q(i, j) denotes the difference in exponential normalized cross-correlation between the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal and the i-th frequency bin of the j-th frame of its frequency-spectrum estimate; W(i, j) denotes the weight of the percussive music component at the i-th frequency bin of the j-th frame of the re-transformed second frequency-domain signal; and namda is a weighting factor, namda = 3;
an extracting subunit, configured to extract, by the following formula, the vocal component from the re-transformed second frequency-domain signal according to the weight of the percussive music component computed by the second computing subunit:
P1=(1-W).*PP
where P1 denotes the vocal component, W denotes the weight of the percussive music component, and PP denotes the re-transformed second frequency-domain signal.
14. The apparatus according to claim 13, characterized in that the second obtaining subunit is specifically configured to:
obtain, from the similarities, a predetermined number of similar frames for each frame;
compute the frequency-spectrum estimate of each frame from the predetermined number of similar frames;
and assemble the computed per-frame frequency-spectrum estimates into the frequency-spectrum estimate of the re-transformed second frequency-domain signal.
15. The apparatus according to claim 13, characterized in that the extracting unit is further configured to:
separate, by the following formula, the percussive music component from the re-transformed second frequency-domain signal obtained by the separating unit:
P2=W.*PP
where P2 denotes the percussive music component.
16. The apparatus according to claim 15, characterized in that the apparatus further comprises:
a synthesizing unit, configured to separate the harmonic music component from the first frequency-domain signal obtained by the converting unit; convert the separated harmonic music component from the frequency domain to the time domain; convert the percussive music component from the frequency domain to the time domain; and synthesize the converted harmonic music component with the converted percussive music component to obtain the accompaniment.
17. A device for processing an audio signal, characterized in that the device comprises a processor and a memory, the processor being configured to execute the following instructions:
converting a mono signal of a song from the time domain to the frequency domain to obtain a first frequency-domain signal, the first frequency-domain signal comprising a harmonic music component, a percussive music component, and a vocal component;
separating a second frequency-domain signal from the first frequency-domain signal using a harmonic/percussive source separation (HPSS) algorithm, the second frequency-domain signal comprising the percussive music component and the vocal component;
extracting the vocal component from the second frequency-domain signal using a nearest-neighbor median filtering (NNMF) algorithm over the most similar frames.
CN201310587304.8A 2013-11-20 2013-11-20 Method, device and equipment for processing audio signals Withdrawn CN103680517A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310587304.8A CN103680517A (en) 2013-11-20 2013-11-20 Method, device and equipment for processing audio signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310587304.8A CN103680517A (en) 2013-11-20 2013-11-20 Method, device and equipment for processing audio signals

Publications (1)

Publication Number Publication Date
CN103680517A true CN103680517A (en) 2014-03-26

Family

ID=50317870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310587304.8A Withdrawn CN103680517A (en) 2013-11-20 2013-11-20 Method, device and equipment for processing audio signals

Country Status (1)

Country Link
CN (1) CN103680517A (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104053120B (en) * 2014-06-13 2016-03-02 福建星网视易信息系统有限公司 A kind of processing method of stereo audio and device
CN104053120A (en) * 2014-06-13 2014-09-17 福建星网视易信息系统有限公司 Method and device for processing stereo audio frequency
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN105590633A (en) * 2015-11-16 2016-05-18 福建省百利亨信息科技有限公司 Method and device for generation of labeled melody for song scoring
CN109247030B (en) * 2016-03-18 2023-03-10 弗劳恩霍夫应用研究促进协会 Apparatus and method for harmonic-percussion-residual sound separation using structure tensor on spectrogram
US10770051B2 (en) 2016-03-18 2020-09-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for harmonic-percussive-residual sound separation using a structure tensor on spectrograms
CN109247030A (en) * 2016-03-18 2019-01-18 弗劳恩霍夫应用研究促进协会 Harmonic wave-percussion music-remnant voice separation device and method are carried out using the structure tensor on spectrogram
EP3220386A1 (en) * 2016-03-18 2017-09-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for harmonic-percussive-residual sound separation using a structure tensor on spectrograms
CN106024005A (en) * 2016-07-01 2016-10-12 腾讯科技(深圳)有限公司 Processing method and apparatus for audio data
CN106024005B (en) * 2016-07-01 2018-09-25 腾讯科技(深圳)有限公司 A kind of processing method and processing device of audio data
WO2018001039A1 (en) * 2016-07-01 2018-01-04 腾讯科技(深圳)有限公司 Audio data processing method and apparatus
US10770050B2 (en) 2016-07-01 2020-09-08 Tencent Technology (Shenzhen) Company Limited Audio data processing method and apparatus
CN107146630A (en) * 2017-04-27 2017-09-08 同济大学 A kind of binary channels language separation method based on STFT
CN107146630B (en) * 2017-04-27 2020-02-14 同济大学 STFT-based dual-channel speech sound separation method
CN107705778A (en) * 2017-08-23 2018-02-16 腾讯音乐娱乐(深圳)有限公司 Audio-frequency processing method, device, storage medium and terminal
CN107705778B (en) * 2017-08-23 2020-09-15 腾讯音乐娱乐(深圳)有限公司 Audio processing method, device, storage medium and terminal
CN108335703B (en) * 2018-03-28 2020-10-09 腾讯音乐娱乐科技(深圳)有限公司 Method and apparatus for determining accent position of audio data
CN108335703A (en) * 2018-03-28 2018-07-27 腾讯音乐娱乐科技(深圳)有限公司 The method and apparatus for determining the stress position of audio data
CN108962229A (en) * 2018-07-26 2018-12-07 汕头大学 A kind of target speaker's voice extraction method based on single channel, unsupervised formula
CN108962229B (en) * 2018-07-26 2020-11-13 汕头大学 Single-channel and unsupervised target speaker voice extraction method
CN110097895A (en) * 2019-05-14 2019-08-06 腾讯音乐娱乐科技(深圳)有限公司 A kind of absolute music detection method, device and storage medium
CN110232931A (en) * 2019-06-18 2019-09-13 广州酷狗计算机科技有限公司 The processing method of audio signal, calculates equipment and storage medium at device
CN111145726A (en) * 2019-10-31 2020-05-12 南京励智心理大数据产业研究院有限公司 Deep learning-based sound scene classification method, system, device and storage medium
CN111145726B (en) * 2019-10-31 2022-09-23 南京励智心理大数据产业研究院有限公司 Deep learning-based sound scene classification method, system, device and storage medium
CN111724757A (en) * 2020-06-29 2020-09-29 腾讯音乐娱乐科技(深圳)有限公司 Audio data processing method and related product
CN112053669A (en) * 2020-08-27 2020-12-08 海信视像科技股份有限公司 Method, device, equipment and medium for eliminating human voice
CN112053669B (en) * 2020-08-27 2023-10-27 海信视像科技股份有限公司 Method, device, equipment and medium for eliminating human voice
CN112597331A (en) * 2020-12-25 2021-04-02 腾讯音乐娱乐科技(深圳)有限公司 Method, device, equipment and storage medium for displaying range matching information

Similar Documents

Publication Publication Date Title
CN103680517A (en) Method, device and equipment for processing audio signals
Cano et al. Musical source separation: An introduction
EP3522151B1 (en) Method and device for processing dual-source audio data
CN102402977B (en) Accompaniment, the method for voice and device thereof is extracted from stereo music
CN104134444B (en) A kind of song based on MMSE removes method and apparatus of accompanying
CN104538011A (en) Tone adjusting method and device and terminal device
CN104143324B (en) A kind of musical tone recognition method
CN103426436A (en) Source separation by independent component analysis in conjuction with optimization of acoustic echo cancellation
Fitzgerald Upmixing from mono-a source separation approach
CN110120212B (en) Piano auxiliary composition system and method based on user demonstration audio frequency style
CN107274876A (en) A kind of audition paints spectrometer
CN103943113A (en) Method and device for removing accompaniment from song
US20160027421A1 (en) Audio signal analysis
CN106375780A (en) Method and apparatus for generating multimedia file
CN113921022A (en) Audio signal separation method, device, storage medium and electronic equipment
CN107146630B (en) STFT-based dual-channel speech sound separation method
Vinitha George et al. A novel U-Net with dense block for drum signal separation from polyphonic music signal mixture
Oh et al. Spectrogram-channels u-net: a source separation model viewing each channel as the spectrogram of each source
Dobashi et al. A music performance assistance system based on vocal, harmonic, and percussive source separation and content visualization for music audio signals
Chen et al. Multi-scale temporal-frequency attention for music source separation
CN105280178A (en) audio signal processing device and audio signal processing method thereof
Kim Vocal Separation in Music Using SVM and Selective Frequency Subtraction
CN109243472A (en) A kind of audio-frequency processing method and audio processing system
CN104424971A (en) Audio file playing method and audio file playing device
Lagrange et al. Semi-automatic mono to stereo up-mixing using sound source formation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C04 Withdrawal of patent application after publication (patent law 2001)
WW01 Invention patent application withdrawn after publication

Application publication date: 20140326