CN107017005B - DFT-based dual-channel speech sound separation method - Google Patents

DFT-based dual-channel speech sound separation method

Info

Publication number
CN107017005B
Authority
CN
China
Prior art keywords
channel
speech
dft
sound
right channel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710287632.4A
Other languages
Chinese (zh)
Other versions
CN107017005A (en)
Inventor
叶晨
陈建清
陈适宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University filed Critical Tongji University
Priority to CN201710287632.4A priority Critical patent/CN107017005B/en
Publication of CN107017005A publication Critical patent/CN107017005A/en
Application granted granted Critical
Publication of CN107017005B publication Critical patent/CN107017005B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G10L21/0308 - Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04S - STEREOPHONIC SYSTEMS
    • H04S1/00 - Two-channel systems
    • H04S1/007 - Two-channel systems in which the audio signals are in digital form

Abstract

The invention relates to a DFT-based dual-channel speech separation method, which comprises the following steps: S1, slicing the time-domain signal sequences of the left channel and the right channel respectively, and performing the DFT to obtain the frequency-domain signal sequences of the left channel and the right channel; S2, obtaining the included-angle condition between the left- and right-channel background-music components and the included-angle condition between the speech component and the frequency-bin signal, and separating the speech from the music; and S3, performing the inverse DFT on the result obtained in step S2 to obtain the time-domain signals of the left channel and the right channel after the speech and the music have been separated. Compared with the prior art, the method can effectively separate background music from speech by using a sliced discrete Fourier transform; different phase-difference conditions are chosen according to the pickup angle range of the sound pickup system and the distance between its two channels, making the calculation more accurate; and the final result is filtered to remove unwanted noise. The method can be applied to karaoke-type mobile phone applications.

Description

DFT-based dual-channel speech sound separation method
Technical Field
The invention relates to a voice processing method, and in particular to a DFT-based dual-channel speech separation method.
Background
The main techniques for separating the human voice operate on frequency and phase, and existing approaches basically combine two manually coordinated operations, such as filtering certain frequency bands and cancelling the phase at certain frequencies. The DFT can efficiently convert time-domain information into frequency-domain information, and the inverse DFT converts frequency-domain information back into the time domain; the DFT is widely used in digital filtering, power-spectrum analysis and communication theory. Applying this technique to the separation of the human voice from background music, and improving it, allows the voice to be separated well.
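As a minimal, non-authoritative illustration of this time/frequency conversion (not part of the claimed method), the following NumPy sketch performs a forward DFT and an inverse DFT on a toy two-channel signal; the 440 Hz tone and the 44.1 kHz sample rate are arbitrary assumptions:

```python
import numpy as np

fs = 44100                                   # assumed sample rate
t = np.arange(fs) / fs                       # one second of audio
left = np.sin(2 * np.pi * 440 * t)           # toy left channel
right = 0.5 * np.sin(2 * np.pi * 440 * t)    # toy right channel

# Forward DFT: time-domain samples -> one complex value per frequency bin
L = np.fft.rfft(left)
R = np.fft.rfft(right)

# Inverse DFT: frequency-domain values -> time-domain samples
left_back = np.fft.irfft(L, n=len(left))

# The round trip is lossless up to numerical precision
assert np.allclose(left, left_back)
```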
A reinforcement separation method for multiple specific musical instruments in single-channel music/voice separation has also been proposed. That method reinforces and separates eight instruments, namely electric guitar, clarinet, violin, piano, acoustic guitar, organ, flute and trumpet, using one layer of single-instrument separators and three layers of multi-instrument combination reinforcers: the first layer of combination reinforcers can separate 2 classes of instrument sounds, the second layer can separate 4 classes, and the third layer can separate 8 classes. However, that technology is limited to separating instrument sounds, so its field of application is narrow; it can only process single-channel music, and a single channel carries too little information to distinguish speech from background music by their differences, so the results are usually unsatisfactory.
Disclosure of Invention
The present invention aims to overcome the defects of the prior art and to provide a DFT-based dual-channel speech separation method that can separate the human voice from background music well.
The purpose of the invention can be realized by the following technical scheme:
a DFT-based two-channel speech sound separation method is used for separating speech sound and background music, and comprises the following steps:
s1, slicing the time domain signal sequences of the left channel and the right channel, and performing DFT conversion to obtain the frequency domain signal sequences of the left channel and the right channel, wherein the signal separation expression of each frequency point is as follows:
|ω_L|·e_L = |ω_humanL|·e_humanL + |ω_musicL|·e_musicL,
|ω_R|·e_R = |ω_humanR|·e_humanR + |ω_musicR|·e_musicR,    (1)
wherein |ω_L| is the modulus of the left-channel signal, e_L is the unit vector of the left-channel signal, |ω_humanL| is the modulus of the left-channel speech component, e_humanL is the unit vector of the left-channel speech component, |ω_musicL| is the modulus of the left-channel background-music component, e_musicL is the unit vector of the left-channel background-music component, |ω_R| is the modulus of the right-channel signal, e_R is the unit vector of the right-channel signal, |ω_humanR| is the modulus of the right-channel speech component, e_humanR is the unit vector of the right-channel speech component, |ω_musicR| is the modulus of the right-channel background-music component, and e_musicR is the unit vector of the right-channel background-music component;
S2, letting, at each frequency bin, |ω_humanL| = |ω_humanR| and e_humanL = e_humanR, obtaining the included-angle condition between the left- and right-channel background-music components and the included-angle condition between the speech component and the frequency-bin signal, and calculating the speech components and the background-music components in formula (1), thereby separating the speech and the music;
and S3, performing the inverse DFT on the result obtained in step S2 and filtering the noise to obtain the time-domain signals of the left channel and the right channel after the speech and the music have been separated.
In step S2, the included-angle condition between the left- and right-channel background-music components is: when the frequency of the frequency-bin signal is greater than 603 Hz,
⟨e_musicL, e_musicR⟩ = π/2,
otherwise
⟨e_musicL, e_musicR⟩ = 2π·d·sin α / λ,
wherein d is the distance between the two channels of the sound pickup system, α is the angle over which a single sound pickup device in the sound pickup system receives audio, λ is the wavelength of the frequency-bin signal, and ⟨·,·⟩ denotes the angle between two vectors.
The maximum angle at which a single sound pickup device receives audio is:
Figure GDA00022554711800000213
In step S2, the angle between the speech component and the frequency point signal is:
Figure GDA00022554711800000214
in step S1, the time domain signal sequences of the left channel and the right channel are divided into a plurality of slices having equal lengths.
Compared with the prior art, after the frequency-domain signals are obtained by the sliced discrete Fourier transform, the method can effectively separate the background music from the speech; different phase-difference conditions are chosen according to the pickup angle range of the sound pickup system and the distance between its two channels, making the calculation more accurate; and the final result is filtered to remove unwanted noise. The method can be applied to karaoke-type mobile phone applications.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a schematic diagram illustrating a relationship between a sound pickup system and a sound source according to the present embodiment.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Examples
As shown in fig. 1, a DFT-based two-channel speech separation method for separating speech from background music includes the following steps:
S1, slicing the time-domain signal sequences of the left channel and the right channel respectively, and performing the DFT to obtain the frequency-domain signal sequences of the left channel and the right channel.
S2, letting, at each frequency bin, |ω_humanL| = |ω_humanR| and e_humanL = e_humanR, acquiring the included-angle condition between the left- and right-channel background-music components and the included-angle condition between the speech component and the frequency-bin signal, and separating the speech and the music.
S3, performing the inverse DFT on the result obtained in step S2 and filtering the noise to obtain the time-domain signals of the left channel and the right channel after the speech and the music have been separated.
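A minimal sketch of the S1-S3 flow is given below, assuming a fixed slice length of 4096 samples, NumPy's rfft/irfft for the DFT, and a caller-supplied per-bin separation rule (the rule itself is derived in the rest of this embodiment); none of these specifics are stated in the patent text.

```python
import numpy as np

def separate_stereo(left, right, separate_bin, frame_len=4096):
    """Skeleton of steps S1-S3: slice, DFT, per-bin separation, inverse DFT."""
    n_frames = min(len(left), len(right)) // frame_len
    voice = np.zeros(n_frames * frame_len)
    music_left = np.zeros(n_frames * frame_len)
    music_right = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        sl = slice(i * frame_len, (i + 1) * frame_len)
        # S1: sliced DFT of both channels
        L = np.fft.rfft(left[sl])
        R = np.fft.rfft(right[sl])
        # S2: per-bin separation rule supplied by the caller
        H, ML, MR = separate_bin(L, R)
        # S3: inverse DFT of each separated component back to the time domain
        voice[sl] = np.fft.irfft(H, n=frame_len)
        music_left[sl] = np.fft.irfft(ML, n=frame_len)
        music_right[sl] = np.fft.irfft(MR, n=frame_len)
    return voice, music_left, music_right
```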
For each frequency bin, there is the following equation:
ω_i = ω_human + ω_music,    (1)
wherein ω_i is the complex value of the i-th frequency bin, ω_human is the speech component of the i-th frequency bin, and ω_music is the background-music component. All variables are complex; in other words, the above formula can be written as
ω_i = |ω_human|·e_human + |ω_music|·e_music.
For a song with two channels there will be
ω_L = |ω_humanL|·e_humanL + |ω_musicL|·e_musicL,
ω_R = |ω_humanR|·e_humanR + |ω_musicR|·e_musicR.    (2)
In equation (1) the left side is known: it is the complex value of a given frequency bin, which can itself be decomposed into a unit vector and a modulus. The left side supplies two real knowns while the right side contains four real unknowns, so, since the frequency bins are independent, equation (1) is numerically unsolvable on its own. The same conclusion holds for equation (2). Considering that the vocals of most everyday albums are recorded through a microphone, at any frequency bin the following should hold:
|ω_humanL| = |ω_humanR|,
e_humanL = e_humanR.
Thus, equation (2) is transformed into the following form:
ω_L = |ω_human|·e_human + |ω_musicL|·e_musicL,
ω_R = |ω_human|·e_human + |ω_musicR|·e_musicR.
the discrete Fourier transform can be as follows
Figure GDA0002255471180000043
Wherein:
Figure GDA0002255471180000044
assuming that the slicing is performed for two sequences (containing left and right channels) that are long enough, we get:
Figure GDA0002255471180000045
the result obtained after the sliced Fourier transform is
Figure GDA0002255471180000046
Wherein ω isRijJ term, ω, representing the ith slice of the right channelLijThe jth term representing the ith slice of the left channel. If all slices after inverse transformation are expected to be attached, impulse response is not generated as much as possible, and the change of any frequency point among the slices is required to be as small as possible. A section of unprocessed audio may be selected for its sliced fourier transform and the frequency bins at the same position therein may be selected for analysis to observe continuous and non-subsequent impulse response frequency phase changes.
Assuming a sinusoidal signal, sampling is performed in a time domain slice of a fixed length, where the phase of the signal in the time domain slice is:
Figure GDA0002255471180000051
for the nth sampling period, the range of sampling is considered to be:
Figure GDA0002255471180000052
where n is the number of cycles experienced by a small time domain slice,
Figure GDA0002255471180000053
is an angle that exceeds an integer period within a time slice. So for the Nth time slice, the corresponding (the latter equal sign is not equal meaning) has relative to the first time slice
Figure GDA0002255471180000054
A phase difference.
The characteristics in modulus are less obvious than those in phase, but after the signal is reconstructed in frequency domain, the time domain signal between adjacent slices must be continuous and smooth, otherwise obvious impulse response will occur.
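The inter-slice phase behaviour described above can be checked numerically with the short sketch below; the 440 Hz tone, 44.1 kHz sample rate and 1024-sample slices are assumed values, and the per-slice phase step of the dominant bin is compared against the residual angle Δφ:

```python
import numpy as np

fs, f, frame = 44100, 440.0, 1024
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f * t)

# Phase of the dominant bin in each consecutive slice
phases = []
for start in range(0, len(x) - frame, frame):
    X = np.fft.rfft(x[start:start + frame])
    k = np.argmax(np.abs(X))                  # dominant frequency bin
    phases.append(np.angle(X[k]))

# Observed phase step between adjacent slices, wrapped to (-pi, pi]
steps = np.angle(np.exp(1j * np.diff(phases)))

# Residual angle: the fraction of a period left over after the whole cycles per slice
expected = np.angle(np.exp(1j * 2 * np.pi * ((f * frame / fs) % 1.0)))
print(steps[:3], expected)                    # the steps cluster around `expected`
```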
Next, an attempt is made to rebuild new X′_Rij and X′_Lij from the X_Rij and X_Lij in equation (7).
The frequency-domain result obtained from the DFT can now be processed by the speech/music separation algorithm. All of the following processing is performed independently at each frequency bin. Let S(m, n) denote the input parameter here. Considering that the signal comes from the left and right channels, combining with equation (2) gives:
ω_L = g1·|ω_human|·e_human + |ω_musicL|·e_musicL,
ω_R = g2·|ω_human|·e_human + |ω_musicR|·e_musicR.
If we let:
Figure GDA0002255471180000057
the following simple equation is obtained:
Figure GDA0002255471180000061
the parameter g here is not presented above, and the model assumed here is more accurate. If the speech sounds are loudness shifted during post-processing, such as when singing at the ear, g1 ≠ g 2. However, only g is considered here1=g2The case (1). It is not important what the two parameters are in particular, since
Figure GDA0002255471180000062
Is unknown. In other words, if g1,g2Increase in same ratio only will
Figure GDA0002255471180000063
Is reduced on a par, but does not affect
Figure GDA0002255471180000064
And (4) solving. Establishing at this point, let:
Figure GDA0002255471180000065
equation (13) can become:
Figure GDA0002255471180000066
Consider a given frequency bin: apart from the middle part (the speech component), everything is generated by a variety of instruments and synthesizers. The sounds emitted by these sources reach the recording points of the left and right channels with differing phase differences. It is assumed that the sound sources are uniformly distributed with respect to the left and right channels; in other words, at a given frequency bin the various phase differences are uniformly distributed. Therefore, considering the averaged overall effect at a frequency bin, the angle between the left- and right-channel background-music components is taken to be π/2. This gives a first additional condition, derived from assumption and prior knowledge, hereinafter referred to as the first phase difference condition:
⟨e_musicL, e_musicR⟩ = π/2.
this equation seems simple and is the key to solving the problem. This assumption is actually problematic because the distance between the recordings of the left and right channels is only about 30cm, and for parts within 300Hz, it is almost impossible for the left and right channels to differ by 90 °. Because the wavelength of the sound wave within 300Hz must be greater than 1 meter, considering the distance from the sound source to the left and right channel access points, there will be:
Figure GDA00022554711800000610
the frequency point emitted by the source will reach 0.6 pi only when the source is present on the extension of the two receivers. The optimization of the selection of different frequency angles is discussed in detail below. Here again in
Figure GDA00022554711800000611
The equation is solved for the condition.
Substituting equation (14) into (15) would be:
Figure GDA00022554711800000612
simplifying to obtain:
Figure GDA0002255471180000071
where θ is
Figure GDA0002255471180000072
And
Figure GDA0002255471180000073
(the angle between the two vectors above), which is usually small. In fact, it is convenient to approximate θ as 0 here. Thus:
Figure GDA0002255471180000074
Solving this quadratic equation gives two roots:
Figure GDA0002255471180000075
the negative sign is taken here in view of the problem of energy distribution. Therefore:
Figure GDA0002255471180000076
all components that need to be solved are:
Figure GDA0002255471180000077
Figure GDA0002255471180000078
Figure GDA0002255471180000079
The next step is to substitute into equation (11) and compute the inverse DFT. The final result obtained is then filtered to remove unwanted noise.
When solving equation (4) above, equation (5) was used, i.e. it was assumed that the mean of the angles over all frequency bins should be π/2. The premise of this assumption is that at any frequency bin the sound sources are rich enough and their phase differences are distributed over the whole real axis (this does not conflict with the angle lying between 0 and 180 degrees, since fixing the angle essentially maps its domain uniformly onto the real axis). Of course this is not really the case: background music often comes from small indoor recordings, and various special effects are added in software afterwards. One such scheme is the sound-image panning system mentioned above, which usually places a recorded source at a virtual, specific distance and then obtains different left and right channels by computer simulation.
In addition, the influence of the reception angle of the sound pickup system needs to be considered. For two specific observation points, namely the left and right ears for a human listener, the left- and right-channel receivers for a sound pickup system, or the two simulated reception points used in post-processing, the source locations usually lie within a limited sound field in front of the two observation points, as shown in Fig. 2, where A and B are the image points.
For a small-band accompaniment the requirement on this angle is usually not critical; in other words, the line from a sound source to the pickups does not make a very small acute angle θ with the extension of the line through the two pickups.
It is also worth examining the distance of the audio image point from the sound pickup system and the distance between the two channels of the pickup system. For a typical modern pickup system, the distance between the two channels is
d = 30 cm,
and h is normally 1-2 m; this distance is usually chosen more freely, and in practice it reflects the distance at which the image point is placed during post-production. The phase difference with which a given sound source reaches the two sound pickup devices can therefore be taken as:
Figure GDA0002255471180000081
This equation shows that the phase difference between the two pickups is not changed drastically by the distance of the sound source from the pickup system; since d is fixed, in the low-frequency range the phase difference usually fluctuates only within a certain range. This contradicts a strong assumption established in the previous section:
⟨e_musicL, e_musicR⟩ = π/2.
The reason is that when λ is large, since d is small and θ is large, the phase difference between the two sound pickup devices cannot reach π/2. By refining the range of this angle, a more accurate average value of the angle can be given.
In particular, an upper limit may be given for α in Fig. 2; it may be assumed that all sound sources lie on one side of the sound pickup system and that:
Figure GDA0002255471180000085
Under the above conditions, for a sound wave of wavelength λ, the maximum value of the phase difference between the two sound pickup devices is then bounded, while sound sources located on the perpendicular bisector of the two pickup devices exhibit no phase difference at all:
Figure GDA0002255471180000086
based on equation (23), there is:
Figure GDA0002255471180000091
Now equation (15) from the previous section is corrected; the corrected form is hereinafter referred to as the second phase difference condition:
⟨e_musicL, e_musicR⟩ = 2π·d·sin α / λ.
given the parameter values, λ is the wavelength of the current processing frequency point,
Figure GDA0002255471180000093
d = 0.3 m. For high-frequency sound waves, e.g. those above 2 kHz, since
Figure GDA0002255471180000094
the assumption given here is no longer valid, and the first phase difference condition is still used.
From equation (25) and equation (14), there is:
Figure GDA0002255471180000095
Substituting α and d gives:
Figure GDA0002255471180000096
from equation (26), when the wavelength is less than 0.2819m, the phase difference condition should be chosen as equation (15). Considering that the operation is in the frequency domain and the velocity of the acoustic wave in air is 340m/s, there are:
Figure GDA0002255471180000097
therefore, the second phase difference condition is selected when the detection frequency point is smaller than 603Hz, and the first phase difference condition is selected when the frequency point value is larger than 603 Hz. Given the constraints, the equations can be solved. Under the second phase difference condition, the following equation is given:
Figure GDA0002255471180000098
Here the coefficient in front of
Figure GDA0002255471180000099
has been removed purely for brevity of writing; moreover, as explained above, this factor has no practical effect. Under the first phase difference condition,
Figure GDA00022554711800000910
is used so that the product terms cancel directly, giving a simple result. Under the second phase difference condition, however, the simplified result has to deal with a
Figure GDA00022554711800000911
quadratic term. Numerically this quadratic term is the product of the two roots, and the problem becomes a quartic equation in one unknown.
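Before continuing, the frequency-dependent switch between the two phase difference conditions can be summarised in a short sketch. The 603 Hz threshold, the 0.3 m spacing d and the 340 m/s speed of sound are taken from the text above, while the pickup angle alpha is left as an assumed parameter, since its numerical bound appears only as an image in the source document:

```python
import numpy as np

SPEED_OF_SOUND = 340.0   # m/s, as stated in the description
D = 0.3                  # m, distance between the two pickup devices
F_SWITCH = 603.0         # Hz, threshold quoted in the description and in claim 2

def music_angle_condition(freq_hz, alpha_rad):
    """Target angle between the left/right background-music components at one bin.

    Above the threshold: first phase difference condition (pi/2).
    Below it: second condition, 2*pi*d*sin(alpha)/lambda, as in claim 2.
    alpha_rad is an assumed pickup angle, not a value taken from the patent text.
    """
    if freq_hz > F_SWITCH:
        return np.pi / 2
    wavelength = SPEED_OF_SOUND / max(freq_hz, 1e-6)
    return 2 * np.pi * D * np.sin(alpha_rad) / wavelength
```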
Considering that most of the energy of audio is concentrated in the low and middle frequencies, and that the two sound pickup devices are only about 30 cm apart, the attenuation experienced in the air does not differ much between the two channels. Specifically, because the path difference is short, air absorption attenuation and ground absorption attenuation can be neglected and only spreading attenuation need be considered; assuming further that the sound source is at least 1 m away from the two pickup devices, there is:
P1 / P2 ≤ (l2 / l1)^2 = (1.3 / 1)^2 = 1.69,
wherein l1 and l2 are the distances from the sound source to the two pickup devices, and P1 and P2 are the sound pressures with which the emitted sound wave reaches the two pickup devices. In practice this ratio should be only slightly larger than 1, rather than close to the 1.69 given by the above equation, because the sound source is usually roughly in front of the two pickup devices rather than on the extension of the line through them, and its distance is also greater than 1 m. The significance of the equation is to give an upper bound on the variation, which supports the following approximation:
|ω_musicL| ≈ |ω_musicR|.
Combining this with equation (28) gives the error range of the approximation:
Figure GDA0002255471180000103
in fact this is an acceptable error range. And it is believed that in most cases, this approximation will yield more accurate results. Substituting it into equation (27), trying to eliminate
Figure GDA0002255471180000104
Obtaining:
Figure GDA0002255471180000105
simplifying to obtain:
Figure GDA0002255471180000106
the first order term is approximated as a scalar quantity, under the same principle as equation (18):
Figure GDA0002255471180000107
The coefficients of this quadratic equation in one unknown are:
Figure GDA0002255471180000108
The solution still follows the scheme above; for consistency of sign, the root with the negative sign should again be taken, and in principle the left and right channels should not come out inverted:
Figure GDA0002255471180000111
Owing to various reflections and diffraction, the difference between the two sides is not nearly zero at low frequencies. Here it is simply written as:
Figure GDA0002255471180000112
The inverse short-time Fourier transform is then applied to the processed left and right channels to obtain time-domain signals. The time-domain signals are filtered to remove the high-frequency noise produced by the processing, giving the final result:
Figure GDA0002255471180000113
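A minimal sketch of this final reconstruction step is given below; the use of scipy.signal and the 16 kHz Butterworth low-pass cut-off are assumptions chosen for illustration, since the text only states that the high-frequency noise produced by the processing is filtered out:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def reconstruct_and_filter(frames, frame_len, fs, cutoff_hz=16000.0):
    """Inverse-DFT each processed slice, concatenate, then low-pass filter.

    frames: list of complex rfft arrays (one per slice) for one output channel.
    cutoff_hz is an assumed value; the text only asks that processing noise be removed.
    """
    # Inverse DFT per slice and concatenation back into one waveform
    signal = np.concatenate([np.fft.irfft(F, n=frame_len) for F in frames])
    # Low-pass filtering to suppress high-frequency artefacts of the processing
    b, a = butter(4, cutoff_hz, btype="low", fs=fs)
    return filtfilt(b, a, signal)
```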

Claims (5)

1. A DFT-based two-channel speech separation method for separating speech and background music, characterized by comprising the following steps:
s1, slicing the time domain signal sequences of the left channel and the right channel, and performing DFT conversion to obtain the frequency domain signal sequences of the left channel and the right channel, wherein the signal separation expression of each frequency point is as follows:
|ω_L|·e_L = |ω_humanL|·e_humanL + |ω_musicL|·e_musicL,
|ω_R|·e_R = |ω_humanR|·e_humanR + |ω_musicR|·e_musicR,    (1)
wherein |ω_L| is the modulus of the left-channel signal, e_L is the unit vector of the left-channel signal, |ω_humanL| is the modulus of the left-channel speech component, e_humanL is the unit vector of the left-channel speech component, |ω_musicL| is the modulus of the left-channel background-music component, e_musicL is the unit vector of the left-channel background-music component, |ω_R| is the modulus of the right-channel signal, e_R is the unit vector of the right-channel signal, |ω_humanR| is the modulus of the right-channel speech component, e_humanR is the unit vector of the right-channel speech component, |ω_musicR| is the modulus of the right-channel background-music component, and e_musicR is the unit vector of the right-channel background-music component;
S2, letting, at each frequency bin, |ω_humanL| = |ω_humanR| and e_humanL = e_humanR, obtaining the included-angle condition between the left- and right-channel background-music components and the included-angle condition between the speech component and the frequency-bin signal, and calculating the speech components and the background-music components in formula (1), thereby separating the speech and the music;
and S3, performing the inverse DFT on the result obtained in step S2 to obtain the time-domain signals of the left channel and the right channel after the speech and the music have been separated.
2. The DFT-based two-channel speech separation method according to claim 1, wherein in step S2 the included-angle condition between the left- and right-channel background-music components is: when the frequency of the frequency-bin signal is greater than 603 Hz,
⟨e_musicL, e_musicR⟩ = π/2,
otherwise
⟨e_musicL, e_musicR⟩ = 2π·d·sin α / λ,
wherein d is the distance between the two channels of the sound pickup system, α is the angle over which a single sound pickup device in the sound pickup system receives audio, λ is the wavelength of the frequency-bin signal, and the symbol ⟨·,·⟩ denotes the angle between two vectors.
3. The DFT-based two-channel speech separation method according to claim 2, wherein the maximum angle at which the single sound pickup device receives audio is:
Figure FDA00022554711700000113
4. The DFT-based two-channel speech separation method according to claim 1, wherein in step S1, the time domain signal sequences of the left channel and the right channel are divided into a plurality of slices with equal length.
5. The DFT-based two-channel speech separation method according to claim 1, wherein step S3 further comprises: performing noise filtering on the result of the inverse DFT.
CN201710287632.4A 2017-04-27 2017-04-27 DFT-based dual-channel speech sound separation method Active CN107017005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710287632.4A CN107017005B (en) 2017-04-27 2017-04-27 DFT-based dual-channel speech sound separation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710287632.4A CN107017005B (en) 2017-04-27 2017-04-27 DFT-based dual-channel speech sound separation method

Publications (2)

Publication Number Publication Date
CN107017005A CN107017005A (en) 2017-08-04
CN107017005B true CN107017005B (en) 2020-03-24

Family

ID=59447955

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710287632.4A Active CN107017005B (en) 2017-04-27 2017-04-27 DFT-based dual-channel speech sound separation method

Country Status (1)

Country Link
CN (1) CN107017005B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109994098B (en) * 2019-01-11 2021-02-02 同济大学 Weighted noise active control method based on off-line reconstruction of secondary path
CN110232931B (en) * 2019-06-18 2022-03-22 广州酷狗计算机科技有限公司 Audio signal processing method and device, computing equipment and storage medium
CN112198496B (en) * 2020-09-29 2022-11-29 上海特金无线技术有限公司 Signal processing method, device and equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604012A (en) * 2008-06-11 2009-12-16 索尼株式会社 Signal processing apparatus, signal processing method and program
CN102402977A (en) * 2010-09-14 2012-04-04 无锡中星微电子有限公司 Method for extracting accompaniment and human voice from stereo music and device of method
CN102568493A (en) * 2012-02-24 2012-07-11 大连理工大学 Underdetermined blind source separation (UBSS) method based on maximum matrix diagonal rate
JP5273080B2 (en) * 2010-03-30 2013-08-28 ブラザー工業株式会社 Singing voice separation device and program
CN104167214A (en) * 2014-08-20 2014-11-26 电子科技大学 Quick source signal reconstruction method achieving blind sound source separation of two microphones
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN105654963A (en) * 2016-03-23 2016-06-08 天津大学 Voice underdetermined blind identification method and device based on frequency spectrum correction and data density clustering

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101604012A (en) * 2008-06-11 2009-12-16 索尼株式会社 Signal processing apparatus, signal processing method and program
JP5273080B2 (en) * 2010-03-30 2013-08-28 ブラザー工業株式会社 Singing voice separation device and program
CN102402977A (en) * 2010-09-14 2012-04-04 无锡中星微电子有限公司 Method for extracting accompaniment and human voice from stereo music and device of method
CN102568493A (en) * 2012-02-24 2012-07-11 大连理工大学 Underdetermined blind source separation (UBSS) method based on maximum matrix diagonal rate
CN104167214A (en) * 2014-08-20 2014-11-26 电子科技大学 Quick source signal reconstruction method achieving blind sound source separation of two microphones
CN104616663A (en) * 2014-11-25 2015-05-13 重庆邮电大学 Music separation method of MFCC (Mel Frequency Cepstrum Coefficient)-multi-repetition model in combination with HPSS (Harmonic/Percussive Sound Separation)
CN105654963A (en) * 2016-03-23 2016-06-08 天津大学 Voice underdetermined blind identification method and device based on frequency spectrum correction and data density clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Voice Conversion Using Dynamic Frequency Warping With Amplitude Scaling, for Parallel or Nonparallel Corpora";Elizabeth Godoy等;《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》;20120531;第20卷(第4期);全文 *
"非平稳环境中的盲源分离算法研究";刘建强;《中国博士学位论文全文数据库 信息科技辑》;20090115(第01期);全文 *

Also Published As

Publication number Publication date
CN107017005A (en) 2017-08-04

Similar Documents

Publication Publication Date Title
US9111526B2 (en) Systems, method, apparatus, and computer-readable media for decomposition of a multichannel music signal
EP1741313B1 (en) A method and system for sound source separation
US10430154B2 (en) Tonal/transient structural separation for audio effects
KR20180050652A (en) Method and system for decomposing sound signals into sound objects, sound objects and uses thereof
EP2946382B1 (en) Vehicle engine sound extraction and reproduction
CN107017005B (en) DFT-based dual-channel speech sound separation method
Argenti et al. Automatic transcription of polyphonic music based on the constant-Q bispectral analysis
WO2006090589A1 (en) Sound separating device, sound separating method, sound separating program, and computer-readable recording medium
JP6452653B2 (en) A system for modeling the characteristics of musical instruments
Colonel et al. Reverse engineering of a recording mix with differentiable digital signal processing
JP2017090888A (en) Method for modeling characteristic of instrument
EP1463030B1 (en) Reverberation sound generating apparatus
CN107146630B (en) STFT-based dual-channel speech sound separation method
Lee et al. Musical onset detection based on adaptive linear prediction
Itoyama et al. Integration and adaptation of harmonic and inharmonic models for separating polyphonic musical signals
Benetos et al. Auditory spectrum-based pitched instrument onset detection
Pishdadian et al. A multi-resolution approach to common fate-based audio separation
Wu et al. Multipitch estimation by joint modeling of harmonic and transient sounds
Han et al. Reconstructing completely overlapped notes from musical mixtures
Woodruff et al. Resolving overlapping harmonics for monaural musical sound separation using pitch and common amplitude modulation
JP5397786B2 (en) Fog removal device
Giampiccolo et al. Virtual Bass Enhancement Via Music Demixing
Dziubiński et al. High accuracy and octave error immune pitch detection algorithms
Gong et al. Monaural musical octave sound separation using relaxed extended common amplitude modulation
Bailey et al. Applications of the phase vocoder in the control of real‐time electronic musical instruments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant