CN113823316A - Voice signal separation method for closely spaced sound sources - Google Patents
Voice signal separation method for closely spaced sound sources
- Publication number: CN113823316A (application number CN202111125927.4A)
- Authority
- CN
- China
- Prior art keywords: signal, time, separation matrix, frequency, separation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G10L21/0272: Voice signal separating (under G10L21/02, Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L21/028: Voice signal separating using properties of sound source
- Y02D30/70: Reducing energy consumption in wireless communication networks
Abstract
The invention discloses a speech signal separation method for closely spaced sound sources. The method comprises the following steps: step 1, acquiring the mixed speech time-frequency domain signal to be processed; step 2, initializing a separation matrix for each frequency band; step 3, jointly optimizing the separation matrices of all frequency bands; step 4, amplitude-normalizing the separation matrices; step 5, estimating the separated time-frequency domain speech signals; and step 6, recovering the time-domain speech signals from the separated time-frequency domain speech signals. The method helps the separation algorithm achieve a better speech separation result under the unfavorable condition that the sound sources are close together.
Description
Technical Field
The invention relates to the technical field of voice processing, in particular to a voice signal separation technology.
Background
The voice separation technology can separate original sound source signals from mixed signals of a plurality of sound sources, is an important task in the field of voice signal processing, and plays an important role in various application scenes such as intelligent home systems, video conference systems and voice recognition systems.
In multi-channel speech signal processing, independent vector analysis (IVA) ties the frequency components of each source signal together through a joint probability distribution model and thereby constructs an overall cost function. Auxiliary-function-based IVA (AuxIVA) and independent low-rank matrix analysis (ILRMA) are considered the most advanced current methods for separating convolutively mixed audio signals. The AuxIVA algorithm uses the majorization-minimization (MM) optimization technique to derive iterative projection (IP) update rules, which optimize the separation matrix quickly and stably. The AuxIVA optimization can also be combined with other, more flexible signal models. ILRMA combines the optimization strategy of AuxIVA with multichannel nonnegative matrix factorization (MNMF) as the signal model, exploiting the strong expressive power of MNMF while guaranteeing that the cost is non-increasing after each iteration.
Ideally, the separation performance of IVA is independent of the sound source positions. In practice, however, because of noise, the performance of the algorithm degrades significantly when the sound sources are close together, which greatly limits the practical application of separation algorithms. Improving separation for closely spaced sound sources is therefore a technical problem of wide concern.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a speech signal separation method for closely spaced sound sources that significantly improves the quality of the separated speech signals.
The technical scheme adopted by the invention is as follows:
A speech signal separation method for closely spaced sound sources, comprising the steps of:
step 1, acquiring the mixed speech time-frequency domain signal to be processed;
step 2, initializing a separation matrix for each frequency band of the mixed speech time-frequency domain signal;
step 3, performing joint optimization on the separation matrices of all frequency bands to resolve the permutation ambiguity;
step 4, amplitude-normalizing the optimized separation matrices;
step 5, estimating the separated time-frequency domain speech signals from the separation matrices processed in step 4;
and step 6, recovering the time-domain speech signals from the time-frequency domain speech signals estimated in step 5.
Further, step 1 specifically comprises: acquiring the time-domain signal of the mixed speech to be processed with a signal acquisition system, and applying a short-time Fourier transform to the time-domain signal to obtain the time-frequency domain signal of the mixed speech to be processed.
Further, in step 2, the separation matrix of each frequency band is initialized with the identity matrix, whose diagonal elements are 1 and whose remaining elements are 0.
Further, in step 3, the joint optimization of the separation matrices of all frequency bands specifically comprises: (1) selecting a source-signal distribution model to obtain a cost function; (2) selecting an optimization method for the cost function to obtain an update rule for the separation matrix; (3) iterating the separation matrix with the update rule until convergence, obtaining the optimized separation matrix of each frequency band.
Further, in step 4, the separation matrix is amplitude-normalized according to the minimum distortion principle.
Further, step 5 specifically comprises: multiplying the separation matrices obtained in step 4 with the mixed speech time-frequency domain signal to be processed, thereby estimating the separated time-frequency domain speech signals.
Further, step 6 specifically comprises: applying the inverse short-time Fourier transform to the time-frequency domain speech signals estimated in step 5 to obtain the separated time-domain speech signals.
The invention implements an improved speech signal separation method for closely spaced sound sources. It markedly improves separation when the sources are close together, alleviates the block permutation problem of IVA under certain conditions, and also improves separation when the sources are far apart.
Drawings
FIG. 1 is a flow chart of a speech signal separation method according to the present invention;
FIG. 2 is a schematic diagram of the closely spaced sound source scenario to which the invention applies;
FIG. 3 compares the SDR improvement at different reverberation times for the original AuxIVA method, the improved AuxIVA method of the invention, the original ILRMA method, and the improved ILRMA method of the invention;
FIG. 4 compares the SIR improvement at different reverberation times for the same four methods.
Detailed Description
The speech separation method of the invention, aimed at closely spaced sound sources, mainly comprises the following parts:
1. signal acquisition
1) Convolve the clean source signals with the room impulse responses and add diffuse noise to obtain the mixed signals.
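The mixing model in 1) can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function name and the (microphone, source) layout of the room impulse responses are assumptions:

```python
import numpy as np

def make_mixture(sources, rirs, noise, snr_db=30.0):
    """Convolve clean sources with room impulse responses and add
    diffuse noise scaled to the requested signal-to-noise ratio.

    sources: list of N 1-D clean signals of equal length T
    rirs[m][n]: impulse response from source n to microphone m
    noise: (M, T) diffuse noise, one channel per microphone
    """
    M, N, T = len(rirs), len(sources), len(sources[0])
    x = np.zeros((M, T))
    for m in range(M):
        for n in range(N):
            x[m] += np.convolve(sources[n], rirs[m][n])[:T]
    # scale the noise so that 10*log10(P_signal / P_noise) = snr_db
    gain = np.sqrt(np.mean(x**2) / (np.mean(noise**2) * 10**(snr_db / 10)))
    return x + gain * noise[:, :T]
```

The 30 dB default matches the SNR used in the embodiment below.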
2) Short-time Fourier transform of signals
Let the mixed signal collected by the m-th microphone be x_m(t). A short-time Fourier transform converts the signal to the time-frequency domain. Ignoring the time-frame index t, the signal of the m-th microphone in the k-th frequency band is written x_m^k, and the signals collected by the M microphones form the mixed-signal vector x^k = [x_1^k, ..., x_M^k]^T, where the superscript T denotes the transpose operation.
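Under the conventions above, the transform to mixed-signal vectors x^k can be sketched with SciPy's STFT. The 2048-point Hann window with 3/4 overlap follows the embodiment described later; the function name and array layout are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft

def mixture_to_tf(x_time, fs=16000, n_fft=2048, hop=512):
    """Transform an (M, T) multichannel time signal to the time-frequency
    domain. Returns an array of shape (K, frames, M), so that X[k, t] is
    the length-M mixed-signal vector x^k of band k at frame t."""
    _, _, X = stft(x_time, fs=fs, window='hann',
                   nperseg=n_fft, noverlap=n_fft - hop, axis=-1)
    # scipy returns (M, K, frames); reorder to (K, frames, M)
    return np.transpose(X, (1, 2, 0))
```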
2. Iterative algorithm
The n-th source signal vector is denoted s_n, where n = 1, 2, ..., N is the source-signal index and N is the total number of source signals. The separation matrix in the k-th frequency band is denoted W^k, and its n-th row is (w_n^k)^H, where the superscript H denotes the conjugate transpose, k = 1, 2, ..., K is the frequency-band index, and K is the total number of frequency bands. W = {W^k} represents the set of all band separation matrices, and det W^k is the determinant of the separation matrix in the k-th band. The estimated signal corresponding to the source vector s_n is denoted y_n, and y_n^k(t) represents the t-th frame of the n-th estimated signal in the k-th frequency band. Ignoring the time-frame index, y_n^k = (w_n^k)^H x^k. For the purpose of separation, the estimated signals are made as independent as possible, and mutual information is used as the measure of independence to construct the cost function.
1) If a Laplace source-signal distribution model is selected and the mutual-information cost function is suitably modified for the close-source scenario, the final cost function can be written in the following form:

J(W) = Σ_n ⟨G(r_n)⟩_t − Σ_k log|det W^k|   (1)

r_n = ||y_n||_2 = ( Σ_k |y_n^k|^2 )^{1/2}   (2)

where ⟨·⟩_t denotes the sample average over time frames, r_n = ||y_n||_2 is the full-band norm of the n-th estimated signal, and G(·) = −log f(·), with f the probability density function of the source signal. Using the majorization-minimization (MM) optimization technique, an auxiliary function is constructed:

Q(W) = Σ_k [ Σ_q (w_q^k)^H V_q^k w_q^k − log|det W^k| ] + const   (3)

where q is another source-signal index. The iteration rules are then:

V_n^k = ⟨ ( G'(r_n) / r_n ) x^k (x^k)^H ⟩_t   (4)
w_n^k ← ( W^k V_n^k )^{-1} e_n   (5)
w_n^k ← w_n^k / ( (w_n^k)^H V_n^k w_n^k )^{1/2}   (6)
y_n^k = (w_n^k)^H x^k   (7)

G'(·) denotes the first derivative of G(·), and e_n is the unit vector whose n-th element is 1 and whose remaining elements are 0. For the Laplace distribution, G(||y_n||_2) = ||y_n||_2 and G'(||y_n||_2) = 1. The separation matrix is initialized to the identity matrix and then iterated according to rules (4)-(7) until convergence, yielding the optimized separation matrix of each band.
2) If MNMF is used as the source-signal distribution model, the cost functions of IVA and MNMF are fused and suitably modified for the close-source scenario; the final cost function can be written in the following form:

J = Σ_k Σ_t [ Σ_n ( |y_n^k(t)|^2 / r_{kt,n} + log r_{kt,n} ) − 2 log|det W^k| ]   (8)

r_{kt,n} = Σ_l t_{kl,n} v_{lt,n}   (9)

where t_{kl,n} and v_{lt,n} are respectively the basis and activation parameters of the different sound sources and l is the index of the basis. Applying the majorization-minimization (MM) optimization technique yields the following iteration rules:

V_n^k = ⟨ x^k (x^k)^H / r_{kt,n} ⟩_t   (10)
w_n^k ← ( W^k V_n^k )^{-1} e_n   (11)
w_n^k ← w_n^k / ( (w_n^k)^H V_n^k w_n^k )^{1/2}   (12)

where the update rules of the model parameters t_{kl,n} and v_{lt,n} are, respectively:

t_{kl,n} ← t_{kl,n} [ ( Σ_t |y_n^k(t)|^2 v_{lt,n} / r_{kt,n}^2 ) / ( Σ_t v_{lt,n} / r_{kt,n} ) ]^{1/2}   (13)
v_{lt,n} ← v_{lt,n} [ ( Σ_k |y_n^k(t)|^2 t_{kl,n} / r_{kt,n}^2 ) / ( Σ_k t_{kl,n} / r_{kt,n} ) ]^{1/2}   (14)

with r_{kt,n} = Σ_{l'} t_{kl',n} v_{l't,n}, where ⟨·⟩_t denotes the sample average and l' is a new basis index. The separation matrix is initialized to the identity matrix and then iterated according to rules (9)-(14) until convergence, yielding the optimized separation matrix.
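The multiplicative updates of the basis and activation parameters for a single source can be sketched as follows. These are the standard ILRMA (Itakura-Saito NMF) updates; the function name and array shapes are illustrative, and in the full algorithm they alternate with the separation-matrix updates:

```python
import numpy as np

def ilrma_nmf_update(P, T_basis, V_act, eps=1e-12):
    """One multiplicative update of the low-rank variance model
    r_kt = sum_l t_kl * v_lt for a single source.

    P: (K, Tf) power spectrogram |y_n^k(t)|^2
    T_basis: (K, L) nonnegative bases t_kl
    V_act: (L, Tf) nonnegative activations v_lt
    """
    R = T_basis @ V_act + eps
    T_basis = T_basis * np.sqrt(((P / R**2) @ V_act.T)
                                / ((1.0 / R) @ V_act.T + eps))
    R = T_basis @ V_act + eps        # refresh the model before updating V
    V_act = V_act * np.sqrt((T_basis.T @ (P / R**2))
                            / (T_basis.T @ (1.0 / R) + eps))
    return T_basis, V_act
```

Each update keeps the parameters nonnegative and does not increase the Itakura-Saito divergence between P and the model.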
3. Amplitude normalization
In order to resolve the amplitude ambiguity of the recovered signals, the separation matrices obtained after convergence must be amplitude-normalized. According to the minimum distortion principle (MDP), the optimized separation matrix is further processed as follows:
W^k ← ( W^k (W^k)^H )^{-1/2} W^k   (15)
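Equation (15) can be implemented per band with an eigendecomposition, since W^k (W^k)^H is Hermitian positive definite for an invertible W^k. A sketch (function name assumed):

```python
import numpy as np

def normalize_separation(W):
    """Apply W^k <- (W^k (W^k)^H)^(-1/2) W^k to every band, eq. (15).
    W: (K, N, N) complex separation matrices."""
    out = np.empty_like(W)
    for k in range(W.shape[0]):
        A = W[k] @ W[k].conj().T                 # Hermitian positive definite
        vals, vecs = np.linalg.eigh(A)
        inv_sqrt = (vecs * (1.0 / np.sqrt(vals))) @ vecs.conj().T
        out[k] = inv_sqrt @ W[k]
    return out
```

A consequence of (15) worth noting is that the normalized matrix satisfies W^k (W^k)^H = I in every band.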
4. reconstructing a target signal
1) Estimating a time-frequency domain target signal
Using the final separation matrices obtained from equation (15), the separated speech signal in each frequency band can be estimated by the following equation:
y^k = W^k x^k   (16)
2) reconstructing a time-domain target signal
Finally, the separated time-frequency domain speech signals are transformed back to the time domain through the inverse short-time Fourier transform, recovering the time-domain signals.
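Equation (16) and the inverse STFT can be sketched together. The window parameters follow the embodiment below; the function name and array layouts are assumptions:

```python
import numpy as np
from scipy.signal import istft

def reconstruct(W, X, fs=16000, n_fft=2048, hop=512):
    """Estimate y^k = W^k x^k per band, eq. (16), then invert the STFT.
    W: (K, N, M) separation matrices; X: (K, frames, M) mixture STFT.
    Returns (N, samples) time-domain signals."""
    Y = np.einsum('knm,ktm->ktn', W, X)            # (K, frames, N)
    _, y = istft(np.transpose(Y, (2, 0, 1)), fs=fs, window='hann',
                 nperseg=n_fft, noverlap=n_fft - hop)
    return y
```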
Examples
The technical scheme in the embodiment of the invention is clearly and completely described below with reference to the accompanying drawings.
1. Test sample and objective evaluation criteria
The clean speech signals in this embodiment are selected from the TIMIT data set (cut and spliced into 10 s long speech signals) with a sampling rate of 16 kHz. Room impulse responses were generated with the image model (J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Am., vol. 65, pp. 943-950, 1979); the room size is 7 m x 5 m x 2.75 m, and the reverberation times are set to 0 ms, 100 ms, 300 ms, 500 ms, and 700 ms, respectively. As shown in FIG. 2, this embodiment uses 2 microphones to receive signals from 2 sound sources. The distance between the two microphones is 2.5 cm, and their midpoint is located at [4, 1, 1.5] (m). The sound sources and the microphones lie in the same horizontal plane; the two sources are located at 45 degrees and 60 degrees, respectively, at a distance of 1 m from the array center. The clean speech signals are convolved with the room impulse responses, and 100 different mixed segments are generated by adding diffuse noise at a signal-to-noise ratio (SNR) of 30 dB following the method in the literature (E. A. P. Habets and S. Gannot, "Generating sensor signals in isotropic noise fields," J. Acoust. Soc. Am., vol. 122, no. 6, pp. 3464-3470, 2007). All algorithms operate in the time-frequency domain; the short-time Fourier transform uses a 2048-point Hann window with an overlap ratio of 3/4.
This embodiment uses the signal-to-distortion ratio (SDR) and the signal-to-interference ratio (SIR) as objective evaluation criteria. Subtracting the SDR value (SDR_in) / SIR value (SIR_in) of the input mixed signal from the output SDR value (SDR_out) / SIR value (SIR_out) after algorithm processing gives the SDR improvement (SDRimp) / SIR improvement (SIRimp), that is, SDRimp = SDR_out - SDR_in and SIRimp = SIR_out - SIR_in.
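A simplified version of this evaluation can be sketched as below. Note this is a plain aligned-estimate SDR rather than the full BSS-Eval decomposition used by standard toolkits, and the function names are illustrative:

```python
import numpy as np

def sdr_db(ref, est, eps=1e-12):
    """Signal-to-distortion ratio of an aligned estimate, in dB."""
    return 10 * np.log10(np.sum(ref**2) / (np.sum((est - ref)**2) + eps))

def sdr_improvement(ref, mix_channel, est):
    """SDRimp = SDR_out - SDR_in, as defined in the evaluation above."""
    return sdr_db(ref, est) - sdr_db(ref, mix_channel)
```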
2. Concrete implementation process of method
Referring to FIG. 1, the time-domain mixed speech signal is input and short-time Fourier transformed to obtain the time-frequency spectrum, and the separation matrix of each frequency band is initialized to the identity matrix. In the improved AuxIVA algorithm (denoted AuxIVA-imp), iterative optimization uses equations (4)-(7); in the improved ILRMA algorithm (denoted ILRMA-imp), iterative optimization uses equations (9)-(14). After the iterations converge, the final separation matrices W^k are obtained by amplitude normalization with equation (15) and substituted into equation (16) to obtain the separated time-frequency spectrum estimates; finally, the inverse short-time Fourier transform of the estimated spectra yields the separated time-domain speech signals.
To demonstrate the performance of the method of the present invention, this embodiment compares the original AuxIVA algorithm (denoted AuxIVA-ori) (N. Ono, "Stable and fast update rules for independent vector analysis based on auxiliary function technique," in Proc. IEEE WASPAA, pp. 189-192, 2011) and the original ILRMA algorithm (denoted ILRMA-ori) (D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari, "Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 9, pp. 1626-1641, 2016) with the method of the present invention. FIG. 3 shows the average SDRimp over 100 tests at different reverberation times; FIG. 4 shows the average SIRimp over 100 tests at different reverberation times.
The results show that, for closely spaced sound sources, the method of the present invention separates more effectively than the original algorithms under noisy conditions, with the advantage most pronounced at low-to-medium reverberation.
Claims (9)
1. A method for separating speech signals of closely spaced sound sources, the method comprising the steps of:
step 1, acquiring a mixed voice time-frequency domain signal to be processed;
step 2, initializing a separation matrix of each frequency band for the mixed voice time-frequency domain signal;
step 3, performing joint optimization on the separation matrices of all frequency bands to resolve the permutation ambiguity;
step 4, conducting amplitude normalization on the optimized separation matrix;
step 5, estimating a time-frequency domain voice signal according to the separation matrix processed in the step 4;
and 6, restoring the time domain voice signal from the time-frequency domain voice signal estimated in the step 5.
2. The method for separating speech signals of closely spaced sound sources according to claim 1, wherein step 1 specifically comprises: acquiring the time-domain signal of the mixed speech to be processed with a signal acquisition system, and applying a short-time Fourier transform to the time-domain signal to obtain the time-frequency domain signal of the mixed speech to be processed.
3. The method as claimed in claim 1, wherein step 2 initializes the separation matrix of each frequency band with an identity matrix, whose diagonal elements are 1 and whose remaining elements are 0.
4. The method as claimed in claim 1, wherein the step 3 of jointly optimizing the separation matrices of all frequency bands comprises the following steps:
(1) selecting a source signal distribution model to obtain a cost function;
(2) selecting an optimization method for the cost function to obtain an update rule of the separation matrix;
(3) and iterating the separation matrix by using the updating rule until convergence is achieved, and obtaining the separation matrix after each frequency band is optimized.
5. The method according to claim 4, wherein in step (1) the Laplace distribution is selected as the source-signal distribution model, and the cost function is:

J(W) = Σ_n ⟨G(r_n)⟩_t − Σ_k log|det W^k|, with r_n = ( Σ_k |y_n^k|^2 )^{1/2}

wherein ⟨·⟩_t represents the sample average and G(·) is a scoring function determined by the source-signal model; n is the source-signal index, n = 1, 2, ..., N, and N is the total number of source signals; k is the frequency index, k = 1, 2, ..., K, and K is the total number of frequency bands; y_n^k = (w_n^k)^H x^k represents the n-th estimated signal in the k-th frequency band, and det W^k is the determinant of the separation matrix in the k-th frequency band;
applying the majorization-minimization optimization method to the cost function, the update rules of the separation matrix are obtained as:

V_n^k = ⟨ ( G'(r_n) / r_n ) x^k (x^k)^H ⟩_t
w_n^k ← ( W^k V_n^k )^{-1} e_n
w_n^k ← w_n^k / ( (w_n^k)^H V_n^k w_n^k )^{1/2}

wherein (w_n^k)^H represents the n-th row of the separation matrix W^k, the superscript H denotes the conjugate transpose, x^k = [x_1^k, ..., x_M^k]^T represents the mixed-signal vector in the k-th frequency band, M is the total number of microphones, G'(·) represents the first derivative of G(·), G(r_n) = r_n and G'(r_n) = 1; e_n represents the unit vector whose n-th element is 1 and whose remaining elements are 0.
6. The method according to claim 4, wherein in step (1) multichannel nonnegative matrix factorization is selected as the source-signal model, and the cost function is:

J = Σ_k Σ_t [ Σ_n ( |y_n^k(t)|^2 / r_{kt,n} + log r_{kt,n} ) − 2 log|det W^k| ], with r_{kt,n} = Σ_l t_{kl,n} v_{lt,n}

wherein t is the time-frame index; t_{kl,n} and v_{lt,n} are respectively the basis and activation parameters of the different sound sources, and l is the index of the basis; n is the source-signal index, n = 1, 2, ..., N, and N is the total number of source signals; k is the frequency index, k = 1, 2, ..., K, and K is the total number of frequency bands; y_n^k(t) represents the t-th frame of the n-th estimated signal in the k-th frequency band, and det W^k is the determinant of the separation matrix in the k-th frequency band;
applying the majorization-minimization optimization method to the cost function, the update rules of the separation matrix are obtained as:

V_n^k = ⟨ x^k (x^k)^H / r_{kt,n} ⟩_t
w_n^k ← ( W^k V_n^k )^{-1} e_n
w_n^k ← w_n^k / ( (w_n^k)^H V_n^k w_n^k )^{1/2}

wherein the update rules of t_{kl,n} and v_{lt,n} are respectively:

t_{kl,n} ← t_{kl,n} [ ( Σ_t |y_n^k(t)|^2 v_{lt,n} / r_{kt,n}^2 ) / ( Σ_t v_{lt,n} / r_{kt,n} ) ]^{1/2}
v_{lt,n} ← v_{lt,n} [ ( Σ_k |y_n^k(t)|^2 t_{kl,n} / r_{kt,n}^2 ) / ( Σ_k t_{kl,n} / r_{kt,n} ) ]^{1/2}

with r_{kt,n} = Σ_{l'} t_{kl',n} v_{l't,n}, wherein ⟨·⟩_t represents the sample average and l' is a new index of the basis.
7. The method according to claim 1, wherein in step 4 the separation matrix is amplitude-normalized according to the minimum distortion principle, specifically:

W^k ← ( W^k (W^k)^H )^{-1/2} W^k

wherein k is the frequency index, k = 1, 2, ..., K, and K is the total number of frequency bands; W^k represents the separation matrix of the k-th frequency band, and the superscript H denotes the conjugate transpose.
8. The method according to claim 7, wherein step 5 specifically comprises: multiplying the separation matrix W^k obtained in step 4 with the mixed speech time-frequency domain signal x^k to be processed, thereby estimating the separated time-frequency domain speech signal y^k.
9. The method for separating speech signals of closely spaced sound sources according to claim 1, wherein step 6 specifically comprises: applying the inverse short-time Fourier transform to the time-frequency domain speech signals estimated in step 5 to obtain the separated time-domain speech signals.
Priority Applications (1)
- CN202111125927.4A (CN113823316B), priority and filing date 2021-09-26: Voice signal separation method for closely spaced sound sources
Publications (2)
- CN113823316A (application), published 2021-12-21
- CN113823316B (grant), published 2023-09-12
Family
- ID: 78915482
- Family application: CN202111125927.4A (CN113823316B), filed 2021-09-26, status: Active (granted)
- Country: CN
Cited By (2)
- CN114220453A, priority 2022-01-12, published 2022-03-22: Multi-channel non-negative matrix factorization method and system based on a frequency-domain convolutive transfer function
- CN116866123A, priority 2023-07-13, published 2023-10-10: Convolutive blind separation method without orthogonality constraint
Citations (10)
- CN104333523A, published 2015-02-04: NPCA-based post-nonlinear blind source separation method
- WO2016050725A1, published 2016-04-07: Method and apparatus for speech enhancement based on source separation
- WO2016152511A1, published 2016-09-29: Sound source separating device and method, and program
- CN108597531A, published 2018-09-28: Method for improving two-channel blind signal separation through multi-source activity detection
- CN109584900A, published 2019-04-05: Blind source separation algorithm for noisy signals
- CN110010148A, published 2019-07-12: Low-complexity frequency-domain blind separation method and system
- CN111259327A, published 2020-06-09: Subgraph-processing-based optimization method for the consensus problem of multi-agent systems
- CN112037813A, published 2020-12-04: Voice extraction method for a high-power target signal
- CN112185411A, published 2021-01-05: Voice separation method, device, medium and electronic equipment
- CN112820312A, published 2021-05-18: Voice separation method and device and electronic equipment
Non-Patent Citations (2)
- Shrikant Venkataramani, "Performance Based Cost Functions for End-to-End Speech Separation," 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
- Chen Tiantian, "Research on underdetermined blind sound source separation based on sparse component analysis," China Master's Theses Full-Text Database
Cited By (3)
- CN114220453A, priority 2022-01-12, published 2022-03-22: Multi-channel non-negative matrix factorization method and system based on a frequency-domain convolutive transfer function
- CN116866123A, priority 2023-07-13, published 2023-10-10: Convolutive blind separation method without orthogonality constraint
- CN116866123B, priority 2023-07-13, granted 2024-04-30: Convolutive blind separation method without orthogonality constraint
Also Published As
- CN113823316B, published 2023-09-12
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant