CN113823316A - Voice signal separation method for sound source close to position - Google Patents

Voice signal separation method for sound source close to position Download PDF

Info

Publication number
CN113823316A
CN113823316A CN202111125927.4A CN202111125927A CN113823316A CN 113823316 A CN113823316 A CN 113823316A CN 202111125927 A CN202111125927 A CN 202111125927A CN 113823316 A CN113823316 A CN 113823316A
Authority
CN
China
Prior art keywords
signal
time
separation matrix
frequency
separation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111125927.4A
Other languages
Chinese (zh)
Other versions
CN113823316B (en
Inventor
廖乐乐
卢晶
陈锴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN202111125927.4A priority Critical patent/CN113823316B/en
Publication of CN113823316A publication Critical patent/CN113823316A/en
Application granted granted Critical
Publication of CN113823316B publication Critical patent/CN113823316B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice signal separation method aiming at a position close to a sound source. The method comprises the following steps: step 1, acquiring a mixed voice time-frequency domain signal to be processed; step 2, initializing a separation matrix of each frequency band; step 3, performing joint optimization on the separation matrixes of all frequency bands; step 4, conducting amplitude normalization on the separation matrix; step 5, estimating the separated time-frequency domain voice signal; and 6, recovering the time domain voice signal from the separated time-frequency domain voice signal. The method can help the separation algorithm to obtain better voice signal separation effect under the unfavorable condition that the sound source positions are close.

Description

Voice signal separation method for sound source close to position
Technical Field
The invention relates to the technical field of voice processing, in particular to a voice signal separation technology.
Background
The voice separation technology can separate original sound source signals from mixed signals of a plurality of sound sources, is an important task in the field of voice signal processing, and plays an important role in various application scenes such as intelligent home systems, video conference systems and voice recognition systems.
In a multi-channel speech signal processing scheme, Independent Vector Analysis (IVA) establishes associations of frequency components of source signals by a joint probability distribution model, thereby constructing an overall cost function. Auxiliary function based IVA (AuxiVA) and Independent low-rank matrix analysis (ILRMA) are considered to be the most advanced current methods of separating convolution mixed audio signals. The Auxiva algorithm utilizes the optimization skill of the majority-minimization (MM) to deduce Iterative Project (IP) iteration rules, and can quickly and stably optimize the separation matrix. The optimization of AuxIVA can also be combined with other more flexible signal models. The ILRMA is a signal model which combines an optimization strategy of AuxIVA and MNMF, and the cost after each iteration is ensured to be non-increased while the strong expression capability of the MNMF is utilized.
Ideally, the separation effect of IVA is independent of the sound source position, however, in practice, due to the presence of noise, the separation effect of the algorithm is significantly reduced when the sound source positions are close, which greatly limits the application of the separation algorithm in practice. How to improve the separation effect of the sound sources located close to each other is a technical problem of great concern.
Disclosure of Invention
In order to solve the above technical problem, the present invention provides a method for separating a speech signal located close to a sound source, which can significantly improve the separation effect of the speech signal.
The technical scheme adopted by the invention is as follows:
a method for separating a voice signal positioned close to a sound source, comprising the steps of:
step 1, acquiring a mixed voice time-frequency domain signal to be processed;
step 2, initializing a separation matrix of each frequency band for the mixed voice time-frequency domain signal;
step 3, performing joint optimization on the separation matrixes of all frequency bands to solve the sequencing uncertainty;
step 4, conducting amplitude normalization on the optimized separation matrix;
step 5, estimating a time-frequency domain voice signal according to the separation matrix processed in the step 4;
and 6, restoring the time domain voice signal from the time-frequency domain voice signal estimated in the step 5.
Further, the specific steps of step 1 are: and acquiring a time domain signal of the mixed voice to be processed by using a signal acquisition system, and performing short-time Fourier transform on the time domain signal to obtain a time-frequency domain signal of the mixed voice to be processed.
Further, in step 2, the unit matrix is used to initialize the separation matrix of each frequency band, the diagonal elements of the matrix are 1, and the remaining elements are 0.
Further, in step 3, the specific step of performing joint optimization on the separation matrices of all frequency bands is: (1) selecting a source signal distribution model to obtain a cost function; (2) selecting an optimization method for the cost function to obtain an update rule of the separation matrix; (3) and iterating the separation matrix by using the updating rule until convergence is achieved, and obtaining the separation matrix after each frequency band is optimized.
Further, in the step 4, the separation matrix is amplitude-normalized according to a minimum distortion criterion.
Further, the specific steps of step 5 are: and (4) multiplying the separation matrix obtained in the step (4) with the mixed voice time-frequency domain signal to be processed, and estimating the separated time-frequency domain voice signal.
Further, the specific steps of step 6 are: and 5, performing short-time Fourier inverse transformation on the time-frequency domain voice signal estimated in the step 5 to obtain a separated time-domain voice signal.
The invention aims at sound sources with close positions and realizes an improved voice signal separation method. The method has the advantages that the separation effect of the sound source position close to the scene is obviously improved, the block sorting problem of the IVA under certain conditions is relieved, and the separation effect of the sound source position far away from the scene is also improved.
Drawings
FIG. 1 is a flow chart of a speech signal separation method according to the present invention;
FIG. 2 is a schematic diagram of a sound source close-up scene to which the present invention is applicable;
fig. 3 is a graph comparing SDR lift values at different reverberation times for the original AuxIVA method, the improved AuxIVA method of the present invention, the original ILRMA method, and the improved ILRMA method of the present invention.
Fig. 4 is a graph comparing SIR increase values at different reverberation times for the original AuxIVA method, the improved AuxIVA method of the present invention, and the original ILRMA method and the improved ILRMA method of the present invention.
Detailed Description
The invention relates to a voice separation method aiming at a position close to a sound source, which mainly comprises the following parts:
1. signal acquisition
1) And mixing the pure source signal with the room impulse response convolution, and adding the diffusion noise to obtain a mixed signal.
2) Short-time Fourier transform of signals
If the mixed signal collected by the m microphone is xm(t) converting the signal into a time-frequency domain by short-time Fourier transform, ignoring the time-frame index t, and expressing the signal of the k-th frequency band as
Figure BDA0003278712260000031
The signals collected by the M microphones form a mixed signal vector
Figure BDA0003278712260000032
The superscript T denotes the transpose operation.
2. Iterative algorithm
The nth source signal vector is denoted as snN is the source signal index and N is 1,2, …, N is the total number of source signals. The separation matrix is denoted by W, and the n-th row of the separation matrix is denoted by
Figure BDA0003278712260000033
Note that the superscript H denotes the conjugate transpose, the superscript K denotes the kth band, K is 1,2, …, and K is the total number of bands.
Figure BDA0003278712260000034
Representing the set of all band separation matrices, detWkIs the determinant of the separation matrix in the k-th band. Source signal vector snThe corresponding estimated signal is denoted yn
Figure BDA0003278712260000035
A t-th frame representing an nth estimated signal in a k-th frequency band. The time-frame indicator is ignored,
Figure BDA0003278712260000036
for the purpose of separation, the estimated signals are made independent as much as possible, and mutual information is used as a measure of independence to construct a cost function.
1) If a Laplace source signal distribution model is selected, the mutual information cost function is properly modified to be suitable for a scene with a close sound source position, and the final cost function can be written in the following form:
Figure BDA0003278712260000037
wherein
Figure BDA0003278712260000038
Which represents the average of the sample samples,
Figure BDA0003278712260000039
is | | | yn||2F represents the probability density distribution function of the source signal as a function of the argument. Adopting the optimization skill of the major-minimization (MM), constructing an auxiliary function:
Figure BDA00032787122600000310
wherein
Figure BDA00032787122600000311
Is an auxiliary variable. Order to
Figure BDA00032787122600000312
Optimal condition for obtaining solution
Figure BDA00032787122600000313
Where q is another source signal indicator. The iteration rule is then:
Figure BDA0003278712260000041
Figure BDA0003278712260000042
Figure BDA0003278712260000043
Figure BDA0003278712260000044
g' (. cndot.) represents the first derivative of G (-), enRepresenting a unit vector, the nth element is 1 and the remaining elements are 0. For Laplace distribution, G (| | y)n||2)=||yn||2,G'(||yn||2) 1. Initializing the separation matrix into a unit matrix, and then iterating until convergence according to the rules of the formulas (4) to (7) to obtain the optimized separation matrix.
2) If the MNMF is used as a source signal distribution model, cost functions of the IVA and the MNMF are fused, and the cost functions are properly modified to be suitable for a scene with a close sound source position, the final cost function can be written into the following form:
Figure BDA0003278712260000045
wherein ,tkl,n and vlt,nRespectively, the base and activation parameters of different sound sources, and l is an index of the base. Adopting the optimization skill of the major-minimization (MM), the following iteration rules are obtained:
Figure BDA0003278712260000046
Figure BDA0003278712260000047
Figure BDA0003278712260000048
Figure BDA0003278712260000049
wherein the model parameter tkl,n and vlt,nThe update rules of (1) are respectively:
Figure BDA0003278712260000051
Figure BDA0003278712260000052
wherein
Figure BDA0003278712260000053
Represents the mean of the samples and l' is a new indicator of the basis. Initializing the separation matrix into an identity matrix, and then iterating until convergence according to the rules of equations (9) - (14) to obtain an optimized separation matrix.
3. Amplitude normalization
In order to solve the uncertainty of the recovered signal amplitude, amplitude normalization needs to be performed on the separation matrix obtained after convergence. According to the MDP, the following processing is further carried out on the optimized separation matrix:
Wk←(Wk(Wk)H)-1/2Wk (15)
4. reconstructing a target signal
1) Estimating a time-frequency domain target signal
From the final separation matrix obtained by equation (15), the speech signal after each band separation can be estimated by the following equation:
yk=Wkxk (16)
2) reconstructing a time-domain target signal
And finally, transforming the separated time-frequency domain voice signals to the time domain through short-time Fourier inverse transformation, and recovering the time-domain signals.
Examples
The technical scheme in the embodiment of the invention is clearly and completely described below with reference to the accompanying drawings.
1. Test sample and objective evaluation criteria
The clean speech signal in this embodiment is selected from the TIMIT data set (a speech signal of 10s length is formed by cutting and splicing) and has a sampling rate of 16 kHz. Room impulse responses were generated with an Image model (j.b. allen and d.a. berkley, "Image method for influencing small-room communications," j.acoust. soc. am., vol.65, pp.943-950,1979.), the room size was 7m × 5m × 2.75m, and the reverberation times were set to 0ms, 100ms, 300ms, 500ms, 700ms, respectively. As shown in fig. 2, in the present embodiment, 2 microphones are used to receive signals from 2 sound sources. The distance between the two microphones is 2.5cm, and the center of the two microphones is positioned at the position of [4,1,1.5] (m). The sound source and the microphone are positioned on the same horizontal plane, the two sound sources are respectively positioned at 45 degrees and 60 degrees, and the distance from the center of the array is 1 m. Clean speech signals are convolved with the room impulse response and 100 different segments of mixed signal are generated by adding a diffuse noise signal to noise ratio (SNR) of 30dB as per the method in the literature (e.a. habps and s.gannot, "Generating sensor signals in anisotropic noise fields," JASA, vol.122, No.6, pp.3464-3470,2007.). All algorithms are processed in the time-frequency domain, and the short-time Fourier transform adopts a 2048-point Hanning window and an overlap ratio of 3/4.
In the embodiment, a signal to deviation ratio (SDR) and a signal to interference ratio (SIR) are used as objective evaluation criteria, and an output SDR value (SDR _ out)/SIR value (SIR _ out) after algorithm processing is subtracted from an SDR value (SDR _ in)/SIR value (SIR _ in) of an input mixed signal to obtain an SDR boost value (sdramp)/SIR boost value (SIRimp) after algorithm processing, that is, sdramp is SDR _ out-SDR _ in and SIRimp is SIR _ out-SIR _ in.
2. Concrete implementation process of method
Referring to fig. 1, a time-domain mixed speech signal is input and subjected to short-time fourier transform to obtain a time-frequency spectrum, and a separation matrix of each frequency band is initialized to an identity matrix. In a modified AuxIVA algorithm (denoted as AuxIVA-imp), iterative optimization is performed using equations (4) - (7); in the modified ILRMA algorithm (denoted ILRMA-imp), iterative optimization is performed using equations (9) - (14). After iterative convergence, the final separation matrix W is obtained by amplitude warping with the formula (15)kAnd substituting the estimated time-domain frequency spectrum into a formula (16) to obtain a separated voice time-domain frequency spectrum estimation, and finally performing short-time Fourier inverse transformation on the estimated voice time-domain frequency spectrum to obtain a separated time-domain voice signal.
In order to embody the performance of the method of the present invention, the present embodiment compares the original Auxiva algorithm (denoted as Auxiva-ori) (N.Ono, "Stable and fast update rules for independent vector analysis based on automatic function technology," in Proc.IEEE WASPAA, pp.189-192,2011 ") with the ILRMA algorithm (denoted as ILRMA-ori) (D.Kitamura, N.Ono, H.Sawada, H.Kameoka, and H.Sarwari," Determined blue resource partitioning analysis and non-networked function analysis, "IEEE transactions, speed, Lang.Process, vol.24, No.9, pp.1626-1641,2016) with the method of the present invention. FIG. 3 shows the results of an average SDRimp obtained from 100 tests at different reverberation times; figure 4 shows the results of the average SIRimp obtained from 100 tests at different reverberation times.
It can be found that in a scene with a close sound source position, compared with the original algorithm, the method of the present invention can perform separation more effectively under a noisy condition, and the advantage is more obvious in the case of medium-low reverberation.

Claims (9)

1. A method for separating a speech signal located close to a sound source, the method comprising the steps of:
step 1, acquiring a mixed voice time-frequency domain signal to be processed;
step 2, initializing a separation matrix of each frequency band for the mixed voice time-frequency domain signal;
step 3, performing joint optimization on the separation matrixes of all frequency bands to solve the sequencing uncertainty;
step 4, conducting amplitude normalization on the optimized separation matrix;
step 5, estimating a time-frequency domain voice signal according to the separation matrix processed in the step 4;
and 6, restoring the time domain voice signal from the time-frequency domain voice signal estimated in the step 5.
2. The method for separating a voice signal of a location close to a sound source according to claim 1, wherein the step 1 comprises the following steps: and acquiring a time domain signal of the mixed voice to be processed by using a signal acquisition system, and performing short-time Fourier transform on the time domain signal to obtain a time-frequency domain signal of the mixed voice to be processed.
3. The method as claimed in claim 1, wherein the step 2 initializes the separation matrix of each frequency band by using an identity matrix, the diagonal element of the matrix is 1, and the remaining elements are 0.
4. The method as claimed in claim 1, wherein the step 3 of jointly optimizing the separation matrices of all frequency bands comprises the following steps:
(1) selecting a source signal distribution model to obtain a cost function;
(2) selecting an optimization method for the cost function to obtain an update rule of the separation matrix;
(3) and iterating the separation matrix by using the updating rule until convergence is achieved, and obtaining the separation matrix after each frequency band is optimized.
5. The method according to claim 4, wherein the Laplace distribution is selected as the source signal distribution model in step (1), and the cost function is:
Figure FDA0003278712250000011
wherein ,
Figure FDA0003278712250000012
representing the sample-sample average, G (-) is a scoring function determined by the source signal model; n is the source signal index and N is 1,2, …, N is the total number of source signals; k is a frequency index and K is 1,2, …, K being the total number of frequency bands;
Figure FDA0003278712250000013
representing the nth estimated signal, detW, in the k-th frequency bandkIs a determinant of a separation matrix within the k-th frequency band;
adopting a maj orientation-minimization optimization method for the cost function, and obtaining an update rule of the separation matrix as follows:
Figure FDA0003278712250000021
Figure FDA0003278712250000022
Figure FDA0003278712250000023
Figure FDA0003278712250000024
wherein
Figure FDA0003278712250000025
Represents a separation matrix WkIn the nth row of (1), the superscript H denotes the conjugate transpose, xkRepresenting the mixed signal vector in the k-th frequency band,
Figure FDA0003278712250000026
m represents the total number of microphones, G' (. cndot.) represents the first derivative of G (-), G (r)n)=rn,G'(rn)=1;enRepresenting a unit vector, the nth element is 1 and the remaining elements are 0.
6. The method according to claim 4, wherein the multi-channel non-negative matrix decomposition is selected as the source signal model in step (1), and the cost function is:
Figure FDA0003278712250000027
where t is a time frame index, tkl,n and vlt,nRespectively, a base and an activation parameter of different sound sources, wherein l is an index of the base, N is a source signal index, and N is 1,2, …, N and N are the total number of source signals; k is a frequency index and K is 1,2, …, K being the total number of frequency bands;
Figure FDA0003278712250000028
t-th frame, detW, representing the n-th estimated signal in the k-th frequency bandkIs a determinant of a separation matrix within the k-th frequency band;
adopting a maj orientation-minimization optimization method for the cost function, and obtaining an update rule of the separation matrix as follows:
Figure FDA0003278712250000029
Figure FDA00032787122500000210
Figure FDA0003278712250000031
Figure FDA0003278712250000032
wherein tkl,n and vlt,nThe update rules of (1) are respectively:
Figure FDA0003278712250000033
Figure FDA0003278712250000034
wherein ,
Figure FDA0003278712250000035
represents the mean of the samples, enRepresenting a unit vector, the nth element is 1, the rest elements are 0, and l' is a new index of the base;
Figure FDA0003278712250000036
represents a separation matrix WkAnd the superscript H on the nth row of (a) indicates the conjugate transpose.
7. The method according to claim 1, wherein in the step 4, the amplitude warping is performed on the separation matrix according to a minimum distortion criterion, and the specific steps are as follows:
Wk←(Wk(Wk)H)-1/2Wk
wherein K is a frequency index, K is 1,2, …, and K is the total number of frequency bands; wkThe separation matrix representing the k-th band and the superscript H the conjugate transpose.
8. The method according to claim 7, wherein the step 5 comprises the following steps: separating the separation matrix W obtained in the step 4kWith the mixed speech time-frequency domain signal x to be processedkMultiplying to estimate the separated time-frequency domain speech signal yk
9. The method for separating a voice signal of a location close to a sound source according to claim 1, wherein the step 6 comprises the following steps: and 5, performing short-time Fourier inverse transformation on the time-frequency domain voice signal estimated in the step 5 to obtain a separated time-domain voice signal.
CN202111125927.4A 2021-09-26 2021-09-26 Voice signal separation method for sound source close to position Active CN113823316B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111125927.4A CN113823316B (en) 2021-09-26 2021-09-26 Voice signal separation method for sound source close to position

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111125927.4A CN113823316B (en) 2021-09-26 2021-09-26 Voice signal separation method for sound source close to position

Publications (2)

Publication Number Publication Date
CN113823316A true CN113823316A (en) 2021-12-21
CN113823316B CN113823316B (en) 2023-09-12

Family

ID=78915482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111125927.4A Active CN113823316B (en) 2021-09-26 2021-09-26 Voice signal separation method for sound source close to position

Country Status (1)

Country Link
CN (1) CN113823316B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220453A (en) * 2022-01-12 2022-03-22 中国科学院声学研究所 Multi-channel non-negative matrix decomposition method and system based on frequency domain convolution transfer function
CN116866123A (en) * 2023-07-13 2023-10-10 中国人民解放军战略支援部队航天工程大学 Convolution blind separation method without orthogonal limitation

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104333523A (en) * 2014-10-14 2015-02-04 集美大学 NPCA-based post nonlinear blind source separation method
WO2016050725A1 (en) * 2014-09-30 2016-04-07 Thomson Licensing Method and apparatus for speech enhancement based on source separation
WO2016152511A1 (en) * 2015-03-23 2016-09-29 ソニー株式会社 Sound source separating device and method, and program
CN108597531A (en) * 2018-03-28 2018-09-28 南京大学 A method of improving binary channels Blind Signal Separation by more sound source activity detections
CN109584900A (en) * 2018-11-15 2019-04-05 昆明理工大学 A kind of blind source separation algorithm of signals and associated noises
CN110010148A (en) * 2019-03-19 2019-07-12 中国科学院声学研究所 A kind of blind separation method in frequency domain and system of low complex degree
CN111259327A (en) * 2020-01-15 2020-06-09 桂林电子科技大学 Subgraph processing-based optimization method for consistency problem of multi-agent system
CN112037813A (en) * 2020-08-28 2020-12-04 南京大学 Voice extraction method for high-power target signal
CN112185411A (en) * 2019-07-03 2021-01-05 南京人工智能高等研究院有限公司 Voice separation method, device, medium and electronic equipment
CN112820312A (en) * 2019-11-18 2021-05-18 北京声智科技有限公司 Voice separation method and device and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016050725A1 (en) * 2014-09-30 2016-04-07 Thomson Licensing Method and apparatus for speech enhancement based on source separation
CN104333523A (en) * 2014-10-14 2015-02-04 集美大学 NPCA-based post nonlinear blind source separation method
WO2016152511A1 (en) * 2015-03-23 2016-09-29 ソニー株式会社 Sound source separating device and method, and program
CN108597531A (en) * 2018-03-28 2018-09-28 南京大学 A method of improving binary channels Blind Signal Separation by more sound source activity detections
CN109584900A (en) * 2018-11-15 2019-04-05 昆明理工大学 A kind of blind source separation algorithm of signals and associated noises
CN110010148A (en) * 2019-03-19 2019-07-12 中国科学院声学研究所 A kind of blind separation method in frequency domain and system of low complex degree
CN112185411A (en) * 2019-07-03 2021-01-05 南京人工智能高等研究院有限公司 Voice separation method, device, medium and electronic equipment
CN112820312A (en) * 2019-11-18 2021-05-18 北京声智科技有限公司 Voice separation method and device and electronic equipment
CN111259327A (en) * 2020-01-15 2020-06-09 桂林电子科技大学 Subgraph processing-based optimization method for consistency problem of multi-agent system
CN112037813A (en) * 2020-08-28 2020-12-04 南京大学 Voice extraction method for high-power target signal

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHRIKANT VENKATARAMANI: "Performance Based Cost Functions for End-to-End Speech Separation", 《2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC)》 *
陈田田: "基于稀疏分量分析的欠定盲声源分离技术研究", 《中国优秀硕士学位论文全文数据库》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114220453A (en) * 2022-01-12 2022-03-22 中国科学院声学研究所 Multi-channel non-negative matrix decomposition method and system based on frequency domain convolution transfer function
CN116866123A (en) * 2023-07-13 2023-10-10 中国人民解放军战略支援部队航天工程大学 Convolution blind separation method without orthogonal limitation
CN116866123B (en) * 2023-07-13 2024-04-30 中国人民解放军战略支援部队航天工程大学 Convolution blind separation method without orthogonal limitation

Also Published As

Publication number Publication date
CN113823316B (en) 2023-09-12

Similar Documents

Publication Publication Date Title
US20210089967A1 (en) Data training in multi-sensor setups
US9668066B1 (en) Blind source separation systems
CN111133511B (en) sound source separation system
US8874439B2 (en) Systems and methods for blind source signal separation
CN113823316B (en) Voice signal separation method for sound source close to position
CN107479030A (en) Based on frequency dividing and improved broad sense cross-correlation ears delay time estimation method
Saruwatari et al. Blind source separation for speech based on fast-convergence algorithm with ICA and beamforming
Nesta et al. Robust Automatic Speech Recognition through On-line Semi Blind Signal Extraction
CN112037813B (en) Voice extraction method for high-power target signal
CN112201276B (en) TC-ResNet network-based microphone array voice separation method
Araki et al. Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA
CN110265060B (en) Speaker number automatic detection method based on density clustering
JP6910609B2 (en) Signal analyzers, methods, and programs
Cobos et al. Two-microphone separation of speech mixtures based on interclass variance maximization
JP7046636B2 (en) Signal analyzers, methods, and programs
CN112820312A (en) Voice separation method and device and electronic equipment
Jafari et al. Sparse coding for convolutive blind audio source separation
CN114863944B (en) Low-delay audio signal overdetermined blind source separation method and separation device
JP6285855B2 (en) Filter coefficient calculation apparatus, audio reproduction apparatus, filter coefficient calculation method, and program
CN114283832A (en) Processing method and device for multi-channel audio signal
CN113393850A (en) Parameterized auditory filter bank for end-to-end time domain sound source separation system
JP4714892B2 (en) High reverberation blind signal separation apparatus and method
Hsu et al. Array configuration-agnostic personalized speech enhancement using long-short-term spatial coherence
SHIRAISHI et al. Blind source separation by multilayer neural network classifiers for spectrogram analysis
Douglas et al. Natural gradient multichannel blind deconvolution and source separation using causal FIR filters

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant