CN113470689B - Voice separation method - Google Patents

Voice separation method

Info

Publication number
CN113470689B
CN113470689B (application CN202110968974.9A)
Authority
CN
China
Prior art keywords
residual
signals
signal
voice
path
Prior art date
Legal status
Active
Application number
CN202110968974.9A
Other languages
Chinese (zh)
Other versions
CN113470689A (en)
Inventor
梁骏
沈旭东
叶丰
卢燕
姚欢
Current Assignee
Hangzhou Guoxin Microelectronics Co.,Ltd.
Original Assignee
Hangzhou Nationalchip Science & Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Nationalchip Science & Technology Co., Ltd.
Priority to CN202110968974.9A
Publication of CN113470689A
Application granted
Publication of CN113470689B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice separation method. Existing methods do not process the residual human voice from other channels that remains after a blind source separation algorithm. In the disclosed method, the voice signals of a plurality of channels are first processed by a blind source separation module to obtain multiple separated voice signals, each of which still contains residual voice from the other channels. These separated signals are then processed by a residual selection module to obtain multiple residual output signals. The residual output signals are filtered by a residual suppression module to obtain expected residual signals, and the corresponding expected residual signal is subtracted from each separated voice signal output by the blind source separation module; the difference is the residual signal, which is the final output voice signal of that path. The adaptive filter of the residual suppression module uses the residual signal and an iterative algorithm to update its filter coefficients. The method suppresses the residual voice of the other channels and reduces its energy without damaging the desired signal, thereby improving voice separation quality.

Description

Voice separation method
Technical Field
The invention belongs to the field of voice processing, and in particular relates to a voice separation method.
Background
Speech extraction in noisy environments and in environments with multiple simultaneous sound sources is a key technology. In scenarios where multiple microphones are used, Independent Component Analysis (ICA) is a well-established approach; ICA was first discussed in US Patent No. 5,706,402. Assuming that each microphone signal is a mixture of several independent speech signals, independent component analysis iteratively estimates an unmixing matrix which, multiplied by the mixed signals, yields the separated signals. The objective of the ICA iterations is to maximize the relative entropy of the separated signals and thereby minimize their information redundancy. ICA does not require any information about the individual signal sources, and is therefore a form of blind source separation (BSS).
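For illustration only (not part of the patented method), blind source separation of an instantaneous mixture can be sketched in a few lines of Python with scikit-learn's FastICA; the synthetic stand-in sources, the mixing matrix A and all parameter values below are assumptions made for this sketch.

    import numpy as np
    from sklearn.decomposition import FastICA

    rng = np.random.default_rng(0)
    fs = 16000
    t = np.arange(fs) / fs                       # one second of signal

    # Two synthetic stand-in sources (illustrative waveforms, not real speech).
    s1 = np.sign(np.sin(2 * np.pi * 3 * t)) * 0.5
    s2 = rng.laplace(scale=0.3, size=fs)
    S = np.c_[s1, s2]                            # shape (n_samples, n_sources)

    A = np.array([[1.0, 0.6],                    # hypothetical mixing matrix
                  [0.4, 1.0]])
    X = S @ A.T                                  # simulated microphone mixtures

    ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
    Y = ica.fit_transform(X)                     # separated signals, up to scale and permutation
    W = ica.components_                          # estimated unmixing matrix

As the next paragraph notes, such separated signals Y are not perfect: each column still carries low-level residue of the other source.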
However, the ICA algorithm cannot completely separate the signals recorded by the microphones. In addition to the signal of the desired target speaker, each output path of the ICA algorithm also contains, at smaller amplitude, signal energy from the undesired speech of the other paths. Because the human auditory system is extremely sensitive, this undesired speech remains clearly audible even when it has been strongly suppressed. Post-processing such as noise reduction can further reduce the energy of the undesired speech in the ICA output, but it generally damages the desired signal.
Invention patent No. 200380109681.5 discloses a system and method for separating a mixture of audio signals into a desired audio signal and a noise signal. Microphones receive the mixed audio signals, and independent component analysis (ICA) with stability constraints processes the sound mixture. The ICA process recognizes and separates the target sound signal using predetermined characteristics of the desired sound signal. The filter coefficients are adapted with learning rules, and the filter weight update dynamics are stabilized to help convergence to a stable separated ICA result. Pre-processing and post-processing techniques and information are used to further reduce the effect of noise on the separated signals.
Invention patent No. 201110117022.2 discloses a frequency-domain blind separation ordering algorithm for convolutively mixed voice signals. The time-domain sequence is first transformed to the frequency domain, blind separation is performed on each frequency band using a frequency-domain ICA algorithm, and the bands are then ordered by a sorting algorithm. The sorting method is as follows: reference frequency bands are first selected and aligned, the remaining frequency bands are then ordered against the already ordered reference bands, bands whose ordering may be wrong are marked, and DOA estimation based on the separation matrix is used for complementary alignment.
Invention patent No. 201610866508.9 discloses a blind sound separation method and structure, a voice control system, and an electric appliance assembly. The method models the actual working environment as a linear instantaneous system and applies an ICA-based blind source separation technique to this linear instantaneous mixing system, thereby achieving noise reduction of the voice signal. The blind sound separation method comprises: performing noise-reduction pre-processing on the detected voice signals, where each voice signal is a linear superposition of the voice information of several signal sources; constructing an objective function on the pre-processed voice signals using a non-Gaussianity metric; and estimating, through an iterative algorithm, the separation matrix that maximizes the expected objective function.
Disclosure of Invention
In view of the characteristics of existing blind source separation methods, the present invention aims to provide a higher-performance voice separation method that reduces the residue of undesired signals.
In the method, the voice signals of N channels are first processed by a blind source separation module to obtain N separated voice signals, where N ≥ 2 and each separated signal contains residual voice signals from the other channels. The separated signals are then processed by a residual selection module to obtain N residual output signals. Each residual output signal is filtered by a residual suppression module to obtain an expected residual signal; the n-th separated voice signal output by the blind source separation module, which contains residual voice from the other channels, has the expected residual signal subtracted from it to form the n-th residual signal, which is the final n-th output voice signal, where n = 1, 2, …, N' and 1 ≤ N' ≤ N.
The voice signals of the N channels are input to the blind source separation module, which outputs N voice output signals, N being a natural number greater than 1. The blind source separation module uses a blind source separation algorithm, such as ICA, to separate the N input voice signals and outputs N voice signals that still contain residue from the other channels.
The residual selection module comprises N' selection processing channels. The n-th selection processing channel takes all N separated voice signals output by the blind source separation module as input and treats the n-th separated voice signal as its main signal. At each moment, the n-th selection processing channel computes the probability that the main signal is the target voice signal: when the probability is greater than or equal to a set threshold, it outputs a voice signal with amplitude 0 at that moment; when the probability is smaller than the set threshold, it outputs the main signal at the current moment as the n-th residual output signal of the residual selection module. The N' selection processing channels thus output N' residual output signals. The probability that the main signal is target voice is computed by phase comparison or signal amplitude comparison.
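For illustration only, one selection processing channel can be sketched as follows; the probability estimator prob_fn and the 0.5 threshold are placeholders introduced for this sketch, since the patent only states that phase or amplitude comparison may be used and that the threshold is set.

    import numpy as np

    def residual_select(main, others, prob_fn, threshold=0.5):
        """One selection processing channel (sketch).
        main: samples of the n-th separated signal; others: list of the remaining
        separated signals; prob_fn returns a target-speech probability in [0, 1]."""
        out = np.zeros_like(main)
        for i in range(len(main)):
            p = prob_fn(main[i], [o[i] for o in others])
            if p < threshold:
                out[i] = main[i]   # likely residue: pass it on for suppression
            # else: leave 0 (likely target speech, nothing to feed the suppressor)
        return out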
The residual suppression module comprises N' suppression processing channels. The n-th suppression processing channel takes the n-th voice output signal of the blind source separation module and the n-th residual output signal of the residual selection module as inputs, performs residual suppression, and outputs the result as the final n-th output voice signal.
The n-th suppression processing channel in the residual suppression module contains an adaptive filter, which filters the corresponding input residual output signal to obtain an expected residual signal. The expected residual signal is subtracted from the n-th voice output signal of the blind source separation module to form a residual signal, and the adaptive filter uses this residual signal with an iterative algorithm to update its filter coefficients. When the residual output signal received by the residual suppression module is 0, the adaptive filter does not update its coefficients. The n-th residual signal is the final n-th output voice signal.
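One suppression processing channel can be read as a standard adaptive interference canceller. A minimal sketch using an LMS update (the iterative algorithm named in claim 7) is shown below; the filter length K, step size mu and sample-by-sample processing are illustrative assumptions rather than the patent's specific implementation.

    import numpy as np

    def residual_suppress(bss_out, res_out, K=64, mu=0.01):
        """One suppression processing channel: LMS interference canceller (sketch)."""
        w = np.zeros(K)                      # adaptive filter coefficients
        buf = np.zeros(K)                    # last K samples of the residual output signal
        y = np.zeros(len(bss_out))
        for i in range(len(bss_out)):
            buf = np.roll(buf, 1)
            buf[0] = res_out[i]
            expected_residue = w @ buf       # expected residual signal
            e = bss_out[i] - expected_residue
            y[i] = e                         # error signal = final output of this path
            if res_out[i] != 0.0:            # freeze adaptation when the selection output is 0
                w += mu * e * buf            # LMS coefficient update
        return y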
Existing post-processing algorithms focus on reducing the residual noise in the output of a blind source separation algorithm and do not process the residual human voice from the other channels that remains after blind source separation. The present method uses an adaptive filter to reduce the residual voice signals of the other channels, lowering the audible energy of the residual voice without damaging the desired signal and thereby improving voice separation quality.
Detailed Description
The following describes a voice separation method according to the present invention using three voice signals as an example.
The voice signals of the three channels are input to the blind source separation module, which separates them using the ICA algorithm and outputs three separated voice signals B1, B2 and B3. Each separated signal contains residual voice from the other channels: the first signal B1 contains residue of B2 and B3, the second signal B2 contains residue of B1 and B3, and the third signal B3 contains residue of B1 and B2.
The residual selection module comprises three selection processing channels. Each selection processing channel takes the separated voice signals B1, B2 and B3 as input, with B1, B2 and B3 serving as the main signal of the first, second and third channel respectively. Each channel computes, at every moment, the probability that its main signal is the target voice signal; here the probability is evaluated through amplitude differences. For the first selection processing channel, a Fourier transform is applied to B1, B2 and B3, with B1 as the main signal. At each time-frequency bin, the amplitude difference between B1 and B2 and the amplitude difference between B1 and B3 are compared with an amplitude-difference threshold D1 (set to 0): if both differences are greater than or equal to D1, the output at that bin has amplitude 0; otherwise, the main signal at that bin is output.
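A sketch of this amplitude comparison for the first channel in the time-frequency domain follows; the use of SciPy's stft/istft, the frame length and the sampling rate are illustrative assumptions, while the threshold D1 = 0 follows the example above.

    import numpy as np
    from scipy.signal import stft, istft

    def select_channel1(b1, b2, b3, fs=16000, nperseg=512, D1=0.0):
        """Amplitude-difference selection for the first path, per time-frequency bin (sketch)."""
        _, _, B1 = stft(b1, fs=fs, nperseg=nperseg)
        _, _, B2 = stft(b2, fs=fs, nperseg=nperseg)
        _, _, B3 = stft(b3, fs=fs, nperseg=nperseg)
        d12 = np.abs(B1) - np.abs(B2)
        d13 = np.abs(B1) - np.abs(B3)
        target = (d12 >= D1) & (d13 >= D1)   # B1 dominates both others: treat as target speech
        R1 = np.where(target, 0.0, B1)       # zero target bins, keep residue-dominated bins
        _, r1 = istft(R1, fs=fs, nperseg=nperseg)
        return r1[:len(b1)]                  # first-path residual output signal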
The residual suppression module comprises three suppression processing channels. Taking the first channel as an example, the first suppression processing channel takes the first voice output signal B1 of the blind source separation module and the first residual output signal of the residual selection module as inputs, performs residual suppression, and outputs the result as the final first output voice signal.
The first suppression processing channel in the residual suppression module contains an adaptive filter of length K (K a natural number), which filters the corresponding input residual output signal to obtain an expected residual signal. The expected residual signal is subtracted from the first voice output signal B1 of the blind source separation module to form the residual signal S1, and the adaptive filter uses S1 with an iterative algorithm (for example, the LMS algorithm) to update its filter coefficients. When the residual output signal received by the residual suppression module is 0, the adaptive filter does not update its coefficients. The first residual signal S1 is the final first output voice signal.
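Combining the two sketches above for the first path of the three-channel example might look like the following; b1, b2 and b3 are hypothetical variable names for the time-domain ICA outputs B1, B2 and B3, and in practice the exact-zero test inside residual_suppress would likely be replaced by a small-magnitude tolerance, since the STFT/ISTFT round trip rarely yields exactly zero samples.

    # b1, b2, b3: time-domain separated signals B1, B2, B3 from the ICA front end.
    res1 = select_channel1(b1, b2, b3)           # first-path residual output signal
    out1 = residual_suppress(b1, res1, K=64)     # final first-path output voice signal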
Alternatively, if it is desired to separate only one or two speech signals, the residual selection module may include only one or two selection processing channels, and the residual suppression module may include only one or two suppression processing channels.
It should be understood that the foregoing embodiments are merely illustrative of the present invention and are not intended to limit it; any modification that does not depart from the spirit and scope of the present invention falls within the protection scope of the present invention.

Claims (7)

1. A method of speech separation, characterized by:
firstly, processing the voice signals of N channels through a blind source separation module to obtain N separated voice signals, wherein each separated voice signal contains residual voice signals from the other channels, and N ≥ 2; then processing the separated voice signals through a residual selection module to obtain N residual output signals; filtering the residual output signals through a residual suppression module to obtain expected residual signals, and subtracting the corresponding expected residual signal from each separated voice signal output by the blind source separation module to obtain a residual signal, namely the final output voice signal of that path; the adaptive filter of the residual suppression module updates the filter coefficients using the residual signal and an iterative algorithm;
the residual selection module comprises N' selection processing channels, the n-th selection processing channel takes all N separated voice signals output by the blind source separation module as input and takes the n-th separated voice signal as the main signal, and the n-th selection processing channel calculates, at each moment, the probability that the main signal is the target voice signal, where n = 1, 2, …, N' and 1 ≤ N' ≤ N: when the probability is greater than or equal to a set threshold, a voice signal with amplitude 0 is output at that moment; when the probability is smaller than the set threshold, the main signal at the current moment is output as the n-th residual output signal of the residual selection module;
the residual suppression module comprises N' suppression processing channels, and the n-th suppression processing channel takes the n-th voice output signal output by the blind source separation module and the n-th residual output signal output by the residual selection module as inputs, performs residual suppression, and outputs the result as the final n-th output voice signal.
2. A method of speech separation according to claim 1, wherein: the voice signals of the N channels are input to the blind source separation module, the blind source separation algorithm adopted by the blind source separation module separates the N input voice signals, and N voice signals containing the residual signals of the other channels are output.
3. A method of speech separation according to claim 2, wherein: the blind source separation algorithm adopts an ICA algorithm.
4. A method of speech separation according to claim 1, wherein: the probability that the main signal is the target voice signal at each moment is calculated by adopting a phase comparison method or a signal amplitude comparison method.
5. A method of speech separation according to claim 1, wherein: each suppression processing channel in the residual suppression module comprises an adaptive filter, and the adaptive filter filters the corresponding input residual output signal to obtain an expected residual signal; the expected residual signal is subtracted from the n-th voice output signal of the blind source separation module to form a residual signal, and the n-th residual signal is the final n-th output voice signal.
6. A method of speech separation according to claim 5, wherein: the adaptive filter updates the filter coefficients using the residual signal and an iterative algorithm; when the residual output signal received by the residual suppression module is 0, the adaptive filter does not update the filter coefficients.
7. The method of claim 6, wherein: the iterative algorithm adopts an LMS algorithm.
CN202110968974.9A 2021-08-23 2021-08-23 Voice separation method Active CN113470689B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110968974.9A CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110968974.9A CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Publications (2)

Publication Number Publication Date
CN113470689A CN113470689A (en) 2021-10-01
CN113470689B true CN113470689B (en) 2024-01-30

Family

ID=77867013

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110968974.9A Active CN113470689B (en) 2021-08-23 2021-08-23 Voice separation method

Country Status (1)

Country Link
CN (1) CN113470689B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1830026B * 2001-01-30 2011-06-15 Thomson Licensing Geometric source separation signal processing technique
KR20050115857A * 2002-12-11 2005-12-08 Softmax Inc. System and method for speech processing using independent component analysis under stability constraints
CN106356075B * 2016-09-29 2019-09-17 Hefei Midea Intelligent Technology Co., Ltd. Blind sound separation method, structure and speech control system and electric appliance assembly

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108962276A * 2018-07-24 2018-12-07 Beijing Santing Technology Co., Ltd. Speech separation method and device
EP3671739A1 (en) * 2018-12-21 2020-06-24 FRAUNHOFER-GESELLSCHAFT zur Förderung der angewandten Forschung e.V. Apparatus and method for source separation using an estimation and control of sound quality
CN112530452A * 2020-11-23 2021-03-19 Beijing Moran Cognitive Technology Co., Ltd. Post-filtering compensation method, device and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on blind source separation algorithms for multi-speaker overlapped speech signals; Li Kefeng et al.; Information & Communications, No. 1, pp. 29-32 *

Also Published As

Publication number Publication date
CN113470689A (en) 2021-10-01

Similar Documents

Publication Publication Date Title
Zhang et al. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
US10403299B2 (en) Multi-channel speech signal enhancement for robust voice trigger detection and automatic speech recognition
JP4166706B2 (en) Adaptive beamforming method and apparatus using feedback structure
EP2183853B1 (en) Robust two microphone noise suppression system
JP5060631B1 (en) Signal processing apparatus and signal processing method
JP2009540378A (en) Signal separator, method for determining an output signal based on a microphone signal, and computer program
Caroselli et al. Adaptive Multichannel Dereverberation for Automatic Speech Recognition.
Van den Bogaert et al. Binaural cue preservation for hearing aids using an interaural transfer function multichannel Wiener filter
WO2020170907A1 (en) Signal processing device, learning device, signal processing method, learning method, and program
JPWO2014024248A1 (en) Beam forming equipment
US11647344B2 (en) Hearing device with end-to-end neural network
Marin-Hurtado et al. Perceptually inspired noise-reduction method for binaural hearing aids
CN113470689B (en) Voice separation method
CN114339539A (en) Multi-channel speech enhancement method adopting microphone array for pickup
AU778351B2 (en) Circuit and method for the adaptive suppression of noise
Hidri et al. About multichannel speech signal extraction and separation techniques
Aroudi et al. Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
Uhle et al. Determined source separation for microphone recordings using IIR filters
Chen et al. A cascaded speech enhancement for hearing aids in noisy-reverberant conditions
Ali et al. A noise reduction strategy for hearing devices using an external microphone
Park et al. Postprocessing with Wiener filtering technique for reducing residual crosstalk in blind source separation
US11838727B1 (en) Hearing aids with parallel neural networks
Zorilă et al. On reducing the effect of speaker overlap for CHiME-5
Douglas et al. Blind separation of acoustical mixtures without time-domain deconvolution or decorrelation
CN112770222A (en) Audio processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310012 5-6 / F, block a, East Software Park Innovation Building, 90 Wensan Road, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Guoxin Microelectronics Co.,Ltd.

Country or region after: China

Address before: 310012 5-6 / F, block a, East Software Park Innovation Building, 90 Wensan Road, Hangzhou City, Zhejiang Province

Patentee before: HANGZHOU NATIONALCHIP SCIENCE & TECHNOLOGY Co.,Ltd.

Country or region before: China