CN111599372A - Stable on-line multi-channel voice dereverberation method and system

Publication number: CN111599372A
Authority: CN (China)
Prior art keywords: signal, frequency domain, voice, covariance matrix, module
Legal status: Granted, active (an assumption listed by Google Patents, not a legal conclusion)
Application number: CN202010256507.9A
Other languages: Chinese (zh)
Other versions: CN111599372B (en)
Inventor: 李妍文
Current Assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Application filed by: Unisound Intelligent Technology Co Ltd; Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority to: CN202010256507.9A
Publications: CN111599372A (application), CN111599372B (grant)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science
  • Computational Linguistics
  • Quality & Reliability
  • Signal Processing
  • Health & Medical Sciences
  • Audiology, Speech & Language Pathology
  • Human Computer Interaction
  • Physics & Mathematics
  • Acoustics & Sound
  • Multimedia
  • Circuit For Audible Band Transducer

Abstract

The invention provides a stable on-line multi-channel voice dereverberation method and system. The method comprises the following steps: performing first preprocessing on an input voice signal and converting it from the time domain to the frequency domain; calculating a covariance matrix of the input voice signal; calculating a regularization vector corresponding to each frame of the signal; estimating, independently for each frequency band, a filter coefficient corresponding to the frequency domain signal by a recursive least squares method; calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector; updating the filter coefficient based on the covariance matrix and the corrected auxiliary covariance matrix to obtain a new filter coefficient; and filtering, by a filtering module, the frequency domain signal according to the new filter coefficient to obtain a dereverberated frequency domain signal, converting that signal from the frequency domain to the time domain, and transmitting it to a voice recognition system. Regularizing the covariance matrix keeps the range of its eigenvalues under control, prevents the matrix from becoming ill-conditioned, and enhances the stability of the algorithm.

Description

Stable on-line multi-channel voice dereverberation method and system
Technical Field
The invention relates to the technical field of voice processing, in particular to a stable on-line multi-channel voice dereverberation method and system.
Background
In the prior art, the signal received by an indoor microphone array is affected by reverberation, which degrades voice recognition performance. At present, on-line dereverberation of voice is usually realized by recursive least squares filtering, which improves recognition accuracy to a great extent. However, this method has poor stability and is prone to divergence. In practice, because voice varies rapidly and is highly diverse, the processed voice may be wrong, which in turn affects the voice recognition result.
Disclosure of Invention
The invention provides a stable on-line multi-channel voice dereverberation method and system to solve the above technical problem.
A stable online multi-channel speech dereverberation method, comprising:
step 1: performing first preprocessing on an input voice signal, and converting the voice signal subjected to the first preprocessing from the time domain to the frequency domain to obtain a frequency domain signal; meanwhile, calculating a covariance matrix of the input voice signal;
the first preprocessing comprises framing;
step 2: calculating a regularization vector corresponding to each frame signal in the frequency domain signal;
step 3: estimating a filter coefficient corresponding to the frequency domain signal by a recursive least squares method, with each frequency band processed independently;
step 4: calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector calculated in step 2;
step 5: updating the filter coefficient based on the covariance matrix and the corrected auxiliary covariance matrix to obtain a new filter coefficient, and outputting the new filter coefficient to a filtering module;
step 6: filtering, by the filtering module, the frequency domain signal according to the new filter coefficient to obtain a dereverberated frequency domain signal, converting the dereverberated signal from the frequency domain to the time domain, and transmitting it to a voice recognition system.
Preferably, before step 1, the method further comprises: acquiring a voice signal with a microphone array, and converting the voice signal into a digital signal;
step 1 converts the voice signal after the first preprocessing from the time domain to the frequency domain through a short-time Fourier transform;
step 2 calculates the regularization vector corresponding to each frame of the signal according to the number of microphones and the length of the filter;
step 6 converts the dereverberated signal from the frequency domain to the time domain through an inverse short-time Fourier transform.
Preferably, the microphone array is a linear array, a circular array, or a spherical array.
Preferably, in the framing processing in step 1, the frame length is 512 sampling points, and the frame shift is half of the frame length.
Preferably, the step 4 calculates an auxiliary covariance matrix of the covariance between the channels by using an auxiliary orthogonal transformation.
Preferably, the first preprocessing comprises sequentially performing: pre-emphasis processing, framing processing, windowing processing, and end point detection, wherein the end point detection is used for determining an effective signal of the digital signal, and extracting the effective signal part to serve as the signal output by the first preprocessing.
Preferably, after the voice signal is acquired with the microphone array, second preprocessing is performed first and the voice signal is then converted into a digital signal, the second preprocessing comprising denoising;
the denoising processing comprises the following steps:
calculating the similarity of adjacent voice signals in the voice signals, and judging whether noise exists according to the similarity;
when noise exists, acquiring characteristic parameters of the noise contained in the voice signal;
denoising the voice signal according to the characteristic parameters;
and storing the denoised voice signal.
Preferably, the second preprocessing further includes a speech enhancement process, and the speech enhancement process includes:
determining the position and direction of a voice source according to the position of the microphone and the strength of the voice signal;
enhancing speech in the direction of the speech source while attenuating speech in the direction of the non-speech source.
A system for use in a dereverberation method as claimed in any preceding claim, the system comprising:
the first preprocessing module is used for performing the first preprocessing;
a first transformation module, configured to transform the first preprocessed voice signal from a time domain to a frequency domain;
a first calculation module for performing the calculation of a covariance matrix of the input speech signal;
a second calculation module, configured to perform the step 2;
a recursion module for performing said step 3;
a third calculation module for performing the step 4;
a filter coefficient update module for performing the step 5;
a filtering module, configured to perform filtering processing on the frequency domain signal in step 6;
a second transform module for converting the dereverberated signal from the frequency domain to the time domain.
Preferably, the system further comprises:
a microphone array, configured to acquire a voice signal;
a second preprocessing module, whose input end is connected with the output end of the microphone array;
and an audio codec chip, whose input end is connected with the output end of the second preprocessing module and whose output end is connected with the first preprocessing module.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flow chart of a stable on-line multi-channel speech dereverberation method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a stable on-line multi-channel speech dereverberation system according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
In addition, terms such as "first" and "second" in the present invention are used for description only. They do not refer to a particular order or sequence and do not limit the invention; they merely distinguish components or operations described with the same technical terms, and are not to be understood as indicating or implying relative importance or the number of the indicated technical features. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions and technical features of the various embodiments can be combined with each other, provided that a person skilled in the art can realize the combination; when a combination is contradictory or cannot be realized, it should be considered not to exist and not to fall within the protection scope of the present invention.
An embodiment of the present invention provides a stable online multi-channel speech dereverberation method, as shown in fig. 1, including:
step 1: performing first preprocessing on an input voice signal, and converting the voice signal subjected to the first preprocessing from the time domain to the frequency domain to obtain a frequency domain signal; meanwhile, calculating a covariance matrix of the input voice signal;
the first preprocessing comprises framing;
step 2: calculating a regularization vector corresponding to each frame signal in the frequency domain signal;
step 3: estimating a filter coefficient corresponding to the frequency domain signal by a recursive least squares method, with each frequency band processed independently;
step 4: calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector calculated in step 2, wherein a regularization control factor is introduced to change the amount of regularization applied to the matrix;
step 5: updating the filter coefficient based on the covariance matrix and the corrected auxiliary covariance matrix to obtain a new filter coefficient, and outputting the new filter coefficient to a filtering module;
step 6: filtering, by the filtering module, the frequency domain signal according to the new filter coefficient to obtain a dereverberated frequency domain signal, converting the dereverberated signal from the frequency domain to the time domain, and transmitting it to a voice recognition system.
Preferably, the step 4 calculates an auxiliary covariance matrix of the covariance between the channels by using an auxiliary orthogonal transformation.
The working principle of the above technical scheme is as follows: at present, on-line dereverberation of voice is usually realized by recursive least squares filtering, and solving the covariance matrix is a key step of that filtering process. The technical scheme uses the regularization vector corresponding to each frame of the signal to correct the auxiliary covariance matrix of the covariance among channels, uses the corrected auxiliary covariance matrix together with the signal to calculate the covariance matrix, and updates the filter coefficient.
The beneficial effects of the above technical scheme are: regularizing the covariance matrix keeps the range of the matrix's eigenvalues under control and prevents the matrix from becoming ill-conditioned, so the algorithm is more stable and less prone to divergence. At the same time, the dereverberation performance of the algorithm is not affected, correctly processed voice is obtained, and the accuracy of voice recognition is ensured.
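The source does not give the update formulas. As a minimal sketch, assuming the regularization is applied as diagonal loading of a running covariance matrix, one recursive least squares update for a single frequency band could look as follows. The function name `regularized_rls_step`, the forgetting factor `forget`, the constant regularization value, and the array sizes are all hypothetical, and the auxiliary covariance matrix and the regularization control factor of the invention are not modeled.

```python
import numpy as np

def regularized_rls_step(R, p, x, d, reg, forget=0.99):
    """One recursive least squares update with diagonal loading (illustrative).

    R:   (K, K) running covariance of the regressor x
    p:   (K,)   running cross-covariance between x and the desired sample d
    reg: (K,)   non-negative regularization vector added to the diagonal
    Returns the updated R and p, and the filter w solving (R + diag(reg)) w = p,
    so that the estimate of d is w^H x.
    """
    R = forget * R + np.outer(x, x.conj())
    p = forget * p + x * np.conj(d)
    w = np.linalg.solve(R + np.diag(reg), p)
    return R, p, w

# Demo with assumed sizes: 2 microphones x 2 filter taps per frequency band.
rng = np.random.default_rng(0)
num_mics, filter_len = 2, 2
K = num_mics * filter_len
reg = np.full(K, 1e-6)  # assumed constant regularization vector of length K
w_true = rng.standard_normal(K) + 1j * rng.standard_normal(K)

R = np.zeros((K, K), dtype=complex)
p = np.zeros(K, dtype=complex)
for _ in range(200):
    x = rng.standard_normal(K) + 1j * rng.standard_normal(K)
    d = np.vdot(w_true, x)  # desired sample d = w_true^H x
    R, p, w = regularized_rls_step(R, p, x, d, reg)
```

In this sketch every eigenvalue of `R + diag(reg)` is at least the smallest entry of `reg`, so the system stays solvable even on the first frame, when `R` alone has rank one. With a constant entry delta, the condition number cannot exceed (lambda_max + delta) / delta, where lambda_max is the largest eigenvalue of `R`.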
In one embodiment, step 1 is preceded by: acquiring a voice signal by adopting a microphone array, and converting the voice signal into a digital signal;
step 1 converts the voice signal after the first preprocessing from the time domain to the frequency domain through a short-time Fourier transform;
step 2 calculates the regularization vector corresponding to each frame of the signal according to the number of microphones and the length of the filter;
step 6 converts the dereverberated signal from the frequency domain to the time domain through an inverse short-time Fourier transform.
The microphone array is a linear array, a circular array, or a spherical array; preferably, the spacing between microphone array elements is 3.5 cm.
The beneficial effects of the above technical scheme are: the microphone array makes it convenient to collect voice signals from different spatial directions; unlike the ordinary Fourier transform, the short-time Fourier transform allows the instantaneous frequency content of the signal to be observed.
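The forward and inverse short-time Fourier transforms of steps 1 and 6 can be illustrated with a minimal single-channel sketch. It assumes a periodic square-root Hann window and overlap-add synthesis, neither of which is specified in the source; the helper names `sqrt_hann`, `stft` and `istft` are hypothetical.

```python
import numpy as np

def sqrt_hann(frame_len):
    """Periodic square-root Hann window (an assumed window choice)."""
    n = np.arange(frame_len)
    return np.sqrt(0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len))

def stft(x, frame_len=512, hop=256):
    """Return a complex array of shape (num_frames, frame_len // 2 + 1)."""
    window = sqrt_hann(frame_len)
    count = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(count)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, frame_len=512, hop=256):
    """Inverse-transform every frame, window it again, and overlap-add."""
    window = sqrt_hann(frame_len)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
spec = stft(x)   # one row per frame, one column per frequency band
y = istft(spec)  # time-domain signal after the round trip
```

With this window and a frame shift of half the frame length, the analysis and synthesis windows multiply to a Hann window whose overlapped copies sum to one, so all samples covered by two frames are reconstructed exactly; only the first and last half frame remain tapered.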
In the framing processing of step 1, the frame length is 512 sampling points, and the frame shift is half of the frame length.
The beneficial effects of the above technical scheme are: selecting the appropriate frame length and frame shift facilitates accurate signal processing.
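A minimal sketch of this framing, with a frame length of 512 samples and a frame shift of 256 samples, is given below. The function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions. The sampling rate is not stated in the source; at an assumed 16 kHz, these values would correspond to 32 ms frames with a 16 ms shift.

```python
def split_into_frames(samples, frame_len=512, hop=256):
    """Split a sequence into overlapping frames; a trailing partial frame is dropped."""
    if len(samples) < frame_len:
        return []
    count = 1 + (len(samples) - frame_len) // hop
    return [samples[i * hop:i * hop + frame_len] for i in range(count)]

signal = list(range(2048))          # stand-in for 2048 audio samples
frames = split_into_frames(signal)  # consecutive frames overlap by 256 samples
```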
In one embodiment, the first preprocessing comprises sequentially performing: pre-emphasis processing, framing processing, windowing processing, and end point detection, wherein the end point detection is used for determining an effective signal of the digital signal, and extracting the effective signal part to serve as the signal output by the first preprocessing.
The voice signal end point detection technique accurately determines the starting point and the end point of voice in a segment of signal containing voice, and distinguishes the voice signal (i.e. the effective signal) from non-voice signals (including silence segments and noise segments).
Effective end point detection not only reduces the amount of data collected in the voice recognition system and saves processing time, but also eliminates the interference of silence segments or noise segments and improves the performance of the voice recognition system.
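The source does not say how the end points are found. As one possible illustration, assuming a simple short-time energy threshold, the effective signal part could be located as in the sketch below; the function name `detect_endpoints` and the threshold value are hypothetical.

```python
def detect_endpoints(frames, threshold):
    """Return (first, last) indices of frames whose mean energy exceeds
    the threshold, or None when no frame does."""
    energies = [sum(s * s for s in frame) / len(frame) for frame in frames]
    active = [i for i, energy in enumerate(energies) if energy > threshold]
    if not active:
        return None
    return active[0], active[-1]

silence = [0.0, 0.0, 0.0, 0.0]
speech = [0.5, -0.5, 0.5, -0.5]
frames = [silence, silence, speech, speech, speech, silence]

start, end = detect_endpoints(frames, threshold=0.01)
effective = frames[start:end + 1]  # the part passed on to later steps
```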
The beneficial effects of the above technical scheme are: the pre-emphasis can be performed by a first-order high-pass digital filter; because the voice signal is short-time stationary, framing and windowing divide it into a number of short segments, which makes the signal easier to process; end point detection determines the effective signal of the digital signal, and the effective signal part is extracted as the signal output by the first preprocessing; the technical scheme ensures the reliability of signal processing and facilitates the subsequent steps.
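The pre-emphasis and windowing steps can be sketched as follows. The filter coefficient 0.97 and the Hamming window are common choices assumed here for illustration; the source only states that a first-order high-pass digital filter can be used and does not name a window. The function names are hypothetical.

```python
import math

def pre_emphasize(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n - 1]."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out

def hamming_window(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2 * pi * n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def apply_window(frame, window):
    """Multiply a frame by a window of the same length, sample by sample."""
    return [s * w for s, w in zip(frame, window)]

emphasized = pre_emphasize([1.0, 1.0, 1.0])  # a constant input is attenuated
window = hamming_window(5)
windowed = apply_window([1.0] * 5, window)
```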
In one embodiment, after the voice signal is acquired with the microphone array, second preprocessing is performed first and the voice signal is then converted into a digital signal, the second preprocessing comprising denoising;
the denoising processing comprises the following steps:
calculating the similarity of adjacent voice signals in the voice signals, and judging whether noise exists according to the similarity;
when noise exists, acquiring characteristic parameters of the noise contained in the voice signal;
denoising the voice signal according to the characteristic parameters;
and storing the denoised voice signal.
The working principle of the above technical scheme is as follows: the denoising processing first calculates the similarity of adjacent voice signals in the voice signal and judges from the similarity whether noise exists; when noise exists, it acquires characteristic parameters of the noise contained in the voice signal; it then denoises the voice signal according to the characteristic parameters; finally, it stores the denoised voice signal.
The beneficial effects of the above technical scheme are: the technical scheme ensures the effect of noise processing, which helps ensure the accuracy of the signal processing of the invention.
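The source does not define the similarity measure or the decision rule. The sketch below illustrates only the first denoising step, under the assumption that similarity is the cosine similarity between adjacent segments and that a similarity below a threshold indicates noise; the function names and the threshold 0.5 are hypothetical, and the extraction of noise parameters and the denoising itself are not modeled.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length segments (0.0 if either is silent)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def noisy_pairs(segments, threshold=0.5):
    """Indices i where segments[i] and segments[i + 1] are less similar
    than the threshold (assumed here to indicate noise)."""
    return [i for i in range(len(segments) - 1)
            if cosine_similarity(segments[i], segments[i + 1]) < threshold]

segments = [[1.0, 2.0, 3.0], [1.1, 2.0, 2.9], [3.0, -2.0, 0.5]]
flagged = noisy_pairs(segments)  # only the second pair differs strongly
```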
In one embodiment, the second pre-processing further comprises speech enhancement processing comprising:
determining the position and direction of a voice source according to the position of the microphone and the strength of the voice signal;
enhancing speech in the direction of the speech source while attenuating speech in the direction of the non-speech source.
The working principle of the above technical scheme is as follows: the position and direction of the voice source are determined according to the positions of the microphones and the strength of the voice signal; according to the determined position and direction, voice in the direction of the voice source is enhanced while voice in other directions is attenuated.
The beneficial effects of the above technical scheme are: voice in the direction of the voice source is enhanced, which helps ensure the effect of voice signal processing.
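The source does not describe how direction is derived from signal strength. As a rough illustration only, the sketch below assumes that the direction is the angle of the energy-weighted centroid of the microphone positions, and that enhancement is a mix of the channels weighted by their energy. Both are simplifying assumptions, and all names are hypothetical; practical microphone arrays commonly use delay-based beamforming instead.

```python
import math

def channel_energy(samples):
    """Mean squared amplitude of one microphone channel."""
    return sum(s * s for s in samples) / len(samples)

def estimate_source_direction(mic_positions, channels):
    """Angle in radians of the energy-weighted centroid of the microphones."""
    energies = [channel_energy(c) for c in channels]
    total = sum(energies)
    if total == 0:
        raise ValueError("all channels are silent")
    cx = sum(e * x for e, (x, _) in zip(energies, mic_positions)) / total
    cy = sum(e * y for e, (_, y) in zip(energies, mic_positions)) / total
    return math.atan2(cy, cx)

def energy_weighted_mix(channels):
    """Combine channels with weights proportional to their energy, so that
    loud (source-facing) channels dominate and quiet ones are attenuated."""
    energies = [channel_energy(c) for c in channels]
    total = sum(energies)
    if total == 0:
        raise ValueError("all channels are silent")
    weights = [e / total for e in energies]
    return [sum(w * c[n] for w, c in zip(weights, channels))
            for n in range(len(channels[0]))]

# Four microphones on a unit circle; the one at angle 0 receives the loudest signal.
mic_positions = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
loud = [1.0, -1.0, 1.0, -1.0]
quiet = [0.1, -0.1, 0.1, -0.1]
channels = [loud, quiet, quiet, quiet]

angle = estimate_source_direction(mic_positions, channels)
enhanced = energy_weighted_mix(channels)
```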
A system for use in any of the above methods, as shown in fig. 2, comprising:
the first preprocessing module is used for performing the first preprocessing;
a first transformation module, configured to transform the first preprocessed voice signal from a time domain to a frequency domain;
a first calculation module for performing the calculation of a covariance matrix of the input speech signal;
a second calculation module, configured to perform the step 2;
a recursion module for performing said step 3;
a third calculation module for performing the step 4;
a filter coefficient update module for performing the step 5;
a filtering module, configured to perform the filtering processing on the frequency domain signal in step 6;
a second transform module for converting the dereverberated signal from the frequency domain to the time domain.
The working principle of the above technical scheme is as follows: the first preprocessing module performs the first preprocessing and transmits the preprocessed voice signal to the first transformation module; the first transformation module converts the voice signal from the time domain to the frequency domain and transmits the frequency domain signal to the first calculation module, the second calculation module and the recursion module; the first calculation module calculates the covariance matrix of the input voice signal and transmits it to the filter coefficient update module; the second calculation module executes step 2 and transmits the result to the third calculation module, and the third calculation module executes step 4 and transmits the result to the filter coefficient update module; the recursion module executes step 3 and transmits the result to the filter coefficient update module; the filter coefficient update module executes step 5 to obtain the new filter coefficient and transmits it to the filtering module; the filtering module filters the frequency domain signal according to the new filter coefficient to obtain the dereverberated frequency domain signal; and the second transform module converts the dereverberated signal from the frequency domain to the time domain and sends it to the voice recognition system.
The beneficial effects of the above technical scheme are: regularizing the covariance matrix keeps the range of the matrix's eigenvalues under control and prevents the matrix from becoming ill-conditioned, so the algorithm is more stable and less prone to divergence. At the same time, the dereverberation performance of the algorithm is not affected, correctly processed voice is obtained, and the accuracy of voice recognition is ensured.
In one embodiment, as shown in Fig. 2, the system further comprises:
a microphone array, configured to acquire a voice signal;
a second preprocessing module, whose input end is connected with the output end of the microphone array;
and an audio codec chip, whose input end is connected with the output end of the second preprocessing module and whose output end is connected with the first preprocessing module.
The working principle of the above technical scheme is as follows: the (analog) voice signal is acquired by the microphone array and transmitted to the second preprocessing module for the second preprocessing; the second preprocessing module transmits the preprocessed (analog) voice signal to the audio codec chip, which converts it into a digital signal and transmits the digital signal to the first preprocessing module.
The beneficial effects of the above technical scheme are: the microphone array and the audio codec chip acquire the voice signal and convert it into a digital signal, which facilitates subsequent processing, and the second preprocessing module performs the second preprocessing on the signal, which ensures the reliability of signal transmission.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for stable on-line multi-channel speech dereverberation, comprising:
step 1: performing first preprocessing on an input voice signal, and converting the voice signal subjected to the first preprocessing from the time domain to the frequency domain to obtain a frequency domain signal; meanwhile, calculating a covariance matrix of the input voice signal;
the first preprocessing comprises framing;
step 2: calculating a regularization vector corresponding to each frame signal in the frequency domain signal;
step 3: estimating a filter coefficient corresponding to the frequency domain signal by a recursive least squares method, with each frequency band processed independently;
step 4: calculating an auxiliary covariance matrix of the covariance among channels, and correcting the auxiliary covariance matrix based on the regularization vector calculated in step 2;
step 5: updating the filter coefficient based on the covariance matrix and the corrected auxiliary covariance matrix to obtain a new filter coefficient, and outputting the new filter coefficient to a filtering module;
step 6: filtering, by the filtering module, the frequency domain signal according to the new filter coefficient to obtain a dereverberated frequency domain signal, converting the dereverberated signal from the frequency domain to the time domain, and transmitting it to a voice recognition system.
2. The method of claim 1, further comprising before step 1: acquiring a voice signal by adopting a microphone array, and converting the voice signal into a digital signal;
step 1 converts the voice signal after the first preprocessing from the time domain to the frequency domain through a short-time Fourier transform;
step 2 calculates the regularization vector corresponding to each frame of the signal according to the number of microphones and the length of the filter;
step 6 converts the dereverberated signal from the frequency domain to the time domain through an inverse short-time Fourier transform.
3. The method of claim 2, wherein the microphone array is a linear array, a circular array or a spherical array.
4. The method of claim 1, wherein in the framing processing of step 1 the frame length is 512 sampling points, and the frame shift is half of the frame length.
5. The method of claim 1, wherein the step 4 uses an auxiliary orthogonal transformation to calculate an auxiliary covariance matrix of the covariance between the channels.
6. The method of claim 1, wherein the first preprocessing comprises sequentially performing: pre-emphasis processing, framing processing, windowing processing, and end point detection, wherein the end point detection is used for determining an effective signal of the digital signal, and extracting the effective signal part to serve as a signal output after the first pre-processing.
7. The method as claimed in claim 2, wherein after the voice signal is acquired with the microphone array, second preprocessing is performed first and the voice signal is then converted into a digital signal, the second preprocessing comprising denoising;
the denoising processing comprises the following steps:
calculating the similarity of adjacent voice signals in the voice signals, and judging whether noise exists according to the similarity;
when noise exists, acquiring characteristic parameters of the noise contained in the voice signal;
denoising the voice signal according to the characteristic parameters;
and storing the denoised voice signal.
8. The method of claim 7, wherein the second pre-processing further comprises a speech enhancement process, the speech enhancement process comprising:
determining the position and direction of a voice source according to the position of the microphone and the strength of the voice signal;
enhancing the voice signal arriving from the direction of the voice source while attenuating signals arriving from other directions.
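The two enhancement steps can be illustrated with a rough sketch. The claims do not say how the source is located or how the enhancement is carried out, so both functions below are assumptions: the source position is approximated by an energy-weighted centroid of the microphone positions, and the directional enhancement is a delay-and-sum beamformer with integer sample delays. The function names and the `delays` argument are hypothetical.

```python
def estimate_source_position(mic_positions, channels):
    """Energy-weighted centroid of the microphone positions.

    A crude, assumed estimator: microphones that receive a stronger
    signal pull the estimate toward themselves.
    """
    energies = [sum(s * s for s in ch) for ch in channels]
    total = sum(energies)
    if total == 0.0:
        raise ValueError("all channels are silent")
    dims = len(mic_positions[0])
    return tuple(sum(e * p[d] for e, p in zip(energies, mic_positions)) / total
                 for d in range(dims))


def delay_and_sum(channels, delays):
    """Advance each channel by its integer delay (in samples) and average.

    Signals arriving from the steered direction add coherently, while
    signals from other directions are misaligned and partially cancel.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
            for n in range(length)]
```

In such a scheme the per-channel delays would be derived from the estimated source direction, the array geometry, and the sampling rate; that mapping is omitted here because the claims do not describe it.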
9. A system for implementing the dereverberation method of any one of claims 1 to 8, wherein the system comprises:
a first preprocessing module, configured to perform the first preprocessing;
a first transformation module, configured to transform the first preprocessed voice signal from the time domain to the frequency domain;
a first calculation module, configured to calculate a covariance matrix of the input voice signal;
a second calculation module, configured to perform step 2;
a recursion module, configured to perform step 3;
a third calculation module, configured to perform step 4;
a filter coefficient update module, configured to perform step 5;
a filtering module, configured to perform the filtering of the frequency domain signal in step 6;
a second transformation module, configured to convert the dereverberated signal from the frequency domain to the time domain.
10. The system of claim 9, wherein the system further comprises:
a microphone array, configured to acquire a voice signal;
a second preprocessing module, the input of which is connected to the output of the microphone array;
and an audio codec chip, the input of which is connected to the output of the second preprocessing module, and the output of which is connected to the first preprocessing module.
CN202010256507.9A 2020-04-02 2020-04-02 Stable on-line multi-channel voice dereverberation method and system Active CN111599372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010256507.9A CN111599372B (en) 2020-04-02 2020-04-02 Stable on-line multi-channel voice dereverberation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010256507.9A CN111599372B (en) 2020-04-02 2020-04-02 Stable on-line multi-channel voice dereverberation method and system

Publications (2)

Publication Number Publication Date
CN111599372A true CN111599372A (en) 2020-08-28
CN111599372B CN111599372B (en) 2023-03-21

Family

ID=72185460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010256507.9A Active CN111599372B (en) 2020-04-02 2020-04-02 Stable on-line multi-channel voice dereverberation method and system

Country Status (1)

Country Link
CN (1) CN111599372B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112700787A (en) * 2021-03-24 2021-04-23 深圳市中科蓝讯科技股份有限公司 Noise reduction method, nonvolatile readable storage medium and electronic device
CN113299301A (en) * 2021-04-21 2021-08-24 北京搜狗科技发展有限公司 Voice processing method and device for voice processing
WO2023016018A1 (en) * 2021-08-12 2023-02-16 北京荣耀终端有限公司 Voice processing method and electronic device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108172231A (en) * 2017-12-07 2018-06-15 中国科学院声学研究所 A kind of dereverberation method and system based on Kalman filtering
US20180350379A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Multi-Channel Speech Signal Enhancement for Robust Voice Trigger Detection and Automatic Speech Recognition
CN110915233A (en) * 2017-04-20 2020-03-24 弗劳恩霍夫应用研究促进协会 Apparatus and method for multi-channel interference cancellation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110915233A (en) * 2017-04-20 2020-03-24 弗劳恩霍夫应用研究促进协会 Apparatus and method for multi-channel interference cancellation
US20180350379A1 (en) * 2017-06-02 2018-12-06 Apple Inc. Multi-Channel Speech Signal Enhancement for Robust Voice Trigger Detection and Automatic Speech Recognition
CN108172231A (en) * 2017-12-07 2018-06-15 中国科学院声学研究所 A kind of dereverberation method and system based on Kalman filtering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
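何冲; 王冬霞; 王旭东; 蒋茂松: "A multi-channel linear prediction speech dereverberation method based on orthogonal non-negative matrix factorization" *
王旭东; 王冬霞; 周城旭: "A distant speech recognition method based on improved BFDNN" *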

Also Published As

Publication number Publication date
CN111599372B (en) 2023-03-21

Similar Documents

Publication Publication Date Title
CN111599372B (en) Stable on-line multi-channel voice dereverberation method and system
CN109767783B (en) Voice enhancement method, device, equipment and storage medium
CN102356427B (en) Noise suppression device
KR100636317B1 (en) Distributed Speech Recognition System and method
CN110322891B (en) Voice signal processing method and device, terminal and storage medium
CN109378013B (en) Voice noise reduction method
CN108335694B (en) Far-field environment noise processing method, device, equipment and storage medium
EP2425426B1 (en) Low complexity auditory event boundary detection
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
CN108847253B (en) Vehicle model identification method, device, computer equipment and storage medium
JP4050350B2 (en) Speech recognition method and system
CN103544961B (en) Audio signal processing method and device
CN111429932A (en) Voice noise reduction method, device, equipment and medium
CN111091833A (en) Endpoint detection method for reducing noise influence
CN108053842B (en) Short wave voice endpoint detection method based on image recognition
CN111489763A (en) Adaptive method for speaker recognition in complex environment based on GMM model
US6678656B2 (en) Noise reduced speech recognition parameters
CN112599148A (en) Voice recognition method and device
CN112233696A (en) Oil field pumping unit abnormal sound detection and reporting system based on artificial intelligence and big data
CN112002307B (en) Voice recognition method and device
TWI396186B (en) Speech enhancement technique based on blind source separation for far-field noisy speech recognition
CN113035216B (en) Microphone array voice enhancement method and related equipment
CN113744725A (en) Training method of voice endpoint detection model and voice noise reduction method
CN112712818A (en) Voice enhancement method, device and equipment
CN112562701A (en) Heart sound signal double-channel self-adaptive noise reduction algorithm, device, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant