CN113470678B - Microphone array noise reduction method and device and electronic equipment - Google Patents

Microphone array noise reduction method and device and electronic equipment

Info

Publication number
CN113470678B
Authority
CN
China
Prior art keywords
signal frame
voice signal
microphone
frequency spectrum
transfer function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110776150.1A
Other languages
Chinese (zh)
Other versions
CN113470678A (en
Inventor
黄海力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
TAILING MICROELECTRONICS (SHANGHAI) CO Ltd
Original Assignee
TAILING MICROELECTRONICS (SHANGHAI) CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TAILING MICROELECTRONICS (SHANGHAI) CO Ltd
Priority to CN202110776150.1A
Publication of CN113470678A
Application granted
Publication of CN113470678B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The method iterates the ratio of each second transfer function to the first transfer function according to two statistics: a variance obtained from the energy values of the first original voice signal frames, and a variance obtained from the energy values given by multiplying the frequency spectrum of each second original voice signal frame by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment. The voice signal frames picked up by the main microphone are the first original voice signal frames, the voice signal frames picked up by an auxiliary microphone are the second original voice signal frames, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function. With this method the ratio of the transfer functions can be estimated accurately, which improves the noise reduction effect.

Description

Microphone array noise reduction method and device and electronic equipment
Technical Field
The application belongs to the technical field of signal processing, and particularly relates to a microphone array noise reduction method and device and electronic equipment.
Background
Generalized sidelobe canceling techniques have been widely used for noise reduction processing in microphone arrays. In these techniques, the transfer function of each microphone is determined in advance by a frequency sweep. In practice, however, the transfer function of each microphone is not a fixed value and needs to be updated in real time.
Disclosure of Invention
The invention aims at overcoming the defects of the prior art and provides a microphone array noise reduction method, a microphone array noise reduction device and electronic equipment.
In order to solve the technical problems, the application adopts the following technical scheme: a method of noise reduction for a microphone array comprising a primary microphone and at least one secondary microphone, the method comprising:
iterating the ratio of each second transfer function to the first transfer function according to a variance obtained from statistics on the energy values of the first original voice signal frames and a variance obtained from statistics on the energy values given by multiplying the frequency spectrum of each second original voice signal frame by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frames picked up by the main microphone are the first original voice signal frames, the voice signal frames picked up by the auxiliary microphone are the second original voice signal frames, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function;
constructing a frequency spectrum of an enhanced voice signal frame according to the frequency spectrum of an original voice signal frame acquired by each microphone in the microphone array and the ratio of each second transfer function to the first transfer function;
constructing the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame and the ratio of each second transfer function to the first transfer function;
adaptively filtering the frequency spectra of the noise signal frames corresponding to the auxiliary microphones and accumulating the results to obtain the frequency spectrum of the cancellation voice signal frame;
filtering the spectrum of the cancellation voice signal frame out of the spectrum of the enhanced voice signal frame to obtain the spectrum of the target voice signal frame;
wherein, the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
In order to solve the technical problems, the application adopts the following technical scheme: a microphone array noise reduction apparatus, the microphone array comprising a primary microphone and at least one secondary microphone, the noise reduction apparatus comprising:
the transfer function ratio updating module is used for iterating the ratio of each second transfer function to the first transfer function according to a variance obtained from statistics on the energy values of the first original voice signal frames and a variance obtained from statistics on the energy values given by multiplying the frequency spectrum of each second original voice signal frame by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frames picked up by the main microphone are the first original voice signal frames, the voice signal frames picked up by the auxiliary microphone are the second original voice signal frames, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function;
the enhanced voice signal construction module is used for constructing the frequency spectrum of the enhanced voice signal frame according to the frequency spectrum of the original voice signal frame acquired by each microphone in the microphone array and the ratio of each second transfer function to the first transfer function;
the noise signal construction module is used for constructing the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame and the ratio of each second transfer function to the first transfer function;
the cancellation voice signal construction module is used for adaptively filtering the frequency spectra of the noise signal frames corresponding to the auxiliary microphones and accumulating the results to obtain the frequency spectrum of the cancellation voice signal frame;
the output module is used for filtering the frequency spectrum of the cancellation voice signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame;
wherein, the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
In order to solve the technical problems, the application adopts the following technical scheme: a microphone array noise reduction apparatus, comprising: a memory and a processor, the memory storing instructions that are executed by the processor to perform the aforementioned method.
In order to solve the technical problems, the application adopts the following technical scheme: an electronic device comprises the microphone array noise reduction device.
Compared with the prior art, the beneficial effects of this application are: the ratio of the transfer function of each auxiliary microphone to the transfer function of the main microphone is updated in real time, so that the construction of the enhanced voice signal and the counteracted voice signal is more accurate, and the noise reduction effect of the microphone array is better.
Drawings
Fig. 1 is a flow chart of a microphone array noise reduction method according to an embodiment of the present application.
Fig. 2 is a signal flow schematic diagram of a microphone array noise reduction method according to an embodiment of the application.
Fig. 3 is a block diagram of a microphone noise reduction device according to an embodiment of the present application.
Fig. 4 is a block diagram of a microphone noise reduction device according to an embodiment of the present application.
Detailed Description
In this application, it should be understood that terms such as "comprises" or "comprising," etc., are intended to indicate the presence of the disclosed features, numbers, steps, acts, components, portions, or combinations thereof in this specification, but do not preclude the presence or addition of one or more other features, numbers, steps, acts, components, portions, or combinations thereof.
In addition, it should be noted that, without conflict, the embodiments and features of the embodiments in the present application may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The present application will be further described with reference to the embodiments shown in the drawings.
As shown in fig. 1, an embodiment of the present application provides a method for noise reduction of a microphone array, where the microphone array includes a primary microphone and at least one secondary microphone, and the method for noise reduction includes the following steps.
Step 101, iterating the ratio of each second transfer function to the first transfer function according to a variance obtained from statistics on the energy values of the first original voice signal frames and a variance obtained from statistics on the energy values given by multiplying the frequency spectrum of each second original voice signal frame by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frames picked up by the main microphone are the first original voice signal frames, the voice signal frames picked up by the auxiliary microphone are the second original voice signal frames, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function.
Specifically, the continuous original speech signal is divided into frames, and each first original speech signal frame has an energy value. Statistics are taken over the energy values of the first original speech signal frames picked up by the main microphone to obtain the variance of the energy value up to the current time window.
Specifically, the spectrum of the first original speech signal frame is conjugated, and the spectral value (a complex number) at each frequency point of the second original speech signal frame is multiplied by the conjugate of the spectral value at the corresponding frequency point of the first original speech signal frame; the product represents an energy value. The energy values of the frequency points are added to obtain the energy value corresponding to the time window. Statistics over the energy values of the successive time windows then give a second variance up to the current time window.
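The two instantaneous quantities described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `instantaneous_statistics` and the representation of a frame as a list of complex frequency bins are assumptions made here.

```python
def instantaneous_statistics(z1, zm):
    """Per-bin instantaneous values for one frame (hypothetical helper).

    z1: complex spectrum of the first (main microphone) frame.
    zm: complex spectrum of a second (auxiliary microphone) frame.
    Returns (energy, cross): |Z1|^2 and Zm * conj(Z1) for every bin.
    """
    if len(z1) != len(zm):
        raise ValueError("both frames must have the same number of bins")
    # Energy of the main-microphone frame at each frequency point.
    energy = [a.real ** 2 + a.imag ** 2 for a in z1]
    # Product of the auxiliary spectrum and the conjugated main spectrum.
    cross = [b * a.conjugate() for a, b in zip(z1, zm)]
    return energy, cross


energy, cross = instantaneous_statistics([1 + 1j, 2 + 0j], [1j, 1 - 1j])
frame_energy = sum(energy)        # energy value for the whole time window
frame_cross = sum(cross)          # summed cross product for the time window
```

Summing over the bins gives the per-window value mentioned in the text, while the per-bin lists match the per-frequency notation used in the later formulas.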
This step consists in iterating the ratio of the second transfer function to the first transfer function. Note that the first transfer function and the second transfer function are both expressed in the frequency domain herein.
Step 102, constructing a frequency spectrum of the enhanced voice signal frame according to the frequency spectrum of the original voice signal frame acquired by each microphone in the microphone array and the ratio of each second transfer function to the first transfer function.
The enhanced speech signal is an approximation to the target speech signal.
Step 103, constructing a frequency spectrum of a noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame and the ratio of each second transfer function to the first transfer function.
In this step, the noise signals picked up by the respective auxiliary microphones are extracted. The noise signal corresponding to a time window constitutes a noise signal frame.
Step 104, adaptively filtering the frequency spectra of the noise signal frames corresponding to the auxiliary microphones and accumulating the results to obtain the frequency spectrum of the cancellation voice signal frame.
The cancellation speech signal is the component that is ultimately filtered out. One time window corresponds to one cancellation speech signal frame.
Step 105, filtering the spectrum of the cancellation voice signal frame out of the spectrum of the enhanced voice signal frame to obtain the spectrum of the target voice signal frame; wherein the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
The ratio of the transfer function of each auxiliary microphone to the transfer function of the main microphone is updated in real time, so that the construction of the enhanced voice signal and the counteracted voice signal is more accurate, and the noise reduction effect of the microphone array is better.
Because the transfer function is estimated iteratively, the microphones in effect calibrate themselves, which reduces the deviation caused by factors such as production tolerances. The method is therefore better suited to the actual application environment and state of the earphone.
Further, the method does not depend on complicated measuring apparatus to measure the channel response in advance, so the noise reduction function is realized at lower cost.
Specifically, the ratio of the second transfer function to the first transfer function iterates according to the following formula:
$$H_m(t,e^{j\omega}) = H_m(t-1,e^{j\omega}) + \mu_H \, \frac{R_{1\_m\_v}(t,e^{j\omega})}{R_{1\_v}(t,e^{j\omega})}$$
where $H_m(t,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function for the auxiliary microphone numbered m in the current frame period, $H_m(t-1,e^{j\omega})$ is the same ratio in the frame period preceding the current frame period, $\mu_H$ is the learning step size, $R_{1\_m\_v}(t,e^{j\omega})$ is the variance, up to the current frame period, obtained from statistics on the energy values given by multiplying the spectrum of the second original signal frame of the auxiliary microphone numbered m by the conjugate of the spectrum of the first original signal frame, and $R_{1\_v}(t,e^{j\omega})$ is the variance, up to the current frame period, obtained from statistics on the energy values of the first original signal frame. In Fig. 2, the numbering of the secondary microphones starts from 2.
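The update rule can be sketched per frequency bin as below. The function name `update_ratio`, the default step size and the guard against a near-zero denominator are assumptions added for illustration; the source gives only the formula itself.

```python
def update_ratio(h_prev, var_cross, var_energy, mu_h=0.05, eps=1e-12):
    """One iteration of H_m(t) = H_m(t-1) + mu_H * R_1_m_v(t) / R_1_v(t).

    h_prev:     previous ratios H_m(t-1), one complex value per bin.
    var_cross:  R_1_m_v(t) per bin (may be complex).
    var_energy: R_1_v(t) per bin.
    mu_h, eps:  illustrative values, not taken from the source.
    """
    h_new = []
    for h, r_cross, r_energy in zip(h_prev, var_cross, var_energy):
        if abs(r_energy) <= eps:
            # Assumption: hold the previous estimate when the
            # denominator carries no usable information.
            h_new.append(h)
        else:
            h_new.append(h + mu_h * r_cross / r_energy)
    return h_new


h = update_ratio([1 + 0j], [2 + 2j], [4.0], mu_h=0.5)
```

A small step size trades tracking speed for a smoother estimate; the value used here is only for the example.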
In this embodiment, a least square method is used, and a transfer function can be constructed according to a minimum mean square error criterion, so that a portion of the secondary microphone signal having the same energy as the primary microphone is extracted.
Specifically, $R_{1\_v}(t,e^{j\omega})$ is determined as follows:
calculating the instantaneous energy value $R_{1\_t}(t,e^{j\omega})$ of the first original voice signal frame;
calculating the mean $R_1(t,e^{j\omega})$ of the energy values of the first original voice signal frames up to the current frame period, where $R_1(t,e^{j\omega}) = \rho_H R_1(t-1,e^{j\omega}) + (1-\rho_H)\, f(p)\, R_{1\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, $R_1(t-1,e^{j\omega})$ is the mean of the energy values of the first original voice signal frames up to the frame period preceding the current frame period, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous energy value of the first original voice signal frame and the mean, $R_{1\_v\_t}(t,e^{j\omega}) = (R_{1\_t}(t,e^{j\omega}) - R_1(t,e^{j\omega}))^2$;
where $R_{1\_v}(t,e^{j\omega}) = \rho_R R_{1\_v}(t-1,e^{j\omega}) + (1-\rho_H)\, R_{1\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
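The recursion above can be sketched for a single frequency bin as follows. The class name `RunningVariance` and the zero initial state are assumptions. The code transcribes the formulas literally, including the $(1-\rho_H)$ weight and the use of the previous frame's squared deviation in the variance update, without claiming that this is the only reasonable reading.

```python
class RunningVariance:
    """Recursive mean and variance tracker for one bin (illustrative)."""

    def __init__(self, rho_h=0.9, rho_r=0.9):
        self.rho_h = rho_h          # smoothing coefficient for the mean
        self.rho_r = rho_r          # smoothing coefficient for the variance
        self.mean = 0.0             # R_1(t-1), assumed to start at zero
        self.var = 0.0              # R_1_v(t-1), assumed to start at zero
        self.prev_sq_dev = 0.0      # R_1_v_t(t-1)

    def update(self, inst, speech_prob):
        """inst: instantaneous value R_1_t(t); speech_prob: f(p)."""
        # Mean, weighted by the target-speech presence probability.
        self.mean = (self.rho_h * self.mean
                     + (1.0 - self.rho_h) * speech_prob * inst)
        # Variance, using the squared deviation of the previous frame
        # exactly as the formula is written.
        self.var = (self.rho_r * self.var
                    + (1.0 - self.rho_h) * self.prev_sq_dev)
        # Squared deviation of the current frame, kept for the next call.
        self.prev_sq_dev = (inst - self.mean) ** 2
        return self.var


tracker = RunningVariance(rho_h=0.5, rho_r=0.5)
history = [tracker.update(4.0, 1.0) for _ in range(3)]
```

Because Python arithmetic handles complex values in the same expressions, the same tracker can be fed the complex cross products used for $R_{1\_m\_v}$.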
Specifically, $R_{1\_m\_v}(t,e^{j\omega})$ is determined as follows:
calculating the instantaneous value $R_{1\_m\_t}(t,e^{j\omega})$ of the product of the spectrum of the second original voice signal frame of the auxiliary microphone numbered m in the current frame period and the conjugate of the spectrum of the first original voice signal frame;
calculating the mean $R_{1\_m}(t,e^{j\omega})$ of the product, where $R_{1\_m}(t,e^{j\omega}) = \rho_H R_{1\_m\_t}(t-1,e^{j\omega}) + (1-\rho_H)\, f(p)\, R_{1\_m\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_m\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous value and the mean, where $R_{1\_m\_v\_t}(t,e^{j\omega}) = (R_{1\_m\_t}(t,e^{j\omega}) - R_{1\_m}(t,e^{j\omega}))^2$;
where $R_{1\_m\_v}(t,e^{j\omega}) = \rho_R R_{1\_m\_v}(t-1,e^{j\omega}) + (1-\rho_H)\, R_{1\_m\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
The first transfer function of the primary microphone is denoted $A_1(e^{j\omega})$, and the second transfer function of the auxiliary microphone numbered m is denoted $A_m(e^{j\omega})$. The ratio $H_m(e^{j\omega})$ of the second transfer function to the first transfer function of the auxiliary microphone numbered m is: $H_m(e^{j\omega}) = A_m(e^{j\omega}) / A_1(e^{j\omega})$.
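The following sketch suggests why a cross-spectrum statistic divided by an energy statistic can track this ratio. The signal model and the symbols $S$, $N_1$, $N_m$ and $\sigma_S^2$ are assumptions introduced here for illustration, with the target source assumed uncorrelated with the noise; the method described in this document iterates on variance statistics rather than on the plain expectations used below.

```latex
\begin{aligned}
Z_1 &= A_1 S + N_1, \qquad Z_m = A_m S + N_m,\\
E\{Z_m Z_1^{*}\} &= A_m A_1^{*}\,\sigma_S^{2} + E\{N_m N_1^{*}\},\\
E\{|Z_1|^{2}\} &= |A_1|^{2}\,\sigma_S^{2} + E\{|N_1|^{2}\},\\
\frac{E\{Z_m Z_1^{*}\}}{E\{|Z_1|^{2}\}} &\approx
\frac{A_m A_1^{*}}{|A_1|^{2}} = \frac{A_m}{A_1} = H_m
\quad \text{when the noise terms are negligible.}
\end{aligned}
```

Under this reading, weighting the statistics by the speech presence probability $f(p)$ favors frames in which the noise terms are small relative to the target speech.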
Specifically, in step 102, define $H^{T}(e^{j\omega}) = [1 \;\; H_1(e^{j\omega}) \;\ldots\; H_m(e^{j\omega})]$. Normalizing $H(e^{j\omega})$ gives $W_0(e^{j\omega}) = H(e^{j\omega}) / \|H(e^{j\omega})\|^{2}$. Referring to Fig. 2, the spectra of the voice signal frames received by the microphones are, in order, $Z_1(t,e^{j\omega})$, $Z_2(t,e^{j\omega})$, ....
The spectrum of the enhanced speech signal frame, $Y_{FBF}(t,e^{j\omega})$, may be determined according to the following formula:
The purpose of step 102 is normalization, which ensures that the energy of the human voice (i.e., of the enhanced speech signal frame) is unchanged after passing through this filter.
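The formula for $Y_{FBF}$ is not reproduced in this text. As a hedged illustration only, the sketch below assumes the form commonly used in generalized sidelobe cancellers, $Y_{FBF} = W_0^{H} Z$ with $W_0 = H / \|H\|^2$, evaluated at one frequency bin. The function name `fixed_beamformer` is hypothetical.

```python
def fixed_beamformer(z, h_ratios):
    """Enhanced spectrum at one bin, assuming Y_FBF = W_0^H Z.

    z:        spectra [Z_1, Z_2, ...] of all microphones at this bin.
    h_ratios: ratios H_m for the auxiliary microphones, same order.
    """
    # Steering vector: 1 for the main microphone, H_m for the others.
    h = [1 + 0j] + list(h_ratios)
    norm_sq = sum(x.real ** 2 + x.imag ** 2 for x in h)
    # Conjugate-weighted sum, normalized so that a pure target
    # component A_1 * S passes through unchanged.
    return sum(x.conjugate() * zi for x, zi in zip(h, z)) / norm_sq


s = 2 + 1j                          # target component at the main microphone
ratios = [0.5j, -1 + 0j]
z = [s, ratios[0] * s, ratios[1] * s]
y_fbf = fixed_beamformer(z, ratios)  # close to s for noise-free input
```

With noise-free input the output equals the target component seen by the main microphone, which is one way to read the statement that the energy of the voice is unchanged by this filter.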
Specifically, in step 103, the spectrum of the noise signal frame of each secondary microphone is constructed:
The purpose of step 103 is to construct the noise: $Z_1$ spans the signal subspace, and the energy of $Z_m$ that maps onto the signal subspace is removed, leaving a noise signal.
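The construction formula is likewise missing from this text. One common way to remove the part of $Z_m$ that lies in the signal subspace, assumed here purely for illustration, is $U_m = Z_m - H_m Z_1$. The function name `noise_references` is hypothetical.

```python
def noise_references(z, h_ratios):
    """Noise reference spectra at one bin, assuming U_m = Z_m - H_m * Z_1.

    z:        spectra [Z_1, Z_2, ...] of all microphones at this bin.
    h_ratios: ratios H_m for the auxiliary microphones, same order.
    """
    z1 = z[0]
    # Subtract the target component predicted from the main microphone.
    return [zm - hm * z1 for zm, hm in zip(z[1:], h_ratios)]


s = 2 + 1j
ratios = [0.5j, -1 + 0j]
# Target component plus a small noise term on each auxiliary microphone.
z = [s, ratios[0] * s + 0.3, ratios[1] * s - 0.1j]
u = noise_references(z, ratios)     # approximately [0.3, -0.1j]
```

If the ratios are accurate, the target speech cancels and only noise remains, which is why the quality of the iterated ratio matters for the later cancellation stage.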
Specifically, in step 104, the adaptive filter coefficients $G(t,e^{j\omega})$ are constructed with the objective of minimizing $E\{\|Y_{FBF}(t,e^{j\omega}) - G^{\dagger}(t,e^{j\omega})\,U(e^{j\omega})\|^{2}\}$. $G(t,e^{j\omega})$ is obtained by an iterative method, wherein $P_{est}(t,e^{j\omega}) = \rho P_{est}(t-1,e^{j\omega}) + (1-\rho)\sum |Z_m(t,e^{j\omega})|^{2}$.
The spectrum of the constructed cancellation speech signal frame is denoted $Y_{NC}(t,e^{j\omega})$.
The function of step 104 is to iterate according to a minimum mean square error criterion to construct the noise energy that needs to be removed.
Steps 102 through 104 may be implemented in accordance with standard generalized sidelobe canceling process flows.
Specifically, in step 105, the spectrum of the cancellation speech signal frame, $Y_{NC}(t,e^{j\omega})$, is subtracted from the spectrum of the enhanced speech signal frame, $Y_{FBF}(t,e^{j\omega})$, to give the spectrum of the target speech signal frame, $Y(t,e^{j\omega}) = Y_{FBF}(t,e^{j\omega}) - Y_{NC}(t,e^{j\omega})$. An inverse Fourier transform of this spectrum yields the target speech signal frame in the time domain.
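Steps 104 and 105 can be sketched together for one frequency bin. The iterative formula for $G$ did not survive in this text, so the update below is an assumption: a normalized LMS step, $G \leftarrow G + \mu\, U\, Y^{*} / P_{est}$, which is one standard way to minimize the stated criterion. The function name `anc_step` and the parameters `mu` and `eps` are hypothetical.

```python
def anc_step(y_fbf, u, g, p_est, z, mu=0.1, rho=0.9, eps=1e-12):
    """One adaptive-cancellation step at a single bin (assumed NLMS form).

    y_fbf: enhanced spectrum; u: noise references, one per auxiliary mic.
    g:     current filter coefficients; p_est: smoothed input power.
    z:     microphone spectra used for the power estimate.
    Returns (y, g_new, p_est_new).
    """
    # P_est(t) = rho * P_est(t-1) + (1 - rho) * sum |Z_m|^2
    p_est = rho * p_est + (1.0 - rho) * sum(abs(zm) ** 2 for zm in z)
    # Cancellation spectrum Y_NC: filtered references, accumulated.
    y_nc = sum(gm.conjugate() * um for gm, um in zip(g, u))
    # Target spectrum Y = Y_FBF - Y_NC (step 105).
    y = y_fbf - y_nc
    # Coefficients are updated from the target spectrum (assumed NLMS).
    if p_est > eps:
        g = [gm + mu * um * y.conjugate() / p_est for gm, um in zip(g, u)]
    return y, list(g), p_est


g, p = [0j], 0.0
for _ in range(300):
    # Noise-only input: the enhanced path leaks 0.8 of the reference.
    y, g, p = anc_step(0.8 + 0j, [1 + 0j], g, p, [1 + 0j, 1 + 0j])
```

In this noise-only example the residual `y` shrinks toward zero as `g` converges, illustrating how the leaked noise is removed from the enhanced spectrum.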
It should be noted that the above processes are all operations performed in the frequency domain.
The above process does not take the influence of an echo signal into account. When the loudspeaker of the electronic device in which the microphone array is located plays sound (referred to herein as an echo signal), the iteration in step 101 should be stopped, so that the estimated direction of the target speech signal does not converge toward the loudspeaker.
That is: the iteration of the ratio of the second transfer function to the first transfer function is stopped while the loudspeaker on the electronic device where the microphone array is located plays an echo signal, and the iteration of the ratio of the second transfer function to the first transfer function is started when the loudspeaker on the earphone where the microphone array is located does not play an echo signal.
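The gating rule can be sketched as a thin wrapper around the ratio update. The function name `gated_ratio_update`, the boolean flag `speaker_active` and the small-denominator guard are assumptions; how playback is detected is not specified here.

```python
def gated_ratio_update(h_prev, var_cross, var_energy, speaker_active,
                       mu_h=0.05, eps=1e-12):
    """Update H_m only while the loudspeaker is silent (illustrative)."""
    if speaker_active:
        # Echo present: freeze the ratios so they do not adapt
        # toward the loudspeaker.
        return list(h_prev)
    h_new = []
    for h, r_cross, r_energy in zip(h_prev, var_cross, var_energy):
        if abs(r_energy) <= eps:
            h_new.append(h)          # assumed guard, not from the source
        else:
            h_new.append(h + mu_h * r_cross / r_energy)
    return h_new


frozen = gated_ratio_update([1 + 0j], [2 + 2j], [4.0], True, mu_h=0.5)
adapted = gated_ratio_update([1 + 0j], [2 + 2j], [4.0], False, mu_h=0.5)
```

Freezing rather than resetting the ratios keeps the last estimate learned from the target talker, so adaptation can resume from it once playback stops.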
When an echo signal is present, the influence of the echo signal on the finally output target voice signal can be canceled as follows:
in the case that the loudspeaker on the earphone where the microphone array is located plays an echo signal, filtering the spectrum of the cancellation voice signal frame out of the spectrum of the enhanced voice signal frame to obtain the spectrum of the target voice signal frame comprises the following steps:
determining a first interference echo signal reaching each microphone according to the echo signal and a preset third transfer function;
constructing a frequency spectrum of a second interference echo signal frame corresponding to each auxiliary microphone according to the frequency spectrum of each first interference echo signal frame and the ratio of each second transfer function to the first transfer function;
adaptively filtering the frequency spectrum of the second interference echo signal frame corresponding to each auxiliary microphone and accumulating the results to obtain the frequency spectrum of the final interference signal frame;
and filtering the spectrum of the cancellation voice signal frame and the spectrum of the final interference signal frame out of the spectrum of the enhanced voice signal frame to obtain the spectrum of the target voice signal frame.
With reference to Fig. 2, the echo signal played by the loudspeaker is known, and the third transfer function is measured in advance by means of a frequency sweep, so the first interference echo signal can be obtained by calculation.
The original speech signal Z in fig. 2 is replaced by the first interfering echo signal and the spectrum of the second interfering signal frame is obtained by h+ processing in the lower part of fig. 2, i.e. the same processing procedure that is used to construct the spectrum of the noise signal frame in step 103. The spectrum of the final interfering signal frame is then obtained through the same process as step 104 to construct the cancellation speech signal frame.
The method can estimate the influence of the echo signal on the final output signal, thereby accurately eliminating the influence of the echo signal on the final output signal.
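The first two echo-handling steps can be sketched at one frequency bin as follows. The function name `echo_references`, the per-microphone list `c` of third transfer function values and the subtraction form $E_m - H_m E_1$ (mirroring the assumed form of the noise reference construction) are illustrative assumptions.

```python
def echo_references(x, c, h_ratios):
    """Echo reaching each microphone and the derived echo references.

    x:        spectrum of the known loudspeaker signal at this bin.
    c:        pre-measured third transfer function per microphone.
    h_ratios: ratios H_m for the auxiliary microphones.
    Returns (first, second): first interference echo per microphone and
    second interference echo per auxiliary microphone.
    """
    # First interference echo: known playback shaped by each echo path.
    first = [cm * x for cm in c]
    # Second interference echo: same processing as the noise references,
    # with the microphone spectra replaced by the first echo spectra.
    second = [em - hm * first[0] for em, hm in zip(first[1:], h_ratios)]
    return first, second


first, second = echo_references(2 + 0j, [1 + 0j, 0.5 + 0j, 0.25j],
                                [0.5 + 0j, 1j])
```

The second-stage spectra would then pass through the same adaptive filtering and accumulation as the noise references to form the final interference signal frame.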
Based on the same inventive concept, referring to fig. 3, an embodiment of the present application further provides a microphone array noise reduction apparatus, where the microphone array includes one primary microphone and at least one secondary microphone, the noise reduction apparatus includes:
the transfer function ratio updating module 1 is configured to iterate the ratio of each second transfer function to the first transfer function according to a variance obtained from statistics on the energy values of the first original voice signal frames and a variance obtained from statistics on the energy values given by multiplying the frequency spectrum of each second original voice signal frame by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frames picked up by the main microphone are the first original voice signal frames, the voice signal frames picked up by the auxiliary microphone are the second original voice signal frames, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function;
an enhanced speech signal constructing module 2, configured to construct a spectrum of an enhanced speech signal frame according to the spectrum of an original speech signal frame acquired by each microphone in the microphone array and a ratio of each second transfer function to the first transfer function;
the noise signal construction module 3 is configured to construct a frequency spectrum of a noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame, and the ratio of each second transfer function to the first transfer function;
the cancellation voice signal construction module 4 is configured to adaptively filter the frequency spectra of the noise signal frames corresponding to the auxiliary microphones and accumulate the results to obtain the frequency spectrum of the cancellation voice signal frame;
the output module 5 is configured to filter the frequency spectrum of the cancellation voice signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame;
wherein, the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
Optionally, the ratio of the second transfer function to the first transfer function iterates according to the following formula:
$$H_m(t,e^{j\omega}) = H_m(t-1,e^{j\omega}) + \mu_H \, \frac{R_{1\_m\_v}(t,e^{j\omega})}{R_{1\_v}(t,e^{j\omega})}$$
where $H_m(t,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function for the auxiliary microphone numbered m in the current frame period, $H_m(t-1,e^{j\omega})$ is the same ratio in the frame period preceding the current frame period, $\mu_H$ is the learning step size, $R_{1\_m\_v}(t,e^{j\omega})$ is the variance, up to the current frame period, obtained from statistics on the energy values given by multiplying the spectrum of the second original signal frame of the auxiliary microphone numbered m by the conjugate of the spectrum of the first original signal frame, and $R_{1\_v}(t,e^{j\omega})$ is the variance, up to the current frame period, obtained from statistics on the energy values of the first original signal frame.
Optionally, $R_{1\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous energy value $R_{1\_t}(t,e^{j\omega})$ of the first original voice signal frame;
calculating the mean value $R_1(t,e^{j\omega})$ of the energy value of the first original voice signal frame accumulated up to the current frame period, wherein $R_1(t,e^{j\omega}) = \rho_H R_1(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, $R_1(t-1,e^{j\omega})$ is the mean value of the energy value of the first original voice signal frame accumulated up to the frame period preceding the current frame period, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous energy value and the mean value of the first original voice signal frame, $R_{1\_v\_t}(t,e^{j\omega}) = (R_{1\_t}(t,e^{j\omega}) - R_1(t,e^{j\omega}))^2$;
wherein $R_{1\_v}(t,e^{j\omega}) = \rho_R R_{1\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
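As a non-limiting illustration, the steps above can be sketched for a single frequency bin in Python as follows. The sketch follows the formulas exactly as printed, including the use of the previous frame's squared difference and the factor (1-ρ_H) in the variance recursion. The class name, function name, zero initial state, and default coefficient values are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class EnergyStats:
    mean: float = 0.0    # R_1(t-1): running mean of the energy value
    var: float = 0.0     # R_1_v(t-1): running variance statistic
    sq_dev: float = 0.0  # R_1_v_t(t-1): squared difference from the previous frame


def update_energy_stats(state, x1, speech_prob, rho_h=0.9, rho_r=0.9):
    """Advance the statistics by one frame for one frequency bin.

    x1:          complex spectrum value of the first original voice signal frame
    speech_prob: f(p), probability that the target voice is present
    """
    inst = abs(x1) ** 2  # R_1_t(t): instantaneous energy value
    mean = rho_h * state.mean + (1 - rho_h) * speech_prob * inst
    # As printed: the recursion uses the squared difference of frame t-1.
    var = rho_r * state.var + (1 - rho_h) * state.sq_dev
    sq_dev = (inst - mean) ** 2  # R_1_v_t(t)
    return EnergyStats(mean, var, sq_dev)
```

Because the speech-presence probability scales only the new term of the mean, frames judged unlikely to contain the target voice contribute little to the running mean.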
Optionally, $R_{1\_m\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous value $R_{1\_m\_t}(t,e^{j\omega})$ of the product of the frequency spectrum of the second original voice signal frame of the auxiliary microphone numbered m in the current frame period and the conjugate of the frequency spectrum of the first original voice signal frame;
calculating the mean value $R_{1\_m}(t,e^{j\omega})$ of the product, wherein $R_{1\_m}(t,e^{j\omega}) = \rho_H R_{1\_m\_t}(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_m\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_m\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous value and the mean value, wherein $R_{1\_m\_v\_t}(t,e^{j\omega}) = (R_{1\_m\_t}(t,e^{j\omega}) - R_{1\_m}(t,e^{j\omega}))^2$;
wherein $R_{1\_m\_v}(t,e^{j\omega}) = \rho_R R_{1\_m\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_m\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
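As a non-limiting illustration, the cross-spectrum statistics above can be sketched for a single frequency bin in Python as follows. The sketch follows the formulas as printed: the mean is formed from the previous and current instantaneous products, and the squared difference is an ordinary complex square. The class name, function name, zero initial state, and default coefficients are assumptions of the sketch.

```python
from dataclasses import dataclass


@dataclass
class CrossStats:
    prev_inst: complex = 0j  # R_1_m_t(t-1): previous instantaneous product
    var: complex = 0j        # R_1_m_v(t-1): running variance statistic
    sq_dev: complex = 0j     # R_1_m_v_t(t-1): previous squared difference


def update_cross_stats(state, x_m, x_1, speech_prob, rho_h=0.9, rho_r=0.9):
    """Advance the statistics by one frame for one frequency bin.

    x_m: complex spectrum value of the second original voice signal frame
    x_1: complex spectrum value of the first original voice signal frame
    """
    inst = x_m * x_1.conjugate()  # R_1_m_t(t)
    # As printed: the mean combines the previous and current instantaneous values.
    mean = rho_h * state.prev_inst + (1 - rho_h) * speech_prob * inst
    # As printed: the recursion uses the squared difference of frame t-1.
    var = rho_r * state.var + (1 - rho_h) * state.sq_dev
    sq_dev = (inst - mean) ** 2  # R_1_m_v_t(t), complex-valued
    return CrossStats(inst, var, sq_dev), mean
```

The returned statistic is complex, so it carries the phase relationship between the two microphones as well as a magnitude.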
Optionally, the device further includes an echo processing module 6 configured to:
stop iterating the ratio of the second transfer function to the first transfer function when the loudspeaker on the electronic equipment where the microphone array is located is playing an echo signal; and start iterating the ratio of the second transfer function to the first transfer function when the loudspeaker on the earphone where the microphone array is located is not playing an echo signal.
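As a non-limiting illustration, the gating of the iteration can be sketched in Python as follows. The function name, the boolean flag `speaker_playing`, and the constant `eps` are assumptions of the sketch; how loudspeaker activity is detected is outside its scope.

```python
def iterate_ratio_with_echo_gate(h_prev, r_1_m_v, r_1_v, speaker_playing,
                                 mu_h=0.05, eps=1e-12):
    """Return the transfer function ratio for the current frame, per bin.

    While the loudspeaker is playing an echo signal, the previous ratio is
    kept unchanged; otherwise the ratio is iterated as usual.
    """
    if speaker_playing:
        # Iteration is stopped: hold the ratio from the previous frame period.
        return list(h_prev)
    # Iteration is active: H_m(t) = H_m(t-1) + mu_H * R_1_m_v(t) / R_1_v(t).
    return [h + mu_h * num / (den + eps)
            for h, num, den in zip(h_prev, r_1_m_v, r_1_v)]
```

Holding the ratio while the loudspeaker is active keeps echo energy from being attributed to the target sound source.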
Optionally, when the speaker on the earphone where the microphone array is located is playing an echo signal, the output module 5 is specifically configured to perform:
determining a first interference echo signal reaching each microphone according to the echo signal and a preset third transfer function;
constructing a frequency spectrum of a second interference echo signal frame corresponding to each auxiliary microphone according to the frequency spectrum of each first interference echo signal frame and the ratio of each second transfer function to the first transfer function;
subjecting the frequency spectrum of the second interference echo signal frame corresponding to each auxiliary microphone to the adaptive filtering and then accumulating the results to obtain the frequency spectrum of the final interference signal frame;
and filtering the frequency spectrum of the cancellation voice signal frame and the frequency spectrum of the final interference signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame.
Referring to fig. 4, an embodiment of the present application further provides a microphone array noise reduction device, including: a memory 10 and a processor 20, the memory 10 storing instructions, the processor 20 executing the instructions to perform the microphone array noise reduction method described previously.
The embodiment of the application also provides electronic equipment, which comprises the microphone array noise reduction device.
All embodiments in the application are described in a progressive manner. For identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments.
The scope of the present application is not limited to the above-described embodiments, and it is apparent that various modifications and variations can be made to the present application by those skilled in the art without departing from the scope and spirit of the present application. Such modifications and variations are intended to be included herein within the scope of the following claims and their equivalents.

Claims (12)

1. A method of noise reduction for a microphone array comprising a main microphone and at least one auxiliary microphone, the method comprising:
for the auxiliary microphone numbered m in the microphone array, iterating the ratio of the second transfer function corresponding to the auxiliary microphone numbered m to the first transfer function by the following formula, according to the variance obtained by statistics on the energy value of the first original voice signal frame and the variance obtained by statistics on the energy value obtained by multiplying the frequency spectrum of the second original voice signal frame corresponding to the auxiliary microphone numbered m by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frame picked up by the main microphone is the first original voice signal frame, the voice signal frame picked up by the auxiliary microphone is the second original voice signal frame, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function:
$H_m(t,e^{j\omega}) = H_m(t-1,e^{j\omega}) + \mu_H R_{1\_m\_v}(t,e^{j\omega}) / R_{1\_v}(t,e^{j\omega})$, wherein $H_m(t,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function of the auxiliary microphone numbered m in the current frame period; $H_m(t-1,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function of the auxiliary microphone numbered m in the frame period preceding the current frame period; $\mu_H$ is the learning step size; $R_{1\_m\_v}(t,e^{j\omega})$ is the variance, accumulated up to the current frame period, of the energy value obtained by multiplying the frequency spectrum of the second original voice signal frame of the auxiliary microphone numbered m by the conjugate of the frequency spectrum of the first original voice signal frame; and $R_{1\_v}(t,e^{j\omega})$ is the variance, accumulated up to the current frame period, of the energy value of the first original voice signal frame;
constructing a frequency spectrum of an enhanced voice signal frame according to the frequency spectrum of an original voice signal frame acquired by each microphone in the microphone array and the ratio of each second transfer function to the first transfer function;
constructing the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame and the ratio of each second transfer function to the first transfer function;
subjecting the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone to adaptive filtering and then accumulating the results to obtain the frequency spectrum of the cancellation voice signal frame;
filtering the frequency spectrum of the cancellation voice signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame;
wherein, the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
2. The method of claim 1, wherein $R_{1\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous energy value $R_{1\_t}(t,e^{j\omega})$ of the first original voice signal frame;
calculating the mean value $R_1(t,e^{j\omega})$ of the energy value of the first original voice signal frame accumulated up to the current frame period, wherein $R_1(t,e^{j\omega}) = \rho_H R_1(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, $R_1(t-1,e^{j\omega})$ is the mean value of the energy value of the first original voice signal frame accumulated up to the frame period preceding the current frame period, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous energy value and the mean value of the first original voice signal frame, $R_{1\_v\_t}(t,e^{j\omega}) = (R_{1\_t}(t,e^{j\omega}) - R_1(t,e^{j\omega}))^2$;
wherein $R_{1\_v}(t,e^{j\omega}) = \rho_R R_{1\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
3. The method of claim 1, wherein $R_{1\_m\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous value $R_{1\_m\_t}(t,e^{j\omega})$ of the product of the frequency spectrum of the second original voice signal frame of the auxiliary microphone numbered m in the current frame period and the conjugate of the frequency spectrum of the first original voice signal frame;
calculating the mean value $R_{1\_m}(t,e^{j\omega})$ of the product, wherein $R_{1\_m}(t,e^{j\omega}) = \rho_H R_{1\_m\_t}(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_m\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_m\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous value and the mean value, wherein $R_{1\_m\_v\_t}(t,e^{j\omega}) = (R_{1\_m\_t}(t,e^{j\omega}) - R_{1\_m}(t,e^{j\omega}))^2$;
wherein $R_{1\_m\_v}(t,e^{j\omega}) = \rho_R R_{1\_m\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_m\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
4. The method of claim 1, further comprising:
stopping iterating the ratio of the second transfer function to the first transfer function when the loudspeaker on the electronic equipment where the microphone array is located is playing an echo signal; and starting to iterate the ratio of the second transfer function to the first transfer function when the loudspeaker on the earphone where the microphone array is located is not playing an echo signal.
5. The method of claim 4, wherein, when the speaker on the earphone where the microphone array is located is playing an echo signal, filtering the frequency spectrum of the cancellation voice signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame comprises:
determining a first interference echo signal reaching each microphone according to the echo signal and a preset third transfer function;
constructing a frequency spectrum of a second interference echo signal frame corresponding to each auxiliary microphone according to the frequency spectrum of each first interference echo signal frame and the ratio of each second transfer function to the first transfer function;
subjecting the frequency spectrum of the second interference echo signal frame corresponding to each auxiliary microphone to the adaptive filtering and then accumulating the results to obtain the frequency spectrum of the final interference signal frame;
and filtering the frequency spectrum of the cancellation voice signal frame and the frequency spectrum of the final interference signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame.
6. A microphone array noise reduction apparatus, the microphone array comprising a main microphone and at least one auxiliary microphone, the noise reduction apparatus comprising:
the transfer function ratio updating module, configured to iterate, for the auxiliary microphone numbered m in the microphone array, the ratio of the second transfer function corresponding to the auxiliary microphone numbered m to the first transfer function by the following formula, according to the variance obtained by statistics on the energy value of the first original voice signal frame and the variance obtained by statistics on the energy value obtained by multiplying the frequency spectrum of the second original voice signal frame corresponding to the auxiliary microphone numbered m by the conjugate of the frequency spectrum of the first original voice signal frame at the corresponding moment, wherein the voice signal frame picked up by the main microphone is the first original voice signal frame, the voice signal frame picked up by the auxiliary microphone is the second original voice signal frame, the transfer function from the target sound source to the main microphone is the first transfer function, and the transfer function from the target sound source to the auxiliary microphone is the second transfer function:
$H_m(t,e^{j\omega}) = H_m(t-1,e^{j\omega}) + \mu_H R_{1\_m\_v}(t,e^{j\omega}) / R_{1\_v}(t,e^{j\omega})$, wherein $H_m(t,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function of the auxiliary microphone numbered m in the current frame period; $H_m(t-1,e^{j\omega})$ is the ratio of the second transfer function to the first transfer function of the auxiliary microphone numbered m in the frame period preceding the current frame period; $\mu_H$ is the learning step size; $R_{1\_m\_v}(t,e^{j\omega})$ is the variance, accumulated up to the current frame period, of the energy value obtained by multiplying the frequency spectrum of the second original voice signal frame of the auxiliary microphone numbered m by the conjugate of the frequency spectrum of the first original voice signal frame; and $R_{1\_v}(t,e^{j\omega})$ is the variance, accumulated up to the current frame period, of the energy value of the first original voice signal frame;
the enhanced voice signal construction module is used for constructing the frequency spectrum of the enhanced voice signal frame according to the frequency spectrum of the original voice signal frame acquired by each microphone in the microphone array and the ratio of each second transfer function to the first transfer function;
the noise signal construction module is used for constructing the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone according to the frequency spectrum of the first original voice signal frame, the frequency spectrum of each second original voice signal frame and the ratio of each second transfer function to the first transfer function;
the cancellation voice signal construction module is configured to subject the frequency spectrum of the noise signal frame corresponding to each auxiliary microphone to adaptive filtering and then accumulate the results to obtain the frequency spectrum of the cancellation voice signal frame;
the output module is configured to filter the frequency spectrum of the cancellation voice signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame;
wherein, the filter coefficient of the adaptive filtering corresponding to each auxiliary microphone is updated according to the frequency spectrum of the target voice signal frame.
7. The apparatus of claim 6, wherein $R_{1\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous energy value $R_{1\_t}(t,e^{j\omega})$ of the first original voice signal frame;
calculating the mean value $R_1(t,e^{j\omega})$ of the energy value of the first original voice signal frame accumulated up to the current frame period, wherein $R_1(t,e^{j\omega}) = \rho_H R_1(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, $R_1(t-1,e^{j\omega})$ is the mean value of the energy value of the first original voice signal frame accumulated up to the frame period preceding the current frame period, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous energy value and the mean value of the first original voice signal frame, $R_{1\_v\_t}(t,e^{j\omega}) = (R_{1\_t}(t,e^{j\omega}) - R_1(t,e^{j\omega}))^2$;
wherein $R_{1\_v}(t,e^{j\omega}) = \rho_R R_{1\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
8. The apparatus of claim 6, wherein $R_{1\_m\_v}(t,e^{j\omega})$ is determined as follows:
calculating an instantaneous value $R_{1\_m\_t}(t,e^{j\omega})$ of the product of the frequency spectrum of the second original voice signal frame of the auxiliary microphone numbered m in the current frame period and the conjugate of the frequency spectrum of the first original voice signal frame;
calculating the mean value $R_{1\_m}(t,e^{j\omega})$ of the product, wherein $R_{1\_m}(t,e^{j\omega}) = \rho_H R_{1\_m\_t}(t-1,e^{j\omega}) + (1-\rho_H) f(p) R_{1\_m\_t}(t,e^{j\omega})$, $\rho_H$ is a set coefficient, and $f(p)$ is the probability that the target voice is present;
calculating the square $R_{1\_m\_v\_t}(t,e^{j\omega})$ of the difference between the instantaneous value and the mean value, wherein $R_{1\_m\_v\_t}(t,e^{j\omega}) = (R_{1\_m\_t}(t,e^{j\omega}) - R_{1\_m}(t,e^{j\omega}))^2$;
wherein $R_{1\_m\_v}(t,e^{j\omega}) = \rho_R R_{1\_m\_v}(t-1,e^{j\omega}) + (1-\rho_H) R_{1\_m\_v\_t}(t-1,e^{j\omega})$, and $\rho_R$ is a set coefficient.
9. The apparatus of claim 6, further comprising an echo processing module configured to:
stop iterating the ratio of the second transfer function to the first transfer function when the loudspeaker on the electronic equipment where the microphone array is located is playing an echo signal; and start iterating the ratio of the second transfer function to the first transfer function when the loudspeaker on the earphone where the microphone array is located is not playing an echo signal.
10. The apparatus of claim 9, wherein, when the speaker on the earphone where the microphone array is located is playing an echo signal, the output module is specifically configured to perform:
determining a first interference echo signal reaching each microphone according to the echo signal and a preset third transfer function;
constructing a frequency spectrum of a second interference echo signal frame corresponding to each auxiliary microphone according to the frequency spectrum of each first interference echo signal frame and the ratio of each second transfer function to the first transfer function;
subjecting the frequency spectrum of the second interference echo signal frame corresponding to each auxiliary microphone to the adaptive filtering and then accumulating the results to obtain the frequency spectrum of the final interference signal frame;
and filtering the frequency spectrum of the cancellation voice signal frame and the frequency spectrum of the final interference signal frame out of the frequency spectrum of the enhanced voice signal frame to obtain the frequency spectrum of the target voice signal frame.
11. A microphone array noise reduction device, comprising: a memory and a processor, the memory storing instructions, and the processor executing the instructions to perform the method of any one of claims 1 to 5.
12. An electronic device comprising the microphone array noise reduction apparatus according to any one of claims 6 to 10.
CN202110776150.1A 2021-07-08 2021-07-08 Microphone array noise reduction method and device and electronic equipment Active CN113470678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110776150.1A CN113470678B (en) 2021-07-08 2021-07-08 Microphone array noise reduction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110776150.1A CN113470678B (en) 2021-07-08 2021-07-08 Microphone array noise reduction method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113470678A CN113470678A (en) 2021-10-01
CN113470678B true CN113470678B (en) 2024-03-15

Family

ID=77879351

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110776150.1A Active CN113470678B (en) 2021-07-08 2021-07-08 Microphone array noise reduction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113470678B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103268766A (en) * 2013-05-17 2013-08-28 泰凌微电子(上海)有限公司 Method and device for speech enhancement with double microphones
CN106782590A (en) * 2016-12-14 2017-05-31 南京信息工程大学 Based on microphone array Beamforming Method under reverberant ambiance
EP3285501A1 (en) * 2016-08-16 2018-02-21 Oticon A/s A hearing system comprising a hearing device and a microphone unit for picking up a user's own voice
WO2019112468A1 (en) * 2017-12-08 2019-06-13 Huawei Technologies Co., Ltd. Multi-microphone noise reduction method, apparatus and terminal device
CN111312269A (en) * 2019-12-13 2020-06-19 辽宁工业大学 Rapid echo cancellation method in intelligent loudspeaker box
CN112236820A (en) * 2018-06-25 2021-01-15 赛普拉斯半导体公司 Beamformer and Acoustic Echo Canceller (AEC) system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE602006006664D1 (en) * 2006-07-10 2009-06-18 Harman Becker Automotive Sys Reduction of background noise in hands-free systems

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
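Binaural beamforming using pre-determined relative acoustic transfer functions; Andreas I. Koutrouvelis; 2017 25th European Signal Processing Conference (EUSIPCO); full text *
Research and Implementation of Microphone Array Beamforming Algorithms (麦克风阵列波束成形算法研究与实现); Chen Yingrui (陈颖睿); China Excellent Master's Theses Full-text Database (中国优秀硕士学位论文全文库(数据科技辑)), No. 2; full text *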

Also Published As

Publication number Publication date
CN113470678A (en) 2021-10-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant