CN111048105A - Voice enhancement processing method, device and system, household appliance and storage medium

Info

Publication number
CN111048105A
Authority
CN
China
Prior art keywords
noise
signals
processing
enhancement processing
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911343570.XA
Other languages
Chinese (zh)
Inventor
孟林
徐成茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Original Assignee
Midea Group Co Ltd
Guangdong Midea White Goods Technology Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Midea Group Co Ltd and Guangdong Midea White Goods Technology Innovation Center Co Ltd
Priority to CN201911343570.XA
Publication of CN111048105A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a voice enhancement processing method, device and system, a household appliance and a storage medium. The voice enhancement processing method comprises the following steps: performing delay compensation on the audio signals collected by a multi-microphone array to obtain synchronous audio signals, wherein the synchronous audio signals comprise a first group of synchronous signals and a second group of synchronous signals; performing weighting processing on the first group of synchronous signals to generate a speech processing component; extracting noise signals from the second group of synchronous signals, and filtering the noise signals with an adaptive filter to generate a noise processing component; controlling the noise processing component to perform a cancellation operation on the speech processing component to generate an output signal; and updating the adaptive filter according to the relation between the output signal and an expected signal, so as to perform a filtering operation on the speech processing component with the updated adaptive filter. With the technical solution of this application, transient impulse noise and Gaussian noise in the speech signal can be eliminated effectively, which improves speech recognition.

Description

Voice enhancement processing method, device and system, household appliance and storage medium
Technical Field
The present application relates to the field of speech recognition technologies, and in particular, to a speech enhancement processing method, a speech enhancement processing apparatus, a speech enhancement system, and a computer-readable storage medium.
Background
During speech recognition, everyday noises such as human conversation, the operating and vibration noise of household appliances, traffic noise and pet noise can affect the recognition result, so signal enhancement is needed to improve recognition performance.
In the related art, speech enhancement can be achieved by frequency-domain spectral subtraction based on an auditory masking model, but inaccurate spectrum estimation leaves a large noise residual.
Disclosure of Invention
The present application is directed to solving at least one of the problems of the prior art or the related art.
To this end, an object of the present application is to provide a speech enhancement processing method.
Another object of the present application is to provide a speech enhancement processing apparatus.
It is a further object of this application to provide a speech enhancement system.
It is yet another object of the present application to provide a home appliance.
It is yet another object of the present application to provide a computer-readable storage medium.
In order to achieve the above object, according to an embodiment of a first aspect of the present application, there is provided a speech enhancement processing method including: performing delay compensation on the audio signals collected by a multi-microphone array to obtain synchronous audio signals, wherein the synchronous audio signals comprise a first group of synchronous signals and a second group of synchronous signals; performing weighting processing on the first group of synchronous signals to generate a speech processing component; extracting noise signals from the second group of synchronous signals, and filtering the noise signals with an adaptive filter to generate a noise processing component; controlling the noise processing component to perform a cancellation operation on the speech processing component to generate an output signal; and updating the adaptive filter according to the relation between the output signal and an expected signal, so as to perform a filtering operation on the speech processing component with the updated adaptive filter and output an enhanced speech signal.
In this technical solution, the multi-microphone array provides a plurality of microphone channels. Because the microphones are placed at different positions, the audio signals collected by the array are delayed relative to one another. Delay compensation is performed on the audio signals to obtain synchronous audio signals, which ensures that the signals received on the microphone channels are consistent. The synchronous audio signals are divided into two groups: the first group of synchronous signals is weighted to obtain a speech processing component, and the second group of synchronous signals is processed by an adaptive filter to obtain a noise processing component. An output signal is synthesized from the speech processing component and the noise processing component, the adaptive filter is updated based on the relation between the output signal and a preset expected signal, and the speech processing component is filtered again with the updated adaptive filter to output an enhanced speech signal.
The multi-microphone array may be a dual-microphone array, a four-microphone array, or an eight-microphone array, and the microphones may be oriented in the same direction or in different directions according to the number of the microphones.
Speech enhancement refers to the technique of extracting the useful speech signal from the noise background, and suppressing and reducing noise interference, after the speech signal has been corrupted or even submerged by various noise signals. In the speech enhancement processing scheme defined in the present application, the parameters of the adaptive filter are updated according to the relation between the output signal and the expected signal so that the output signal approaches the expected signal.
The expected signal is a clean speech signal that can be collected specifically during a training stage and serves as the reference for checking whether the adaptive filter is adequate.
In addition, the speech enhancement processing scheme defined in the present application involves no frequency estimation (frequency transform) and uses an adaptive filter instead of a speech model, so it achieves better enhancement and noise suppression under low signal-to-noise ratio conditions and can obtain a higher score in the Perceptual Evaluation of Speech Quality (PESQ).
In the above technical solution, performing weighting processing on the first group of synchronous signals to generate a speech processing component specifically includes: configuring a corresponding weighting matrix with fixed, non-adaptive weighting coefficients; and weighting the first group of synchronous signals according to the weighting matrix to cancel the noise of the multi-microphone array and generate the speech processing component.
In this technical solution, a static weighting matrix is configured with fixed, non-adaptive weighting coefficients to cancel the noise of the multi-microphone array, wherein the cancelled noise includes linear white noise, nonlinear impulse noise and the like.
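The fixed weighting step can be sketched as a weighted sum over the synchronized channels. The following is a minimal illustration, not the patent's implementation: the function name `fixed_beamformer`, the use of a weight vector in place of a full weighting matrix, and the uniform default weights are all assumptions.

```python
import numpy as np

def fixed_beamformer(sync_signals, weights=None):
    """Weight and sum synchronized channels.

    sync_signals has shape (channels, samples). The weights are fixed
    (non-adaptive); uniform weights are an assumption made for this sketch.
    """
    sync_signals = np.asarray(sync_signals, dtype=float)
    n_channels = sync_signals.shape[0]
    if weights is None:
        weights = np.full(n_channels, 1.0 / n_channels)
    # One output sample per time index: the weighted sum across channels.
    return np.asarray(weights, dtype=float) @ sync_signals
```

With uniform weights, a target that is identical on every synchronized channel passes through unchanged, while noise that is uncorrelated between channels is averaged down.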
In any of the above technical solutions, the adaptive filter includes a linear filter and a nonlinear filter, and extracting the noise signals from the second group of synchronous signals and filtering them with the adaptive filter to generate the noise processing component specifically includes: filtering out the target speech in the second group of synchronous signals with a blocking matrix to obtain the noise signals, wherein the noise signals comprise a first group of noise signals and a second group of noise signals; performing a linear filtering operation on the first group of noise signals to filter out linear noise and generate a first group of components; and performing a nonlinear filtering operation on the second group of noise signals with the nonlinear filter to filter out nonlinear noise and generate a second group of components; wherein the blocking matrix is determined as a linear filter.
In this technical solution, the blocking matrix is specifically a spatial notch filter. For the second group of synchronous signals, the blocking matrix is first used to filter out the target signal (that is, the desired audio signal) and retain the noise signals; the retained noise signals are then filtered linearly and nonlinearly to obtain the noise processing component. On the one hand, the noise processing component is combined with the speech processing component to obtain the output signal, and the adaptive filter can then be updated according to the relation between the output signal and the expected signal, which optimizes the adaptive filter. On the other hand, the blocking matrix contains linear weight coefficients, and the second group of noise signals obtained through the blocking matrix is further processed by the nonlinear filter. This cooperative adaptive filtering removes noise that linear filtering alone cannot remove, which helps improve the speech recognition rate.
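One common way to build a blocking matrix is to take differences of adjacent synchronized channels, so that a target identical on every channel cancels and only noise remains. The patent does not state which construction it uses; the adjacent-difference form and the name `blocking_matrix` below are assumptions for illustration.

```python
import numpy as np

def blocking_matrix(n_channels):
    """Adjacent-difference blocking matrix of shape (n_channels - 1, n_channels).

    Each row subtracts one channel from its neighbor. Every row sums to zero,
    so a signal that is identical on all synchronized channels is blocked.
    """
    b = np.zeros((n_channels - 1, n_channels))
    rows = np.arange(n_channels - 1)
    b[rows, rows] = 1.0
    b[rows, rows + 1] = -1.0
    return b
```

Applying `blocking_matrix(n) @ x` to synchronized signals `x` of shape `(n, samples)` yields `n - 1` noise reference signals.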
In any of the above technical solutions, performing a nonlinear filtering operation on the second group of noise signals to filter out nonlinear noise and generate a second group of components specifically includes: applying an activation function to the second group of noise signals to map them to a specified range and generate intermediate noise processing signals; expanding the intermediate noise processing signals with a Legendre polynomial function to generate an expanded signal; and performing a compensation operation on the expanded signal with the cooperation vector to output the second group of components, wherein the activation function is a hyperbolic tangent function.
The Legendre polynomials are a family of orthogonal polynomials defined on the interval [-1, 1] that can be used to approximate a function; in the present application they are used to approximate the corresponding expected noise.
In this technical solution, the activation function may specifically be a hyperbolic tangent function. The first group of components includes linear weight coefficients (the filtering weights), and the second group of components includes nonlinear weight coefficients (the co-factors). Mapping the signal with the hyperbolic tangent activation function provides the convergence condition for the Legendre polynomial function; the mapped signal is then expanded with the Legendre polynomial function and weighted and summed with the cooperation vector, so that the two groups of components can be summed cooperatively.
Further, solving for the weight coefficients yields a system with two adaptive factors, one linear and one nonlinear.
Specifically, the second group of components includes nonlinear adaptive filtering weights. The signals are first mapped into the range (-1, 1) by the hyperbolic tangent activation function, which matches the interval on which the Legendre polynomials are defined and simplifies computation. The Legendre polynomial function is then used to determine the nonlinear weights and approximate the expected noise. Finally, the noise processing component is controlled to perform a cancellation operation on the speech processing component to generate the output signal, which filters out nonlinear noise and enhances the useful speech signal.
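The nonlinear branch (hyperbolic tangent activation, Legendre expansion, weighting with the cooperation vector) can be sketched as follows. This is an illustrative reading of the description, not the patent's exact formulation: the function name `nonlinear_component`, the per-sample (memoryless) expansion, and treating the cooperation vector as one weight per Legendre order are assumptions.

```python
import numpy as np
from numpy.polynomial import legendre

def nonlinear_component(noise_samples, coop_vector):
    """Map samples with tanh, expand with Legendre polynomials, then weight.

    coop_vector[k] weights the Legendre polynomial of order k (assumption).
    """
    coop_vector = np.asarray(coop_vector, dtype=float)
    # tanh maps every sample into (-1, 1), the domain of the Legendre polynomials.
    z = np.tanh(np.asarray(noise_samples, dtype=float))
    order = len(coop_vector) - 1
    # Column k of the matrix holds P_k(z) for each sample.
    expanded = legendre.legvander(z, order)
    # Weighted sum over polynomial orders gives one output per sample.
    return expanded @ coop_vector
```

For example, a cooperation vector of `[0, 1, 0]` returns `tanh(x)` itself, since the first-order Legendre polynomial is P1(z) = z.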
In any of the above technical solutions, updating the adaptive filter according to the relation between the output signal and the expected signal, so as to perform a filtering operation on the speech processing component with the updated adaptive filter and output an enhanced speech signal, specifically includes: determining a deviation value between the output signal and the expected signal; iteratively adjusting the linear filter and the nonlinear filter with a normalized least mean square (NLMS) algorithm according to the deviation value, so that the first group of components approximates a first expected noise and the second group of components approximates a second expected noise; when the deviation value is detected to have decreased until the output signal approaches the expected signal, determining the filtering weights of the updated linear filter and the co-factors of the updated nonlinear filter; and generating the updated adaptive filter from the updated filtering weights and co-factors.
In this technical solution, the deviation value is determined from the relation between the output signal and the expected signal, and the filtering weights and co-factors are updated continuously based on it. The iteration continues with the updated filtering weights and co-factors until the deviation value has decreased to the point where the output signal approaches the expected signal, which yields the optimized adaptive filtering weights, that is, the updated adaptive filter. Filtering with the updated adaptive filter can improve the speech recognition rate in white noise, pink noise, F16 noise and babble noise environments.
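The update loop can be illustrated with a normalized least mean square (NLMS) sketch for the linear branch only. The function name `nlms_cancel`, the single noise reference, the tap count, and the step size are assumptions; the patent's joint update of the nonlinear co-factors is not reproduced here.

```python
import numpy as np

def nlms_cancel(primary, reference, desired, n_taps=4, mu=0.5, eps=1e-8):
    """Cancel filtered `reference` noise from `primary`, adapting with NLMS.

    primary:   speech processing component (speech plus residual noise)
    reference: noise reference signal (for example, a blocking-matrix output)
    desired:   expected clean signal collected for training
    Returns the output signal and the final filter weights.
    """
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        u = reference[n - n_taps + 1:n + 1][::-1]   # most recent sample first
        out[n] = primary[n] - w @ u                  # cancellation step
        deviation = out[n] - desired[n]              # output vs. expected signal
        w += mu * deviation * u / (u @ u + eps)      # normalized LMS update
    return out, w
```

Normalizing the step by the reference energy `u @ u` keeps the update stable when the noise level changes, which is the main reason to prefer NLMS over plain LMS in this setting.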
In any of the above technical solutions, performing delay compensation on the audio signals collected by the multi-microphone array to obtain synchronous audio signals specifically includes: performing delay estimation on the audio signals; and performing delay compensation according to the delay estimation result to synchronize the audio signals and obtain the synchronous audio signals.
In this technical solution, a delay estimation module first estimates the delays of the signals, and delay compensation is then used to synchronize the collected signals. The synchronized signal is $\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_N(n)]^T$, which is then divided into the two groups of synchronous signals.
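Delay estimation and compensation can be sketched with a cross-correlation peak search followed by an integer sample shift. The patent does not name its delay estimator; cross-correlation, integer-sample delays, and the function names `estimate_delay` and `compensate_delay` are assumptions made for this example.

```python
import numpy as np

def estimate_delay(reference, channel):
    """Return the lag in samples by which `channel` trails `reference`."""
    corr = np.correlate(channel, reference, mode="full")
    # Index len(reference) - 1 of the full correlation corresponds to zero lag.
    return int(np.argmax(corr)) - (len(reference) - 1)

def compensate_delay(channel, delay):
    """Shift `channel` by `delay` samples to align it with the reference.

    Samples shifted in from outside the recording are zero-padded.
    """
    aligned = np.zeros_like(channel)
    if delay > 0:
        aligned[:-delay] = channel[delay:]
    elif delay < 0:
        aligned[-delay:] = channel[:delay]
    else:
        aligned[:] = channel
    return aligned
```

Aligning every channel against one reference microphone and stacking the results with `np.stack` gives the synchronized signal matrix with one row per microphone.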
According to a technical solution of the second aspect of the present application, there is provided a speech enhancement processing apparatus including: a memory and a processor, wherein the memory is used for storing program code, and the processor is used for calling the program code to execute the speech enhancement processing method of the first aspect of the present application.
According to a technical solution of the third aspect of the present application, there is provided a speech enhancement system including: the speech enhancement processing apparatus according to the second aspect of the present application; and a clock device electrically connected with the speech enhancement processing apparatus and used for providing a working clock for the speech enhancement processing apparatus.
In this technical solution, the clock device includes a crystal oscillator with high frequency stability and a phase-locked loop (PLL) frequency synthesizer that generates multiple low-phase-noise clock outputs. The frequency synthesizer consists of a frequency reference, a phase detector, a charge pump, a loop filter and a voltage-controlled oscillator (VCO). A PLL-based frequency synthesizer adds two frequency dividers, one to lower the reference frequency and the other to divide the VCO frequency, so that the operating frequency can be adjusted.
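The role of the two dividers can be shown with the standard integer-N PLL relation: in lock, the divided reference equals the divided VCO frequency, so the output is the reference frequency divided by the reference divider and multiplied by the feedback divider. The function name and the numeric values below are hypothetical and do not come from the patent.

```python
def pll_output_hz(f_ref_hz, ref_divider, feedback_divider):
    """Locked VCO frequency of an integer-N PLL.

    In lock: f_ref / ref_divider == f_vco / feedback_divider.
    """
    return f_ref_hz / ref_divider * feedback_divider

# Hypothetical example: a 24 MHz crystal with dividers 3 and 25.
clock_hz = pll_output_hz(24e6, 3, 25)
```

Changing either divider moves the output in steps of the divided reference frequency (8 MHz in this hypothetical case), which is how the operating frequency is adjusted.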
In the above technical solution, the speech enhancement system further includes: a reset device electrically connected with the speech enhancement processing device, the reset device including a power-on reset module and a monitoring module, wherein the power-on reset module is used for loading the firmware required by the speech enhancement processing device, and the monitoring module is used for monitoring the running state of the speech enhancement processing device.
In this technical solution, the reset device ensures that the system works normally and prevents it from hanging on errors, which improves the operating stability of the speech enhancement system.
In any one of the above technical solutions, the speech enhancement system further includes: a power supply device electrically connected with the speech enhancement processing device and used for supplying power to it; and a communication device electrically connected with the speech enhancement processing device and used for transmitting its processing result.
In this technical solution, the power supply device supplies power to the other devices. The DC supply consists of a low-dropout regulator (LDO) and a high-efficiency DC-DC converter. The DC-DC converter provides 5 V/1 A and 3.3 V/2 A with an efficiency of up to 90%, the LDO provides 1.8 V/1.5 A, and the supply ripple does not exceed 10 mV, which safeguards the core operating performance of the speech enhancement processing device.
In addition, the communication device may include a two-in-one combo module supporting Wi-Fi and BLE.
The speech enhancement processing device can be implemented as a dedicated ASIC (application-specific integrated circuit) or SoC (system on chip), or as a separate signal processing module and control module.
In any one of the above technical solutions, the speech enhancement processing apparatus includes: a signal processing module, configured to execute the speech enhancement processing method according to the first aspect of the present application; and the control module is used for controlling the signal processing module to execute the voice enhancement processing method.
In this technical solution, the control module is mainly a microcontroller that includes a RISC (reduced instruction set computer) processor core and a high-performance CPU, ensures deterministic operation, and provides a low-power mode.
With the speech enhancement system defined above, running the speech enhancement processing method on this platform supports speech enhancement processing under low signal-to-noise ratio conditions and improves the consistency of the microphone array, which improves the speech enhancement result.
The fourth aspect of the present application provides a home appliance, which includes the speech enhancement processing apparatus according to the second aspect of the present application.
A fifth aspect of the present application provides a household appliance, which includes the speech enhancement system according to any one of the technical solutions of the third aspect of the present application.
Specifically, the household appliance includes any one of a refrigerator, a washing machine, an air conditioner, an oven, an electric cooker, a microwave oven and a floor sweeping robot.
A sixth aspect of the present application provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the speech enhancement processing method defined in any one of the above technical solutions and therefore achieves the technical effects of that method, which are not repeated here.
Additional aspects and advantages of the present application will be set forth in part in the description which follows, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 shows a schematic flow diagram of a speech enhancement processing method according to an embodiment of the present application;
FIG. 2 shows a schematic block diagram of a speech enhancement processing apparatus according to an embodiment of the present application;
FIG. 3 shows a process diagram of a speech enhancement processing apparatus according to an embodiment of the present application;
FIG. 4 shows a schematic block diagram of a household appliance according to an embodiment of the present application;
FIG. 5 shows a schematic block diagram of a speech enhancement system according to an embodiment of the present application;
FIG. 6 shows a schematic block diagram of a household appliance according to another embodiment of the present application.
Detailed Description
In order that the above objects, features and advantages of the present application can be more clearly understood, the present application will be described in further detail with reference to the accompanying drawings and detailed description. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, however, the present application may be practiced in other ways than those described herein, and therefore the scope of the present application is not limited by the specific embodiments disclosed below.
Example one
As shown in FIG. 1, a speech enhancement processing method according to an embodiment of the present application includes: Step 102, performing delay compensation on the audio signals collected by the multi-microphone array to obtain synchronous audio signals, wherein the synchronous audio signals comprise a first group of synchronous signals and a second group of synchronous signals.
As a specific implementation of step 102, the method includes: performing delay estimation on the audio signals; and performing delay compensation according to the delay estimation result to synchronize the audio signals and obtain the synchronous audio signals.
In this embodiment, the delay estimation module estimates the delays of the signals, and delay compensation is then used to synchronize the collected signals. The synchronized signal is $\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_N(n)]^T$, which is then divided into the two groups of synchronous signals.
Step 104, performing weighting processing on the first group of synchronous signals to generate a speech processing component.
As a specific implementation of step 104, the method includes: configuring a corresponding weighting matrix with fixed, non-adaptive weighting coefficients; and weighting the first group of synchronous signals according to the weighting matrix to cancel the noise of the multi-microphone array and generate the speech processing component.
In this embodiment, a static weighting matrix is configured with fixed, non-adaptive weighting coefficients to cancel the noise of the multi-microphone array, wherein the cancelled noise includes linear white noise, nonlinear impulse noise and the like.
Step 106, extracting the noise signals from the second group of synchronous signals, and filtering the noise signals with an adaptive filter to generate a noise processing component.
The adaptive filter includes a linear filter and a nonlinear filter, and as a specific implementation of step 106, the method includes: filtering out the target speech in the second group of synchronous signals with a blocking matrix to obtain the noise signals, wherein the noise signals comprise a first group of noise signals and a second group of noise signals; performing a linear filtering operation on the first group of noise signals to filter out linear noise and generate a first group of components; performing a nonlinear filtering operation on the second group of noise signals with the nonlinear filter to filter out nonlinear noise and generate a second group of components; and synthesizing the noise processing component from the first group of components, the second group of components and the corresponding co-factors, wherein the blocking matrix is determined as a linear filter.
In this embodiment, the blocking matrix is specifically a spatial filter. For the second group of synchronous signals, the blocking matrix may first be used to filter out the target signal (that is, the desired audio signal) and retain the noise signals; the retained noise signals are then filtered linearly and nonlinearly to obtain the noise processing component.
In any of the above embodiments, performing a nonlinear filtering operation on the second group of noise signals to filter out nonlinear noise and generate a second group of components specifically includes: applying an activation function to the second group of noise signals to map them to a specified range and generate intermediate noise processing signals; expanding the intermediate noise processing signals with a Legendre polynomial function to generate an expanded signal; and performing a compensation operation on the expanded signal with the cooperation vector to output the second group of components, wherein the activation function is a hyperbolic tangent function.
In this embodiment, the activation function may specifically be a hyperbolic tangent function. The first group of components includes linear weight coefficients (the filtering weights), and the second group of components includes nonlinear weight coefficients (the co-factors). Mapping the signal with the hyperbolic tangent activation function provides the convergence condition for the Legendre polynomial function; the mapped signal is then expanded with the Legendre polynomial function and weighted and summed with the cooperation vector, so that the two groups of components can be summed cooperatively.
Further, solving for the weight coefficients yields a system with two adaptive factors, one linear and one nonlinear.
Specifically, the second group of components includes nonlinear adaptive filtering weights. The signals are first mapped into the range (-1, 1) by the hyperbolic tangent activation function, which matches the interval on which the Legendre polynomials are defined and simplifies computation. The Legendre polynomial function is then used to determine the nonlinear weights and approximate the expected noise. Finally, the noise processing component is controlled to perform a cancellation operation on the speech processing component to generate the output signal, which filters out nonlinear noise and enhances the useful speech signal.
Step 108, controlling the noise processing component to perform a cancellation operation on the speech processing component to generate an output signal.
Step 110, updating the adaptive filter according to the relationship between the output signal and the desired signal, so as to perform filtering operation on the speech processing component according to the updated adaptive filter, and output an enhanced speech signal.
As a specific implementation of step 110, the method includes: determining a deviation value between the output signal and the desired signal; controlling the linear filter and the nonlinear filter to perform iterative operations using a normalized least mean square (NLMS) function according to the deviation value, so that the first group of components approximates a first expected noise and the second group of components approximates a second expected noise; upon detecting that the deviation value has decreased until the output signal approaches the desired signal, determining the updated filtering weight of the linear filter and the updated synergistic factor of the nonlinear filter; and generating the updated adaptive filter according to the updated filtering weight and the updated synergistic factor.
In this embodiment, a deviation value of the output signal is determined based on the relationship between the output signal and the desired signal. The filtering weight and the synergistic factor are continuously updated based on the deviation value, and the iterative operation is repeated with the updated filtering weight and synergistic factor until the deviation value has decreased to the point where the output signal approaches the desired signal. This yields the optimized adaptive filtering weights, that is, the updated adaptive filter. The updated adaptive filter then performs the filtering operation, so that the recognition rate of speech recognition can be improved in environments with white noise, pink noise, F16 noise, and babble noise.
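The iterative update can be illustrated with a generic normalized least mean square step. The sketch below is an assumption-laden illustration rather than the method as claimed: the names `nlms_step` and `adapt`, the step size 0.5, the regularization constant and the synthetic data are hypothetical, and the "expected noise" is simulated by a hidden two-tap filter so that convergence can be observed.

```python
import random

def nlms_step(weights, inputs, deviation, step=0.5, eps=1e-8):
    """One NLMS update: the correction is scaled by the input power."""
    power = sum(x * x for x in inputs) + eps
    return [w + step * deviation * x / power for w, x in zip(weights, inputs)]

def adapt(reference_frames, expected_noise, num_taps, step=0.5):
    """Iterate over the frames so the filter output approaches the expected noise."""
    weights = [0.0] * num_taps
    for u, target in zip(reference_frames, expected_noise):
        estimate = sum(w * x for w, x in zip(weights, u))
        deviation = target - estimate  # shrinks as the weights converge
        weights = nlms_step(weights, u, deviation, step)
    return weights

# Synthetic demonstration: the expected noise comes from a hidden filter.
rng = random.Random(0)
hidden = [0.5, -0.3]
frames = [[rng.gauss(0.0, 1.0) for _ in hidden] for _ in range(300)]
targets = [sum(h * x for h, x in zip(hidden, u)) for u in frames]
learned = adapt(frames, targets, num_taps=len(hidden))
```

The normalization by input power is what distinguishes NLMS from plain LMS: it makes the effective step size independent of the level of the noise reference.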
In this embodiment, the multi-microphone array provides a plurality of microphone channels. Because the microphones are arranged at different positions, the audio signals collected by the array exhibit certain time delays relative to one another. Time delay compensation is performed on the audio signals to obtain synchronous audio signals, which ensures the consistency of the audio signals received by the microphone channels. The synchronous audio signals are divided into two groups: the first group of synchronous signals is weighted to obtain the speech processing component, and the second group of synchronous signals is processed by the adaptive filter to obtain the noise processing component. An output signal is synthesized from the speech processing component and the noise processing component, the adaptive filter is updated based on the relationship between the output signal and a preset desired signal, and the speech processing component is filtered again by the updated adaptive filter to output an enhanced speech signal.
Speech enhancement refers to extracting the useful speech signal from the noise background, and suppressing and reducing noise interference, after the speech signal has been interfered with or even submerged by various noise signals.
The desired signal is a clean speech signal, which can be collected specifically in a training stage and used as a reference for judging whether the adaptive filter is reasonable.
In addition, the speech enhancement processing scheme defined in the present application involves no frequency estimation (frequency transform) and uses an adaptive filter instead of a speech model. It therefore has a better enhancement effect and a better noise suppression effect under low signal-to-noise ratio conditions, and can obtain a higher score in perceptual speech quality evaluation (PESQ).
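The delay compensation and fixed weighting steps can be sketched as follows. The text does not specify how the delay is estimated, so this illustration assumes an integer-sample delay found by maximizing a cross-correlation; the function names `estimate_delay`, `compensate` and `fixed_weighting`, the equal weights and the toy signals are hypothetical.

```python
def estimate_delay(reference, signal, max_lag):
    """Return the integer lag (in samples) that best aligns signal with reference."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, r in enumerate(reference):
            k = n + lag
            if 0 <= k < len(signal):
                score += r * signal[k]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def compensate(signal, lag):
    """Advance signal by lag samples, zero-filling positions with no data."""
    n = len(signal)
    return [signal[k + lag] if 0 <= k + lag < n else 0.0 for k in range(n)]

def fixed_weighting(channels, weights):
    """Weighted sum across synchronized channels with fixed, non-adaptive weights."""
    length = len(channels[0])
    return [sum(w * ch[n] for w, ch in zip(weights, channels)) for n in range(length)]

# Toy example: the second microphone hears the same pulse two samples later.
reference = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
lag = estimate_delay(reference, delayed, max_lag=3)
aligned = compensate(delayed, lag)
speech_component = fixed_weighting([reference, aligned], [0.5, 0.5])
```

Once the channels are aligned, the target speech adds coherently in the weighted sum while uncorrelated noise does not, which is why the synchronization step comes first.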
Example two
As shown in fig. 2, a speech enhancement processing apparatus 20 according to an embodiment of the present application includes: a memory 202 and a processor 204; a memory 202 for storing program code; and the processor 204 is used for calling the program codes to execute the voice enhancement processing method of the embodiment.
As shown in fig. 3, the processing procedure of the speech enhancement processing method executed by the speech enhancement processing apparatus according to an embodiment of the present application includes: receiving the audio signals collected by a multi-microphone array (Mic1, Mic2, …, MicN), including $x(n)$, $x(n-1)$, $x(n-N+1)$ and the like; obtaining the synchronous signals $[X_1(n), X_2(n), \ldots, X_N(n)]$ by time delay estimation and time delay compensation; and dividing the synchronous signals into a first group of synchronous signals and a second group of synchronous signals, wherein the first group of synchronous signals is input into a weighting matrix comprising fixed non-adaptive weighting coefficients, and $y_c(n)$ is output through the weighting process.
The second group of synchronous signals is input into a blocking matrix, which outputs the noise signals $u_1(n), u_2(n), \ldots, u_{N-1}(n)$. The first group of noise signals is filtered by the linear filter $W_l(n)$ to output $y_{s1}(n)$, which corresponds to the first contraction parameter $\lambda_1(n)$.
The second group of noise signals is mapped into the range (0, 1) by the Tanh activation function (which provides a convergence condition for the Legendre polynomial function), expanded by the Legendre polynomial function, and weighted and summed through the cooperation vector $[C_{0,1}(n), \ldots, C_{M,N-1}(n)]$, thereby implementing nonlinear filtering and outputting $y_{s2}(n)$, which corresponds to the second contraction parameter $\lambda_2(n)$. $y_{s1}(n)$ and $y_{s2}(n)$ are then collaboratively summed to output $y_s(n)$.
A cancellation operation is performed on $y_c(n)$ and $y_s(n)$ to obtain the output signal $y(n)$.
The error $e(n)$ is determined from the output signal $y(n)$ and the desired signal $d(n)$, normalized least mean square (NLMS) iteration is performed according to $e(n)$, the filtering weights of the linear filtering and the nonlinear filtering are updated based on the iteration result, and the filtering operation is executed based on the updated weights to obtain the speech enhancement result.
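The signal flow of fig. 3 for a single sample can be sketched as below. Several details are assumptions because the text does not fix them: the blocking matrix is taken to be an adjacent-difference matrix (a common choice that cancels a signal common to all channels), the cancellation is taken to be a subtraction, the error is taken as the output minus the desired signal, and only the linear branch is shown. The names `blocking_matrix` and `gsc_sample` and all numeric values are hypothetical.

```python
def blocking_matrix(num_channels):
    """Adjacent-difference matrix with N-1 rows; every row sums to zero."""
    rows = []
    for i in range(num_channels - 1):
        row = [0.0] * num_channels
        row[i], row[i + 1] = 1.0, -1.0
        rows.append(row)
    return rows

def gsc_sample(x, fixed_weights, linear_weights):
    """Process one synchronized sample vector x and return (y_c, u, y)."""
    # Upper path: fixed, non-adaptive weighting gives the speech component.
    y_c = sum(w * xi for w, xi in zip(fixed_weights, x))
    # Lower path: the blocking matrix removes the common (speech) part.
    u = [sum(b * xi for b, xi in zip(row, x)) for row in blocking_matrix(len(x))]
    # Linear branch only; a nonlinear branch would add its own term to y_s.
    y_s = sum(w * ui for w, ui in zip(linear_weights, u))
    # Cancellation (assumed here to be a subtraction).
    return y_c, u, y_c - y_s

# Toy example: identical speech on both channels plus channel-specific noise.
speech = 1.0
noise = [0.3, -0.1]
x = [speech + v for v in noise]
y_c, u, y = gsc_sample(x, [0.5, 0.5], [0.25])
deviation = y - speech  # error with respect to the desired (clean) signal
```

In this toy case the weight 0.25 is chosen by hand so that the residual noise in the speech component is cancelled; in the described scheme that weight would be reached by the NLMS iteration instead.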
Example three
As shown in fig. 4, the home appliance 1 according to an embodiment of the present application includes the speech enhancement processing device 20 according to the second embodiment.
The home appliance 1 may further include a microphone 70 to perform a voice enhancement process on the audio signal collected by the microphone 70, wherein the microphone 70 may be a microphone array composed of a plurality of microphones.
Specifically, the household appliance includes any one of a refrigerator, a washing machine, an air conditioner, an oven, an electric cooker, a microwave oven and a floor sweeping robot.
Example four
As shown in fig. 5, a speech enhancement system according to an embodiment of the present application includes: the speech enhancement processing apparatus 20 according to the above embodiment; and the clock device 30 is electrically connected with the voice enhancement processing device 20 and is used for providing an operation clock for the voice enhancement processing device 20.
In this embodiment, the clock device 30 includes a crystal oscillator with high frequency stability and a phase-locked loop (PLL) frequency synthesizer that generates multiple low-phase-noise outputs. The frequency synthesizer consists of a frequency reference, a phase detector, a charge pump, a loop filter and a voltage-controlled oscillator (VCO). The PLL-based frequency synthesizer adds two frequency dividers, one for dividing down the reference frequency and the other for dividing the VCO output, to adjust the operating frequency.
In the above embodiment, the system further includes a reset device 40 electrically connected to the speech enhancement processing device 20. The reset device 40 includes a power-on reset module and a monitoring module; the power-on reset module is used for loading the firmware required by the speech enhancement processing device 20, and the monitoring module is used for monitoring the running state of the speech enhancement processing device 20.
In this embodiment, the reset device 40 is provided to ensure the normal operation of the system and serves as a safeguard against erroneous locking, thereby improving the operating stability of the voice enhancement system.
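The role of the two dividers can be shown with the standard integer-N PLL relation, in which the output frequency equals the reference frequency divided by the reference divider and multiplied by the feedback divider. The sketch below is illustrative only: the function name `pll_output_hz` and the 24 MHz crystal and divider values are hypothetical and do not come from the application.

```python
def pll_output_hz(crystal_hz, ref_divider, feedback_divider):
    """Locked VCO frequency of an integer-N PLL."""
    # The reference divider lowers the crystal frequency seen by the phase detector.
    comparison_hz = crystal_hz / ref_divider
    # In lock, the divided VCO frequency equals the comparison frequency.
    return comparison_hz * feedback_divider

# Hypothetical values: 24 MHz crystal, reference divider 3, feedback divider 25.
vco_hz = pll_output_hz(24_000_000, 3, 25)
```

Changing either divider changes the operating frequency in steps of the comparison frequency, which is how the synthesizer adjusts the clock without changing the crystal.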
In any of the above embodiments, further comprising: a power supply device 50 electrically connected to the speech enhancement processing device 20 for supplying power to the speech enhancement processing device 20; and a communication device 60 electrically connected to the speech enhancement processing device 20 for transmitting the processing result of the speech enhancement processing device 20.
In this embodiment, the power supply device 50 supplies power to the other devices. Its DC power supply is composed of an LDO regulator and a high-efficiency DC-DC converter: the DC-DC converter provides 5 V/1 A and 3.3 V/2 A with an efficiency of up to 90%, and the LDO provides 1.8 V/1.5 A. The power supply ripple does not exceed 10 mV, which ensures the core operating performance of the speech enhancement processing device 20.
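The rail figures above imply a simple power budget, worked through below. The voltages, currents and the 90% figure are taken from the text; the helper names are hypothetical, and treating 90% as the operating efficiency (the text says "up to") is an assumption, so the computed input power is a best-case value.

```python
def rail_power_w(volts, amps):
    """Maximum output power of one supply rail."""
    return volts * amps

def input_power_w(output_w, efficiency):
    """Input power a converter draws to deliver output_w at the given efficiency."""
    return output_w / efficiency

dcdc_out_w = rail_power_w(5.0, 1.0) + rail_power_w(3.3, 2.0)  # 5 V and 3.3 V rails
dcdc_in_w = input_power_w(dcdc_out_w, 0.90)                   # best case at 90%
ldo_out_w = rail_power_w(1.8, 1.5)                            # 1.8 V LDO rail
```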
In addition, the communication device 60 may include a Combo two-in-one module for WIFI and BLE.
The speech enhancement processing device 20 may be implemented by a dedicated ASIC (application specific integrated circuit) or SOC (system on chip), or may be implemented separately on the signal processing module 208 and the control module 206.
In any of the above embodiments, the speech enhancement processing apparatus 20 includes: a signal processing module 208, configured to execute the speech enhancement processing method according to the embodiment of the first aspect of the present application; a control module 206 for controlling the signal processing module 208 to execute the speech enhancement processing method.
In this embodiment, the control module 206 is mainly a microcontroller that includes a RISC (reduced instruction set computer) processor core, a high-performance CPU and a low-power mode, and can ensure deterministic operation. It also integrates sleep-state support, multiple power domains and architecture-based software control, which facilitates interrupt handling and provides a hardware guarantee for the operation of the signal processing module 208.
With the voice enhancement system as defined above, running the speech enhancement processing method on this platform provides a low signal-to-noise ratio operating environment for speech enhancement processing and can improve the consistency of the microphone array, thereby improving the speech enhancement processing effect.
Example five
As shown in fig. 6, a home appliance according to another embodiment of the present application includes the voice enhancement system according to the fourth embodiment, that is, the home appliance can include the voice enhancement processing device 20, the clock device 30, the reset device 40, the power supply device 50, the communication device 60, and the like according to the fourth embodiment. In addition, the home appliance may further include a microphone 80 to perform a voice enhancement process on the audio signal collected by the microphone 80, where the microphone 80 may be a microphone array composed of a plurality of microphones.
The computer-readable storage medium provided in the embodiments of the present application stores a computer program which, when executed by a processor, implements the steps of the speech enhancement processing method defined in any one of the above technical solutions. The storage medium therefore has the technical effects of the speech enhancement processing method defined in any one of the above technical solutions, and details are not repeated herein.
Specifically, the household appliance includes any one of a refrigerator, a washing machine, an air conditioner, an oven, an electric cooker, a microwave oven and a floor sweeping robot.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.
The term "plurality" means two or more unless expressly limited otherwise. The terms "mounted," "connected," "fixed," and the like are to be construed broadly, and for example, "connected" may be a fixed connection, a removable connection, or an integral connection; "coupled" may be direct or indirect through an intermediary. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the scope of the claims of the application and their equivalents, and it is intended that the present application also include such changes and modifications.

Claims (14)

1. A method of speech enhancement processing, comprising:
performing delay compensation on audio signals collected by a multi-microphone array to configure synchronous audio signals, wherein the synchronous audio signals comprise a first group of synchronous signals and a second group of synchronous signals;
performing a weighting process on the first set of synchronization signals to generate speech processing components;
extracting noise signals in the second group of synchronous signals, and performing filtering processing on the noise signals by adopting a self-adaptive filter to generate noise processing components;
controlling the noise processing component to perform a cancellation operation on the speech processing component to generate an output signal;
and updating the adaptive filter according to the relation between the output signal and the expected signal, so as to carry out filtering operation on the voice processing component according to the updated adaptive filter and output an enhanced voice signal.
2. The method according to claim 1, wherein the weighting the first group of synchronization signals to generate the speech processing component comprises:
configuring a corresponding weighting matrix by adopting a fixed non-adaptive weighting coefficient;
and carrying out weighting processing on the first group of synchronous signals according to the weighting matrix so as to cancel noise of the multi-microphone array and generate the voice processing component.
3. The speech enhancement processing method according to claim 1, wherein the adaptive filter comprises a linear filter and a nonlinear filter, and the extracting a noise signal from the second set of synchronization signals and performing a filtering process on the noise signal by using the adaptive filter to generate a noise processing component specifically comprises:
filtering the target voice in the second group of synchronous signals by adopting a blocking matrix to obtain noise signals, wherein the noise signals comprise a first group of noise signals and a second group of noise signals;
performing a linear filtering operation on the first set of noise signals to filter out linear noise and generate a first set of components;
performing a nonlinear filtering operation on the second set of noise signals with the nonlinear filter to filter out nonlinear noise and generate a second set of components;
wherein the blocking matrix is determined as the linear filter.
4. The speech enhancement processing method according to claim 3, wherein the performing a nonlinear filtering operation on the second set of noise signals to filter out nonlinear noise and generate a second set of components comprises:
performing activation processing on the second set of noise signals to map the second set of noise signals to a specified range and generate intermediate noise processing signals;
adopting a Legendre polynomial function to perform extension processing on the intermediate noise processing signal so as to generate a plurality of extension signals;
performing a compensation operation on the plurality of extension signals using a cooperation vector to output the second set of components,
wherein the activation processing uses an activation function configured as a hyperbolic tangent function.
5. The method according to claim 4, wherein the updating the adaptive filter according to the relationship between the output signal and the desired signal to perform the filtering operation on the speech processing component according to the updated adaptive filter and output the enhanced speech signal comprises:
determining a deviation value between the output signal and the desired signal;
controlling the linear filter and the nonlinear filter to execute iterative operation by adopting a normalized least mean square function according to the deviation value so as to enable the first group of components to approximate to first expected noise and the second group of components to approximate to second expected noise;
detecting that the deviation value is reduced until the output signal approaches the desired signal, and determining the updated filtering weight of the linear filter and the updated synergistic factor of the nonlinear filter;
and generating the updated adaptive filter according to the updated filtering weight and the updated synergistic factor.
6. The speech enhancement processing method according to any one of claims 1 to 5, wherein the performing delay compensation on the audio signals collected by the multi-microphone array to configure a synchronous audio signal comprises:
performing time delay estimation on the audio signal;
and performing time delay compensation according to the time delay estimation result to complete the synchronous processing of the audio signals and configure the synchronous audio signals.
7. A speech enhancement processing apparatus, comprising: a memory and a processor;
the memory for storing program code;
the processor, configured to invoke the program code to execute the speech enhancement processing method according to any one of claims 1 to 6.
8. A speech enhancement system, comprising:
the speech enhancement processing apparatus of claim 7;
and the clock device is electrically connected with the voice enhancement processing device and is used for providing a working clock for the voice enhancement processing device.
9. The speech enhancement system of claim 8, further comprising:
the reset device is electrically connected with the voice enhancement processing device and comprises a power-on reset module and a monitoring module, wherein the power-on reset module is used for loading firmware required by the voice enhancement processing device, and the monitoring module is used for monitoring the running state of the voice enhancement processing device.
10. The speech enhancement system of claim 8, further comprising:
the power supply device is electrically connected with the voice enhancement processing device and is used for supplying power to the voice enhancement processing device;
and the communication device is electrically connected with the voice enhancement processing device and is used for transmitting the processing result of the voice enhancement processing device.
11. The speech enhancement system according to any one of claims 8 to 10, wherein the speech enhancement processing means comprises:
a signal processing module for performing the speech enhancement processing method of any one of claims 1 to 6;
and the control module is used for controlling the signal processing module to execute the voice enhancement processing method.
12. An appliance, comprising:
the speech enhancement processing device of claim 7.
13. An appliance, comprising:
the speech enhancement system of any one of claims 8 to 11.
14. A computer-readable storage medium having stored thereon a speech enhancement processing program, characterized in that the speech enhancement processing program, when executed by a processor, implements the speech enhancement processing method of any one of claims 1 to 6.
CN201911343570.XA 2019-12-24 2019-12-24 Voice enhancement processing method, device and system, household appliance and storage medium Pending CN111048105A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911343570.XA CN111048105A (en) 2019-12-24 2019-12-24 Voice enhancement processing method, device and system, household appliance and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911343570.XA CN111048105A (en) 2019-12-24 2019-12-24 Voice enhancement processing method, device and system, household appliance and storage medium

Publications (1)

Publication Number Publication Date
CN111048105A true CN111048105A (en) 2020-04-21

Family

ID=70238713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911343570.XA Pending CN111048105A (en) 2019-12-24 2019-12-24 Voice enhancement processing method, device and system, household appliance and storage medium

Country Status (1)

Country Link
CN (1) CN111048105A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101976565A (en) * 2010-07-09 2011-02-16 瑞声声学科技(深圳)有限公司 Dual-microphone-based speech enhancement device and method
CN102938254A (en) * 2012-10-24 2013-02-20 中国科学技术大学 Voice signal enhancement system and method
CN104835503A (en) * 2015-05-06 2015-08-12 南京信息工程大学 Improved GSC self-adaptive speech enhancement method
CN110491405A (en) * 2019-08-21 2019-11-22 南京信息工程大学 Microphone array voice enhancement method based on collaboration nonlinear adaptive filtering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
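张建国 (Zhang Jianguo): "《高频电子技术》" [High-Frequency Electronic Technology], 31 July 2018 *
赵益波 (Zhao Yibo): "麦克风阵列的协同自适应滤波语音增强方法" [Collaborative adaptive filtering speech enhancement method for microphone arrays], 《现代电子技术》 [Modern Electronics Technique] *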

Similar Documents

Publication Publication Date Title
CN107316649B (en) Speech recognition method and device based on artificial intelligence
RU2450368C2 (en) Multiple microphone voice activity detector
CN102365679B (en) Signal processing device and signal processing method
US20190172450A1 (en) Voice enhancement in audio signals through modified generalized eigenvalue beamformer
CN106537939A (en) Method of optimizing parameters in a hearing aid system and a hearing aid system
KR20080002990A (en) Systems and methods for reducing audio noise
EP1428412A1 (en) Hearing aid with performance-optimized power consumption for variable clock, supply voltage and dsp processing parameters
US9036816B1 (en) Frequency domain acoustic echo cancellation using filters and variable step-size updates
US20170287502A1 (en) Residual Interference Suppression
US9583120B2 (en) Noise cancellation apparatus and method
US8352256B2 (en) Adaptive reduction of noise signals and background signals in a speech-processing system
JP6190373B2 (en) Audio signal noise attenuation
CN111048105A (en) Voice enhancement processing method, device and system, household appliance and storage medium
Dewasthale et al. Acoustic noise cancellation using adaptive filters: A survey
CN108597531B (en) Method for improving dual-channel blind signal separation through multi-sound-source activity detection
CN109360578A (en) Echo cancel method, audio frequency apparatus and the readable storage medium storing program for executing of audio frequency apparatus
CN113299261A (en) Active noise reduction method and device, earphone, electronic equipment and readable storage medium
Doblinger An adaptive Kalman filter for the enhancement of noisy AR signals
CN108986837A (en) A kind of filter update method and device
Shin et al. An affine projection algorithm with update-interval selection
Vairetti et al. Sparse linear parametric modeling of room acoustics with orthonormal basis functions
Zeller et al. Efficient adaptive DFT-domain Volterra filters using an automatically controlled number of quadratic kernel diagonals
Eckhard et al. Data-based controller tuning: Improving the convergence rate
CN110518588B (en) Filtering method, device, equipment and power electronic device
CN113365176B (en) Method and device for realizing active noise elimination and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200421)