CN114627847A

CN114627847A - Active noise reduction method and system based on frequency spectrum mapping

Info

Publication number: CN114627847A
Application number: CN202210231973.0A
Authority: CN
Inventors: 汪付强; 夏源; 袁从刚; 张鹏; 吴晓明; 张建强
Original assignee: Shandong Computer Science Center National Super Computing Center in Jinan
Current assignee: Qilu University of Technology; Shandong Computer Science Center National Super Computing Center in Jinan
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2022-06-14

Abstract

The invention belongs to the technical field of active noise reduction and provides an active noise reduction method and system based on spectrum mapping. A noise signal or a noisy speech signal is input into an LSTM network, which maps it to the spectrum of the noise signal or of the noise component in the noisy speech signal. The noise is then cancelled according to the principle of destructive interference, reducing the noise sound pressure level at the error microphone. Finally, the error signal is used to calculate a loss function that is fed back to the LSTM network until the loss is minimized and the network converges. By applying deep learning to active noise reduction, the method improves the noise reduction effect and reduces the dependence of traditional signal processing methods on hardware equipment.

Description

Active noise reduction method and system based on frequency spectrum mapping

Technical Field

The invention belongs to the technical field of active noise reduction, and particularly relates to an active noise reduction method and system based on frequency spectrum mapping.

Background

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.

With the development of science, technology and industry, people's quality of life has continuously improved, but noise pollution has grown alongside it and has gradually become an important factor affecting quality of life and physical and mental health. Research shows that mild noise can irritate people and affect sleep quality; long-term exposure to noise of 85 dB or more impairs hearing, and noise above 120 dB may cause hearing loss. Noise also increases the secretion of cortisol in the human body, which raises the incidence of diseases such as hypertension, heart disease and gastric ulcer. Effective noise control is therefore receiving increasing attention, and fast, effective noise control is both a focus and a difficulty of current research.

At present, noise control is mainly divided into passive noise reduction and active noise reduction. Most passive noise reduction methods control the noise physically at three points, namely generation, transmission and reception, and work well on high-frequency noise. Active noise reduction is based on the principle of destructive interference: it cancels the noise by generating an anti-noise signal with the same amplitude as the noise signal and the opposite phase, which compensates for the poor low-frequency performance of passive noise reduction. Most traditional active noise reduction methods use a linear filter as the controller and update the filter coefficients in real time with an adaptive algorithm to generate the anti-noise signal, so as to minimize the sound pressure at an observation point. In practice, however, the performance of such linear systems is degraded by nonlinear distortion in devices such as loudspeakers and microphones, yet current active noise reduction methods are still mainly based on signal processing, updating filter coefficients with an adaptive filtering algorithm to generate the anti-noise signal.

Disclosure of Invention

In order to solve at least one technical problem in the background art, the present invention provides an active noise reduction method and system based on spectrum mapping, which uses complex spectral mapping with an LSTM network to estimate the imaginary spectrogram and the real spectrogram of the noise signal from noisy speech, obtains an anti-noise signal through an inverse Fourier transform, and cancels the noise signal.

In order to achieve the purpose, the invention adopts the following technical scheme:

the first aspect of the present invention provides an active noise reduction method based on spectrum mapping, which includes the following steps:

acquiring a reference speech signal;

training a neural network model according to the reference speech signal and the impulse responses under the physical structures of different sound field environments, and performing spectrum mapping with the trained neural network model to generate an estimated noise signal;

inverting the phase of the estimated noise signal and playing it to obtain an anti-noise signal;

cancelling the noise signal, or the noise component of the noisy speech signal, with the anti-noise signal according to the principle of destructive interference.

A second aspect of the present invention provides an active noise reduction system based on spectrum mapping, comprising:

a speech signal acquisition module configured to: acquire a reference speech signal;

a noise signal estimation module configured to: train a neural network model according to the reference speech signal and the impulse responses under the physical structures of different sound field environments, and perform spectrum mapping with the trained neural network model to generate an estimated noise signal;

an anti-noise signal generation module configured to: invert the phase of the estimated noise signal and play it to obtain an anti-noise signal;

a noise cancellation module configured to: cancel the noise signal, or the noise component of the noisy speech signal, with the anti-noise signal according to the principle of destructive interference.

A third aspect of the invention provides a computer-readable storage medium.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for active noise reduction based on spectral mapping as described above.

A fourth aspect of the invention provides a computer apparatus.

A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of a method for active noise reduction based on spectral mapping as described above when executing the program.

Compared with the prior art, the invention has the beneficial effects that:

The invention uses an LSTM network to estimate the real spectrogram and the imaginary spectrogram of the input noise signal, or of the noise component in a noisy speech signal, and cancels that noise according to the principle of destructive interference, thereby improving the noise reduction effect.

The invention applies deep learning to the field of active noise reduction: the real spectrogram and the imaginary spectrogram of an ideal anti-noise signal are estimated with a pre-trained LSTM network rather than from an anti-noise signal generated by a filter, which reduces the dependence of traditional signal processing methods on hardware equipment.

Drawings

The accompanying drawings, which form a part of this specification, are included to provide a further understanding of the invention. The exemplary embodiments of the invention and their description serve to explain the invention and do not limit it.

FIG. 1 is a general flowchart of an active noise reduction method based on spectrum mapping according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of an active noise reduction data processing flow based on spectrum mapping according to an embodiment of the present invention;

FIG. 3 is a diagram of an LSTM network architecture according to an embodiment of the present invention.

Detailed Description

The invention is further described with reference to the following figures and examples.

It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof.

Example one

This embodiment provides an active noise reduction method based on spectrum mapping, which uses complex spectral mapping with an LSTM network to estimate the imaginary spectrogram and the real spectrogram of the noise signal from noisy speech, obtains an anti-noise signal through an inverse Fourier transform, and cancels the noise signal. By applying deep learning to active noise reduction, the invention improves the noise reduction effect and reduces the dependence of traditional signal processing methods on hardware equipment. To overcome the shortcomings of traditional active noise reduction methods, the invention proposes a complex spectral mapping method for active noise reduction: the real spectrogram and the imaginary spectrogram of the anti-noise signal are estimated so that the generated ideal anti-noise has the same amplitude as the primary noise and the opposite phase, with the aim of completely cancelling the primary noise.

As shown in fig. 1-2, an active noise reduction method based on spectrum mapping includes the following steps:

S101: acquiring reference speech signal data x(t) at a reference microphone;

S102: converting the reference speech signal x(t) from the time domain to the frequency domain through a short-time Fourier transform (STFT) to obtain a real spectrum X_r and an imaginary spectrum X_i;

S103: simulating the reverberation of the sound field environment with the image method according to the designed physical structure of the environment, wherein the design covers the sound field shape and the sensor placement positions, and the impulse responses of the primary path and the secondary path are estimated with the image method;

S104: training a neural network model on the real spectrum X_r and the imaginary spectrum X_i of the reference speech signal under the impulse responses of the physical structures of different sound field environments, and performing spectrum mapping with the trained neural network model to generate the real spectrogram Y_r and the imaginary spectrogram Y_i of an estimated noise signal;

The training process of the neural network model comprises the following steps: simulating the sound field impulse responses of the primary path and the secondary path with the image method, and training the model under the different impulse responses; acquiring the error signal e(t) detected at the error microphone, calculating a loss function from the error signal, and feeding it back to the neural network model until the loss function is minimized and the network converges;

S105: applying an inverse short-time Fourier transform (ISTFT) to the real spectrogram and the imaginary spectrogram of the noise signal to obtain the estimated noise signal y(t);

S106: inverting the phase of the noise signal y(t) and playing it through the loudspeaker to obtain the anti-noise signal a(t);

S107: because the anti-noise signal a(t) and the noise signal y(t) have the same amplitude and opposite phases, cancelling the noise signal, or the noise component of the noisy speech signal, with the anti-noise signal a(t) according to the principle of destructive interference;

S108: evaluating the noise reduction effect of the system using the normalized mean square error (NMSE), short-time objective intelligibility (STOI), and perceptual evaluation of speech quality (PESQ).
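Steps S102 and S105 to S107 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `stft`, `istft` and `nmse_db` are hypothetical, the trained LSTM network of S104 is replaced by a placeholder that scales the input spectrum by 0.9, the acoustic paths are treated as ideal, and the NMSE is taken as the ratio of residual energy to noise energy in dB, a common definition that the text itself does not spell out.

```python
import numpy as np

WIN = 320   # 20 ms window at a 16 kHz sampling rate
HOP = 80    # frame shift of one quarter of the window

def stft(x, win=WIN, hop=HOP):
    """Return the real and imaginary spectrograms of x, shape (frames, bins)."""
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] * window for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    return spec.real, spec.imag

def istft(real, imag, win=WIN, hop=HOP):
    """Weighted overlap-add inverse of stft()."""
    window = np.hanning(win)
    frames = np.fft.irfft(real + 1j * imag, n=win, axis=1)
    length = (frames.shape[0] - 1) * hop + win
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win] += frame * window
        norm[i * hop:i * hop + win] += window ** 2
    return out / np.maximum(norm, 1e-8)

def nmse_db(error, noise):
    """Residual energy relative to noise energy, in dB (assumed definition)."""
    return 10.0 * np.log10(np.sum(error ** 2) / np.sum(noise ** 2))

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)        # 1 s of reference noise x(t)
X_r, X_i = stft(x)                    # S102: real and imaginary spectra
Y_r, Y_i = 0.9 * X_r, 0.9 * X_i       # S104: placeholder for the network output
y = istft(Y_r, Y_i)                   # S105: estimated noise y(t)
a = -y                                # S106: phase-inverted anti-noise a(t)
e = x + a                             # S107: residual after destructive interference
inner = slice(WIN, -WIN)              # skip the edges, where the window sum is tiny
print(round(nmse_db(e[inner], x[inner]), 2))
```

With the 0.9 placeholder the residual is one tenth of the noise, so the printed NMSE is about -20 dB; a trained network would take the place of that single line.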

After acquiring reference voice signal data x (t), preprocessing the reference voice signal data, including the following steps:

(1) classifying the reference voice signal, and dividing the reference voice signal into voice data without noise and a noise signal;

The data in the data set are stored as HDF5 files; noise-free speech data are placed in the class sph and noise in the class noi. Noisy speech is a mixture of a noise signal and a noise-free speech signal.

When the reference signal is a noise signal, a number of noises from a sound effect library are selected as the training set, and a further set of noises is selected as the verification set;

when the reference signal is a noisy speech signal, the noisy speech is obtained by mixing a noise signal selected from the sound effect library with randomly selected noise-free speech data.

For example, when the reference signal is a noise signal, 10000 kinds of noise are selected from a sound effect library as the training set, and engine noise, factory noise and the like from NOISEX-92 are selected as the verification set to check the noise reduction effect of the network;

when the reference signal is a noisy speech signal, the noisy signal is obtained by randomly selecting from the 10000 noise signals and the clean speech in the TIMIT corpus and mixing them.

(2) The signal is clipped and framed.

For example, the speech data are clipped to signals 6 minutes long, the sampling rate of the input data set is 16 kHz, and the input signal is divided into 20 ms frames; a 5 ms zero frame is added before the frames, and successive frames overlap by 5 ms. The signal-to-noise ratio of the mixture is randomly selected from {-5 dB, -4 dB, -3 dB, -2 dB, -1 dB, 0 dB}. These values are not limiting and can be adjusted according to the data sampling rate. The processed signal is passed to the LSTM network after a short-time Fourier transform (STFT).
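The mixing of clean speech and noise at a randomly chosen signal-to-noise ratio can be sketched as below. The function name `mix_at_snr` and the synthetic stand-ins for the speech and noise recordings are assumptions made for illustration; only the SNR set and the 16 kHz rate come from the text.

```python
import numpy as np

SNR_CHOICES_DB = (-5, -4, -3, -2, -1, 0)   # SNR set given in the example

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 220 * t)   # placeholder for a clean utterance
noise = rng.standard_normal(fs)        # placeholder for a library noise
snr_db = int(rng.choice(SNR_CHOICES_DB))
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db)

# Check that the mixture really has the requested SNR.
achieved = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled_noise ** 2))
print(snr_db, round(achieved, 6))
```

Returning the scaled noise alongside the mixture is a convenience here, since the scaled noise component is what the network is asked to estimate when the input is noisy speech.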

In S103, the simulated sound field space is a rectangular space, and the specific arrangement method is as follows:

for example, the sound field space is simulated as a rectangular space of 3m × 4m × 2m, where the position of the reference microphone for receiving the noise signal is (1.5, 1, 1) m, the position of the speaker for generating the anti-noise signal is (1.5, 2.5, 1) m, and the position of the error microphone for detecting the error signal is (1.5, 3, 1) m.

The estimation of the primary and secondary paths includes:

For example, the image method is used to simulate room impulse responses (IRs) for different reverberation times. In the training phase, the reverberation time used to generate each impulse response is randomly selected from 0.15 s, 0.175 s, 0.2 s, 0.225 s and 0.25 s, and the impulse response length is set to 512. The reverberation times are not limited to these values and may be adjusted according to the physical structure.
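A simplified image-method simulation of the two paths in the example room might look like the following sketch. Several things here are assumptions rather than details from the text: the function name `image_method_rir`, a single wall reflection coefficient derived from the reverberation time through Sabine's formula, a speed of sound of 343 m/s, integer-sample delays without fractional-delay filtering, a maximum image order of 3, and the reading of the primary path as reference microphone to error microphone and the secondary path as loudspeaker to error microphone.

```python
import itertools
import numpy as np

def image_method_rir(room, src, mic, t60, fs=16000, length=512, max_order=3, c=343.0):
    """Simplified image-method impulse response for a rectangular room."""
    room, src, mic = (np.asarray(v, dtype=float) for v in (room, src, mic))
    volume = room.prod()
    surface = 2.0 * (room[0] * room[1] + room[0] * room[2] + room[1] * room[2])
    alpha = min(0.161 * volume / (surface * t60), 0.99)   # Sabine absorption (assumed)
    beta = np.sqrt(1.0 - alpha)                           # uniform reflection coefficient
    h = np.zeros(length)
    orders = range(-max_order, max_order + 1)
    for n in itertools.product(orders, repeat=3):
        for q in itertools.product((0, 1), repeat=3):
            n_arr, q_arr = np.array(n), np.array(q)
            image = (1 - 2 * q_arr) * src + 2 * n_arr * room   # mirrored source position
            dist = np.linalg.norm(image - mic)
            delay = int(round(dist / c * fs))
            if delay < length:
                reflections = np.abs(n_arr - q_arr).sum() + np.abs(n_arr).sum()
                h[delay] += beta ** reflections / (4 * np.pi * max(dist, 1e-3))
    return h

rng = np.random.default_rng(0)
room = (3.0, 4.0, 2.0)
ref_mic, speaker, err_mic = (1.5, 1.0, 1.0), (1.5, 2.5, 1.0), (1.5, 3.0, 1.0)
t60 = float(rng.choice([0.15, 0.175, 0.2, 0.225, 0.25]))   # random reverberation time

primary = image_method_rir(room, ref_mic, err_mic, t60)     # assumed primary path
secondary = image_method_rir(room, speaker, err_mic, t60)   # assumed secondary path
print(primary.shape, np.flatnonzero(primary)[0], np.flatnonzero(secondary)[0])
```

The first nonzero tap of each response is the direct path: 2 m and 0.5 m correspond to delays of 93 and 23 samples at 16 kHz. A production simulation would normally use fractional delays and per-wall coefficients.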

In S103, after the Fourier transform, a zero frame with the same length as the frame to be predicted is added before the frame of the reference speech signal so that future frames can be predicted;

if the length of the frame to be predicted is m, a zero frame of length m is added before a reference speech frame of length n, so that the total length of the frame with the zero frame added is (m + n); because the training target of the neural network remains unchanged, this is equivalent to predicting a future frame of length m each time a frame of length n is taken.

One quarter of the window length of the reference speech signal is used as the frame shift, so that multiple predicted frames are obtained for each time-frequency unit.
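One possible reading of the zero-frame step, applied in the time domain for simplicity, is sketched below. The function name `frame_with_lookahead` is hypothetical, and the choice of n = 320 samples (20 ms at 16 kHz) and m = 80 samples (5 ms) is taken from the earlier numeric example rather than from this passage.

```python
import numpy as np

def frame_with_lookahead(x, win=320, predict=80):
    """Split x into frames of length win with a hop of win // 4,
    then put `predict` zeros in front of every frame (length m + n)."""
    hop = win // 4
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] for i in range(n_frames)])
    zeros = np.zeros((n_frames, predict))
    return np.concatenate([zeros, frames], axis=1)

x = np.arange(1600, dtype=float)     # 0.1 s ramp, so sample indices are easy to read
frames = frame_with_lookahead(x)
print(frames.shape)                  # (number of frames, m + n)
```

Because the hop is a quarter of the window, each sample away from the signal edges appears in four consecutive frames, which is what allows several predictions to be combined for the same time-frequency unit.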

As shown in FIG. 3, in S104 the training process covers training on noise and training on noisy speech signals. The neural network model combines a deep neural network (DNN) with a four-layer long short-term memory (LSTM) network: fully connected layers are added at both ends of the four-layer LSTM network, so that the advantages of both the fully connected layers and the LSTM are retained;

the training samples for training the model are divided into noise signals and noisy speech signals:

if the training sample is a noise signal, the training target should be an ideal anti-noise signal;

if the training sample is a noisy speech signal, the training target should be a noise component in the noisy speech signal.
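The layer arrangement described above (a fully connected layer, four LSTM layers, and a fully connected output layer) can be sketched as an untrained, framework-free forward pass. The hidden size of 64, the tanh activation on the input layer, the random initialization and all function names are assumptions for illustration; the text gives only the layer order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_layer(x, w_in, w_rec, bias):
    """Run one LSTM layer over x of shape (frames, features)."""
    hidden = w_rec.shape[0]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in x:
        z = x_t @ w_in + h @ w_rec + bias
        # Gate order in z: input, forget, candidate, output.
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in (0, 1, 3))
        g = np.tanh(z[2 * hidden:3 * hidden])
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def init_params(n_bins, hidden, rng, n_lstm=4):
    def dense(rows, cols):
        return rng.standard_normal((rows, cols)) * 0.1, np.zeros(cols)
    return {
        "fc_in": dense(2 * n_bins, hidden),
        "lstm": [(rng.standard_normal((hidden, 4 * hidden)) * 0.1,
                  rng.standard_normal((hidden, 4 * hidden)) * 0.1,
                  np.zeros(4 * hidden)) for _ in range(n_lstm)],
        "fc_out": dense(hidden, 2 * n_bins),
    }

def forward(params, x_r, x_i):
    """Map input real/imaginary spectrograms to output real/imaginary spectrograms."""
    x = np.concatenate([x_r, x_i], axis=1)
    w, b = params["fc_in"]
    h = np.tanh(x @ w + b)
    for layer in params["lstm"]:
        h = lstm_layer(h, *layer)
    w, b = params["fc_out"]
    y = h @ w + b
    n_bins = x_r.shape[1]
    return y[:, :n_bins], y[:, n_bins:]

rng = np.random.default_rng(0)
n_bins = 161                         # bins of a 320-point real FFT
params = init_params(n_bins, hidden=64, rng=rng)
x_r = rng.standard_normal((10, n_bins))
x_i = rng.standard_normal((10, n_bins))
y_r, y_i = forward(params, x_r, x_i)
print(y_r.shape, y_i.shape)
```

The two output halves play the role of the real and imaginary spectrograms of the estimated noise; in practice the same structure would be built and trained in a deep learning framework.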

In S104, the calculating a loss function according to the error signal includes:

when the reference signal is a noise signal, the loss function is the mean square value of the error signal, and the formula is as follows:

$$L = \frac{1}{m}\sum_{t=1}^{m} e(t)^{2}$$

where m is the signal length and e(t) is the error signal at the error microphone.

When the reference signal is a noisy speech signal, the loss function should be the difference between the error signal and the clean speech, which is expressed as follows:

$$L = \frac{1}{m}\sum_{t=1}^{m}\bigl(e(t) - p(t) * v(t)\bigr)^{2}$$

where p(t) is the impulse response of the primary path, v(t) is the clean speech signal, and * denotes convolution.
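The two loss functions can be written out as a short sketch. The first follows directly from the text (mean square of the error signal). The second is an interpretation: it assumes the "difference between the error signal and the clean speech" is measured as the mean square of e(t) minus the clean speech v(t) convolved with the primary-path impulse response p(t). The function names are hypothetical.

```python
import numpy as np

def loss_noise(e):
    """Mean square value of the error signal (noise-only input)."""
    return float(np.mean(e ** 2))

def loss_noisy_speech(e, p, v):
    """Assumed form: mean square of e(t) - (p * v)(t), where p * v is the
    clean speech as it would arrive at the error microphone."""
    target = np.convolve(p, v)[:len(e)]
    return float(np.mean((e - target) ** 2))

e = np.array([1.0, -1.0, 2.0])
print(loss_noise(e))  # 2.0

p = np.array([0.0, 0.5, 0.25])       # toy primary-path impulse response
v = np.array([1.0, 2.0, 3.0, 4.0])   # toy clean speech
ideal_error = np.convolve(p, v)[:len(v)]
print(loss_noisy_speech(ideal_error, p, v))  # 0.0
```

Under this reading, the second loss is zero exactly when only the clean speech remains at the error microphone, which matches the stated training target of removing the noise component while leaving the speech.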

The noise signals are read from the HDF5 file, and Adam is used as the optimizer for the entire LSTM network.
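The feedback loop (compute the error signal, evaluate the loss, update the model with Adam until it converges) can be illustrated with a deliberately tiny stand-in model. Everything specific here is an assumption: the "network" is a single gain w, the acoustic paths are ideal, the learning rate and step count are arbitrary, and `adam_step` is a hypothetical helper implementing the standard Adam update.

```python
import numpy as np

def adam_step(param, grad, state, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias-corrected moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (np.sqrt(v_hat) + eps)

t = np.arange(1600) / 16000.0
d = np.sin(2 * np.pi * 200 * t)   # noise arriving at the error microphone

w = 0.0                           # toy model: estimated noise y(t) = w * d(t)
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(500):
    y = w * d                     # estimated noise
    e = d - y                     # error signal after adding the anti-noise a = -y
    grad = -2.0 * np.mean(e * d)  # gradient of mean(e**2) with respect to w
    w = adam_step(w, grad, state)

final_loss = float(np.mean((d - w * d) ** 2))
print(round(w, 4), final_loss < 1e-6)
```

The gain converges towards 1, at which point the anti-noise matches the noise and the mean-square error signal vanishes; the LSTM network is trained on the same principle, with many parameters instead of one.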

Example two

This embodiment provides an active noise reduction system based on spectrum mapping, including:

a speech signal acquisition module configured to: acquire a reference speech signal and perform a Fourier transform on it to obtain a real spectrum and an imaginary spectrum;

a noise signal estimation module configured to: according to the designed physical structure of the sound field environment, train a neural network model on the real spectrum and the imaginary spectrum of the reference speech signal under the impulse responses of the physical structures of different sound field environments, and perform spectrum mapping with the trained neural network model to generate an estimated noise signal;

an anti-noise signal generation module configured to: invert the phase of the estimated noise signal and play it to obtain an anti-noise signal;

a noise cancellation module configured to: cancel the noise signal, or the noise component of the noisy speech signal, with the anti-noise signal according to the principle of destructive interference.

Example three

The present embodiment provides a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, implements the steps of a method for active noise reduction based on spectral mapping as described above.

Example four

The present embodiment provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the steps in the active noise reduction method based on spectrum mapping as described above.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. An active noise reduction method based on spectrum mapping is characterized by comprising the following steps:

acquiring a reference voice signal;

playing the estimated noise signal after inverting the phase to obtain an anti-noise signal;

2. The method of claim 1, wherein the design of the physical structure of the sound field environment comprises a design of a sound field shape and of sensor placement positions, the sound field shape being a rectangular space, and the sensor placement positions comprising a reference microphone position, a position of the loudspeaker that produces the anti-noise signal, and an error microphone position.

3. The active noise reduction method based on spectrum mapping as claimed in claim 1, wherein the neural network model employs a deep neural network and a four-layer long short-term memory (LSTM) network, and fully connected layers are added at the two ends of the four-layer LSTM network respectively.

4. The active noise reduction method based on spectrum mapping of claim 1, wherein training the neural network model in the impulse responses under the physical structures of the different sound field environments comprises: simulating the sound field impulse responses of the primary path and the secondary path with the image method, and training the model under the different impulse responses.

5. The active noise reduction method based on spectrum mapping according to claim 1, wherein the training of the neural network model further comprises: obtaining an error signal detected by the error microphone, calculating a loss function according to the error signal, and feeding it back to the neural network model until the loss function is minimized and the network converges.

6. The active noise reduction method based on spectrum mapping of claim 1, wherein after the reference speech signal data are obtained, the reference speech signal data are preprocessed: the reference speech signal is classified into noise-free speech data and noise signals, and the classified reference speech signal is clipped and framed.

7. The active noise reduction method based on spectrum mapping of claim 1, wherein after the reference speech signal data are obtained, the reference speech signal is transformed from the time domain to the frequency domain by a short-time Fourier transform to obtain a real spectrum and an imaginary spectrum.

8. An active noise reduction system based on spectral mapping, comprising:

an anti-noise signal generation module configured to: playing the estimated noise signal after inverting the phase to obtain an anti-noise signal;

9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of a method for active noise reduction based on spectral mapping according to any one of claims 1-7.

10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of a method for active noise reduction based on spectral mapping according to any of claims 1-7.