CN113470684A

CN113470684A - Audio noise reduction method, device, equipment and storage medium

Info

Publication number: CN113470684A
Application number: CN202110837937.4A
Authority: CN
Inventors: 张之勇; 王健宗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2021-10-01
Anticipated expiration: 2041-07-23
Also published as: CN113470684B

Abstract

The invention relates to artificial intelligence and provides an audio noise reduction method, apparatus, device and storage medium. The method preprocesses noisy audio to obtain spectrum information; processes the spectrum information with a frequency-domain signal processing network to obtain a spectrum mask feature; obtains time-frequency features from the spectrum information and the spectrum mask feature; processes the time-frequency features with a time-domain signal processing network to obtain a time-frequency mask feature; generates predicted audio from the time-frequency features and the time-frequency mask feature; adjusts the network parameters of a preset learner based on the predicted audio and clean audio to obtain a noise reduction model; and, on obtaining requested audio, performs noise reduction on the requested audio with the noise reduction model to obtain target audio. The invention can improve both the accuracy and the real-time performance of noise reduction for the requested audio. The invention also relates to blockchain technology: the target audio can be stored in a blockchain.

Description

Audio noise reduction method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to an audio noise reduction method, device, equipment and storage medium.

Background

Telephone conferences, such as remote office calls, place high demands on the real-time performance and accuracy of audio noise reduction. Current noise reduction approaches, however, usually process frame-level information over a complete speech sequence, which results in low noise reduction efficiency.

Therefore, how to improve the real-time performance and accuracy of audio noise reduction becomes a technical problem which needs to be solved urgently.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an audio noise reduction method, apparatus, device and storage medium, which can improve the noise reduction accuracy and noise reduction real-time performance of the requested audio.

In one aspect, the present invention provides an audio noise reduction method, where the audio noise reduction method includes:

obtaining an audio sample and a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency-domain signal processing network and a time-domain signal processing network;

preprocessing the noisy audio to obtain spectrum information;

processing the spectrum information based on the frequency-domain signal processing network to obtain a spectrum mask feature corresponding to the spectrum information;

obtaining time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature;

processing the time-frequency features based on the time-domain signal processing network to obtain a time-frequency mask feature;

generating predicted audio according to the time-frequency features and the time-frequency mask feature;

adjusting network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model;

and obtaining requested audio, and performing noise reduction on the requested audio based on the noise reduction model to obtain target audio.

According to a preferred embodiment of the present invention, the obtaining of the audio sample comprises:

determining the audio duration of the clean audio;

obtaining, from a recording library, audio whose duration is less than or equal to the audio duration, to obtain a plurality of recorded audios;

synthesizing the clean audio with each recorded audio in an arbitrary manner to obtain a plurality of noisy audios;

determining the plurality of noisy audios and the clean audio as the audio samples.

According to a preferred embodiment of the present invention, the preprocessing of the noisy audio to obtain spectrum information comprises:

acquiring a preset moving window function;

performing a Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;

acquiring a preset processing duration, and calculating the ratio of the audio duration to the preset processing duration;

and segmenting the spectrogram according to the preset processing duration to obtain the spectrum information, wherein the number of pieces of spectrum information equals the ratio.

According to a preferred embodiment of the present invention, the frequency-domain signal processing network comprises a gated neural network, a fully connected network, and an activation function, the gated neural network comprises a reset gate and an update gate, and the processing of the spectrum information based on the frequency-domain signal processing network to obtain the spectrum mask feature corresponding to the spectrum information comprises:

acquiring timing information of the spectrum information, wherein the timing information comprises a first spectrum at a first moment and a second spectrum at a second moment;

analyzing the first spectrum and the second spectrum based on the reset parameter of the reset gate to obtain candidate information for the second moment;

calculating an information amount for the first spectrum based on the update parameter of the update gate, the first spectrum, and the second spectrum;

generating output information for the second moment according to the first spectrum, the candidate information and the information amount, and taking the output information as the first spectrum for the next moment, until all of the timing information has been processed, to obtain a first network output of the gated neural network;

analyzing the first network output according to the weight matrix and the bias value of the fully connected network to obtain a second network output;

and processing the second network output with the activation function to obtain the spectrum mask feature.

According to a preferred embodiment of the present invention, the obtaining of the time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature comprises:

calculating amplitude information from the spectrum information, and extracting phase information from the spectrum information;

calculating the product of the amplitude information, the phase information and the spectrum mask feature to obtain a predicted spectrum;

performing an inverse Fourier transform on the predicted spectrum to obtain a predicted time-frequency signal;

and extracting features from the predicted time-frequency signal with a first preset convolution layer to obtain the time-frequency features.

According to a preferred embodiment of the present invention, the generating of the predicted audio according to the time-frequency features and the time-frequency mask feature comprises:

calculating the product of the time-frequency features and the time-frequency mask feature to obtain enhanced features;

upsampling the enhanced features with a second preset convolution layer to obtain a restored signal;

acquiring the initial information of the restored signal at each time step;

if any time step has a plurality of pieces of initial information, calculating the average of those pieces to obtain the overlapping information at that time step;

generating prediction information according to the initial information and the overlapping information;

and converting the prediction information to obtain the predicted audio.

According to a preferred embodiment of the present invention, the adjusting of the network parameters of the preset learner based on the predicted audio and the clean audio to obtain the noise reduction model comprises:

acquiring first time-domain information of the clean audio and second time-domain information of the predicted audio;

calculating a loss value of the preset learner according to a loss formula [formula not reproduced in this text], wherein loss denotes the loss value, y_t denotes the first time-domain information, and a second symbol [likewise not reproduced] denotes the second time-domain information;

and adjusting the network parameters according to the loss value until the loss value no longer decreases, to obtain the noise reduction model.

In another aspect, the present invention further provides an audio noise reduction apparatus, including:

an obtaining unit, configured to obtain an audio sample and a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency-domain signal processing network and a time-domain signal processing network;

a preprocessing unit, configured to preprocess the noisy audio to obtain spectrum information;

a processing unit, configured to process the spectrum information based on the frequency-domain signal processing network to obtain a spectrum mask feature corresponding to the spectrum information;

the obtaining unit being further configured to obtain time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature;

the processing unit being further configured to process the time-frequency features based on the time-domain signal processing network to obtain a time-frequency mask feature;

a generating unit, configured to generate predicted audio according to the time-frequency features and the time-frequency mask feature;

an adjusting unit, configured to adjust the network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model;

the obtaining unit being further configured to obtain requested audio, and to perform noise reduction on the requested audio based on the noise reduction model to obtain target audio.

In another aspect, the present invention further provides an electronic device, including:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the audio noise reduction method.

In another aspect, the present invention also provides a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the audio noise reduction method.

According to the above technical solution, preprocessing converts the whole noisy audio into spectrum information, which improves the efficiency of processing the spectrum information and hence the noise reduction efficiency for the noisy audio. The frequency-domain signal processing network reduces the noise of the noisy audio in the frequency domain, and the time-domain signal processing network enhances the phase information of the target sound source in the time domain, so that noise is reduced in both domains. This improves the noise reduction accuracy of the noise reduction model and, in turn, the speech enhancement effect of the target audio.

Drawings

Fig. 1 is a flow chart of a preferred embodiment of the audio noise reduction method of the present invention.

Fig. 2 is a functional block diagram of an audio noise reduction device according to a preferred embodiment of the present invention.

Fig. 3 is a schematic structural diagram of an electronic device implementing a method for audio noise reduction according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flow chart of a preferred embodiment of the audio noise reduction method according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.

The audio noise reduction method is applied to one or more electronic devices. An electronic device is a device capable of automatically performing numerical calculation and/or information processing according to computer readable instructions that are set or stored in advance; its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The electronic device may be any electronic product capable of human-computer interaction with a user, for example, a personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), a smart wearable device, and the like.

The electronic device may include a network device and/or a user device. The network device includes, but is not limited to, a single network electronic device, a group of network electronic devices, or a cloud based on cloud computing that consists of a large number of hosts or network electronic devices.

The network in which the electronic device is located includes, but is not limited to: the Internet, a wide area network, a metropolitan area network, a local area network, a Virtual Private Network (VPN), and the like.

S10: obtaining an audio sample and a preset learner, wherein the audio sample comprises noisy audio and clean audio, and the preset learner comprises a frequency-domain signal processing network and a time-domain signal processing network.

In at least one embodiment of the present invention, the noisy audio refers to audio that contains noise information; it is synthesized from the clean audio and recorded audio.

The clean audio refers to audio that does not contain noise information.

The frequency-domain signal processing network is a network that removes noise information from the noisy audio in the frequency domain.

The time-domain signal processing network is a network that removes noise information from the noisy audio in the time domain.

In at least one embodiment of the present invention, the electronic device obtaining the audio sample comprises:

determining the audio duration of the clean audio;

obtaining, from a recording library, audio whose duration is less than or equal to the audio duration, to obtain a plurality of recorded audios;

synthesizing the clean audio with each recorded audio in an arbitrary manner to obtain a plurality of noisy audios;

determining the plurality of noisy audios and the clean audio as the audio samples.

The audio duration refers to the total duration of the clean audio.

The recording library stores mapping relations between a plurality of audios and their durations.

The duration of each recorded audio is less than or equal to the audio duration; the recorded audios may be background sounds such as sirens.

Selecting the recorded audios by the audio duration ensures that the duration of each synthesized noisy audio is the same as the audio duration of the clean audio.
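The sample-construction step can be illustrated with a minimal NumPy sketch. It is only one possible reading of "arbitrary synthesis": the random offset, the `noise_gain` value, the sine-wave stand-in for clean speech and the in-memory `recording_library` are all assumptions made for illustration and do not come from the source.

```python
import numpy as np

def synthesize_noisy(clean, noise, rng, noise_gain=0.5):
    """Add a noise clip (no longer than `clean`) to `clean` at a random offset."""
    if len(noise) > len(clean):
        raise ValueError("noise clip must not be longer than the clean audio")
    # Random placement is an assumed interpretation of "arbitrary synthesis".
    offset = int(rng.integers(0, len(clean) - len(noise) + 1))
    noisy = clean.astype(np.float64)  # astype returns a copy by default
    noisy[offset:offset + len(noise)] += noise_gain * noise
    return noisy

rng = np.random.default_rng(0)
sample_rate = 16000
# Hypothetical clean audio: one second of a 440 Hz tone.
clean = np.sin(2 * np.pi * 440 * np.arange(sample_rate) / sample_rate)
# Hypothetical recording library with clips of different lengths.
recording_library = [rng.standard_normal(n) for n in (4000, 16000, 24000)]
# Keep only recordings whose duration does not exceed the clean audio duration.
recorded = [r for r in recording_library if len(r) <= len(clean)]
noisy_audios = [synthesize_noisy(clean, r, rng) for r in recorded]
audio_samples = {"clean": clean, "noisy": noisy_audios}
```

Because every selected recording is at most as long as the clean audio, each noisy audio has exactly the clean audio's length, which is the property the paragraph above relies on.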

S11: preprocessing the noisy audio to obtain spectrum information.

In at least one embodiment of the present invention, the spectrum information refers to the information of the noisy audio in the frequency domain.

In at least one embodiment of the present invention, the electronic device preprocessing the noisy audio to obtain the spectrum information comprises:

acquiring a preset moving window function;

performing a Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;

acquiring a preset processing duration, and calculating the ratio of the audio duration to the preset processing duration;

segmenting the spectrogram according to the preset processing duration to obtain the spectrum information, wherein the number of pieces of spectrum information equals the ratio.

The preset moving window function can be set as required; it allows the noisy audio to yield a stable signal within a limited time width.

The spectrogram refers to the mapping of the noisy audio between time and energy.

The preset processing duration is set according to the noise reduction efficiency requirement.

Performing the Fourier transform on the noisy audio with the preset moving window function makes the generated spectrogram more stable, and segmenting the spectrogram makes it convenient to process the pieces of spectrum information in parallel afterwards, thereby improving the noise reduction efficiency for the noisy audio.
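A minimal sketch of this preprocessing, assuming a Hann window as the moving window function and illustrative values for the frame length, hop size, sample rate and processing duration (none of which are specified in the source):

```python
import numpy as np

def stft(signal, frame_len=512, hop=128):
    """Windowed Fourier transform; returns an array of shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)  # assumed choice of moving window function
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def split_spectrogram(spec, num_segments):
    """Cut the spectrogram along the time axis into `num_segments` pieces."""
    return np.array_split(spec, num_segments, axis=0)

sample_rate = 16000
rng = np.random.default_rng(0)
noisy = rng.standard_normal(4 * sample_rate)   # hypothetical 4-second noisy audio
audio_duration = len(noisy) / sample_rate      # 4.0 seconds
processing_duration = 1.0                      # assumed preset processing duration
ratio = int(np.ceil(audio_duration / processing_duration))

spectrogram = stft(noisy)
spectrum_information = split_spectrogram(spectrogram, ratio)
```

The number of pieces equals the ratio of the audio duration to the processing duration, and each piece can then be handed to the frequency-domain network independently.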

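S12: processing the spectrum information based on the frequency-domain signal processing network to obtain a spectrum mask feature corresponding to the spectrum information.

In at least one embodiment of the invention, the spectrum mask feature is used to mask the noise information of the noisy audio in the frequency domain. Each piece of spectrum information has its own corresponding spectrum mask feature.

In at least one embodiment of the present invention, the frequency-domain signal processing network comprises a gated neural network, a fully connected network, and an activation function, the gated neural network comprises a reset gate and an update gate, and the electronic device processing the spectrum information based on the frequency-domain signal processing network to obtain the spectrum mask feature corresponding to the spectrum information comprises:

acquiring timing information of the spectrum information, wherein the timing information comprises a first spectrum at a first moment and a second spectrum at a second moment;

analyzing the first spectrum and the second spectrum based on the reset parameter of the reset gate to obtain candidate information for the second moment;

calculating an information amount for the first spectrum based on the update parameter of the update gate, the first spectrum, and the second spectrum;

generating output information for the second moment according to the first spectrum, the candidate information and the information amount, and taking the output information as the first spectrum for the next moment, until all of the timing information has been processed, to obtain a first network output of the gated neural network;

analyzing the first network output according to the weight matrix and the bias value of the fully connected network to obtain a second network output;

processing the second network output with the activation function to obtain the spectrum mask feature.

The reset parameter, the update parameter, the weight matrix and the bias value are network parameters initially set in the preset learner.

The information amount is the amount of information from the first spectrum that is retained at the second moment.

The activation function is typically set to a sigmoid function.

Analyzing the timing information with the gated neural network mitigates the vanishing gradient and exploding gradient problems, which improves the accuracy of the spectrum mask feature.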
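The gated network followed by a fully connected layer and a sigmoid can be sketched as a standard gated recurrent unit (GRU) forward pass. This is an assumed reading: the recurrent state plays the role of the "first spectrum" carried from the previous moment, the update gate output is the "information amount" that is retained, and the parameter names, sizes and random initialization are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(n_bins, hidden, rng, scale=0.1):
    """Randomly initialized parameters (illustrative; a real learner would train them)."""
    shape = (hidden, hidden + n_bins)
    return {
        "W_r": rng.standard_normal(shape) * scale,    # reset-gate parameter
        "W_z": rng.standard_normal(shape) * scale,    # update-gate parameter
        "W_c": rng.standard_normal(shape) * scale,    # candidate parameter
        "W_fc": rng.standard_normal((n_bins, hidden)) * scale,  # fully connected weights
        "b_fc": np.zeros(n_bins),                     # fully connected bias
    }

def spectrum_mask(magnitudes, p):
    """Run the GRU over the frames, then map each state to a mask in (0, 1)."""
    state = np.zeros(p["W_r"].shape[0])
    masks = []
    for x in magnitudes:                       # x: spectrum at the current moment
        sx = np.concatenate([state, x])
        reset = sigmoid(p["W_r"] @ sx)
        keep = sigmoid(p["W_z"] @ sx)          # share of the previous state retained
        candidate = np.tanh(p["W_c"] @ np.concatenate([reset * state, x]))
        state = keep * state + (1.0 - keep) * candidate
        masks.append(sigmoid(p["W_fc"] @ state + p["b_fc"]))
    return np.stack(masks)

rng = np.random.default_rng(0)
magnitudes = np.abs(rng.standard_normal((20, 9)))   # 20 frames, 9 frequency bins
params = init_params(n_bins=9, hidden=8, rng=rng)
mask = spectrum_mask(magnitudes, params)
```

The mask has the same shape as the input spectrum, so it can be multiplied element-wise with the spectrum in the next step.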

S13: obtaining time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature.

In at least one embodiment of the present invention, the time-frequency features refer to the features of the noisy audio in time and frequency.

In at least one embodiment of the present invention, the electronic device obtaining the time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature comprises:

calculating amplitude information from the spectrum information, and extracting phase information from the spectrum information;

calculating the product of the amplitude information, the phase information and the spectrum mask feature to obtain a predicted spectrum;

performing an inverse Fourier transform on the predicted spectrum to obtain a predicted time-frequency signal;

extracting features from the predicted time-frequency signal with a first preset convolution layer to obtain the time-frequency features.

The convolution kernel size of the first preset convolution layer is typically set to 1 x 1.

The spectrum mask feature allows the noise information in the noisy audio to be removed accurately, which improves the accuracy of the predicted spectrum, so that the time-frequency features can then be extracted accurately by the convolution layer.
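As a rough illustration, the masking, inverse transform and 1 x 1 convolution can be written with NumPy as follows. The frame length, the feature count and the random convolution weights are assumptions, and the 1 x 1 convolution is expressed as the same linear map applied to every frame.

```python
import numpy as np

def masked_frames(spec, mask):
    """Apply the mask to the magnitude, keep the noisy phase, and invert the transform."""
    magnitude = np.abs(spec)                   # amplitude information
    phase = np.exp(1j * np.angle(spec))        # phase information as a unit phasor
    predicted_spec = magnitude * phase * mask  # predicted spectrum
    return np.fft.irfft(predicted_spec, axis=1)

def conv1x1(frames, weight):
    """Kernel-size-1 convolution over the frame sequence: one linear map per frame."""
    return frames @ weight.T

rng = np.random.default_rng(0)
frames = rng.standard_normal((10, 512))        # hypothetical windowed frames
spec = np.fft.rfft(frames, axis=1)             # shape (10, 257)
mask = np.ones(spec.shape)                     # all-pass mask, for checking only
predicted = masked_frames(spec, mask)
weight = rng.standard_normal((256, 512)) * 0.05  # assumed first convolution layer weights
features = conv1x1(predicted, weight)          # shape (10, 256)
```

With an all-ones mask the inverse transform returns the original frames, which is a quick sanity check that magnitude and phase are recombined correctly.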

S14: processing the time-frequency features based on the time-domain signal processing network to obtain a time-frequency mask feature.

In at least one embodiment of the present invention, the time-frequency mask feature is used to mask the noise information of the noisy audio in the time domain.

In at least one embodiment of the present invention, the time-domain signal processing network comprises an instantaneous normalization layer, a gated recurrent unit layer, a fully connected layer, and an activation function. The electronic device processes the time-frequency features based on the instantaneous normalization layer, the gated recurrent unit layer, the fully connected layer and the activation function to obtain the time-frequency mask feature.

In at least one embodiment of the present invention, the manner in which the electronic device processes the time-frequency features based on the time-domain signal processing network is similar to the manner in which it processes the spectrum information based on the frequency-domain signal processing network, and is not described again here.
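The source does not define the instantaneous normalization layer. One common interpretation, assumed here, is a layer normalization applied to each frame independently over its feature dimension, so that no statistics from other frames are needed. The `gamma`, `beta` and `eps` names below are hypothetical.

```python
import numpy as np

def instant_layer_norm(features, gamma, beta, eps=1e-7):
    """Normalize every frame over its own feature axis, then scale and shift."""
    mean = features.mean(axis=-1, keepdims=True)
    var = features.var(axis=-1, keepdims=True)
    return gamma * (features - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
features = 3.0 * rng.standard_normal((10, 256)) + 2.0   # hypothetical time-frequency features
normalized = instant_layer_norm(features, gamma=np.ones(256), beta=np.zeros(256))
```

Under this reading, the normalized features would then pass through the gated recurrent unit layer, the fully connected layer and the activation function in the same way as in the frequency-domain network.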

S15: generating predicted audio according to the time-frequency features and the time-frequency mask feature.

In at least one embodiment of the present invention, the predicted audio refers to the audio obtained after the preset learner denoises the noisy audio in the frequency domain and the time domain.

In at least one embodiment of the present invention, the electronic device generating the predicted audio according to the time-frequency features and the time-frequency mask feature comprises:

calculating the product of the time-frequency features and the time-frequency mask feature to obtain enhanced features;

upsampling the enhanced features with a second preset convolution layer to obtain a restored signal;

acquiring the initial information of the restored signal at each time step;

if any time step has a plurality of pieces of initial information, calculating the average of those pieces to obtain the overlapping information at that time step;

generating prediction information according to the initial information and the overlapping information;

converting the prediction information to obtain the predicted audio.

The prediction information refers to the information of the predicted audio in the time domain.

With this embodiment, the generated prediction information is smoother, which improves the noise reduction effect of the predicted audio.

Specifically, the electronic device generates the prediction information according to the initial information and the overlapping information.

For example: the initial information at the first time step is n₁, the initial information at the second time step comprises n₂, n₃ and n₄, and the initial information at the third time step is n₅. Because a plurality of pieces of initial information are detected at the second time step, the overlapping information at the second time step is calculated as (n₂ + n₃ + n₄)/3.

The prediction information can then be generated as: n₁, (n₂ + n₃ + n₄)/3, n₅.
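The averaging of overlapping values can be sketched as follows, assuming the restored signal arrives as equal-length frames spaced `hop` samples apart (the frame layout is an assumption; the source only states that time steps with several values are averaged):

```python
import numpy as np

def overlap_average(frames, hop):
    """Place frames `hop` samples apart and average wherever they overlap."""
    n_frames, frame_len = frames.shape
    if hop > frame_len:
        raise ValueError("hop must not exceed the frame length, or gaps would appear")
    total = np.zeros((n_frames - 1) * hop + frame_len)
    count = np.zeros_like(total)
    for i, frame in enumerate(frames):
        total[i * hop:i * hop + frame_len] += frame
        count[i * hop:i * hop + frame_len] += 1
    return total / count   # single values pass through; overlapping values are averaged

# Two frames of length 2 with hop 1: the middle time step receives 2.0 and 4.0.
restored = np.array([[1.0, 2.0],
                     [4.0, 5.0]])
prediction_information = overlap_average(restored, hop=1)
```

In this toy case the first and last time steps keep their single values and the middle one becomes the mean of its two values, mirroring the worked example above.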

S16: adjusting the network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model.

In at least one embodiment of the present invention, the network parameters include the initialization configuration parameters of the frequency-domain signal processing network and the time-domain signal processing network.

The noise reduction model is used for removing noise information from audio.

In at least one embodiment of the present invention, the electronic device adjusting the network parameters of the preset learner based on the predicted audio and the clean audio to obtain the noise reduction model comprises:

acquiring first time-domain information of the clean audio and second time-domain information of the predicted audio;

calculating a loss value of the preset learner from the first time-domain information and the second time-domain information according to a loss formula [formula not reproduced in this text];

adjusting the network parameters according to the loss value until the loss value no longer decreases, to obtain the noise reduction model.

Computing the loss from the first time-domain information and the second time-domain information improves the accuracy of the loss value, so that the noise reduction precision of the noise reduction model can be ensured according to the loss value.
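Since the loss formula itself is not available in this text, the sketch below uses a mean squared error between the two time-domain signals purely as a stand-in, and a single gain parameter as a toy "learner". Only the stopping rule (train until the loss no longer decreases) follows the description; `patience`, `min_delta` and the learning rate are assumed values.

```python
import numpy as np

def mse_loss(clean, predicted):
    """Stand-in loss (assumption): mean squared error in the time domain."""
    return float(np.mean((clean - predicted) ** 2))

def train_until_plateau(step_fn, max_epochs=1000, patience=3, min_delta=1e-9):
    """Call `step_fn` until the loss has stopped decreasing for `patience` epochs."""
    best, stale, history = np.inf, 0, []
    for _ in range(max_epochs):
        loss = step_fn()
        history.append(loss)
        if loss < best - min_delta:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return history

t = np.arange(1000)
clean = np.sin(2 * np.pi * 5 * t / 1000)   # first time-domain information
noisy = 2.0 * clean                        # toy input the learner must rescale
params = {"gain": 0.0}                     # toy network parameter

def step():
    predicted = params["gain"] * noisy     # second time-domain information
    grad = np.mean(2.0 * (predicted - clean) * noisy)
    params["gain"] -= 0.1 * grad           # gradient descent update
    return mse_loss(clean, params["gain"] * noisy)

history = train_until_plateau(step)
```

In this toy setup the gain converges towards 0.5 and training stops once further updates no longer reduce the loss by more than `min_delta`.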

S17: obtaining requested audio, and performing noise reduction on the requested audio based on the noise reduction model to obtain target audio.

In at least one embodiment of the present invention, the requested audio refers to audio that requires noise reduction. The requested audio may be any audio received in real-time.

The target audio is the audio obtained after the noise reduction is performed on the request audio. And if the accuracy of the noise reduction model reaches 100%, the target audio does not contain any noise information.

It is emphasized that, to further ensure the privacy and security of the target audio, the target audio may also be stored in a node of a blockchain.

In at least one embodiment of the present invention, the manner in which the electronic device performs noise reduction on the requested audio based on the noise reduction model is similar to the manner in which it processes the noisy audio based on the preset learner to obtain the predicted audio, and is not described again here.

According to the above technical solution, the model loss value of the preset learner can be determined accurately from the clean audio and the decoded audio that the preset learner predicts for the noisy audio, so that the network parameters can be adjusted accurately according to the model loss value, which improves the enhancement effect of the audio noise reduction model. In addition, the coding network encodes the noisy audio, and the resulting audio coding information includes the phase information in each speech timing state, which improves the enhancement effect of the audio noise reduction model and therefore of the target audio.

Fig. 2 is a functional block diagram of an audio noise reduction device according to a preferred embodiment of the present invention. The audio noise reduction apparatus 11 includes an obtaining unit 110, a preprocessing unit 111, a processing unit 112, a generating unit 113, and an adjusting unit 114. A module/unit as referred to herein is a series of computer readable instruction segments that are stored in the memory 12 and can be fetched by the processor 13 to perform a fixed function. The functions of the modules/units are described in detail in the following embodiments.

The obtaining unit 110 obtains an audio sample and a preset learner, where the audio sample includes noisy audio and clean audio, and the preset learner includes a frequency-domain signal processing network and a time-domain signal processing network.

The clean audio refers to audio that does not contain noise information.

In at least one embodiment of the present invention, the obtaining unit 110 obtaining the audio sample comprises:

determining the audio duration of the clean audio;

obtaining, from a recording library, audio whose duration is less than or equal to the audio duration, to obtain a plurality of recorded audios;

synthesizing the clean audio with each recorded audio in an arbitrary manner to obtain a plurality of noisy audios;

determining the plurality of noisy audios and the clean audio as the audio samples.

The audio duration refers to the total duration of the clean audio.

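The preprocessing unit 111 preprocesses the noisy audio to obtain spectrum information.

In at least one embodiment of the present invention, the preprocessing unit 111 preprocessing the noisy audio to obtain the spectrum information comprises:

acquiring a preset moving window function;

performing a Fourier transform on the noisy audio based on the preset moving window function to obtain a spectrogram;

acquiring a preset processing duration, and calculating the ratio of the audio duration to the preset processing duration;

segmenting the spectrogram according to the preset processing duration to obtain the spectrum information, wherein the number of pieces of spectrum information equals the ratio.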

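The processing unit 112 processes the spectrum information based on the frequency-domain signal processing network to obtain a spectrum mask feature corresponding to the spectrum information.

In at least one embodiment of the present invention, the frequency-domain signal processing network comprises a gated neural network, a fully connected network, and an activation function, the gated neural network comprises a reset gate and an update gate, and the processing unit 112 processing the spectrum information based on the frequency-domain signal processing network to obtain the spectrum mask feature corresponding to the spectrum information comprises:

acquiring timing information of the spectrum information, wherein the timing information comprises a first spectrum at a first moment and a second spectrum at a second moment;

analyzing the first spectrum and the second spectrum based on the reset parameter of the reset gate to obtain candidate information for the second moment;

calculating an information amount for the first spectrum based on the update parameter of the update gate, the first spectrum, and the second spectrum;

generating output information for the second moment according to the first spectrum, the candidate information and the information amount, and taking the output information as the first spectrum for the next moment, until all of the timing information has been processed, to obtain a first network output of the gated neural network;

analyzing the first network output according to the weight matrix and the bias value of the fully connected network to obtain a second network output;

processing the second network output with the activation function to obtain the spectrum mask feature.

The activation function is typically set to a sigmoid function.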

The obtaining unit 110 obtains the time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature.

In at least one embodiment of the present invention, the obtaining unit 110 obtaining the time-frequency features of the noisy audio according to the spectrum information and the spectrum mask feature comprises:

calculating amplitude information from the spectrum information, and extracting phase information from the spectrum information;

calculating the product of the amplitude information, the phase information and the spectrum mask feature to obtain a predicted spectrum;

performing an inverse Fourier transform on the predicted spectrum to obtain a predicted time-frequency signal;

extracting features from the predicted time-frequency signal with a first preset convolution layer to obtain the time-frequency features.

The processing unit 112 processes the time-frequency features based on the time-domain signal processing network to obtain a time-frequency mask feature.

In at least one embodiment of the present invention, the time-domain signal processing network comprises an instantaneous normalization layer, a gated recurrent unit layer, a fully connected layer, and an activation function. The processing unit 112 processes the time-frequency features based on the instantaneous normalization layer, the gated recurrent unit layer, the fully connected layer, and the activation function to obtain the time-frequency mask feature.

In at least one embodiment of the present invention, the manner in which the processing unit 112 processes the time-frequency features based on the time-domain signal processing network is similar to the manner in which it processes the spectrum information based on the frequency-domain signal processing network, and is not described again here.

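The generating unit 113 generates predicted audio according to the time-frequency features and the time-frequency mask feature.

In at least one embodiment of the present invention, the generating unit 113 generating the predicted audio according to the time-frequency features and the time-frequency mask feature comprises:

calculating the product of the time-frequency features and the time-frequency mask feature to obtain enhanced features;

upsampling the enhanced features with a second preset convolution layer to obtain a restored signal;

acquiring the initial information of the restored signal at each time step;

if any time step has a plurality of pieces of initial information, calculating the average of those pieces to obtain the overlapping information at that time step;

generating prediction information according to the initial information and the overlapping information;

converting the prediction information to obtain the predicted audio.

Specifically, the generating unit 113 generates the prediction information according to the initial information and the overlapping information.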

For example: the initial information at the first time step is n₁, the initial information at the second time step comprises n₂, n₃ and n₄, and the initial information at the third time step is n₅. Because a plurality of pieces of initial information are detected at the second time step, the overlapping information at the second time step is calculated as (n₂ + n₃ + n₄)/3.

The prediction information can then be generated as: n₁, (n₂ + n₃ + n₄)/3, n₅.

The adjusting unit 114 adjusts the network parameters of the preset learner based on the predicted audio and the clean audio to obtain a noise reduction model.

In at least one embodiment of the present invention, the adjusting unit 114 adjusting the network parameters of the preset learner based on the predicted audio and the clean audio to obtain the noise reduction model comprises:

acquiring first time-domain information of the clean audio and second time-domain information of the predicted audio;

calculating a loss value of the preset learner from the first time-domain information and the second time-domain information according to a loss formula [formula not reproduced in this text];

adjusting the network parameters according to the loss value until the loss value no longer decreases, to obtain the noise reduction model.

The obtaining unit 110 obtains the requested audio, and performs noise reduction on the requested audio based on the noise reduction model to obtain the target audio.

In at least one embodiment of the present invention, the manner in which the obtaining unit 110 performs noise reduction on the requested audio based on the noise reduction model is similar to the manner in which the noisy audio is processed based on the preset learner to obtain the predicted audio, and is not described again here.

In one embodiment of the present invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as an audio noise reduction program, stored in the memory 12 and executable on the processor 13.

It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of the electronic device 1, and that it may comprise more or less components than shown, or some components may be combined, or different components, e.g. the electronic device 1 may further comprise an input output device, a network access device, a bus, etc.

The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the electronic device 1, and is connected to each part of the whole electronic device 1 by various interfaces and lines, and executes an operating system of the electronic device 1 and various installed application programs, program codes, and the like.

Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be partitioned into an acquisition unit 110, a pre-processing unit 111, a processing unit 112, a generation unit 113, and an adjustment unit 114.
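As a minimal sketch of such a partition, the five units can be written as separate functions that a processor runs in sequence. The unit names and reference numerals follow the description above; every function body, the shared `state` dictionary, the `run` helper, and the mean-squared-error comparison are hypothetical placeholders chosen for illustration and are not the processing defined by the embodiments.

```python
from typing import Callable, Dict, List

# A "unit" is modelled here as a function that reads and extends a shared state.
# This representation is an assumption made for illustration only.
Unit = Callable[[Dict], Dict]


def acquisition_unit(state: Dict) -> Dict:
    # Unit 110: obtain the noisy audio (here simply copied from the input).
    state["noisy_audio"] = list(state["input"])
    return state


def preprocessing_unit(state: Dict) -> Dict:
    # Unit 111: placeholder for computing spectral information.
    state["spectrum"] = list(state["noisy_audio"])
    return state


def processing_unit(state: Dict) -> Dict:
    # Unit 112: placeholder mask of ones (pass-through), not a real network.
    state["mask"] = [1.0] * len(state["spectrum"])
    return state


def generation_unit(state: Dict) -> Dict:
    # Unit 113: placeholder that applies the mask element by element.
    state["predicted_audio"] = [
        s * m for s, m in zip(state["spectrum"], state["mask"])
    ]
    return state


def adjustment_unit(state: Dict) -> Dict:
    # Unit 114: placeholder that only measures a mean squared error against
    # the pure audio; a real unit would use it to update network parameters.
    clean = state.get("clean_audio")
    if clean is not None:
        pred = state["predicted_audio"]
        state["loss"] = sum((p - c) ** 2 for p, c in zip(pred, clean)) / len(pred)
    return state


PIPELINE: List[Unit] = [
    acquisition_unit,
    preprocessing_unit,
    processing_unit,
    generation_unit,
    adjustment_unit,
]


def run(units: List[Unit], state: Dict) -> Dict:
    # Execute each unit in order, passing the state along.
    for unit in units:
        state = unit(state)
    return state


result = run(PIPELINE, {"input": [0.5, -0.25], "clean_audio": [0.5, 0.0]})
```

Keeping each unit behind the same call signature is one way to reflect the statement that the division into modules is only a logical one: units can be merged, split, or replaced without changing the caller.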

The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to use of the electronic device, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.

The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.

The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by computer readable instructions that instruct the relevant hardware. The computer readable instructions may be stored in a computer readable storage medium, and when they are executed by a processor, the steps of the method embodiments may be implemented.

The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), and a Random Access Memory (RAM).

A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks associated with one another by cryptographic methods, where each data block contains information on a batch of network transactions that is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
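The idea of data blocks associated by a cryptographic method can be illustrated with a minimal hash chain. This is a sketch under stated assumptions, not the blockchain platform used by the embodiments: the block fields, the SHA-256 hash, the all-zero genesis value, and the helper names `block_hash`, `append_block`, and `is_valid` are all hypothetical, and the sketch omits consensus, networking, and transaction handling.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_PREV_HASH = "0" * 64  # assumed marker for "no previous block"


def block_hash(block: Dict) -> str:
    # Hash the block contents, including the previous block's hash, so that
    # changing any earlier block invalidates every later link.
    payload = json.dumps(
        {key: block[key] for key in ("index", "data", "prev_hash")},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict], data: str) -> Dict:
    # Link the new block to the hash of the current last block.
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    block = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


def is_valid(chain: List[Dict]) -> bool:
    # Recompute every hash and check every link to detect tampering.
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        expected_prev = chain[i - 1]["hash"] if i else GENESIS_PREV_HASH
        if block["prev_hash"] != expected_prev:
            return False
    return True


chain: List[Dict] = []
append_block(chain, "digest-of-target-audio-1")  # illustrative payloads
append_block(chain, "digest-of-target-audio-2")
```

In this sketch each block stores only a short string, for example a digest of the target audio; whether the embodiments store the audio itself or a digest is not specified here.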

In conjunction with fig. 1, the memory 12 of the electronic device 1 stores computer-readable instructions to implement an audio noise reduction method, and the processor 13 executes the computer-readable instructions to implement:

Specifically, for the way in which the processor 13 implements the computer readable instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated herein.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in practice.

The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms "first", "second", etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims

1. An audio noise reduction method, comprising:

2. The audio noise reduction method of claim 1, wherein the obtaining audio samples comprises:

counting the audio time of the pure audio;

3. The method of claim 2, wherein the pre-processing the noisy audio to obtain spectral information comprises:

acquiring a preset moving window function;

4. The method of audio noise reduction according to claim 1, wherein the frequency-domain signal processing network comprises a gated neural network, a fully connected network, and an activation function, the gated neural network comprises a reset gate and an update gate, and the processing the spectral information based on the frequency-domain signal processing network to obtain the spectral mask feature corresponding to the spectral information comprises:

5. The method of claim 1, wherein the obtaining the time-frequency feature of the noisy audio according to the spectral information and the spectral mask feature comprises:

6. The method of claim 1, wherein the generating the predicted audio according to the time-frequency features and the time-frequency mask features comprises:

acquiring initial information of the restored signal on each time sequence;

and converting the prediction information to obtain the prediction audio.

7. The method of claim 1, wherein the adjusting the network parameters of the preset learner based on the predicted audio and the pure audio to obtain a noise reduction model comprises:

refers to the second time domain information;

8. An audio noise reduction apparatus, comprising:

9. An electronic device, characterized in that the electronic device comprises:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the audio noise reduction method of any of claims 1 to 7.

10. A computer-readable storage medium characterized by: the computer-readable storage medium has stored therein computer-readable instructions that are executed by a processor in an electronic device to implement the audio noise reduction method of any of claims 1 to 7.