CN116312586A

CN116312586A - Noise reduction method, device, terminal and storage medium

Info

Publication number: CN116312586A
Application number: CN202111471147.5A
Authority: CN
Inventors: 谢冠宏; 邱士嘉; 张立朋; 傅光辉; 徐曌
Original assignee: Wanmo Acoustics Co ltd
Current assignee: Wanmo Acoustics Co ltd
Priority date: 2021-12-03
Filing date: 2021-12-03
Publication date: 2023-06-23

Abstract

The application is applicable to the technical field of audio processing and provides a noise reduction method, a device, a terminal and a storage medium. The noise reduction method comprises the following steps: acquiring an audio signal obtained by capturing environmental sound; acquiring the amplitudes corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in a full frequency band, and averaging those amplitudes to obtain a full-band amplitude mean of the frequency spectrum of the audio signal; and, according to the range in which the full-band amplitude mean falls, confirming the category of the environmental sound as one of quiet noise, intermediate noise and noisy noise, and correspondingly adopting a comfortable noise reduction mode, a balanced noise reduction mode or a deep noise reduction mode to reduce noise of the audio signal. The embodiments of the application can reduce operation power consumption and improve the noise reduction effect.

Description

Noise reduction method, device, terminal and storage medium

Technical Field

The application belongs to the technical field of audio processing, and particularly relates to a noise reduction method, a device, a terminal and a storage medium.

Background

The current mainstream algorithm for noise scene recognition is a speech recognition algorithm based on a deep neural network and a hidden Markov model (DNN-HMM). The acquired environmental sound audio signal is windowed and framed, its spectral features are extracted and analyzed, the scene type is determined, and the device switches to a noise reduction mode matched with the current scene, thereby meeting the noise reduction requirements of different scenes.

Because running a DNN-HMM scene recognition algorithm on an earphone suffers from long running time, high power consumption and similar problems, earphones actually on the market usually use a single noise reduction mode for active noise reduction. Because the noise reduction filter bank is fixed, the noise reduction effect differs between scenes, and in some scenes it deteriorates noticeably.

Therefore, there is a need for a noise reduction method that reduces the operation power consumption while improving the noise reduction effect.

Disclosure of Invention

The embodiment of the application provides a noise reduction method, a device, a terminal and a storage medium, which can reduce operation power consumption and improve noise reduction effect.

A first aspect of an embodiment of the present application provides a noise reduction method, including:

acquiring an audio signal obtained by capturing environmental sound;

acquiring the amplitudes corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in a full frequency band, and averaging the amplitudes corresponding to the preset frequency points to obtain a full-band amplitude mean of the frequency spectrum of the audio signal in the full frequency band;

if the full-band amplitude mean is smaller than or equal to a first threshold, confirming the category of the environmental sound as quiet noise, and adopting a comfortable noise reduction mode to reduce noise of the audio signal;

if the full-band amplitude mean is larger than the first threshold and smaller than or equal to a second threshold, confirming the category of the environmental sound as intermediate noise, and adopting a balanced noise reduction mode to reduce noise of the audio signal;

and if the full-band amplitude mean is larger than the second threshold, confirming the category of the environmental sound as noisy noise, and adopting a deep noise reduction mode to reduce noise of the audio signal.

A noise reduction device provided in a second aspect of the present application includes:

the acquisition unit is used for acquiring an audio signal acquired by the environmental sound;

the computing unit is used for obtaining the amplitude values corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band, and averaging the amplitude values corresponding to the preset frequency points to obtain the full frequency band amplitude average value corresponding to the frequency spectrum of the audio signal in the full frequency band;

the comfortable noise reduction unit is used for confirming the category of the environmental sound as quiet noise if the full-band amplitude average value is smaller than or equal to a first threshold value, and adopting a comfortable noise reduction mode to reduce noise of the audio signal;

the balanced noise reduction unit is used for confirming the category of the environmental sound as intermediate noise and adopting a balanced noise reduction mode to reduce noise of the audio signal if the full-band amplitude mean is larger than the first threshold and smaller than or equal to the second threshold;

and the deep noise reduction unit is used for confirming the category of the environmental sound as noisy noise if the full-band amplitude mean is larger than the second threshold, and adopting a deep noise reduction mode to reduce noise of the audio signal.

A third aspect of the embodiments of the present application provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the above method when executing the computer program.

A fourth aspect of the present embodiments provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.

A fifth aspect of the embodiments provides a computer program product which, when run on a terminal, causes the terminal to perform the steps of the method.

In the embodiments of the application, an audio signal obtained by capturing environmental sound is acquired, and the amplitudes corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in a full frequency band are acquired and averaged to obtain the full-band amplitude mean of the frequency spectrum. According to the range in which the full-band amplitude mean falls, the category of the environmental sound is confirmed as one of quiet noise, intermediate noise and noisy noise, and the comfortable noise reduction mode, the balanced noise reduction mode or the deep noise reduction mode is correspondingly adopted to reduce noise of the audio signal.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required for the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flowchart of an implementation of a noise reduction method according to an embodiment of the present application;

Fig. 2 is a time domain and frequency domain diagram of office scene environmental sound provided in an embodiment of the present application;

Fig. 3 is a time domain and frequency domain diagram of in-vehicle scene environmental sound provided in an embodiment of the present application;

Fig. 4 is a time domain and frequency domain diagram of kindergarten scene environmental sound provided in an embodiment of the present application;

Fig. 5 is a time domain and frequency domain diagram of road scene environmental sound provided in an embodiment of the present application;

Fig. 6 is a time domain and frequency domain diagram of field scene environmental sound provided in an embodiment of the present application;

Fig. 7 is a time domain and frequency domain diagram of drilling scene environmental sound provided in an embodiment of the present application;

Fig. 8 is a graph of the amplitude analysis results of scheme one provided in an embodiment of the present application;

Fig. 9 is a graph of the amplitude analysis results of scheme two provided in an embodiment of the present application;

Fig. 10 is a graph of the mean square error (MSE) analysis results provided in an embodiment of the present application;

Fig. 11 is a graph of the amplitude analysis results at the same energy level provided in an embodiment of the present application;

Fig. 12 is a schematic structural diagram of a noise reduction device according to an embodiment of the present application;

Fig. 13 is a schematic structural diagram of a terminal provided in an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be protected herein.

Therefore, the present application provides a noise reduction method in which the scene category of the environmental sound can be determined from the mean amplitude of the environmental sound spectrum and noise reduction is performed in the corresponding noise reduction mode. Complex audio signal processing such as a DNN-HMM scene recognition algorithm is not needed, so the operation power consumption of the noise reduction process is reduced and the noise reduction efficiency is improved; at the same time, different noise reduction modes can be accommodated, which improves the noise reduction effect.

In order to illustrate the technical solution of the present application, the following description is made by specific examples.

Fig. 1 shows a schematic implementation flow chart of a noise reduction method provided in an embodiment of the present application, where the method may be applied to a terminal, and may be suitable for situations where the noise reduction effect needs to be improved while the operation power consumption is reduced.

In some embodiments of the present application, the terminal may be a device with audio processing capability, such as an earphone, a sound device, a mobile phone, or a smart watch, or may be a smart device connected to a device with audio processing capability.

Specifically, the above noise reduction method may include the following steps S101 to S105.

Step S101, acquiring an audio signal acquired by acquiring ambient sound.

In the embodiment of the application, the terminal can capture the sound of the environment through its own microphone to obtain the audio signal.

Specifically, the terminal may start to acquire an audio signal and perform noise reduction after a specific condition is satisfied. For example, the terminal may acquire an audio signal acquired by collecting the ambient sound in response to entering a call state, and reduce the noise of the ambient sound, so that a user of the terminal may have a clearer call effect. For another example, the terminal may acquire an audio signal acquired by collecting the ambient sound in response to entering the audio playing state, and reduce the noise of the ambient sound, so that a user of the terminal may have a clearer audio listening effect.

Step S102, the amplitude values corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band are obtained, and the amplitude values corresponding to the preset frequency points are averaged to obtain the full frequency band amplitude average value corresponding to the frequency spectrum of the audio signal in the full frequency band.

In the embodiment of the application, the terminal can analyze the audio signal to obtain its frequency spectrum. The terminal extracts the amplitudes of the frequency spectrum at a plurality of preset frequency points of the full frequency band and averages them to obtain the full-band amplitude mean, which reflects the overall energy level of the environmental sound signal.

The spectrum amplitude is expressed in dB SPL, the decibel unit of sound pressure level.

In the embodiment of the present application, the frequency range covered by the full frequency band and the preset frequency points selected on it may be set according to actual conditions; the selected frequency points should be able to reflect the amplitude variation of the spectrum in each frequency band.

In some embodiments of the present application, the terminal may divide the full frequency band into a plurality of sub-frequency bands according to set frequency ranges, and select a plurality of frequency points on each sub-frequency band as preset frequency points. The spectrum amplitudes $A_1, A_2, \ldots, A_N$ corresponding to the preset frequency points of the full frequency band can then be used to find the full-band amplitude mean:

$$\bar{A} = \frac{1}{N}\sum_{i=1}^{N} A_i$$

where $N$ is the number of preset frequency points.

Specifically, the terminal may set the range from 50 Hz to 5 kHz as the full frequency band, divide the full frequency band into a low frequency band from 50 Hz to 250 Hz, an intermediate frequency band from 250 Hz to 1 kHz and a high frequency band from 1 kHz to 5 kHz, select a plurality of frequency points on each sub-frequency band as preset frequency points, obtain the amplitudes corresponding to the preset frequency points, and calculate the full-band amplitude mean as the average of those amplitudes.

In some embodiments of the present application, the terminal may select at least two frequency points in each of the high frequency band, the intermediate frequency band and the low frequency band. That is, the plurality of preset frequency points include at least two frequency points in the high frequency band, at least two in the intermediate frequency band and at least two in the low frequency band, so that each of the three bands is characterized by at least two frequency points.
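The averaging step described above can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the function names, the use of a single-bin discrete Fourier transform to read the amplitude at each preset frequency point, and the reference level `ref` are all assumptions. In particular, the result is in dB relative to `ref`; obtaining true dB SPL would require a microphone calibration that the text does not describe.

```python
import cmath
import math


def amplitude_db(samples, sample_rate, freq_hz, ref=1.0):
    """Amplitude of `samples` at one frequency point, in dB relative to `ref`.

    Uses a single-bin DFT (hypothetical choice; the source only says the
    spectrum is analyzed). `ref` stands in for an unspecified calibration.
    """
    n = len(samples)
    acc = sum(
        x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
        for k, x in enumerate(samples)
    )
    magnitude = 2.0 * abs(acc) / n  # peak amplitude of a sinusoid at freq_hz
    return 20.0 * math.log10(max(magnitude, 1e-12) / ref)  # floor avoids log(0)


def full_band_mean(samples, sample_rate, preset_freqs_hz):
    """Average of the amplitudes at the preset frequency points."""
    amps = [amplitude_db(samples, sample_rate, f) for f in preset_freqs_hz]
    return sum(amps) / len(amps)
```

Evaluating only a handful of preset frequency points, instead of a full spectrum, keeps the cost proportional to the number of points times the frame length, which is in line with the stated goal of low operation power consumption.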

Step S103, if the full-band amplitude mean value is smaller than or equal to the first threshold value, the category of the environmental sound is confirmed as quiet noise, and the comfortable noise reduction mode is adopted to reduce noise of the audio signal.

Step S104, if the full-band amplitude mean is larger than the first threshold and smaller than or equal to the second threshold, the category of the environmental sound is confirmed as intermediate noise, and the balanced noise reduction mode is adopted to reduce noise of the audio signal.

Step S105, if the full-band amplitude mean value is greater than the second threshold value, the class of the environmental sound is confirmed as noisy noise, and the audio signal is noise reduced by adopting a deep noise reduction mode.

In the embodiment of the application, the terminal may set a plurality of thresholds according to experience or experimental results, judge the range of the full-band amplitude mean according to the thresholds, determine the category of the environmental sound, and perform noise reduction on the audio signal by adopting a noise reduction mode corresponding to the category of the environmental sound.

Specifically, in some embodiments of the present application, if the full-band amplitude mean $\bar{A}$ is lower than or equal to the first threshold TH_low, which indicates that the overall energy of the environmental sound signal is at a low level, the terminal may confirm the category of the environmental sound as quiet noise and reduce noise of the audio signal using the comfortable noise reduction mode. In this case, the environmental sound may specifically be that of an office scene or a field scene.

If the full-band amplitude mean $\bar{A}$ is greater than the first threshold TH_low and less than or equal to the second threshold TH_high, which indicates that the overall energy of the environmental sound signal is at a medium level, the terminal may confirm the category of the environmental sound as intermediate noise and reduce noise of the audio signal using the balanced noise reduction mode. In this case, the environmental sound may specifically be that of a kindergarten scene.

If the full-band amplitude mean $\bar{A}$ is greater than the second threshold TH_high, which indicates that the overall energy of the environmental sound signal is at a high level, the terminal may confirm the category of the environmental sound as noisy noise and reduce noise of the audio signal using the deep noise reduction mode. In this case, the environmental sound may specifically be that of an in-vehicle scene, a road scene or a drilling scene.
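The three-way decision of steps S103 to S105 can be sketched as a pair of comparisons. The function name and the returned labels are hypothetical, and the thresholds are passed in as parameters because the application leaves their values to experience or experiment.

```python
def classify_ambient(full_band_mean, th_low, th_high):
    """Map the full-band amplitude mean to (category, noise reduction mode).

    th_low and th_high play the roles of the first and second thresholds;
    their values are not given in the source.
    """
    if full_band_mean <= th_low:
        return "quiet", "comfortable"
    if full_band_mean <= th_high:
        return "intermediate", "balanced"
    return "noisy", "deep"
```

Note that both boundaries are inclusive on the lower category: a mean exactly equal to the first threshold is quiet noise, and a mean exactly equal to the second threshold is intermediate noise.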

The noise reduction filter sets corresponding to different noise reduction modes are different, and the different noise reduction filter sets can be used for generating destructive interference signals for canceling noise in environmental sound. The energy of the corresponding destructive interference signal in the balanced noise reduction mode is greater than the energy of the corresponding destructive interference signal in the comfortable noise reduction mode and less than the energy of the corresponding destructive interference signal in the deep noise reduction mode.

In order to further subdivide the category of the environmental sound, the terminal may divide the full frequency band into a plurality of sub-frequency bands with non-overlapping frequencies, where each sub-frequency band includes one or more preset sub-frequency points. The preset sub-frequency point may be a reselected frequency point, or may be a preset sub-frequency point obtained by dividing the preset frequency point according to a sub-frequency band where the preset frequency point is located.

Correspondingly, the flow of the noise reduction method can further specifically include: acquiring the amplitude corresponding to each preset sub-frequency point, and respectively averaging the amplitude corresponding to the preset sub-frequency point on each sub-frequency band to obtain the average value of the amplitude of the frequency spectrum of the audio signal in the sub-frequency band corresponding to each sub-frequency band; and determining the category of the environmental sound according to the full-band amplitude average value and the sub-band amplitude average value corresponding to each sub-band, and adopting a noise reduction mode corresponding to the category of the environmental sound to reduce noise of the audio signal.

In particular, the full frequency band may be divided into a high frequency band, an intermediate frequency band, and a low frequency band. The high frequency band comprises one or more high frequency points, the intermediate frequency band comprises one or more intermediate frequency points, and the low frequency band comprises one or more low frequency points.

The low-frequency point spectrum amplitudes $A_{1,\text{low}}, A_{2,\text{low}}, \ldots, A_{N_{\text{low}},\text{low}}$ may be averaged to obtain the low-frequency mean:

$$\bar{A}_{\text{low}} = \frac{1}{N_{\text{low}}}\sum_{i=1}^{N_{\text{low}}} A_{i,\text{low}}$$

where $N_{\text{low}}$ is the number of low-frequency points. If the low-frequency mean $\bar{A}_{\text{low}}$ is below a preset threshold $\mathrm{TH\_low}_{\text{low}}$, the energy of the low-frequency part of the environmental sound is weak; if $\bar{A}_{\text{low}}$ is above the threshold $\mathrm{TH\_high}_{\text{low}}$, the energy of the low-frequency part of the environmental sound is pronounced. In this way, environmental sound whose energy is mainly at low frequencies can be distinguished.

The intermediate-frequency point spectrum amplitudes $A_{1,\text{mid}}, A_{2,\text{mid}}, \ldots, A_{N_{\text{mid}},\text{mid}}$ may be averaged to obtain the intermediate-frequency mean:

$$\bar{A}_{\text{mid}} = \frac{1}{N_{\text{mid}}}\sum_{i=1}^{N_{\text{mid}}} A_{i,\text{mid}}$$

where $N_{\text{mid}}$ is the number of intermediate-frequency points. If the intermediate-frequency mean $\bar{A}_{\text{mid}}$ is below a preset threshold $\mathrm{TH\_low}_{\text{mid}}$, the energy of the intermediate-frequency part of the environmental sound is weak; if $\bar{A}_{\text{mid}}$ is above the threshold $\mathrm{TH\_high}_{\text{mid}}$, the energy of the intermediate-frequency part of the environmental sound is pronounced. In this way, environmental sound whose energy is mainly at intermediate frequencies can be distinguished. Combined with the low-frequency points, environmental sound whose energy is mainly at intermediate and low frequencies can also be distinguished.

The high-frequency point spectrum amplitudes $A_{1,\text{high}}, A_{2,\text{high}}, \ldots, A_{N_{\text{high}},\text{high}}$ may be averaged to obtain the high-frequency mean:

$$\bar{A}_{\text{high}} = \frac{1}{N_{\text{high}}}\sum_{i=1}^{N_{\text{high}}} A_{i,\text{high}}$$

where $N_{\text{high}}$ is the number of high-frequency points. If the high-frequency mean $\bar{A}_{\text{high}}$ is below a preset threshold $\mathrm{TH\_low}_{\text{high}}$, the energy of the high-frequency part of the environmental sound is weak; if $\bar{A}_{\text{high}}$ is above the threshold $\mathrm{TH\_high}_{\text{high}}$, the energy of the high-frequency part of the environmental sound is pronounced. In this way, environmental sound whose energy is mainly at high frequencies can be distinguished.
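The per-band averaging and the weak/pronounced decision can be sketched as below. The function names, the dictionary layout and the label strings are hypothetical. The text does not say how a band mean lying between its two thresholds should be treated, so the sketch returns "undetermined" for that case as an explicit assumption.

```python
def sub_band_means(amplitudes_db, band_points):
    """Average the amplitudes of the preset points belonging to each band.

    amplitudes_db: {frequency_hz: amplitude} for every preset point.
    band_points:   {band_name: [frequency_hz, ...]}.
    """
    return {
        band: sum(amplitudes_db[f] for f in freqs) / len(freqs)
        for band, freqs in band_points.items()
    }


def band_energy_label(mean_db, th_low_band, th_high_band):
    """Label one band's energy using that band's own pair of thresholds."""
    if mean_db < th_low_band:
        return "weak"
    if mean_db > th_high_band:
        return "pronounced"
    return "undetermined"  # assumption: the source does not cover this range
```

For example, with illustrative (not measured) amplitudes, `sub_band_means({100: 60.0, 200: 70.0, 500: 50.0}, {"low": [100, 200], "mid": [500]})` averages the two low-frequency points separately from the single intermediate-frequency point.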

Specifically, if the full-band amplitude mean $\bar{A}$ is greater than the second threshold TH_high, the low-frequency mean $\bar{A}_{\text{low}}$ is less than or equal to the fifth threshold $\mathrm{TH\_high}_{\text{low}}$, and the high-frequency mean $\bar{A}_{\text{high}}$ is less than or equal to the sixth threshold $\mathrm{TH\_high}_{\text{high}}$, the terminal may confirm the category of the environmental sound as first noisy noise and reduce noise of the audio signal using the first deep noise reduction mode. In this case, the environmental sound may specifically be that of an in-vehicle scene or a road scene.

If the full-band amplitude mean $\bar{A}$ is greater than the second threshold TH_high, the low-frequency mean $\bar{A}_{\text{low}}$ is greater than the fifth threshold $\mathrm{TH\_high}_{\text{low}}$, and the high-frequency mean $\bar{A}_{\text{high}}$ is greater than the sixth threshold $\mathrm{TH\_high}_{\text{high}}$, the terminal may confirm the category of the environmental sound as second noisy noise and reduce noise of the audio signal using the second deep noise reduction mode. In this case, the environmental sound may specifically be that of a drilling scene.

The noise reduction filter sets used in the first and second deep noise reduction modes may be the same or different.
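The subdivision of noisy noise can be sketched as follows. The function name, parameter names and labels are hypothetical. The text only covers the cases in which the low-frequency and high-frequency means are both at or below, or both above, their thresholds; the mixed cases are not addressed, so the sketch falls back to the undivided "noisy" category for them as an assumption.

```python
def classify_noisy(full_mean, low_mean, high_mean,
                   th_high, th_high_low, th_high_high):
    """Subdivide noisy noise using the low- and high-frequency means.

    th_high      : second threshold (full band)
    th_high_low  : fifth threshold (low-frequency band)
    th_high_high : sixth threshold (high-frequency band)
    Returns None when the sound is not noisy noise at all.
    """
    if full_mean <= th_high:
        return None
    if low_mean <= th_high_low and high_mean <= th_high_high:
        return "first_noisy"   # e.g. in-vehicle or road scene
    if low_mean > th_high_low and high_mean > th_high_high:
        return "second_noisy"  # e.g. drilling scene
    return "noisy"  # assumption: mixed cases are not covered by the source
```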

In practical applications, the overall energy level of the environmental sound in some scenes may lie near the boundary of a range, that is, the full-band amplitude mean may be close to the first threshold TH_low or the second threshold TH_high. To make the classification more accurate, in some embodiments of the present application, the terminal may further calculate the mean square error of the amplitudes corresponding to the plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band, determine the category of the environmental sound according to the full-band amplitude mean, the sub-band amplitude mean corresponding to each sub-frequency band and the mean square error, and then reduce noise of the audio signal using the noise reduction mode corresponding to that category.

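Specifically, the terminal may use the amplitudes $A_1, A_2, \ldots, A_N$ corresponding to the plurality of preset frequency points in the full frequency band to determine a mean square error MSE that evaluates their degree of dispersion, where the mean square error about the full-band amplitude mean $\bar{A}$ is

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(A_i - \bar{A}\right)^2$$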

If the mean square error MSE is lower than the fourth threshold THL_MSE, the energy at the individual frequency points of the environmental sound spectrum is relatively balanced; an example is environmental sound in which human voices are mixed with noise that is mainly of intermediate and low frequency, so that the energy is evenly distributed over the frequency bands. If the mean square error MSE is greater than or equal to the fourth threshold THL_MSE, the energy differs considerably between the frequency points of the environmental sound spectrum; an example is environmental sound dominated by low-frequency noise.
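A minimal sketch of the dispersion measure follows, assuming that the mean square error denotes the average squared deviation of the preset-point amplitudes from their own mean. The function name is hypothetical.

```python
def mean_square_error(amplitudes):
    """Average squared deviation of the amplitudes from their mean.

    Assumption: this is the dispersion measure meant by "MSE" in the text.
    """
    mean = sum(amplitudes) / len(amplitudes)
    return sum((a - mean) ** 2 for a in amplitudes) / len(amplitudes)
```

A spectrum with equal amplitude at every preset point gives zero, while one strong band among otherwise weak points gives a large value, which matches the balanced versus low-frequency-dominated distinction drawn above.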

Specifically, if the full-band amplitude mean $\bar{A}$ is greater than the first threshold TH_low and less than or equal to the second threshold TH_high, the absolute values of the differences between the sub-band amplitude means corresponding to the sub-frequency bands are smaller than a third threshold, and the mean square error MSE is smaller than the fourth threshold THL_MSE, the terminal may confirm the category of the environmental sound as first intermediate noise and reduce noise of the audio signal using a first balanced noise reduction mode.

The absolute values of the differences between the sub-band amplitude means refer to the absolute values of the pairwise differences among the high-frequency mean, the intermediate-frequency mean and the low-frequency mean. When each of these absolute values is smaller than the third threshold, the high-frequency, intermediate-frequency and low-frequency means are close in value.
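The combined test for first intermediate noise can be sketched as below: the full-band mean must fall in the intermediate range, every pair of sub-band means must differ by less than the third threshold, and the mean square error must be below the fourth threshold. The function and parameter names are hypothetical.

```python
from itertools import combinations


def is_first_intermediate(full_mean, band_means, mse,
                          th_low, th_high, th_diff, thl_mse):
    """Check the three conditions for first intermediate noise.

    band_means: {band_name: sub-band amplitude mean}
    th_diff   : third threshold (pairwise difference of band means)
    thl_mse   : fourth threshold (mean square error)
    """
    in_mid_range = th_low < full_mean <= th_high
    bands_close = all(
        abs(a - b) < th_diff
        for a, b in combinations(band_means.values(), 2)
    )
    return in_mid_range and bands_close and mse < thl_mse
```

With three bands, `combinations` yields exactly the three pairs (low, mid), (low, high) and (mid, high) referred to in the text.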

The specific values of the third threshold, the fourth threshold, the fifth threshold, the sixth threshold, and the like may be set according to experience or experimental results.

In some embodiments of the present application, because the energy of the environmental sound may differ at different times even within the same scene, the terminal can capture the audio signal at a preset acquisition frequency and select the corresponding noise reduction mode in real time according to the mean amplitude of the spectrum of the current environmental sound.

In the embodiments of the application, an audio signal obtained by capturing environmental sound is acquired, and the amplitudes corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in a full frequency band are acquired and averaged to obtain the full-band amplitude mean of the frequency spectrum. According to the range in which the full-band amplitude mean falls, the category of the environmental sound is confirmed as one of quiet noise, intermediate noise and noisy noise, and the comfortable noise reduction mode, the balanced noise reduction mode or the deep noise reduction mode is correspondingly adopted for noise reduction.

To illustrate the effect achieved by the embodiments of the present application, two schemes are provided for experimental verification.

Firstly, respectively acquiring audio signals of six types of scenes, and analyzing time domain and frequency domain diagrams of environmental sound under each type of scene. Fig. 2 to 7 correspond to time-domain and frequency-domain diagrams of environmental sounds in an office scene, an in-vehicle scene, a kindergarten scene, a road scene, a field scene, and a drilling scene, respectively.

As can be seen from Fig. 2 to Fig. 7, the environmental sound spectrum energy of the office scene is small and mainly concentrated at intermediate and low frequencies below 600 Hz; its spectral characteristics can be characterized by frequency points selected fairly uniformly in the range from 50 Hz to 600 Hz. The environmental sound spectrum energy of the in-vehicle scene is concentrated at low frequencies below 250 Hz; its spectral characteristics can be characterized by frequency points selected fairly uniformly in the range from 50 Hz to 250 Hz. The environmental sound of the kindergarten scene has a wider spectrum that is distributed fairly evenly below 3 kHz, mainly a mixture of different human voices together with some background noise; its spectral characteristics can be characterized by frequency points selected fairly uniformly in the range from 50 Hz to 3 kHz. The environmental sound spectrum energy of the road scene is mainly concentrated below 300 Hz, but rises to some extent near 3 kHz, which is attributable to friction sounds, horn sounds and the like generated by motor vehicles; its spectral characteristics can be characterized by frequency points selected uniformly in each frequency band within the range from 50 Hz to 3 kHz. The environmental sound spectrum energy of the field scene is weak and mainly distributed in the low-frequency region below 200 Hz; its spectral characteristics can be characterized by frequency points selected fairly uniformly in the range from 50 Hz to 200 Hz. The environmental sound of the drilling scene has a complex composition: its spectrum energy is mainly concentrated in the low-frequency part below 300 Hz, with large spectrum energy also at the 3 kHz and 5 kHz frequency points; its spectral characteristics can be characterized by frequency points selected uniformly in each frequency band within the range from 50 Hz to 5 kHz.

Scheme one: the terminal selects frequency points with frequencies of 70Hz, 100Hz, 150Hz, 200Hz, 400Hz, 500Hz, 600Hz, 1000Hz, 2000Hz and 3000Hz as preset frequency points, wherein the low frequency points on the low frequency band are respectively: 70Hz, 100Hz, 150Hz, 200Hz; the intermediate frequency points on the intermediate frequency band are respectively: 400Hz, 500Hz, 600Hz; the high frequency points on the high frequency band are respectively: 1000Hz, 2000Hz, 3000Hz. Table 1 and fig. 8 corresponding to table 1 can be obtained by calculating the full-band amplitude mean value, the high-frequency mean value, the medium-frequency mean value, the low-frequency mean value, and the mean square error, respectively.
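The grouping of the scheme one frequency points is consistent with the band ranges given in the embodiment above (50 Hz to 250 Hz, 250 Hz to 1 kHz, 1 kHz to 5 kHz). The sketch below assigns preset points to bands by those ranges. The function name and the boundary convention are assumptions: the text does not say which band a point exactly on a boundary belongs to, so each band is treated here as including its lower edge and excluding its upper edge, except for the last band, which includes both.

```python
# Band ranges from the embodiment: (name, lower edge in Hz, upper edge in Hz).
BAND_EDGES_HZ = [("low", 50, 250), ("mid", 250, 1000), ("high", 1000, 5000)]


def group_by_band(freqs_hz, band_edges=BAND_EDGES_HZ):
    """Assign each preset frequency point to a sub-frequency band.

    Points outside the full band are silently ignored.
    """
    groups = {name: [] for name, _, _ in band_edges}
    last_band = band_edges[-1][0]
    for f in freqs_hz:
        for name, lo, hi in band_edges:
            if lo <= f < hi or (name == last_band and f == hi):
                groups[name].append(f)
                break
    return groups


scheme_one_points = [70, 100, 150, 200, 400, 500, 600, 1000, 2000, 3000]
groups = group_by_band(scheme_one_points)
```

Under this convention the 1000 Hz point lands in the high frequency band, as it does in scheme one.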

In fig. 8, type 1 indicates an office scene, type 2 indicates an in-vehicle scene, type 3 indicates a kindergarten scene, type 4 indicates a road scene, type 6 indicates a field scene, and type 7 indicates a drill scene.

Table 1 scheme one amplitude analysis results statistics table

At this time, if the full-band amplitude mean is less than or equal to the first threshold TH_Low, the terminal may confirm the category of the environmental sound as quiet noise and reduce noise of the audio signal using the comfort noise reduction mode. In this case, the environmental sound may specifically be that of an office scene or a field scene.

If the full-band amplitude mean is greater than the first threshold TH_Low and less than or equal to the second threshold TH_high, or if, in addition, the absolute value of the difference between the sub-band amplitude means of every two sub-bands is smaller than the third threshold and the mean square error MSE is smaller than the fourth threshold THL_MSE, the terminal may confirm the category of the environmental sound as intermediate noise and reduce noise of the audio signal using the balanced noise reduction mode. In this case, the environmental sound may specifically be that of a kindergarten scene.

If the full-band amplitude mean is greater than the second threshold TH_high, the terminal may confirm the category of the environmental sound as noisy noise and reduce noise of the audio signal using the deep noise reduction mode. In this case, the environmental sound may specifically be that of an in-vehicle scene, a road scene or a drilling scene.

From the above data, the scheme provided by the application can clearly distinguish three categories of environmental sound: quiet noise, intermediate noise and noisy noise.
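The three-way decision of scheme one reduces to a two-threshold cascade on the full-band amplitude mean. A minimal sketch follows; the function name `classify_scheme_one` and the returned labels are illustrative, and the threshold values are left to the caller because the text does not state them.

```python
def classify_scheme_one(full_mean, th_low, th_high):
    """Map the full-band amplitude mean to (category, noise reduction mode)."""
    if th_low > th_high:
        raise ValueError("th_low must not exceed th_high")
    if full_mean <= th_low:
        return ("quiet noise", "comfort")
    if full_mean <= th_high:
        return ("intermediate noise", "balanced")
    return ("noisy noise", "deep")
```

Both boundaries are inclusive on the lower category, matching the "less than or equal to" wording used for the first and second thresholds.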

Scheme II: the terminal selects frequency points with frequencies of 70Hz, 100Hz, 150Hz, 200Hz, 400Hz, 500Hz, 600Hz, 1000Hz, 2000Hz, 3000Hz, 4000Hz and 5000Hz as preset frequency points, wherein the low frequency points on the low frequency band are respectively: 70Hz, 100Hz, 150Hz, 200Hz; the intermediate frequency points on the intermediate frequency band are respectively: 400Hz, 500Hz, 600Hz; the high frequency points on the high frequency band are respectively: 1000Hz, 2000Hz, 3000Hz, 4000Hz, 5000Hz. Table 2 and fig. 9 corresponding to table 2 can be obtained by calculating the full-band amplitude mean value, the high-frequency mean value, the medium-frequency mean value, the low-frequency mean value, and the mean square error, respectively.

In fig. 9 to 11, like fig. 8, type 1 indicates an office scene, type 2 indicates an in-vehicle scene, type 3 indicates a kindergarten scene, type 4 indicates a road scene, type 6 indicates a field scene, and type 7 indicates a drill scene.

Table 2 scheme two amplitude analysis results statistics

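At this time, if the full-band amplitude mean is less than or equal to the first threshold TH_Low, the terminal may confirm the category of the environmental sound as quiet noise and reduce noise of the audio signal using the comfort noise reduction mode.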

If the full-band amplitude mean is greater than the first threshold TH_Low and less than or equal to the second threshold TH_high, or if, in addition, the absolute value of the difference between the sub-band amplitude means of every two sub-bands is smaller than the third threshold and the mean square error MSE is smaller than the fourth threshold THL_MSE, the terminal may confirm the category of the environmental sound as intermediate noise and reduce noise of the audio signal using the balanced noise reduction mode. In this case, the environmental sound may specifically be that of a kindergarten scene.

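If the full-band amplitude mean is greater than the second threshold TH_high and at least one of the following is met: the low-frequency mean is less than or equal to the fifth threshold, or the high-frequency mean is less than or equal to the sixth threshold, the terminal may confirm the category of the environmental sound as first noisy noise and reduce noise of the audio signal using the first deep noise reduction mode.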

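If the full-band amplitude mean is greater than the second threshold TH_high, the low-frequency mean is greater than the fifth threshold and the high-frequency mean is greater than the sixth threshold, the terminal may confirm the category of the environmental sound as second noisy noise and reduce noise of the audio signal using the second deep noise reduction mode.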

To better show the differences in the mean square error MSE values, fig. 10 shows the mean square error analysis results calculated using the two schemes.

From the above data, the scheme provided by the application can clearly distinguish four categories of environmental sound: quiet noise, intermediate noise, first noisy noise and second noisy noise.
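The four-way decision can be sketched by splitting the noisy branch with the low-frequency and high-frequency means, following the fifth-threshold and sixth-threshold conditions stated for the first and second noisy noise in the device embodiments and claims. The function name and the parameter names `th_lf` and `th_hf` (fifth and sixth thresholds) are hypothetical, and no threshold values are implied.

```python
def classify_scheme_two(full_mean, low_mean, high_mean,
                        th_low, th_high, th_lf, th_hf):
    """Return one of the four categories distinguished in scheme two."""
    if full_mean <= th_low:
        return "quiet noise"
    if full_mean <= th_high:
        return "intermediate noise"
    # Noisy branch: second noisy noise needs BOTH sub-band means above
    # their thresholds; otherwise at least one is at or below its
    # threshold, which is the first noisy noise condition.
    if low_mean > th_lf and high_mean > th_hf:
        return "second noisy noise"
    return "first noisy noise"
```

Because the two noisy conditions are logical complements of each other, every input with a full-band mean above the second threshold falls into exactly one of them.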

As shown in table 3 and fig. 11 corresponding to table 3, the accuracy of the ambient sound classification may decrease as the scene type increases. For example, when the energy levels are consistent, the environmental sound of the scene in the vehicle and the environmental sound of the road scene behave more similarly in amplitude; the ambient sound of the office scene and the ambient sound of the field scene behave more similarly in magnitude.

TABLE 3 statistics of amplitude analysis results at the same energy level

Therefore, in practical applications, one of the two schemes may be selected according to the required precision, the number of scenes to be distinguished and the noise reduction efficiency; alternatively, the number of preset frequency points may be further increased or reduced and the way the preset frequency points are selected may be changed, followed by further analysis, to adapt to different requirements.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of combined actions, but those skilled in the art should understand that the present application is not limited by the order of actions described, since some steps may be performed in another order according to the present application.

Fig. 12 is a schematic structural diagram of a noise reduction device 1200 according to an embodiment of the present application, where the noise reduction device 1200 is configured on a terminal.

Specifically, the noise reduction device 1200 may include:

an acquisition unit 1201, configured to acquire an audio signal obtained by capturing ambient sound;

a calculating unit 1202, configured to obtain magnitudes corresponding to a plurality of preset frequency points of a frequency spectrum of the audio signal in a full frequency band, and average the magnitudes corresponding to the plurality of preset frequency points to obtain a full frequency band magnitude average value corresponding to the frequency spectrum of the audio signal in the full frequency band;

a comfort noise reduction unit 1203, configured to confirm the type of the ambient sound as quiet noise if the full-band amplitude average value is less than or equal to a first threshold value, and perform noise reduction on the audio signal by using a comfort noise reduction mode;

the equalizing noise reduction unit 1204 is configured to determine the type of the environmental sound as intermediate noise if the full-band amplitude average value is greater than the first threshold value and less than or equal to the second threshold value, and perform noise reduction on the audio signal by adopting an equalizing noise reduction mode;

and a depth noise reduction unit 1205, configured to confirm the type of the ambient sound as noisy noise if the full-band amplitude average value is greater than the second threshold value, and perform noise reduction on the audio signal by using a depth noise reduction mode.
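The rule shared by the calculating unit 1202 and units 1203 to 1205 can be written compactly as below. The notation is assumed for illustration only: $A(f_i)$ is the amplitude at the $i$-th of $N$ preset frequency points, $\bar{A}$ is the full-band amplitude mean, and $T_1 \le T_2$ are the first and second thresholds.

```latex
\bar{A} = \frac{1}{N}\sum_{i=1}^{N} A(f_i), \qquad
\text{mode} =
\begin{cases}
\text{comfort noise reduction},    & \bar{A} \le T_1,\\
\text{equalizing noise reduction}, & T_1 < \bar{A} \le T_2,\\
\text{deep noise reduction},       & \bar{A} > T_2.
\end{cases}
```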

In some embodiments of the present application, the full frequency band may include a plurality of sub-bands whose frequencies do not overlap each other, where each sub-band includes one or more preset sub-frequency points; the noise reduction device 1200 may be specifically configured to: acquire the amplitude corresponding to each preset sub-frequency point, and average the amplitudes corresponding to the preset sub-frequency points on each sub-band respectively, to obtain the sub-band amplitude mean of the frequency spectrum of the audio signal for each sub-band; and determine the category of the environmental sound according to the full-band amplitude mean and the sub-band amplitude mean corresponding to each sub-band, and reduce noise of the audio signal using a noise reduction mode corresponding to the category of the environmental sound.

In some embodiments of the present application, the noise reduction device 1200 may be specifically used for: calculating the mean square error of the amplitude corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band; and determining the category of the environmental sound according to the full-band amplitude mean value, the sub-band amplitude mean value corresponding to each sub-band and the mean square error, and adopting a noise reduction mode corresponding to the category of the environmental sound to reduce noise of the audio signal.

In some embodiments of the present application, the noise reduction device 1200 may be specifically used for: and if the full-band amplitude mean value is larger than a first threshold value and smaller than or equal to a second threshold value, the absolute value of the difference value between every two sub-band amplitude mean values corresponding to the sub-bands is smaller than a third threshold value, and the mean square error is smaller than a fourth threshold value, determining the category of the environmental sound as first intermediate noise, and adopting a first equalization noise reduction mode to reduce noise of the audio signal.
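The first intermediate noise condition combines three checks: the full-band mean lies between the first and second thresholds, every pair of sub-band means differs by less than the third threshold, and the mean square error is below the fourth threshold. A minimal sketch, with the hypothetical names `is_first_intermediate`, `th_diff` (third threshold) and `th_mse` (fourth threshold):

```python
from itertools import combinations


def is_first_intermediate(full_mean, sub_band_means, mse,
                          th_low, th_high, th_diff, th_mse):
    """True when all three conditions for first intermediate noise hold."""
    if not (th_low < full_mean <= th_high):
        return False
    # Every unordered pair of sub-band means must be closer than th_diff.
    flat_spectrum = all(
        abs(a - b) < th_diff for a, b in combinations(sub_band_means, 2)
    )
    return flat_spectrum and mse < th_mse
```

The pairwise comparison and the MSE bound both express that the spectrum is roughly flat across sub-bands, which is consistent with the evenly distributed kindergarten spectrum described for the intermediate category.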

In some embodiments of the present application, the sub-band may include a high-frequency band, an intermediate-frequency band, and a low-frequency band, where the average value of the magnitudes of the sub-bands corresponding to the high-frequency band is a high-frequency average value, the average value of the magnitudes of the sub-bands corresponding to the intermediate-frequency band is an intermediate-frequency average value, and the average value of the magnitudes of the sub-bands corresponding to the low-frequency band is a low-frequency average value; the noise reduction device 1200 may be specifically configured to: if the full-band amplitude mean value is greater than a second threshold value and at least one of the low-frequency mean value being less than or equal to a fifth threshold value and the high-frequency mean value being less than or equal to a sixth threshold value is met, determining the category of the environmental sound as first noisy noise, and adopting a first deep noise reduction mode to reduce noise of the audio signal.

In some embodiments of the present application, the noise reduction device 1200 may be specifically used for: and if the full-band amplitude average value is larger than the second threshold value, the low-frequency average value is larger than the fifth threshold value and the high-frequency average value is larger than the sixth threshold value, determining the category of the environmental sound as second noisy noise, and adopting a second deep noise reduction mode to reduce noise of the audio signal.

In some embodiments of the present application, the plurality of preset frequency points may include at least two frequency points of a high frequency band, at least two frequency points of a medium frequency band, and at least two frequency points of a low frequency band.

It should be noted that, for convenience and brevity, the specific working process of the noise reduction device 1200 may refer to the corresponding process of the method described in fig. 1 to 11, which is not described herein again.

Fig. 13 is a schematic diagram of a terminal according to an embodiment of the present application. The terminal 13 may include: a processor 130, a memory 131 and a computer program 132, such as a noise reduction program, stored in the memory 131 and executable on the processor 130. The processor 130, when executing the computer program 132, implements the steps of the various noise reduction method embodiments described above, such as steps S101 to S105 shown in fig. 1. Alternatively, the processor 130 may perform the functions of the modules/units in the above-described apparatus embodiments when executing the computer program 132, for example, the acquisition unit 1201, the calculation unit 1202, the comfort noise reduction unit 1203, the equalization noise reduction unit 1204, and the depth noise reduction unit 1205 shown in fig. 12.

The computer program may be divided into one or more modules/units, which are stored in the memory 131 and executed by the processor 130 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments describe the execution of the computer program in the terminal.

For example, the computer program may be split into: an acquisition unit, a calculation unit, a comfort noise reduction unit, an equalization noise reduction unit and a depth noise reduction unit. The specific functions of each unit are as follows: the acquisition unit is used for acquiring an audio signal obtained by capturing environmental sound; the calculation unit is used for obtaining the amplitudes corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band, and averaging the amplitudes corresponding to the preset frequency points to obtain the full-band amplitude mean corresponding to the frequency spectrum of the audio signal in the full frequency band; the comfort noise reduction unit is used for confirming the category of the environmental sound as quiet noise if the full-band amplitude mean is smaller than or equal to a first threshold value, and reducing noise of the audio signal using a comfort noise reduction mode; the equalization noise reduction unit is used for confirming the category of the environmental sound as intermediate noise if the full-band amplitude mean is larger than the first threshold value and smaller than or equal to the second threshold value, and reducing noise of the audio signal using an equalization noise reduction mode; and the depth noise reduction unit is used for confirming the category of the environmental sound as noisy noise if the full-band amplitude mean is larger than the second threshold value, and reducing noise of the audio signal using a depth noise reduction mode.

The terminal may include, but is not limited to, a processor 130, a memory 131. It will be appreciated by those skilled in the art that fig. 13 is merely an example of a terminal and is not intended to be limiting, and that more or fewer components than shown may be included, or certain components may be combined, or different components may be included, for example, the terminal may also include input and output devices, network access devices, buses, etc.

The processor 130 may be a central processing unit (Central Processing Unit, CPU), or another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

The memory 131 may be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 131 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal. Further, the memory 131 may also include both an internal storage unit and an external storage device of the terminal. The memory 131 is used for storing the computer program and other programs and data required by the terminal. The memory 131 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.

In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal and method may be implemented in other manners. For example, the apparatus/terminal embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer readable storage medium and, when executed by a processor, may implement the steps of each method embodiment described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately added to or reduced according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

The above embodiments are only for illustrating the technical solution of the present application, and are not limiting; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims

1. A method of noise reduction, comprising:

acquiring an audio signal acquired by acquiring environmental sound;

2. The noise reduction method according to claim 1, wherein the full frequency band includes a plurality of sub-frequency bands with frequencies not overlapping each other, and each of the sub-frequency bands includes one or more preset sub-frequency points;

the noise reduction method further comprises the following steps:

acquiring the amplitude corresponding to each preset sub-frequency point, and respectively averaging the amplitude corresponding to the preset sub-frequency point on each sub-frequency band to obtain the average value of the amplitude of the frequency spectrum of the audio signal in the sub-frequency band corresponding to each sub-frequency band;

and determining the category of the environmental sound according to the full-band amplitude average value and the sub-band amplitude average value corresponding to each sub-band, and adopting a noise reduction mode corresponding to the category of the environmental sound to reduce noise of the audio signal.

3. The method of noise reduction according to claim 2, wherein determining the category of the ambient sound from the full-band amplitude average value and the sub-band amplitude average value corresponding to each of the sub-bands, and noise-reducing the audio signal using a noise reduction mode corresponding to the category of the ambient sound, comprises:

calculating the mean square error of the amplitude corresponding to a plurality of preset frequency points of the frequency spectrum of the audio signal in the full frequency band;

and determining the category of the environmental sound according to the full-band amplitude mean value, the sub-band amplitude mean value corresponding to each sub-band and the mean square error, and adopting a noise reduction mode corresponding to the category of the environmental sound to reduce noise of the audio signal.

4. The method of noise reduction according to claim 3, wherein determining the category of the ambient sound based on the full-band amplitude mean, the sub-band amplitude mean corresponding to each of the sub-bands, and the mean square error, and noise reducing the audio signal using a noise reduction mode corresponding to the category of the ambient sound, comprises:

and if the full-band amplitude mean value is larger than a first threshold value and smaller than or equal to a second threshold value, the absolute value of the difference value between every two sub-band amplitude mean values corresponding to the sub-bands is smaller than a third threshold value, and the mean square error is smaller than a fourth threshold value, determining the category of the environmental sound as first intermediate noise, and adopting a first equalization noise reduction mode to reduce noise of the audio signal.

5. The noise reduction method according to claim 2, wherein the sub-bands include a high-frequency band, an intermediate-frequency band, and a low-frequency band, the average value of the magnitudes of the sub-bands corresponding to the high-frequency band is a high-frequency average value, the average value of the magnitudes of the sub-bands corresponding to the intermediate-frequency band is an intermediate-frequency average value, and the average value of the magnitudes of the sub-bands corresponding to the low-frequency band is a low-frequency average value;

the step of determining the category of the environmental sound according to the full-band amplitude average value and the sub-band amplitude average value corresponding to each sub-band, and adopting a noise reduction mode corresponding to the category of the environmental sound to reduce noise of the audio signal, includes:

if the full-band amplitude mean value is greater than a second threshold value and at least one of the low-frequency mean value being less than or equal to a fifth threshold value and the high-frequency mean value being less than or equal to a sixth threshold value is met, determining the category of the environmental sound as first noisy noise, and adopting a first deep noise reduction mode to reduce noise of the audio signal.

6. The method of noise reduction according to claim 5, wherein the determining the category of the ambient sound according to the full-band amplitude average value and the sub-band amplitude average value corresponding to each sub-band, and the noise reduction is performed on the audio signal using a noise reduction mode corresponding to the category of the ambient sound, further comprises:

and if the full-band amplitude average value is larger than the second threshold value, the low-frequency average value is larger than the fifth threshold value and the high-frequency average value is larger than the sixth threshold value, determining the category of the environmental sound as second noisy noise, and adopting a second deep noise reduction mode to reduce noise of the audio signal.

7. The noise reduction method according to any one of claims 1 to 6, wherein the plurality of preset frequency points includes at least two frequency points of a high frequency band, at least two frequency points of a medium frequency band, and at least two frequency points of a low frequency band.

8. A noise reduction device, comprising:

9. A terminal comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 7 when the computer program is executed.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.