CN114220454B - Audio noise reduction method, medium and electronic equipment

Audio noise reduction method, medium and electronic equipment

Info

Publication number
CN114220454B (application CN202210084791.5A)
Authority
CN (China)
Prior art keywords
sound, sound source, signal, audio, signals
Legal status
Active
Application number
CN202210084791.5A
Other languages
Chinese (zh)
Other versions
CN114220454A (en)
Inventors
玄建永, 刘镇亿, 高海宽, 曹国智
Current Assignee
Beijing Honor Device Co Ltd
Original Assignee
Beijing Honor Device Co Ltd
Application filed by Beijing Honor Device Co Ltd; priority to CN202210084791.5A; published as CN114220454A; application granted and published as CN114220454B. Legal status: Active.

Classifications

    • G PHYSICS; G10 MUSICAL INSTRUMENTS; ACOUSTICS; G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/0272 — Voice signal separating (under G10L 21/02, speech enhancement, e.g. noise reduction or echo cancellation)
    • G10L 21/0216 — Noise filtering characterised by the method used for estimating noise (under G10L 21/0208, noise filtering)
    • G10L 2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L 2021/02166 — Microphone arrays; Beamforming


Abstract

The application relates to the technical field of audio processing, and in particular to an audio noise reduction method, medium and electronic equipment. The method includes: acquiring an audio signal to be denoised, where the audio signal to be denoised includes the audio signals of M sound sources; performing sound source separation on the audio signal to be denoised using N sound source separation techniques to obtain N groups of sound source sound signals corresponding to the N techniques, where each group of sound source sound signals includes the audio signals of the M sound sources; selecting, from the N groups of sound source sound signals, the N sound source sound signals corresponding to each of the M sound sources; and taking, among the N sound source sound signals corresponding to each sound source, the sound source sound signal whose signal strength satisfies a preset condition as the target sound source sound signal of that sound source. Therefore, the sound source sound signals obtained by this scheme have high signal quality, the noise-reduced audio signal has high quality, and the electronic device or other devices can play the high-quality noise-reduced audio signal, improving the user experience.

Description

Audio noise reduction method, medium and electronic equipment
Technical Field
The present application relates to the field of audio processing technologies, and in particular, to an audio denoising method, medium, and electronic device.
Background
At present, electronic devices are often equipped with a microphone to provide a sound pickup function, and a user can pick up sound through the microphone. Because noise may be present in the environment around the microphone, the sound signals collected during sound pickup often carry noise.
In order to obtain a high-quality sound signal, the sound signal collected by the microphone needs to be denoised. An existing noise removal scheme is as follows: the collected sound signals are separated using a sound source separation technique capable of denoising, obtaining the sound source sound signal of each sound source.
However, during sound source separation, the electronic device may separate sound source sound signals that still contain considerable noise from the collected sound signals; the quality of such sound source sound signals is poor, which degrades the user experience.
Disclosure of Invention
The embodiment of the application provides an audio noise reduction method, medium and electronic equipment.
In a first aspect, an embodiment of the present application provides an audio noise reduction method, which is applied to an electronic device, and the method includes:
acquiring an audio signal to be subjected to noise reduction, wherein the audio signal to be subjected to noise reduction comprises audio signals of M sound sources;
carrying out sound source separation on the audio signals to be subjected to noise reduction by utilizing N sound source separation technologies to obtain N groups of sound source sound signals corresponding to the N sound source separation technologies, wherein each group of sound source sound signals comprises audio signals of M sound sources;
selecting N sound source sound signals corresponding to each of the M sound sources from the N groups of sound source sound signals;
taking a sound source sound signal with signal intensity meeting a preset condition in N sound source sound signals corresponding to each sound source as a target sound source sound signal of the sound source; wherein N and M are integers greater than or equal to 2.
Based on this noise reduction scheme, the electronic device can obtain sound source sound signals with lower noise signal strength. Because the sound source sound signal quality is higher, the quality of the noise-reduced audio signal is higher, and the electronic device itself or other devices can play this higher-quality noise-reduced audio signal, improving the user experience.
In a possible implementation of the first aspect, the sound source sound signal whose signal strength satisfies the preset condition is:
one of the first G sound source sound signals after the sound source sound signals of the sound source are sorted in ascending order of signal strength, where G is a natural number.
In one possible implementation of the first aspect, the target sound source sound signal is a sound source sound signal with the lowest intensity among the first G sound source sound signals.
In one possible implementation of the first aspect, the N sound source separation techniques include any two or more of a beamforming technique, a blind source separation technique, and a sound source separation neural network.
In a possible implementation of the first aspect, the method further includes: and performing sound image conversion on a target sound source sound signal of the sound source based on the position and the strength of the sound source to obtain audio data, wherein the audio data is 5.1 sound channel sound data or 7.1 sound channel sound data.
In a possible implementation of the first aspect, the audio signal to be denoised is an audio signal acquired by the electronic device in real time.
It can be understood that the audio signal to be denoised is an audio signal acquired by the electronic device in real time during recording.
In one possible implementation of the first aspect, the electronic device includes three microphones, and the three microphones are respectively disposed at the top, the bottom, and the side of the electronic device.
The electronic device suitable for the audio noise reduction method provided by the embodiment of the application can be a mobile phone, a computer, a tablet, a smart wearable device and the like, but is not limited thereto.
In a second aspect, the present application provides a readable medium, on which instructions are stored, and when executed on an electronic device, the instructions cause the electronic device to perform the audio noise reduction method according to any one of the first aspect.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory for storing instructions for execution by one or more processors of the electronic device, and the processor, being one of the processors of the electronic device, for performing the audio noise reduction method of any of the first aspects.
In a fourth aspect, the present application provides a computer program product comprising computer programs/instructions, which when executed by a processor, implement the audio noise reduction method of any one of the first aspect.
Drawings
Fig. 1 illustrates an application scenario diagram of an audio noise reduction method according to some embodiments of the present application;
FIG. 2 illustrates a flow diagram of an audio noise reduction method, according to some embodiments of the present application;
FIG. 3 illustrates a schematic structural diagram of a handset 100 for 3D sound recording, according to some embodiments of the present application;
FIG. 4A is a schematic diagram illustrating an application scenario of an audio denoising method during playing 5.1 channel audio data according to some embodiments of the present application;
FIG. 4B illustrates a schematic diagram of a channel distribution for a 5.1 channel system, according to some embodiments of the present application;
FIG. 5 illustrates a flow diagram of an audio noise reduction method, according to some embodiments of the present application;
FIG. 6A shows a user interface diagram of a camera application A;
FIG. 6B shows a schematic view of a capture interface;
FIG. 6C is a schematic diagram of a user interface in which a 3D sound recording control E is located;
FIG. 7 illustrates a schematic diagram of a method for measuring the location of a sound source based on the time difference of arrival (TDOA) technique, according to some embodiments of the present application;
fig. 8 shows a schematic structural diagram of a mobile phone 100 suitable for the audio noise reduction method provided by the embodiment of the present application, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, an audio noise reduction method, medium, and electronic device.
In order to solve the technical problem described in the background, an embodiment of the present application provides an audio denoising method, which specifically includes: the electronic device acquires the audio signal to be denoised through its microphones, and then performs sound source separation on the audio signal to be denoised using multiple sound source separation techniques to obtain multiple groups of processed sound signals, where each group of processed sound signals includes multiple sound source sound signals and the number of sound source sound signals is related to the number of sound sources. For example, when two people are speaking, the number of sound sources is two, and each group of processed sound signals includes two sound source sound signals. Then, the electronic device determines the signal strength of each sound source sound signal in each group of processed sound signals, and takes the sound source sound signal with the lowest signal strength as the audio signal of the corresponding sound source. In this way, the electronic device can obtain, for each sound source, a sound source sound signal with lower noise signal strength.
It can be understood that the sound source sound signals obtained by separating the audio signal to be denoised with each sound source separation technique contain an original sound signal and a noise signal, so the sound source sound signal strength is the sum of the original signal strength and the noise signal strength. Because the original sound signal strength is the same across separation techniques, a lower sound source sound signal strength implies a lower noise signal strength.
Based on this noise reduction scheme, the electronic device can obtain sound source sound signals with lower noise signal strength. Because the sound source sound signal quality is higher, the quality of the noise-reduced audio signal is higher, and the electronic device itself or other devices can play this higher-quality noise-reduced audio signal, improving the user experience.
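As an illustration only (the patent text contains no code), a minimal sketch of this selection step follows; the array layout, strength measure, and function name are assumptions made for this example:

```python
import numpy as np

def select_target_signals(separated: np.ndarray) -> np.ndarray:
    """Pick, per sound source, the candidate with the lowest signal strength.

    separated: array of shape (N, M, n_frames, n_bins) holding the M sound
    source sound signals produced by each of the N separation techniques.
    Returns an array of shape (M, n_frames, n_bins) of target signals.
    """
    n_techniques, n_sources = separated.shape[:2]
    targets = []
    for m in range(n_sources):
        candidates = separated[:, m]  # the N candidates for sound source m
        strengths = [np.sum(np.abs(c) ** 2) for c in candidates]
        # The original signal part is the same across techniques, so the
        # lowest total strength implies the least residual noise. (Taking
        # one of the first G after an ascending sort, with G = 1, reduces
        # to this minimum.)
        targets.append(candidates[int(np.argmin(strengths))])
    return np.stack(targets)
```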
The electronic device suitable for the audio noise reduction method provided by the embodiment of the application may be a mobile phone, a computer, a tablet, a camera, a smart wearable device, and the like, but is not limited thereto. The following describes an audio noise reduction method provided by an embodiment of the present application, taking an electronic device as a mobile phone as an example.
For example, fig. 1 illustrates an application scenario diagram of an audio noise reduction method according to some embodiments of the present application. As shown in fig. 1, user 1 uses the mobile phone 100 to record performers (e.g., performer 1 and performer 2 shown in fig. 1) performing a program on a stage. While recording the video, the mobile phone 100 synchronously turns on its microphones to collect sound and acquire the live audio signal, whose sound sources are performer 1 and performer 2. This live audio signal is the audio signal to be denoised.
When the mobile phone 100 acquires an audio signal to be denoised, two sound source separation technologies, namely a beam forming technology and a blind source separation technology, are used to perform sound source separation processing on the audio signal to be denoised, so as to obtain a sound source sound signal of the performer 1 and a sound source sound signal of the performer 2 corresponding to the beam forming technology, and a sound source sound signal of the performer 1 and a sound source sound signal of the performer 2 corresponding to the blind source separation technology.
The sound source sound signal of performer 1 obtained by the beamforming technique is compared with the sound source sound signal of performer 1 obtained by the blind source separation technique, and the one with the lower signal strength is used as the sound source sound signal of performer 1. Similarly, the sound source sound signal of performer 2 obtained by the beamforming technique is compared with the sound source sound signal of performer 2 obtained by the blind source separation technique, and the one with the lower signal strength is used as the sound source sound signal of performer 2.
Fig. 2 illustrates a flow diagram of an audio noise reduction method, according to some embodiments of the present application. As shown in fig. 2, the main execution body of the process is a mobile phone 100, and the process includes the following steps:
201: and acquiring an audio signal to be subjected to noise reduction.
For example, as shown in fig. 1, the user 1 obtains an audio signal to be noise-reduced, such as a live audio signal, through the cellular phone 100.
202: performing sound source separation on the audio signal to be denoised using multiple sound source separation techniques to obtain multiple groups of processed sound signals, where each group of processed sound signals includes multiple sound source sound signals and the number of sound source sound signals is related to the number of sound sources.
The present application takes the sound source separation of the audio signal to be denoised by the mobile phone 100 using two sound source separation technologies, namely the beam forming technology and the blind source separation technology, as an example, and explains the technical scheme of the present application.
For example, as shown in fig. 1, when the mobile phone 100 acquires an audio signal to be denoised, two sound source separation techniques, namely a beam forming technique and a blind source separation technique, are used to perform sound source separation processing on the audio signal to be denoised, so as to obtain a sound source sound signal of the performer 1 and a sound source sound signal of the performer 2 corresponding to the beam forming technique, and a sound source sound signal of the performer 1 and a sound source sound signal of the performer 2 corresponding to the blind source separation technique.
203: and determining the signal intensity of each sound source sound signal in each group of processed sound signals, and taking the sound source sound signal with the lowest signal intensity as the audio signal of the corresponding sound source.
For example, based on the foregoing example, the sound source sound signal of performer 1 obtained by the beamforming technique is compared with the sound source sound signal of performer 1 obtained by the blind source separation technique, and the one with the lower signal strength is used as the sound source sound signal of performer 1. Similarly, the sound source sound signal of performer 2 obtained by the beamforming technique is compared with the sound source sound signal of performer 2 obtained by the blind source separation technique, and the one with the lower signal strength is used as the sound source sound signal of performer 2.
It can be understood that, because the original sound signal of a sound source contained in each separation result obtained by the mobile phone 100 with a sound source separation technique is constant, the larger the value of the separation result, the more noise it contains. In order to reduce the noise in the separation result, the mobile phone 100 compares the magnitudes of the first sound source sound signal and the second sound source sound signal and takes the smaller of the two as the target audio signal of the corresponding sound source. For example, taking the separation of the audio signal to be denoised received by the microphone 101 as an example, the first sound source sound signals of performer 1 and performer 2 are obtained based on the beamforming technique, and the second sound source sound signals of performer 1 and performer 2 are obtained based on the blind source separation technique.
If the first sound source sound signal of performer 1 obtained based on the beamforming technique is smaller than the second sound source sound signal of performer 1 obtained based on the blind source separation technique, the first sound source sound signal of performer 1 obtained based on the beamforming technique is determined as the target audio signal of performer 1.
The comparison can be written as:

$$Y_n(i,k) = \min\left( Y_n^{\mathrm{BF}}(i,k),\ Y_n^{\mathrm{BSS}}(i,k) \right)$$

wherein $Y_n^{\mathrm{BF}}$ and $Y_n^{\mathrm{BSS}}$ denote the sound source sound signals of the $n$-th target sound source obtained by the beamforming and blind source separation techniques respectively, $n$ represents the $n$-th target sound source, $i$ represents the $i$-th frame of audio data of the target sound source, and $k$ represents the time-frequency bin of the target sound source.
It is understood that, in some other embodiments, the mobile phone 100 sorts the sound source sound signals of a sound source in ascending order of signal strength, and takes one of the first G sound source sound signals as the target, where G is a natural number.
The following specifically introduces the technical solution of the present application by taking as an example the mobile phone 100 in fig. 1 performing 3D recording, denoising the audio signal of the 3D recording, converting the denoised audio signal into 5.1 channel audio data, and encoding the 5.1 channel audio data.
It is understood that the mainstream recording scheme in the industry is mono or stereo (e.g., left/right channel, front/back channel) recording; the recorded audio data is mono or binaural, and the sound field range and sense of presence of such data are poor.
As users demand an ever higher degree of restoration of the acoustic environment, recording 3D sound sources will become a development direction in the industry. 3D recording saves the audio recorded by the user as 5.1 or 7.1 channel data; during audio playback, the sound field environment at recording time can be restored more completely, bringing the user an immersive, "on the scene" experience and improving user experience.
Referring to fig. 1, a schematic structure of a mobile phone 100 for 3D sound recording will be described below by taking 3D sound recording of the mobile phone 100 as an example. Fig. 3 illustrates a schematic structural diagram of a handset 100 for 3D sound recording, according to some embodiments of the present application.
As shown in fig. 3, three microphones are provided in the mobile phone 100: a microphone 101, a microphone 102 and a microphone 103 for picking up sound signals.
The handset 100 has two opposing faces, one being the front face where the display is located and the other being the back face opposite the display. Wherein the microphone 101 and the microphone 103 are arranged at the top and the bottom, respectively. The microphone 102 is disposed on the back side. It is understood that the microphone 101, the microphone 102, and the microphone 103 are disposed in various ways, and the present embodiment is not limited thereto.
Specifically, the mobile phone 100 may obtain three sound signals collected by three microphones through the three microphones as shown in fig. 3. Since the distances between the three microphones and the sound source are different, and the time of arrival of the sound at each microphone is different, in the embodiment of the present disclosure, it can be considered that the amplitudes and phases of the three sound signals collected by the three microphones at the same time are different.
It is understood that in some other embodiments, the number of microphones in the handset 100 is not limited to 3, and may be greater than 3, as long as the number and the layout of microphone arrays are any and are suitable for the audio noise reduction method provided in the embodiments of the present application.
Fig. 4A is a schematic diagram illustrating an application scenario of an audio noise reduction method in a 5.1 channel audio data playing process according to some embodiments of the present application. As shown in fig. 4A, the application scenario includes a 5.1 channel system 200, and the 5.1 channel system 200 may include: a speaker 201 corresponding to a left channel L, a speaker 202 corresponding to a right channel R, a speaker 203 corresponding to a left surround channel LS, a speaker 204 corresponding to a right surround channel RS, a speaker 205 corresponding to a center channel C, and a speaker 206 corresponding to a subwoofer channel LFE.
In a home theater, the user 1 plays the above-mentioned high-quality noise-reduced audio data through the 5.1 channel system 200 connected to the mobile phone 100, thereby improving user experience.
Corresponding to fig. 4A, fig. 4B shows a schematic diagram of a channel distribution for a 5.1 channel system, according to some embodiments of the present application. As shown in fig. 4B, the 5.1 channel system may include: a center channel C, a left channel L, a right channel R, a left surround channel LS, a right surround channel RS, and a subwoofer channel LFE.
Assume that user 1 is located at the center point 0 in fig. 4B, facing the position of the center channel C; each channel is equidistant from the center point and lies in the same plane. The center channel C is directly in front of the direction user 1 faces. The left channel L and the right channel R are placed symmetrically on either side of the center channel C, each at a 30° angle to the facing direction of user 1. The left surround channel LS and the right surround channel RS are placed symmetrically behind and to either side, each at an angle of 100-120° to the facing direction of user 1. Because heavy bass gives only a weak sense of direction, there is no strict requirement on the placement of the subwoofer channel LFE; different angles between the LFE and the facing direction of user 1 merely change the bass component of the 5.1 channel sound signal, and user 1 may adjust the LFE placement as needed. The present disclosure does not limit the angle between the subwoofer channel LFE and the facing direction of user 1; an angle is identified in fig. 4B only as an example.
One point should be explained: the angles between the channels and the facing direction of user 1 in the 5.1 channel system described in this embodiment are only exemplary. In addition, the distance from each channel to user 1 may differ, and the heights of the channels may also differ (i.e., the channels need not be in one plane); user 1 may adjust them as needed. Differences in channel placement cause differences in the sound signals, which this embodiment does not limit.
Based on the foregoing description, and taking as an example the mobile phone 100 in fig. 1 performing 3D recording, denoising the 3D-recorded audio signal using two sound source separation techniques, namely the beamforming technique and the blind source separation technique, converting the denoised audio signal into 5.1 channel audio data, and encoding the 5.1 channel audio data, the technical solution of the present application is specifically introduced below.
Fig. 5 illustrates a flow diagram of an audio noise reduction method, according to some embodiments of the present application. As shown in fig. 5, the main execution body of the process is a mobile phone 100, and the process includes the following steps:
501: and acquiring an audio signal to be subjected to noise reduction.
For example, as shown in fig. 1, the mobile phone 100 acquires the audio signal to be denoised, such as a live audio signal, through its microphones. Specifically, the mobile phone 100 may obtain three sound signals collected by the three microphones shown in fig. 3. Because the distances between the three microphones and a sound source differ, the sound arrives at each microphone at a different time; in the embodiment of the present disclosure, the three sound signals collected by the three microphones at the same moment can therefore be considered to have the same frequency but different amplitudes and phases.
It is understood that, in some embodiments, before the mobile phone 100 executes the audio noise reduction method provided in the embodiments of the present application, the mobile phone 100 may turn on the audio noise reduction function provided in the embodiments of the present application. For example, fig. 6A to 6C are schematic diagrams illustrating the user interface changes for starting the 3D recording function. It can be understood that turning on the 3D recording function turns on the audio noise reduction function provided by the embodiments of the present application. As shown in fig. 6A, the user first opens the shooting interface of the camera by clicking the camera application A. As shown in fig. 6B, the mobile phone 100 displays the shooting interface B, which includes a setting button C. The user clicks the setting button C to open the setting interface. As shown in fig. 6C, the mobile phone 100 displays the setting interface D, which includes a 3D recording control E; the user clicks the 3D recording control E to turn on the audio noise reduction function provided by the embodiments of the present application. It can be understood that, if the microphones of the mobile phone 100 are laid out as shown in fig. 3, a prompt such as "Shoot in landscape to record the sound direction over 360° and obtain a better sense of space" is displayed below the 3D recording control E. Thus, to obtain a better recording effect, the user can shoot in landscape with the mobile phone 100 while recording video with the 3D recording function turned on.
502: and determining sound source information in the audio signal to be denoised according to a multi-sound-source positioning technology, wherein the sound source information comprises the number of sound sources and the position of the sound sources.
For example, as shown in fig. 1, the sound recorded by the mobile phone 100 during the video recording process may include the sound of performer 1 and performer 2, and thus the number of sound sources is 2. The sound source positions may be at the front left and front right of the handset 100.
The sound source location may be obtained by the time difference of arrival (TDOA) technique. Fig. 7 shows a schematic diagram of the principle of measuring the sound source location based on the time difference of arrival, according to some embodiments of the present application. As shown in fig. 7, the microphone 101, the microphone 102, and the microphone 103 of the mobile phone 100 all acquire the sound signal of performer 1, and the mobile phone 100 calculates the left-right and front-back position of performer 1 from the three sound signals of performer 1 received by the microphone 101, the microphone 102, and the microphone 103 respectively.
The following illustrates how the mobile phone 100 measures the left-right position of the sound source based on the time difference of arrival, taking the microphones 101 and 102 receiving the sound of performer 1 as an example. As shown in fig. 7, the distance c between the microphone 101 and the microphone 102 is fixed. The projection of the distance c onto the line connecting the microphone 101 and performer 1 has length a.
The mobile phone 100 obtains the distance a as the product of the time difference between the sound signals received by the microphone 101 and the microphone 102 and the speed of sound in air, divides a by c to obtain a cosine value, and takes the arccosine to obtain the included angle β between the direction of performer 1 and the line connecting the microphone 101 and the microphone 102.
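As an illustration only (no code appears in the patent), this angle computation can be sketched as follows, assuming a far-field source; the constant and function name are chosen for this example:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees Celsius

def tdoa_angle(delta_t: float, mic_distance: float) -> float:
    """Angle (radians) between the source direction and the microphone pair
    axis, from the time difference of arrival delta_t (seconds) across a
    pair of microphones spaced mic_distance meters apart."""
    a = delta_t * SPEED_OF_SOUND              # path-length difference
    cos_beta = a / mic_distance               # projection ratio a / c
    cos_beta = max(-1.0, min(1.0, cos_beta))  # clamp against measurement noise
    return math.acos(cos_beta)

# Example: a 0.1 ms arrival difference across a 15 cm microphone spacing
print(math.degrees(tdoa_angle(1.0e-4, 0.15)))  # about 76.8 degrees
```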
It is to be appreciated that the multi-sound-source localization technique can be, but is not limited to, localization based on high-resolution spectral estimation, localization based on steered response power with phase transform weighting (SRP-PHAT), localization based on TDOA, and the like.
It can be understood that, in some embodiments, the mobile phone 100 converts the audio signal to be denoised obtained in step 501 from a time-domain signal into a frequency-domain signal, and determines the sound source information in the frequency-domain audio signal according to the multi-sound-source localization technique.
For example, the mobile phone 100 performs Fourier transform processing on the audio signal to be denoised obtained in step 501, and determines the sound source information in the Fourier-transformed audio signal according to the multi-sound-source localization technique.
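The patent states only that a Fourier transform is applied; as a hedged sketch, a frame-wise transform could look like the following, where the frame length, hop size, and helper names are illustrative assumptions:

```python
import numpy as np

def frame_signal(x: np.ndarray, frame_len: int = 512, hop: int = 256) -> np.ndarray:
    """Split a 1-D time-domain signal into overlapping frames."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def to_frequency_domain(frames: np.ndarray) -> np.ndarray:
    """Windowed FFT per frame: (n_frames, frame_len) -> (n_frames, n_bins)."""
    window = np.hanning(frames.shape[1])
    return np.fft.rfft(frames * window, axis=1)

x = np.random.randn(16000)               # stand-in for one microphone channel
spectra = to_frequency_domain(frame_signal(x))
print(spectra.shape)                      # (n_frames, frame_len // 2 + 1)
```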
503: performing sound source separation processing on an audio signal to be denoised based on sound source information and a beam forming technology to obtain a processed first sound signal, wherein the first sound signal comprises a plurality of first sound source sound signals; and carrying out sound source separation processing on the audio signals to be subjected to noise reduction based on sound source information and a blind source separation technology to obtain processed second sound signals, wherein the second sound signals comprise a plurality of second sound source sound signals, and the number of the first sound source sound signals and the number of the second sound source sound signals are related to the number of the sound sources.
Taking the separation of the audio signal to be denoised received by the microphone 101 as an example, first separation results of performer 1 and performer 2 are obtained based on the beamforming technique, and second separation results of performer 1 and performer 2 are obtained based on the blind source separation technique.
The mobile phone 100 performs sound source separation on the audio signals to be denoised received by the microphone 102 and the microphone 103 in the same way, which is not described herein again.
504: determining identity information of the plurality of second sound source sound signals.
Because the blind source separation technique separates the second sound source sound signals without knowledge of the sound source positions, the mobile phone 100 cannot determine from which sound source each separated second sound source sound signal originates, i.e., cannot determine the sound source identity of each separated second sound source sound signal. By contrast, the beamforming technique separates the first sound source sound signals by steering toward known sound source positions, so the mobile phone 100 can determine from which sound source each separated first sound source sound signal originates, i.e., can determine the sound source identity of each separated first sound source sound signal. Therefore, according to the correlation between each second sound source sound signal separated by the blind source separation technique and each first sound source sound signal separated by the beamforming technique, the mobile phone 100 assigns to a second sound source sound signal the sound source identity of the first sound source sound signal with which its correlation is highest.
For example, taking the separation of the audio signal received by the microphone 101 as an example, the first sound source sound signals of performer 1 and performer 2 are obtained based on the beamforming technique, and two second sound source sound signals of unknown sound source identity are obtained based on the blind source separation technique. Because the correlation between the first sound source sound signal of performer 1 obtained by the beamforming technique and one of the unknown-identity second sound source sound signals obtained by the blind source separation technique is high, that unknown sound source identity is determined to be performer 1.
The mobile phone 100 calculates this correlation by the covariance formula, which can be written as:

$$\operatorname{cov}\left( Y^{\mathrm{BF}}_{n_1},\ Y^{\mathrm{BSS}}_{n_2} \right) = E\left[ \left( Y^{\mathrm{BF}}_{n_1}(i,k) - \mu_{n_1} \right) \left( Y^{\mathrm{BSS}}_{n_2}(i,k) - \mu_{n_2} \right) \right]$$

wherein $\operatorname{cov}(\cdot,\cdot)$ is the covariance of the audio signal $Y^{\mathrm{BF}}_{n_1}$ and the audio signal $Y^{\mathrm{BSS}}_{n_2}$; $Y^{\mathrm{BF}}_{n_1}(i,k)$ is the $i$-th frame of audio data of the $n_1$-th sound source separated by the beamforming (BF) technique, $Y^{\mathrm{BSS}}_{n_2}(i,k)$ is the $i$-th frame of audio data of the $n_2$-th sound source separated by the blind source separation (BSS) technique, $k$ is the $k$-th frequency-domain bin, and $\mu_{n_1}$ and $\mu_{n_2}$ are the corresponding means.
The mobile phone 100 selects, for each unknown sound source in the BSS result, the one BF signal with the maximum covariance value, and determines the identity of the unknown sound source in the BSS result to be the identity of that BF sound source.
The technical solution of the mobile phone 100 for determining the sound source identity of the second sound source sound signal corresponding to the microphone 102 and the microphone 103 is the same, and is not described herein again.
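As an illustration only, the identity matching described above might be sketched as follows; the array layout and function name are assumptions made for this example:

```python
import numpy as np

def match_identities(bf_signals: np.ndarray, bss_signals: np.ndarray) -> list[int]:
    """For each BSS-separated signal, return the index of the BF-separated
    signal with the largest covariance; that BF index carries the identity.

    bf_signals, bss_signals: arrays of shape (n_sources, n_frames, n_bins)
    holding the time-frequency magnitudes of each separated source.
    """
    identities = []
    for bss in bss_signals:
        # Sample covariance of each BF candidate with this BSS output
        covs = [np.mean((bf - bf.mean()) * (bss - bss.mean())) for bf in bf_signals]
        identities.append(int(np.argmax(covs)))
    return identities
```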
505: comparing the signal strength of each first sound source sound signal with the signal strength of the corresponding second sound source sound signal, and taking the sound source sound signal with the lower signal strength as the target audio signal of the corresponding sound source.
It can be understood that, because the original sound signal of a sound source contained in each separation result obtained by the mobile phone 100 with a sound source separation technique is constant, the larger the value of the separation result, the more noise it contains. In order to reduce the noise in the separation result, the mobile phone 100 compares the magnitudes of the first sound source sound signal and the second sound source sound signal and takes the smaller of the two as the target audio signal of the corresponding sound source. For example, taking the separation of the audio signal to be denoised received by the microphone 101 as an example, the first sound source sound signals of performer 1 and performer 2 are obtained based on the beamforming technique, and the second sound source sound signals of performer 1 and performer 2 are obtained based on the blind source separation technique.
If the first sound source sound signal of performer 1 obtained based on the beamforming technique is smaller than the second sound source sound signal of performer 1 obtained based on the blind source separation technique, the first sound source sound signal of performer 1 obtained based on the beamforming technique is determined as the target audio signal of performer 1.
The comparison can be written as:

$$Y_n(i,k) = \min\left( Y_n^{\mathrm{BF}}(i,k),\ Y_n^{\mathrm{BSS}}(i,k) \right)$$

wherein $Y_n^{\mathrm{BF}}$ and $Y_n^{\mathrm{BSS}}$ denote the sound source sound signals of the $n$-th target sound source obtained by the beamforming and blind source separation techniques respectively, $n$ represents the $n$-th target sound source, $i$ represents the $i$-th frame of audio data of the target sound source, and $k$ represents the time-frequency bin of the target sound source.
506: and converting the sound image of each target audio signal to obtain 5.1-channel sound data.
It can be understood that the target audio signal corresponding to each sound source is subjected to sound image conversion to obtain the intensity of the audio to be played by each loudspeaker in the 5.1 channel system, i.e. to obtain 5.1 channel sound data. Specifically, the mobile phone 100 converts the target audio signal into 5.1 channel sound data according to the sound source position and intensity of the target audio signal.
For example, a tangent-law sound image panning formula of the following form can be used:

$$\frac{\tan\theta}{\tan\theta_0} = \frac{g_1 - g_2}{g_1 + g_2}, \qquad g_1^2 + g_2^2 = 1$$

wherein $\theta$ is the target direction of sound source localization, $\theta_0$ is the speaker direction for playing the 5.1 channels (relative to the bisector of the active speaker pair), and $g_1$ and $g_2$ are the gains of the corresponding 5.1 channel speakers.
For example, if the current sound source is localized 30° to the left as shown in fig. 4B, this coincides with the direction of the left channel L, so the gain of the L speaker is 1, the gains of the other speakers are 0, and the speaker 201 in fig. 4A corresponding to the left channel L plays the 5.1 channel sound data.
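As an illustration only (the original panning equations are rendered as images in the source and are reconstructed above), a sketch of tangent-law panning between one adjacent speaker pair, with function and parameter names assumed for this example:

```python
import math

def pan_gains(theta_deg: float, theta0_deg: float) -> tuple[float, float]:
    """Tangent-law gains (g1, g2) for a source at theta_deg between a speaker
    pair whose axes sit at +/- theta0_deg around the pair's bisector."""
    ratio = math.tan(math.radians(theta_deg)) / math.tan(math.radians(theta0_deg))
    if abs(ratio) >= 1.0:                  # source on or beyond a speaker axis
        return (1.0, 0.0) if ratio > 0 else (0.0, 1.0)
    g1_over_g2 = (1.0 + ratio) / (1.0 - ratio)
    g2 = 1.0 / math.sqrt(1.0 + g1_over_g2 ** 2)
    return g1_over_g2 * g2, g2             # normalized: g1^2 + g2^2 == 1

# A source 30 degrees left, with the L/R pair at +/-30 degrees, lands fully
# on the left speaker, matching the example in the text.
print(pan_gains(30.0, 30.0))   # (1.0, 0.0)
print(pan_gains(15.0, 30.0))   # roughly (0.94, 0.34)
```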
It is understood that in other embodiments, the handset 100 may convert the target audio signal into 7.1 channel sound data according to the sound source position and intensity of the target audio signal.
507: and coding the 5.1 channel sound data to obtain sound coding data.
It can be understood that the encoded sound data can be obtained by encoding the 5.1 channel sound data using waveform coding, parametric coding, hybrid coding, or the like.
It can be understood that the above describes the technical solution of the present application by taking, as an example, the mobile phone 100 performing sound source separation on the audio signal to be denoised using two sound source separation techniques, namely the beamforming technique and the blind source separation technique.
In other embodiments, the mobile phone 100 may further adopt the audio noise reduction method provided in the embodiment of the present application, and perform sound source separation on the audio signal to be noise reduced by using three sound source separation technologies, namely, a beam forming technology, a blind source separation technology, and a neural network model, to reduce noise of multiple sound source signals in the audio signal to be noise reduced.
Fig. 8 shows a schematic structural diagram of a mobile phone 100 suitable for the audio noise reduction method provided by the embodiment of the present application, according to some embodiments of the present application. As shown in fig. 8, the mobile phone 100 may include a processor 110, a power module 140, a memory 180, a mobile communication module 130, a wireless communication module 120, a sensor module 190, an audio module 150, a camera 170, an interface module 160, keys 101, a display screen 102, and the like.
It is to be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the mobile phone 100. In other embodiments of the present application, the mobile phone 100 may include more or fewer components than shown, or combine certain components, or split certain components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, for example, a processing module or processing circuit that may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a micro control unit (MCU), an artificial intelligence (AI) processor, a field-programmable gate array (FPGA), or the like. The different processing units may be separate devices or may be integrated into one or more processors. A storage unit may be provided in the processor 110 for storing instructions and data. In some embodiments, the storage unit in the processor 110 is a cache. The processor 110 may execute the audio noise reduction method provided by the embodiments of the present application.
The power module 140 may include a power supply, power management components, and the like. The power source may be a battery. The power management component is used for managing the charging of the power supply and the power supply of the power supply to other modules. In some embodiments, the power management component includes a charge management module and a power management module. The charging management module is used for receiving charging input from the charger; the power management module is used to connect a power source, the charging management module and the processor 110. The power management module receives power and/or charge management module input and provides power to the processor 110, the display 102, the camera 170, and the wireless communication module 120.
The mobile communication module 130 may include, but is not limited to, an antenna, a power amplifier, a filter, a low-noise amplifier (LNA), and the like. The mobile communication module 130 can provide solutions for wireless communication including 2G/3G/4G/5G applied to the mobile phone 100. The mobile communication module 130 may receive electromagnetic waves from the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 130 may also amplify the signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna. In some embodiments, at least some of the functional modules of the mobile communication module 130 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 130 may be disposed in the same device as at least some of the modules of the processor 110. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), Bluetooth (BT), global navigation satellite system (GNSS), wireless local area network (WLAN), near field communication (NFC), frequency modulation (FM), infrared (IR), and so on. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
The wireless communication module 120 may include an antenna, and implement transceiving of electromagnetic waves via the antenna. The wireless communication module 120 may provide a solution for wireless communication applied to the mobile phone 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (BT), global Navigation Satellite System (GNSS), frequency Modulation (FM), near Field Communication (NFC), infrared (IR), and the like. The handset 100 may communicate with a network and other devices via wireless communication techniques.
In some embodiments, the mobile communication module 130 and the wireless communication module 120 of the handset 100 may also be located in the same module.
The display screen 102 is used for displaying human-computer interaction interfaces, images, videos and the like. The display screen 102 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a quantum dot light-emitting diode (QLED), and the like.
The sensor module 190 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The audio module 150 is used to convert digital audio information into an analog audio signal output or convert an analog audio input into a digital audio signal. The audio module 150 may also be used to encode and decode audio signals. In some embodiments, the audio module 150 may be disposed in the processor 110, or some functional modules of the audio module 150 may be disposed in the processor 110. In some embodiments, audio module 150 may include speakers, an earpiece, a microphone, and a headphone interface.
The camera 170 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal and then transmits it to the image signal processor (ISP) to be converted into a digital image signal. The mobile phone 100 may implement the shooting function through the ISP, the camera 170, the video codec, the graphics processing unit (GPU), the display screen 102, the application processor, and the like.
The interface module 160 includes an external memory interface, a universal serial bus (USB) interface, a subscriber identity module (SIM) card interface, and the like. The external memory interface may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the mobile phone 100. The external memory card communicates with the processor 110 through the external memory interface to implement a data storage function. The USB interface is used for communication between the mobile phone 100 and other electronic devices. The SIM card interface is used to communicate with a SIM card installed in the mobile phone 100, for example to read a phone number stored in the SIM card or to write a phone number into the SIM card.
In some embodiments, the handset 100 also includes keys 101, motors, indicators, and the like. The keys 101 may include a volume key, an on/off key, and the like. The motor is used to generate a vibration effect to the mobile phone 100, for example, when the mobile phone 100 is called, to prompt the user to answer the call of the mobile phone 100. The indicators may include laser indicators, radio frequency indicators, LED indicators, and the like.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable storage used in transmitting information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodological feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments may not be included or may be combined with other features.
It should be noted that, in the embodiments of the apparatuses in the present application, each unit/module is a logical unit/module, and physically, one logical unit/module may be one physical unit/module, or may be a part of one physical unit/module, and may also be implemented by a combination of multiple physical units/modules, where the physical implementation manner of the logical unit/module itself is not the most important, and the combination of the functions implemented by the logical unit/module is the key to solve the technical problem provided by the present application. Furthermore, in order to highlight the innovative part of the present application, the above-mentioned embodiments of the apparatus of the present application do not introduce units/modules that are not so closely related to solve the technical problems proposed by the present application, which does not indicate that there are no other units/modules in the above-mentioned embodiments of the apparatus.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (8)

1. An audio noise reduction method, applied to an electronic device, characterized by comprising the following steps:
acquiring an audio signal to be subjected to noise reduction, wherein the audio signal to be subjected to noise reduction comprises audio signals of M sound sources;
performing sound source separation on the audio signal to be subjected to noise reduction by using N sound source separation technologies to obtain N groups of sound source sound signals corresponding to the N sound source separation technologies, wherein each group of sound source sound signals comprises audio signals of the M sound sources, and the N sound source separation technologies comprise a beam forming technology and a blind source separation technology, or comprise the beam forming technology, the blind source separation technology and a sound source separation neural network;
selecting, from the N groups of sound source sound signals, the N sound source sound signals corresponding to the identity information of each of the M sound sources, wherein each sound source sound signal determined by the beam forming technology corresponds to one piece of identity information, and the identity information of each sound source sound signal determined by the blind source separation technology is determined according to the identity information corresponding to the sound source sound signals determined by the beam forming technology;
taking, as a target sound source sound signal of each sound source, the sound source sound signal whose signal intensity meets a preset condition among the N sound source sound signals corresponding to the sound source; wherein N and M are integers greater than or equal to 2.
2. The method according to claim 1, wherein the sound source sound signal whose signal strength satisfies the preset condition is:
one of the first G sound source sound signals after the N sound source sound signals of the sound source are arranged in ascending order of signal intensity, wherein G is a natural number.
3. The method according to claim 2, wherein the target sound source sound signal is the sound source sound signal with the lowest intensity among the first G sound source sound signals.
4. The method of claim 1, further comprising: performing sound image conversion on the target sound source sound signal of each sound source based on the position and the strength of the sound source to obtain audio data, wherein the audio data is 5.1-channel sound data or 7.1-channel sound data.
5. The method of claim 1, wherein the audio signal to be denoised is an audio signal acquired by the electronic device in real time.
6. The method of claim 1, wherein the electronic device comprises three microphones disposed at a top, a bottom, and a side of the electronic device, respectively.
7. A readable medium having stored thereon instructions that, when executed on an electronic device, cause the electronic device to perform the audio noise reduction method of any of claims 1 to 6.
8. An electronic device, comprising:
a memory for storing instructions to be executed by one or more processors of the electronic device; and
a processor, being one of the one or more processors of the electronic device, configured to perform the audio noise reduction method of any of claims 1 to 6.
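
For readability, the following sketch illustrates the procedure of claims 1 to 3 in Python. It is an editorial illustration rather than part of the patent: the helper names (rms, match_identities, select_targets), the use of root-mean-square amplitude as the "signal intensity" measure, and the correlation-based identity matching are assumptions introduced here for clarity.

import numpy as np

def rms(signal: np.ndarray) -> float:
    # Stand-in for the unspecified "signal intensity" measure:
    # root-mean-square amplitude of the time-domain signal.
    return float(np.sqrt(np.mean(np.square(signal))))

def match_identities(reference_group, unlabeled_group):
    # One hypothetical way to realize the identity matching of claim 1:
    # give each blind-source-separation output the identity of the
    # beamformed reference signal it correlates with most strongly.
    assigned = [None] * len(reference_group)
    taken = set()
    for sig in unlabeled_group:
        scores = []
        for m, ref in enumerate(reference_group):
            if m in taken:
                scores.append(-1.0)
                continue
            denom = np.linalg.norm(sig) * np.linalg.norm(ref) + 1e-12
            scores.append(abs(float(np.dot(sig, ref))) / denom)
        best = int(np.argmax(scores))
        taken.add(best)
        assigned[best] = sig
    return assigned

def select_targets(groups, g=2):
    # From N aligned groups of M per-source signals, pick one target
    # signal per source: sort the N candidates in ascending order of
    # intensity (claim 2), then take the lowest-intensity signal among
    # the first G of them (claim 3).
    m_sources = len(groups[0])
    targets = []
    for m in range(m_sources):
        candidates = sorted((group[m] for group in groups), key=rms)
        first_g = candidates[:g]    # claim 2: one of the first G
        targets.append(first_g[0])  # claim 3: the lowest of those
    return targets

Under these assumptions, a call such as select_targets([beamformed, match_identities(beamformed, bss_outputs)], g=2) returns one target signal per source. Because the candidates are sorted in ascending order of intensity, the lowest-intensity signal among the first G is simply the first element of the sorted list.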
CN202210084791.5A 2022-01-25 2022-01-25 Audio noise reduction method, medium and electronic equipment Active CN114220454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210084791.5A CN114220454B (en) 2022-01-25 2022-01-25 Audio noise reduction method, medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210084791.5A CN114220454B (en) 2022-01-25 2022-01-25 Audio noise reduction method, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN114220454A (en) 2022-03-22
CN114220454B (en) 2022-12-09

Family

ID=80708726

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210084791.5A Active CN114220454B (en) 2022-01-25 2022-01-25 Audio noise reduction method, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114220454B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117153186A (en) * 2022-08-05 2023-12-01 Shenzhen TCL New Technology Co., Ltd. Sound signal processing method, device, electronic equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4462617B2 * 2004-11-29 2010-05-12 Kobe Steel, Ltd. Sound source separation device, sound source separation program, and sound source separation method
JP4675177B2 * 2005-07-26 2011-04-20 Kobe Steel, Ltd. Sound source separation device, sound source separation program, and sound source separation method
JP4672611B2 * 2006-07-28 2011-04-20 Kobe Steel, Ltd. Sound source separation apparatus, sound source separation method, and sound source separation program
WO2012145709A2 * 2011-04-20 2012-10-26 Aurenta Inc. A method for encoding multiple microphone signals into a source-separable audio signal for network transmission and an apparatus for directed source separation
CN105070304B * 2015-08-11 2018-09-04 Xiaomi Technology Co., Ltd. Method and device for realizing multi-object audio recording, and electronic equipment
CN107924685B * 2015-12-21 2021-06-29 Huawei Technologies Co., Ltd. Signal processing apparatus and method
US10269369B2 * 2017-05-31 2019-04-23 Apple Inc. System and method of noise reduction for a mobile device
CN107644650B * 2017-09-29 2020-06-05 Shandong University Improved sound source localization method based on progressive serial orthogonalization blind source separation algorithm and implementation system thereof
CN110875045A * 2018-09-03 2020-03-10 Alibaba Group Holding Ltd. Voice recognition method, intelligent device and intelligent television
CN112349292B * 2020-11-02 2024-04-19 Shenzhen Horizon Robotics Technology Co., Ltd. Signal separation method and device, computer readable storage medium and electronic equipment
CN113345465B * 2021-06-29 2022-11-04 Agricultural Bank of China Co., Ltd. Voice separation method, device, equipment and computer readable storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102543098A * 2012-02-01 2012-07-04 Dalian University of Technology Frequency domain voice blind separation method for multi-frequency-band switching call media node (CMN) nonlinear function
CN104869524A * 2014-02-26 2015-08-26 Tencent Technology (Shenzhen) Co., Ltd. Processing method and device for sound in three-dimensional virtual scene
CN109661705A * 2016-09-09 2019-04-19 Sony Corporation Sound source separating device and method and program

Also Published As

Publication number Publication date
CN114220454A (en) 2022-03-22

Similar Documents

Publication Publication Date Title
CN111050269B (en) Audio processing method and electronic equipment
US10397722B2 (en) Distributed audio capture and mixing
EP3624463B1 (en) Audio signal processing method and device, terminal and storage medium
US9668080B2 (en) Method for generating a surround sound field, apparatus and computer program product thereof
US11039261B2 (en) Audio signal processing method, terminal and storage medium thereof
CN113129917A (en) Speech processing method based on scene recognition, and apparatus, medium, and system thereof
CA2908435A1 (en) Audio apparatus
CN114727212B (en) Audio processing method and electronic equipment
CN113393856B (en) Pickup method and device and electronic equipment
CN113496708A (en) Sound pickup method and device and electronic equipment
CN114220454B (en) Audio noise reduction method, medium and electronic equipment
CN112599144A (en) Audio data processing method, audio data processing apparatus, medium, and electronic device
CN113889135A (en) Method for estimating direction of arrival of sound source, electronic equipment and chip system
CN114120950B (en) Human voice shielding method and electronic equipment
CN116095254A (en) Audio processing method and device
CN108597533B (en) Method and system for enhancing voice input signal of intelligent terminal
CN114968163A (en) Audio playing method, electronic equipment, system and storage medium
CN113436635A (en) Self-calibration method and device of distributed microphone array and electronic equipment
CN116781817A (en) Binaural sound pickup method and device
CN116913328B (en) Audio processing method, electronic device and storage medium
CN116709154B (en) Sound field calibration method and related device
CN116939474A (en) Audio signal processing method and electronic equipment
CN118466884A (en) Multimedia playing method and electronic equipment
CN117692845A (en) Sound field calibration method, electronic equipment and system
CN118413802A (en) Spatial audio rendering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220615

Address after: 100095 floors 2-14, building 3, yard 5, honeysuckle Road, Haidian District, Beijing

Applicant after: Beijing Honor Device Co.,Ltd.

Address before: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040

Applicant before: Honor Device Co.,Ltd.

GR01 Patent grant