CN113658579A - Audio signal processing method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN113658579A
CN113658579A
Authority
CN
China
Prior art keywords
audio signal
aligned
audio
microphone
target
Prior art date
Legal status
Granted
Application number
CN202111112448.9A
Other languages
Chinese (zh)
Other versions
CN113658579B (en)
Inventor
Zhang Juan (张娟)
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111112448.9A priority Critical patent/CN113658579B/en
Publication of CN113658579A publication Critical patent/CN113658579A/en
Application granted granted Critical
Publication of CN113658579B publication Critical patent/CN113658579B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers

Abstract

The application provides an audio signal processing method and apparatus, an electronic device, and a readable storage medium, relating to the field of computer technology. The method comprises the following steps: obtaining a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction; and superimposing the first audio signal and the second audio signal to obtain a target audio signal. A target audio signal with a high signal-to-noise ratio can thus be obtained, improving the quality of the audio signal.

Description

Audio signal processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio signal processing method and apparatus, an electronic device, and a readable storage medium.
Background
When a microphone performs audio acquisition in a noisy environment, it picks up not only the effective signal but also noise. Noise degrades the quality of the speech signal and lowers its signal-to-noise ratio. How to improve the signal-to-noise ratio of the speech signal has therefore become a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The embodiment of the application provides an audio signal processing method and apparatus, an electronic device, and a readable storage medium, which obtain a target audio signal with a high signal-to-noise ratio by superimposing two audio signals acquired by two microphones whose axes are parallel and point in the same direction, thereby improving the quality of the audio signal.
The embodiment of the application can be realized as follows:
in a first aspect, an embodiment of the present application provides an audio signal processing method, including:
obtaining a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction;
and superimposing the first audio signal and the second audio signal to obtain a target audio signal.
In a second aspect, an embodiment of the present application provides an audio signal processing apparatus, including:
a signal obtaining module, configured to obtain a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction;
and a processing module, configured to superimpose the first audio signal and the second audio signal to obtain a target audio signal.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions that can be executed by the processor, and the processor can execute the machine executable instructions to implement the audio signal processing method described in the foregoing embodiment.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the audio signal processing method according to the foregoing embodiments.
According to the audio signal processing method and apparatus, the electronic device, and the readable storage medium, the target audio signal is obtained by superimposing a first audio signal and a second audio signal acquired by two microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thus effectively improved from the angle of enhancing the effective-signal energy, achieving the purpose of improving the clarity of the effective signal in the audio signal.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an audio signal processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating the sub-steps included in step S200 of FIG. 2;
FIG. 4 is a schematic flow chart of aligning two audio signals;
FIG. 5 is a schematic flow chart of sub-steps included in sub-step S220 of FIG. 3;
FIG. 6 is a schematic flow chart of the substeps involved in substep S223 of FIG. 5;
fig. 7 is a block diagram of an audio signal processing apparatus according to an embodiment of the present application.
Reference numerals: 100-electronic device; 110-memory; 120-processor; 130-communication unit; 200-audio signal processing apparatus; 210-signal obtaining module; 220-processing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Dual microphones are applied in two main directions. One is active noise cancellation, applied for example in mobile phones, which requires the two microphones of the pair to have different gains or sensitivities. The other is sound source localization or speech enhancement. Speech enhancement is a technique for extracting the useful speech signal from a noise background, suppressing and reducing noise interference, when the speech signal is interfered with or even submerged by various noises. That is, speech enhancement extracts speech that is as clean as possible from noisy speech. Mapped to practical microphone applications, speech enhancement is equivalent to increasing the pickup distance.
The current way of performing speech enhancement with dual microphones is to suppress the noise in a noisy speech signal to obtain a purer, clearer effective signal. For example, noise is suppressed by performing a difference operation on the two noisy speech signals obtained from the two microphones. In this approach the two microphones point in different directions. For example, one microphone is placed on the back of the lower end of a mobile phone and another on the back of the upper end; the upper microphone mainly collects noise, the lower microphone collects both voice and noise, and performing a difference operation on the two signals suppresses the noise. It follows that the main research direction of this approach is how to perform speech enhancement through noise-reduction algorithms.
At present, related technologies generally perform noise reduction through methods such as a difference operation on the two signals acquired by the two microphones; that is, the algorithms used focus mainly on noise reduction. However, improving voice quality only through noise reduction by signal differencing is a single direction of thought and does not fully exploit the advantages of dual microphones.
Based on this, embodiments of the present application provide an audio signal processing method, an apparatus, an electronic device, and a readable storage medium, which can fully utilize the advantages of two microphones, improve the signal-to-noise ratio, and improve the quality of a speech signal.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a block diagram of an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 may be, but is not limited to, a smart phone, a computer, a server, etc. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The elements of the memory 110, the processor 120 and the communication unit 130 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and to perform the corresponding functions. For example, the memory 110 stores the audio signal processing apparatus 200, which includes at least one software functional module that can be stored in the memory 110 in the form of software or firmware. By running the software programs and modules stored in the memory 110, such as the audio signal processing apparatus 200 in the embodiment of the present application, the processor 120 executes various functional applications and data processing, i.e., implements the audio signal processing method in the embodiment of the present application.
The communication unit 130 is used for establishing a communication connection between the electronic apparatus 100 and another communication terminal via a network, and for transceiving data via the network.
It should be understood that the structure shown in fig. 1 is only a schematic structural diagram of the electronic device 100, and the electronic device 100 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a flowchart illustrating an audio signal processing method according to an embodiment of the present disclosure. The method may be applied to the electronic device 100 described above. The following describes a specific flow of the audio signal processing method in detail. In this embodiment, the method may include step S100 and step S200.
Step S100, a first audio signal and a second audio signal obtained by respectively performing audio acquisition by two microphones are obtained.
In this embodiment, audio acquisition may be performed by an audio acquisition device to obtain the first audio signal and the second audio signal. The audio acquisition device is a dual-microphone device comprising a first microphone and a second microphone.
The first microphone and the second microphone can simultaneously acquire audio from the same sound source (i.e., the same pickup target), the first microphone being used to acquire the first audio signal and the second microphone the second audio signal. It can be understood that the sound source should be within a reasonable range that the audio acquisition device can cover, rather than at an effectively infinite distance or under extreme conditions where the volume is ineffective, so that a target audio signal with a clear effective signal (i.e., speech signal) can be obtained subsequently. For example, where the pickup distance of a single microphone of the audio acquisition device is 5 m, the device may be placed 8 m from the sound source when acquiring audio.
The microphones are directional. After the first microphone and the second microphone are fixed, their axes are parallel and point in the same direction, so that the target audio signal can be obtained subsequently. Optionally, the distance between the first microphone and the second microphone is within 10 to 20 cm. It is understood that the distance between the two microphones can be set according to the specific situation, as long as a target audio signal with a clear effective signal can be obtained.
As an alternative embodiment, the first microphone and the second microphone may be microphones with the same acquisition gain and other related parameters. Alternatively, the first microphone and the second microphone may be the same two microphones.
The audio acquisition device and the electronic device 100 may be the same device or different devices, as determined by the actual situation. In the case where the audio acquisition device is not the electronic device 100, after the audio acquisition device obtains the first audio signal and the second audio signal with its two microphones, it may send them to the electronic device 100, so that the electronic device 100 obtains the first audio signal and the second audio signal.
And step S200, obtaining a target audio signal through superposition according to the first audio signal and the second audio signal.
In this embodiment, the first audio signal and the second audio signal are obtained by the two microphones simultaneously acquiring audio from the same sound source, so there is a strong correlation between them; that is, the two signals contain the same speech and are strongly correlated. From the first audio signal and the second audio signal, a target audio signal with an improved signal-to-noise ratio can be obtained through superposition. For example, the first audio signal and the second audio signal are superimposed directly, so that the human voice in the two audio signals is superimposed; or the first audio signal and the second audio signal are preprocessed and then superimposed, so that the voices in the two audio signals are superimposed. Superimposing the voice in the two audio signals of the two microphones enhances the energy of the effective signal, so that the signal-to-noise ratio of the audio signal is effectively improved from the angle of enhancing the effective-signal energy, achieving the purpose of improving the quality of the audio signal. Compared with the conventional approach of improving the signal-to-noise ratio from the angle of noise reduction, this approach is innovative and efficient.
Optionally, in this embodiment, the first microphone and the second microphone use the same acquisition frequency for audio acquisition, so as to ensure the effect of the subsequent superposition.
The first acquisition frequency used by the first microphone and the second microphone may be a preset acquisition frequency, set according to the actual audio playing frequency; it may be greater than or equal to the actual audio playing frequency.
For example, if no down-sampling is performed subsequently, the first acquisition frequency may be set to the actual audio playing frequency; if the actual audio playing frequency is 8000 Hz, the first acquisition frequency may be set to 8000 Hz.
If down-sampling is performed subsequently, the first acquisition frequency may be set greater than the actual audio playing frequency, so that after down-sampling the frequency of the signal still meets the requirement of the actual audio playing frequency.
Optionally, as an alternative embodiment, once the first audio signal and the second audio signal are obtained, they may be directly superimposed and the target audio signal determined based on the superposition result. In this way, the target audio signal can be obtained quickly. The superposition result can be expressed as x(t) + y(t), where t ∈ [0, T−1] and T denotes the total number of sampling points in the first audio signal and the second audio signal; x(t) denotes the signal strength of the t-th sampling point in the first audio signal, i.e., represents the first audio signal; y(t) denotes the signal strength of the t-th sampling point in the second audio signal, i.e., represents the second audio signal.
Alternatively, the superposition result may be used directly as the target audio signal, or the superposition result may be multiplied by a preset weight and the product used as the target audio signal. This avoids data anomalies, such as amplitude overflow, after superposition. The preset weight may be less than 1, with the specific value set according to the actual application.
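The direct-superposition path above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `superimpose` and the toy 440 Hz test signals are assumptions for demonstration.

```python
import numpy as np

def superimpose(x: np.ndarray, y: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Directly superimpose two audio signals sample by sample:
    x(t) + y(t) for t in [0, T-1], scaled by a preset weight (< 1)
    to avoid amplitude overflow after superposition."""
    T = min(len(x), len(y))  # common number of sampling points
    return weight * (x[:T] + y[:T])

# Two toy "recordings" of the same 440 Hz tone with independent noise
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
y = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
target = superimpose(x, y, weight=0.5)
```

Because the tone is common to both signals while the noise is independent, the superposition reinforces the tone relative to the noise.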
Since the sound source is fixed but the positions of the first microphone and the second microphone differ, the distance, angle, and so on from the sound source to the two microphones are likely to differ, and thus so are the time and phase at which the two microphones acquire the same signal. The data may therefore be aligned before being superimposed.
Alternatively, as another alternative embodiment, the target audio signal may be obtained by the method of fig. 3. Referring to fig. 3, fig. 3 is a flowchart illustrating sub-steps included in step S200 in fig. 2. In this embodiment, the step S200 may include a sub-step S210 and a sub-step S220.
And a substep S210, aligning the first audio signal and the second audio signal to obtain an aligned first audio signal and an aligned second audio signal.
And a substep S220, superimposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in a first optional implementation, any alignment method may be used to align the first audio signal and the second audio signal directly, and the result taken as the first audio signal and the second audio signal after the alignment processing. The aligned first audio signal and the aligned second audio signal are then superimposed to obtain the target audio signal, which helps ensure the quality of the target audio signal.
Generally, two audio signals are aligned by converting from the time domain to the frequency domain, analyzing further according to frequency-domain characteristics, and finally converting back to the time domain. However, this approach is computationally expensive. To reduce the amount of computation, the time-domain signals may instead be aligned directly, in the manner shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating alignment of two audio signals. The alignment may include substeps S2201 to S2203.
And a substep S2201, calculating in a time domain to obtain cross-correlation functions of the two paths of audio signals to be aligned.
And a substep S2202 of determining a maximum value of the cross-correlation function and using a shift value corresponding to the maximum value as a target offset.
And a substep S2203, aligning the two audio signals to be aligned according to the target offset.
In this embodiment, since the first audio signal and the second audio signal are aligned directly, the two audio signals to be aligned are the first audio signal A and the second audio signal B. Their cross-correlation function may be calculated directly in the time domain.
Because the first audio signal A and the second audio signal B are discrete audio data, their cross-correlation function can be calculated according to the cross-correlation formula for discrete signals:
R1(n) = Σ_{m=0}^{N1−1} A(m)·B(m+n)
where R1(n) denotes the cross-correlation sequence of the first audio signal A and the second audio signal B, with n the sequence index; N1 denotes the common number of sampling points in the first audio signal A and the second audio signal B; m denotes the sampling-point index of the first audio signal A, ranging from 0 to N1−1; and n denotes the shift value.
The maximum value MAX of the cross-correlation function R1(n) can be found, and the index x for which R1(x) = MAX in the R1(n) sequence is taken as the target offset. The first audio signal A and the second audio signal B can then be aligned according to the target offset x, and the target audio signal obtained by superimposing the aligned sampling points, starting from A(0) and B(x).
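Sub-steps S2201 and S2202 can be sketched directly from the discrete cross-correlation formula. This is a minimal time-domain implementation under the assumption that only non-negative shift values are searched (i.e., B lags A); the function names are illustrative, not from the embodiment.

```python
import numpy as np

def cross_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """R1(n) = sum_{m=0}^{N1-1} A(m) * B(m+n), for n = 0..N1-1.
    B is zero-padded beyond its end so every shift is defined."""
    N1 = min(len(a), len(b))
    a = np.asarray(a[:N1], dtype=float)
    b_pad = np.concatenate([np.asarray(b[:N1], dtype=float), np.zeros(N1)])
    return np.array([np.dot(a, b_pad[n:n + N1]) for n in range(N1)])

def target_offset(a: np.ndarray, b: np.ndarray) -> int:
    """Sub-step S2202: the shift value maximizing R1(n) is the target offset."""
    return int(np.argmax(cross_correlation(a, b)))

# Example: a pulse at sample 2 in A appears at sample 5 in B, a delay of 3
a = np.zeros(16); a[2] = 1.0
b = np.zeros(16); b[5] = 1.0
offset = target_offset(a, b)  # offset == 3
```

A production version would compute the correlation with an FFT for long signals; the point here is only that the search itself stays in the time domain, as the embodiment describes.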
Optionally, the target audio signal may be obtained according to a superposition formula, the aligned first audio signal, and the aligned second audio signal, where the superposition formula is G(m) = β[E(m) + F(m+x)], in which G(m) denotes the target audio signal; β denotes a preset coefficient, i.e., an attenuation factor; E(m) and F(m+x) denote the first audio signal and the second audio signal after the alignment processing; and x denotes the target offset.
By the superposition formula above, the target audio signal can be obtained as β[A(m) + B(m+x)], where β denotes the attenuation factor, whose specific value is determined by the actual situation and may be less than 1, and A(m) and B(m+x) denote the first audio signal and the second audio signal after the alignment processing.
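The superposition of the aligned signals, G(m) = β[A(m) + B(m+x)], might then be sketched as follows; the function name and the choice β = 0.5 are illustrative assumptions.

```python
import numpy as np

def superimpose_aligned(a: np.ndarray, b: np.ndarray, x: int,
                        beta: float = 0.5) -> np.ndarray:
    """G(m) = beta * [A(m) + B(m + x)] over the samples where both
    aligned signals are defined."""
    n = min(len(a), len(b) - x)  # overlapping length after shifting B by x
    return beta * (np.asarray(a[:n]) + np.asarray(b[x:x + n]))

# B is A delayed by 2 samples; after alignment the common part doubles,
# then the attenuation factor beta scales it back down
a = np.ones(5)
b = np.concatenate([np.zeros(2), np.ones(5)])
g = superimpose_aligned(a, b, x=2, beta=0.5)
```

With β = 0.5 the aligned common signal keeps its original amplitude while independent noise, being misaligned, is attenuated.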
Optionally, in a second optional implementation manner, noise reduction processing may be performed on the first audio signal and the second audio signal, then alignment is performed on the noise-reduced signals, and an obtained result is used as the aligned first audio signal and the aligned second audio signal. And then the aligned first audio signal and the aligned second audio signal are superposed to obtain the target audio signal. The specific manner of the noise reduction processing may be set according to actual requirements, and is not specifically limited herein. The superposition may be performed according to the superposition formula described above. Therefore, the alignment effect can be ensured, and the definition of effective signals in subsequent target audio signals is further ensured.
Optionally, any single-microphone noise-reduction algorithm may be used directly to process the first audio signal and the second audio signal respectively, and the denoised first audio signal and second audio signal taken as the two audio signals to be aligned. The denoised first audio signal and second audio signal may then be aligned in the manner shown in fig. 4 to obtain the aligned first audio signal and the aligned second audio signal, from which the target audio signal is obtained by superposition. The single-microphone noise-reduction algorithm may come from open-source libraries such as SPEEX or WebRTC.
Optionally, the first sampling frequency is greater than the actual audio playing frequency, and in this way, under the condition that the playing frequency can be ensured, noise reduction and alignment can be performed in the manner shown in fig. 5. Referring to fig. 5, fig. 5 is a flowchart illustrating sub-steps included in sub-step S220 in fig. 3. The substep S220 may include substeps S221 to substep S223.
And a substep S221 of down-sampling the first audio signal to obtain a third audio signal.
And a substep S222, performing down-sampling on the second audio signal to obtain a fourth audio signal.
In this embodiment, the first audio signal and the second audio signal may be down-sampled at a second sampling frequency to obtain the third audio signal and the fourth audio signal; that is, the sampling frequency used when obtaining both the third audio signal and the fourth audio signal is the second sampling frequency. The second sampling frequency is less than the first sampling frequency used by the dual microphones during audio acquisition, and its specific value can be set according to actual requirements. In this way, the "high-gain low-gain" concept reduces the noise carried in the audio signals acquired by the two microphones while also reducing the amount of computation.
For example, the first sampling frequency may be 48000 Hz and the second sampling frequency 8000 Hz; the corresponding sampling interval of 1/48000 s is shorter than 1/8000 s.
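The down-sampling step under these example frequencies (48000 Hz to 8000 Hz, a decimation factor of 6) might be sketched as below. Note this naive decimation is an assumption for illustration and omits the anti-aliasing low-pass filter a production implementation (e.g. `scipy.signal.decimate`) would apply first.

```python
import numpy as np

def downsample(signal: np.ndarray, factor: int) -> np.ndarray:
    """Naive decimation: keep every `factor`-th sample.
    A production system would low-pass filter first to avoid aliasing."""
    return np.asarray(signal)[::factor]

fs_first, fs_second = 48000, 8000      # first and second sampling frequencies
one_second = np.random.randn(fs_first) # one second at the first frequency
decimated = downsample(one_second, fs_first // fs_second)  # factor 6
```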
And a substep S223 of aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
Optionally, the third audio signal and the fourth audio signal may be aligned directly in any manner, and the obtained result is used as the first audio signal and the second audio signal after the alignment processing. Optionally, the third audio signal and the fourth audio signal may be used as two paths of audio signals to be aligned in the alignment manner shown in fig. 4, and the method shown in fig. 4 is used to align the third audio signal and the fourth audio signal, and then the obtained results are used as the first audio signal and the second audio signal after the alignment processing. And then, overlapping can be carried out to obtain the target audio signal. Wherein, the superposition can be performed according to the superposition formula.
Optionally, in order to further reduce the influence of noise, further noise reduction may be performed as shown in fig. 6, so as to obtain the target audio signal. Referring to fig. 6, fig. 6 is a flowchart illustrating the sub-steps included in sub-step S223 in fig. 5. Sub-step S223 may include sub-step S2231 and sub-step S2232.
And a substep S2231, performing noise reduction processing on the third audio signal and the fourth audio signal respectively to obtain a fifth audio signal and a sixth audio signal.
Sub-step S2232 aligns the fifth audio signal and the sixth audio signal, and uses the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
Any single-microphone noise-reduction algorithm can be used to denoise the third audio signal and the fourth audio signal respectively, obtaining the fifth audio signal and the sixth audio signal; the algorithm may come from open-source libraries such as SPEEX or WebRTC. The fifth audio signal and the sixth audio signal may then be taken as the two audio signals to be aligned in the manner shown in fig. 4, aligned accordingly, and the results used as the first audio signal and the second audio signal after the alignment processing. Superposition can then be performed to obtain the target audio signal. This process can be expressed as G(m) = β[E(m) + F(m+x)], where G(m) denotes the target audio signal; β denotes a preset coefficient; E(m) and F(m+x) denote the first audio signal and the second audio signal after the alignment processing, that is, the aligned fifth audio signal and sixth audio signal; and x denotes the target offset. E(m) denotes the signal strength of the m-th sampling point in the fifth audio signal, and F(m+x) denotes the signal strength of the (m+x)-th sampling point in the sixth audio signal.
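The whole two-stage pipeline of fig. 5 and fig. 6 might be sketched as below. This is only a sketch under stated assumptions: the moving-average `denoise` is a placeholder for a real single-microphone noise-reduction library such as SPEEX or WebRTC, and all function names are illustrative.

```python
import numpy as np

def denoise(sig: np.ndarray, k: int = 5) -> np.ndarray:
    """Placeholder for single-microphone noise reduction (a moving average);
    the embodiment suggests open-source libraries such as SPEEX or WebRTC."""
    return np.convolve(sig, np.ones(k) / k, mode="same")

def align_offset(e: np.ndarray, f: np.ndarray) -> int:
    """Target offset x = argmax_n sum_m E(m)*F(m+n), searched in the time domain."""
    n1 = min(len(e), len(f))
    f_pad = np.concatenate([f[:n1], np.zeros(n1)])
    return int(np.argmax([np.dot(e[:n1], f_pad[n:n + n1]) for n in range(n1)]))

def process(first: np.ndarray, second: np.ndarray,
            factor: int = 6, beta: float = 0.5) -> np.ndarray:
    """Stage one: down-sample (naive decimation here); stage two: denoise;
    then align and superimpose: G(m) = beta * [E(m) + F(m + x)]."""
    e = denoise(first[::factor])   # fifth audio signal
    f = denoise(second[::factor])  # sixth audio signal
    x = align_offset(e, f)
    n = min(len(e), len(f) - x)
    return beta * (e[:n] + f[x:x + n])

# Toy demo: the same 440 Hz tone, the second copy delayed, both noisy
rng = np.random.default_rng(0)
src = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
first = src + 0.05 * rng.standard_normal(48000)
second = np.roll(src, 60) + 0.05 * rng.standard_normal(48000)
target = process(first, second)
```

The design point matches the embodiment: all computation stays in the time domain and operates on the down-sampled signals, so the alignment search is far cheaper than a frequency-domain analysis at the original rate.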
In this embodiment, the first stage of noise reduction is performed by the oversample-then-downsample ("high sampling, low sampling") method, so performing the second stage with a common single-microphone noise reduction algorithm already achieves a good effect. The two stages of noise reduction remove noise that would otherwise degrade the accuracy of an alignment computed directly on the time-domain signals; reducing the sampling frequency and then computing directly on the time-domain signals preserves the accuracy of the result while reducing the amount of computation, yielding higher efficiency.
Superposing the signals after two-stage noise reduction and alignment maximizes the energy of the common signal. The noise, being random and misaligned, is weakened or only slightly enhanced by the superposition, so the overall signal-to-noise ratio of the resulting target audio signal is higher than that of the original first audio signal and second audio signal.
Therefore, the embodiment of the application uses the cross-correlation of the two audio channels of the dual microphones to align the data before superposing it, enhancing the energy of the effective signal and improving the signal-to-noise ratio.
After the target audio signal is obtained, it can be played back when needed, so that the content of the played sound source can be heard more clearly. Audio collected with this embodiment has higher speech quality and remains intelligible at a greater distance, which increases the pickup distance of the dual microphones.
The embodiment of the application makes full use of the correlation between the two data channels of the dual microphones and effectively improves the signal-to-noise ratio of the speech signal from the angle of enhancing effective signal energy, which is novel and efficient compared with traditional methods that work only from the angle of reducing noise. In the data calculation process, the degree to which the algorithm is simplified directly affects computational efficiency and the accuracy of real-time results. This embodiment aligns signals directly in the time domain; compared with common methods that must convert time-domain signals into frequency-domain signals for analysis and processing, computing only in the time domain greatly reduces the complexity and amount of computation and improves the efficiency of the algorithm.
In addition, performing multi-stage noise reduction before alignment keeps the algorithm conceptually clear and safeguards the accuracy of the correlation calculation. The multi-stage noise reduction used in this embodiment is an optimization built on conventional noise reduction algorithms. First, the "high sampling, low sampling" idea is to raise the sampling frequency and then lower it, so that the resulting sampling points carry much less noise than sampling at the original frequency would; the method is simple and convenient and introduces no other adverse factors. Then, a conventional single-microphone noise reduction algorithm is applied directly to the two microphone channels, which achieves a good noise reduction effect, is convenient to use and highly stable, and better guarantees the accuracy of subsequent data processing.
The above-described audio signal processing method is exemplified below.
The mounting axes of the two microphones in the audio acquisition device are parallel and point in the same direction. The distance between the two microphones is in the range of 10 cm to 20 cm.
A section of sound-source audio is played at a position relative to the audio acquisition device and serves as the pickup target. The sound source is within a reasonable range that the device can capture, not an extreme case such as being infinitely far away or infinitely quiet. For example, the microphones may have a pickup distance of 5 m, and the sound source may be located 8 m from the audio acquisition device.
The two microphones of the audio acquisition device are started to capture audio, yielding a first audio signal A and a second audio signal B. The first sampling frequency used here is greater than the sampling frequency actually required. Assuming the sampling frequency actually required is 8000 Hz, the first sampling frequency may be set to 48000 Hz.
The first audio signal A and the second audio signal B are down-sampled to obtain a third audio signal C and a fourth audio signal D.
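The down-sampling step might be sketched as below. Averaging each block of samples is only a crude stand-in for the anti-alias low-pass filtering a production resampler (e.g. scipy.signal.resample_poly) would apply; the function name and block-mean approach are my own assumptions.

```python
import numpy as np

def downsample(signal, factor):
    """Reduce the sampling rate by an integer factor (e.g. 48000 Hz -> 8000 Hz, factor 6).

    Averaging each block of `factor` samples acts as a crude anti-alias filter and
    also averages out some high-frequency noise, in the spirit of the
    oversample-then-downsample ("high sampling, low sampling") idea in the text."""
    n = (len(signal) // factor) * factor      # drop the incomplete tail block
    return signal[:n].reshape(-1, factor).mean(axis=1)
```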
The third audio signal C and the fourth audio signal D are then subjected to second-stage noise reduction with a noise reduction algorithm to obtain a fifth audio signal E and a sixth audio signal F. The algorithm used here may be one commonly used for single microphones.
The offset between the fifth audio signal E and the sixth audio signal F, that is, the number of offset sampling points, is then calculated. E and F are two channels of discrete audio data, and their cross-correlation function is obtained from the cross-correlation formula for discrete signals. The calculation is as follows:
R(n) = Σ_{m=0}^{N-1} E(m)·F(m+n)
where R(n) denotes the cross-correlation function sequence of the fifth audio signal E and the sixth audio signal F, n denotes the lag index, N denotes the number of correlated samples in E and F, and m denotes the sample index of E, ranging from 0 to N-1. Find the maximum value MAX of the cross-correlation function R(n); if R(x) is the entry of the R sequence at which MAX occurs, then x is the target offset between the E and F sample data. The fifth audio signal E and the sixth audio signal F can then be aligned using the target offset x.
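A minimal time-domain sketch of this offset search, using NumPy's cross-correlation (the function name and the use of np.correlate are my own choices; the patent only specifies the discrete cross-correlation formula):

```python
import numpy as np

def target_offset(e, f):
    """Estimate the sample offset x maximizing R(x) = sum_m E(m) * F(m + x).

    Shifting F by the returned x best aligns it with E."""
    # np.correlate(f, e, "full") evaluates sum_n f[n] * e[n - lag] at every lag;
    # substituting m = n - lag shows this equals R(lag) = sum_m e[m] * f[m + lag].
    r = np.correlate(f, e, mode="full")
    lags = np.arange(len(f) + len(e) - 1) - (len(e) - 1)
    return int(lags[np.argmax(r)])
```

np.correlate in "full" mode covers every lag from -(len(e)-1) to len(f)-1, so the arg-max over that lag axis yields the target offset x directly.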
After alignment, superposition may be performed to obtain the target audio signal. The process of alignment and superposition can be expressed as: G(m) = β[E(m) + F(m+x)], where G(m) represents the superposed data, E(m) and F(m+x) represent the aligned fifth audio signal E and sixth audio signal F, that is, the first audio signal and the second audio signal after the alignment processing, and β represents an attenuation factor.
Thus, a target audio signal with high speech quality can be obtained.
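Putting the worked example together, the whole pipeline (down-sample, denoise, align, superpose) might look like the sketch below. This is an illustrative sketch under my own assumptions: the 3-tap moving average is only a placeholder for a real single-microphone denoiser such as SPEEX or WebRTC, and all names and parameter values are mine.

```python
import numpy as np

def process(a, b, factor=6, beta=0.5):
    """End-to-end sketch: down-sample both channels, apply a stand-in denoiser,
    estimate the offset by time-domain cross-correlation, superpose the aligned
    signals. Returns the target audio signal G and the target offset x."""
    def down(s):
        n = (len(s) // factor) * factor           # crude decimation by block mean
        return s[:n].reshape(-1, factor).mean(axis=1)

    def denoise(s):
        # Placeholder smoother, NOT a real SPEEX/WebRTC noise suppressor.
        return np.convolve(s, np.ones(3) / 3, mode="same")

    e, f = denoise(down(a)), denoise(down(b))
    r = np.correlate(f, e, mode="full")           # R(n) over all lags
    x = int(np.argmax(r) - (len(e) - 1))          # target offset
    m = np.arange(max(0, -x), min(len(e), len(f) - x))  # overlap region
    return beta * (e[m] + f[m + x]), x
```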
In order to execute the corresponding steps in the above embodiments and their various possible implementations, an implementation of the audio signal processing apparatus 200 is given below; optionally, the audio signal processing apparatus 200 may adopt the device structure of the electronic device 100 shown in fig. 1. Further, referring to fig. 7, fig. 7 is a block diagram of an audio signal processing apparatus 200 according to an embodiment of the present application. It should be noted that the basic principle and resulting technical effect of the audio signal processing apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, matters not mentioned in this embodiment may be found in the corresponding contents of the above embodiments. The audio signal processing apparatus 200 may include a signal obtaining module 210 and a processing module 220.
The signal obtaining module 210 is configured to obtain a first audio signal and a second audio signal obtained by performing audio acquisition on two microphones respectively. The dual microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction.
The processing module 220 is configured to obtain a target audio signal by superimposing the first audio signal and the second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned; and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in this embodiment, the manner of implementing the alignment by the processing module 220 includes: calculating in time domain to obtain cross-correlation functions of two paths of audio signals to be aligned; determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset; and aligning the two paths of audio signals to be aligned according to the target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G(m) represents the target audio signal, β represents a preset coefficient, E(m) and F(m+x) represent the first audio signal and the second audio signal after the alignment processing, and x represents the target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: down-sampling the first audio signal to obtain a third audio signal; down-sampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used for obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used for audio acquisition of the two microphones; aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal; aligning the fifth audio signal and the sixth audio signal, and using the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
Alternatively, the modules may be stored in the memory 110 shown in fig. 1 in the form of software or Firmware (Firmware) or may be fixed in an Operating System (OS) of the electronic device 100, and may be executed by the processor 120 in fig. 1. Meanwhile, data, codes of programs, and the like required to execute the above-described modules may be stored in the memory 110.
The embodiment of the present application further provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the audio signal processing method.
In summary, the embodiments of the present application provide an audio signal processing method and apparatus, an electronic device, and a readable storage medium, in which a target audio signal is obtained by superposing a first audio signal and a second audio signal acquired by two microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thereby effectively improved from the angle of enhancing effective signal energy, achieving the purpose of improving the clarity of the effective signal in the audio signal.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The foregoing describes merely alternative embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its protection scope.

Claims (10)

1. An audio signal processing method, comprising:
obtaining a first audio signal and a second audio signal which are obtained by respectively carrying out audio acquisition on two microphones, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
and obtaining a target audio signal by superposition according to the first audio signal and the second audio signal.
2. The method according to claim 1, wherein obtaining the target audio signal by superposition according to the first audio signal and the second audio signal comprises:
aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
3. The method of claim 2, wherein the performing an alignment process comprises:
calculating in time domain to obtain cross-correlation functions of two paths of audio signals to be aligned;
determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset;
and aligning the two paths of audio signals to be aligned according to the target offset.
4. The method according to claim 2 or 3, wherein the superimposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal comprises:
obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G(m) represents the target audio signal, β represents a preset coefficient, E(m) and F(m+x) represent the first audio signal and the second audio signal after the alignment processing, and x represents a target offset.
5. The method according to claim 2 or 3, wherein the aligning the first audio signal and the second audio signal to obtain the aligned first audio signal and second audio signal comprises:
down-sampling the first audio signal to obtain a third audio signal;
down-sampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used for obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used for audio acquisition of the two microphones;
aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
6. The method of claim 5, wherein the aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal comprises:
respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal;
aligning the fifth audio signal and the sixth audio signal, and using the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
7. An audio signal processing apparatus, comprising:
the signal acquisition module is used for acquiring a first audio signal and a second audio signal which are acquired by respectively carrying out audio acquisition on two microphones, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
and the processing module is used for obtaining a target audio signal through superposition according to the first audio signal and the second audio signal.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the audio signal processing method of any one of claims 1-6.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the audio signal processing method according to any one of claims 1 to 6.
CN202111112448.9A 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium Active CN113658579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111112448.9A CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN113658579A true CN113658579A (en) 2021-11-16
CN113658579B CN113658579B (en) 2024-01-30

Family

ID=78484098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112448.9A Active CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113658579B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0398594U (en) * 1990-01-26 1991-10-14
JPH08313659A (en) * 1995-05-16 1996-11-29 Atr Ningen Joho Tsushin Kenkyusho:Kk Signal time difference detector
JP2001100774A (en) * 1999-09-28 2001-04-13 Takayuki Arai Voice processor
CA2381516A1 (en) * 2001-04-12 2002-10-12 Gennum Corporation Digital hearing aid system
US6697494B1 (en) * 1999-12-15 2004-02-24 Phonak Ag Method to generate a predetermined or predeterminable receiving characteristic of a digital hearing aid, and a digital hearing aid
JP2007214913A (en) * 2006-02-09 2007-08-23 Yamaha Corp Sound collection apparatus
JP2009010593A (en) * 2007-06-27 2009-01-15 Yamaha Corp Portable communication terminal
JP2010026361A (en) * 2008-07-23 2010-02-04 Internatl Business Mach Corp <Ibm> Speech collection method, system and program
JP2011114623A (en) * 2009-11-27 2011-06-09 Teac Corp Sound recorder
CN102403022A (en) * 2010-09-13 2012-04-04 三洋电机株式会社 Recording apparatus, recording condition setting method, and recording condition setting program
CN105191345A (en) * 2013-03-29 2015-12-23 日产自动车株式会社 Microphone support device for sound source localization
CN107040856A (en) * 2016-02-04 2017-08-11 北京卓锐微技术有限公司 A kind of microphone array module
CN110400571A (en) * 2019-08-08 2019-11-01 Oppo广东移动通信有限公司 Audio-frequency processing method, device, storage medium and electronic equipment
CN111010646A (en) * 2020-03-11 2020-04-14 恒玄科技(北京)有限公司 Method and system for transparent transmission of earphone and earphone
CN111968667A (en) * 2020-08-13 2020-11-20 杭州芯声智能科技有限公司 Double-microphone voice noise reduction device and noise reduction method thereof
CN113270106A (en) * 2021-05-07 2021-08-17 深圳市友杰智新科技有限公司 Method, device and equipment for inhibiting wind noise of double microphones and storage medium


Non-Patent Citations (5)

Title
ALFREDO CIGADA: "The delay & sum algorithm applied to microphone array measurements: Numerical analysis and experimental validation", Mechanical Systems and Signal Processing *
YU Liuchang: "Research on Speech Enhancement Methods Based on Mobile Phone Microphone Arrays", China Master's Theses Full-text Database (Information Science and Technology) *
WANG Qi: "Research and Implementation of Array Speech Enhancement Algorithms", China Master's Theses Full-text Database (Information Science and Technology) *
SU Long: "Research on Microphone Array Speech Enhancement Methods", China Master's Theses Full-text Database (Information Science and Technology), pages 37-40 *
DONG Xiaojuan: "Algorithm Research on Microphone Array Speech Enhancement", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113658579B (en) 2024-01-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant