CN113658579A - Audio signal processing method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN113658579A
CN113658579A
Authority
CN
China
Prior art keywords
audio signal
aligned
audio
microphone
target
Prior art date
Legal status
Granted
Application number
CN202111112448.9A
Other languages
Chinese (zh)
Other versions
CN113658579B (en)
Inventor
Zhang Juan (张娟)
Current Assignee
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111112448.9A priority Critical patent/CN113658579B/en
Publication of CN113658579A publication Critical patent/CN113658579A/en
Application granted granted Critical
Publication of CN113658579B publication Critical patent/CN113658579B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers

Abstract

The application provides an audio signal processing method and apparatus, an electronic device, and a readable storage medium, relating to the field of computer technology. The method comprises the following steps: obtaining a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction; and superimposing the first audio signal and the second audio signal to obtain a target audio signal. A target audio signal with a high signal-to-noise ratio can thus be obtained, improving the quality of the audio signal.

Description

Audio signal processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an audio signal processing method and apparatus, an electronic device, and a readable storage medium.
Background
When a microphone performs audio acquisition in a noisy environment, it picks up not only the effective signal but also noise. Noise degrades the quality of the speech signal and lowers its signal-to-noise ratio. How to improve the signal-to-noise ratio of the speech signal has therefore become a technical problem that those skilled in the art need to solve.
Disclosure of Invention
The embodiment of the application provides an audio signal processing method and apparatus, an electronic device, and a readable storage medium, which obtain a target audio signal with a high signal-to-noise ratio by superimposing two audio signals acquired by two microphones whose axes are parallel and point in the same direction, thereby improving the quality of the audio signal.
The embodiment of the application can be realized as follows:
in a first aspect, an embodiment of the present application provides an audio signal processing method, including:
obtaining a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction;
and superimposing the first audio signal and the second audio signal to obtain a target audio signal.
In a second aspect, an embodiment of the present application provides an audio signal processing apparatus, including:
a signal obtaining module, configured to obtain a first audio signal and a second audio signal acquired respectively by two microphones, wherein the two microphones comprise a first microphone and a second microphone whose axes are parallel and point in the same direction;
and a processing module, configured to superimpose the first audio signal and the second audio signal to obtain a target audio signal.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions that can be executed by the processor, and the processor can execute the machine executable instructions to implement the audio signal processing method described in the foregoing embodiment.
In a fourth aspect, the present application provides a readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the audio signal processing method according to the foregoing embodiments.
According to the audio signal processing method and apparatus, the electronic device, and the readable storage medium, the target audio signal is obtained by superimposing a first audio signal and a second audio signal acquired by two microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thus effectively improved from the angle of enhancing the effective-signal energy, achieving the purpose of improving the clarity of the effective signal in the audio signal.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting of the scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure;
fig. 2 is a schematic flowchart of an audio signal processing method according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating the sub-steps included in step S200 of FIG. 2;
FIG. 4 is a schematic flow chart of aligning two audio signals;
FIG. 5 is a schematic flow chart of sub-steps included in sub-step S220 of FIG. 3;
FIG. 6 is a schematic flow chart of the substeps involved in substep S223 of FIG. 5;
fig. 7 is a block diagram of an audio signal processing apparatus according to an embodiment of the present application.
Reference numerals: 100-electronic device; 110-memory; 120-processor; 130-communication unit; 200-audio signal processing apparatus; 210-signal obtaining module; 220-processing module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Dual microphones are applied in two main directions. One is active noise cancellation, applied for example in mobile phones, which requires the two microphones of the pair to have different gains or sensitivities. The other is sound source localization or speech enhancement. Speech enhancement is a technique for extracting the useful speech signal from a noise background, suppressing and reducing noise interference, when the speech signal is interfered with or even submerged by various noises. That is, speech enhancement extracts speech that is as clean as possible from noisy speech. Mapped to practical microphone applications, speech enhancement is equivalent to increasing the pickup distance.
The current way of performing speech enhancement with dual microphones is to suppress the noise in a noisy speech signal to obtain a purer, clearer effective signal. For example, noise is suppressed by performing a difference operation on the two noisy speech signals obtained from the two microphones. In this approach the two microphones point in different directions. For example, one microphone is placed on the back of the lower end of a mobile phone and another on the back of the upper end; the upper microphone mainly collects noise, the lower microphone collects both voice and noise, and performing a difference operation on the two signals suppresses the noise. It follows that the main research direction of this approach is how to perform speech enhancement through noise-reduction algorithms.
At present, related technologies generally perform noise reduction through methods such as a difference operation on the two signals acquired by the two microphones; that is, the algorithms used focus mainly on noise reduction. However, improving voice quality only through noise reduction by signal differencing is a single direction of thought and does not fully exploit the advantages of dual microphones.
Based on this, embodiments of the present application provide an audio signal processing method, an apparatus, an electronic device, and a readable storage medium, which can fully utilize the advantages of two microphones, improve the signal-to-noise ratio, and improve the quality of a speech signal.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a block diagram of an electronic device 100 according to an embodiment of the present disclosure. The electronic device 100 may be, but is not limited to, a smart phone, a computer, a server, etc. The electronic device 100 may include a memory 110, a processor 120, and a communication unit 130. The elements of the memory 110, the processor 120 and the communication unit 130 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 120 is used to read/write data or programs stored in the memory 110 and to perform the corresponding functions. For example, the memory 110 stores the audio signal processing apparatus 200, which includes at least one software functional module that can be stored in the memory 110 in the form of software or firmware. By running the software programs and modules stored in the memory 110, such as the audio signal processing apparatus 200 in the embodiment of the present application, the processor 120 executes various functional applications and data processing, i.e., implements the audio signal processing method in the embodiment of the present application.
The communication unit 130 is used for establishing a communication connection between the electronic apparatus 100 and another communication terminal via a network, and for transceiving data via the network.
It should be understood that the structure shown in fig. 1 is only a schematic structural diagram of the electronic device 100, and the electronic device 100 may also include more or fewer components than shown in fig. 1, or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a flowchart illustrating an audio signal processing method according to an embodiment of the present disclosure. The method may be applied to the electronic device 100 described above. The following describes a specific flow of the audio signal processing method in detail. In this embodiment, the method may include step S100 and step S200.
Step S100, a first audio signal and a second audio signal obtained by respectively performing audio acquisition by two microphones are obtained.
In this embodiment, audio acquisition may be performed by an audio acquisition device to obtain the first audio signal and the second audio signal. The audio acquisition device is a dual-microphone device comprising a first microphone and a second microphone.
The first microphone and the second microphone can simultaneously acquire audio from the same sound source (i.e., the same pickup target), the first microphone being used to acquire the first audio signal and the second microphone the second audio signal. It can be understood that the sound source should be within a reasonable range that the audio acquisition device can cover, rather than at an effectively infinite distance or under extreme conditions where the volume is ineffective, so that a target audio signal with a clear effective signal (i.e., speech signal) can be obtained subsequently. For example, where the pickup distance of a single microphone of the audio acquisition device is 5 m, the device may be placed 8 m from the sound source when acquiring audio.
The microphones are directional. After the first microphone and the second microphone are fixed, their axes are parallel and point in the same direction, so that the target audio signal can be obtained subsequently. Optionally, the distance between the first microphone and the second microphone is within 10 to 20 cm. It is understood that the distance between the two microphones can be set according to the specific situation, as long as a target audio signal with a clear effective signal can be obtained.
As an alternative embodiment, the first microphone and the second microphone may be microphones with the same acquisition gain and other related parameters. Alternatively, the first microphone and the second microphone may be the same two microphones.
The audio acquisition device and the electronic device 100 may be the same device or different devices, as determined by the actual situation. In the case where the audio acquisition device is not the electronic device 100, after the audio acquisition device obtains the first audio signal and the second audio signal with its two microphones, it may send them to the electronic device 100, so that the electronic device 100 obtains the first audio signal and the second audio signal.
And step S200, obtaining a target audio signal through superposition according to the first audio signal and the second audio signal.
In this embodiment, the first audio signal and the second audio signal are obtained by the two microphones simultaneously acquiring audio from the same sound source, so there is a strong correlation between them; that is, the two signals contain the same speech and are strongly correlated. From the first audio signal and the second audio signal, a target audio signal with an improved signal-to-noise ratio can be obtained through superposition. For example, the first audio signal and the second audio signal are superimposed directly, so that the human voice in the two audio signals is superimposed; or the first audio signal and the second audio signal are preprocessed and then superimposed, so that the voices in the two audio signals are superimposed. Superimposing the voice in the two audio signals of the two microphones enhances the energy of the effective signal, so that the signal-to-noise ratio of the audio signal is effectively improved from the angle of enhancing the effective-signal energy, achieving the purpose of improving the quality of the audio signal. Compared with the conventional approach of improving the signal-to-noise ratio from the angle of noise reduction, this approach is innovative and efficient.
Optionally, in this embodiment, the first microphone and the second microphone use the same acquisition frequency for audio acquisition, so as to ensure the effect of the subsequent superposition.
The first acquisition frequency used by the first microphone and the second microphone may be a preset acquisition frequency, set according to the actual audio playing frequency; it may be greater than or equal to the actual audio playing frequency.
For example, if no down-sampling is performed subsequently, the first acquisition frequency may be set to the actual audio playing frequency; if the actual audio playing frequency is 8000 Hz, the first acquisition frequency may be set to 8000 Hz.
If down-sampling is performed subsequently, the first acquisition frequency may be set greater than the actual audio playing frequency, so that after down-sampling the frequency of the signal still meets the requirement of the actual audio playing frequency.
Optionally, as an alternative embodiment, once the first audio signal and the second audio signal are obtained, they may be directly superimposed and the target audio signal determined based on the superposition result. In this way, the target audio signal can be obtained quickly. The superposition result can be expressed as x(t) + y(t), where t ∈ [0, T−1] and T denotes the total number of sampling points in the first audio signal and the second audio signal; x(t) denotes the signal strength of the t-th sampling point in the first audio signal, i.e., represents the first audio signal; y(t) denotes the signal strength of the t-th sampling point in the second audio signal, i.e., represents the second audio signal.
Alternatively, the superposition result may be used directly as the target audio signal, or the superposition result may be multiplied by a preset weight and the product used as the target audio signal. This avoids data anomalies, such as amplitude overflow, after superposition. The preset weight may be less than 1, with the specific value set according to the actual application.
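The direct-superposition path above can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function name `superimpose` and the toy 440 Hz test signals are assumptions for demonstration.

```python
import numpy as np

def superimpose(x: np.ndarray, y: np.ndarray, weight: float = 0.5) -> np.ndarray:
    """Directly superimpose two audio signals sample by sample:
    x(t) + y(t) for t in [0, T-1], scaled by a preset weight (< 1)
    to avoid amplitude overflow after superposition."""
    T = min(len(x), len(y))  # common number of sampling points
    return weight * (x[:T] + y[:T])

# Two toy "recordings" of the same 440 Hz tone with independent noise
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
y = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(8000)
target = superimpose(x, y, weight=0.5)
```

Because the tone is common to both signals while the noise is independent, the superposition reinforces the tone relative to the noise.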
Since the sound source is fixed but the positions of the first microphone and the second microphone differ, the distance, angle, and so on from the sound source to the two microphones are likely to differ, and thus so are the time and phase at which the two microphones acquire the same signal. The data may therefore be aligned before being superimposed.
Alternatively, as another alternative embodiment, the target audio signal may be obtained by the method of fig. 3. Referring to fig. 3, fig. 3 is a flowchart illustrating sub-steps included in step S200 in fig. 2. In this embodiment, the step S200 may include a sub-step S210 and a sub-step S220.
And a substep S210, aligning the first audio signal and the second audio signal to obtain an aligned first audio signal and an aligned second audio signal.
And a substep S220, superimposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in a first optional implementation, any alignment method may be used to align the first audio signal and the second audio signal directly, and the result taken as the first audio signal and the second audio signal after the alignment processing. The aligned first audio signal and the aligned second audio signal are then superimposed to obtain the target audio signal, which helps ensure the quality of the target audio signal.
Generally, two audio signals are aligned by converting from the time domain to the frequency domain, analyzing further according to frequency-domain characteristics, and finally converting back to the time domain. However, this approach is computationally expensive. To reduce the amount of computation, the time-domain signals may instead be aligned directly, in the manner shown in fig. 4.
Referring to fig. 4, fig. 4 is a schematic flow chart illustrating alignment of two audio signals. The alignment may include substeps S2201 to S2203.
And a substep S2201, calculating in a time domain to obtain cross-correlation functions of the two paths of audio signals to be aligned.
And a substep S2202 of determining a maximum value of the cross-correlation function and using a shift value corresponding to the maximum value as a target offset.
And a substep S2203, aligning the two audio signals to be aligned according to the target offset.
In this embodiment, since the first audio signal and the second audio signal are aligned directly, the two audio signals to be aligned are the first audio signal A and the second audio signal B. Their cross-correlation function may be calculated directly in the time domain.
Because the first audio signal A and the second audio signal B are discrete audio data, their cross-correlation function can be calculated according to the cross-correlation formula for discrete signals:
R1(n) = Σ_{m=0}^{N1−1} A(m)·B(m+n)
where R1(n) denotes the cross-correlation sequence of the first audio signal A and the second audio signal B, with n the sequence index; N1 denotes the common number of sampling points in the first audio signal A and the second audio signal B; m denotes the sampling-point index of the first audio signal A, ranging from 0 to N1−1; and n denotes the shift value.
The maximum value MAX of the cross-correlation function R1(n) can be found, and the index x for which R1(x) = MAX in the R1(n) sequence is taken as the target offset. The first audio signal A and the second audio signal B can then be aligned according to the target offset x, and the target audio signal obtained by superimposing the aligned sampling points, starting from A(0) and B(x).
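Sub-steps S2201 and S2202 can be sketched directly from the discrete cross-correlation formula. This is a minimal time-domain implementation under the assumption that only non-negative shift values are searched (i.e., B lags A); the function names are illustrative, not from the embodiment.

```python
import numpy as np

def cross_correlation(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """R1(n) = sum_{m=0}^{N1-1} A(m) * B(m+n), for n = 0..N1-1.
    B is zero-padded beyond its end so every shift is defined."""
    N1 = min(len(a), len(b))
    a = np.asarray(a[:N1], dtype=float)
    b_pad = np.concatenate([np.asarray(b[:N1], dtype=float), np.zeros(N1)])
    return np.array([np.dot(a, b_pad[n:n + N1]) for n in range(N1)])

def target_offset(a: np.ndarray, b: np.ndarray) -> int:
    """Sub-step S2202: the shift value maximizing R1(n) is the target offset."""
    return int(np.argmax(cross_correlation(a, b)))

# Example: a pulse at sample 2 in A appears at sample 5 in B, a delay of 3
a = np.zeros(16); a[2] = 1.0
b = np.zeros(16); b[5] = 1.0
offset = target_offset(a, b)  # offset == 3
```

A production version would compute the correlation with an FFT for long signals; the point here is only that the search itself stays in the time domain, as the embodiment describes.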
Optionally, the target audio signal may be obtained according to a superposition formula, the aligned first audio signal, and the aligned second audio signal, where the superposition formula is G(m) = β[E(m) + F(m+x)], in which G(m) denotes the target audio signal; β denotes a preset coefficient, i.e., an attenuation factor; E(m) and F(m+x) denote the first audio signal and the second audio signal after the alignment processing; and x denotes the target offset.
By the superposition formula above, the target audio signal can be obtained as β[A(m) + B(m+x)], where β denotes the attenuation factor, whose specific value is determined by the actual situation and may be less than 1, and A(m) and B(m+x) denote the first audio signal and the second audio signal after the alignment processing.
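The superposition of the aligned signals, G(m) = β[A(m) + B(m+x)], might then be sketched as follows; the function name and the choice β = 0.5 are illustrative assumptions.

```python
import numpy as np

def superimpose_aligned(a: np.ndarray, b: np.ndarray, x: int,
                        beta: float = 0.5) -> np.ndarray:
    """G(m) = beta * [A(m) + B(m + x)] over the samples where both
    aligned signals are defined."""
    n = min(len(a), len(b) - x)  # overlapping length after shifting B by x
    return beta * (np.asarray(a[:n]) + np.asarray(b[x:x + n]))

# B is A delayed by 2 samples; after alignment the common part doubles,
# then the attenuation factor beta scales it back down
a = np.ones(5)
b = np.concatenate([np.zeros(2), np.ones(5)])
g = superimpose_aligned(a, b, x=2, beta=0.5)
```

With β = 0.5 the aligned common signal keeps its original amplitude while independent noise, being misaligned, is attenuated.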
Optionally, in a second optional implementation manner, noise reduction processing may be performed on the first audio signal and the second audio signal, then alignment is performed on the noise-reduced signals, and an obtained result is used as the aligned first audio signal and the aligned second audio signal. And then the aligned first audio signal and the aligned second audio signal are superposed to obtain the target audio signal. The specific manner of the noise reduction processing may be set according to actual requirements, and is not specifically limited herein. The superposition may be performed according to the superposition formula described above. Therefore, the alignment effect can be ensured, and the definition of effective signals in subsequent target audio signals is further ensured.
Optionally, any single-microphone noise-reduction algorithm may be used directly to process the first audio signal and the second audio signal respectively, and the denoised first audio signal and second audio signal taken as the two audio signals to be aligned. The denoised first audio signal and second audio signal may then be aligned in the manner shown in fig. 4 to obtain the aligned first audio signal and the aligned second audio signal, from which the target audio signal is obtained by superposition. The single-microphone noise-reduction algorithm may come from open-source libraries such as SPEEX or WebRTC.
Optionally, the first sampling frequency is greater than the actual audio playing frequency, and in this way, under the condition that the playing frequency can be ensured, noise reduction and alignment can be performed in the manner shown in fig. 5. Referring to fig. 5, fig. 5 is a flowchart illustrating sub-steps included in sub-step S220 in fig. 3. The substep S220 may include substeps S221 to substep S223.
And a substep S221 of down-sampling the first audio signal to obtain a third audio signal.
And a substep S222, performing down-sampling on the second audio signal to obtain a fourth audio signal.
In this embodiment, the first audio signal and the second audio signal may be down-sampled at a second sampling frequency to obtain the third audio signal and the fourth audio signal; that is, the sampling frequency used when obtaining both the third audio signal and the fourth audio signal is the second sampling frequency. The second sampling frequency is less than the first sampling frequency used by the dual microphones during audio acquisition, and its specific value can be set according to actual requirements. In this way, the "high-gain low-gain" concept reduces the noise carried in the audio signals acquired by the two microphones while also reducing the amount of computation.
For example, the first sampling frequency may be 48000 Hz and the second sampling frequency 8000 Hz; the corresponding sampling interval of 1/48000 s is shorter than 1/8000 s.
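The down-sampling step under these example frequencies (48000 Hz to 8000 Hz, a decimation factor of 6) might be sketched as below. Note this naive decimation is an assumption for illustration and omits the anti-aliasing low-pass filter a production implementation (e.g. `scipy.signal.decimate`) would apply first.

```python
import numpy as np

def downsample(signal: np.ndarray, factor: int) -> np.ndarray:
    """Naive decimation: keep every `factor`-th sample.
    A production system would low-pass filter first to avoid aliasing."""
    return np.asarray(signal)[::factor]

fs_first, fs_second = 48000, 8000      # first and second sampling frequencies
one_second = np.random.randn(fs_first) # one second at the first frequency
decimated = downsample(one_second, fs_first // fs_second)  # factor 6
```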
And a substep S223 of aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
Optionally, the third audio signal and the fourth audio signal may be aligned directly in any manner, and the obtained result is used as the first audio signal and the second audio signal after the alignment processing. Optionally, the third audio signal and the fourth audio signal may be used as two paths of audio signals to be aligned in the alignment manner shown in fig. 4, and the method shown in fig. 4 is used to align the third audio signal and the fourth audio signal, and then the obtained results are used as the first audio signal and the second audio signal after the alignment processing. And then, overlapping can be carried out to obtain the target audio signal. Wherein, the superposition can be performed according to the superposition formula.
Optionally, in order to further reduce the influence of noise, further noise reduction may be performed as shown in fig. 6, so as to obtain the target audio signal. Referring to fig. 6, fig. 6 is a flowchart illustrating the sub-steps included in sub-step S223 in fig. 5. Sub-step S223 may include sub-step S2231 and sub-step S2232.
And a substep S2231, performing noise reduction processing on the third audio signal and the fourth audio signal respectively to obtain a fifth audio signal and a sixth audio signal.
Sub-step S2232 aligns the fifth audio signal and the sixth audio signal, and uses the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
Any single-microphone noise-reduction algorithm can be used to denoise the third audio signal and the fourth audio signal respectively, obtaining the fifth audio signal and the sixth audio signal; the algorithm may come from open-source libraries such as SPEEX or WebRTC. The fifth audio signal and the sixth audio signal may then be taken as the two audio signals to be aligned in the manner shown in fig. 4, aligned accordingly, and the results used as the first audio signal and the second audio signal after the alignment processing. Superposition can then be performed to obtain the target audio signal. This process can be expressed as G(m) = β[E(m) + F(m+x)], where G(m) denotes the target audio signal; β denotes a preset coefficient; E(m) and F(m+x) denote the first audio signal and the second audio signal after the alignment processing, that is, the aligned fifth audio signal and sixth audio signal; and x denotes the target offset. E(m) denotes the signal strength of the m-th sampling point in the fifth audio signal, and F(m+x) denotes the signal strength of the (m+x)-th sampling point in the sixth audio signal.
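The whole two-stage pipeline of fig. 5 and fig. 6 might be sketched as below. This is only a sketch under stated assumptions: the moving-average `denoise` is a placeholder for a real single-microphone noise-reduction library such as SPEEX or WebRTC, and all function names are illustrative.

```python
import numpy as np

def denoise(sig: np.ndarray, k: int = 5) -> np.ndarray:
    """Placeholder for single-microphone noise reduction (a moving average);
    the embodiment suggests open-source libraries such as SPEEX or WebRTC."""
    return np.convolve(sig, np.ones(k) / k, mode="same")

def align_offset(e: np.ndarray, f: np.ndarray) -> int:
    """Target offset x = argmax_n sum_m E(m)*F(m+n), searched in the time domain."""
    n1 = min(len(e), len(f))
    f_pad = np.concatenate([f[:n1], np.zeros(n1)])
    return int(np.argmax([np.dot(e[:n1], f_pad[n:n + n1]) for n in range(n1)]))

def process(first: np.ndarray, second: np.ndarray,
            factor: int = 6, beta: float = 0.5) -> np.ndarray:
    """Stage one: down-sample (naive decimation here); stage two: denoise;
    then align and superimpose: G(m) = beta * [E(m) + F(m + x)]."""
    e = denoise(first[::factor])   # fifth audio signal
    f = denoise(second[::factor])  # sixth audio signal
    x = align_offset(e, f)
    n = min(len(e), len(f) - x)
    return beta * (e[:n] + f[x:x + n])

# Toy demo: the same 440 Hz tone, the second copy delayed, both noisy
rng = np.random.default_rng(0)
src = np.sin(2 * np.pi * 440 * np.arange(48000) / 48000)
first = src + 0.05 * rng.standard_normal(48000)
second = np.roll(src, 60) + 0.05 * rng.standard_normal(48000)
target = process(first, second)
```

The design point matches the embodiment: all computation stays in the time domain and operates on the down-sampled signals, so the alignment search is far cheaper than a frequency-domain analysis at the original rate.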
In this embodiment, the first stage of noise reduction is performed by the oversample-then-downsample ("high sampling, low sampling") method, so performing the second stage with a common single-microphone noise reduction algorithm already achieves a good effect. The two stages of noise reduction remove noise that would otherwise degrade the accuracy of an alignment computed directly on the time-domain signals; reducing the sampling frequency and then computing directly on the time-domain signals preserves the accuracy of the result while reducing the amount of computation, yielding higher efficiency.
Superposing the signals after two-stage noise reduction and alignment maximizes the energy of the common signal. The noise, being random and misaligned, is weakened or only slightly enhanced by the superposition, so the overall signal-to-noise ratio of the resulting target audio signal is higher than that of the original first audio signal and second audio signal.
Therefore, the embodiment of the application uses the cross-correlation of the two audio channels of the dual microphones to align the data before superposing it, enhancing the energy of the effective signal and improving the signal-to-noise ratio.
After the target audio signal is obtained, it can be played back when needed, so that the content of the played sound source can be heard more clearly. Audio collected with this embodiment has higher speech quality and remains intelligible at a greater distance, which increases the pickup distance of the dual microphones.
The embodiment of the application makes full use of the correlation between the two data channels of the dual microphones and effectively improves the signal-to-noise ratio of the speech signal from the angle of enhancing effective signal energy, which is novel and efficient compared with traditional methods that work only from the angle of reducing noise. In the data calculation process, the degree to which the algorithm is simplified directly affects computational efficiency and the accuracy of real-time results. This embodiment aligns signals directly in the time domain; compared with common methods that must convert time-domain signals into frequency-domain signals for analysis and processing, computing only in the time domain greatly reduces the complexity and amount of computation and improves the efficiency of the algorithm.
In addition, performing multi-stage noise reduction before alignment keeps the algorithm conceptually clear and safeguards the accuracy of the correlation calculation. The multi-stage noise reduction used in this embodiment is an optimization built on conventional noise reduction algorithms. First, the "high sampling, low sampling" idea is to raise the sampling frequency and then lower it, so that the resulting sampling points carry much less noise than sampling at the original frequency would; the method is simple and convenient and introduces no other adverse factors. Then, a conventional single-microphone noise reduction algorithm is applied directly to the two microphone channels, which achieves a good noise reduction effect, is convenient to use and highly stable, and better guarantees the accuracy of subsequent data processing.
The above-described audio signal processing method is exemplified below.
The mounting axes of the two microphones in the audio acquisition device are parallel and point in the same direction. The distance between the two microphones is in the range of 10 cm to 20 cm.
A section of sound-source audio is played at a position relative to the audio acquisition device and serves as the pickup target. The sound source is within a reasonable range that the device can capture, not an extreme case such as being infinitely far away or infinitely quiet. For example, the microphones may have a pickup distance of 5 m, and the sound source may be located 8 m from the audio acquisition device.
The two microphones of the audio acquisition device are started to capture audio, yielding a first audio signal A and a second audio signal B. The first sampling frequency used here is greater than the sampling frequency actually required. Assuming the sampling frequency actually required is 8000 Hz, the first sampling frequency may be set to 48000 Hz.
The first audio signal A and the second audio signal B are down-sampled to obtain a third audio signal C and a fourth audio signal D.
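The down-sampling step might be sketched as below. Averaging each block of samples is only a crude stand-in for the anti-alias low-pass filtering a production resampler (e.g. scipy.signal.resample_poly) would apply; the function name and block-mean approach are my own assumptions.

```python
import numpy as np

def downsample(signal, factor):
    """Reduce the sampling rate by an integer factor (e.g. 48000 Hz -> 8000 Hz, factor 6).

    Averaging each block of `factor` samples acts as a crude anti-alias filter and
    also averages out some high-frequency noise, in the spirit of the
    oversample-then-downsample ("high sampling, low sampling") idea in the text."""
    n = (len(signal) // factor) * factor      # drop the incomplete tail block
    return signal[:n].reshape(-1, factor).mean(axis=1)
```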
The third audio signal C and the fourth audio signal D are then subjected to second-stage noise reduction with a noise reduction algorithm to obtain a fifth audio signal E and a sixth audio signal F. The algorithm used here may be one commonly used for single microphones.
The offset between the fifth audio signal E and the sixth audio signal F, that is, the number of offset sampling points, is then calculated. E and F are two channels of discrete audio data, and their cross-correlation function is obtained from the cross-correlation formula for discrete signals. The calculation is as follows:
R(n) = Σ_{m=0}^{N-1} E(m)·F(m+n)
where R(n) denotes the cross-correlation function sequence of the fifth audio signal E and the sixth audio signal F, n denotes the lag index, N denotes the number of correlated samples in E and F, and m denotes the sample index of E, ranging from 0 to N-1. Find the maximum value MAX of the cross-correlation function R(n); if R(x) is the entry of the R sequence at which MAX occurs, then x is the target offset between the E and F sample data. The fifth audio signal E and the sixth audio signal F can then be aligned using the target offset x.
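A minimal time-domain sketch of this offset search, using NumPy's cross-correlation (the function name and the use of np.correlate are my own choices; the patent only specifies the discrete cross-correlation formula):

```python
import numpy as np

def target_offset(e, f):
    """Estimate the sample offset x maximizing R(x) = sum_m E(m) * F(m + x).

    Shifting F by the returned x best aligns it with E."""
    # np.correlate(f, e, "full") evaluates sum_n f[n] * e[n - lag] at every lag;
    # substituting m = n - lag shows this equals R(lag) = sum_m e[m] * f[m + lag].
    r = np.correlate(f, e, mode="full")
    lags = np.arange(len(f) + len(e) - 1) - (len(e) - 1)
    return int(lags[np.argmax(r)])
```

np.correlate in "full" mode covers every lag from -(len(e)-1) to len(f)-1, so the arg-max over that lag axis yields the target offset x directly.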
After alignment, superposition may be performed to obtain the target audio signal. The process of alignment and superposition can be expressed as: G(m) = β[E(m) + F(m+x)], where G(m) represents the superposed data, E(m) and F(m+x) represent the aligned fifth audio signal E and sixth audio signal F, that is, the first audio signal and the second audio signal after the alignment processing, and β represents an attenuation factor.
Thus, a target audio signal with high speech quality can be obtained.
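Putting the worked example together, the whole pipeline (down-sample, denoise, align, superpose) might look like the sketch below. This is an illustrative sketch under my own assumptions: the 3-tap moving average is only a placeholder for a real single-microphone denoiser such as SPEEX or WebRTC, and all names and parameter values are mine.

```python
import numpy as np

def process(a, b, factor=6, beta=0.5):
    """End-to-end sketch: down-sample both channels, apply a stand-in denoiser,
    estimate the offset by time-domain cross-correlation, superpose the aligned
    signals. Returns the target audio signal G and the target offset x."""
    def down(s):
        n = (len(s) // factor) * factor           # crude decimation by block mean
        return s[:n].reshape(-1, factor).mean(axis=1)

    def denoise(s):
        # Placeholder smoother, NOT a real SPEEX/WebRTC noise suppressor.
        return np.convolve(s, np.ones(3) / 3, mode="same")

    e, f = denoise(down(a)), denoise(down(b))
    r = np.correlate(f, e, mode="full")           # R(n) over all lags
    x = int(np.argmax(r) - (len(e) - 1))          # target offset
    m = np.arange(max(0, -x), min(len(e), len(f) - x))  # overlap region
    return beta * (e[m] + f[m + x]), x
```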
In order to execute the corresponding steps in the above embodiments and their various possible implementations, an implementation of the audio signal processing apparatus 200 is given below; optionally, the audio signal processing apparatus 200 may adopt the device structure of the electronic device 100 shown in fig. 1. Further, referring to fig. 7, fig. 7 is a block diagram of an audio signal processing apparatus 200 according to an embodiment of the present application. It should be noted that the basic principle and resulting technical effect of the audio signal processing apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, matters not mentioned in this embodiment may be found in the corresponding contents of the above embodiments. The audio signal processing apparatus 200 may include a signal obtaining module 210 and a processing module 220.
The signal obtaining module 210 is configured to obtain a first audio signal and a second audio signal obtained by performing audio acquisition on two microphones respectively. The dual microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction.
The processing module 220 is configured to obtain a target audio signal by superimposing the first audio signal and the second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned; and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
Optionally, in this embodiment, the manner of implementing the alignment by the processing module 220 includes: calculating in time domain to obtain cross-correlation functions of two paths of audio signals to be aligned; determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset; and aligning the two paths of audio signals to be aligned according to the target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G(m) represents the target audio signal, β represents a preset coefficient, E(m) and F(m+x) represent the first audio signal and the second audio signal after the alignment processing, and x represents the target offset.
Optionally, in this embodiment, the processing module 220 is specifically configured to: down-sampling the first audio signal to obtain a third audio signal; down-sampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used for obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used for audio acquisition of the two microphones; aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
Optionally, in this embodiment, the processing module 220 is specifically configured to: respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal; aligning the fifth audio signal and the sixth audio signal, and using the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
Alternatively, the modules may be stored in the memory 110 shown in fig. 1 in the form of software or Firmware (Firmware) or may be fixed in an Operating System (OS) of the electronic device 100, and may be executed by the processor 120 in fig. 1. Meanwhile, data, codes of programs, and the like required to execute the above-described modules may be stored in the memory 110.
The embodiment of the present application further provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the audio signal processing method.
In summary, the embodiments of the present application provide an audio signal processing method and apparatus, an electronic device, and a readable storage medium, in which a target audio signal is obtained by superposing a first audio signal and a second audio signal acquired by two microphones whose axes are parallel and point in the same direction. The signal-to-noise ratio of the audio signal is thereby effectively improved from the angle of enhancing effective signal energy, achieving the purpose of improving the clarity of the effective signal in the audio signal.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The foregoing describes merely alternative embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its protection scope.

Claims (10)

1. An audio signal processing method, comprising:
obtaining a first audio signal and a second audio signal which are obtained by respectively carrying out audio acquisition on two microphones, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
and obtaining a target audio signal by superposition according to the first audio signal and the second audio signal.
2. The method according to claim 1, wherein obtaining the target audio signal by superposition according to the first audio signal and the second audio signal comprises:
aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
3. The method of claim 2, wherein the performing an alignment process comprises:
calculating in time domain to obtain cross-correlation functions of two paths of audio signals to be aligned;
determining the maximum value of the cross-correlation function, and taking a shift value corresponding to the maximum value as a target offset;
and aligning the two paths of audio signals to be aligned according to the target offset.
4. The method according to claim 2 or 3, wherein the superimposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal comprises:
obtaining the target audio signal according to a superposition formula, the aligned first audio signal and the aligned second audio signal, wherein the superposition formula is as follows:
G(m)=β[E(m)+F(m+x)]
wherein G(m) represents the target audio signal, β represents a preset coefficient, E(m) and F(m+x) represent the first audio signal and the second audio signal after the alignment processing, and x represents a target offset.
5. The method according to claim 2 or 3, wherein the aligning the first audio signal and the second audio signal to obtain the aligned first audio signal and second audio signal comprises:
down-sampling the first audio signal to obtain a third audio signal;
down-sampling the second audio signal to obtain a fourth audio signal, wherein the sampling frequencies used for obtaining the third audio signal and the fourth audio signal are both second sampling frequencies, and the second sampling frequencies are smaller than the first sampling frequencies used for audio acquisition of the two microphones;
aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal.
6. The method of claim 5, wherein the aligning the third audio signal and the fourth audio signal to obtain the aligned first audio signal and the aligned second audio signal comprises:
respectively carrying out noise reduction processing on the third audio signal and the fourth audio signal to obtain a fifth audio signal and a sixth audio signal;
aligning the fifth audio signal and the sixth audio signal, and using the aligned fifth audio signal and sixth audio signal as the aligned first audio signal and second audio signal.
7. An audio signal processing apparatus, comprising:
the signal acquisition module is used for acquiring a first audio signal and a second audio signal which are acquired by respectively carrying out audio acquisition on two microphones, wherein the two microphones comprise a first microphone and a second microphone, and the axes of the first microphone and the second microphone are parallel and have the same direction;
and the processing module is used for obtaining a target audio signal through superposition according to the first audio signal and the second audio signal.
8. The apparatus of claim 7, wherein the processing module is specifically configured to:
aligning the first audio signal and the second audio signal to obtain a first audio signal and a second audio signal which are aligned;
and superposing the aligned first audio signal and the aligned second audio signal to obtain the target audio signal.
9. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor to implement the audio signal processing method of any one of claims 1-6.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the audio signal processing method according to any one of claims 1 to 6.
CN202111112448.9A 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium Active CN113658579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111112448.9A CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN113658579A true CN113658579A (en) 2021-11-16
CN113658579B CN113658579B (en) 2024-01-30

Family

ID=78484098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111112448.9A Active CN113658579B (en) 2021-09-18 2021-09-18 Audio signal processing method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113658579B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0398594U (en) * 1990-01-26 1991-10-14
JPH08313659A (en) * 1995-05-16 1996-11-29 Atr Ningen Joho Tsushin Kenkyusho:Kk Signal time difference detector
JP2001100774A (en) * 1999-09-28 2001-04-13 Takayuki Arai Voice processor
CA2381516A1 (en) * 2001-04-12 2002-10-12 Gennum Corporation Digital hearing aid system
US6697494B1 (en) * 1999-12-15 2004-02-24 Phonak Ag Method to generate a predetermined or predeterminable receiving characteristic of a digital hearing aid, and a digital hearing aid
JP2007214913A (en) * 2006-02-09 2007-08-23 Yamaha Corp Sound collection apparatus
JP2009010593A (en) * 2007-06-27 2009-01-15 Yamaha Corp Portable communication terminal
JP2010026361A (en) * 2008-07-23 2010-02-04 Internatl Business Mach Corp <Ibm> Speech collection method, system and program
JP2011114623A (en) * 2009-11-27 2011-06-09 Teac Corp Sound recorder
CN102403022A (en) * 2010-09-13 2012-04-04 三洋电机株式会社 Recording apparatus, recording condition setting method, and recording condition setting program
CN105191345A (en) * 2013-03-29 2015-12-23 日产自动车株式会社 Microphone support device for sound source localization
CN107040856A (en) * 2016-02-04 2017-08-11 北京卓锐微技术有限公司 A kind of microphone array module
CN110400571A (en) * 2019-08-08 2019-11-01 Oppo广东移动通信有限公司 Audio-frequency processing method, device, storage medium and electronic equipment
CN111010646A (en) * 2020-03-11 2020-04-14 恒玄科技(北京)有限公司 Method and system for transparent transmission of earphone and earphone
CN111968667A (en) * 2020-08-13 2020-11-20 杭州芯声智能科技有限公司 Double-microphone voice noise reduction device and noise reduction method thereof
CN113270106A (en) * 2021-05-07 2021-08-17 深圳市友杰智新科技有限公司 Method, device and equipment for inhibiting wind noise of double microphones and storage medium


Non-Patent Citations (5)

Title
ALFREDO CIGADA: "The delay & sum algorithm applied to microphone array measurements: Numerical analysis and experimental validation", Mechanical Systems and Signal Processing *
YU Liuchang: "Research on Speech Enhancement Methods Based on Mobile Phone Microphone Arrays", China Master's Theses Full-text Database (Information Science and Technology) *
WANG Qi: "Research and Implementation of Array Speech Enhancement Algorithms", China Master's Theses Full-text Database (Information Science and Technology) *
SU Long: "Research on Microphone Array Speech Enhancement Methods", China Master's Theses Full-text Database (Information Science and Technology), pages 37-40 *
DONG Xiaojuan: "Algorithm Research on Microphone Array Speech Enhancement", China Master's Theses Full-text Database (Information Science and Technology) *

Also Published As

Publication number Publication date
CN113658579B (en) 2024-01-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant