CN112037825A - Audio signal processing method and device and storage medium - Google Patents


Publication number
CN112037825A
Authority
CN
China
Prior art keywords
audio
signal
channel
target channel
energy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010797931.4A
Other languages
Chinese (zh)
Other versions
CN112037825B (en
Inventor
何梦楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202010797931.4A priority Critical patent/CN112037825B/en
Publication of CN112037825A publication Critical patent/CN112037825A/en
Application granted granted Critical
Publication of CN112037825B publication Critical patent/CN112037825B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10527 Audio or video recording; Data buffering arrangements
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10009 Improvement or modification of read or write signals
    • G11B20/10046 Improvement or modification of read or write signals filtering or equalising, e.g. setting the tap weights of an FIR filter
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10009 Improvement or modification of read or write signals
    • G11B20/10481 Improvement or modification of read or write signals optimisation methods
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B20/10 Digital recording or reproducing
    • G11B20/10527 Audio or video recording; Data buffering arrangements
    • G11B2020/10537 Audio or video recording
    • G11B2020/10546 Audio or video recording specifically adapted for audio data

Abstract

The disclosure relates to a method and a device for processing an audio signal, and a storage medium. The method includes: collecting audio signals by using at least two audio acquisition channels, wherein the audio acquisition channels comprise a target channel and an alternative channel, and the audio signal collected by the alternative channel is used for noise filtering of the audio signal collected by the target channel; determining, according to the signal energy of the collected audio signals, that the signal energy of the target channel meets a preset energy condition; and, when the signal energy of the target channel meets the preset energy condition, switching the audio acquisition channel corresponding to the target channel. With the technical solution of the embodiments of the disclosure, the multiple audio acquisition channels of an electronic device can be used flexibly, and the audio quality is improved.

Description

Audio signal processing method and device and storage medium
Technical Field
The present disclosure relates to signal processing technologies, and in particular, to a method and an apparatus for processing an audio signal, and a storage medium.
Background
With the development of electronic devices, mobile terminals and intelligent electronic devices have gained more functions and applications, and their human-computer interaction capabilities have become increasingly powerful. For audio acquisition, an electronic device may use a multi-channel acquisition mode, which benefits functions such as noise filtering, echo cancellation, and sound source localization, and offers higher practicability and stability. However, in the multi-channel audio acquisition mode, some channels may suffer sound breaking when the sound source is too loud, which degrades the sound quality.
Disclosure of Invention
The disclosure provides a method and a device for processing an audio signal and a storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a method for processing an audio signal, including:
collecting audio signals by using at least two audio acquisition channels; wherein the audio acquisition channels comprise: a target channel and an alternative channel; the audio signal collected by the alternative channel is used for carrying out noise filtering on the audio signal collected by the target channel;
determining that the signal energy of the target channel meets a preset energy condition according to the signal energy of the acquired audio signal;
and when the signal energy of the target channel meets a preset energy condition, switching the audio acquisition channel corresponding to the target channel.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel when the signal energy of the target channel satisfies a preset energy condition includes:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold, switching the audio acquisition channel corresponding to the target channel.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds a product of the signal energy of the alternative channel and a preset reference threshold includes:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold value in at least N frames of the time domain, switching the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel when the signal energy of the target channel satisfies a preset energy condition includes:
if the signal energy of the target channel meets a preset energy condition, determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal;
and if the cross-correlation coefficients of the audio signals acquired by the at least two audio acquisition channels and the preset reference signal meet a preset correlation condition, switching the audio acquisition channel corresponding to the target channel.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel if the cross-correlation coefficients of the audio signals acquired by the at least two audio acquisition channels and the reference signal satisfy a preset correlation condition includes:
if the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold value and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold value in at least M frames of a time domain, switching the audio acquisition channel corresponding to the target channel; wherein M is a positive integer greater than or equal to 1.
In some embodiments, the determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal comprises:
determining first autocorrelation coefficients of audio signals acquired by the at least two audio acquisition channels;
determining a second autocorrelation coefficient of the reference signal; the reference signal is an audio signal sent by the electronic equipment where the at least two audio acquisition channels are located;
and determining the cross-correlation coefficient of the audio signal acquired by each audio acquisition channel and the reference signal according to the first autocorrelation coefficient and the second autocorrelation coefficient of each audio acquisition channel.
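The computation described above can be illustrated with a minimal sketch. It assumes that the cross-correlation coefficient is the zero-lag cross-correlation of one frame, normalized by the square root of the product of the two zero-lag autocorrelations; this normalization is a common choice and is an assumption here, since the exact formula is not given in this passage. The function name `cross_correlation_coefficient` and the parameter `eps` are hypothetical.

```python
import numpy as np

def cross_correlation_coefficient(mic_frame, ref_frame, eps=1e-12):
    """Normalized zero-lag cross-correlation between one channel frame and the reference frame.

    Assumed formula (not taken from the source): |r_xy| / sqrt(r_xx * r_yy).
    """
    mic = np.asarray(mic_frame, dtype=float)
    ref = np.asarray(ref_frame, dtype=float)
    auto_mic = np.dot(mic, mic)  # first autocorrelation coefficient (zero lag)
    auto_ref = np.dot(ref, ref)  # second autocorrelation coefficient (zero lag)
    cross = np.dot(mic, ref)     # zero-lag cross-correlation
    # eps keeps the division defined when a frame is all zeros
    return abs(cross) / np.sqrt(auto_mic * auto_ref + eps)
```

With this normalization the coefficient lies between 0 and 1, so a channel that picks up a scaled copy of the reference signal scores close to 1, and a channel dominated by unrelated sound scores close to 0.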
In some embodiments, the method further comprises:
acquiring frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
determining the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point according to the frequency domain noisy signals;
and determining the signal energy as a weighted sum, with preset weights, of the power spectral densities of the audio signal at the frequency points.
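As a rough illustration of these three steps, the sketch below computes the frequency domain signal of one frame, estimates the power at each frequency point, and forms the weighted sum. The single-frame periodogram used as the power spectral density estimate, the Hann window, and the name `signal_energy` are assumptions made for the example; the source does not specify how the power spectral density is estimated.

```python
import numpy as np

def signal_energy(frame, weights=None):
    """Weighted sum of per-bin power for one time-domain frame.

    Assumption: the power spectral density is approximated by a
    single-frame periodogram of the Hann-windowed frame.
    """
    frame = np.asarray(frame, dtype=float)
    # Frequency domain noisy signal of the frame
    spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
    # Power estimate at each frequency point
    psd = np.abs(spectrum) ** 2 / len(frame)
    if weights is None:
        weights = np.ones_like(psd)  # equal weight for every frequency point
    return float(np.dot(weights, psd))
```

Setting the weights to zero outside a chosen range of bins restricts the energy to a preset frequency band, which matches the embodiment in which the signal energy is the energy of the audio signal in a preset frequency band.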
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for processing an audio signal, including:
the acquisition module is used for acquiring audio signals by utilizing at least two audio acquisition channels; wherein the audio acquisition channels comprise: a target channel and an alternative channel; the audio signal collected by the alternative channel is used for carrying out noise filtering on the audio signal collected by the target channel;
the first determining module is used for determining that the signal energy of the target channel meets a preset energy condition according to the signal energy of the collected audio signal;
and the switching module is used for switching the audio acquisition channel corresponding to the target channel when the signal energy of the target channel meets a preset energy condition.
In some embodiments, the switching module comprises:
and the first switching submodule is used for switching the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold.
In some embodiments, the first switching submodule is specifically configured to:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold value in at least N frames of the time domain, switching the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
In some embodiments, the switching module comprises:
the determining submodule is used for determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal if the signal energy of the target channel meets a preset energy condition;
and the second switching submodule is used for switching the audio acquisition channel corresponding to the target channel if the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and the preset reference signal meets a preset correlation condition.
In some embodiments, the second switching submodule is specifically configured to:
if the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold value and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold value in at least M frames of a time domain, switching the audio acquisition channel corresponding to the target channel; wherein M is a positive integer greater than or equal to 1.
In some embodiments, the determining sub-module comprises:
a first determining unit, configured to determine first autocorrelation coefficients of the audio signals acquired by the at least two audio acquisition channels;
a second determining unit for determining a second autocorrelation coefficient of the reference signal; the reference signal is an audio signal sent by the electronic equipment where the at least two audio acquisition channels are located;
and the third determining unit is used for determining the cross-correlation coefficient between the audio signal acquired by each audio acquisition channel and the reference signal according to the first autocorrelation coefficient and the second autocorrelation coefficient of each audio acquisition channel.
In some embodiments, the apparatus further comprises:
the obtaining module is used for obtaining frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
the second determining module is used for determining the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point according to the frequency domain noisy signals;
and the third determining module is used for determining the signal energy as a weighted sum, with preset weights, of the power spectral densities of the audio signals at the frequency points.
According to a third aspect of the present disclosure, there is provided an apparatus for processing an audio signal, the apparatus comprising at least: a processor and a memory for storing executable instructions operable on the processor, wherein:
the processor is configured to execute the executable instructions, and the executable instructions, when executed, perform the steps of any of the above audio signal processing methods.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the steps in any of the above-described methods of processing an audio signal.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects: whether the target channel meets the requirement is judged according to the signal energy of the audio signals collected by the audio acquisition channels, and the target channel is switched accordingly. The target channel can thus be switched when the signal energy of the audio signal it collects fails to meet the requirement, which improves the signal quality of the target channel. For example, when the signal energy of the target channel is so large that sound breaking occurs, the target channel can be switched to reduce sound breaking; when the target channel is far from the sound source and cannot collect a clear audio signal, the target channel can be switched so that a clear audio signal is collected; and when the signal energy is not distributed in the predetermined frequency band as required, the target channel can be switched so that the audio signal it collects meets, as far as possible, the electronic device's overall requirements for the audio signal.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a first flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 2 is a second flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 3 is a schematic diagram illustrating a terminal with two audio acquisition channels in accordance with an exemplary embodiment;
FIG. 4 is a third flowchart illustrating a method of processing an audio signal according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a configuration of an audio signal processing apparatus according to an exemplary embodiment;
FIG. 6 is a block diagram illustrating a physical structure of an audio signal processing apparatus according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Fig. 1 is a flowchart illustrating a method of processing an audio signal according to an exemplary embodiment. As shown in fig. 1, the method includes the following steps:
step S101, collecting audio signals by using at least two audio acquisition channels; wherein the audio acquisition channels comprise: a target channel and an alternative channel; the audio signal collected by the alternative channel is used for carrying out noise filtering on the audio signal collected by the target channel;
step S102, determining that the signal energy of the target channel meets a preset energy condition according to the signal energy of the collected audio signal;
step S103, when the signal energy of the target channel meets the preset energy condition, switching the audio acquisition channel corresponding to the target channel.
The method of the embodiments of the present disclosure may be applied to a terminal. The terminal may be an electronic device having an audio acquisition component (e.g., a microphone), such as a mobile phone, a notebook computer, a video camera, a wearable electronic device, or any other electronic device with human-computer interaction capability. The electronic device may also be a device with an audio file processing function, such as a computer or sound equipment that has no audio acquisition function but can process audio files.
The audio signal is processed in order to obtain higher signal quality and to filter out most of its noise. In the embodiment of the present disclosure, the terminal may have at least two audio acquisition channels located at different positions, at least one of which may be set as the target channel. The audio signal collected by the target channel can be used as the audio signal to be processed. Noise reduction is performed on the audio signal collected by the target channel, that is, noise is filtered out to obtain a noise-reduced audio signal, which can then be played, transmitted, or stored.
The alternative channel may be a channel that assists the target channel in noise filtering. In addition, the alternative channel can assist the target channel in functions such as sound source signal separation and sound source position determination.
In the embodiment of the present disclosure, since the audio signal acquired by the target channel is the target signal, it needs to have higher signal quality. If the quality of the audio signal collected by the current target channel is poor, for example, sound breaking easily occurs, the volume is too low, or the noise is large, the target channel needs to be switched to improve the signal quality of the target signal.
Therefore, in the embodiment of the present disclosure, whether the signal energy satisfies the preset energy condition is determined by detecting the signal energy of the target channel, and if the preset energy condition is satisfied, the audio acquisition channel corresponding to the target channel may be switched, so as to find a more suitable audio acquisition channel as the target channel.
In an embodiment, when the terminal is powered on or the acquisition function of the audio acquisition channels has just been started, the target channel and the alternative channel may be determined according to the default settings of the system. For example, a mobile phone may have one audio acquisition channel at the top and one at the bottom. Since the bottom channel is usually closer to the sound source during a call, the bottom channel may be set as the default target channel and the top channel as the default alternative channel.
In another embodiment, the target channel may also be determined according to terminal posture detection. For example, the audio acquisition channel with the lowest physical height may be determined as the target channel according to the terminal posture detected by a posture sensor.
Of course, the initial target channel and alternative channel may also be selected randomly. Whether the target channel and the alternative channel need to be switched is then judged according to the signal energy of each audio acquisition channel during audio acquisition.
Here, the preset energy condition may be that the signal energy is outside a predetermined energy range, that is, the signal energy is lower than the minimum threshold of the range or higher than its maximum threshold. If the signal energy is within the predetermined energy range, the audio signal collected by the target channel is unlikely to exhibit sound breaking and its volume is adequate. If the signal energy is lower than the minimum threshold, the volume may be too low for the user to hear; if the signal energy is higher than the maximum threshold, the volume may be so high that sound breaking easily occurs or the user's listening is affected. It should be noted that the predetermined energy range may be set to correspond to the common range of the human voice, or to the comfortable range of human hearing, etc.
Of course, the above-mentioned case of too low signal energy may be that the environment in which the audio acquisition channel is located is inherently quiet and no sound source is sounding. Therefore, the preset energy condition may also include: the signal energy of the audio signal acquired by the target channel is below a predetermined minimum threshold and the signal energy of the audio signal acquired by the alternative channel is within a predetermined energy range. In this way, the target channel may also be switched when the target channel fails or is too far from the sound source. If the signal energy of the audio signal collected by the target channel and the signal energy of the audio signal collected by the alternative channel are both within the predetermined energy range or both are lower than the predetermined minimum threshold, the target channel may not be switched, and the current setting of the target channel is maintained.
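The range-based condition described in the two preceding paragraphs can be sketched as follows. The function name `energy_condition_met` and the parameters `e_min` and `e_max` are hypothetical, and the handling of the case in which both channels exceed the maximum threshold is an assumption, since the text does not address it.

```python
def energy_condition_met(target_energy, alt_energy, e_min, e_max):
    """Return True if the target channel should be considered for switching.

    e_min and e_max bound the predetermined energy range (illustrative values).
    """
    # Too loud: sound breaking is likely on the target channel.
    # Assumption: this triggers regardless of the alternative channel's energy.
    if target_energy > e_max:
        return True
    # Too quiet on the target channel while the alternative channel hears
    # a signal in range: the target may be faulty or too far from the source.
    if target_energy < e_min and e_min <= alt_energy <= e_max:
        return True
    # Both in range, or both below the minimum (quiet environment): keep as is.
    return False
```

Checking the alternative channel in the low-energy case keeps a quiet room from triggering a switch, which is the distinction the paragraph above draws.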
Therefore, by the method, the target channel can be switched in real time through the energy detection of the audio signals collected by the audio collecting channels, so that the clear audio signals with proper volume can be conveniently obtained, and the quality of the whole audio signals is improved.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel when the signal energy of the target channel satisfies a preset energy condition includes:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold, switching the audio acquisition channel corresponding to the target channel.
Here, one case in which the preset energy condition is satisfied is provided: whether the condition is satisfied is determined from the relationship between the signal energy of the target channel and that of the alternative channel. The preset reference threshold may be determined according to the positional relationship between the target channel and the alternative channel. For example, if the two channels are close to each other, the energies of the audio signals they collect should normally be close, so the preset reference threshold may be set to about 1. If the target channel is far from the alternative channel and closer to the sound source, the signal energy of the audio signal collected by the target channel may normally be higher than that of the alternative channel, and the preset reference threshold may then be set larger, for example, larger than 2.
In addition, the preset reference threshold may also be determined according to how the actual signal energy of the target channel affects the sound quality. For example, in a typical call scenario, if the signal energy of the target channel is greater than 3 times that of the alternative channel, sound breaking may occur and the sound quality may be greatly affected. Therefore, the preset reference threshold may be set to a value less than or equal to 3.
Therefore, the relation of signal energy among the signal channels is judged according to the preset reference threshold value, whether the target channel needs to be switched or not can be judged quickly, and the audio signal is kept to have a good tone quality effect as far as possible by switching the target channel.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds a product of the signal energy of the alternative channel and a preset reference threshold includes:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold value in at least N frames of the time domain, switching the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
During audio acquisition, glitches may occur in the audio signal collected by the target channel because of environmental factors or signal interference from the device itself. Such glitches are very short and hardly noticeable, so the target channel does not need to be switched when they occur. However, if glitches recur many times, or the signal energy stays too large, the sound quality of the audio signal may be greatly affected.
Therefore, in the embodiment of the present disclosure, the target channel may be switched only when the number of signal frames in which the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and the preset reference threshold is greater than or equal to N.
It should be noted that if the value of N is large, the switching of the target channel may not be sensitive enough, but unnecessary repeated switching may be reduced, and if the value of N is small, the switching of the target channel is more sensitive, but repeated switching may be caused by glitches occurring in the audio signals acquired by each audio acquisition channel. Therefore, in practical applications, the value of N may be set according to the requirement for audio signal acquisition or the actual audio acquisition environment, which is not limited herein.
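A minimal sketch of this frame-counting rule follows. The class name `EnergySwitchDetector` and the default values are hypothetical. The sketch also assumes that the N frames must be consecutive, which is one possible reading; the text only requires that the condition hold in at least N frames.

```python
class EnergySwitchDetector:
    """Signals a switch once the energy-ratio condition has held for N frames."""

    def __init__(self, reference_threshold=3.0, n_frames=5):
        self.reference_threshold = reference_threshold  # preset reference threshold
        self.n_frames = n_frames                        # N in the text
        self.count = 0

    def update(self, target_energy, alt_energy):
        """Feed the energies of one frame; return True when a switch is due."""
        if target_energy > alt_energy * self.reference_threshold:
            self.count += 1
        else:
            # Assumption: frames must be consecutive, so a normal frame
            # resets the counter and isolated glitches are ignored.
            self.count = 0
        if self.count >= self.n_frames:
            self.count = 0  # start counting afresh after signalling a switch
            return True
        return False
```

A larger `n_frames` makes the detector slower to react but more robust to glitches, which is the trade-off described above.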
In some embodiments, as shown in fig. 2, in the step S103, when the signal energy of the target channel satisfies a preset energy condition, switching the audio capture channel corresponding to the target channel includes:
step S201, if the signal energy of the target channel meets a preset energy condition, determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal;
step S202, if the cross correlation coefficients of the audio signals acquired by the at least two audio acquisition channels and the preset reference signal meet a preset correlation condition, switching the audio acquisition channel corresponding to the target channel.
The target channel and the alternative channel may be far apart, in which case the signal energies they acquire differ considerably. If the target channel is switched according to signal energy alone, the channel that becomes the new target channel (the former alternative channel) may be too far from the sound source to acquire an audio signal of sufficient volume.
Therefore, in the embodiment of the present disclosure, when the signal energy of the target channel satisfies the preset energy condition, whether to switch the target channel is further determined according to correlations between the audio signals acquired by the target channel and the alternative channel respectively and the preset reference signal.
The preset reference signal may be an audio signal emitted by the terminal itself. For example, when the terminal is in a call with a far end, the reference signal may be the audio that the terminal plays from the other party's voice signal, including the other party's audio played in hands-free mode. As another example, the reference signal may be background music played by an application running on the terminal, or an audio signal emitted by the terminal system.
Since the preset reference signal is a signal that the audio capturing channel should be able to capture, if the correlation between the audio signal captured by a part of the audio capturing channels and the preset reference signal is low, and the correlation between the audio signal captured by the other audio capturing channels and the preset reference signal is high, it is indicated that the audio capturing channel with low correlation may capture a large noise or the audio capturing channel may malfunction.
Therefore, in the embodiment of the present disclosure, when the signal energy of the target channel satisfies the preset energy condition, it is determined whether the correlations between the audio signals acquired by the target channel and the alternative channel, respectively, and the preset reference signal satisfy the correlation condition, and the correlation condition is used as a trigger condition for switching the target channel.
In this way, sound breaking or strong noise on the target channel can be detected accurately and the target channel switched, so that the audio acquisition channel that acquires audio signals with good sound quality is used as the target channel.
In some embodiments, the switching the audio acquisition channel corresponding to the target channel if the cross-correlation coefficients of the audio signals acquired by the at least two audio acquisition channels and the reference signal satisfy a preset correlation condition includes:
if the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold value and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold value in at least M frames of a time domain, switching the audio acquisition channel corresponding to the target channel; wherein M is a positive integer greater than or equal to 1.
Similarly to N in the above-described embodiment, the number of frames that must satisfy the correlation condition may be set here; if the correlation condition is satisfied in at least M frames, it is determined that the target channel needs to be switched. M may also be adjusted according to actual requirements and is not limited herein.
The first correlation threshold and the second correlation threshold may be the same value or different values. If the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than the first correlation threshold, the preset reference signal is weaker in the audio signal acquired by the target channel; and if the cross-correlation coefficient of the audio signal acquired by the alternative channel and the preset reference signal is greater than the second correlation threshold value, indicating that the preset reference signal in the audio signal acquired by the alternative channel is stronger. At this time, the target channel may receive a large amount of noise interference, or the target channel may have a fault, and therefore, it may be determined that the target channel needs to be switched.
In some embodiments, the determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal comprises:
determining first autocorrelation coefficients of audio signals acquired by the at least two audio acquisition channels;
determining a second autocorrelation coefficient of the reference signal; the reference signal is an audio signal sent by the electronic equipment where the at least two audio acquisition channels are located;
and determining the cross-correlation coefficient of the audio signal acquired by each audio acquisition channel and the reference signal according to the first autocorrelation coefficient and the second autocorrelation coefficient of each audio acquisition channel.
The autocorrelation coefficients of the audio signal represent the smoothness of the signal. The cross-correlation coefficient between the audio signal acquired by an audio acquisition channel and the reference signal can be calculated based on the first autocorrelation coefficient and the second autocorrelation coefficient. Here, the at least two audio acquisition channels include the current target channel and the alternative channel.
The first autocorrelation coefficients of the audio acquisition channels may be determined by the following equation (1):
S1(k,l)=gamma*S1(k,l-1)+(1-gamma)*real(X1(k,l).*conj(X1(k,l))) (1)
where S1(k,l) represents the first autocorrelation coefficient of the audio signal acquired by any audio acquisition channel at the k-th frequency point of the l-th frame; gamma is a smoothing factor that takes a value between 0 and 1, for example gamma = 0.8; X1(k,l) is the frequency domain signal of that audio acquisition channel at the k-th frequency point of the l-th frame; conj denotes taking the conjugate, and real denotes taking the real part.
The second autocorrelation coefficient of the reference signal may be determined by the following equation (2):
Sfar(k,l)=gamma*Sfar(k,l-1)+(1-gamma)*real(Xfar(k,l).*conj(Xfar(k,l))) (2)
where Sfar(k,l) represents the second autocorrelation coefficient of the reference signal at the k-th frequency point of the l-th frame, and Xfar(k,l) is the frequency domain signal of the reference signal at the k-th frequency point of the l-th frame.
Based on the first autocorrelation coefficient and the second autocorrelation coefficient, the cross-correlation coefficient between the audio signal acquired by the audio acquisition channel and the reference signal may be calculated according to the following formula (3) and formula (4).
Sfar1(k,l)=gamma*Sfar1(k,l-1)+(1-gamma)*real(Xfar(k,l).*conj(X1(k,l))) (3)
cohed(k,l)=real(Sfar1(k,l)*conj(Sfar1(k,l)))/(Sfar(k,l)*S1(k,l)) (4)
where Sfar1(k,l) is an intermediate variable and cohed(k,l) is the cross-correlation coefficient.
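The recursions in formulas (1) to (4) can be sketched for a single frequency point as follows. This is only a minimal illustration, not the implementation of the disclosure: the class name `CoherenceTracker`, the zero initial state, and the small constant `eps` that guards against division by zero are assumptions, and formula (4) is read as real(Sfar1*conj(Sfar1)) divided by Sfar*S1. The value gamma = 0.8 is the example value given above.

```python
class CoherenceTracker:
    """Recursive estimates of formulas (1)-(4) at one frequency point k."""

    def __init__(self, gamma=0.8, eps=1e-12):
        self.gamma = gamma   # smoothing factor, between 0 and 1
        self.eps = eps       # assumption: avoids division by zero
        self.s1 = 0.0        # S1(k, l-1), acquisition channel
        self.sfar = 0.0      # Sfar(k, l-1), reference signal
        self.sfar1 = 0.0     # Sfar1(k, l-1), intermediate variable

    def update(self, x1, xfar):
        """Take complex X1(k,l) and Xfar(k,l); return cohed(k,l)."""
        g = self.gamma
        # Formula (1): smoothed power of the acquisition channel.
        self.s1 = g * self.s1 + (1 - g) * (x1 * x1.conjugate()).real
        # Formula (2): smoothed power of the reference signal.
        self.sfar = g * self.sfar + (1 - g) * (xfar * xfar.conjugate()).real
        # Formula (3): smoothed real part of the cross spectrum.
        self.sfar1 = g * self.sfar1 + (1 - g) * (xfar * x1.conjugate()).real
        # Formula (4): Sfar1 is real, so real(Sfar1*conj(Sfar1)) = Sfar1**2.
        return (self.sfar1 * self.sfar1) / (self.sfar * self.s1 + self.eps)


# Usage: one tracker per channel and per frequency point.
tracker = CoherenceTracker()
cohed = tracker.update(1 + 1j, 1 + 1j)   # identical spectra give a value near 1
```

A channel whose spectrum matches the reference yields a value close to 1, while a spectrum in quadrature with the reference yields 0, which is the behaviour the correlation thresholds rely on.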
In some embodiments, the method further comprises:
acquiring frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
determining the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point according to the frequency domain noisy signals;
and determining the signal energy according to the weighted sum of the power spectral density of the audio signal at each frequency point according to a preset weight.
In the embodiment of the present disclosure, the audio signal collected by the audio collecting channel may be subjected to time-frequency conversion, and the time-domain signal may be converted into a frequency-domain signal. And then calculating the power spectral density of the frequency domain noisy signal. The power spectral density is the statistics of the power of the frequency domain signal with noise at each frequency point, so that the signal energy can be obtained by weighting the value of the power spectral density at each frequency point according to the preset weight. The weight here may be a smoothing factor, and may take a value between 0 and 1, for example, a value of 0.8.
By the method, the signal energy and the data of the correlation can be obtained by performing operation processing on the audio signals acquired by the audio acquisition channel, and then the comparison is performed to judge whether the target channel needs to be switched or not, so that in the audio acquisition process, the appropriate audio acquisition channel can be automatically switched in real time to serve as the target channel, and the acquired signal quality is improved.
The disclosed embodiments also provide the following examples:
When a mobile terminal performs audio acquisition, for example during a call on a mobile phone, sound breaking or distortion may occur if the user speaks loudly. In the embodiment of the present disclosure, the mobile terminal has at least two audio acquisition channels; for example, the mobile phone shown in fig. 3 acquires audio signals with audio acquisition channels mic1 and mic2. During acquisition, the microphone that acquires the audio signal and provides the audio to be processed is switched on the basis of energy detection and of the correlation between each channel's signal and the far-end signal, so that the clarity of the audio signal is improved and sound breaking is reduced.
In the embodiment of the present disclosure, the processing may be performed by the steps shown in fig. 4:
Step S401, collecting audio signals by using two audio acquisition channels;
Here, the audio acquisition channel mic1 may be used as the main channel, i.e., the target channel, and noise reduction is performed on the audio signal it acquires to obtain the target audio signal. The audio acquisition channel mic2 is used as the alternative channel, which assists the target channel in noise cancellation, that is, assists in noise reduction of the audio signal acquired by mic1.
The audio signal acquired by mic1 is X1 and the audio signal acquired by mic2 is X2; after framing, windowing, and Fourier transform are applied to each, the frequency spectra X1(k,l) and X2(k,l) at the k-th frequency point of the l-th frame are obtained.
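The framing, windowing, and Fourier transform step can be sketched as below. The text does not specify a frame length, hop size, or window type, so the values `frame_len=32`, `hop=16`, and the Hann window are assumptions, as is the function name `frame_spectra`. A naive DFT is used only to keep the sketch free of third-party dependencies; a practical implementation would normally call an FFT routine.

```python
import cmath
import math


def frame_spectra(x, frame_len=32, hop=16):
    """Split x into overlapping frames, window each, and take its DFT.

    Returns spectra[l][k], i.e. X(k, l) for frame l and frequency point k.
    """
    # Hann window (assumption; the source only says "windowing").
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        # Naive DFT, keeping only the non-negative frequency points.
        spectrum = [
            sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        spectra.append(spectrum)
    return spectra


# Usage: a sinusoid with 4 cycles per frame peaks at frequency point k = 4.
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
X1 = frame_spectra(signal)
```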
When the signal energy of mic1 is too large, sound breaking easily occurs and sounds harsh to the human ear, so the quality of the final audio signal can be improved by switching the target channel, that is, by using mic2 as the target channel and mic1 as the alternative channel.
Step S402, performing energy detection on the audio signals acquired by the two audio acquisition channels;
The energy detection may use the following formulas (5) and (6):
Mic1_sp=∑a*Mic1_sp(k,l-1)+(1-a)*P1(k,l) (5)
Mic2_sp=∑a*Mic2_sp(k,l-1)+(1-a)*P2(k,l) (6)
where Mic1_sp and Mic2_sp are the signal energies of mic1 and mic2, respectively, P1(k,l) and P2(k,l) are the power spectra of mic1 and mic2, respectively, and a is a smoothing factor, which may be set to 0.8. The power spectrum values at the frequency points are weighted and summed to obtain the signal energy. It should be noted that the summation may be performed over only part of the frequency band. For example, if the two audio acquisition channels differ greatly in the band from 1 kHz (kilohertz) to 4 kHz, the summation may be performed over the power spectrum at the frequency points k within 1 kHz to 4 kHz.
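A possible reading of formulas (5) and (6) is sketched below: the power spectrum is smoothed recursively at each frequency point, and the smoothed values are then summed over the frequency points inside the chosen band. That reading, the function name `band_energy`, and the sample rate and transform size are assumptions made for illustration; a = 0.8 and the 1 kHz to 4 kHz band are the example values from the text.

```python
import math


def band_energy(prev_sp, power, a=0.8, sample_rate=16000, n_fft=512,
                band_hz=(1000.0, 4000.0)):
    """Update the per-point smoothed power and return (new_sp, energy).

    prev_sp[k] is Mic_sp(k, l-1) and power[k] is P(k, l).
    """
    # Recursive smoothing at every frequency point k.
    new_sp = [a * s + (1 - a) * p for s, p in zip(prev_sp, power)]
    bin_hz = sample_rate / n_fft          # spacing between frequency points
    lo = math.ceil(band_hz[0] / bin_hz)   # first point inside the band
    hi = math.floor(band_hz[1] / bin_hz)  # last point inside the band
    energy = sum(new_sp[lo:hi + 1])       # sum over the band only
    return new_sp, energy


# Usage with a tiny 16-point transform: points 1..4 cover 1 kHz to 4 kHz.
state = [0.0] * 9
state, mic1_sp = band_energy(state, [10.0] * 9, n_fft=16)
```

Restricting the sum to the band where the two channels differ most keeps the energy comparison from being diluted by bands in which both microphones see similar levels.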
Here, a detection threshold Eth may be set as the threshold that the energy must satisfy for switching. If the signal energy meets the condition of the detection threshold, the calculation of the correlation coefficient continues.
Here, whether the signal energy satisfies the condition of the detection threshold may be judged using the following equation (7):
Mic1_sp>Eth*Mic2_sp (7)
That is, when the signal energy of mic1 is greater than the product of the signal energy of mic2 and the detection threshold, the condition of the detection threshold is satisfied, and the calculation of the correlation coefficient may continue. Of course, the correlation coefficient may also be calculated directly, before determining whether the signal energy satisfies the condition of the detection threshold, and the signal energy and the correlation coefficient are then combined to determine whether to switch the target channel.
If whether to switch the target channel is judged according to the signal energy alone, count_choose may be set as the accumulated number of frames in which the energy exceeds the threshold. For example, the switching threshold may be set to 5 frames: when count_choose is greater than or equal to 5, the target channel is switched, with mic2 used as the target channel and mic1 as the alternative channel.
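The energy-only decision can be sketched as a small counter around formula (7). The class name `EnergySwitch` and the value Eth = 2.0 are assumptions, since the text gives no numeric detection threshold. The text also does not say whether count_choose is reset by a frame that fails the condition; this sketch simply accumulates, which is one possible reading.

```python
class EnergySwitch:
    """Counts frames that satisfy formula (7) and reports when to switch."""

    def __init__(self, eth=2.0, n_frames=5):
        self.eth = eth            # detection threshold Eth (assumed value)
        self.n_frames = n_frames  # switching threshold, 5 frames in the example
        self.count_choose = 0     # accumulated number of qualifying frames

    def update(self, mic1_sp, mic2_sp):
        """Return True once count_choose reaches the switching threshold."""
        if mic1_sp > self.eth * mic2_sp:   # formula (7)
            self.count_choose += 1
        return self.count_choose >= self.n_frames


# Usage: mic1 is ten times louder than mic2 for five frames in a row.
switch = EnergySwitch()
decisions = [switch.update(10.0, 1.0) for _ in range(5)]
```

In this run the first four frames return False and the fifth returns True, at which point mic2 would become the target channel.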
Step S403, calculating a correlation coefficient of the audio signal of the audio acquisition channel;
The correlation coefficient is considered because the selection of the target channel is also related to echo cancellation, i.e., to the audio that is emitted by the terminal itself and picked up by the audio acquisition channels, for example the voice of the other party during a call, i.e., far-end speech. In the disclosed embodiments, the audio signal produced by far-end speech is represented by the reference signal.
Here, the cross-correlation coefficient between each audio signal and the reference signal is calculated in consideration of the correlation between the audio signal acquired by each audio acquisition channel and the reference signal.
The autocorrelation coefficients of each audio acquisition channel can be determined by equation (1) in the above embodiment:
S1(k,l)=gamma*S1(k,l-1)+(1-gamma)*real(X1(k,l).*conj(X1(k,l))) (1)
where S1(k,l) represents the autocorrelation coefficient of the audio signal acquired by any audio acquisition channel at the k-th frequency point of the l-th frame; gamma is a smoothing factor that takes a value between 0 and 1, for example gamma = 0.8; X1(k,l) is the frequency domain signal of that audio acquisition channel at the k-th frequency point of the l-th frame; conj denotes taking the conjugate, and real denotes taking the real part.
The autocorrelation coefficient of the reference signal can be determined by equation (2) in the above embodiment:
Sfar(k,l)=gamma*Sfar(k,l-1)+(1-gamma)*real(Xfar(k,l).*conj(Xfar(k,l))) (2)
where Sfar(k,l) represents the autocorrelation coefficient of the reference signal at the k-th frequency point of the l-th frame, and Xfar(k,l) is the frequency domain signal of the reference signal at the k-th frequency point of the l-th frame.
The cross-correlation coefficient between the audio signal collected by the audio collection channel and the reference signal is calculated according to formula (3) and formula (4) in the above embodiments.
Sfar1(k,l)=gamma*Sfar1(k,l-1)+(1-gamma)*real(Xfar(k,l).*conj(X1(k,l))) (3)
cohed(k,l)=real(Sfar1(k,l)*conj(Sfar1(k,l)))/(Sfar(k,l)*S1(k,l)) (4)
where Sfar1(k,l) is an intermediate variable and cohed(k,l) is the cross-correlation coefficient.
It should be noted that the calculation of the correlation coefficient may be performed at all frequency points, that is, the correlation coefficient is expressed at all frequency points of k.
Step S404, determining whether to switch the target channel by combining the energy detection and the correlation coefficient.
Here, it may be determined whether the correlation coefficient satisfies the threshold condition using the following equation (8):
cohed1<Ethrod1&&cohed2>Ethrod2 (8)
where cohed1 and cohed2 are the cross-correlation coefficients of mic1 and mic2 with the reference signal, respectively; Ethrod1 is the threshold for the cross-correlation coefficient of mic1 with the reference signal, and Ethrod2 is the threshold for the cross-correlation coefficient of mic2 with the reference signal. Each frame that satisfies the above condition may be counted by a counter, and when the count reaches the preset threshold the target channel is switched. That is, count_choose may be set as the accumulated number of frames satisfying the above condition. For example, the switching threshold may be set to 5 frames: when count_choose is greater than or equal to 5, the target channel is switched, with mic2 used as the target channel and mic1 as the alternative channel.
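The combined decision of step S404 can be sketched as follows, with formula (7) gating formula (8) on each frame. The function names, the threshold values (Eth = 2.0, Ethrod1 = 0.3, Ethrod2 = 0.6), and the averaging of the per-frequency-point coefficients into one value per frame are all assumptions made for illustration; the text does not state how the per-point coefficients are reduced to the scalars cohed1 and cohed2.

```python
def frame_qualifies(mic1_sp, mic2_sp, cohed1, cohed2,
                    eth=2.0, ethrod1=0.3, ethrod2=0.6):
    """Return True if a frame satisfies formula (7) and formula (8).

    cohed1 and cohed2 are lists of per-frequency-point coefficients.
    """
    if not mic1_sp > eth * mic2_sp:             # energy condition (7)
        return False
    mean1 = sum(cohed1) / len(cohed1)           # assumption: average over k
    mean2 = sum(cohed2) / len(cohed2)
    return mean1 < ethrod1 and mean2 > ethrod2  # correlation condition (8)


def find_switch_frame(frames, m_frames=5):
    """Return the frame index at which mic2 would replace mic1, or None.

    frames is an iterable of (mic1_sp, mic2_sp, cohed1, cohed2) tuples.
    """
    count_choose = 0
    for index, frame in enumerate(frames):
        if frame_qualifies(*frame):
            count_choose += 1
        if count_choose >= m_frames:
            return index
    return None


# Usage: a loud, poorly correlated mic1 against a well correlated mic2.
bad = (10.0, 1.0, [0.1, 0.2], [0.8, 0.9])
ok = (1.0, 1.0, [0.9], [0.9])
switch_at = find_switch_frame([bad, ok, bad, bad, bad, bad])
```

With these inputs the second frame does not count, so the fifth qualifying frame is the sixth frame overall (index 5).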
Through the technical scheme of the embodiment of the disclosure, the target channel is automatically switched by utilizing the energy detection and the calculation of the correlation coefficient, so that the sound breaking phenomenon caused by overlarge sound can be well reduced, and the definition of the voice signal is improved.
Fig. 5 is a block diagram illustrating a configuration of an apparatus for processing an audio signal according to an exemplary embodiment, and as shown in fig. 5, the apparatus 500 includes:
an acquisition module 501, configured to acquire an audio signal by using at least two audio acquisition channels; wherein the audio acquisition channels comprise: a target channel and an alternate channel; the audio signal collected by the alternative channel is used for carrying out noise filtering on the audio signal collected by the target channel;
a first determining module 502, configured to determine, according to signal energy of an acquired audio signal, that signal energy of a target channel meets a preset energy condition;
the switching module 503 is configured to switch the audio acquisition channel corresponding to the target channel when the signal energy of the target channel meets a preset energy condition.
In some embodiments, the switching module comprises:
and the first switching submodule is used for switching the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold.
In some embodiments, the first switching submodule is specifically configured to:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold value in at least N frames of the time domain, switching the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
In some embodiments, the switching module comprises:
the determining submodule is used for determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal if the signal energy of the target channel meets a preset energy condition;
and the second switching submodule is used for switching the audio acquisition channel corresponding to the target channel if the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and the preset reference signal meets a preset correlation condition.
In some embodiments, the second switching submodule is specifically configured to:
if the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold value and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold value in at least M frames of a time domain, switching the audio acquisition channel corresponding to the target channel; wherein M is a positive integer greater than or equal to 1.
In some embodiments, the determining sub-module comprises:
a first determining unit, configured to determine first autocorrelation coefficients of the audio signals acquired by the at least two audio acquisition channels;
a second determining unit for determining a second autocorrelation coefficient of the reference signal; the reference signal is an audio signal sent by the electronic equipment where the at least two audio acquisition channels are located;
and the third determining unit is used for determining the cross-correlation coefficient between the audio signal acquired by each audio acquisition channel and the reference signal according to the first autocorrelation coefficient and the second autocorrelation coefficient of each audio acquisition channel.
In some embodiments, the apparatus further comprises:
the acquisition module is used for acquiring frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
the second determining module is used for determining the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point according to the frequency domain noisy signals;
and the third determining module is used for determining the signal energy according to the weighted sum of the power spectral densities of the audio signals at all the frequency points according to the preset weight.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating an apparatus 600 for processing an audio signal according to an exemplary embodiment. For example, the apparatus 600 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and so forth.
Referring to fig. 6, apparatus 600 may include one or more of the following components: a processing component 601, a memory 602, a power component 603, a multimedia component 604, an audio component 605, an input/output (I/O) interface 606, a sensor component 607, and a communication component 608.
The processing component 601 generally controls the overall operation of the device 600, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 601 may include one or more processors 610 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 601 may also include one or more modules that facilitate interaction between the processing component 601 and other components. For example, the processing component 601 may include a multimedia module to facilitate interaction between the multimedia component 604 and the processing component 601.
The memory 602 is configured to store various types of data to support operations at the apparatus 600. Examples of such data include instructions for any application or method operating on the apparatus 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 602 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 603 provides power to the various components of the device 600. The power supply component 603 may include: a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 600.
The multimedia component 604 includes a screen that provides an output interface between the device 600 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 604 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 600 is in an operating mode, such as a shooting mode or a video mode. Each front camera and/or rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
Audio component 605 is configured to output and/or input audio signals. For example, audio component 605 includes a Microphone (MIC) configured to receive external audio signals when apparatus 600 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 602 or transmitted via the communication component 608. In some embodiments, audio component 605 also includes a speaker for outputting audio signals.
The I/O interface 606 provides an interface between the processing component 601 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor component 607 includes one or more sensors for providing various aspects of status assessment for the apparatus 600. For example, the sensor component 607 may detect the open/closed state of the apparatus 600 and the relative positioning of components, such as the display and keypad of the apparatus 600. The sensor component 607 may also detect a change in the position of the apparatus 600 or a component of the apparatus 600, the presence or absence of user contact with the apparatus 600, the orientation or acceleration/deceleration of the apparatus 600, and a change in the temperature of the apparatus 600. The sensor component 607 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor component 607 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 607 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 608 is configured to facilitate wired or wireless communication between the apparatus 600 and other devices. The apparatus 600 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 608 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 608 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In an exemplary embodiment, the apparatus 600 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 602 comprising instructions, executable by the processor 610 of the apparatus 600 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the method provided in any of the embodiments.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (16)

1. A method of processing an audio signal, comprising:
collecting audio signals by using at least two audio collecting channels; wherein the audio acquisition channels comprise: a target channel and an alternate channel; the audio signal collected by the alternative channel is used for carrying out noise filtering on the audio signal collected by the target channel;
determining that the signal energy of the target channel meets a preset energy condition according to the signal energy of the acquired audio signal;
and when the signal energy of the target channel meets a preset energy condition, switching the audio acquisition channel corresponding to the target channel.
2. The method according to claim 1, wherein switching the audio acquisition channel corresponding to the target channel when the signal energy of the target channel satisfies a preset energy condition comprises:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold, switching the audio acquisition channel corresponding to the target channel.
3. The method according to claim 2, wherein switching the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds a product of the signal energy of the alternative channel and a preset reference threshold comprises:
and if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold value in at least N frames of the time domain, switching the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
4. The method according to any one of claims 1 to 3, wherein switching the audio capture channel corresponding to the target channel when the signal energy of the target channel satisfies a preset energy condition comprises:
if the signal energy of the target channel meets a preset energy condition, determining the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal;
and if the cross-correlation coefficients of the audio signals acquired by the at least two audio acquisition channels and the preset reference signal meet a preset correlation condition, switching the audio acquisition channel corresponding to the target channel.
5. The method according to claim 4, wherein the switching the audio acquisition channel corresponding to the target channel if the cross-correlation coefficient between the audio signals acquired by the at least two audio acquisition channels and the reference signal satisfies a preset correlation condition comprises:
if the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold value and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold value in at least M frames of a time domain, switching the audio acquisition channel corresponding to the target channel; wherein M is a positive integer greater than or equal to 1.
6. The method of claim 4, wherein determining the cross-correlation coefficients between the audio signals acquired by the at least two audio acquisition channels and the preset reference signal comprises:
determining first autocorrelation coefficients of the audio signals acquired by the at least two audio acquisition channels;
determining a second autocorrelation coefficient of the preset reference signal, wherein the preset reference signal is an audio signal sent by the electronic device in which the at least two audio acquisition channels are located;
and determining the cross-correlation coefficient between the audio signal acquired by each audio acquisition channel and the preset reference signal according to the first autocorrelation coefficient of that audio acquisition channel and the second autocorrelation coefficient.
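The claim does not give the formula that combines the autocorrelation coefficients. One common choice, assumed here purely for illustration, is the zero-lag normalized cross-correlation, in which the raw cross term is divided by the square root of the product of the two zero-lag autocorrelations. The function name and the zero-lag simplification are assumptions.

```python
import math


def cross_correlation_coefficient(channel_frame, reference_frame):
    """Hypothetical zero-lag normalized cross-correlation of two frames.

    The result lies in [-1, 1]; a silent frame yields 0.0 by convention.
    """
    if len(channel_frame) != len(reference_frame):
        raise ValueError("frames must have the same length")
    # Zero-lag autocorrelation of each signal (its energy over the frame).
    auto_channel = sum(x * x for x in channel_frame)
    auto_reference = sum(r * r for r in reference_frame)
    # Raw zero-lag cross-correlation between channel and reference.
    cross = sum(x * r for x, r in zip(channel_frame, reference_frame))
    denominator = math.sqrt(auto_channel * auto_reference)
    if denominator == 0.0:
        return 0.0
    return cross / denominator
```

Normalizing by the autocorrelations makes the coefficient independent of the absolute level of either signal, so fixed thresholds can be applied to it.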
7. The method of claim 1, further comprising:
acquiring frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
determining the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point according to the frequency domain noisy signals;
and determining the signal energy according to a sum of the power spectral densities of the audio signal at the frequency points, weighted by preset weights.
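The three steps of claim 7 can be sketched as follows. This is a simplified illustration, not the patented implementation: it uses a naive DFT of a single frame, a periodogram (squared magnitude divided by frame length) as the power spectral density estimate, and hypothetical function names. Any windowing or smoothing across frames is omitted.

```python
import cmath


def power_spectral_density(frame):
    """Periodogram estimate for bins 0..N/2 of a real-valued frame."""
    n = len(frame)
    if n == 0:
        raise ValueError("frame must not be empty")
    psd = []
    for k in range(n // 2 + 1):
        # Naive DFT of bin k (the frequency-domain signal at this point).
        spectrum_k = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame)
        )
        psd.append(abs(spectrum_k) ** 2 / n)
    return psd


def weighted_band_energy(frame, weights):
    """Signal energy as the weighted sum of the per-bin PSD values."""
    psd = power_spectral_density(frame)
    if len(weights) != len(psd):
        raise ValueError("one weight per frequency bin is required")
    return sum(w * p for w, p in zip(weights, psd))
```

Setting the weights to zero outside a band of interest restricts the energy to a preset frequency band, which matches the way the signal energy is described in claim 3.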
8. An apparatus for processing an audio signal, comprising:
an acquisition module, configured to acquire audio signals by using at least two audio acquisition channels, wherein the audio acquisition channels comprise a target channel and an alternative channel, and the audio signal acquired by the alternative channel is used for noise filtering of the audio signal acquired by the target channel;
a first determining module, configured to determine, according to the signal energy of the acquired audio signals, that the signal energy of the target channel satisfies a preset energy condition;
and a switching module, configured to switch the audio acquisition channel corresponding to the target channel when the signal energy of the target channel satisfies the preset energy condition.
9. The apparatus of claim 8, wherein the switching module comprises:
a first switching submodule, configured to switch the audio acquisition channel corresponding to the target channel if the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and a preset reference threshold.
10. The apparatus of claim 9, wherein the first switching submodule is specifically configured to:
if, in at least N frames of the time domain, the signal energy of the target channel exceeds the product of the signal energy of the alternative channel and the preset reference threshold, switch the audio acquisition channel corresponding to the target channel, wherein N is an integer greater than or equal to 1, and the signal energy is the energy of the audio signal in a preset frequency band.
11. The apparatus according to any one of claims 8 to 10, wherein the switching module comprises:
a determining submodule, configured to determine cross-correlation coefficients between the audio signals acquired by the at least two audio acquisition channels and a preset reference signal if the signal energy of the target channel satisfies the preset energy condition;
and a second switching submodule, configured to switch the audio acquisition channel corresponding to the target channel if the cross-correlation coefficients between the audio signals acquired by the at least two audio acquisition channels and the preset reference signal satisfy a preset correlation condition.
12. The apparatus according to claim 11, wherein the second switching submodule is specifically configured to:
if, in at least M frames of the time domain, the cross-correlation coefficient between the audio signal acquired by the target channel and the preset reference signal is smaller than a first correlation threshold and the cross-correlation coefficient between the audio signal acquired by the alternative channel and the preset reference signal is larger than a second correlation threshold, switch the audio acquisition channel corresponding to the target channel, wherein M is an integer greater than or equal to 1.
13. The apparatus of claim 12, wherein the determining sub-module comprises:
a first determining unit, configured to determine first autocorrelation coefficients of the audio signals acquired by the at least two audio acquisition channels;
a second determining unit, configured to determine a second autocorrelation coefficient of the reference signal, wherein the reference signal is an audio signal sent by the electronic device in which the at least two audio acquisition channels are located;
and a third determining unit, configured to determine the cross-correlation coefficient between the audio signal acquired by each audio acquisition channel and the reference signal according to the first autocorrelation coefficient of that audio acquisition channel and the second autocorrelation coefficient.
14. The apparatus of claim 8, further comprising:
the acquisition module is used for acquiring frequency domain noisy signals corresponding to the audio signals acquired by the at least two audio acquisition channels;
a second determining module, configured to determine, according to the frequency domain noisy signals, the power spectral densities of the audio signals acquired by the at least two audio acquisition channels at each frequency point;
and a third determining module, configured to determine the signal energy according to a sum of the power spectral densities of the audio signals at the frequency points, weighted by preset weights.
15. A communication device of a terminal, comprising at least: a processor and a memory for storing executable instructions operable on the processor, wherein:
the processor is configured to execute the executable instructions, and the executable instructions, when executed, perform the steps of the method provided in any one of claims 1 to 7.
16. A non-transitory computer-readable storage medium, wherein computer-executable instructions are stored in the computer-readable storage medium, and the computer-executable instructions, when executed by a processor, implement the steps of the method provided in any one of claims 1 to 7.
CN202010797931.4A 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium Active CN112037825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010797931.4A CN112037825B (en) 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium

Publications (2)

Publication Number Publication Date
CN112037825A (en) 2020-12-04
CN112037825B (en) 2022-09-27

Family

ID=73577256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010797931.4A Active CN112037825B (en) 2020-08-10 2020-08-10 Audio signal processing method and device and storage medium

Country Status (1)

Country Link
CN (1) CN112037825B (en)

Citations (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102969000A (en) * 2012-12-04 2013-03-13 中国科学院自动化研究所 Multi-channel speech enhancement method
KR20130079895A (en) * 2012-01-03 2013-07-11 삼성전자주식회사 Decoding method of audio signal and decoding apparatus thereof
CA2955928A1 (en) * 2014-07-16 2016-01-21 United States Bankruptcy Court For The District Of Utah Apparatus and methods for recording audio and video
CN205123978U (en) * 2015-04-29 2016-03-30 青岛歌尔声学科技有限公司 Audio signal treatment circuit and have electronic equipment of this circuit
US20160142454A1 (en) * 2014-11-14 2016-05-19 Qualcomm Incorporated Multi-channel audio alignment schemes
US20160255453A1 (en) * 2013-07-22 2016-09-01 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Method for processing an audio signal; signal processing unit, binaural renderer, audio encoder and audio decoder
CN106255000A (en) * 2016-07-29 2016-12-21 维沃移动通信有限公司 A kind of audio signal sample method and mobile terminal
CN106303804A (en) * 2016-07-28 2017-01-04 维沃移动通信有限公司 The control method of a kind of mike and mobile terminal
WO2017024778A1 (en) * 2015-08-10 2017-02-16 中兴通讯股份有限公司 Audio frequency adjustment method, terminal device and computer readable storage medium
CN106782614A (en) * 2016-12-26 2017-05-31 广州酷狗计算机科技有限公司 Sound quality detection method and device
CN107170465A (en) * 2017-06-29 2017-09-15 数据堂(北京)科技股份有限公司 A kind of audio quality detection method and audio quality detecting system
US20180301154A1 (en) * 2011-02-03 2018-10-18 Telefonaktiebolaget Lm Ericsson (Publ) Determining the Inter-Channel Time Difference of a Multi-Channel Audio Signal
CN108766457A (en) * 2018-05-30 2018-11-06 北京小米移动软件有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
CN109151699A (en) * 2018-07-26 2019-01-04 Oppo广东移动通信有限公司 Microphone plug-hole detection method and Related product
CN109410975A (en) * 2018-10-31 2019-03-01 歌尔科技有限公司 A kind of voice de-noising method, equipment and storage medium
CN109451415A (en) * 2018-12-17 2019-03-08 深圳Tcl新技术有限公司 Microphone array auto-collation, device, equipment and storage medium
CN109727604A (en) * 2018-12-14 2019-05-07 上海蔚来汽车有限公司 Frequency domain echo cancel method and computer storage media for speech recognition front-ends
US10304468B2 (en) * 2017-03-20 2019-05-28 Qualcomm Incorporated Target sample generation
KR20190085399A (en) * 2018-01-10 2019-07-18 세종대학교산학협력단 Method and apparatus for analyzing characters for determining the authenticity of call reporting
US20190258452A1 (en) * 2018-02-19 2019-08-22 Kabushiki Kaisha Toshiba Audio output system, audio output method, and computer program product
CN110175013A (en) * 2019-05-20 2019-08-27 北京声智科技有限公司 Voice input method, apparatus, electronic equipment and storage medium
US10504541B1 (en) * 2018-06-28 2019-12-10 Invoca, Inc. Desired signal spotting in noisy, flawed environments
CN110648678A (en) * 2019-09-20 2020-01-03 厦门亿联网络技术股份有限公司 Scene identification method and system for conference with multiple microphones
CN110677780A (en) * 2019-09-26 2020-01-10 北京小米移动软件有限公司 Detection method and device of audio input module and storage medium
CN110875054A (en) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Far-field noise suppression method, device and system
CN111010488A (en) * 2019-12-18 2020-04-14 维沃移动通信有限公司 Audio signal processing method and device and electronic equipment
CN111045633A (en) * 2018-10-12 2020-04-21 北京微播视界科技有限公司 Method and apparatus for detecting loudness of audio signal
US10650840B1 (en) * 2018-07-11 2020-05-12 Amazon Technologies, Inc. Echo latency estimation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曾辉等: "多通道同步语音数据采集系统的实现", 《微计算机应用》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112925502A (en) * 2021-02-10 2021-06-08 歌尔科技有限公司 Audio channel switching equipment, method and device and electronic equipment
CN112925502B (en) * 2021-02-10 2022-07-08 歌尔科技有限公司 Audio channel switching equipment, method and device and electronic equipment
CN113571038A (en) * 2021-07-14 2021-10-29 北京小米移动软件有限公司 Voice conversation method, device, electronic equipment and storage medium
WO2023103824A1 (en) * 2021-12-06 2023-06-15 华为技术有限公司 Audio channel selection method and apparatus, storage medium and vehicle

Also Published As

Publication number Publication date
CN112037825B (en) 2022-09-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant