CN117156345A - Audio processing method, device and storage medium


Info

Publication number
CN117156345A
CN117156345A
Authority
CN
China
Prior art keywords
signal
audio
noise
audio signal
frequency
Prior art date
Legal status
Pending
Application number
CN202310402427.3A
Other languages
Chinese (zh)
Inventor
魏彤
曾青林
张海宏
Current Assignee
Honor Device Co Ltd
Original Assignee
Honor Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Honor Device Co Ltd
Priority to CN202310402427.3A
Publication of CN117156345A
Legal status: Pending


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R 3/00: Circuits for transducers, loudspeakers or microphones
    • H04R 2410/00: Microphones
    • H04R 2410/01: Noise reduction using microphones having different directional characteristics
    • H04R 2430/00: Signal processing covered by H04R, not provided for in its groups
    • H04R 2430/03: Synergistic effects of band splitting and sub-band processing
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Embodiments of the present application provide an audio processing method, a device and a storage medium, applied to the technical field of terminals. The method is applied to an electronic device comprising a first audio acquisition component and a second audio acquisition component, and comprises: acquiring an n+1th first audio signal collected by the first audio acquisition component during an n+1th acquisition time, and an n+1th second audio signal collected by the second audio acquisition component during the n+1th acquisition time; when a target type signal exists in the sound pickup environment where the electronic device is located during the n+1th acquisition time, determining an n+1th noise spectrum in such a manner that the greater the reverberation degree of the sound pickup environment during the n+1th acquisition time, the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum and the greater the weight of the nth noise spectrum in the n+1th noise spectrum; and performing noise reduction processing on the n+1th first audio signal using the n+1th noise spectrum. In this way, the noise reduction effect can be improved.

Description

Audio processing method, device and storage medium
Technical Field
The present application relates to the field of terminal technologies, and in particular, to an audio processing method, an audio processing device, and a storage medium.
Background
When a user uses the electronic device to conduct a voice call or a video conference, in a possible implementation manner, a microphone array on the electronic device performs noise reduction processing on the collected audio signals to obtain noise-reduced audio signals.
However, noise may still exist in the noise-reduced audio signal, which may cause interference to the voice call or video conference of the user.
Disclosure of Invention
Embodiments of the present application provide an audio processing method, a device and a storage medium, applied to the technical field of terminals. Target type signals and noise signals are collected through two audio collection components of an electronic device; factors such as the continuity of the audio signal, the collected noise signal, and the reverberation degree of the sound pickup environment where the electronic device is located are considered together to determine an accurate noise spectrum; and the determined noise spectrum is used to suppress the noise signal mixed in, due to reverberation, while the target type signal is collected, thereby reducing the interference of the noise signal with the target type signal.
In a first aspect, an embodiment of the present application provides an audio processing method, which is applied to an electronic device, where the electronic device includes a first audio acquisition component and a second audio acquisition component; the method comprises the following steps:
acquiring an n+1th first audio signal acquired by a first audio acquisition component in an n+1th acquisition time, and an n+1th second audio signal acquired by a second audio acquisition component in the n+1th acquisition time;
when the n+1th first audio signal contains a target type signal and/or the n+1th second audio signal contains a target type signal, obtaining an n+1th noise spectrum according to an n+1th leakage coefficient, the n+1th second audio signal and an nth noise spectrum; the nth noise spectrum is the noise spectrum corresponding to the acquisition time immediately preceding the n+1th acquisition time; the greater the n+1th leakage coefficient, the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum and the greater the weight of the nth noise spectrum in the n+1th noise spectrum; the n+1th leakage coefficient characterizes the reverberation degree of the sound pickup environment where the electronic device is located during the n+1th acquisition time;
performing noise reduction processing on the n+1th first audio signal using the n+1th noise spectrum; wherein n is an integer.
Thus, during the n+1th acquisition time, if the n+1th first audio signal contains the target type signal and/or the n+1th second audio signal contains the target type signal, then both the n+1th first audio signal and the n+1th second audio signal contain the target type signal together with a noise signal. The reverberation degree of the sound pickup environment where the electronic device is located affects the respective proportions of the target type signal and the noise signal in the n+1th first audio signal, and likewise their respective proportions in the n+1th second audio signal. Determining the n+1th noise spectrum from the n+1th leakage coefficient, the n+1th second audio signal and the nth noise spectrum therefore improves the accuracy of the n+1th noise spectrum. In addition, the larger the n+1th leakage coefficient, the larger the proportion of the target type signal in the n+1th second audio signal; reducing the weight of the n+1th second audio signal in the n+1th noise spectrum and increasing the weight of the nth noise spectrum in the n+1th noise spectrum therefore makes the n+1th noise spectrum more accurate. Performing noise reduction processing on the n+1th first audio signal with the n+1th noise spectrum can then improve the noise reduction effect on the n+1th first audio signal.
In one possible implementation, the n+1th second audio signal includes n+1th second frequency domain signals corresponding to the plurality of frequency points; the nth noise spectrum comprises an nth noise sub-spectrum corresponding to each of the plurality of frequency points; the n+1th noise spectrum comprises an n+1th noise sub-spectrum corresponding to each of the plurality of frequency points;
For a frequency point k, the n+1th noise sub-spectrum in the n+1th noise spectrum, denoted λ(k,n+1), satisfies the formula:
λ(k,n+1) = β_{n+1}λ(k,n) + (1-β_{n+1})|Y(k,n+1)|^2
wherein β_{n+1} is the leakage coefficient corresponding to the n+1th acquisition time, Y(k,n+1) is the n+1th second frequency domain signal corresponding to frequency point k, λ(k,n) is the nth noise sub-spectrum corresponding to frequency point k, and 0 ≤ β_{n+1} ≤ 1; the n+1th noise sub-spectrum corresponding to frequency point k represents the intensity or energy of the noise corresponding to frequency point k during the n+1th acquisition time.
Therefore, the n+1th noise sub-spectrum corresponding to each frequency point corresponding to the n+1th noise spectrum can be accurately determined, and the accuracy of the n+1th noise spectrum can be improved.
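Illustratively, a minimal sketch of this per-frequency-point update in Python with NumPy (the function name and the use of the signal power |Y|^2 as the sub-spectrum measure are assumptions of this sketch, not the patent's reference implementation):

```python
import numpy as np

def update_noise_spectrum(noise_prev, Y, beta):
    """Leaky per-frequency-point noise spectrum update:
    lambda(k, n+1) = beta * lambda(k, n) + (1 - beta) * |Y(k, n+1)|^2.

    noise_prev : ndarray, nth noise sub-spectrum per frequency point (power).
    Y          : ndarray, n+1th second frequency domain signal per frequency point.
    beta       : n+1th leakage coefficient in [0, 1]; a larger beta (stronger
                 reverberation) shifts weight from the new measurement to the
                 previous noise spectrum.
    """
    assert 0.0 <= beta <= 1.0
    return beta * noise_prev + (1.0 - beta) * np.abs(Y) ** 2
```

With beta = 0.5 the same function also covers the case, described further below, in which neither audio signal contains the target type signal and the two terms are weighted equally.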
In one possible implementation, the n+1th leakage coefficient relates to the energy ratio of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal; the larger the energy ratio, the smaller the n+1th leakage coefficient, and the smaller the energy ratio, the larger the n+1th leakage coefficient.
For example, the correspondence between the energy of the audio signal collected by the first audio collection component and the energy of the audio signal collected by the second audio collection component in the same time period (i.e., the energy ratio δ) is determined in advance from experimental statistics. Based on the correspondence between δ and β, the leakage coefficient β corresponding to that time period can be determined quickly and accurately, so that the reverberation degree of the sound pickup environment where the electronic device is located can be quickly quantized, improving the rate at which the n+1th noise spectrum is determined.
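Illustratively, a minimal sketch of such a correspondence-based lookup; the grid values below are invented for illustration and would, as described above, come from experimental statistics in practice:

```python
import numpy as np

# Assumed calibration of the monotone correspondence between the energy
# ratio delta (first component energy / second component energy) and the
# leakage coefficient beta: a larger delta maps to a smaller beta.
DELTA_GRID = np.array([0.5, 1.0, 2.0, 4.0, 8.0])    # hypothetical delta values
BETA_GRID = np.array([0.95, 0.8, 0.5, 0.2, 0.05])   # hypothetical beta values

def leakage_coefficient(x_frame, y_frame):
    """Estimate beta for one acquisition time from the two components.
    The ratio of the two energies is the same whether the frames are
    time domain samples or frequency domain signals of the same transform."""
    e_first = np.sum(np.abs(x_frame) ** 2)
    e_second = np.sum(np.abs(y_frame) ** 2) + 1e-12  # guard against division by zero
    delta = e_first / e_second
    return float(np.interp(delta, DELTA_GRID, BETA_GRID))  # clamps outside the grid
```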
In one possible implementation manner, the n+1th first audio signal includes an n+1th first frequency domain signal corresponding to each of the plurality of frequency points, and the n+1th second audio signal includes an n+1th second frequency domain signal corresponding to each of the plurality of frequency points;
for a frequency point k, the log likelihood ratio ln(δ(k,n+1)) that the n+1th first frequency domain signal X(k,n+1) corresponding to frequency point k contains the target type signal satisfies the formula:
ln(δ(k,n+1)) = ln(1+ρ(k,n+1)) - σ(k,n+1)ρ(k,n+1)/(1+ρ(k,n+1))
wherein σ (k, n+1) is the posterior signal-to-noise ratio of the n+1th first frequency domain signal X (k, n+1) corresponding to the frequency point k; ρ (k, n+1) is the a priori signal to noise ratio of X (k, n+1);
the method further comprises the steps of:
if the sum, over the plurality of frequency points, of the log likelihood ratios that the corresponding n+1th first frequency domain signals contain the target type signal is smaller than a first preset threshold value, determining that the n+1th first audio signal contains the target type signal and/or that the n+1th second audio signal contains the target type signal;
if that sum is greater than or equal to the first preset threshold value, determining that neither the n+1th first audio signal nor the n+1th second audio signal contains the target type signal.
In this way, the log likelihood ratio that the n+1th first audio signal contains the target type signal can be accurately determined from the sum of the log likelihood ratios of the n+1th first frequency domain signals corresponding to the plurality of frequency points, so that whether the n+1th first audio signal and/or the n+1th second audio signal contains the target type signal can be accurately determined, and the n+1th noise spectrum can then be determined in the corresponding manner.
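Illustratively, a minimal sketch of this frame-level decision, using the per-frequency-point statistic reconstructed above (the exact sign convention of the statistic is inferred from the threshold rule and is an assumption of this sketch):

```python
import numpy as np

def contains_target(sigma, rho, thr):
    """Decide whether the frame contains a target type signal.

    sigma : ndarray, posterior SNR sigma(k, n+1) per frequency point.
    rho   : ndarray, a priori SNR rho(k, n+1) per frequency point.
    thr   : first preset threshold.

    Per the text, a sum SMALLER than thr means the target type signal
    is present.
    """
    llr = np.log1p(rho) - sigma * rho / (1.0 + rho)  # ln(delta(k, n+1)) per point
    return np.sum(llr) < thr
```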
In one possible implementation, σ(k,n+1) satisfies the formula:
σ(k,n+1) = |X(k,n+1)|^2/|Y(k,n+1)|^2
and ρ(k,n+1) satisfies the formula:
ρ(k,n+1)=αρ(k,n)+(1-α)(σ(k,n+1)-1);
wherein Y(k,n+1) is the n+1th second frequency domain signal corresponding to frequency point k, α is a weight factor, and ρ(k,n) is the a priori signal-to-noise ratio of the first frequency domain signal X(k,n) of the acquisition time immediately preceding the n+1th acquisition time; ρ(k,n) satisfies the formula:
ρ(k,n) = |Z(k,n)|^2/|Y(k,n)|^2
wherein Z(k,n) is the frequency domain signal obtained after noise reduction of X(k,n), and Y(k,n) is the second frequency domain signal, corresponding to frequency point k, of the acquisition time immediately preceding the n+1th acquisition time.
In this way, the posterior signal-to-noise ratio of the n+1th first frequency domain signal X (k, n+1) corresponding to the frequency point k and the prior signal-to-noise ratio of the n+1th first frequency domain signal X (k, n+1) corresponding to the frequency point k can be rapidly and accurately determined.
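Illustratively, a minimal sketch of these two computations; the forms σ = |X|^2/|Y|^2 and ρ(k,n) = |Z|^2/|Y|^2 follow the variable definitions above, while the small constant guarding against division by zero is an addition of this sketch:

```python
import numpy as np

def posterior_snr(X, Y):
    """sigma(k, n+1) = |X(k, n+1)|^2 / |Y(k, n+1)|^2 per frequency point."""
    return np.abs(X) ** 2 / (np.abs(Y) ** 2 + 1e-12)

def prior_snr(Z_prev, Y_prev, sigma_cur, alpha=0.98):
    """Decision-directed a priori SNR:
    rho(k, n)   = |Z(k, n)|^2 / |Y(k, n)|^2   (previous noise-reduced frame)
    rho(k, n+1) = alpha * rho(k, n) + (1 - alpha) * (sigma(k, n+1) - 1)."""
    rho_prev = np.abs(Z_prev) ** 2 / (np.abs(Y_prev) ** 2 + 1e-12)
    return alpha * rho_prev + (1.0 - alpha) * (sigma_cur - 1.0)
```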
In one possible implementation, the method further includes:
when the n+1th first audio signal does not contain the target type signal and the n+1th second audio signal does not contain the target type signal, obtaining an n+1th noise spectrum according to the n+1th second audio signal and the nth noise spectrum; wherein the weight of the n+1th second audio signal in the n+1th noise spectrum is the same as the weight of the nth noise spectrum in the n+1th noise spectrum.
Thus, when the n+1th first audio signal does not contain the target type signal and the n+1th second audio signal does not contain the target type signal, the pickup environment contains no target type signal during the n+1th acquisition time, and the audio signals collected by the first audio collection component and the second audio collection component during that time are noise signals. Considering the continuity of the audio signal, the noise spectrum corresponding to the n+1th acquisition time can then be determined quickly and accurately from the n+1th second audio signal and the noise spectrum corresponding to the preceding acquisition time (i.e., the nth noise spectrum).
In one possible implementation manner, the n+1th first audio signal includes n+1th first frequency domain signals corresponding to the plurality of frequency points respectively; the n+1th noise spectrum comprises an n+1th noise sub-spectrum corresponding to each of the plurality of frequency points;
adopting the n+1th noise spectrum to perform noise reduction processing on the n+1th first audio signal, wherein the noise reduction processing comprises the following steps:
for a frequency point k,
if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to frequency point k is smaller than the n+1th noise sub-spectrum λ(k,n+1) corresponding to frequency point k, the enhancement coefficient corresponding to X(k,n+1) is determined as a preset attenuation value; the preset attenuation value is smaller than 1;
if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to frequency point k is greater than or equal to the n+1th noise sub-spectrum λ(k,n+1) corresponding to frequency point k, the enhancement coefficient corresponding to X(k,n+1) is determined to be 1;
multiplying the enhancement coefficient corresponding to X (k, n+1) by X (k, n+1) to obtain an n+1th first noise reduction frequency domain signal Z (k, n+1) corresponding to the frequency point k;
performing time domain transformation processing on the n+1th first noise reduction frequency domain signal corresponding to each of the plurality of frequency points to obtain an n+1th first noise reduction time domain signal corresponding to the n+1th acquisition time;
the n+1th noise sub-spectrum corresponding to the frequency point k represents the intensity or energy of noise corresponding to the frequency point k in the n+1th acquisition time.
Thus, if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to frequency point k is smaller than the n+1th noise sub-spectrum λ(k,n+1) corresponding to frequency point k, X(k,n+1) is regarded as a noise signal, and it is noise-reduced with a preset attenuation value smaller than 1, so that the noise signal is suppressed with the frequency point as the minimum unit, improving the noise reduction effect. If the energy of X(k,n+1) is greater than or equal to λ(k,n+1), X(k,n+1) is regarded as a target type signal, and 1 is used as the enhancement coefficient, so that the target type signal is preserved without distortion, again with the frequency point as the minimum unit. Time domain transformation is then performed on the n+1th first noise reduction frequency domain signals corresponding to the plurality of frequency points to obtain the n+1th first noise reduction time domain signal corresponding to the n+1th acquisition time, which facilitates its subsequent output.
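Illustratively, a minimal sketch of this per-frequency-point gating; the attenuation value 0.1 and the use of irfft for the time domain transform are assumptions of this sketch:

```python
import numpy as np

def noise_reduce_frame(X, noise_spec, attenuation=0.1):
    """Binary per-frequency-point gain: attenuate points whose energy falls
    below the estimated noise sub-spectrum and pass the rest unchanged.

    X          : ndarray, n+1th first frequency domain signal per point.
    noise_spec : ndarray, n+1th noise sub-spectrum per point (power).
    attenuation: preset attenuation value (< 1); 0.1 is an assumed choice.
    """
    gain = np.where(np.abs(X) ** 2 < noise_spec, attenuation, 1.0)
    Z = gain * X                 # n+1th first noise reduction frequency domain signal
    z = np.fft.irfft(Z)          # time domain transform (assumes rfft analysis
                                 # of an even-length frame)
    return Z, z
```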
In one possible implementation, acquiring an n+1th first audio signal acquired by the first audio acquisition component during an n+1th acquisition time, and an n+1th second audio signal acquired by the second audio acquisition component during the n+1th acquisition time, includes:
Acquiring a first time domain signal acquired by a first audio acquisition component in a time period t and a second time domain signal acquired by a second audio acquisition component in the time period t;
framing, windowing and frequency domain transformation are carried out on the first time domain signal to obtain a plurality of frames of first audio signals; framing, windowing and frequency domain transformation are carried out on the second time domain signal to obtain a plurality of frames of second audio signals; the frame length of each frame of the first audio signal is the same as the n+1th acquisition time; the frame length of each frame of the second audio signal is the same as the n+1th acquisition time.
In this way, the audio signal corresponding to the time period t is divided into a plurality of audio signal segments of smaller time periods, and the noise reduction effect on the first time domain signal corresponding to the time period t can be improved by performing the noise reduction processing on the audio signal segments of each of the smaller time periods.
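Illustratively, a minimal sketch of the framing, windowing and frequency domain transformation; the Hamming window and rfft are one possible choice among those named later in the description:

```python
import numpy as np

def frame_and_transform(x, frame_len, frame_shift, window=None):
    """Split a time domain signal x into overlapping windowed frames (one per
    acquisition time) and transform each frame to the frequency domain.

    frame_len   : samples per frame (e.g. 10 ms of audio).
    frame_shift : hop between frame starts (e.g. frame_len // 2 for 50% overlap).
    """
    if window is None:
        window = np.hamming(frame_len)
    starts = range(0, len(x) - frame_len + 1, frame_shift)
    return np.stack([np.fft.rfft(x[s:s + frame_len] * window) for s in starts])
```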
In a second aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory; the memory stores computer-executable instructions; the processor executes computer-executable instructions stored in the memory to cause the electronic device to perform the method as in the first aspect.
In a third aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements a method as in the first aspect.
In a fourth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run, causes a computer to perform the method as in the first aspect.
In a fifth aspect, an embodiment of the application provides a chip comprising a processor for invoking a computer program in memory to perform a method as in the first aspect.
It should be understood that the second to fifth aspects of the present application correspond to the technical solutions of the first aspect of the present application, and the advantages obtained by each aspect and the corresponding possible embodiments are similar, and are not repeated.
Drawings
FIG. 1 is a scene diagram of a microphone array capturing audio signals in a possible implementation;
fig. 2 is a scene diagram of an audio signal collected by an audio collection component according to an embodiment of the present application;
FIG. 3 is a flowchart of an audio processing method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an electronic device 100 according to an embodiment of the present application;
fig. 5 is a flowchart of noise reduction of the n+1th first audio signal according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a target signal segment and a non-target signal segment according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating comparison between a first time domain signal and a first noise reduction audio signal according to an embodiment of the present application;
fig. 8 is a schematic diagram of a second structure of an electronic device 100 according to an embodiment of the present application;
fig. 9 is a schematic software structure of an electronic device 100 according to an embodiment of the present application;
fig. 10 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application.
Detailed Description
In order to facilitate the clear description of the technical solutions of the embodiments of the present application, the following simply describes some terms and techniques involved in the embodiments of the present application:
1. partial terminology
In embodiments of the present application, the words "first", "second" and the like are used to distinguish between identical or similar items that have substantially the same function and effect. For example, the first chip and the second chip are merely distinguished as different chips, without limiting their order. Those skilled in the art will appreciate that the words "first", "second" and the like do not limit quantity or order of execution, and that objects described as "first" and "second" are not necessarily different.
It should be noted that, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
2. Electronic equipment
The electronic device of the embodiments of the application may include a handheld device, a vehicle-mounted device, and the like with an image processing function. For example, some electronic devices are: a mobile phone, a tablet computer, a palmtop computer, a notebook computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, a cellular phone, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, or a terminal device in a future communication network such as a public land mobile network (PLMN), etc.; the application is not limited thereto.
By way of example and not limitation, in embodiments of the application, the electronic device may also be a wearable device. A wearable device, also called a wearable smart device, is a general term for devices developed by applying wearable technology to the intelligent design of everyday wear, such as glasses, gloves, watches, clothing and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. A wearable device is not merely a hardware device; it realizes powerful functions through software support, data interaction and cloud interaction. In a broad sense, wearable smart devices include devices that are full-featured and large-sized and can realize complete or partial functions without relying on a smart phone, such as smart watches or smart glasses, as well as devices that focus on only a certain type of application function and need to be used in combination with other devices such as smart phones, for example various smart bracelets and smart jewelry for physical sign monitoring.
In addition, in the embodiments of the application, the electronic device may also be a terminal device in an internet of things (IoT) system. IoT is an important component of future information technology; its main technical characteristic is connecting things to a network through communication technology, thereby realizing an intelligent network of human-machine interconnection and interconnection of things.
The electronic device in the embodiment of the application may also be referred to as: a User Equipment (UE), a Mobile Station (MS), a Mobile Terminal (MT), an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote terminal, a mobile device, a user terminal, a wireless communication device, a user agent, or a user equipment, etc.
In an embodiment of the present application, the electronic device or each network device includes a hardware layer, an operating system layer running on top of the hardware layer, and an application layer running on top of the operating system layer. The hardware layer includes hardware such as a central processing unit (central processing unit, CPU), a memory management unit (memory management unit, MMU), and a memory (also referred to as a main memory). The operating system may be any one or more computer operating systems that implement business processes through processes (processes), such as a Linux operating system, a Unix operating system, an Android operating system, an iOS operating system, or a windows operating system. The application layer comprises applications such as a browser, an address book, word processing software, instant messaging software and the like.
With the rapid development of communication technology and network technology, voice call and video conference become common communication contact ways. In a possible implementation manner, when a user uses an electronic device to perform a voice call and a video conference, a microphone array on the electronic device performs noise reduction processing on the collected audio signal, so as to obtain a noise-reduced audio signal.
In a possible implementation, the directivity of the microphone array on the electronic device is the 8-shaped directivity shown in fig. 1. The directivity determines how strongly the microphone array receives audio signals from different directions. As shown in fig. 1, the microphone array 11 has a strong ability to receive audio signals from the target source direction and a weak ability to receive audio signals from non-target-source directions, which include the noise source direction shown in fig. 1. Thus, the microphone array 11 can better collect the audio signal from the target source direction while suppressing noise from non-target-source directions, thereby realizing noise reduction processing on the audio signal collected by the microphone array 11.
However, in a sound pickup environment in which an electronic apparatus is located, reverberation is inevitably present. Noise in the pick-up environment can enter the microphone array from the target source direction due to reverberation and is not suppressed by the microphone array. Noise signals entering the microphone array will interfere with the audio signal emitted by the target source. In addition, the greater the reverberation level of the pick-up environment in which the electronic device is located, the more noise enters the microphone array, the greater the noise intensity or the higher the noise energy, and the stronger the interference on the audio signal emitted by the target source.
In view of this, the present application provides an audio processing method applied to an electronic device. During the n+1th acquisition time, when an audio signal emitted by a target source exists in the sound pickup environment where the electronic device is located, the noise spectrum corresponding to the n+1th acquisition time is determined in such a manner that the greater the reverberation degree of the sound pickup environment during the n+1th acquisition time, the smaller the weight of the noise signal collected during the n+1th acquisition time in that noise spectrum and the greater the weight of the noise spectrum corresponding to the preceding (nth) acquisition time. In this way, the noise reduction effect on the target source's audio signal collected during the n+1th acquisition time can be improved, the noise signal mixed into it can be reduced, and the interference with the audio signal emitted by the target source can thus be reduced.
The audio processing method proposed by the present application is described below with reference to some embodiments.
Fig. 2 is a scene diagram of an audio signal collected by an audio collection component according to an embodiment of the present application. As shown in fig. 2, the electronic device 100 contains two audio acquisition components. Each audio acquisition component may be a biomimetic microphone having an 8-shaped directivity that is consistent across the full frequency band, or any other microphone component with a full-band-consistent 8-shaped directivity. The two audio acquisition components may be referred to as the first audio acquisition component and the second audio acquisition component, respectively. The reference point of the first audio acquisition component and the reference point of the second audio acquisition component are located in the same preset area; for example, they are located at the same point. The reference point of the first audio acquisition component is, for example, the midpoint of its directivity pattern, and likewise for the second audio acquisition component. The included angle φ between the directional central axis of the first audio acquisition component and that of the second audio acquisition component is larger than 45 degrees and smaller than 135 degrees. One of the two audio acquisition components is used to acquire the audio signal emitted by a target source, and the other is used to acquire a noise signal, such as the audio signal emitted by the noise source shown in fig. 2. The audio processing method provided by the embodiment of the present application is described below taking the included angle φ=90° shown in fig. 2 as an example.
As shown in fig. 2, the first audio acquisition component has strong reception of audio signals in the target source direction and weak reception of audio signals in non-target source directions. The audio signal in the direction of the target source comprises an audio signal emitted by the target source. The first audio acquisition component can better receive the audio signal sent by the target source and inhibit the audio signal in the direction of the noise source. The audio signal in the direction of the noise source includes an audio signal emitted by the noise source.
Similarly, the second audio acquisition component receives audio signals from the noise source direction strongly and audio signals from non-noise-source directions weakly. The second audio acquisition component can better receive the audio signal emitted by the noise source and suppress the audio signal emitted by the target source.
In the embodiment of the application, the audio signal sent by the target source belongs to the target type signal. The audio signal emitted by the noise source belongs to the noise signal.
Because of the reverberation in the pickup environment in which the electronic device 100 is located, noise signals emitted by the noise source may enter the first audio collection component from the target source direction after being reflected by the environment. Similarly, a target type signal from the target source may enter the second audio acquisition component from the noise source direction after being reflected by the pickup environment. When the target source emits a target type signal, both audio acquisition components acquire it, but the first audio acquisition component acquires more signal and higher signal energy than the second. Similarly, when the noise source emits a noise signal, both components acquire it, but the second audio acquisition component acquires more signal and higher signal energy than the first. In addition, the channel through which the first audio acquisition component acquires audio signals and the channel through which the second audio acquisition component acquires audio signals are mutually independent, and the audio signals they acquire do not interfere with each other.
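Illustratively, a minimal sketch of why this geometry separates the two sources in the ideal, reverberation-free case: an 8-shaped element has a cos(θ) amplitude response about its central axis, so with φ=90° each component nulls the other component's axis direction; reverberation is precisely what breaks these nulls:

```python
import numpy as np

def figure8_gain(theta):
    """Amplitude response of an ideal 8-shaped (bidirectional) element:
    maximal along its central axis (theta = 0) and zero broadside."""
    return np.cos(theta)

# With the two components' axes 90 degrees apart (phi = 90 in fig. 2), a
# source on the first component's axis is captured at full gain by the
# first component and (ideally) nulled by the second, and vice versa.
print(figure8_gain(0.0))          # first component toward the target: 1.0
print(figure8_gain(np.pi / 2))    # second component toward the target: ~0.0
```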
Thus, the noise spectrum estimation can be performed based on the reverberation level of the sound pickup environment where the electronic device 100 is located and the audio signal collected by the second audio collection component, so as to obtain an accurate noise spectrum. The estimated accurate noise spectrum is adopted to carry out noise reduction processing on the audio signals acquired by the first audio acquisition component, so that the interference of the noise signals on the target type signals in the audio signals acquired by the first audio acquisition component can be reduced. Fig. 3 is a flowchart of an audio processing method according to an embodiment of the present application. The electronic device 100 may perform the noise reduction process on the audio signal acquired by the first audio acquisition component in the manner shown in S101-S103 in fig. 3.
S101, the electronic device 100 acquires an n+1th first audio signal acquired by the first audio acquisition component in an n+1th acquisition time, and an n+1th second audio signal acquired by the second audio acquisition component in the n+1th acquisition time.
S102, when the n+1th first audio signal contains the target type signal and/or the n+1th second audio signal contains the target type signal, the electronic device 100 obtains the n+1th noise spectrum according to the n+1th leakage coefficient, the n+1th second audio signal and the nth noise spectrum. The nth noise spectrum is the noise spectrum corresponding to the acquisition time immediately preceding the n+1th acquisition time. The greater the n+1th leakage coefficient, the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum, and the greater the weight of the nth noise spectrum in the n+1th noise spectrum. The n+1th leakage coefficient characterizes the reverberation degree of the sound pickup environment where the electronic device 100 is located during the n+1th acquisition time.
S103, the electronic device 100 performs noise reduction processing on the n+1th first audio signal using the n+1th noise spectrum; wherein n is an integer.
The audio processing method shown in fig. 3 will be described with reference to fig. 2 to 7. Fig. 4 is a schematic diagram of an electronic device 100 according to an embodiment of the application. As shown in fig. 2 and 4, the electronic device 100 includes a first audio acquisition component and a second audio acquisition component. As shown in fig. 4, the electronic device 100 may further include a frequency domain transform module 21, a signal analysis module 22, an enhancement coefficient calculation module 23, and a time domain transform module 24.
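Illustratively, a minimal sketch of how these four modules might chain per frame, reusing the sketch functions from the summary above; the orchestration, the initial noise spectrum, and the default parameter values are assumptions of this sketch, not the patent's reference implementation:

```python
import numpy as np

def process_capture(x, y, frame_len=160, frame_shift=80, thr=0.0, alpha=0.98):
    """Per-frame chain: module 21 (framing + frequency domain transform),
    module 22 (target-signal analysis), module 23 (noise spectrum estimation
    and enhancement coefficients), module 24 (time domain transform).
    Overlap-add of the output frames is omitted for brevity."""
    X_frames = frame_and_transform(x, frame_len, frame_shift)  # first component
    Y_frames = frame_and_transform(y, frame_len, frame_shift)  # second component
    noise = np.abs(Y_frames[0]) ** 2    # initial noise spectrum: an assumption
    rho = np.zeros(X_frames.shape[1])   # rho(k, 1) = 0 per the text
    Z_prev, Y_prev = None, None
    out = []
    for X, Y in zip(X_frames, Y_frames):
        sigma = posterior_snr(X, Y)
        if Z_prev is not None:
            rho = prior_snr(Z_prev, Y_prev, sigma, alpha)
        if contains_target(sigma, rho, thr):
            beta = leakage_coefficient(X, Y)              # reverberation-dependent
            noise = update_noise_spectrum(noise, Y, beta)
        else:
            noise = update_noise_spectrum(noise, Y, 0.5)  # equal weights
        Z, z = noise_reduce_frame(X, noise)
        out.append(z)
        Z_prev, Y_prev = Z, Y
    return out
```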
As shown in fig. 2 and 4, the audio signal acquired by the first audio acquisition component of the electronic device 100 during the time period t is a first time domain signal x (t). The audio signal acquired by the second audio acquisition component of the electronic device 100 during the time period t is the second time domain signal y (t). After acquiring the first time domain signal acquired by the first audio acquisition component in the time period t and the second time domain signal acquired by the second audio acquisition component in the time period t, the electronic device 100 performs framing, windowing and frequency domain transformation on the first time domain signal x (t) to obtain multiple frames of first audio signals, and the electronic device 100 performs framing, windowing and frequency domain transformation on the second time domain signal y (t) to obtain multiple frames of second audio signals. For example, the frequency domain transforming module 21 frames, windows and frequency domain transforms the first time domain signal x (t) to obtain a plurality of frames of the first audio signal. The frequency domain transformation module 21 frames, windows and frequency domain transforms the second time domain signal y (t) to obtain a plurality of frames of second audio signals.
Illustratively, the frequency domain transforming module 21 may frame, window and frequency domain transform the first time domain signal x (t) as shown in the following (3.1) - (3.2) to obtain a multi-frame first audio signal:
(3.1) the frequency domain transforming module 21 performs windowing and framing processing on the first time domain signal x (t) according to the preset frame length and the preset frame shift, and divides the first time domain signal x (t) into N first audio signal segments. Each first audio signal segment is also referred to as a frame of first audio signal segments.
The windowing and framing of the first time domain signal x(t) may be implemented by an algorithm that applies a window function. In this process, the window function is a real function whose value is 0 outside a given interval, so the product of any function with the window function is again zero outside that interval.
For example, the window function may be a rectangular window, a hamming window, a gaussian window, or the like.
When the first time domain signal x(t) is windowed and framed, adjacent frames are overlapped according to the preset frame shift, which avoids the weakening of the two end portions of each first audio signal segment caused by windowing.
The time difference between the start positions of two adjacent frames is called the frame shift. For example, the preset frame shift may be half the preset frame length: with a preset frame length of 10 ms, the preset frame shift may be 5 ms. The time period t may be 1 second or another duration.
It should be noted that, the setting of the frame length may be determined by a person skilled in the art according to the requirement for real-time performance of audio processing. The frame length can be set smaller for audio processing requiring high real-time performance, such as audio processing in video live broadcast. And the audio processing with low real-time requirements can set a larger frame length.
(3.2) the frequency domain transforming module 21 performs frequency domain transforming on each first audio signal segment to obtain a frequency domain transformed first audio signal segment corresponding to each first audio signal segment.
For example, the frequency domain transforming module 21 performs a frequency domain transformation on the n+1th first audio signal segment of the first time domain signal x (t) ordered in time sequence, to obtain the n+1th frequency domain transformed first audio signal segment.
The n+1th frequency-domain transformed first audio signal segment is referred to as the n+1th first audio signal. The n+1th first audio signal characterizes the audio signal acquired by the first audio acquisition component during the n+1th acquisition time. The length of the n+1th acquisition time is equal to the preset frame length.
Wherein 1 ≤ n < N, and n is an integer. The frequency domain transform may be a Fourier transform or a short-time fast Fourier transform.
Similarly, the frequency domain transforming module 21 performs windowing and framing on the second time domain signal y (t) according to the preset frame length and the preset frame shift, and divides the second time domain signal y (t) into N second audio signal segments. Each second audio signal segment is also referred to as a frame of second audio signal segments.
The frequency domain transforming module 21 performs frequency domain transformation on each second audio signal segment to obtain a frequency domain transformed second audio signal segment corresponding to each second audio signal segment.
For example, the frequency domain transforming module 21 performs a frequency domain transformation on the n+1th second audio signal segment of the second time domain signal y (t) ordered in time sequence, resulting in the n+1th frequency domain transformed second audio signal segment.
The n+1th frequency-domain transformed second audio signal segment is referred to as the n+1th second audio signal. The n+1th second audio signal characterizes the audio signal acquired by the second audio acquisition component during the n+1th acquisition time. The length of the n+1th acquisition time is equal to the preset frame length.
The frequency domain transforming module 21 performs framing, windowing and frequency domain transforming on the second time domain signal y (t) to obtain a specific implementation manner of the multi-frame second audio signal, which is similar to the manners shown in (3.1) - (3.2), and is not repeated here.
The frequency domain transformation module 21 obtains a plurality of frames of first audio signals and a plurality of frames of second audio signals.
For each frame of the first audio signal and each frame of the second audio signal, e.g., the n+1st first audio signal and the n+1st second audio signal, the signal analysis module 22 determines whether the n+1st first audio signal contains a target type signal. The target type signal is, for example, an audio signal emitted by a target source.
Because the collection time of the n+1th first audio signal is the same as that of the n+1th second audio signal, under the reverberation of the pickup environment, if the n+1th first audio signal is determined to contain the target type signal, the n+1th second audio signal can be determined to also contain the target type signal; accordingly, if it is determined that the n+1th second audio signal contains the target type signal, it may be determined that the n+1th first audio signal also contains the target type signal. If it is determined that the n+1th first audio signal does not contain the target type signal, it may be determined that the n+1th second audio signal also does not contain the target type signal.
The signal analysis module 22 may determine whether the n+1th first audio signal contains a target type signal in the following manner (3.3) - (3.4).
When the target source emits an audio signal, the first audio acquisition component and the second audio acquisition component both acquire a target type signal. But the first audio acquisition component acquires more signals and higher signal energy than the second audio acquisition component.
Similarly, when the noise source emits a noise signal, the first audio acquisition component and the second audio acquisition component both acquire the noise signal. But the noise signal collected by the second audio collection assembly is more and has higher signal energy than the noise signal collected by the first audio collection assembly.
In addition, the channel for the first audio acquisition component to acquire audio signals is mutually independent from the channel for the second audio acquisition component to acquire audio signals, and the audio signals acquired by the channels cannot interfere with each other. Therefore, the signal analysis module 22 may determine, with the n+1th first audio signal and the n+1th second audio signal, a priori signal-to-noise ratio and a posterior signal-to-noise ratio of the audio signal acquired by the audio acquisition component of the electronic device 100 during the n+1th acquisition time, and determine, according to the a priori signal-to-noise ratio and the posterior signal-to-noise ratio of the audio signal acquired during the n+1th acquisition time, a log likelihood ratio that the n+1th first audio signal includes the target type signal, so as to determine whether the n+1th first audio signal includes the target type signal.
(3.3), the n+1th first audio signal includes n+1th first frequency domain signals corresponding to the plurality of frequency points. The n+1th second audio signal includes n+1th second frequency domain signals corresponding to the plurality of frequency points. In order to improve the accuracy of the log-likelihood ratio of the n+1th first audio signal containing the target type signal, the signal analysis module 22 may sum the log-likelihood ratios of the n+1th first frequency domain signal containing the target type signal corresponding to the n+1th first audio signal corresponding to each of the plurality of frequency points to obtain the log-likelihood ratio of the n+1th first audio signal containing the target type signal.
For a frequency point k, the log likelihood ratio ln(δ(k,n+1)) that the n+1th first frequency domain signal X(k,n+1) corresponding to frequency point k contains the target type signal satisfies the formula:
ln(δ(k,n+1)) = ln(1+ρ(k,n+1)) - σ(k,n+1)ρ(k,n+1)/(1+ρ(k,n+1))
wherein σ (k, n+1) is the posterior signal-to-noise ratio of the n+1th first frequency domain signal X (k, n+1) corresponding to the frequency point k. ρ (k, n+1) is the a priori signal to noise ratio of X (k, n+1).
Alternatively, σ(k,n+1) satisfies the formula:
σ(k,n+1) = |X(k,n+1)|^2/|Y(k,n+1)|^2
ρ (k, n+1) satisfies the formula:
ρ(k,n+1) = αρ(k,n) + (1-α)(σ(k,n+1)-1).
wherein Y(k,n+1) is the n+1th second frequency domain signal corresponding to frequency point k, and α is a weight factor; for example, α may be 0.98. ρ(k,n) is the a priori signal-to-noise ratio of the first frequency domain signal X(k,n) of the acquisition time immediately preceding the n+1th acquisition time, and ρ(k,n) satisfies the formula:
ρ(k,n) = |Z(k,n)|^2/|Y(k,n)|^2
wherein Z(k,n) is the frequency domain signal obtained after noise reduction of X(k,n), and Y(k,n) is the second frequency domain signal, corresponding to frequency point k, of the acquisition time immediately preceding the n+1th acquisition time.
The last acquisition time of the n+1 acquisition time, for example, the acquisition time of the n-th frequency-domain transformed first audio signal segment (i.e., the n-th first audio signal) or the acquisition time of the n-th frequency-domain transformed second audio signal segment (i.e., the n-th second audio signal).
Alternatively, if n=1, ρ (k, n) =0.
Optionally, ρ(k,n+1) is obtained according to a decision-directed (DD) approach, based on the first frequency domain signal X(k,n) of the acquisition time immediately preceding the n+1th acquisition time corresponding to frequency point k and the second frequency domain signal Y(k,n) of that acquisition time corresponding to frequency point k. The frequency point k may be any one of the plurality of frequency points corresponding to the n+1th first audio signal; similarly, it may be any one of the plurality of frequency points corresponding to the n+1th second audio signal.
Optionally, the frequency point k includes one frequency bin or a plurality of frequency bins. If the frequency point k includes one frequency bin, the subsequent noise reduction effect can be improved, but the time consumed by noise reduction increases. If the frequency point k includes a plurality of frequency bins, the number of frequency points in the frequency domain transform module 21 and/or the number of frequency bins in each frequency point can be set by a person skilled in the art according to the real-time requirement of the audio processing, so as to improve the noise reduction effect while reducing the noise reduction time.
(3.4) The signal analysis module 22 sums, over the plurality of frequency points, the log likelihood ratios that the corresponding n+1th first frequency domain signals contain the target type signal, obtaining the log likelihood ratio Σ_k ln(δ(k,n+1)) that the n+1th first audio signal contains the target type signal.
If the sum Σ_k ln(δ(k,n+1)) of the log likelihood ratios that the n+1th first frequency domain signals corresponding to the plurality of frequency points contain the target type signal is smaller than the first preset threshold thr, the signal analysis module 22 determines that the n+1th first audio signal contains a target type signal and/or that the n+1th second audio signal contains a target type signal.
If the sum Σ_k ln(δ(k,n+1)) is greater than or equal to the first preset threshold thr, the signal analysis module 22 determines that neither the n+1th first audio signal nor the n+1th second audio signal contains the target type signal.
Illustratively, fig. 6 shows a schematic diagram of a target signal segment versus a non-target signal segment. If the n+1th first audio signal contains the target type signal and/or the n+1th second audio signal contains the target type signal, the signal analysis module 22 detects that the n+1th first audio signal and the n+1th second audio signal both belong to a target signal segment. If neither the n+1th first audio signal nor the n+1th second audio signal contains the target type signal, the signal analysis module 22 detects that both belong to a non-target signal segment. Thus, as shown in fig. 5, the target signal segment and the non-target signal segment can be detected based on the likelihood ratio that the n+1th first audio signal contains the target type signal.
After the signal analysis module 22 determines, in manners (3.3)-(3.4), whether the n+1th first audio signal contains the target type signal, the enhancement coefficient calculation module 23 of the electronic device 100 may perform noise estimation according to that result in the following manner (3.5) or (3.6), to obtain an accurate noise spectrum (i.e., the n+1th noise spectrum) corresponding to the n+1th acquisition time.
(3.5) when the n+1th first audio signal does not contain the target type signal and the n+1th second audio signal does not contain the target type signal, indicating that the target source does not emit the audio signal in the n+1th acquisition time, or indicating that the target type signal does not exist in the n+1th acquisition time, wherein the audio signals acquired in the n+1th acquisition time by the first audio acquisition component and the second audio acquisition component are noise signals.
In consideration of the continuity of the audio signal, the enhancement coefficient calculation module 23 may quickly and accurately determine the noise spectrum corresponding to the n+1th acquisition time, i.e., determine the n+1th noise spectrum, according to the noise spectrum corresponding to the n+1th second audio signal and the last acquisition time of the n+1th acquisition time (i.e., the n noise spectrum).
Wherein the weight of the n+1th second audio signal in the n+1th noise spectrum is the same as the weight of the n noise spectrum in the n+1th noise spectrum.
Illustratively, the n+1th noise spectrum includes n+1th noise sub-spectrums corresponding to the plurality of frequency points, and the nth noise spectrum likewise includes nth noise sub-spectrums corresponding to the plurality of frequency points. For a frequency point k, the n+1th noise sub-spectrum λ(k,n+1) corresponding to frequency point k satisfies the formula:
λ(k,n+1) = 0.5λ(k,n) + 0.5|Y(k,n+1)|^2
wherein Y(k,n+1) is the n+1th second frequency domain signal corresponding to frequency point k, and λ(k,n) is the nth noise sub-spectrum corresponding to frequency point k. The n+1th noise sub-spectrum corresponding to frequency point k represents the intensity or energy of the noise corresponding to frequency point k during the n+1th acquisition time.
(3.6) when the n+1th first audio signal contains a target type signal and/or the n+1th second audio signal contains a target type signal, indicating that the target source emits an audio signal during the n+1th acquisition time, the audio signals acquired during the n+1th acquisition time by the first audio acquisition component and the second audio acquisition component each contain a target type signal and a noise signal due to reverberation of the sound pickup environment.
Compared with the noise signal in the audio signal acquired by the first audio acquisition component during the n+1th acquisition time, the noise signal in the audio signal acquired by the second audio acquisition component during the n+1th acquisition time is greater in both quantity and energy. The quantity and energy of the target type signal in the audio signal acquired by the second audio acquisition component during the n+1th acquisition time are related to the reverberation level of the sound pickup environment.
Therefore, considering both the continuity of the audio signal and the reverberation level of the sound pickup environment, the enhancement coefficient calculation module 23 may obtain the n+1th noise spectrum from the reverberation level of the sound pickup environment in which the electronic device 100 is located during the n+1th acquisition time, the n+1th second audio signal, and the nth noise spectrum.
The n+1th leakage coefficient may be used to characterize the reverberation level of the sound pickup environment in which the electronic device 100 is located during the n+1th acquisition time. The greater the n+1th leakage coefficient, the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum, and the greater the weight of the nth noise spectrum in the n+1th noise spectrum.
In this way, when the reverberation level of the sound pickup environment in which the electronic device 100 is located during the n+1th acquisition time is greater, the proportion of the target type signal in the n+1th second audio signal is greater, and the weight of the n+1th second audio signal in the n+1th noise spectrum is reduced, thereby reducing the influence of the target type signal in the n+1th second audio signal on the n+1th noise spectrum. Owing to the continuity of the audio signal, when the proportion of the target type signal in the n+1th second audio signal is greater, the weight of the nth noise spectrum in the n+1th noise spectrum is increased, so that the accuracy of the n+1th noise spectrum is improved on the basis of the nth noise spectrum.
The n+1th noise spectrum includes n+1th noise sub-spectrum corresponding to each of the plurality of frequency points. The nth noise spectrum comprises an nth noise sub-spectrum corresponding to each of the plurality of frequency points.
Illustratively, for a frequency point k, the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k satisfies the formula:

λ(k,n+1) = β_{n+1}·λ(k,n) + (1 − β_{n+1})·|Y(k,n+1)|²

wherein β_{n+1} is the n+1th leakage coefficient, 0 ≤ β_{n+1} ≤ 1; Y(k,n+1) is the n+1th second frequency domain signal corresponding to the frequency point k; and λ(k,n) is the nth noise sub-spectrum corresponding to the frequency point k. The n+1th noise sub-spectrum corresponding to the frequency point k represents the intensity or energy of the noise corresponding to the frequency point k during the n+1th acquisition time.
The enhancement coefficient calculation module 23 determines the n+1th noise sub-spectrum corresponding to each frequency point of the n+1th noise spectrum, so that the n+1th noise spectrum can be accurately determined. Using this accurate n+1th noise spectrum, the enhancement coefficient calculation module 23 can perform noise reduction processing on the n+1th first audio signal, thereby improving its noise reduction effect.
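Illustratively, the two noise estimation modes (3.5) and (3.6) can be viewed as a single recursive per-frequency-point update. The following Python sketch shows that update under the assumption of magnitude-squared spectra; the function and variable names are illustrative and are not taken from the patent.

```python
import numpy as np

def update_noise_spectrum(noise_prev, Y, beta, target_present):
    """Recursive per-frequency-point noise sub-spectrum update.

    noise_prev     -- nth noise spectrum, lambda(k, n), one value per frequency point k
    Y              -- n+1th second frequency domain signal Y(k, n+1) (complex, per point)
    beta           -- n+1th leakage coefficient, 0 <= beta <= 1
    target_present -- True if a target type signal was detected (mode 3.6)
    """
    energy = np.abs(Y) ** 2  # |Y(k, n+1)|^2
    if target_present:
        # Mode (3.6): the larger beta is, the more weight the previous
        # noise spectrum gets and the less the second audio signal gets.
        return beta * noise_prev + (1.0 - beta) * energy
    # Mode (3.5): no target type signal; both terms are weighted equally.
    return 0.5 * noise_prev + 0.5 * energy
```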
The reverberation level of the sound pickup environment in which the electronic device 100 is located affects the respective proportions (e.g., energy proportions) of the target type signal and the noise signal in the n+1th first audio signal, and likewise their respective proportions in the n+1th second audio signal. In particular, the reverberation level of the sound pickup environment is related to the energy of the noise signal in the n+1th first audio signal and, similarly, to the energy of the target type signal in the n+1th second audio signal.
The n+1th leakage coefficient β_{n+1} is related to the energy ratio δ_{n+1} of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal: the larger the energy ratio, the smaller the n+1th leakage coefficient, and the smaller the energy ratio, the larger the n+1th leakage coefficient. δ_{n+1} satisfies the formula:

δ_{n+1} = Σ_k |X(k,n+1)|² / Σ_k |Y(k,n+1)|²

wherein X(k,n+1) and Y(k,n+1) are the n+1th first and second frequency domain signals corresponding to the frequency point k, and the sums run over the plurality of frequency points.
based on the energy ratio of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal, the enhancement coefficient calculation module 23 may determine the reverberation condition of the sound pickup environment.
Exemplary, the n+1th leakage coefficient beta corresponding to the reverberation of the sound-collecting environment in the n+1th acquisition time can be accurately obtained according to the energy ratio of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal n+1
For example, the energy of the target type signal is typically higher than the energy of the noise signal, so the energy of the n+1th first audio signal is greater than or equal to the energy of the n+1th second audio signal. Based on a correspondence, determined in advance by experimental statistics, between the energy ratio δ of the energy of the audio signal acquired by the first audio acquisition component to the energy of the audio signal acquired by the second audio acquisition component within a time period and the leakage coefficient β corresponding to that time period, the enhancement coefficient calculation module 23 can quickly and accurately determine the n+1th leakage coefficient β_{n+1} corresponding to δ_{n+1}. This quickly quantizes the reverberation level of the sound pickup environment in which the electronic device 100 is located and improves the speed at which the n+1th noise spectrum is determined. The correspondence of δ to β may be an experimentally established correspondence table of δ to β.
For example, δ may take values in [1, ∞) and 0 ≤ β ≤ 1; as δ approaches infinity, the corresponding β approaches 0. The smaller the energy of the n+1th second audio signal, the smaller the n+1th leakage coefficient β_{n+1}, and thus the greater the weight of the n+1th second audio signal in the n+1th noise spectrum and the smaller the weight of the nth noise spectrum in the n+1th noise spectrum. Conversely, the greater the energy of the n+1th second audio signal, the larger the n+1th leakage coefficient β_{n+1}, and thus the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum and the greater the weight of the nth noise spectrum in the n+1th noise spectrum.
Optionally, the action of determining the n+1th leakage coefficient β_{n+1} from the energy ratio of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal may also be performed by the signal analysis module 22.
Optionally, to facilitate experimentally establishing the correspondence between δ and β, the range of β may be scaled so that 0 ≤ β ≤ M with M ≥ 1; correspondingly, 0 ≤ β_{n+1} ≤ M. In that case, when the n+1th first audio signal contains a target type signal and/or the n+1th second audio signal contains a target type signal, the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k satisfies the formula:

λ(k,n+1) = (β_{n+1}/M)·λ(k,n) + (1 − β_{n+1}/M)·|Y(k,n+1)|²
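Illustratively, the correspondence table of δ to β described above could be realized as a small lookup table with interpolation, as in the following sketch. The table values here are assumptions chosen for illustration only; an actual table would be established by experimental statistics.

```python
import numpy as np

# Illustrative correspondence table of delta (energy ratio) to beta
# (leakage coefficient). Values are assumed, not from the patent;
# beta falls toward 0 as delta grows toward infinity.
DELTA_GRID = np.array([1.0, 1.5, 2.0, 4.0, 8.0, 16.0])
BETA_GRID = np.array([0.9, 0.7, 0.5, 0.3, 0.1, 0.0])

def leakage_coefficient(X, Y):
    """Map the frame energy ratio delta = E(X) / E(Y) to beta by
    interpolating the experimentally established table."""
    delta = np.sum(np.abs(X) ** 2) / np.sum(np.abs(Y) ** 2)
    return float(np.interp(delta, DELTA_GRID, BETA_GRID))
```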
in order to further improve the noise reduction effect of the n+1th first audio signal, the enhancement coefficient calculation module 23 may perform noise reduction processing on the n+1th first audio signal as shown in (3.7) below.
(3.7) For a frequency point k, if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k is smaller than the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k, X(k,n+1) is regarded as a noise signal; the enhancement coefficient calculation module 23 may determine that the enhancement coefficient G(k,n+1) corresponding to X(k,n+1) is a preset attenuation value (attfactor), and multiply X(k,n+1) by its enhancement coefficient G(k,n+1) to obtain the n+1th first noise reduction frequency domain signal Z(k,n+1) corresponding to the frequency point k.
For example, the preset attenuation value is less than 1.
Thus, for the audio signal in the n+1th acquisition time, the noise signal is suppressed with the frequency point as the minimum unit, and the noise reduction effect is improved.
If the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k is greater than or equal to the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k, X(k,n+1) is regarded as a target type signal; the enhancement coefficient calculation module 23 may determine that the enhancement coefficient G(k,n+1) corresponding to X(k,n+1) is 1, and multiply X(k,n+1) by its enhancement coefficient G(k,n+1) to obtain the n+1th first noise reduction frequency domain signal Z(k,n+1) corresponding to the frequency point k.
In this way, for the audio signal in the n+1th acquisition time, the target type signal is retained without distortion (or, equivalently, enhanced without distortion), with the frequency point as the minimum unit.
Illustratively, the enhancement coefficient G(k,n+1) corresponding to X(k,n+1) satisfies the formula:

G(k,n+1) = attfactor, if |X(k,n+1)|² < λ(k,n+1); G(k,n+1) = 1, if |X(k,n+1)|² ≥ λ(k,n+1)
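Illustratively, the per-frequency-point suppression of (3.7) can be sketched as follows, assuming a vector of complex frequency domain values per acquisition time; the attenuation value is an assumed placeholder.

```python
import numpy as np

ATT_FACTOR = 0.1  # preset attenuation value (< 1); the exact value is assumed

def apply_enhancement(X, noise_spec):
    """Per-frequency-point noise suppression as in (3.7).

    X          -- n+1th first frequency domain signal, one complex value per point
    noise_spec -- n+1th noise spectrum lambda(k, n+1), one value per point
    Returns the n+1th first noise reduction frequency domain signal Z.
    """
    energy = np.abs(X) ** 2
    # G(k, n+1) = attfactor where the point looks like noise, 1 otherwise.
    G = np.where(energy < noise_spec, ATT_FACTOR, 1.0)
    return G * X
```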
(3.8) After the n+1th first noise reduction frequency domain signal corresponding to each frequency point of the n+1th first audio signal has been obtained in the manner shown in (3.7), the time domain transform module 24 performs time domain transform processing on the n+1th first noise reduction frequency domain signals corresponding to the plurality of frequency points, to obtain the n+1th first noise reduction time domain signal corresponding to the n+1th acquisition time. The time domain transform processing is, for example, an inverse Fourier transform or an inverse fast Fourier transform.
The n+1th first noise reduction time domain signal is a time domain signal after noise reduction of the n+1th first audio signal segment of the first time domain signal x (t).
According to the embodiment shown in fig. 5, i.e., as shown in (3.1)-(3.8), the electronic device 100 may perform noise reduction processing on each of the N first audio signal segments corresponding to the first time domain signal x(t), to obtain the noise-reduced time domain signal of each of the N first audio signal segments.
The time domain transform module 24 may perform an overlap-add process on the noise-reduced time domain signals of the N first audio signal segments corresponding to the first time domain signal x(t), so as to obtain the first noise reduction audio signal z(t) after noise reduction of the first time domain signal x(t). A comparison of the first time domain signal x(t) and the first noise reduction audio signal z(t) is shown in fig. 7.
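Illustratively, the overlap-add reconstruction can be sketched as follows; the hop size and equal frame lengths are assumptions, since the framing parameters are not fixed here.

```python
import numpy as np

def overlap_add(segments, hop):
    """Reassemble noise-reduced time domain segments into z(t).

    segments -- N noise-reduced time domain frames of equal length, e.g.
                obtained by an inverse FFT of the noise-reduced
                frequency domain signals
    hop      -- hop size between successive frames (frame length minus overlap)
    """
    frame_len = len(segments[0])
    out = np.zeros(hop * (len(segments) - 1) + frame_len)
    for i, seg in enumerate(segments):
        out[i * hop : i * hop + frame_len] += seg  # overlapping regions sum
    return out
```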
According to the audio processing method provided by this embodiment, within the time period t, the first time domain signal acquired by the first audio acquisition component and the second time domain signal acquired by the second audio acquisition component are each segmented into a plurality of first audio signal segments and a plurality of second audio signal segments over smaller time periods, each smaller time period being one n+1th acquisition time. Each first audio signal segment and each second audio signal segment is frequency domain transformed, yielding first frequency domain signals and second frequency domain signals corresponding to a plurality of frequency points for that smaller time period. The noise sub-spectrum corresponding to a frequency point of a smaller time period is then obtained either from the reverberation level corresponding to that smaller time period, the second frequency domain signal corresponding to that frequency point, and the noise sub-spectrum corresponding to the same frequency point of the preceding smaller time period, or from only the latter two. Finally, the noise sub-spectrum corresponding to each frequency point of the smaller time period is used to denoise the first frequency domain signal corresponding to that frequency point. In this way, the noise reduction effect on the first time domain signal acquired by the first audio acquisition component within the time period t is improved, and the interference of noise signals with the target type signal in the audio signal acquired by the first audio acquisition component is reduced.
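Illustratively, the overall flow can be combined end to end using the helper sketches above. Everything in this sketch that is not stated in the embodiment is an assumption: the frame length, hop size, Hann window, and in particular the stand-in target-signal detector (the actual detection follows modes (3.3)-(3.4) described earlier in this document).

```python
import numpy as np

def denoise(x, y, frame_len=512, hop=256):
    """Illustrative end-to-end sketch: frame, transform, estimate noise,
    suppress, and reconstruct. x is the first time domain signal, y the
    second; framing parameters and the detector below are assumptions."""
    window = np.hanning(frame_len)
    n_frames = (len(x) - frame_len) // hop + 1
    noise_spec = np.zeros(frame_len // 2 + 1)  # running noise spectrum
    out_frames = []
    for n in range(n_frames):
        xf = x[n * hop : n * hop + frame_len] * window
        yf = y[n * hop : n * hop + frame_len] * window
        X, Y = np.fft.rfft(xf), np.fft.rfft(yf)
        # Stand-in for modes (3.3)-(3.4): treat a frame whose first-component
        # energy clearly dominates as containing a target type signal.
        target_present = np.sum(np.abs(X) ** 2) > 2.0 * np.sum(np.abs(Y) ** 2)
        beta = leakage_coefficient(X, Y) if target_present else 0.5
        noise_spec = update_noise_spectrum(noise_spec, Y, beta, target_present)
        Z = apply_enhancement(X, noise_spec)
        out_frames.append(np.fft.irfft(Z, frame_len))
    return overlap_add(out_frames, hop)
```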
Fig. 8 is a schematic diagram of a second structure of an electronic device 100 according to an embodiment of the present application. The electronic device 100 shown in fig. 8 may be used to perform the methods shown in the embodiments of the application. As shown in fig. 8, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, a card interface 195 of a subscriber identity module card, and the like. Subscriber identity module cards include, for example, subscriber identity module (subscriber identification module, SIM) cards, user identity module (user identity module, UIM) cards, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, a satellite protocol stack, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they may be called directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, a serial peripheral interface (serial peripheral interface, SPI), and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger.
The power management module 141 is used for connecting the battery 142, and the charge management module 140 and the processor 110.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G, etc., applied to the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering, amplifying, and the like on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can amplify the signal modulated by the modem processor, and convert the signal into electromagnetic waves through the antenna 1 to radiate. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), satellite mobile communication system, frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, antenna 1 and mobile communication module 150 of electronic device 100 are coupled, and antenna 2 and wireless communication module 160 are coupled, such that electronic device 100 may communicate with a network and other devices through wireless communication techniques. Wireless communication techniques may include global system for mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (global positioning system, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a beidou satellite navigation system (beidou navigation satellite system, BDS), a space-first satellite mobile communication system, a quasi-zenith satellite system (quasi-zenith satellite system, QZSS) and/or a satellite-based augmentation system (satellite based augmentation systems, SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and the like.
Video codecs are used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 may play or record video in a variety of encoding formats, such as: dynamic picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code that includes instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The card interface 195 is used to connect to a subscriber identity module card.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 9 is a schematic software structure of an electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, the Android runtime (Android runtime) and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in fig. 9, the application package may include applications for satellite communications, cameras, calendars, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 9, the application framework layer may include an audio processor, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The audio processor is used for managing the communication program. The audio processor can perform noise reduction or noise filtering on audio (such as acquired audio signals) in the communication process according to the audio processing method provided by the embodiments of the application.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in the status bar. It can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction. For example, the notification manager is used to notify of download completion, message alerts, and so on. The notification manager may also present notifications in the form of a chart or scroll bar text in the system top status bar, such as notifications of applications running in the background, or notifications that appear on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks.
The Android runtime includes a core library and a virtual machine, and is responsible for the scheduling and management of the Android system.
The core library consists of two parts: one part comprises the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes java files of the application program layer and the application program framework layer as binary files. The virtual machine is used for executing the functions of object life cycle management, stack management, thread management, security and exception management, garbage collection and the like.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The workflow of the electronic device 100 software and hardware is illustrated below in connection with capturing a photo scene.
When touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into the original input event (including information such as touch coordinates, time stamp of touch operation, etc.). The original input event is stored at the kernel layer. The application framework layer acquires an original input event from the kernel layer, and identifies a control corresponding to the input event. Taking the touch operation as a touch click operation, taking a control corresponding to the click operation as an example of a control of a camera application icon, the camera application calls an interface of an application framework layer, starts the camera application, further starts a camera driver by calling a kernel layer, and captures a still image or video by the camera 193.
An embodiment of the present application provides an electronic device, including: a processor and a memory; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to cause the electronic device to perform the method described above.
The embodiment of the application provides a chip. Fig. 10 is a schematic diagram of a hardware structure of a chip according to an embodiment of the present application. The chip includes one or more (including two) processors 41, communication lines 42, a communication interface 43, and a memory 44. The processor 41 is used to call a computer program in the memory to perform the technical solutions in the above embodiments. The principle and technical effects of the present application are similar to those of the above-described related embodiments, and will not be described in detail herein.
The embodiment of the application also provides a computer readable storage medium. The computer-readable storage medium stores a computer program. The computer program realizes the above method when being executed by a processor. The methods described in the above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer readable media can include computer storage media and communication media and can include any medium that can transfer a computer program from one place to another. The storage media may be any target media that is accessible by a computer.
In one possible implementation, the computer readable medium may include RAM, ROM, compact disk-read only memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (Digital Subscriber Line, DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc, laser disc, optical disc, digital versatile disc (Digital Versatile Disc, DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Embodiments of the present application provide a computer program product comprising a computer program which, when executed, causes a computer to perform the above-described method.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing detailed description of the application has been presented for purposes of illustration and description, and it should be understood that the foregoing is by way of illustration and description only, and is not intended to limit the scope of the application.

Claims (12)

1. An audio processing method is characterized by being applied to electronic equipment, wherein the electronic equipment comprises a first audio acquisition component and a second audio acquisition component; the method comprises the following steps:
acquiring an n+1th first audio signal acquired by the first audio acquisition component in an n+1th acquisition time, and an n+1th second audio signal acquired by the second audio acquisition component in the n+1th acquisition time;
when the n+1th first audio signal contains a target type signal and/or the n+1th second audio signal contains the target type signal, obtaining an n+1th noise spectrum according to an n+1th leakage coefficient, the n+1th second audio signal and an nth noise spectrum; the nth noise spectrum is the noise spectrum corresponding to the acquisition time immediately preceding the n+1th acquisition time; the greater the n+1th leakage coefficient, the smaller the weight of the n+1th second audio signal in the n+1th noise spectrum, and the greater the weight of the nth noise spectrum in the n+1th noise spectrum; the n+1th leakage coefficient represents the reverberation level of the sound pickup environment in which the electronic equipment is located during the n+1th acquisition time;
adopting the n+1th noise spectrum to perform noise reduction processing on the n+1th first audio signal; wherein n is an integer.
2. The method of claim 1, wherein the n+1th second audio signal comprises n+1th second frequency domain signals corresponding to respective frequency points; the nth noise spectrum comprises nth noise sub-spectrums corresponding to the plurality of frequency points respectively; the n+1th noise spectrum comprises n+1th noise sub-spectrums corresponding to the plurality of frequency points respectively;
the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k in the n+1th noise spectrum satisfies the following formula:

λ(k,n+1) = β_{n+1}·λ(k,n) + (1 − β_{n+1})·|Y(k,n+1)|²

wherein β_{n+1} is the leakage coefficient corresponding to the n+1th acquisition time, Y(k,n+1) is the n+1th second frequency domain signal corresponding to the frequency point k, λ(k,n) is the nth noise sub-spectrum corresponding to the frequency point k, and 0 ≤ β_{n+1} ≤ 1; the n+1th noise sub-spectrum corresponding to the frequency point k represents the intensity or energy of the noise corresponding to the frequency point k during the n+1th acquisition time.
3. The method according to claim 1 or 2, characterized in that the n+1th leakage coefficient is related to the energy ratio of the energy of the n+1th first audio signal to the energy of the n+1th second audio signal; wherein the larger the energy ratio, the smaller the n+1th leakage coefficient, and the smaller the energy ratio, the larger the n+1th leakage coefficient.
4. A method according to any one of claims 1-3, wherein the n+1th first audio signal comprises an n+1th first frequency domain signal to which a plurality of frequency points each correspond, and the n+1th second audio signal comprises an n+1th second frequency domain signal to which the plurality of frequency points each correspond;
For a frequency point k, the log likelihood ratio ln(δ(k,n+1)) that the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k contains the target type signal satisfies the formula:

ln(δ(k,n+1)) = ln(1 + ρ(k,n+1)) − σ(k,n+1)·ρ(k,n+1)/(1 + ρ(k,n+1))

wherein σ(k,n+1) is the posterior signal-to-noise ratio of the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k, and ρ(k,n+1) is the prior signal-to-noise ratio of X(k,n+1);
the method further comprises the steps of:
if the sum, over the plurality of frequency points, of the log likelihood ratios that the n+1th first frequency domain signals corresponding to the frequency points contain the target type signal is smaller than a first preset threshold, determining that the n+1th first audio signal contains the target type signal and/or that the n+1th second audio signal contains the target type signal;
if the sum, over the plurality of frequency points, of the log likelihood ratios that the n+1th first frequency domain signals corresponding to the frequency points contain the target type signal is greater than or equal to the first preset threshold, determining that the n+1th first audio signal does not contain the target type signal and that the n+1th second audio signal does not contain the target type signal.
5. The method of claim 4, wherein σ(k,n+1) satisfies the formula:

σ(k,n+1) = |X(k,n+1)|² / |Y(k,n+1)|²

and ρ(k,n+1) satisfies the formula:

ρ(k,n+1) = α·ρ(k,n) + (1 − α)·(σ(k,n+1) − 1)

wherein Y(k,n+1) is the n+1th second frequency domain signal corresponding to the frequency point k, α is a weight factor, and ρ(k,n) is the prior signal-to-noise ratio of the first frequency domain signal X(k,n) of the acquisition time immediately preceding the n+1th acquisition time; ρ(k,n) satisfies the formula ρ(k,n) = |Z(k,n)|² / |Y(k,n)|², wherein Z(k,n) is the frequency domain signal after noise reduction of X(k,n), and Y(k,n) is the second frequency domain signal corresponding to the frequency point k at the acquisition time immediately preceding the n+1th acquisition time.
6. The method of any one of claims 1-5, further comprising:
when the n+1th first audio signal does not contain the target type signal and the n+1th second audio signal does not contain the target type signal, obtaining the n+1th noise spectrum according to the n+1th second audio signal and the nth noise spectrum; wherein the weight of the n+1th second audio signal in the n+1th noise spectrum is the same as the weight of the nth noise spectrum in the n+1th noise spectrum.
7. The method of any of claims 1-6, wherein the n+1th first audio signal comprises an n+1th first frequency domain signal corresponding to each of a plurality of frequency points; the n+1th noise spectrum comprises n+1th noise sub-spectrums corresponding to the plurality of frequency points respectively;
and carrying out noise reduction processing on the n+1th first audio signal by adopting the n+1th noise spectrum, wherein the noise reduction processing comprises the following steps of:
for a frequency point k,
if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k is smaller than the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k, determining the enhancement coefficient corresponding to X(k,n+1) to be a preset attenuation value, the preset attenuation value being less than 1;
if the energy of the n+1th first frequency domain signal X(k,n+1) corresponding to the frequency point k is greater than or equal to the n+1th noise sub-spectrum λ(k,n+1) corresponding to the frequency point k, determining the enhancement coefficient corresponding to X(k,n+1) to be 1;
multiplying the enhancement coefficient corresponding to X(k,n+1) by X(k,n+1) to obtain the n+1th first noise reduction frequency domain signal Z(k,n+1) corresponding to the frequency point k;
performing time domain transformation processing on the n+1th first noise reduction frequency domain signal corresponding to each of the plurality of frequency points to obtain an n+1th first noise reduction time domain signal corresponding to the n+1th acquisition time;
the n+1th noise sub-spectrum corresponding to the frequency point k represents the intensity or energy of noise corresponding to the frequency point k in the n+1th acquisition time.
8. The method of any of claims 1-7, wherein acquiring an n+1th first audio signal acquired by the first audio acquisition component during an n+1th acquisition time and an n+1th second audio signal acquired by the second audio acquisition component during the n+1th acquisition time comprises:
acquiring a first time domain signal acquired by the first audio acquisition component in a time period t and a second time domain signal acquired by the second audio acquisition component in the time period t;
Framing, windowing and frequency domain transformation are carried out on the first time domain signal to obtain a plurality of frames of first audio signals; framing, windowing and frequency domain transformation are carried out on the second time domain signal to obtain a plurality of frames of second audio signals; the frame length of each frame of the first audio signal is the same as the n+1th acquisition time; the frame length of each frame of the second audio signal is the same as the n+1th acquisition time.
9. An electronic device, comprising: a processor and a memory;
the memory stores computer-executable instructions;
the processor executing computer-executable instructions stored in the memory to cause the electronic device to perform the method of any one of claims 1-8.
10. A computer readable storage medium storing a computer program, which when executed by a processor implements the method according to any one of claims 1-8.
11. A computer program product comprising a computer program which, when run, causes a computer to perform the method of any of claims 1-8.
12. A chip comprising a processor for invoking a computer program in memory to perform the method of any of claims 1-8.
CN202310402427.3A 2023-04-11 2023-04-11 Audio processing method, device and storage medium Pending CN117156345A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310402427.3A CN117156345A (en) 2023-04-11 2023-04-11 Audio processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310402427.3A CN117156345A (en) 2023-04-11 2023-04-11 Audio processing method, device and storage medium

Publications (1)

Publication Number Publication Date
CN117156345A true CN117156345A (en) 2023-12-01

Family

ID=88899476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310402427.3A Pending CN117156345A (en) 2023-04-11 2023-04-11 Audio processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN117156345A (en)

Similar Documents

Publication Publication Date Title
CN111178546B (en) Searching method of machine learning model and related device and equipment
EP3968670B1 (en) Bluetooth-based object searching method and electronic device
CN114650363B (en) Image display method and electronic equipment
CN111563466B (en) Face detection method and related product
WO2022160715A1 (en) Voice signal processing method and electronic device
CN111382087A (en) Memory management method and electronic equipment
CN116665692B (en) Voice noise reduction method and terminal equipment
CN115543145A (en) Folder management method and device
CN117153181A (en) Voice noise reduction method, device and storage medium
CN111104209B (en) Task processing method and related equipment
WO2023160179A1 (en) Magnification switching method and magnification switching apparatus
CN117156345A (en) Audio processing method, device and storage medium
CN114828098B (en) Data transmission method and electronic equipment
CN113473013A (en) Display method and device for beautifying effect of image and terminal equipment
CN117133311B (en) Audio scene recognition method and electronic equipment
CN116993619B (en) Image processing method and related equipment
CN116095219B (en) Notification display method and terminal device
CN116087930B (en) Audio ranging method, device, storage medium, and program product
CN116703741B (en) Image contrast generation method and device and electronic equipment
CN116668764B (en) Method and device for processing video
CN116074624B (en) Focusing method and device
CN116546126B (en) Noise suppression method and electronic equipment
CN116795604B (en) Processing method, device and equipment for application exception exit
CN117149201A (en) Code execution method and device and electronic equipment
CN114025043A (en) Method and device for calling third-party application, terminal equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination