WO2020103035A1 - Audio processing method and apparatus, storage medium, and electronic device
- Publication number
- WO2020103035A1 (PCT/CN2018/116718)
- Authority
- WO
- WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
Definitions
- the present application belongs to the technical field of audio processing, and particularly relates to an audio processing method, device, storage medium, and electronic equipment.
- the embodiments of the present application provide an audio processing method, device, storage medium, and electronic equipment, which can improve the quality of sound collected by the electronic equipment.
- an embodiment of the present application provides an audio processing method, which is applied to an electronic device.
- the electronic device includes two microphones arranged back-to-back and separated by a preset distance.
- the audio processing method includes:
- Noise suppression is performed on the multiple beamforming signals according to the multiple gain factors, and the multiple beamforming signals after noise suppression are band-spliced and converted into the time domain to obtain an audio frame after noise suppression.
- an embodiment of the present application provides an audio processing device, which is applied to an electronic device.
- the electronic device includes two microphones arranged back-to-back and separated by a preset distance.
- the audio processing device includes:
- An audio collection module configured to collect sound through the two microphones to obtain two audio signals
- An audio extraction module configured to transform the current audio frames of the two audio signals from the time domain to the frequency domain, and to extract, in the frequency domain, the sub-audio signals from the respective desired directions in the two current audio frames to obtain two sub-audio signals, wherein the desired directions corresponding to the two current audio frames are opposite;
- a beam forming module configured to divide the two sub-audio signals into frequency bands, and perform beam forming on the corresponding multiple sub-bands according to the corresponding beam forming filter coefficients to obtain multiple beam forming signals;
- a factor acquisition module configured to obtain, in the multiple sub-bands, multiple gain factors for performing noise suppression on the multiple beamforming signals respectively, according to the corresponding beamforming filter coefficients and the respective autocorrelation coefficients of the two sub-audio signals;
- a noise suppression module configured to perform noise suppression on the multiple beamforming signals according to the multiple gain factors, and to convert the multiple noise-suppressed beamforming signals into the time domain after frequency band splicing to obtain a noise-suppressed audio frame.
- an embodiment of the present application provides a storage medium on which a computer program is stored, wherein, when the computer program is executed on a computer, the computer is caused to execute the steps of the audio processing method provided in the embodiments of the present application.
- an embodiment of the present application provides an electronic device, including a memory, a processor, and two microphones arranged back-to-back and separated by a preset distance, and the processor, by calling a computer program stored in the memory, is configured to perform the following:
- Noise suppression is performed on the multiple beamforming signals according to the multiple gain factors, and the multiple beamforming signals after noise suppression are band-spliced and converted into the time domain to obtain an audio frame after noise suppression.
- the electronic device can collect sound through two microphones to obtain two audio signals; then transform the current audio frames of the two audio signals from the time domain to the frequency domain, and extract, in the frequency domain, the sub-audio signals from the respective desired directions in the two current audio frames to obtain two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite; then divide the two sub-audio signals into frequency bands, and perform beamforming on the divided sub-bands according to the corresponding beamforming filter coefficients to obtain multiple beamforming signals; then, in the multiple sub-bands, obtain multiple gain factors for performing noise suppression on the multiple beamforming signals respectively, according to the corresponding beamforming filter coefficients and the respective autocorrelation coefficients of the two sub-audio signals; finally, perform noise suppression on the multiple beamforming signals according to the multiple gain factors, and convert the noise-suppressed beamforming signals into the time domain after frequency band splicing to obtain a noise-suppressed audio frame. In this way, a complete noise-suppressed audio signal can be obtained, which can improve the quality of the sound collected by the electronic device.
- FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present application.
- FIG. 2 is a schematic diagram of the installation positions of two microphones in an embodiment of the present application.
- FIG. 3 is a schematic diagram of performing noise suppression on two audio signals collected by two microphones in an embodiment of the present application.
- FIG. 4 is another schematic flowchart of an audio processing method provided by an embodiment of the present application.
- FIG. 5 is a schematic structural diagram of an audio processing device provided by an embodiment of the present application.
- FIG. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 7 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
- FIG. 1 is a schematic flowchart of an audio processing method provided by an embodiment of the present application.
- the audio processing method can be applied to an electronic device including two microphones arranged back-to-back and separated by a preset distance.
- the flow of the audio processing method may include:
- the back-to-back arrangement of the two microphones means that the sound pickup holes of the two microphones face in opposite directions.
- the electronic device includes two microphones, namely a microphone 1 provided on the lower side of the electronic device and a microphone 2 provided on the upper side of the electronic device, wherein the sound hole of microphone 1 faces down and the sound hole of microphone 2 faces up.
- the two microphones provided in the electronic device may be non-directional microphones (or omnidirectional microphones).
- the electronic device when triggering sound collection, can simultaneously collect sound through two microphones arranged back-to-back and separated by a preset distance, thereby collecting two audio signals with the same duration.
- the electronic device can trigger sound collection when conducting a call, can also perform sound collection when receiving a recording instruction input by a user, can also trigger sound collection when performing voiceprint recognition, and so on.
- the current audio frames of the two audio signals are transformed from the time domain to the frequency domain, and the sub audio signals from the respective desired directions in the two current audio frames are extracted in the frequency domain to obtain two sub audio signals, where, The expected directions corresponding to the two current audio frames are opposite.
- after acquiring two audio signals of the same duration through the two microphones, the electronic device separately performs frame processing on the two audio signals to divide them into the same number of audio frames, so that noise suppression can be performed frame by frame.
- the two collected audio signals are recorded as audio signal 1 and audio signal 2, respectively.
- the electronic device can divide audio signal 1 into n audio frames with a length of 20 milliseconds and, likewise, divide audio signal 2 into n audio frames with a length of 20 milliseconds. Noise suppression is then performed according to the first audio frame from audio signal 1 and the first audio frame from audio signal 2 to obtain the first noise-suppressed audio frame, according to the second audio frame from audio signal 1 and the second audio frame from audio signal 2 to obtain the second noise-suppressed audio frame, and so on, up to the n-th audio frame from audio signal 1 and the n-th audio frame from audio signal 2, which yield the n-th noise-suppressed audio frame. In this way, a complete noise-suppressed audio signal can be obtained from these noise-suppressed audio frames, as described below.
- the term "current audio frame" does not refer to one specific audio frame, but to the audio frame used for noise suppression at the current time. For example, if noise suppression is performed at the current time according to the fifth audio frames of the two audio signals, the fifth audio frames of the two audio signals are the current audio frames; if noise suppression is performed according to the sixth audio frames of the two audio signals, the sixth audio frames are the current audio frames, and so on.
- the electronic device transforms the current audio frames of the two audio signals from the time domain to the frequency domain, and extracts, in the frequency domain, the sub-audio signal from the respective desired direction (the desired direction of the corresponding microphone) in each of the two current audio frames, obtaining two sub-audio signals.
- the desired directions of the two microphones are opposite: the desired direction of the microphone closer to the target sound source is the direction toward the target sound source, and the desired direction of the microphone farther away from the target sound source is the direction away from the target sound source.
- for example, the electronic device collects sound while the owner is talking, so the owner is the target sound source, and the two microphones of the electronic device are recorded as microphone 1 and microphone 2. If microphone 1 is closer to the owner, the desired direction of microphone 1 is toward the owner, and the desired direction of microphone 2 is away from the owner.
- one of the sub-audio signals carries more of the "target sounds" and the other sub-audio signal carries more of the "noise".
- the two sub-audio signals are divided into frequency bands, and the plurality of divided sub-bands are beamformed according to corresponding beamforming filter coefficients to obtain a plurality of beamforming signals.
- after extracting the two sub-audio signals from the two current audio frames, the electronic device divides the two sub-audio signals in the same frequency band division manner to obtain multiple sub-bands. Afterwards, for each sub-band, beamforming is performed according to the beamforming filter coefficient corresponding to that sub-band to obtain the beamforming signal of the sub-band. In this way, for the multiple divided sub-bands, the electronic device correspondingly obtains multiple beamforming signals.
- the electronic device divides two sub-audio signals according to the same frequency band division method to obtain i sub-bands, and performs beam forming on the i sub-bands according to corresponding beamforming filter coefficients to obtain i beam-forming signals.
- a plurality of gain factors respectively used for performing noise suppression on the plurality of beamforming signals are obtained in the plurality of subbands according to corresponding beamforming filter coefficients and respective autocorrelation coefficients of the two sub-audio signals.
- after obtaining the multiple beamforming signals, the electronic device performs an autocorrelation calculation on the two sub-audio signals in each sub-band to obtain the autocorrelation coefficients of the two sub-audio signals in each sub-band. Then, for each sub-band, the gain factor for performing noise suppression on the beamforming signal of that sub-band is obtained according to the beamforming filter coefficient corresponding to the sub-band and the autocorrelation coefficients of the two sub-audio signals in the sub-band. In this way, for the multiple beamforming signals, the electronic device correspondingly obtains the gain factors used to perform noise suppression on the multiple beamforming signals, respectively.
- the multiple beamforming signals are subjected to noise suppression according to multiple gain factors, and the multiple beamforming signals after noise suppression are band-spliced and converted into the time domain to obtain an audio frame after noise suppression.
- the electronic device may perform noise suppression on the multiple beamforming signals according to the multiple gain factors, respectively, to obtain multiple noise-suppressed beamforming signals. After that, the electronic device performs frequency band splicing on the multiple noise-suppressed beamforming signals and converts them to the time domain to obtain a noise-suppressed audio frame.
- as can be seen from the above, the electronic device can collect sound through two microphones to obtain two audio signals; then transform the current audio frames of the two audio signals from the time domain to the frequency domain, and extract, in the frequency domain, the sub-audio signals from the respective desired directions in the two current audio frames to obtain two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite; then divide the two sub-audio signals into frequency bands, and perform beamforming on the multiple sub-bands according to the corresponding beamforming filter coefficients to obtain multiple beamforming signals; then, in the multiple sub-bands, obtain multiple gain factors for performing noise suppression on the multiple beamforming signals respectively, according to the corresponding beamforming filter coefficients and the respective autocorrelation coefficients of the two sub-audio signals; finally, perform noise suppression on the multiple beamforming signals according to the multiple gain factors, and convert the noise-suppressed beamforming signals into the time domain after frequency band splicing to obtain a noise-suppressed audio frame. A complete noise-suppressed audio signal can thus be obtained, which can improve the quality of the sound collected by the electronic device.
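- To make the above flow concrete, the following is a minimal per-frame sketch in Python. It is illustrative only: the sampling rate, microphone spacing, directional-extraction form, fixed beamforming weight, and coherence-style gain are assumptions made for readability, not the formulas of this application.

```python
# Minimal per-frame sketch of the described pipeline (illustrative assumptions throughout).
import numpy as np

FS = 16000               # sampling rate (assumed)
MIC_DIST = 0.02          # spacing between the two microphones in metres (assumed)
TAU = MIC_DIST / 343.0   # inter-microphone delay relative to an end-fire source

def process_frame(frame1, frame2, band_edges):
    """frame1/frame2: one time-domain frame (numpy array) per microphone."""
    n = len(frame1)
    A = np.fft.rfft(frame1)
    B = np.fft.rfft(frame2)
    if np.sum(frame2 ** 2) > np.sum(frame1 ** 2):
        A, B = B, A                          # keep A as the higher-energy frame
    f = np.fft.rfftfreq(n, d=1.0 / FS)
    delay = np.exp(-1j * 2 * np.pi * f * TAU)

    X = A - B * delay                        # sub-audio toward the target (assumed form)
    Y = B - A * delay                        # sub-audio away from the target (assumed form)

    out = np.zeros_like(X)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        sel = (f >= lo) & (f < hi)
        Xl, Yl = X[sel], Y[sel]
        Zl = Xl - 0.5 * Yl                   # fixed-weight stand-in for the adaptive beamformer
        rxx = np.real(np.vdot(Xl, Xl))
        ryy = np.real(np.vdot(Yl, Yl))
        gain = rxx / (rxx + ryy + 1e-12)     # assumed coherence-style gain factor
        out[sel] = gain * Zl                 # noise suppression, band by band
    return np.fft.irfft(out, n)              # band-spliced spectrum back to the time domain
```

- With a suitable list of band edges (for example, the Bark edges discussed later), process_frame could be called once per pair of current audio frames and the outputs concatenated into a noise-suppressed signal.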
- FIG. 4 is another schematic flowchart of an audio processing method according to an embodiment of the present application.
- the audio processing method can be applied to an electronic device including two microphones arranged back-to-back and separated by a preset distance.
- the flow of the audio processing method may include:
- the electronic device collects sound through two microphones to obtain two audio signals.
- the back-to-back arrangement of the two microphones means that the sound pickup holes of the two microphones face in opposite directions.
- the electronic device includes two microphones, namely a microphone 1 provided on the lower side of the electronic device and a microphone 2 provided on the upper side of the electronic device, wherein the sound hole of microphone 1 faces down and the sound hole of microphone 2 faces up.
- the two microphones provided in the electronic device may be non-directional microphones (or omnidirectional microphones).
- the electronic device when triggering sound collection, can simultaneously collect sound through two microphones arranged back-to-back and separated by a preset distance, thereby collecting two audio signals with the same duration.
- the electronic device can trigger sound collection when conducting a call, can also perform sound collection when receiving a recording instruction input by a user, can also trigger sound collection when performing voiceprint recognition, and so on.
- the electronic device separately frames the two audio signals into multiple audio frames according to the same frame division method to obtain two audio frame sequences.
- the electronic device sequentially selects one audio frame from the two audio frame sequences as the current audio frame of each of the two audio signals.
- after acquiring two audio signals of the same duration through the two microphones, the electronic device separately frames the two audio signals into multiple audio frames according to the same framing method to obtain two audio frame sequences, so that noise suppression can be performed frame by frame.
- the two collected audio signals are denoted as audio signal 1 and audio signal 2, respectively. The electronic device can divide audio signal 1 into n audio frames with a length of 20 milliseconds (composing audio frame sequence 1) and, similarly, divide audio signal 2 into n audio frames with a length of 20 milliseconds (composing audio frame sequence 2).
- noise suppression is then performed according to the first audio frame from audio frame sequence 1 and the first audio frame from audio frame sequence 2 to obtain the first noise-suppressed audio frame, according to the second audio frame from audio frame sequence 1 and the second audio frame from audio frame sequence 2 to obtain the second noise-suppressed audio frame, and so on, up to the n-th audio frame from audio frame sequence 1 and the n-th audio frame from audio frame sequence 2, which yield the n-th noise-suppressed audio frame.
- the audio frame used for noise suppression at the current time is recorded as the current audio frame.
- when selecting the current audio frames, the electronic device can follow the time-domain order of the audio frames in the two audio frame sequences, selecting one audio frame from each of the two audio frame sequences in turn as the current audio frame of each of the two audio signals for noise suppression.
- when the current audio frames are selected for the first time, the first audio frames in the two audio frame sequences can be selected as the current audio frames of the two audio signals; when the current audio frames are selected for the second time, the second audio frames in the two audio frame sequences can be selected as the current audio frames of the two audio signals, and so on, until noise suppression has been performed according to all audio frames in the two audio frame sequences.
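- As an illustration of this framing and frame-selection step, the short sketch below splits each signal into 20-millisecond frames and walks the two frame sequences in lockstep; the sampling rate and the non-overlapping framing are simplifying assumptions, not requirements stated in this application.

```python
# Split each audio signal into equal-length frames and iterate the two
# frame sequences in lockstep (one pair of "current audio frames" at a time).
import numpy as np

def split_into_frames(signal, frame_len):
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]

def frame_pairs(sig1, sig2, fs=16000, frame_ms=20):
    frame_len = fs * frame_ms // 1000           # 20 ms -> 320 samples at 16 kHz
    seq1 = split_into_frames(sig1, frame_len)   # audio frame sequence 1
    seq2 = split_into_frames(sig2, frame_len)   # audio frame sequence 2
    # the k-th frames of the two sequences are the "current audio frames"
    for frame1, frame2 in zip(seq1, seq2):
        yield frame1, frame2
```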
- the electronic device transforms the current audio frames of the two audio signals from the time domain to the frequency domain, and extracts, in the frequency domain, the sub-audio signals from the respective desired directions in the two current audio frames to obtain two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite.
- in other words, the electronic device transforms the current audio frames of the two audio signals from the time domain to the frequency domain, and extracts, in the frequency domain, the sub-audio signal from the respective desired direction (the desired direction of the corresponding microphone) in each of the two current audio frames, obtaining two sub-audio signals.
- the desired direction of the two microphones is opposite, the desired direction of one microphone is the direction toward the target sound source, and the desired direction of the other microphone is the direction away from the target sound source.
- for example, the electronic device collects sound while the owner is talking, so the owner is the target sound source, and the two microphones of the electronic device are recorded as microphone 1 and microphone 2. If microphone 1 is closer to the owner, the desired direction of microphone 1 is toward the owner, and the desired direction of microphone 2 is away from the owner.
- one of the sub-audio signals carries more of the "target sounds" and the other sub-audio signal carries more of the "noise".
- the electronic device divides the frequency bands of the two sub-audio signals, and performs beamforming on the multiple subbands obtained according to the corresponding beamforming filter coefficients to obtain multiple beamforming signals.
- after extracting the two sub-audio signals from the two current audio frames, the electronic device divides the two sub-audio signals in the same frequency band division manner to obtain multiple sub-bands. Afterwards, for each sub-band, beamforming is performed according to the beamforming filter coefficient corresponding to that sub-band to obtain the beamforming signal of the sub-band. In this way, for the multiple divided sub-bands, the electronic device correspondingly obtains multiple beamforming signals.
- the electronic device divides the frequency bands of the two sub-audio signals according to the same frequency band division method to obtain i sub-bands, and performs beam forming on the i sub-bands according to the corresponding beamforming filter coefficients to obtain i beam-forming signals.
- the electronic device obtains a plurality of gain factors respectively used for noise suppression of the plurality of beamforming signals according to corresponding beamforming filter coefficients and respective autocorrelation coefficients of the two sub-audio signals in the plurality of subbands.
- after obtaining the multiple beamforming signals, the electronic device performs an autocorrelation calculation on the two sub-audio signals in each sub-band to obtain the autocorrelation coefficients of the two sub-audio signals in each sub-band. Then, for each sub-band, the gain factor for performing noise suppression on the beamforming signal of that sub-band is obtained according to the beamforming filter coefficient corresponding to the sub-band and the autocorrelation coefficients of the two sub-audio signals in the sub-band. In this way, for the multiple beamforming signals, the electronic device correspondingly obtains the gain factors used to perform noise suppression on the multiple beamforming signals, respectively.
- the electronic device separately performs noise suppression on the multiple beamforming signals according to the multiple gain factors, and converts the multiple beamforming signals after the noise suppression into the time domain by frequency band splicing to obtain an audio frame after the noise suppression.
- the electronic device may perform noise suppression on the multiple beamforming signals according to the multiple gain factors, respectively, to obtain multiple noise-suppressed beamforming signals. After that, the electronic device performs frequency band splicing on the multiple noise-suppressed beamforming signals and converts them to the time domain to obtain a noise-suppressed audio frame.
- the electronic device determines whether the current audio frames of the two audio signals are the last audio frames. If yes, the process proceeds to 209; otherwise, it returns to 203.
- that is, the electronic device determines whether the current audio frames of the two audio signals are the last audio frames, so as to determine from the judgment result whether noise suppression has been performed according to all audio frames in the two audio frame sequences.
- if so, the process proceeds to 209 to synthesize the multiple noise-suppressed audio frames into a complete audio signal.
- if not, the process returns to 203, and the next audio frames are selected from the two audio frame sequences for noise suppression.
- the electronic device performs synthesis processing on a plurality of noise-suppressed audio frames to obtain a noise-suppressed audio signal.
- after performing noise suppression on all audio frames in the two audio frame sequences, the electronic device correspondingly obtains multiple noise-suppressed audio frames, and by performing synthesis processing on these noise-suppressed audio frames, the noise-suppressed audio signal can be obtained.
- how to synthesize the multiple noise-suppressed audio frames can be chosen by a person of ordinary skill in the art according to actual needs, and is not described in detail in the embodiments of the present application.
- when the sub-audio signals from the respective desired directions in the two current audio frames are extracted in the frequency domain to obtain the two sub-audio signals, the electronic device may perform the following:
- the electronic device delay-inverts the current audio frame with the lower energy in the two current audio frames and superimposes the current audio frame with the higher energy to obtain the sub-audio signal from the desired direction in the current audio frame with the higher energy;
- the electronic device delays the current audio frame with a large energy and subtracts the current audio frame with a small energy to obtain a sub-audio signal from its desired direction in the current audio frame with a small energy.
- the microphone closer to the target sound source and the microphone farther from the target sound source in the two microphones are determined according to the energy levels of the two audio frames.
- the electronic device calculates the energy of the two current audio frames to obtain the respective energy of the two current audio frames, determines the microphone corresponding to the current audio frame with the larger energy as the microphone closer to the target sound source, and determines the microphone corresponding to the current audio frame with the smaller energy as the microphone farther away from the target sound source.
- when extracting the sub-audio signals from the respective desired directions in the two current audio frames in the frequency domain, for the current audio frame with the larger energy, the electronic device can delay-invert the current audio frame with the lower energy and superimpose it on the current audio frame with the higher energy to obtain the sub-audio signal from the desired direction in the current audio frame with the higher energy, which can be expressed as:
- X(k) is the frequency-domain representation of the sub-audio signal from its desired direction in the current audio frame with the greater energy;
- A(k) is the frequency-domain representation of the current audio frame with the greater energy;
- B(k) is the frequency-domain representation of the current audio frame with the lesser energy;
- f_k is the frequency corresponding to the k-th frequency point;
- τ is the delay time of the two microphones relative to the target sound source.
- for the current audio frame with the lower energy, the electronic device can delay the current audio frame with the higher energy and subtract the current audio frame with the lower energy to obtain the sub-audio signal from the desired direction in the current audio frame with the lower energy, which can be expressed as:
- Y(k) is the frequency-domain representation of the sub-audio signal from its desired direction in the current audio frame with the lesser energy.
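- The formula images are not reproduced in this text. Based on the definitions of X(k), Y(k), A(k), B(k), f_k, and τ above, one plausible reconstruction (up to a sign convention) is:

```latex
X(k) = A(k) - B(k)\, e^{-j 2 \pi f_k \tau},
\qquad
Y(k) = A(k)\, e^{-j 2 \pi f_k \tau} - B(k)
```

- Read this way, the two expressions form a pair of back-to-back differential (cardioid-like) pickups, one steered toward the target sound source and one away from it.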
- the electronic device may perform beamforming according to the following formula:
- Z(l) represents the beamforming signal of the l-th sub-band;
- w(l) represents the beamforming filter coefficients corresponding to the two current audio frames in the l-th sub-band;
- X(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the greater energy;
- Y(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the lesser energy.
- the electronic device then updates the beamforming filter coefficients to obtain the beamforming filter coefficients corresponding, in the l-th sub-band, to the current audio frames selected next time, expressed as:
- w(l) here represents the beamforming filter coefficient corresponding, in the l-th sub-band, to the current audio frames selected next time;
- μ_l is the convergence step size corresponding to the l-th sub-band.
- the convergence step length corresponding to each sub-band may be the same or different. Specifically, a person of ordinary skill in the art can take an empirical value according to actual needs.
- for example, if the currently selected audio frames are the first audio frames of the two audio frame sequences, with their corresponding beamforming filter coefficients in the l-th sub-band, the electronic device updates the beamforming filter coefficients to obtain the beamforming filter coefficients of the second audio frames of the two audio frame sequences in the l-th sub-band.
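- The formulas for Z(l) and for the coefficient update are likewise not reproduced in this text. The sketch below shows one plausible per-sub-band realization, written as an adaptive noise canceller with an NLMS-style update; the structure, the step normalization, and the function name subband_beamform are assumptions, not the formulas of this application.

```python
# One plausible per-sub-band beamforming step: X_l is the "target-facing" sub-band
# signal, Y_l the "noise-facing" one, w_l a complex coefficient adapted frame by
# frame with an NLMS-style rule (a sketch under assumptions, not the patent's formula).
import numpy as np

def subband_beamform(X_l, Y_l, w_l, mu_l=0.1, eps=1e-12):
    """X_l, Y_l: complex sub-band vectors of the current frame; w_l: current coefficient."""
    Z_l = X_l - w_l * Y_l                          # beamforming signal of the l-th sub-band
    # update toward the coefficient used for the next selected current audio frames
    norm = np.real(np.vdot(Y_l, Y_l)) + eps        # energy of the noise reference
    w_next = w_l + mu_l * np.vdot(Y_l, Z_l) / norm
    return Z_l, w_next
```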
- the electronic device can obtain multiple gain factors according to the following formula:
- G(l) represents the gain factor used for noise suppression of the beamforming signal of the l-th sub-band;
- R_XXl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the greater energy;
- R_YYl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the lesser energy.
- R_XXl = α_l · R_XXl' + (1 − α_l) · (X_l)^H X_l;
- R_YYl = α_l · R_YYl' + (1 − α_l) · (Y_l)^H Y_l;
- where α_l is the smoothing factor corresponding to the l-th sub-band.
- R_XXl' is the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the greater energy among the two current audio frames selected last time;
- R_YYl' is the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the lesser energy among the two current audio frames selected last time.
- for example, if the two currently selected audio frames are the first audio frames in the two audio frame sequences, R_XXl is the autocorrelation coefficient, in the l-th sub-band, of the audio frame with the greater energy among the two first audio frames, and R_YYl is the autocorrelation coefficient, in the l-th sub-band, of the audio frame with the lesser energy among the two first audio frames; when the two current audio frames selected next time are the second audio frames in the two audio frame sequences, these values serve as R_XXl' and R_YYl' in the recursion above.
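- As an illustration of this step, the sketch below smooths the per-sub-band autocorrelations across frames following the recursion above and derives a suppression gain from them. The gain formula used here (a simple energy ratio) and the value of the smoothing factor are assumptions, since G(l) itself is not reproduced in this text.

```python
# Recursive smoothing of R_XX and R_YY per sub-band, followed by an assumed
# ratio-style gain factor (illustrative only).
import numpy as np

def update_gain(X_l, Y_l, R_xx_prev, R_yy_prev, alpha_l=0.8, eps=1e-12):
    R_xx = alpha_l * R_xx_prev + (1 - alpha_l) * np.real(np.vdot(X_l, X_l))
    R_yy = alpha_l * R_yy_prev + (1 - alpha_l) * np.real(np.vdot(Y_l, Y_l))
    G_l = R_xx / (R_xx + R_yy + eps)   # assumed form: more noise energy -> smaller gain
    return G_l, R_xx, R_yy
```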
- when dividing the two sub-audio signals into frequency bands, the electronic device may perform the following:
- the electronic device divides the frequency bands of the two sub-audio signals extracted from the two current audio frames based on the critical frequency band of the human ear masking effect.
- the human-ear masking effect means that the human ear is sensitive to a strong signal at a certain frequency and responds only weakly to relatively weak signals in nearby frequency bands; in other words, a sound at one frequency masks sounds at other frequencies.
- for example, 24 sub-bands can be divided over the range from 20 Hz to 16 kHz, which better matches human hearing and also compresses the amount of data that needs to be processed in the two sub-audio signals.
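- One common way to realize such a 24-band division is the Bark critical-band scale; the band edges below are the commonly cited approximate values, not edges specified by this application.

```python
# Approximate Bark critical-band edges (Hz), giving 24 sub-bands between
# roughly 20 Hz and 16 kHz.
import numpy as np

BARK_EDGES_HZ = [
    20, 100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480,
    1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
    9500, 12000, 15500,
]  # 25 edges -> 24 sub-bands

def bin_to_band(freqs_hz, edges=BARK_EDGES_HZ):
    """Map each FFT-bin frequency (Hz) to the index of the critical band containing it."""
    return np.clip(np.searchsorted(edges, freqs_hz, side="right") - 1, 0, len(edges) - 2)
```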
- when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the electronic device may perform the following:
- the electronic device uses the short-time Fourier transform to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- alternatively, when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the electronic device may perform the following:
- the electronic device uses the fast Fourier transform to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- when performing noise suppression on the multiple beamforming signals according to the multiple gain factors, the electronic device may perform the following:
- the electronic device multiplies the multiple beamforming signals by their corresponding gain factors to obtain the multiple beamforming signals after noise suppression.
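- Concretely, applying the per-sub-band gains amounts to scaling every frequency bin of a beamforming signal by the gain of the band it falls in before splicing the bands back together; the sketch below illustrates this, with the variable names being illustrative.

```python
# Apply one gain per sub-band to all FFT bins, then return to the time domain.
import numpy as np

def apply_gains_and_reconstruct(Z_bins, band_of_bin, gains, n_fft):
    """Z_bins: beamformed rFFT bins; band_of_bin: band index per bin; gains: one gain per band."""
    suppressed = Z_bins * np.asarray(gains)[band_of_bin]  # per-band gain applied bin-wise
    return np.fft.irfft(suppressed, n_fft)                # noise-suppressed time-domain frame
```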
- FIG. 5 is a schematic structural diagram of an audio processing device according to an embodiment of the present application.
- the audio processing apparatus can be applied to electronic equipment including two microphones arranged back-to-back and separated by a preset distance.
- the audio processing device may include an audio acquisition module 401, an audio extraction module 402, a beam forming module 403, a factor acquisition module 404, and a noise suppression module 405, wherein,
- the audio collection module 401 is used to collect sound through two microphones to obtain two audio signals;
- the audio extraction module 402 is used to transform the current audio frames of the two audio signals from the time domain to the frequency domain, and to extract, in the frequency domain, the sub-audio signals from the respective desired directions in the two current audio frames to obtain two sub-audio signals, where the desired directions corresponding to the two current audio frames are opposite;
- the beam forming module 403 is used to divide the frequency bands of the two sub-audio signals, and perform beam forming on the multiple sub-bands obtained according to the corresponding beam forming filter coefficients to obtain multiple beam forming signals;
- the factor obtaining module 404 is configured to obtain multiple gain factors respectively used for performing noise suppression on multiple beamforming signals according to corresponding beamforming filter coefficients and respective autocorrelation coefficients of the two sub-audio signals in multiple subbands;
- the noise suppression module 405 is configured to perform noise suppression on the multiple beamforming signals according to the multiple gain factors, and to convert the multiple noise-suppressed beamforming signals into the time domain after frequency band splicing to obtain a noise-suppressed audio frame.
- when extracting the sub-audio signals from the respective desired directions in the two current audio frames in the frequency domain to obtain two sub-audio signals, the audio extraction module 402 may be used to:
- the current audio frame with the lower energy in the two current audio frames is delayed and inverted, and then superimposed on the current audio frame with the higher energy to obtain the sub audio signal from the desired direction in the current audio frame with the higher energy;
- the current audio frame with the higher energy is delayed and the current audio frame with the lower energy is subtracted, to obtain the sub-audio signal from the desired direction in the current audio frame with the lower energy.
- the beamforming module 403 performs beamforming according to the following formula:
- Z(l) represents the beamforming signal of the l-th sub-band;
- w(l) represents the beamforming filter coefficient corresponding to the l-th sub-band;
- X(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the greater energy;
- Y(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the lesser energy.
- the factor acquisition module 404 acquires multiple gain factors according to the following formula:
- G(l) represents the gain factor used for noise suppression of the beamforming signal of the l-th sub-band;
- R_XXl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the greater energy;
- R_YYl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the lesser energy.
- when performing frequency band division on the two sub-audio signals, the beamforming module 403 may be used to:
- the two sub-audio signals extracted from the two current audio frames are divided into frequency bands.
- when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the audio extraction module 402 may be used to:
- the short-time Fourier transform is used to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- alternatively, when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the audio extraction module 402 may be used to:
- Fast Fourier transform is used to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- when performing noise suppression on the multiple beamforming signals according to the multiple gain factors, the noise suppression module 405 may be used to multiply the multiple beamforming signals by their corresponding gain factors to obtain the multiple noise-suppressed beamforming signals.
- An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
- when the stored computer program is executed on a computer, the computer is caused to perform the steps in the audio processing method provided by the embodiments of the present application.
- An embodiment of the present application further provides an electronic device, including a memory, a processor, and the processor executes the steps in the audio processing method provided by the embodiment of the present application by calling a computer program stored in the memory.
- FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
- the electronic device may include a microphone 601, a memory 602, and a processor 603.
- a person of ordinary skill in the art may understand that the structure of the electronic device shown in FIG. 6 does not constitute a limitation on the electronic device, and it may include more or fewer components than those illustrated, combine certain components, or arrange the components differently.
- the electronic device includes two microphones 601, which are arranged back-to-back with a predetermined distance apart, and the microphone 601 can collect sounds to obtain audio signals.
- the memory 602 may be used to store application programs and data.
- the application program stored in the memory 602 contains executable code.
- the application program can form various functional modules.
- the processor 603 executes application programs stored in the memory 602 to execute various functional applications and data processing.
- the processor 603 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or calling the application programs and data stored in the memory 602, so as to monitor the electronic device as a whole.
- the processor 603 in the electronic device loads the executable code corresponding to the processes of one or more audio processing programs into the memory 602 according to the following instructions, and the processor 603 runs the application programs stored in the memory 602, thereby executing:
- Noise suppression is performed on multiple beamforming signals according to multiple gain factors, and the multiple beamforming signals after noise suppression are band-spliced and converted to the time domain to obtain an audio frame after noise suppression.
- FIG. 7 is another schematic structural diagram of an electronic device provided by an embodiment of the present application.
- the electronic device further includes components such as an input unit 604 and an output unit 605.
- the input unit 604 can be used to receive input numbers, character information, or user characteristic information (such as fingerprints), and generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
- the output unit 605, such as a display screen, may be used to display information input by the user or information provided to the user.
- the processor 603 in the electronic device loads the executable code corresponding to the processes of one or more audio processing programs into the memory 602 according to the following instructions, and the processor 603 runs the application programs stored in the memory 602, thereby executing:
- Noise suppression is performed on multiple beamforming signals according to multiple gain factors, and the multiple beamforming signals after noise suppression are band-spliced and converted to the time domain to obtain an audio frame after noise suppression.
- when extracting the sub-audio signals from the respective desired directions in the two current audio frames in the frequency domain to obtain two sub-audio signals, the processor 603 may execute:
- the current audio frame with the lower energy in the two current audio frames is delayed and inverted, and then superimposed on the current audio frame with the higher energy to obtain the sub audio signal from the desired direction in the current audio frame with the higher energy;
- the current audio frame with the higher energy is delayed and the current audio frame with the lower energy is subtracted, to obtain the sub-audio signal from the desired direction in the current audio frame with the lower energy.
- the processor 603 can perform beamforming according to the following formula:
- Z(l) represents the beamforming signal of the l-th sub-band;
- w(l) represents the beamforming filter coefficient corresponding to the l-th sub-band;
- X(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the greater energy;
- Y(l) represents the sub-band signal, in the l-th sub-band, of the sub-audio signal of the current audio frame with the lesser energy.
- the processor 603 may obtain multiple gain factors according to the following formula:
- G(l) represents the gain factor used for noise suppression of the beamforming signal of the l-th sub-band;
- R_XXl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the greater energy;
- R_YYl represents the autocorrelation coefficient, in the l-th sub-band, of the current audio frame with the lesser energy.
- when performing frequency band division on the two sub-audio signals, the processor 603 may execute:
- the two sub-audio signals extracted from the two current audio frames are divided into frequency bands.
- when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the processor 603 may execute:
- the short-time Fourier transform is used to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- alternatively, when transforming the current audio frames of the two audio signals from the time domain to the frequency domain, the processor 603 may execute:
- Fast Fourier transform is used to transform the current audio frames of the two audio signals from the time domain to the frequency domain.
- when performing noise suppression on the multiple beamforming signals according to the multiple gain factors, the processor 603 may multiply the multiple beamforming signals by their corresponding gain factors to obtain the multiple noise-suppressed beamforming signals.
- the audio processing device/electronic device provided by the embodiments of the present application and the audio processing method in the above embodiments belong to the same concept, and any method provided in the audio processing method embodiments can be run on the audio processing device/electronic device; for the specific implementation process, please refer to the audio processing method embodiments, which will not be repeated here.
- the computer program may be stored in a computer-readable storage medium, such as a memory, and executed by at least one processor, and the execution process may include the processes of the audio processing method embodiments.
- the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM, Read Only Memory), a random access memory (RAM, Random Access Memory), etc.
- each functional module may be integrated into one processing chip, or each module may exist alone physically, or two or more modules may be integrated into one module.
- the above integrated modules may be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium, such as a read-only memory, magnetic disk, or optical disk.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201880098308.0A CN113168843B (zh) | 2018-11-21 | 2018-11-21 | 音频处理方法、装置、存储介质及电子设备 |
PCT/CN2018/116718 WO2020103035A1 (fr) | 2018-11-21 | 2018-11-21 | Procédé et appareil de traitement audio, ainsi que support de stockage et dispositif électronique |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2018/116718 WO2020103035A1 (fr) | 2018-11-21 | 2018-11-21 | Procédé et appareil de traitement audio, ainsi que support de stockage et dispositif électronique |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020103035A1 (fr) | 2020-05-28 |
Family
ID=70773279
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/116718 WO2020103035A1 (fr) | 2018-11-21 | 2018-11-21 | Procédé et appareil de traitement audio, ainsi que support de stockage et dispositif électronique |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113168843B (fr) |
WO (1) | WO2020103035A1 (fr) |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101582264A (zh) * | 2009-06-12 | 2009-11-18 | 瑞声声学科技(深圳)有限公司 | 语音增强的方法及语音增加的声音采集系统 |
CN101593522B (zh) * | 2009-07-08 | 2011-09-14 | 清华大学 | 一种全频域数字助听方法和设备 |
CN101763858A (zh) * | 2009-10-19 | 2010-06-30 | 瑞声声学科技(深圳)有限公司 | 双麦克风信号处理方法 |
CN101976565A (zh) * | 2010-07-09 | 2011-02-16 | 瑞声声学科技(深圳)有限公司 | 基于双麦克风语音增强装置及方法 |
US20130121498A1 (en) * | 2011-11-11 | 2013-05-16 | Qsound Labs, Inc. | Noise reduction using microphone array orientation information |
CN105976822B (zh) * | 2016-07-12 | 2019-12-03 | 西北工业大学 | 基于参数化超增益波束形成器的音频信号提取方法及装置 |
- 2018-11-21: CN application CN201880098308.0A filed (granted as CN113168843B, status: active)
- 2018-11-21: PCT application PCT/CN2018/116718 filed (published as WO2020103035A1, status: active application filing)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1953059A (zh) * | 2006-11-24 | 2007-04-25 | 北京中星微电子有限公司 | 一种噪声消除装置和方法 |
CN101466056A (zh) * | 2008-12-31 | 2009-06-24 | 瑞声声学科技(常州)有限公司 | 麦克风消噪方法及装置 |
CN102111697A (zh) * | 2009-12-28 | 2011-06-29 | 歌尔声学股份有限公司 | 一种麦克风阵列降噪控制方法及装置 |
US20120045074A1 (en) * | 2010-08-17 | 2012-02-23 | C-Media Electronics Inc. | System, method and apparatus with environmental noise cancellation |
CN102347028A (zh) * | 2011-07-14 | 2012-02-08 | 瑞声声学科技(深圳)有限公司 | 双麦克风语音增强装置及方法 |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112309425A (zh) * | 2020-10-14 | 2021-02-02 | 浙江大华技术股份有限公司 | 一种声音变调方法、电子设备及计算机可读存储介质 |
CN113160846A (zh) * | 2021-04-22 | 2021-07-23 | 维沃移动通信有限公司 | 噪声抑制方法和电子设备 |
CN113160846B (zh) * | 2021-04-22 | 2024-05-17 | 维沃移动通信有限公司 | 噪声抑制方法和电子设备 |
WO2023016032A1 (fr) * | 2021-08-12 | 2023-02-16 | 北京荣耀终端有限公司 | Procédé de traitement vidéo et dispositif électronique |
CN114071220A (zh) * | 2021-11-04 | 2022-02-18 | 深圳Tcl新技术有限公司 | 音效调节方法、装置、存储介质及电子设备 |
CN114071220B (zh) * | 2021-11-04 | 2024-01-19 | 深圳Tcl新技术有限公司 | 音效调节方法、装置、存储介质及电子设备 |
Also Published As
Publication number | Publication date |
---|---|
CN113168843B (zh) | 2022-04-22 |
CN113168843A (zh) | 2021-07-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 18940509; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: PCT application non-entry in European phase | Ref document number: 18940509; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the EP bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.09.2021) |