WO2022000174A1 - Audio processing method and apparatus, and electronic device - Google Patents

Audio processing method and apparatus, and electronic device Download PDF

Info

Publication number
WO2022000174A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
sound source
gain
electronic device
frame
Prior art date
Application number
PCT/CN2020/098886
Other languages
English (en)
Chinese (zh)
Inventor
莫品西
边云锋
刘洋
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN202080030168.0A priority Critical patent/CN113767432A/zh
Priority to PCT/CN2020/098886 priority patent/WO2022000174A1/fr
Publication of WO2022000174A1 publication Critical patent/WO2022000174A1/fr

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • G10L21/0216Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166Microphone arrays; Beamforming
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2410/00Microphones
    • H04R2410/01Noise reduction using microphones having different directional characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • the present application relates to the technical field of audio processing, and in particular, to an audio processing method, an audio processing apparatus, an electronic device, and a computer-readable storage medium.
  • Directional pickup is a technology that only picks up sounds from sources in a specified direction. This technology is widely used in the professional recording and film and television industries. However, with the rise of multimedia applications such as self-media and Vlogs, the demand for directional pickup among ordinary consumers has also increased.
  • the present application provides an audio processing method, an audio processing apparatus, an electronic device, and a computer-readable storage medium, so as to realize directional sound pickup in any direction.
  • a first aspect of the present application provides an audio processing method, including:
  • a target audio signal is synthesized based on the gain-adjusted audio components.
  • a second aspect of the present application provides an audio processing device, comprising: a processor and a memory storing a computer program
  • the processor implements the following steps when executing the computer program:
  • a target audio signal is synthesized based on the gain-adjusted audio components.
  • a third aspect of the present application provides an electronic device, including: a processor and a memory storing a computer program;
  • the processor implements the following steps when executing the computer program:
  • a target audio signal is synthesized based on the gain-adjusted audio components.
  • a fourth aspect of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium; when the computer program is executed by a processor, the audio processing method described in the first aspect above is implemented.
  • the audio processing method provided by the embodiments of the present application pays attention to the audio components of different frequencies contained in the audio signal to be processed, determines the sound source direction for each audio component, and adjusts the gain of each audio component according to the matching degree between its sound source direction and the target direction, so that in the synthesized target audio signal the sound from the target direction is more prominent and directional sound pickup is realized. Moreover, since the gain adjustment can be performed on audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be flexibly set according to requirements, so directional pickup in any direction can be achieved.
  • FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application.
  • FIG. 2 is an algorithm block diagram of an audio processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Directional pickup, that is, picking up sounds from a specified direction.
  • With the rise of multimedia applications such as self-media and Vlogs, the demand for directional pickup is gradually increasing among ordinary consumers.
  • the other is to achieve directional pickup based on algorithms, such as beamforming algorithms based on microphone arrays.
  • the beamforming algorithm based on the microphone array can perform directional pickup for any direction of interest.
  • After the direction of interest is determined, the algorithm can adjust the phase and/or amplitude of the audio signals collected by each microphone in the microphone array, so that the audio signals collected by each microphone are enhanced in the direction of interest, and then each adjusted audio signal is weighted and summed to synthesize the final audio signal required to achieve directional pickup.
  • Although the above-mentioned beamforming algorithm can perform directional pickup in any direction, its directivity performance depends on the size of the microphone array and the placement of the microphones; unreasonable placement of the microphones will also lead to unsatisfactory directivity in some frequency bands.
  • In addition, the directivity strength of the above-mentioned beamforming algorithms differs from frequency to frequency: they often have good directivity for high-frequency signals, while low-frequency signals have almost no directivity.
  • FIG. 1 is a flowchart of an audio processing method provided by an embodiment of the present application.
  • the audio processing method can be applied to various electronic devices with a sound pickup function, including but not limited to mobile phones, cameras, video cameras, action cameras, PTZ cameras, voice recorders, microphones, wearable electronic devices, smart speakers, smart home appliances, surveillance devices, intelligent robots, etc.
  • the method can also be applied to an audio processing device with processing capability, and the audio processing device can be used to perform post-processing on audio signals collected by other devices.
  • Step 101: Acquire an audio signal to be processed.
  • the audio signal to be processed includes audio components of different frequencies.
  • Step 102: Determine the sound source direction corresponding to each audio component.
  • Step 103: Adjust the gain of the audio component based on the degree of matching between the sound source direction and the target direction.
  • Step 104: Synthesize a target audio signal based on the gain-adjusted audio component.
  • the sound of the sound field can be collected by at least two microphones.
  • the audio signal to be processed may be an audio signal collected by any microphone in the sound field, or an audio signal synthesized by using audio signals collected by several microphones in the sound field.
  • the audio signal to be processed includes audio components of different frequencies.
  • Fourier transform may be performed on the audio signal to be processed to transform the audio signal to be processed from the time domain to the frequency domain, so as to determine the audio components of different frequencies contained in the audio signal.
  • filtering methods, subband analysis methods, etc. can also be used as alternatives to the Fourier transform, and these alternative means can likewise determine the audio components included in the audio signal to be processed.
  • For the audio component of each frequency, its corresponding sound source direction can be determined.
  • The sound source direction of an audio component may be determined based on a sound source localization algorithm.
  • sound source localization algorithms such as beamforming algorithm, time difference of arrival estimation algorithm, differential microphone array algorithm, etc.
  • the sound source direction corresponding to the audio component of each frequency in the sound field can be calculated by using the audio signals collected by at least two microphones in the sound field.
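  • For illustration, one common way to obtain a per-frequency sound source direction from two microphones is to look at the inter-microphone phase difference in each frequency bin. The sketch below (Python/numpy) follows this idea; the two-microphone geometry, function name, and parameters are illustrative assumptions, not the patent's specific localization algorithm:

```python
import numpy as np

def per_bin_doa(x1, x2, fs, mic_distance, c=343.0):
    """Estimate a sound source direction for every frequency bin from the
    phase difference between two microphone signals (one frame per input).
    Returns angles in radians relative to the array broadside."""
    n = len(x1)
    window = np.hanning(n)
    X1 = np.fft.rfft(x1 * window)
    X2 = np.fft.rfft(x2 * window)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    # Phase difference per bin; the cross-spectrum keeps it in (-pi, pi].
    phase_diff = np.angle(X1 * np.conj(X2))

    # Convert phase difference to time delay, then to an arrival angle.
    # Above c / (2 * mic_distance) the phase wraps (spatial aliasing),
    # which this simple sketch ignores.
    with np.errstate(divide="ignore", invalid="ignore"):
        delay = phase_diff / (2.0 * np.pi * freqs)
    delay[0] = 0.0  # DC bin carries no usable phase information
    sin_theta = np.clip(delay * c / mic_distance, -1.0, 1.0)
    return np.arcsin(sin_theta)
```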
  • The direction of the sound source may be represented by an azimuth angle and/or a pitch angle.
  • Since the audio signals collected by the at least two microphones mainly participate in the calculation of the sound source direction, the at least two microphones may be called directional microphones.
  • The audio signal to be processed may be obtained using one or more of the audio signals collected by the directional microphones.
  • the audio signal to be processed may be the audio signal with the highest signal-to-noise ratio among the audio signals collected by the directional microphone.
  • the audio signal to be processed may be an audio signal synthesized from audio signals collected by a directional microphone.
  • the audio signal to be processed may also be obtained according to audio signals collected by other microphones than the directional microphone.
  • For example, the microphone array may include 6 microphones, 3 of which can be selected as directional microphones; the audio signal to be processed can be obtained using the audio signals collected by these 3 directional microphones, or it can be obtained from the audio signals collected by the other 3 microphones.
  • the audio signal to be processed may also be determined according to audio signals collected by other microphones outside the microphone array, and the other microphones may also be microphones on other devices.
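  • As a toy illustration of assembling the audio signal to be processed from several microphone channels, one might either pick the channel with the largest short-term energy (a rough stand-in for the highest signal-to-noise ratio) or average all channels; the helper below is purely illustrative, and its name and threshold are assumptions:

```python
import numpy as np

def pick_reference_signal(mic_signals):
    """Choose the audio signal to be processed from several microphone
    channels: the channel with the highest short-term energy, falling back
    to the mean of all channels when the energies are nearly equal."""
    mic_signals = np.asarray(mic_signals, dtype=float)  # shape: (mics, samples)
    energies = np.sum(mic_signals ** 2, axis=1)
    if np.ptp(energies) < 1e-6 * np.max(energies):
        return mic_signals.mean(axis=0)
    return mic_signals[np.argmax(energies)]
```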
  • A direction can correspond to an angle, a range that an angle falls into (such as southeast, northwest, front, back, left, right, or an interval), a vector, or a coordinate (the direction can be determined from that coordinate and a reference point coordinate).
  • the target direction may be the direction of interest of the user. In one embodiment, it may be a user-set direction.
  • the user can interact with the electronic device to which the method provided by this application is applied, and set the target direction by inputting direction information.
  • In another embodiment, the above-mentioned electronic device may have a camera whose pose information can be changed; the electronic device may obtain the pose information of the camera to determine the orientation of the camera, and may set the target direction to match the orientation of the camera.
  • For example, the electronic device can be equipped with a pan-tilt (PTZ) mount on which a camera is installed, and the camera can be rotated in all directions under the control of the PTZ.
  • Alternatively, the camera can be installed on a slide rail and driven by a motor to slide along it.
  • In short, as long as the camera can move relative to the device body, it belongs to the camera with variable pose information mentioned in this application.
  • the gain of the audio component can be adjusted according to the matching degree between the sound source direction and the target direction.
  • the gain adjustment may be performed on the audio component with a high degree of matching between the sound source direction and the target direction, for example, the gain of the audio component with a high degree of matching may be increased.
  • gain adjustment may also be performed on the audio components with low matching degree, for example, reducing the gain of the audio components with low matching degree.
  • the user may wish to attenuate the sound in the target direction.
  • In this case, the gain of the audio component with a high degree of matching between the sound source direction and the target direction may be reduced, or the gain of the audio component with a low matching degree may be increased, or the gain of the high-matching audio component may be reduced while the gain of the low-matching audio component is increased at the same time.
  • the matching degree may be determined according to the difference between the sound source direction and the target direction.
  • a difference threshold may be set, and when the difference between the sound source direction and the target direction is less than the difference threshold, it is considered that the sound source direction and the target direction have a high degree of matching.
  • the difference between the sound source direction and the target direction can also be expressed in other ways, for example, the difference can be expressed by a level, for example, the sound source direction falls into the third interval, and the target direction is in the first interval. If there is a second interval between the first interval and the third interval, it can be determined that the difference between the sound source direction and the target direction is the second level.
  • there are many other expressions which are not listed here.
  • the target audio signal can be synthesized based on the audio components after the gain adjustment.
  • the synthesis of the target audio signal can be considered as the transformation from the frequency domain to the time domain, which can be implemented in various ways, such as through inverse Fourier transform, of course, there can also be other ways.
  • the audio processing method provided by the embodiments of the present application pays attention to the audio components of different frequencies contained in the audio signal to be processed, determines the sound source direction for each audio component, and adjusts the gain of each audio component according to the matching degree between its sound source direction and the target direction, so that in the synthesized target audio signal the sound from the target direction is more prominent and directional sound pickup is realized. Moreover, since the gain adjustment can be performed on audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be flexibly set according to requirements, so directional pickup in any direction can be achieved. Compared with directional sound pickup based on the beamforming algorithm, the method provided by the embodiments of the present application only needs a small microphone array or a small number (more than two) of microphones to meet the requirement of strong directivity.
  • the gain coefficient of the audio component may be determined according to the matching degree between the sound source direction and the target direction, so that the gain of the audio component is adjusted by the gain coefficient.
  • In an implementation manner, the gain coefficient may be determined according to a preset correspondence.
  • the preset correspondence may be the correspondence between the matching degree and the gain coefficient. Therefore, after the matching degree of the audio component is determined, the gain coefficient corresponding to the matching degree may be determined through the preset correspondence.
  • the preset corresponding relationship can be flexibly adjusted according to the needs during specific setting. For example, in the preset correspondence relationship, the higher the matching degree, the larger the gain coefficient, that is, the matching degree and the gain coefficient are positively correlated.
  • Further, the step by which the gain coefficient changes as the matching degree changes can be limited: when the matching degree changes by one unit, the corresponding change in the gain coefficient is less than or equal to a specified change amount, so that the gain coefficient changes relatively smoothly and the synthesized target audio signal sounds more natural.
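  • A simple preset correspondence with these properties is a continuous, monotonically increasing mapping from matching degree to gain coefficient, for example the linear mapping sketched below (the gain floor and ceiling are illustrative choices, not values from the patent):

```python
def gain_from_matching(matching, min_gain=0.1, max_gain=1.0):
    """Preset correspondence between matching degree and gain coefficient:
    positively correlated and continuous, so the gain changes by at most
    (max_gain - min_gain) per unit of matching degree."""
    matching = min(max(matching, 0.0), 1.0)
    return min_gain + (max_gain - min_gain) * matching
```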
  • In another implementation manner, the preset correspondence may also be the correspondence between the sound source direction and the gain coefficient; that is, when determining the gain coefficient of a certain audio component, the gain coefficient corresponding to the sound source direction of the audio component can be looked up in the preset correspondence.
  • In this case, the gain coefficient needs to be set according to the matching relationship between the sound source direction and the target direction. For example, if the target direction is 12 o'clock, the gain coefficient corresponding to a sound source direction of 12 o'clock can be set to 1 in the preset correspondence, the gain coefficient corresponding to 11 o'clock to 0.8, the gain coefficient corresponding to 10 o'clock to 0.5, and so on.
  • Although the two variables in the preset correspondence are the sound source direction and the gain coefficient, and the matching degree between the sound source direction and the target direction does not appear explicitly, the numerical value of the gain coefficient corresponding to each sound source direction is consistent with the matching degree between that sound source direction and the target direction.
  • the preset correspondence also has various expressions, one of which can be a function, which can be freely set according to requirements, and the function can reflect the change of the gain coefficient with the direction of the sound source.
  • the gain factor can change continuously and smoothly with the change of the sound source direction.
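  • One possible functional form is a raised-cosine lobe centred on the target direction, which makes the gain coefficient vary continuously and smoothly with the sound source direction, as in the following sketch (the lobe width and gain floor are illustrative assumptions):

```python
import math

def gain_from_direction(source_dir_deg, target_dir_deg,
                        beam_width_deg=120.0, min_gain=0.05):
    """Gain coefficient as a smooth function of the sound source direction:
    a raised-cosine lobe centred on the target direction, falling to
    min_gain outside the lobe."""
    diff = abs(source_dir_deg - target_dir_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    half_width = beam_width_deg / 2.0
    if diff >= half_width:
        return min_gain
    lobe = 0.5 * (1.0 + math.cos(math.pi * diff / half_width))
    return min_gain + (1.0 - min_gain) * lobe
```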
  • When adjusting the gain of the audio component, in addition to the matching degree between the sound source direction of the audio component and the target direction, the frequency of the audio component may also be taken into account; that is, the gain of the audio component can be adjusted according to both the matching degree of its sound source direction and its frequency.
  • Accordingly, the preset correspondence may also be the correspondence between the gain coefficient and the two parameters of sound source direction and frequency; that is, in the preset correspondence, the gain coefficient of an audio component is uniquely determined only when both the frequency and the sound source direction of the audio component are determined.
  • For example, the directivity corresponding to the low-frequency part can be set to be weaker and the directivity of the high-frequency part to be stronger, i.e. the gain coefficient corresponding to low frequencies is set smaller than the gain coefficient corresponding to high frequencies. In this way, the target audio signal synthesized based on the preset correspondence is closer to the actual hearing of the human ear in terms of listening effect.
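  • To make the gain depend on frequency in this way — weaker directivity (a wider lobe) at low frequencies, stronger directivity (a narrower lobe) at high frequencies — the correspondence can take both the direction and the frequency as inputs. The sketch below interpolates the lobe width with frequency; all numeric values are illustrative assumptions:

```python
import math

def gain_direction_frequency(source_dir_deg, target_dir_deg, freq_hz,
                             low_hz=300.0, high_hz=4000.0,
                             wide_deg=180.0, narrow_deg=60.0, min_gain=0.05):
    """Gain coefficient depending on both sound source direction and frequency:
    the pickup lobe is wide (weak directivity) at low frequencies and narrow
    (strong directivity) at high frequencies."""
    # Interpolate the lobe width between wide_deg and narrow_deg over [low_hz, high_hz].
    t = min(max((freq_hz - low_hz) / (high_hz - low_hz), 0.0), 1.0)
    beam_width = wide_deg + (narrow_deg - wide_deg) * t

    diff = abs(source_dir_deg - target_dir_deg) % 360.0
    diff = min(diff, 360.0 - diff)
    half_width = beam_width / 2.0
    if diff >= half_width:
        return min_gain
    lobe = 0.5 * (1.0 + math.cos(math.pi * diff / half_width))
    return min_gain + (1.0 - min_gain) * lobe
```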
  • the audio signal to be processed may be an audio frame of the original audio signal, that is, the audio signal to be processed may be obtained by performing frame-by-frame processing on the original audio signal.
  • the audio frame includes a preset number of sampling points, and the audio frame may be referred to as a first audio frame.
  • the synthesized target audio signal is also an audio frame, and the audio frame corresponds to the first audio frame and may be referred to as a second audio frame.
  • The reason why the original audio signal is divided into frames is that when the signal is transformed from the time domain to the frequency domain, the transformation algorithm requires the input signal to be stationary, and within the duration of one frame a signal can be considered stationary. Therefore, the original audio signal can be split into frames according to a set frame length to obtain multiple audio frames, and the audio signal to be processed can be any one of these audio frames.
  • The number of sampling points in the first audio frame may be a power of 2, so that when analyzing the audio components contained in the first audio frame (the audio signal to be processed), the fast Fourier transform (FFT) can be adopted to speed up the calculation.
  • the first audio frame may be modulated into a periodic signal prior to analyzing the spectrum of the first audio frame.
  • the specific method of modulating into a periodic signal may be adding an analysis window to the first audio frame, that is, multiplying the first audio frame by a window function of the analysis window.
  • the window function of the analysis window can be a sine window, a Hanning window, or the like.
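  • A minimal sketch of this analysis step — a frame of power-of-two length, an analysis window, then an FFT — might look as follows (Python/numpy; the Hanning window is just one of the possible choices named above):

```python
import numpy as np

def analyze_frame(frame):
    """Modulate one first audio frame into a (quasi-)periodic signal with an
    analysis window and compute its spectrum with an FFT. The frame length is
    assumed to be a power of two so the FFT is fast."""
    n = len(frame)
    assert n & (n - 1) == 0, "frame length should be a power of 2"
    analysis_window = np.hanning(n)        # one possible analysis window
    windowed = frame * analysis_window
    spectrum = np.fft.rfft(windowed)       # audio components of different frequencies
    return spectrum, analysis_window
```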
  • the audio signal to be processed is one audio frame (the first audio frame) of the original audio signal
  • the synthesized target audio signal is correspondingly also one audio frame (the second audio frame). Since the frame shift (the number of sampling points between the starts of two adjacent frames) is always smaller than the frame length (the number of sampling points in one frame) when the original audio signal is divided into frames, adjacent audio frames will have overlapping sampling points.
  • Therefore, after the second audio frame is synthesized, it can be processed by the overlap-add method: the sampling points where the second audio frame overlaps with the previous audio frame are accumulated.
  • the overlapping part may have a sudden change in amplitude.
  • the amplitude distortion at both ends of the second audio frame can be eliminated before accumulation.
  • a specific method for eliminating amplitude distortion may be adding a synthesis window to the second audio frame.
  • The window function of the synthesis window may be, for example, a sine window or a Hanning window.
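  • A corresponding synthesis sketch, applying a synthesis window to each second audio frame and accumulating the overlapping sampling points (overlap-add), is shown below (the frame shift and window choice are illustrative):

```python
import numpy as np

def overlap_add(frames, frame_shift):
    """Reconstruct a time-domain signal from gain-adjusted second audio frames:
    apply a synthesis window to each frame (to avoid amplitude jumps at the
    frame edges) and accumulate the samples that overlap with previous frames.
    `frames` is a list of equal-length numpy arrays."""
    frame_len = len(frames[0])
    synthesis_window = np.hanning(frame_len)
    out = np.zeros(frame_shift * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        start = i * frame_shift
        out[start:start + frame_len] += frame * synthesis_window
    return out
```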
  • the audio processing method provided by the embodiments of the present application pays attention to the audio components of different frequencies contained in the audio signal to be processed, determines the sound source direction for each audio component, and adjusts the gain of each audio component according to the matching degree between its sound source direction and the target direction, so that in the synthesized target audio signal the sound from the target direction is more prominent and directional sound pickup is realized. Moreover, since the gain adjustment can be performed on audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be flexibly set according to requirements, so directional pickup in any direction can be achieved. Compared with directional sound pickup based on the beamforming algorithm, the method provided by the embodiments of the present application only needs a small microphone array or a small number (more than two) of microphones to meet the requirement of strong directivity.
  • each channel of the audio signals to be processed may be processed by the audio processing method provided in this embodiment of the present application.
  • the target directions for the directivity processing can be the same or different.
  • For example, there may be two channels of audio signals to be processed: one channel may perform directional pickup toward the front, and the other channel may perform directional pickup toward the rear.
  • FIG. 2 is an algorithm block diagram of an audio processing method provided by an embodiment of the present application.
  • the original audio signal (the to-be-processed audio signal is an audio frame of the original audio signal) can be represented by s_i(t), where i denotes the i-th original audio signal.
  • the original audio signal s_i(t) and the audio signal x_m(t) collected by each microphone can be divided into frames to obtain the first audio frame s_i(n)_l corresponding to the original audio signal and the audio frame x_m(n)_l corresponding to each microphone signal, where l denotes the frame index.
  • An analysis window is applied to the audio frame x_m(n)_l corresponding to the audio signal collected by each microphone and to the first audio frame s_i(n)_l, respectively, to obtain x'_m(n)_l and s'_i(n)_l.
  • The windowed x'_m(n)_l and s'_i(n)_l are input to the FFT module to obtain the spectra X_m(k)_l and S_i(k)_l of the corresponding time-domain audio frames, where k denotes the discrete spectrum index, k = 1, 2, ..., N.
  • the gain coefficient determination module includes a preset corresponding relationship, and the preset corresponding relationship can be the corresponding relationship between the gain coefficient and the two parameters of the sound source direction and frequency.
  • In this example, the preset correspondence can be expressed by a function G_i(θ, φ, k), where θ and φ represent the sound source direction and k represents the frequency.
  • The function G_i(θ, φ, k) can be set flexibly; for the specific setting method, reference may be made to the setting of the preset correspondence described above.
  • The gain-adjusted audio components S_i(k)_l are input to the inverse fast Fourier transform (IFFT) module and transformed from the frequency domain back to the time domain to obtain the time-domain audio frame s'_i(n)_l.
  • A synthesis window can then be applied to each s'_i(n)_l to obtain s''_i(n)_l.
  • The windowed audio frames s''_i(n)_l are then combined by the overlap-add method.
  • This yields the restored audio frame s_i(n)_l, and the final complete target audio signal can be synthesized using each restored audio frame s_i(n)_l.
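  • Putting the blocks of FIG. 2 together, an end-to-end sketch could look like the following. The per-bin phase-difference localizer, the raised-cosine gain function, and all parameter values are illustrative assumptions rather than the patent's specific implementation:

```python
import numpy as np

def directional_pickup(x_ref, x1, x2, fs, mic_distance,
                       target_dir_rad=0.0, frame_len=1024,
                       c=343.0, min_gain=0.05, beam_width_rad=np.pi / 2):
    """End-to-end sketch: frame the signals, window and FFT each frame,
    estimate a sound source direction per frequency bin from the two
    directional microphones (x1, x2), weight each bin of the reference
    signal (x_ref) by a gain depending on how well that direction matches
    the target direction, then IFFT, window again and overlap-add."""
    hop = frame_len // 2
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, 1.0 / fs)
    n_frames = 1 + (len(x_ref) - frame_len) // hop
    out = np.zeros(len(x_ref))

    for l in range(n_frames):
        sl = slice(l * hop, l * hop + frame_len)
        S = np.fft.rfft(x_ref[sl] * window)   # spectrum of the frame to process
        X1 = np.fft.rfft(x1[sl] * window)
        X2 = np.fft.rfft(x2[sl] * window)

        # Per-bin direction of arrival from the inter-microphone phase difference.
        phase = np.angle(X1 * np.conj(X2))
        with np.errstate(divide="ignore", invalid="ignore"):
            delay = phase / (2.0 * np.pi * freqs)
        delay[0] = 0.0
        theta = np.arcsin(np.clip(delay * c / mic_distance, -1.0, 1.0))

        # Gain per bin: raised-cosine lobe around the target direction.
        diff = np.abs(theta - target_dir_rad)
        gain = np.where(diff < beam_width_rad / 2,
                        min_gain + (1 - min_gain) * 0.5 *
                        (1 + np.cos(np.pi * diff / (beam_width_rad / 2))),
                        min_gain)

        frame_out = np.fft.irfft(S * gain, n=frame_len)
        out[sl] += frame_out * window         # synthesis window + overlap-add
    return out
```

  • Using the same window for analysis and synthesis at a 50% frame shift keeps the overlap-add roughly amplitude-preserving, mirroring the analysis-window, gain, and Overlap-add steps described above.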
  • FIG. 3 is a schematic structural diagram of an audio processing apparatus provided by an embodiment of the present application.
  • the audio processing apparatus may include: a processor 310 and a memory 320 storing a computer program;
  • the processor implements the following steps when executing the computer program:
  • a target audio signal is synthesized based on the gain-adjusted audio components.
  • the processor, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, is specifically configured to determine the gain coefficient of the audio component based on the matching degree between the sound source direction and the target direction, and adjust the gain of the audio component according to the gain coefficient.
  • the gain coefficient of the audio component is determined according to a preset corresponding relationship, and the preset corresponding relationship is a corresponding relationship between the matching degree and the gain coefficient.
  • the gain coefficient corresponding to the sound source direction is positively correlated with the matching degree.
  • the corresponding change amount of the gain coefficient when the matching degree changes by one unit is less than or equal to a specified change amount.
  • the matching degree is determined according to the difference between the sound source direction and the target direction.
  • the processor, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, is specifically configured to adjust the gain of the audio component according to the matching degree between the sound source direction and the target direction and the frequency of the audio component.
  • the sound source direction is determined based on a sound source localization algorithm, using at least two microphones to collect audio signals from the same sound field.
  • the sound source localization algorithm includes any one of the following: a beamforming algorithm, a time difference of arrival estimation algorithm, and a differential microphone array algorithm.
  • the to-be-processed audio signal is obtained by using one or more of the audio signals collected by the at least two microphones.
  • the to-be-processed audio signal is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the at least two microphones.
  • the audio signal to be processed is synthesized by using audio signals collected by the at least two microphones.
  • the to-be-processed audio signal is obtained according to audio signals collected by other microphones other than the at least two microphones.
  • the audio signal to be processed is a first audio frame including a preset number of sampling points
  • the target audio signal is a second audio frame corresponding to the first audio frame.
  • the preset number is a power of 2.
  • the audio components of different frequencies are determined by performing fast Fourier transform on the first audio frame.
  • the processor is further configured to modulate the first audio frame into a periodic signal before determining the audio components of different frequencies included in the first audio frame.
  • the processor when the processor modulates the first audio frame into a periodic signal, the processor is specifically configured to add an analysis window to the first audio frame.
  • the sampling points where the second audio frame overlaps with the previous audio frame are accumulated.
  • the processor is further configured to eliminate the distortion of the amplitudes at both ends of the second audio frame before accumulating the sampling points overlapping the second audio frame and the previous audio frame.
  • the processor is specifically configured to add a synthesis window to the second audio frame when performing the removing the distortion of the amplitudes at both ends of the second audio frame.
  • the target direction is set according to the direction information input by the user.
  • the electronic device has a camera whose pose information can be changed, and the target direction is determined according to the orientation of the camera.
  • the sound source direction includes: an azimuth angle and/or a pitch angle.
  • the audio processing device provided by the embodiments of the present application pays attention to the audio components of different frequencies contained in the audio signal to be processed, determines the sound source direction for each audio component, and adjusts the gain of each audio component according to the matching degree between its sound source direction and the target direction, so that in the synthesized target audio signal the sound from the target direction is more prominent and directional sound pickup is realized. Moreover, since the gain adjustment can be performed on audio components of different frequencies, the directivity at different frequencies is flexibly controllable. In addition, the target direction can be flexibly set according to requirements, so directional pickup in any direction can be achieved. Compared with directional sound pickup based on the beamforming algorithm, the device provided by the embodiments of the present application only needs a small microphone array or a small number (more than two) of microphones to meet the requirement of strong directivity.
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device includes: a processor 410 and a memory 420 storing a computer program;
  • the processor implements the following steps when executing the computer program:
  • a target audio signal is synthesized based on the gain-adjusted audio components.
  • the processor, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, is specifically configured to determine the gain coefficient of the audio component based on the matching degree between the sound source direction and the target direction, and adjust the gain of the audio component according to the gain coefficient.
  • the gain coefficient of the audio component is determined according to a preset corresponding relationship, and the preset corresponding relationship is the corresponding relationship between the matching degree and the gain coefficient.
  • the gain coefficient corresponding to the sound source direction is positively correlated with the matching degree.
  • when the matching degree changes by one unit, the corresponding change amount of the gain coefficient is less than or equal to a specified change amount.
  • the matching degree is determined according to the difference between the sound source direction and the target direction.
  • the processor, when adjusting the gain of the audio component based on the matching degree between the sound source direction and the target direction, is specifically configured to adjust the gain of the audio component according to the matching degree between the sound source direction and the target direction and the frequency of the audio component.
  • it also includes: at least two microphones;
  • the sound source direction is determined based on a sound source localization algorithm using audio signals collected from the same sound field by the at least two microphones.
  • the sound source localization algorithm includes any one of the following: a beamforming algorithm, a time difference of arrival estimation algorithm, and a differential microphone array algorithm.
  • the to-be-processed audio signal is obtained by using one or more of the audio signals collected by the at least two microphones.
  • the to-be-processed audio signal is the audio signal with the highest signal-to-noise ratio among the audio signals collected by the at least two microphones.
  • the audio signal to be processed is synthesized by using audio signals collected by the at least two microphones.
  • the to-be-processed audio signal is obtained according to audio signals collected by other microphones other than the at least two microphones.
  • the audio signal to be processed is a first audio frame including a preset number of sampling points
  • the target audio signal is a second audio frame corresponding to the first audio frame.
  • the preset number is a power of 2.
  • the audio components of different frequencies are determined by performing fast Fourier transform on the first audio frame.
  • the processor is further configured to modulate the first audio frame into a periodic signal before determining the audio components of different frequencies included in the first audio frame.
  • the processor when performing the modulating the first audio frame into a periodic signal, is specifically configured to add an analysis window to the first audio frame.
  • the sampling points where the second audio frame overlaps with the previous audio frame are accumulated.
  • the processor is further configured to eliminate the distortion of the amplitudes at both ends of the second audio frame before accumulating the sampling points overlapping the second audio frame and the previous audio frame.
  • the processor is specifically configured to add a synthesis window to the second audio frame when performing the removing the distortion of the amplitudes at both ends of the second audio frame.
  • the target direction is set according to the direction information input by the user.
  • it further includes: a camera, the camera can move relative to the electronic device, and the target direction is determined according to the orientation of the camera.
  • the sound source direction includes: an azimuth angle and/or a pitch angle.
  • the electronic device provided by the embodiments of the present application pays attention to the audio components of different frequencies contained in the audio signal to be processed, determines the sound source direction for each audio component, and adjusts the gain of each audio component according to the matching degree between its sound source direction and the target direction, so that in the synthesized target audio signal the sound originating from the target direction can be more prominent, realizing directional sound pickup.
  • Since the gain adjustment can be performed on audio components of different frequencies, the directivity at different frequencies is flexibly controllable.
  • In addition, the target direction can be flexibly set according to requirements, so directional pickup in any direction can be achieved.
  • the electronic device provided by the embodiments of the present application only needs a small microphone array or a small number (more than two) of microphones to meet the requirement of strong directivity.
  • Embodiments of the present application may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein.
  • Computer-usable storage media includes permanent and non-permanent, removable and non-removable media, and storage of information can be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

Abstract

An audio processing method. The audio processing method comprises: acquiring an audio signal to be processed, said audio signal comprising audio components having different frequencies (S101); determining a sound source direction corresponding to each audio component (S102); adjusting the gains of the audio components on the basis of the degree of matching between the sound source direction and a target direction (S103); and synthesizing a target audio signal on the basis of the gain-adjusted audio components (S104). By means of the audio processing method, directional sound pickup in any direction is achieved.
PCT/CN2020/098886 2020-06-29 2020-06-29 Procédé et appareil de traitement audio, et dispositif électronique WO2022000174A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080030168.0A CN113767432A (zh) 2020-06-29 2020-06-29 音频处理方法、音频处理装置、电子设备
PCT/CN2020/098886 WO2022000174A1 (fr) 2020-06-29 2020-06-29 Procédé et appareil de traitement audio, et dispositif électronique

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/098886 WO2022000174A1 (fr) 2020-06-29 2020-06-29 Procédé et appareil de traitement audio, et dispositif électronique

Publications (1)

Publication Number Publication Date
WO2022000174A1 true WO2022000174A1 (fr) 2022-01-06

Family

ID=78786249

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/098886 WO2022000174A1 (fr) 2020-06-29 2020-06-29 Procédé et appareil de traitement audio, et dispositif électronique

Country Status (2)

Country Link
CN (1) CN113767432A (fr)
WO (1) WO2022000174A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115567864A (zh) * 2022-12-02 2023-01-03 浙江华创视讯科技有限公司 麦克风增益的调整方法和装置、存储介质及电子设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116705047B (zh) * 2023-07-31 2023-11-14 北京小米移动软件有限公司 音频采集方法、装置及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006403A (zh) * 2009-08-28 2011-04-06 三洋电机株式会社 摄像装置及再生装置
CN104699445A (zh) * 2013-12-06 2015-06-10 华为技术有限公司 一种音频信息处理方法及装置
CN106653041A (zh) * 2017-01-17 2017-05-10 北京地平线信息技术有限公司 音频信号处理设备、方法和电子设备
CN107534725A (zh) * 2015-05-19 2018-01-02 华为技术有限公司 一种语音信号处理方法及装置
CN108769400A (zh) * 2018-05-23 2018-11-06 宇龙计算机通信科技(深圳)有限公司 一种定位录音的方法及装置
CN109036448A (zh) * 2017-06-12 2018-12-18 华为技术有限公司 一种声音处理方法和装置
WO2019121864A1 (fr) * 2017-12-19 2019-06-27 Koninklijke Kpn N.V. Communication multi-utilisateur audiovisuelle améliorée

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5564873B2 (ja) * 2009-09-25 2014-08-06 富士通株式会社 収音処理装置、収音処理方法、及びプログラム
JP5494699B2 (ja) * 2012-03-02 2014-05-21 沖電気工業株式会社 収音装置及びプログラム
CN105957521B (zh) * 2016-02-29 2020-07-10 青岛克路德机器人有限公司 一种用于机器人的语音和图像复合交互执行方法及系统
CN106782584B (zh) * 2016-12-28 2023-11-07 北京地平线信息技术有限公司 音频信号处理设备、方法和电子设备
JP6763332B2 (ja) * 2017-03-30 2020-09-30 沖電気工業株式会社 収音装置、プログラム及び方法
CN110782911A (zh) * 2018-07-30 2020-02-11 阿里巴巴集团控股有限公司 音频信号处理方法、装置、设备和存储介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102006403A (zh) * 2009-08-28 2011-04-06 三洋电机株式会社 摄像装置及再生装置
CN104699445A (zh) * 2013-12-06 2015-06-10 华为技术有限公司 一种音频信息处理方法及装置
CN107534725A (zh) * 2015-05-19 2018-01-02 华为技术有限公司 一种语音信号处理方法及装置
CN106653041A (zh) * 2017-01-17 2017-05-10 北京地平线信息技术有限公司 音频信号处理设备、方法和电子设备
CN109036448A (zh) * 2017-06-12 2018-12-18 华为技术有限公司 一种声音处理方法和装置
WO2019121864A1 (fr) * 2017-12-19 2019-06-27 Koninklijke Kpn N.V. Communication multi-utilisateur audiovisuelle améliorée
CN108769400A (zh) * 2018-05-23 2018-11-06 宇龙计算机通信科技(深圳)有限公司 一种定位录音的方法及装置

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115567864A (zh) * 2022-12-02 2023-01-03 浙江华创视讯科技有限公司 麦克风增益的调整方法和装置、存储介质及电子设备
CN115567864B (zh) * 2022-12-02 2024-03-01 浙江华创视讯科技有限公司 麦克风增益的调整方法和装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN113767432A (zh) 2021-12-07

Similar Documents

Publication Publication Date Title
CN109102822B (zh) 一种基于固定波束形成的滤波方法及装置
US10382849B2 (en) Spatial audio processing apparatus
KR102470962B1 (ko) 사운드 소스들을 향상시키기 위한 방법 및 장치
US9552840B2 (en) Three-dimensional sound capturing and reproducing with multi-microphones
CN102947685B (zh) 用于减少环境噪声对收听者的影响的方法和装置
US8971548B2 (en) Motor noise reduction circuit
US8897454B2 (en) Sound zooming apparatus and method synchronized with moving picture zooming function
CN104699445A (zh) 一种音频信息处理方法及装置
JP2013543987A (ja) 遠距離場マルチ音源追跡および分離のためのシステム、方法、装置およびコンピュータ可読媒体
WO2022000174A1 (fr) Procédé et appareil de traitement audio, et dispositif électronique
US9967660B2 (en) Signal processing apparatus and method
US20200288262A1 (en) Spatial Audio Signal Processing
US10057702B2 (en) Audio signal processing apparatus and method for modifying a stereo image of a stereo signal
US20130253923A1 (en) Multichannel enhancement system for preserving spatial cues
JP4116600B2 (ja) 収音方法、収音装置、収音プログラム、およびこれを記録した記録媒体
WO2023118644A1 (fr) Appareil, procédés et programmes informatiques pour fournir un son spatialisé
WO2017171864A1 (fr) Compréhension d'un environnement acoustique dans une communication vocale machine-humain
Thiergart et al. Combining linear spatial filtering and non-linear parametric processing for high-quality spatial sound capturing
WO2021212287A1 (fr) Procédé de traitement de signal audio, dispositif de traitement audio et appareil d'enregistrement
JP2016092562A (ja) 音声処理装置および方法、並びにプログラム
WO2018066376A1 (fr) Dispositif de traitement de signal, procédé et programme
CN220043611U (zh) 微型指向性录音装置及电子设备
EP3643083A1 (fr) Traitement audio spatial
EP3029671A1 (fr) Procédé et appareil d'amélioration de sources acoustiques
US11950089B2 (en) Perceptual bass extension with loudness management and artificial intelligence (AI)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20942691

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20942691

Country of ref document: EP

Kind code of ref document: A1