EP4220637A1 - Multi-channel audio signal acquisition method, apparatus, and system - Google Patents

Multi-channel audio signal acquisition method, apparatus, and system

Info

Publication number
EP4220637A1
EP4220637A1
Authority
EP
European Patent Office
Prior art keywords
audio signal
target
acquire
main
channel
Prior art date
Legal status
Pending
Application number
EP21870910.3A
Other languages
German (de)
English (en)
Other versions
EP4220637A4 (fr)
Inventor
Wendong Wang
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Publication of EP4220637A1
Publication of EP4220637A4

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H04R2420/00 Details of connection covered by H04R, not provided for in its groups
    • H04R2420/01 Input selection or mixing for amplifiers or loudspeakers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0356 Speech enhancement by changing the amplitude for synchronising with other signals, e.g. video signals
    • H04S STEREOPHONIC SYSTEMS
    • H04S3/00 Systems employing more than two channels, e.g. quadraphonic
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/305 Electronic adaptation of stereophonic audio signals to reverberation of the listening space
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/01 Multi-channel, i.e. more than two input channels, sound reproduction with two speakers wherein the multi-channel information is substantially preserved
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field
    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • the present disclosure relates to the field of audio technologies, in particular to a multi-channel audio signal acquisition method, a multi-channel audio signal acquisition device and a multi-channel audio signal acquisition system.
  • a TWS Bluetooth headset uses its microphone to capture a high-quality close-up audio signal far away from the user of the main device, and the spatial audio signals collected by the microphone array in the main device are mixed and binaurally rendered to simulate a point-shaped auditory target in a spatial sound field, which creates a more realistic immersive experience.
  • however, this solution only mixes the distributed audio signals and does not suppress the ambient sound.
  • a multi-channel audio signal acquisition method, a multi-channel audio signal acquisition device and a multi-channel audio signal acquisition system are provided in embodiments of the present disclosure, which can use a relationship between distributed audio signals to suppress an ambient sound and improve a recording effect of an audio signal.
  • a multi-channel audio signal acquisition method is provided in the embodiments of the present disclosure and includes the following operations.
  • the method includes: acquiring a main audio signal collected by a main device when the main device shoots video, and performing a multi-channel rendering on the main audio signal to acquire an ambient multi-channel audio signal.
  • the method includes: acquiring an audio signal collected by an additional device, and determining a first additional audio signal, wherein a distance between the additional device and the target shooting object is less than a first threshold.
  • the method includes: performing an ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire a target audio signal.
  • the method includes: performing a multi-channel rendering on the target audio signal to acquire a target multi-channel audio signal.
  • the method includes: mixing the ambient multi-channel audio signal and the target multi-channel audio signal to acquire a mixed multi-channel audio signal.
  • a multi-channel audio signal acquisition device includes the following components.
  • the multi-channel audio signal acquisition device includes an acquisition module configured to: acquire a main audio signal collected by a main device when the main device shoots video of a target shooting object, and perform a first multi-channel rendering to acquire an ambient multi-channel audio signal; and acquire an audio signal collected by an additional device and determine a first additional audio signal, wherein a distance between the additional device and the target shooting object is less than a first threshold.
  • the multi-channel audio signal acquisition device includes a processing module configured to perform an ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire a target audio signal.
  • the processing module is configured to perform a multi-channel rendering on the target audio signal to acquire a target multi-channel audio signal.
  • the processing module is configured to mix the ambient multi-channel audio signal and the target multi-channel audio signal to acquire a mixed multi-channel audio signal.
  • in a third aspect, a terminal device includes a processor and a memory storing a computer program capable of running on the processor.
  • the computer program is executed by the processor to perform the multi-channel audio signal acquisition method in the first aspect.
  • in a fourth aspect, a terminal device includes the multi-channel audio signal acquisition device in the second aspect and a main device.
  • the main device is configured to collect the main audio signal when the main device shoots video, and send the main audio signal to the multi-channel audio signal acquisition device.
  • a multi-channel audio signal acquisition system includes the multi-channel audio signal acquisition device in the second aspect, a main device and an additional device, and the main device and the additional device each establish a communication connection with the multi-channel audio signal acquisition device.
  • the main device is configured to collect a main audio signal when the main device shoots video, and send the main audio signal to the multi-channel audio signal acquisition device.
  • the additional device is configured to collect a second additional audio signal, and send the second additional audio signal to the multi-channel audio signal acquisition device.
  • a distance between the additional device and the target shooting object is less than the first threshold.
  • a computer-readable storage medium storing a computer program is further provided.
  • the computer program is executed by a processor to perform the multi-channel audio signal acquisition method in the first aspect.
  • the multi-channel audio signal acquisition method may include: acquiring a main audio signal collected by a main device when the main device shoots video, and performing a multi-channel rendering to acquire an ambient multi-channel audio signal; acquiring an audio signal collected by the additional device, and determining a first additional audio signal, a distance between the additional device and the target shooting object being less than the first threshold; performing an ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire a target audio signal; performing a multi-channel rendering on the target audio signal to acquire a target multi-channel audio signal; mixing the ambient multi-channel audio signal and the target multi-channel audio signal to acquire a mixed multi-channel audio signal.
  • distributed audio signals may be acquired from the main device and additional device, and the relationship between distributed audio signals may be used to perform the ambient sound suppression processing according to the first additional audio signal collected by the additional device and the main audio signal collected by the main device, so as to suppress the ambient sound in a recording process and acquire the target multi-channel audio signal.
  • the ambient multi-channel audio signal (which is acquired by performing multi-channel rendering on the main audio signal) is mixed with the target multi-channel audio signal.
  • not only are the distributed audio signals mixed and the point-shaped auditory target in the spatial sound field simulated, but the ambient sound is also suppressed, thereby improving the recording effect of the audio signal.
  • the term "and/or" in the present disclosure describes an association relationship between associated objects and indicates three kinds of relationships.
  • for example, A and/or B can indicate three cases: A alone, both A and B, and B alone.
  • the embodiments of the present disclosure provide a multi-channel audio signal acquisition method, a device and a system, which may be applied to video shooting scenes, especially applied to situations with multiple sound sources or noisy environments.
  • the distributed audio signals are mixed, the point-shaped auditory target in the spatial sound field is simulated, and the ambient sound is suppressed, thereby improving the recording effect of the audio signal.
  • FIG. 1 is a schematic diagram of a multi-channel audio signal acquisition system according to some embodiments of the present disclosure.
  • the system may include a main device, an additional device, and an audio processing device (such as a multi-channel audio acquisition device in embodiments of the present disclosure).
  • the additional device in FIG. 1 may be a true wireless stereo (TWS) Bluetooth headset configured to collect audio streams (that is, an additional audio signal according to some embodiments of the present disclosure).
  • the main device may be configured to collect video streams and audio streams (that is, a main audio signal according to some embodiments of the present disclosure).
  • the audio processing device may include modules such as a target tracking module, a scene-sound-source classification module, a delay compensation module, an adaptive filtering module, a spatial filtering module, a binaural rendering module, and a mixer module. Specific functions of each module are described in combination with the multi-channel audio signal acquisition method in the following embodiments, and are not repeated here.
  • the main device and the audio processing device in the embodiments of the present disclosure may be two independent devices.
  • the main device and the audio processing device may also be integrated in a device.
  • the integrated device may be a terminal device that integrates functions of the main device and the audio processing device.
  • a connection manner between the additional device and the terminal device, or between the additional device and the audio processing device may be a wireless communication such as a Bluetooth connection, or a wireless fidelity (WiFi) connection.
  • the connection manner is not specifically limited.
  • the terminal device in the embodiments of the present disclosure may include a mobile phone, a tablet, a laptop, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable device (such as a watch, a wristband, glasses, a helmet, or a headband, etc.), etc.
  • the embodiments of the present disclosure do not make special limits on a specific form of the terminal device.
  • the additional device may be a terminal device independent of the main device and the audio processing device.
  • the mobile terminal device may be a portable terminal device such as a Bluetooth headset or a wearable device (such as a watch, a wristband, glasses, a helmet, or a headband, etc.), etc.
  • the main device may shoot video, acquire the main audio signal, and send the main audio signal to the audio processing device. Since the additional device is close to the target shooting object in the video shooting scene (for example, a distance between the additional device and the target shooting object is less than a first threshold), the additional device may collect the additional audio signal and then send it to the audio processing device.
  • the target shooting object may be a person or a musical instrument in the video shooting scene.
  • a plurality of shooting objects may appear in the video shooting scene, and the target shooting object may be one of the plurality of shooting objects.
  • FIG. 2A is a flowchart of a multi-channel audio signal acquisition method according to some embodiments of the present disclosure.
  • the method may be performed by the audio processing device (i.e., the multi-channel audio acquisition device) as shown in FIG. 1 , or performed by the terminal device that integrates functions of the audio processing device and the main device as shown in FIG. 1 .
  • the main device may be a functional module or functional entity that collects audio and video in the terminal device.
  • the terminal device performing the method is taken as an example.
  • the method is described in detail below, as shown in FIG. 2A .
  • the method may include the following operations.
  • Operation 201 includes: acquiring a main audio signal collected by a main device when the main device shoots video of a target shooting object, and performing a first multi-channel rendering to acquire an ambient multi-channel audio signal.
  • a distance between the target shooting object and the additional device may be less than the first threshold.
  • the user may arrange an additional device on the target shooting object to be tracked, start a video shooting function of the terminal device, and select the target shooting object in the video content by tapping the video content displayed on a display screen.
  • a radio module in the main device of the terminal device and a radio module in the additional device may start recording and collecting audio signals.
  • the radio module in the main device may be a microphone array and the microphone array may be configured to collect the main audio signal.
  • the radio module in the additional device may be a microphone.
  • FIG. 2B is a schematic diagram of an interface of the terminal device, and the display screen of the terminal device may display the video content.
  • the user may click a character 21 displayed in the interface to determine the character 21 as the target shooting object.
  • the character 21 may carry a Bluetooth headset (i.e., the additional device) to collect audio signal near the character 21, and the Bluetooth headset may send the audio signal to the terminal device.
  • the multi-channel may be dual channels, four channels, 5.1 channels or more channels.
  • a binaural rendering may be performed on the main audio signal through a head related transfer function (HRTF) to acquire an ambient binaural audio signal.
  • the binaural rendering may be performed on the main audio signal through the binaural renderer in FIG. 1 to acquire the ambient binaural audio signal.
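As an illustration of the binaural rendering described above, the sketch below convolves a mono signal with a left/right HRIR pair. The HRIR arrays are made-up placeholders, not a measured HRTF set:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair to produce
    a 2 x N binaural signal (left channel first)."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    out = np.zeros((2, n))
    out[0, :len(left)] = left
    out[1, :len(right)] = right
    return out

# Placeholder HRIRs; a real system would fetch these from an HRTF set
# matching the azimuth and distance of the auditory target.
hrir_l = np.array([1.0, 0.5, 0.25])
hrir_r = np.array([0.8, 0.4, 0.2])
signal = np.ones(8)
binaural = binaural_render(signal, hrir_l, hrir_r)   # shape (2, 10)
```

A real renderer would also apply the decorrelator mentioned later in this description; this sketch shows only the convolution stage.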
  • Operation 202 includes: acquiring an audio signal collected by an additional device, and determining a first additional audio signal.
  • methods of acquiring the audio signal collected by the additional device and determining a first additional audio signal may include two implementations.
  • a first implementation operation includes: acquiring a second additional audio signal collected by the additional device arranged on the target shooting object, and determining the second additional audio signal as the first additional audio signal.
  • a second implementation operation includes: acquiring the second additional audio signal collected by the additional device arranged on the target shooting object, aligning the second additional audio signal with the main audio signal in a time domain to acquire the first additional audio signal.
  • the main audio signal and the second additional audio signal may be aligned in a time domain to acquire the first additional audio signal.
  • an actual delay may be acquired by combining an estimated acoustic wave propagation delay (i.e., the delay between the main audio signal and the second additional audio signal) with the system delay, and the main audio signal and the second additional audio signal may be aligned in the time domain according to the actual delay to acquire the first additional audio signal.
  • a delay compensator in FIG. 1 may be configured to align the additional audio signal with the main audio signal in the time domain according to the delay between the main audio signal and the second additional audio signal to acquire the first additional audio signal.
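The delay compensation described above can be sketched with a cross-correlation peak search. The impulse signals and the `system_delay` parameter below are hypothetical stand-ins for real recordings and a measured system latency:

```python
import numpy as np

def estimate_delay(main, additional):
    """Estimate how many samples `additional` lags `main`
    from the peak of their cross-correlation."""
    corr = np.correlate(additional, main, mode="full")
    return int(np.argmax(corr)) - (len(main) - 1)

def align(main, additional, system_delay=0):
    """Shift the additional signal back by the estimated total delay
    so it lines up with the main signal in the time domain."""
    lag = estimate_delay(main, additional) + system_delay
    return np.roll(additional, -lag)

main = np.zeros(100); main[10] = 1.0    # impulse reaches the main device at sample 10
extra = np.zeros(100); extra[25] = 1.0  # same impulse at the additional device, 15 samples late
aligned = align(main, extra)            # impulse moved back to sample 10
```

In practice the acoustic propagation delay varies as the target moves, so the estimate would be updated frame by frame rather than once per recording.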
  • Operation 203 includes: performing an ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire a target audio signal.
  • in a first case, the spatial filtering is performed on the main audio signal in an area outside the shooting FOV (field of view) of the main device to acquire a reverse-focusing audio signal.
  • the reverse focusing audio signal is taken as a reference signal, and an adaptive filtering is performed on the first additional audio signal to acquire the target audio signal.
  • the spatial filtering is performed on the main audio signal in the area outside the shooting FOV of the main device to acquire the reverse focusing audio signal, which suppresses a sound signal at a location of the target shooting object included in the main audio signal to acquire a purer ambient audio signal.
  • the reverse focusing audio signal is taken as a reference signal, and the adaptive filtering is performed on the first additional audio signal, the ambient sound in the additional audio signal may be further suppressed.
  • in a second case, the spatial filtering is performed on the main audio signal within the shooting FOV to acquire a focusing audio signal.
  • the first additional audio signal is taken as the reference signal, and an adaptive filtering is performed on the focusing audio signal to acquire the target audio signal.
  • the spatial filtering is performed on the main audio signal in the area within the shooting FOV to acquire the focusing audio signal, which suppresses part of the ambient sound in the main audio signal.
  • the first additional audio signal is taken as the reference signal and the adaptive filtering is performed on the focusing audio signal, which may further suppress the ambient sound outside a focusing area that cannot be completely suppressed in the focusing audio signal, in particular a sound at a location of the target shooting object included in the ambient sound.
  • a spatial filter in FIG. 1 may be configured to perform the spatial filtering on the main audio signal to acquire a directionally enhanced audio signal.
  • a main purpose of the spatial filtering is to acquire a purer ambient audio signal.
  • a target area of the spatial filtering is an area outside the shooting FOV, and an acquired signal is called reverse focusing audio signal.
  • the close-up audio signal in the area within the shooting FOV needs to be acquired through the spatial filtering, so the target area of spatial filtering is an area within the shooting FOV, and an acquired signal is the focusing audio signal.
  • the spatial filtering method may be based on a beamforming method such as a minimum variance distortionless response (MVDR) method, or a beamforming method of a general sidelobe canceller (GSC).
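The patent names MVDR and GSC beamformers; the sketch below uses a simpler delay-and-sum beamformer to show the same spatial-filtering idea, with integer steering delays chosen by hand rather than derived from the array geometry:

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Delay each microphone channel by its steering delay (in samples)
    and average, so sound from the steered direction adds coherently."""
    out = np.zeros(len(channels[0]))
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -d)
    return out / len(channels)

# Two mics hear the same impulse one sample apart; steering delays (0, 1)
# align the wavefront before summing, so the target adds coherently.
mic0 = np.zeros(16); mic0[5] = 1.0
mic1 = np.zeros(16); mic1[6] = 1.0
focused = delay_and_sum([mic0, mic1], delays=[0, 1])
```

An MVDR or GSC beamformer would additionally place nulls toward interfering directions, which is what makes the reverse-focusing signal a cleaner ambient reference.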
  • FIG. 1 includes two sets of adaptive filters.
  • the two sets of adaptive filters are applied to the target audio signal acquired in the above two cases respectively.
  • only one set of adaptive filter may be enabled according to a change of the target shooting object in the shooting FOV.
  • the adaptive filter applied to the first additional audio signal is enabled, and the reverse focusing audio signal is taken as the reference signal and input to further suppress the ambient sound from the first additional audio signal, and make a sound near the target shooting object more prominent.
  • the adaptive filter applied to the focusing audio signal is enabled, and the first additional audio signal is taken as the reference signal and input to further suppress the sound outside the shooting FOV from the focusing audio signal, especially a sound at the location of the target shooting object.
  • the adaptive filtering method may be a least mean square (LMS) method.
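The adaptive filtering step can be sketched with a normalized LMS filter (a common variant of the LMS method mentioned above). Here the `primary` signal is synthetic, a pure scaled copy of the ambient reference, so the filter should cancel it almost completely:

```python
import numpy as np

def lms_cancel(primary, reference, taps=8, mu=0.1):
    """Normalized LMS: predict the ambient component of `primary` from
    `reference` and subtract it; the error signal is the cleaned output."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps - 1, len(primary)):
        x = reference[n - taps + 1:n + 1][::-1]   # newest reference sample first
        y = w @ x                                  # estimated ambient component
        e = primary[n] - y                         # cleaned sample
        w += mu * e * x / (x @ x + 1e-8)           # normalized LMS update
        out[n] = e
    return out

rng = np.random.default_rng(0)
ambient = rng.standard_normal(4000)    # reference (e.g. reverse-focusing) signal
primary = 0.5 * ambient                # signal containing only ambient sound
cleaned = lms_cancel(primary, ambient) # residual shrinks as the filter adapts
```

With a real target voice present in `primary`, the voice is uncorrelated with the ambient reference, so it survives in the error signal while the ambient component is subtracted out.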
  • Operation 204 includes: performing a second multi-channel rendering on the target audio signal to acquire a target multi-channel audio signal.
  • three sets of binaural renderers in FIG. 1 are applied to the main audio signal, the target audio signal acquired by the adaptive filtering in case (1) above, and the target audio signal acquired by the adaptive filtering in case (2) above, respectively, to acquire three sets of binaural signals: an ambient binaural signal, an additional binaural signal, and a focusing binaural signal.
  • the binaural renderer applied to the target audio signal of above case (1) and the binaural renderer applied to the target audio signal of above case (2) may not be enabled at the same time, and the two binaural renderers may be selected to be enabled according to the change of the target shooting object in the shooting FOV of the main device.
  • the binaural renderer applied to the main audio signal is always enabled.
  • when the target shooting object is within the shooting FOV, the binaural renderer applied to the target audio signal in case (1) above is enabled.
  • when the target shooting object is outside the shooting FOV, the binaural renderer applied to the target audio signal in case (2) above is enabled.
  • the binaural renderer may include a decorrelator and a convolver inside, and needs an HRTF corresponding to a target location to simulate the perception of an auditory target at a desired direction and distance.
  • the scene-sound-source classification module may be used to determine a rendering rule according to a determined current scene and the sound source type of the target shooting object, the determined rendering rule may be applied to the decorrelator to acquire different rendering styles, and an azimuth and a distance between the additional device and the main device may be used to control generation of the HRTF.
  • an HRTF corresponding to a particular location may be acquired by interpolating over a set of previously stored HRTFs, or by using a method based on a deep neural network (DNN).
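The interpolation over stored HRTFs mentioned above could, in the simplest case, be linear interpolation between the two nearest stored azimuths. The two-tap HRIRs and the 0°/30° grid below are toy values, not measured data:

```python
import numpy as np

def interp_hrir(hrir_table, azimuth_deg):
    """Linearly interpolate an HRIR for an arbitrary azimuth from HRIRs
    stored at known azimuths (assumes azimuth_deg lies inside the grid)."""
    azimuths = sorted(hrir_table)
    lo = max(a for a in azimuths if a <= azimuth_deg)
    hi = min(a for a in azimuths if a >= azimuth_deg)
    if lo == hi:
        return hrir_table[lo]
    t = (azimuth_deg - lo) / (hi - lo)
    return (1 - t) * hrir_table[lo] + t * hrir_table[hi]

# Toy two-sample HRIRs stored at 0 and 30 degrees of azimuth.
table = {0: np.array([1.0, 0.0]), 30: np.array([0.0, 1.0])}
hrir_15 = interp_hrir(table, 15)   # halfway between the two stored responses
```

Production systems usually interpolate in a perceptually safer domain (e.g. magnitude plus interaural time difference) rather than directly on time-domain taps, which is where the DNN-based approach comes in.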
  • Operation 205 includes: mixing the ambient multi-channel audio signal with the target multi-channel audio signal to acquire a mixed multi-channel audio signal.
  • mixing the ambient multi-channel audio signal and the target multi-channel audio signal means adding the ambient multi-channel audio signal and the target multi-channel audio signal according to a gain.
  • adding the two signals according to a gain may indicate that corresponding signal sampling points of the ambient multi-channel audio signal and the target multi-channel audio signal are added up sample by sample.
  • the gain may be a preset fixed value or a variable gain.
  • variable gain may be determined according to the shooting FOV.
  • a mixer in FIG. 1 is configured to mix two of the three sets of binaural signals mentioned above.
  • when the target shooting object is within the shooting FOV of the main device, the ambient binaural signal and the additional binaural signal are mixed.
  • when the target shooting object is outside the shooting FOV of the main device, the ambient binaural signal and the focusing binaural signal are mixed.
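Mixing according to a gain, as described above, can be sketched as a sample-by-sample weighted sum. The complementary-gain weighting and the toy constant signals are assumptions for illustration; the patent leaves the exact gain law open (fixed or FOV-dependent):

```python
import numpy as np

def mix(ambient, target, gain=0.5):
    """Weighted sample-by-sample sum of two multi-channel signals
    (arrays shaped channels x samples)."""
    return gain * ambient + (1.0 - gain) * target

ambient = np.ones((2, 4))                 # toy ambient binaural signal
target = np.full((2, 4), 3.0)             # toy target binaural signal
mixed = mix(ambient, target, gain=0.25)   # 0.25*1 + 0.75*3 = 2.5 everywhere
```

A variable gain driven by the shooting FOV would simply make `gain` a function of the current zoom/FOV state instead of a constant.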
  • the method may include: acquiring the main audio signal collected by the main device when the main device shoots video of the target shooting object, and performing the first multi-channel rendering to acquire an ambient multi-channel audio signal; acquiring the audio signal collected by the additional device arranged on the target shooting object, the distance between the additional device and the target shooting object being less than the first threshold, and determining a first additional audio signal; performing the ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire the target audio signal; performing the second multi-channel rendering on the target audio signal to acquire the target multi-channel audio signal; and mixing the ambient multi-channel audio signal and the target multi-channel audio signal to acquire the mixed multi-channel audio signal.
  • the distributed audio signals may be acquired from the main device and the additional device, and the relationship between the distributed audio signals may be used to perform the ambient sound suppression processing according to the first additional audio signal collected by the additional device and the main audio signal collected by the main device, so as to suppress the ambient sound in the recording process and acquire the target multi-channel audio signal.
  • when the ambient multi-channel audio signal (which is acquired by performing multi-channel rendering on the main audio signal) is mixed with the target multi-channel audio signal, not only are the distributed audio signals mixed and the point-shaped auditory target in the spatial sound field simulated, but the ambient sound is also suppressed, thereby improving the recording effect of the audio signal.
  • the embodiments of the present disclosure also provide a multi-channel audio signal acquisition method, which includes following operations.
  • Operation 301 includes: acquiring a main audio signal collected by a microphone array in a main device.
  • Operation 302 includes: acquiring a second additional audio signal collected by an additional device.
  • a terminal device may perform the operations 301 and 302 described above.
  • the terminal device may continuously track a movement of the target shooting object in the shooting FOV in response to a change of the shooting FOV of the main device.
  • the method may include: acquiring video data (including the main audio signal) shot by the main device and the second additional audio signal collected by the additional device.
  • the method may include: determining a type of current scene and a type of the target shooting object according to above video data and/or the second additional audio signal, matching a rendering rule through the type of the current scene and the type of the target shooting object, performing a multi-channel rendering on a subsequent audio signal according to the determined rendering rule.
  • the method may include: performing the second multi-channel rendering on the target audio signal according to the determined rendering rule to acquire a target multi-channel audio signal, and performing a first multi-channel rendering on the main audio signal according to the determined rendering rule to acquire an ambient multi-channel audio signal.
  • the operation of performing a multi-channel rendering on the target audio signal according to the determined rendering rule to acquire a target multi-channel audio signal may include following operations.
  • the operations include: acquiring video data shot by the main device and the second additional audio signal collected by the additional device.
  • the operations include: determining a type of a current scene and a type of the target shooting object.
  • the operations include: performing the multi-channel rendering on the target audio signal through the first rendering rule matching the type of the current scene and the type of the target shooting object to acquire the target multi-channel audio signal.
  • the operation of performing a multi-channel rendering on the main audio signal according to the determined rendering rule to acquire an ambient multi-channel audio signal may include following operations.
  • the operations include: acquiring the main audio signal collected by the main device when the main device shoots video of the target shooting object.
  • the operations include: determining a type of a current scene.
  • the operations include: performing the first multi-channel rendering on the main audio signal through the second rendering rule matching the type of the current scene to acquire the ambient multi-channel audio signal.
  • the scene-sound-source classification module may include two paths, with video stream information applied to one of the two paths and audio stream information applied to the other path.
  • the two paths may include a scene analyzer and a voice/instrument classifier.
  • the scene analyzer may analyze, according to the video or audio, the current space where the user is located; the current space may be a small room, a medium room, a large room, a concert hall, a stadium, or an outdoor space, etc.
  • the voice/instrument classifier may analyze, according to the video or audio, a current sound source near the target shooting object; the current sound source may be a male voice, a female voice, a child's voice, an accordion, a guitar, a bass, a piano, a keyboard, or a percussion instrument.
  • both the scene analyzer and the voice/instrument classifier may be implemented based on DNN methods.
  • the video is input frame by frame as images, and the audio is input as a Mel spectrum or Mel-frequency cepstrum coefficients (MFCC) of the sound.
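The Mel spectrum input mentioned above relies on the Mel frequency scale. As an illustration only (the embodiments do not specify which variant is used), the following sketch implements the common HTK-style Mel mapping and a triangular Mel filterbank in NumPy:

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style Mel scale (assumed variant): mel = 2595 * log10(1 + f/700)
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale, the usual
    # front end for a Mel spectrum or MFCC feature.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank
```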
  • a rendering rule to be used in a following binaural rendering module may also be determined by combining a result of spatial scene analysis and the voice/instrument classifier with user preferences.
  • Operation 303 may include: generating a first multi-channel transfer function according to a type of the microphone array in the main device, performing the multi-channel rendering on the main audio signal according to the first multi-channel transfer function to acquire the ambient multi-channel audio signal.
  • the first multi-channel transfer function may be an HRTF function.
  • a preset HRTF function set and a binaural rendering method may be configured in the binaural renderer in FIG. 1 .
  • the preset HRTF function is determined according to the type of the microphone array in the main device, and the binaural rendering is performed on the main audio signal by the HRTF function to acquire the ambient binaural audio signal.
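The binaural rendering step above can be sketched as convolving the main audio signal with a left/right pair of head-related impulse responses. The impulse responses below are placeholders, not a real HRTF set:

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    # Convolve a mono signal with the left/right head-related impulse
    # responses to place it in the binaural sound field; pad so both
    # ears have the same length before stacking.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    n = max(len(left), len(right))
    left = np.pad(left, (0, n - len(left)))
    right = np.pad(right, (0, n - len(right)))
    return np.stack([left, right])
```

In practice the HRIRs would be looked up from the preset HRTF set according to the microphone-array type; the convolution itself is the rendering.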
  • Operation 304 includes: judging whether the target shooting object is within the shooting FOV of the main device.
  • a target tracking module in FIG. 1 may include a visual target tracker and an audio target tracker configured to determine a position of the target shooting object, and estimate an azimuth and a distance between the target shooting object and the main device by using visual data and/or an audio signal.
  • the visual data and the audio signal may be used to determine the position of the target shooting object.
  • the visual target tracker and the audio target tracker are enabled at the same time.
  • the audio signal may be used to determine the position of the target shooting object. At this time, only the audio target tracker may be enabled.
  • one of the visual data and the audio signal may also be used to determine the position of the target shooting object.
  • Operation 305 includes: determining a first azimuth between the target shooting object and the main device according to video information and shooting parameters acquired by the main device, acquiring a first active duration of the second additional audio signal and a first distance, and determining a second active duration of the main audio signal according to the first active duration and the first distance.
  • the first distance is a target distance between a last determined target shooting object and the main device.
  • Operation 306 includes: performing a direction-of-arrival (DOA) estimation by using the main audio signal in the second active duration to acquire a second azimuth between the target shooting object and the main device, performing a smoothing processing on the first azimuth and the second azimuth to acquire a target azimuth.
  • Operation 307 includes: determining a second distance between the target shooting object and the main device according to the video information acquired by the main device, and calculating a second delay according to the second distance and the sound speed.
  • Operation 308 includes: performing a beamforming processing on the main audio signal toward the target azimuth to acquire a beamforming signal, and determining a first delay between the beamforming signal and the second additional audio signal.
  • a sound source direction measurement and a beamformer may be used to perform the beamforming processing on the main audio signal toward the target azimuth to acquire the beamforming signal, and a delay estimator may be configured to further determine the first delay between the beamforming signal and the second additional audio signal.
  • Operation 309 includes: performing the smoothing processing on the second delay and the first delay to acquire a target delay, and calculating the target distance according to the target delay and the sound speed.
  • the video data acquired at this time includes the target shooting object.
  • the first azimuth may be acquired according to the position of the target shooting object in a video frame shot by the main device, combined with prior information such as camera parameters (such as a focal length) and a zoom scale (different shooting FOVs correspond to different zoom scales).
  • the azimuth and distance between the target shooting object and the main device may be determined by the audio signal to acquire the second azimuth.
  • the target azimuth is acquired by performing the smoothing processing on the first azimuth and the second azimuth.
  • a rough distance estimation may be performed to acquire the above second distance.
  • according to the second distance, the sound speed, and a predicted system delay, the second delay may be acquired.
  • the target delay may be acquired by performing the smoothing processing on the first delay and the second delay.
  • the smoothing processing may include calculating an average value.
  • an average value of the first azimuth and the second azimuth may be calculated as the target azimuth.
  • the target delay may be acquired by performing the smoothing processing on the first delay and the second delay, and an average value of the first delay and the second delay may be taken as the target delay.
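The smoothing described above is stated to be a plain average; a minimal sketch of averaging the two delay estimates and converting the smoothed delay into the target distance (assuming a sound speed of 343 m/s, a value the embodiments do not fix) might look like:

```python
SOUND_SPEED = 343.0  # m/s, assumed room-temperature value

def smooth(a, b):
    # The embodiment's smoothing: an average of the two estimates,
    # applicable to both the azimuth pair and the delay pair.
    return 0.5 * (a + b)

def target_distance(first_delay, second_delay):
    # Smooth the two delay estimates, then convert delay to distance.
    target_delay = smooth(first_delay, second_delay)
    return target_delay * SOUND_SPEED, target_delay
```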
  • the visual target tracker in FIG. 1 may be configured to detect the target azimuth and the target distance between the target shooting object and the main device through the shot video.
  • the visual target tracker and the audio target tracker are configured to simultaneously detect the target azimuth and the target distance between the target shooting object and the main device, thereby further improving an accuracy.
  • Operation 310 includes: aligning, according to the target delay, the second additional audio signal with the main audio signal in the time domain to acquire the first additional audio signal.
  • Operation 311 includes: performing, according to the shooting FOV of the main device, the spatial filtering on the main audio signal in the area outside the shooting FOV to acquire the reverse focusing audio signal.
  • Operation 312 includes: taking the reverse focusing audio signal as the reference signal, performing the adaptive filtering on the first additional audio signal to acquire the target audio signal.
  • Operation 313 includes: acquiring the first active duration of the second additional audio signal and the first distance, and determining the second active duration of the main audio signal according to the first active duration and the first distance.
  • the first distance is the target distance between the last determined target shooting object and the main device.
  • an active duration of the audio signal is a duration when there is an effective audio signal in the audio signal.
  • a first active duration of the second additional audio signal may be a duration when there is an effective audio signal in the second additional audio signal.
  • the effective audio signal may be a human voice or an instrument sound.
  • the effective audio signal may be a sound of the target shooting object.
  • the delay between the second additional audio signal and the main audio signal may be determined according to the first distance and the sound speed, and then the audio signal of the second active duration corresponding to the second additional audio signal in the main audio signal may be determined according to the delay and the first active duration.
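The mapping from the first active duration to the second active duration can be sketched by shifting the active window by the propagation delay derived from the first distance. The sample-index bookkeeping below is illustrative:

```python
SOUND_SPEED = 343.0  # m/s (assumed)

def main_active_duration(first_active, first_distance, sample_rate):
    # first_active: (start, end) sample indices of the effective audio in
    # the second additional audio signal. The same sound reaches the main
    # device later by first_distance / c, so shift the window accordingly.
    delay_samples = int(round(first_distance / SOUND_SPEED * sample_rate))
    start, end = first_active
    return (start + delay_samples, end + delay_samples)
```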
  • Operation 314 includes: performing the DOA estimation by using the main audio signal in the second active duration to acquire the target azimuth between the target shooting object and the main device.
  • Operation 315 includes: performing the beamforming processing on the main audio signal toward the target azimuth to acquire the beamforming signal, and determining the first delay between the beamforming signal and the second additional audio signal.
  • Operation 316 includes: calculating the target distance between the target shooting object and the main device according to the first delay and the sound speed.
  • the video data acquired at this time does not include the target shooting object.
  • the audio signal may be used to determine the position of the target shooting object.
  • the audio target tracker may estimate the target azimuth and the target distance between the target shooting object and the main device by using the main audio signal and the additional audio signal; the operations of estimating the target azimuth and the target distance may specifically include a sound source direction measurement, a beamforming, and a delay estimation.
  • the target azimuth may be acquired by performing the DOA estimation on the main audio signal.
  • the second additional audio may be analyzed before performing DOA estimation, and a duration corresponding to an active part of effective audio signal (which may be an audio signal with the sound of the target shooting object) of the second additional audio may be acquired, that is, the first active duration may be acquired.
  • the delay (i.e., the first delay) between the second additional audio signal and the main audio signal may be acquired according to a last estimated target distance, and the first active duration is mapped to the second active duration in the main audio signal.
  • a segment of the main audio signal at the second active duration is cut out and the DOA estimation is performed on the segment to acquire an azimuth between the target shooting object and the main device, and this azimuth is taken as the above target azimuth.
  • a generalized cross correlation (GCC) method of phase transform (PHAT) may be used to perform a time-difference-of-arrival (TDOA) estimation, and then the DOA may be acquired by combining type information of the microphone array.
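A minimal GCC-PHAT TDOA estimator of the kind referenced above might look like the following sketch (the small `1e-12` regularizer is an assumption, added to avoid division by zero in the whitening step):

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    # Generalized cross-correlation with phase transform: whiten the
    # cross-spectrum so that only phase (i.e., delay) information remains,
    # then pick the lag of the correlation peak.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12                 # PHAT weighting
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                      # TDOA in seconds
```

Combining the estimated TDOAs with the microphone-array geometry then yields the DOA, as the text states.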
  • the beamforming signal is acquired from the multi-channel main audio signal through a fixed-direction beamformer, and a directional enhancement is performed toward the above target azimuth to improve the accuracy of the next delay estimation.
  • the beamforming method may be a delay-sum or a minimum variance distortion response (MVDR).
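A delay-and-sum beamformer of the kind named above can be sketched as follows; the 2-D microphone geometry and the steering convention are illustrative assumptions:

```python
import numpy as np

def delay_and_sum(channels, mic_positions, azimuth, fs, c=343.0):
    # Delay-and-sum beamformer: shift each microphone channel so that a
    # plane wave arriving from `azimuth` (radians, in the array plane)
    # adds coherently, then average the aligned channels.
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])
    delays = mic_positions @ direction / c   # relative arrival times (s)
    delays -= delays.min()                   # make all shifts non-negative
    shifts = np.round(delays * fs).astype(int)
    n = channels.shape[1]
    out = np.zeros(n)
    for ch, s in zip(channels, shifts):
        out[:n - s] += ch[s:]
    return out / len(channels)
```

An MVDR beamformer would replace the uniform averaging with weights derived from the noise covariance, at higher computational cost.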
  • the above first delay estimation is also performed between the main audio beamforming signal and the second additional audio signal by using the TDOA method.
  • the TDOA estimation is also performed only during the active duration of the second additional audio signal. According to the first delay, the sound speed and the predicted system delay, the distance between the target shooting object and the main device may be acquired, that is, the target distance may be acquired.
  • Operation 317 includes: aligning, according to the first delay, the second additional audio signal with the main audio signal in the time domain to acquire the first additional audio signal.
  • the first delay is taken as the target delay between the main audio signal and the second additional audio signal, and according to the first delay, the second additional audio signal is aligned with the main audio signal in the time domain to acquire the first additional audio signal.
  • the delay compensator in FIG. 1 may align, according to the first delay to acquire the first additional audio signal, the second additional audio signal with the main audio signal in the time domain.
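The delay compensator's time-domain alignment can be sketched as zero-padding the additional signal by the target delay (rounded to samples) and trimming to the main signal's length; real implementations may use fractional-delay filters instead:

```python
import numpy as np

def align_additional(additional, main, target_delay, fs):
    # Delay-compensate the second additional audio signal so that it
    # lines up with the main audio signal in the time domain.
    shift = int(round(target_delay * fs))
    aligned = np.concatenate([np.zeros(shift), additional])
    return aligned[: len(main)]
```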
  • Operation 318 includes: performing, according to the shooting FOV of the main device, the spatial filtering on the main audio signal within the shooting FOV to acquire the focusing audio signal.
  • Operation 319 includes: taking the first additional audio signal as the reference signal, performing the adaptive filtering on the focusing audio signal to acquire the target audio signal.
  • a main purpose of spatial filtering is to acquire a purer ambient audio signal, so a target area of spatial filtering is outside the shooting FOV, and an acquired signal is hereinafter referred to as the reverse focusing audio signal.
  • a close-up audio signal within the shooting FOV needs to be acquired through the spatial filtering, so the target area of spatial filtering is the shooting FOV, and an acquired signal is hereinafter referred to as the focusing audio signal.
  • by combining the shooting FOV of the main device, a change of the shooting FOV may be followed, such that a local audio signal is directionally enhanced.
  • in FIG. 1 , two sets of adaptive filters are applied to the focusing audio signal and the additional audio signal respectively; only one set of adaptive filters is enabled according to the change of the target shooting object in the shooting FOV.
  • the adaptive filter applied to the additional audio signal is enabled, and the reverse focusing audio signal is taken as the reference signal and input to further suppress the ambient sound from the additional audio signal, such that a sound near the target shooting object is more prominent.
  • the adaptive filter applied to the focusing audio signal is enabled, and the additional audio signal is taken as the reference signal and input to further suppress the sound outside the shooting FOV from the focusing audio signal.
  • the adaptive filtering method may be the LMS, etc.
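The LMS adaptive filtering mentioned above can be sketched as follows. This is a normalized-LMS variant (the normalization is an assumption, chosen for step-size stability): it estimates the component of the primary signal predictable from the reference and outputs the residual, which is the ambient-suppressed signal:

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=16, mu=0.5):
    # Normalized LMS: adapt filter weights w so that w * reference
    # tracks the interfering component of `primary`; the error signal
    # (primary minus prediction) is the cleaned output.
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # most recent sample first
        e = primary[n] - w @ x                     # residual after cancellation
        w += mu * e * x / (x @ x + 1e-8)           # normalized LMS update
        out[n] = e
    return out
```

In the pipeline above, `primary` would be the first additional audio signal with the reverse focusing signal as `reference` (target inside the FOV), or the focusing signal with the additional signal as `reference` (target outside the FOV).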
  • Operation 320 includes: generating a second multi-channel transfer function according to the target distance and the target azimuth.
  • Operation 321 includes: performing the multi-channel rendering on the target audio signal according to the second multi-channel transfer function to acquire the target multi-channel audio signal.
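As a stand-in for generating the second multi-channel transfer function from the target azimuth and distance, the sketch below uses a Woodworth-style interaural time difference plus inverse-distance attenuation. The head radius, the 1/r gain, and the impulse-response structure are illustrative assumptions, not the patent's HRTF set:

```python
import numpy as np

HEAD_RADIUS = 0.0875   # m, assumed average head radius
SOUND_SPEED = 343.0    # m/s, assumed

def simple_transfer_function(azimuth, distance, fs, length=64):
    # Woodworth ITD approximation + inverse-distance gain; returns a
    # left/right impulse-response pair standing in for the second
    # multi-channel transfer function.
    itd = HEAD_RADIUS / SOUND_SPEED * (azimuth + np.sin(azimuth))
    gain = 1.0 / max(distance, 0.1)
    lag = min(int(round(abs(itd) * fs)), length - 1)
    h_left = np.zeros(length)
    h_right = np.zeros(length)
    if azimuth >= 0:          # source to the right: left ear receives later
        h_right[0] = gain
        h_left[lag] = gain
    else:
        h_left[0] = gain
        h_right[lag] = gain
    return h_left, h_right
```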
  • Operation 322 includes: determining a first gain of the ambient multi-channel audio signal and a second gain of the target multi-channel audio signal according to shooting parameters of the main device.
  • Operation 323 includes: mixing the ambient multi-channel audio signal with the target multi-channel audio signal according to the first gain and the second gain to acquire the mixed multi-channel audio signal.
  • a mixed gain controller may determine a mixed gain according to the user's shooting FOV, that is, the mixed gain is a proportion of two groups of signals in the mixed signal. For example, when a zoom level of the camera is increased, that is, when the FOV of the camera is reduced, a gain of the ambient binaural audio signal is reduced, a gain of the additional binaural audio signal (that is, the determined target multi-channel audio signal when the target shooting object is within the FOV) or the focusing binaural audio signal (that is, the determined target multi-channel audio signal when the target shooting object is outside the FOV) is increased. In this way, when the shooting FOV of the video is focused on a particular area, the audio is also focused on the particular area.
  • the range of the shooting FOV is determined according to the shooting parameters of the main device (such as the zoom level of the camera), and the first gain of the ambient multi-channel audio signal and the second gain of the target multi-channel audio signal are determined accordingly, such that when the shooting FOV of the video is focused to the particular area, the audio is also be focused to the particular area, thereby creating an effect of "immersive, sound follows image".
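The FOV-dependent gain control described above can be sketched as a zoom-level crossfade; the linear mapping and the `max_zoom` bound are assumptions for illustration:

```python
def mixing_gains(zoom_level, max_zoom=10.0):
    # As the camera zooms in (FOV shrinks), weight the target signal up
    # and the ambient signal down, so that the audio focuses where the
    # video focuses.
    focus = min(max(zoom_level / max_zoom, 0.0), 1.0)
    ambient_gain = 1.0 - focus    # first gain
    target_gain = focus           # second gain
    return ambient_gain, target_gain

def mix(ambient, target, zoom_level):
    # Mix the ambient and target multi-channel signals sample by sample
    # according to the two gains.
    g1, g2 = mixing_gains(zoom_level)
    return [g1 * a + g2 * t for a, t in zip(ambient, target)]
```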
  • the multi-channel audio signal acquisition method provided by the embodiments of the present disclosure is a distributed recording and audio focusing method that may create a more realistic sense of presence. This method may simultaneously use the microphone array in the main device and the microphone in the additional device (TWS Bluetooth headset) of the terminal device for a distributed audio acquisition and fusion.
  • the microphone array of the terminal device collects the spatial audio (that is, the main audio signal in the embodiments of the present disclosure) at the location of the main device, and the TWS Bluetooth headset may be arranged on the target shooting object to be tracked and move along with the target shooting object to collect the high-quality close-up audio signal (that is, the first additional audio signal in the embodiments of the present disclosure) at a distance. A corresponding adaptive filtering is performed on the two groups of collected signals, combined with the FOV change in the video shooting process, to achieve the ambient sound suppression; the spatial filtering is performed on the spatial audio signal in the specified area to achieve the directional enhancement; the interested target shooting object is tracked and located by combining the two positioning methods of vision and sound; and an HRTF binaural rendering and an up-mixing or down-mixing are performed on the three groups of signals, namely the spatial audio, the high-quality close-up audio and the directional enhancement audio, respectively, to acquire three sets of binaural signals: the ambient binaural signal, the additional binaural signal and the focusing binaural signal.
  • This technical solution may have following technical effects.
  • a spatial sound field and a point-shaped auditory target at the specified position may be simultaneously simulated.
  • a good directional enhancement effect may be acquired by using the distributed audio signals, and interference sound and ambient sound may be obviously suppressed when the distributed audio signal is focused.
  • the embodiments of the present disclosure provide a multi-channel audio signal acquisition device 400, which may include following modules.
  • the multi-channel audio signal acquisition device 400 includes an acquisition module 401 configured to acquire a main audio signal collected by a main device when the main device shoots video of a target shooting object, and perform a first multi-channel rendering to acquire an ambient multi-channel audio signal, acquire an audio signal collected by an additional device, and determine a first additional audio signal.
  • a distance between the additional device and the target shooting object is less than a first threshold.
  • the multi-channel audio signal acquisition device 400 includes a processing module 402 configured to perform an ambient sound suppression processing on the first additional audio signal and the main audio signal to acquire a target audio signal.
  • the processing module 402 is configured to perform a second multi-channel rendering on the target audio signal to acquire a target multi-channel audio signal.
  • the processing module 402 is configured to mix the ambient multi-channel audio signal and the target multi-channel audio signal to acquire a mixed multi-channel audio signal.
  • the processing module 402 is configured to determine a first gain of the ambient multi-channel audio signal and a second gain of the target multi-channel audio signal according to shooting parameters of the main device.
  • the processing module 402 is configured to mix the ambient multi-channel audio signal with the target multi-channel audio signal according to the first gain and the second gain to acquire the mixed multi-channel audio signal.
  • the acquisition module 401 is configured to acquire the main audio signal collected by a microphone array in the main device.
  • the acquisition module 401 is configured to generate a first multi-channel transfer function according to a type of the microphone array in the main device.
  • the acquisition module 401 is configured to perform a multi-channel rendering on the main audio signal according to the first multi-channel transfer function to acquire the ambient multi-channel audio signal.
  • the acquisition module 401 is configured to acquire a second additional audio signal collected by the additional device arranged on the target shooting object, and determine the second additional audio signal as the first additional audio signal.
  • the acquisition module 401 is configured to acquire the second additional audio signal collected by the additional device arranged on the target shooting object, and align the second additional audio signal with the main audio signal in a time domain to acquire the first additional audio signal.
  • the processing module 402 is configured to acquire a target azimuth between the target shooting object and the main device.
  • the processing module 402 is configured to perform a beamforming processing on the main audio signal toward the target azimuth to acquire a beamforming signal.
  • the processing module 402 is configured to determine a target delay between the main audio signal and the second additional audio signal.
  • the processing module 402 is configured to align, according to the first delay, the second additional audio signal with the main audio signal in a time domain to acquire the first additional audio signal.
  • the processing module 402 is configured to acquire a target distance and the target azimuth between the target shooting object and the main device.
  • the processing module 402 is configured to generate a second multi-channel transfer function according to the target distance and the target azimuth.
  • the processing module 402 is configured to perform the multi-channel rendering on the target audio signal according to the second multi-channel transfer function to acquire the target multi-channel audio signal.
  • the acquisition module 401 is configured to acquire a first active duration of the second additional audio signal and a first distance when it is detected that the target shooting object is outside the shooting field of view of the main device.
  • the first distance is the target distance between a last determined target shooting object and the main device.
  • the acquisition module 401 is configured to determine a second active duration of the main audio signal according to the first active duration and the first distance.
  • the acquisition module 401 is specifically configured to perform a direction-of-arrival (DOA) estimation by using the main audio signal in the second active duration to acquire a target azimuth between the target shooting object and the main device.
  • the acquisition module 401 is configured to perform the beamforming processing on the main audio signal toward the target azimuth to acquire the beamforming signal when the target shooting object is detected to be outside the shooting field of view of the main device.
  • the acquisition module 401 is configured to determine the first delay between the beamforming signal and the second additional audio signal.
  • the acquisition module 401 is configured to calculate the target distance between the target shooting object and the main device according to the first delay and the sound speed.
  • the processing module 402 is configured to perform the spatial filtering on the main audio signal within the shooting field of view according to the shooting field of view of the main device to acquire a focusing audio signal when the target shooting object is detected to be outside the shooting field of view of the main device.
  • the processing module 402 is configured to take the first additional audio signal as the reference signal, perform an adaptive filtering on the focusing audio signal to acquire the target audio signal.
  • the acquisition module 401 is configured to determine a first azimuth between the target shooting object and the main device according to video information and shooting parameters acquired by the main device when it is detected that the target shooting object is within the shooting field of view of the main device.
  • the acquisition module 401 is configured to acquire a first active duration of the second additional audio signal and a first distance.
  • the first distance is a target distance between a last determined target shooting object and the main device.
  • the acquisition module 401 is configured to determine a second active duration of the main audio signal according to the first active duration and first distance.
  • the acquisition module 401 is configured to perform the DOA estimation by using the main audio signal in the second active duration to acquire a second azimuth between the target shooting object and the main device.
  • the acquisition module 401 is configured to perform a smoothing processing on the first azimuth and the second azimuth to acquire the target azimuth.
  • the acquisition module 401 is configured to determine a second distance between the target shooting object and the main device according to the video information acquired by the main device when it is detected that the target shooting object is within the shooting field of view of the main device.
  • the acquisition module 401 is configured to calculate a second delay according to the second distance and the sound speed.
  • the acquisition module 401 is configured to perform a beamforming processing on the main audio signal toward the target azimuth to acquire a beamforming signal.
  • the acquisition module 401 is configured to determine a first delay between the beamforming signal and the second additional audio signal.
  • the acquisition module 401 is configured to perform a smoothing processing on the second delay and the first delay to acquire a target delay.
  • the acquisition module 401 is specifically configured to calculate a target distance according to the target delay and the sound speed.
  • the processing module 402 is configured to perform, according to the shooting field of view of the main device, the spatial filtering on the main audio signal in the area outside the shooting field of view to acquire the reverse focusing audio signal when the target shooting object is detected to be within the shooting field of view of the main device.
  • the processing module 402 is configured to take the reverse focusing audio signal as the reference signal, perform the adaptive filtering on the first additional audio signal to acquire the target audio signal.
  • the processing module 402 is configured to acquire the video data shot by the main device and the second additional audio signal collected by the additional device.
  • the processing module 402 is configured to determine a type of the current scene and a type of target shooting object.
  • the processing module 402 is configured to perform the multi-channel rendering on the target audio signal through a first rendering rule matching the type of the current scene and the type of the target shooting object to acquire the target multi-channel audio signal.
  • the processing module 402 is configured to acquire the main audio signal collected by the main device when the main device shoots video of the target shooting object.
  • the processing module 402 is configured to determine a type of a current scene.
  • the processing module 402 is configured to perform the first multi-channel rendering on the main audio signal through the second rendering rule matching the type of the current scene to acquire the ambient multi-channel audio signal.
  • the embodiments of the present disclosure provide a terminal device including a processor, a memory, and a computer program stored on the memory and capable of running on the processor.
  • the computer program is executed by the processor to perform the multi-channel audio signal acquisition method provided by the embodiment of the above method.
  • the embodiments of the present disclosure also provide a terminal device including a multi-channel audio signal acquisition device 400 and a main device 500.
  • the main device is configured to collect the main audio signal when the main device shoots video, and send the main audio signal to the multi-channel audio signal acquisition device.
  • the embodiments of the present disclosure also provide a terminal device including but not limited to a radio frequency (RF) circuit 601, a memory 602, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a WiFi module 607, a processor 608, a Bluetooth module 609, a camera 610 and other components.
  • the RF circuit 601 includes a receiver 6011 and a transmitter 6012.
  • the terminal device shown in FIG. 6 does not constitute a limitation; the terminal device may include more or fewer components than shown in FIG. 6 , combine some components, or adopt a different component arrangement.
  • the RF circuit 601 may be configured to receive and send information, or to receive and send signals during a call. Specifically, downlink information of a base station is received and sent to the processor 608 for processing; in addition, designed uplink data is sent to the base station.
  • the RF circuit 601 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer, etc.
  • the RF circuit 601 may also communicate with the network and other devices through the wireless communication.
  • the above wireless communication may use any communication standard or protocol including but not limited to the global system of mobile communication (GSM), the general packet radio service (GPRS), the code division multiple access (CDMA), the wideband code division multiple access (WCDMA), the long term evolution (LTE), the E-mail, and the short messaging service (SMS), etc.
  • the memory 602 may be configured to store software programs and modules, and the processor 608 may execute various functional applications and data processing of the terminal device by running the software programs and modules stored in the memory 602.
  • the memory 602 may mainly include a program storage area and a data storage area, the program storage area may store an operating system, an application program required for at least one function (such as a sound playing function or an image playing function, etc.), etc.
  • the data storage area may store data (such as audio signals or a phone book, etc.) created during use of the terminal device.
  • the memory 602 may include a high-speed random access memory, and may also include a non-volatile memory such as at least one magnetic disk storage component, a flash memory component, or another non-volatile solid-state storage component.
  • the input unit 603 may be configured to receive input digital or character information and generate key signal input related to user settings and function control of the terminal device.
  • the input unit 603 may include a touch panel 6031 and other input devices 6032.
  • the touch panel 6031, also known as a touch screen, may collect the user's touch operations on or near it (such as operations performed by the user on or near the touch panel 6031 with a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a preset program.
  • the touch panel 6031 may include two parts: a touch detection device and a touch controller. The touch detection device detects a user's touch position and a signal brought by the touch operation, and transmits the signal to the touch controller.
  • the touch controller receives touch information from the touch detection device, converts the touch information into contact coordinates, and then sends the contact coordinates to the processor 608, and may receive commands from the processor 608 and execute the commands.
  • the touch panel 6031 may be implemented as a resistive, capacitive, infrared, or surface-acoustic-wave type, etc.
  • the input unit 603 may also include the other input devices 6032.
  • the other input devices 6032 may include but are not limited to one or more of a physical keyboard, a function key (such as a volume control key or a switch key, etc.), a trackball, a mouse, and a joystick, etc.
  • the display unit 604 may be configured to display information input by the user, information provided to the user and various menus of the terminal device.
  • the display unit 604 may include a display panel 6041.
  • the display panel 6041 may be configured in the form of a liquid crystal display (LCD), an organic light emitting diode (OLED) display, etc.
  • the touch panel 6031 may cover the display panel 6041. When the touch panel 6031 detects the touch operation on or near it, the touch operation is transmitted to the processor 608 to determine a touch event, and then the processor 608 provides a corresponding visual output on the display panel 6041 according to the touch event.
  • although the touch panel 6031 and the display panel 6041 are shown as two independent components realizing the input and output functions of the terminal device, in some embodiments the touch panel 6031 may be integrated with the display panel 6041 to perform the input and output functions of the terminal device.
  • the terminal device may also include at least one sensor 605 such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor.
  • the ambient light sensor may adjust the brightness of the display panel 6041 according to the brightness of the ambient light.
  • the proximity sensor may turn off the display panel 6041 and/or the backlight when the terminal device is moved close to the ear.
  • an accelerometer sensor may detect the magnitude of acceleration in all directions (generally on three axes), and may detect the magnitude and direction of gravity when stationary; it may be used for applications that identify the pose of the terminal device (such as landscape/portrait switching, related games, and magnetometer pose calibration), for functions related to vibration recognition (such as a pedometer or tap detection), etc.
  • the terminal device may include an acceleration sensor, a depth sensor, and a distance sensor, etc.
  • the audio circuit 606, a loudspeaker 6061 and a microphone 6062 may provide audio interfaces between the user and the terminal device.
  • the audio circuit 606 may convert the received audio signal into an electrical signal and transmit it to the loudspeaker 6061, and the loudspeaker 6061 then converts the electrical signal into a sound signal for output.
  • the microphone 6062 converts the collected sound signal into an electrical signal, which is received by the audio circuit 606 and converted into an audio signal; the audio signal is then output to the processor 608 for processing, after which it may be sent to another terminal device through the RF circuit 601 or output to the memory 602 for further processing.
  • the microphone 6062 may be a microphone array.
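Where the microphone 6062 is a microphone array, a common way to combine the array channels into a single main audio signal is delay-and-sum beamforming. The sketch below is only an illustration: the function name, the list-based signal representation, and the per-microphone integer sample delays (which time-align the target direction) are assumptions, and the disclosure does not prescribe this algorithm.

```python
def delay_and_sum(channels, delays):
    """Delay-and-sum beamforming over a microphone array.

    channels: list of equal-length sample lists, one per microphone;
    delays: per-microphone integer sample delays that time-align the
    target direction (hypothetical inputs for illustration).
    """
    num_samples = len(channels[0])
    out = [0.0] * num_samples
    for ch, d in zip(channels, delays):
        for n in range(num_samples):
            # advance this channel by d samples (zero-pad past the end)
            sample = ch[n + d] if n + d < num_samples else 0.0
            out[n] += sample
    num_mics = len(channels)
    return [s / num_mics for s in out]
```

After alignment, the signal arriving from the target direction adds coherently across microphones while sounds from other directions partially cancel, which is why such arrays can emphasize the target shooting object's sound.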
  • WiFi is a short-range wireless transmission technology.
  • the terminal device may help the user send and receive e-mails, browse web pages and access streaming media through the WiFi module 607.
  • WiFi provides the user with wireless broadband Internet access.
  • although FIG. 6 shows the WiFi module 607, it may be understood that the WiFi module 607 is not an essential part of the terminal device and may be omitted as needed without changing the essence of the present disclosure.
  • the processor 608 is a control center of the terminal device, which connects various parts of the entire terminal device through various interfaces and circuits, and performs various functions and processes data of the terminal device by running or executing software programs and/or modules stored in the memory 602 and calling data stored in the memory 602, thereby monitoring the terminal device as a whole.
  • the processor 608 may include one or more processing units.
  • the processor 608 may integrate an application processor and a modem processor, the application processor mainly processes an operating system, a user interface, and an application program, etc., and the modem processor mainly processes wireless communication. It should be understood that the above modem processor may not be integrated into the processor 608.
  • the terminal device may also include a Bluetooth module 609 configured for short distance wireless communication and may be divided into a Bluetooth data module and a Bluetooth voice module according to functions.
  • the Bluetooth module is a basic circuit set on a chip integrating the Bluetooth function, and is configured for wireless network communication.
  • the Bluetooth module may be roughly divided into three types: a data transmission module, a Bluetooth audio module, and a Bluetooth module combining audio and data, etc.
  • the terminal device may also include other functional modules, which will not be repeated here.
  • the microphone 6062 may be configured to collect the main audio signal, and the terminal device may connect to the additional device through the WiFi module 607 or the Bluetooth module 609, and receive the second additional audio signal collected by the additional device.
  • the processor 608 is configured to acquire the main audio signal, perform the multi-channel rendering, acquire the ambient multi-channel audio signal, acquire the audio signal collected by the additional device, determine the first additional audio signal, perform the ambient sound suppression processing through the first additional audio signal and the main audio signal to acquire a target audio signal, perform the multi-channel rendering on the target audio signal to acquire the target multi-channel audio signal, and mix the ambient multi-channel audio signal with the target multi-channel audio signal to acquire the mixed multi-channel audio signal.
  • the distance between the additional device and the target shooting object is less than the first threshold value.
  • the processor 608 may also be configured to perform other processes implemented by the terminal device in the above method embodiments, which are not repeated here.
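The processing chain carried out by the processor 608 (render the main audio signal into an ambient multi-channel signal, suppress the ambient component using the first additional audio signal to acquire the target audio signal, render the target signal, then mix the two multi-channel signals) can be sketched as follows. This is a minimal illustrative outline, not the claimed implementation: the blend-based suppression, the equal-power two-channel rendering, and the mixing gains are all assumptions.

```python
import math

def suppress_ambient(main, additional, beta=0.8):
    # Crude stand-in for the ambient sound suppression step: blend the
    # main signal toward the near-field additional signal, which mostly
    # carries the target shooting object's sound. (Assumed method.)
    return [(1.0 - beta) * m + beta * a for m, a in zip(main, additional)]

def render_multichannel(mono, pan=0.5):
    # Equal-power pan of a mono signal into two channels (assumed renderer).
    left_gain = math.cos(pan * math.pi / 2.0)
    right_gain = math.sin(pan * math.pi / 2.0)
    return [[left_gain * s for s in mono], [right_gain * s for s in mono]]

def acquire_mixed_multichannel(main, additional,
                               gain_target=0.7, gain_ambient=0.3):
    ambient_mc = render_multichannel(main)        # ambient multi-channel signal
    target = suppress_ambient(main, additional)   # target audio signal
    target_mc = render_multichannel(target)       # target multi-channel signal
    # mix the ambient and target multi-channel signals channel by channel
    return [[gain_target * t + gain_ambient * a for t, a in zip(tc, ac)]
            for tc, ac in zip(target_mc, ambient_mc)]
```

The gain parameters here simply weight the target sound against the ambience in the mixed output; any real implementation would choose the suppression and rendering methods according to the embodiments described above.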
  • the embodiments of the present disclosure also provide a multi-channel audio signal acquisition system including a multi-channel audio signal acquisition device, a main device, and an additional device.
  • the main device and the additional device establish communication connections with the multi-channel audio signal acquisition device respectively.
  • the main device is configured to collect the main audio signal when the main device shoots a video of the target shooting object, and send the main audio signal to the multi-channel audio signal acquisition device.
  • the additional device is configured to collect the second additional audio signal and send the second additional audio signal to the multi-channel audio signal acquisition device.
  • the multi-channel audio signal acquisition system may be as shown in FIG. 1.
  • the audio processing device in FIG. 1 may be the multi-channel audio signal acquisition device.
  • the embodiments of the present disclosure also provide a computer-readable storage medium including a computer program, and the multi-channel audio signal acquisition method in the above method embodiments is performed when the computer program is executed by a processor.
  • the system, the device and the method may be realized in other ways.
  • the device embodiments described above are only exemplary.
  • the division of the units is only a division according to logical function, and there may be other division modes in actual implementation.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed above may be an indirect coupling or communication connection through some interfaces.
  • the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separate, and a component displayed as a unit may or may not be a physical unit; that is, it may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to practical needs to achieve the purpose of the embodiments.
  • each functional unit in the embodiments of the present disclosure may be integrated in one processing unit, or each unit may physically exist independently, or two or more units may be integrated in one unit.
  • the above integrated units may be realized in the form of hardware or in the form of a software functional unit.
  • when the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium.
  • the technical solution of the present disclosure, in essence or in the part that contributes to the related art, or the whole or part of the technical solution, may be embodied in the form of a software product.
  • the computer software product is stored in a storage medium including a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or some of the operations of the method described in various embodiments of the present disclosure.
  • the aforementioned storage medium may include a USB flash disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disc or an optical disc and other medium that may store program codes.

EP21870910.3A 2020-09-25 2021-06-29 Procédé et appareil d'acquisition de signal audio multicanal, et système Pending EP4220637A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011027264.8A CN114255781A (zh) 2020-09-25 2020-09-25 一种多通道音频信号获取方法、装置及系统
PCT/CN2021/103110 WO2022062531A1 (fr) 2020-09-25 2021-06-29 Procédé et appareil d'acquisition de signal audio multicanal, et système

Publications (2)

Publication Number Publication Date
EP4220637A1 true EP4220637A1 (fr) 2023-08-02
EP4220637A4 EP4220637A4 (fr) 2024-01-24

Family

ID=80790688

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21870910.3A Pending EP4220637A4 (fr) 2020-09-25 2021-06-29 Procédé et appareil d'acquisition de signal audio multicanal, et système

Country Status (3)

Country Link
EP (1) EP4220637A4 (fr)
CN (1) CN114255781A (fr)
WO (1) WO2022062531A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117714851A (zh) * 2022-05-25 2024-03-15 荣耀终端有限公司 录像方法、装置及存储介质
CN116668892B (zh) * 2022-11-14 2024-04-12 荣耀终端有限公司 音频信号的处理方法、电子设备及可读存储介质

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102969003A (zh) * 2012-11-15 2013-03-13 东莞宇龙通信科技有限公司 摄像声音提取方法及装置
CN104599674A (zh) * 2014-12-30 2015-05-06 西安乾易企业管理咨询有限公司 一种摄像中定向录音的系统及方法
EP3251116A4 (fr) * 2015-01-30 2018-07-25 DTS, Inc. Système et procédé de capture, de codage, de distribution, et de décodage d'audio immersif
GB2543275A (en) * 2015-10-12 2017-04-19 Nokia Technologies Oy Distributed audio capture and mixing
GB2543276A (en) * 2015-10-12 2017-04-19 Nokia Technologies Oy Distributed audio capture and mixing
US10616681B2 (en) * 2015-09-30 2020-04-07 Hewlett-Packard Development Company, L.P. Suppressing ambient sounds
US9998606B2 (en) * 2016-06-10 2018-06-12 Glen A. Norris Methods and apparatus to assist listeners in distinguishing between electronically generated binaural sound and physical environment sound
GB2556058A (en) * 2016-11-16 2018-05-23 Nokia Technologies Oy Distributed audio capture and mixing controlling
CN108389586A (zh) * 2017-05-17 2018-08-10 宁波桑德纳电子科技有限公司 一种远程集音装置、监控装置及远程集音方法
US10178490B1 (en) * 2017-06-30 2019-01-08 Apple Inc. Intelligent audio rendering for video recording
GB2567244A (en) * 2017-10-09 2019-04-10 Nokia Technologies Oy Spatial audio signal processing
CN110970057B (zh) * 2018-09-29 2022-10-28 华为技术有限公司 一种声音处理方法、装置与设备
CN111050269B (zh) * 2018-10-15 2021-11-19 华为技术有限公司 音频处理方法和电子设备
EP3683794B1 (fr) * 2019-01-15 2021-07-28 Nokia Technologies Oy Traitement audio

Also Published As

Publication number Publication date
CN114255781A (zh) 2022-03-29
EP4220637A4 (fr) 2024-01-24
WO2022062531A1 (fr) 2022-03-31


Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20221229

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
A4 Supplementary search report drawn up and despatched

Effective date: 20231222

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 21/0356 20130101ALN20231218BHEP

Ipc: G10L 21/0216 20130101ALN20231218BHEP

Ipc: H04S 3/00 20060101ALI20231218BHEP

Ipc: H04R 3/00 20060101ALI20231218BHEP

Ipc: G10L 21/0208 20130101AFI20231218BHEP