WO2021143656A1 - Stereo sound pickup method, apparatus, terminal device, and computer-readable storage medium - Google Patents

Stereo sound pickup method, apparatus, terminal device, and computer-readable storage medium

Info

Publication number
WO2021143656A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, sound pickup, microphone, terminal device, target
Application number
PCT/CN2021/071156
Other languages: English (en), French (fr)
Inventors: 韩博, 刘鑫, 熊伟, 靖霄, 李峰
Original Assignee: 华为技术有限公司
Application filed by 华为技术有限公司
Priority to CN202180007656.4A (CN114846816B)
Priority to BR112022013690A (BR112022013690A2)
Priority to CN202311246081.9A (CN117528349A)
Priority to EP21740899.6A (EP4075825A4)
Priority to US17/758,927 (US20230048860A1)
Priority to JP2022543511A (JP2023511090A)
Publication of WO2021143656A1


Classifications

    • H04R5/04 — Stereophonic arrangements; circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H04S7/30 — Control circuits for electronic adaptation of the sound field
    • H04R5/027 — Spatial or constructional arrangements of microphones, e.g. in dummy heads
    • H04R1/406 — Arrangements for obtaining a desired directional characteristic only, by combining a number of identical microphones
    • H04R29/005 — Monitoring arrangements; testing arrangements for microphone arrays
    • H04R3/005 — Circuits for transducers, for combining the signals of two or more microphones
    • H04S1/007 — Two-channel systems in which the audio signals are in digital form
    • H04R2205/026 — Single (sub)woofer with two or more satellite loudspeakers for mid- and high-frequency band reproduction driven via the (sub)woofer
    • H04R2430/20 — Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2499/11 — Transducers incorporated or for use in hand-held devices, e.g. mobile phones, PDAs, cameras
    • H04S2400/13 — Aspects of volume control, not necessarily automatic, in stereophonic sound systems
    • H04S2400/15 — Aspects of sound capture and related signal processing for recording or reproduction

Definitions

  • the present invention relates to the field of audio processing, and in particular, to a stereo sound pickup method, device, terminal equipment, and computer-readable storage medium.
  • In the prior art, the direction of the stereo beam generated by a terminal device often cannot be adjusted because its configuration parameters are fixed, making it difficult for the terminal device to adapt to the requirements of various scenes and obtain a good stereo recording effect.
  • the purpose of the present invention is to provide a stereo sound pickup method, device, terminal equipment and computer-readable storage medium, so that the terminal equipment can obtain better stereo recording effects in different video recording scenes.
  • an embodiment of the present invention provides a stereo sound pickup method, which is applied to a terminal device, the terminal device includes a plurality of microphones, and the method includes:
  • acquiring multiple target sound pickup data from the sound pickup data of the multiple microphones, and acquiring posture data and camera data of the terminal device;
  • determining, according to the posture data and the camera data, a target beam parameter group corresponding to the multiple target sound pickup data from a plurality of pre-stored beam parameter groups, wherein the target beam parameter group includes the beam parameters corresponding to each of the multiple target sound pickup data;
  • forming a stereo beam according to the target beam parameter group and the multiple target sound pickup data.
  • Since the target beam parameter group is determined according to the posture data and camera data of the terminal device, different posture data and camera data are obtained when the terminal device is in different video recording scenes, and different target beam parameter groups are therefore determined. When a stereo beam is formed according to the target beam parameter group and the multiple target sound pickup data, the different target beam parameter groups can be used to adjust the direction of the stereo beam, thereby effectively reducing the influence of noise in the recording environment.
  • the camera data includes activation data, and the activation data characterizes an activated camera;
  • The step of determining a target beam parameter group corresponding to the plurality of target sound pickup data from a plurality of pre-stored beam parameter groups according to the attitude data and the camera data includes: determining, according to the attitude data and the activation data, a first target beam parameter group corresponding to the plurality of target sound pickup data from the plurality of pre-stored beam parameter groups;
  • The step of forming a stereo beam according to the target beam parameter group and the plurality of target sound pickup data includes: forming a first stereo beam according to the first target beam parameter group and the plurality of target sound pickup data, wherein the first stereo beam points to the shooting direction of the activated camera.
  • The first target beam parameter group is determined from the posture data of the terminal device and the activation data characterizing the activated camera, and the first stereo beam is formed according to the first target beam parameter group and the multiple target sound pickup data. Thus, in different video recording scenes, the direction of the first stereo beam is adjusted adaptively according to the posture data and the activation data, ensuring that the terminal device obtains a better stereo recording effect when recording video.
  • The multiple beam parameter groups include a first beam parameter group, a second beam parameter group, a third beam parameter group, and a fourth beam parameter group, and the beam parameters in the first, second, third, and fourth beam parameter groups differ from one another;
  • when the posture data characterizes that the terminal device is in the landscape state and the activation data characterizes that the rear camera is activated, the first target beam parameter group is the first beam parameter group;
  • when the terminal device is in the landscape state and the front camera is activated, the first target beam parameter group is the second beam parameter group;
  • when the terminal device is in the portrait state and the rear camera is activated, the first target beam parameter group is the third beam parameter group;
  • when the terminal device is in the portrait state and the front camera is activated, the first target beam parameter group is the fourth beam parameter group.
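  • For illustration only (this sketch is not part of the patent disclosure), the selection just described can be expressed as a lookup table; the group names and the Python interface are assumptions:

```python
# Hypothetical sketch: map (posture, activated camera) to one of the four
# pre-stored beam parameter groups described above.
BEAM_PARAM_GROUPS = {
    ("landscape", "rear"):  "first_beam_param_group",
    ("landscape", "front"): "second_beam_param_group",
    ("portrait",  "rear"):  "third_beam_param_group",
    ("portrait",  "front"): "fourth_beam_param_group",
}

def select_first_target_group(posture: str, active_camera: str) -> str:
    """Return the pre-stored group matching the current recording scene."""
    return BEAM_PARAM_GROUPS[(posture, active_camera)]
```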
  • the camera data includes activation data and zoom data, wherein the zoom data is a zoom factor of an activated camera characterized by the activation data;
  • The step of determining a target beam parameter group corresponding to the plurality of target sound pickup data from a plurality of pre-stored beam parameter groups according to the attitude data and the camera data includes: determining, according to the attitude data, the activation data, and the zoom data, a second target beam parameter group corresponding to the plurality of target sound pickup data from the plurality of pre-stored beam parameter groups;
  • The step of forming a stereo beam according to the target beam parameter group and the plurality of target sound pickup data includes: forming a second stereo beam according to the second target beam parameter group and the plurality of target sound pickup data, wherein the second stereo beam points to the shooting direction of the activated camera, and the width of the second stereo beam narrows as the zoom factor increases.
  • The second target beam parameter group is determined from the posture data of the terminal device, the activation data characterizing the activated camera, and the zoom data, and the second stereo beam is formed according to the second target beam parameter group and the multiple target sound pickup data. Thus, in different video recording scenarios, the direction and width of the second stereo beam are adjusted adaptively according to the posture data, activation data, and zoom data, giving better recording robustness in noisy environments and long-distance sound pickup conditions.
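  • As an illustrative sketch only (the thresholds and interface are assumptions, not values from the patent), the lookup can be extended with a zoom bucket so that higher zoom factors match groups trained to form narrower beams:

```python
# Hypothetical sketch: index pre-stored groups by (posture, camera, zoom bucket).
def zoom_bucket(zoom: float) -> str:
    if zoom < 3.0:      # illustrative threshold
        return "low"
    if zoom < 10.0:     # illustrative threshold
        return "medium"
    return "high"

def select_second_target_group(groups: dict, posture: str,
                               active_camera: str, zoom: float):
    """Higher buckets map to parameter groups that form narrower beams."""
    return groups[(posture, active_camera, zoom_bucket(zoom))]
```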
  • the step of obtaining multiple target sound pickup data from the sound pickup data of the multiple microphones includes:
  • if there is abnormal sound data, eliminating the abnormal sound data from the sound pickup data of the multiple microphones to obtain the initial target sound pickup data;
  • The multiple target sound pickup data used to form a stereo beam are determined by performing microphone blockage detection on the multiple microphones and abnormal sound processing on the sound pickup data of the multiple microphones, so that good recording robustness is retained even under abnormal sound interference and microphone blockage, thereby ensuring a good stereo recording effect.
  • the step of obtaining the serial numbers of the microphones that are not blocked according to the sound pickup data of the multiple microphones includes:
  • In this way, a relatively accurate microphone blockage detection result can be obtained, which benefits the subsequent determination of the multiple target sound pickup data used to form a stereo beam, thereby ensuring a good stereo recording effect.
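  • One plausible criterion (an assumption for illustration; the patent does not fix the detection rule here) is to flag a microphone as blocked when its high-frequency energy falls far below that of the other microphones, since a blocked sound hole mainly attenuates high frequencies:

```python
import numpy as np

def unblocked_mic_indices(frames: np.ndarray, sr: int,
                          thresh_db: float = 12.0) -> np.ndarray:
    """frames: (num_mics, num_samples) pickup data for one analysis window.

    Flags a mic as blocked when its high-band energy is more than
    thresh_db below the median across mics (illustrative criterion).
    Returns the serial numbers (indices) of the unblocked microphones.
    """
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], 1.0 / sr)
    band = freqs > 2000.0  # assumed high-band cutoff
    level_db = 10 * np.log10(spec[:, band].sum(axis=1) + 1e-12)
    return np.flatnonzero(level_db > np.median(level_db) - thresh_db)
```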
  • the step of detecting whether there is abnormal sound data in the sound pickup data of each of the microphones includes:
  • according to the pre-trained abnormal sound detection network and the frequency domain information corresponding to the sound pickup data of each microphone, detecting whether there is abnormal sound data in the sound pickup data of each microphone.
  • The sound pickup data of each microphone is transformed to the frequency domain, and the pre-trained abnormal sound detection network together with the corresponding frequency domain information is used to detect whether abnormal sound data is present in the sound pickup data, which makes it easier to obtain relatively clean sound pickup data subsequently and thus ensures a good stereo recording effect.
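  • A minimal sketch of this step, assuming a detector object with a `predict(features) -> probability` interface (the STFT parameters and the network's interface are assumptions):

```python
import numpy as np

def detect_abnormal(pickup: np.ndarray, sr: int, model) -> bool:
    """Transform one microphone's pickup data to the frequency domain and
    run a pre-trained abnormal-sound detection network on the result."""
    win, hop = 1024, 512  # assumed STFT parameters
    n = (len(pickup) - win) // hop + 1
    frames = np.stack([pickup[i * hop:i * hop + win] * np.hanning(win)
                       for i in range(n)])
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)  # spectrogram
    return model.predict(log_mag[None, ...]) > 0.5  # assumed interface
```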
  • the step of eliminating abnormal sound data in the sound pickup data of the multiple microphones includes:
  • When eliminating abnormal sound, by detecting whether preset sound data is present in the abnormal sound data and taking different elimination measures based on the detection result, relatively clean sound pickup data can be obtained while preventing the sound data that the user expects to record from being eliminated entirely.
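  • For illustration (the concrete elimination measures are assumptions), one way to take different measures based on the detection result is to merely attenuate segments that carry preset sound the user wants to keep, and to suppress other abnormal segments almost completely:

```python
import numpy as np

def eliminate_abnormal(pickup: np.ndarray, abnormal_mask: np.ndarray,
                       contains_preset_sound: bool,
                       soft_gain: float = 0.5) -> np.ndarray:
    """abnormal_mask: boolean mask of samples classified as abnormal."""
    out = pickup.copy()
    if contains_preset_sound:
        out[abnormal_mask] *= soft_gain  # keep wanted content audible
    else:
        out[abnormal_mask] *= 0.01       # near-complete suppression
    return out
```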
  • the step of obtaining multiple target sound pickup data from the sound pickup data of the multiple microphones includes:
  • Microphone blockage detection is performed on the multiple microphones, and the sound pickup data corresponding to the serial numbers of the unblocked microphones is then selected for the subsequent formation of a stereo beam, so that when recording video the terminal device does not suffer significantly reduced sound quality or obvious stereo imbalance because a microphone sound hole is blocked; that is, the stereo recording effect is guaranteed even when a microphone is blocked, giving good recording robustness.
  • the step of obtaining multiple target sound pickup data from the sound pickup data of the multiple microphones includes:
  • the abnormal sound data in the sound pickup data of the plurality of microphones is eliminated to obtain a plurality of target sound pickup data.
  • the method further includes:
  • performing timbre correction on the stereo beam, so that the frequency response can be corrected to be flat, thereby obtaining a better stereo recording effect.
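  • A minimal sketch of such a correction, assuming a calibrated magnitude response of the device is available on the rFFT grid (the calibration source and the boost limit are assumptions):

```python
import numpy as np

def flatten_response(beam: np.ndarray, measured_resp: np.ndarray) -> np.ndarray:
    """Inverse-filter timbre correction so the net frequency response is flat.

    measured_resp: device magnitude response sampled on the rFFT grid.
    """
    spec = np.fft.rfft(beam)
    correction = 1.0 / np.maximum(measured_resp, 0.1)  # cap the boost
    return np.fft.irfft(spec * correction, n=len(beam))
```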
  • the method further includes:
  • adjusting the gain of the stereo beam, so that low-volume sound pickup data can be heard clearly while high-volume sound pickup data does not clip or distort, thereby adjusting the recorded sound to an appropriate volume and improving the user's video recording experience.
  • the camera data includes the zoom factor of the activated camera
  • the step of adjusting the gain of the stereo beam includes:
  • the gain of the stereo beam is adjusted according to the zoom factor of the camera.
  • The gain of the stereo beam is adjusted according to the zoom factor of the camera, so that the volume of the target sound source does not decrease with distance, thereby improving the sound of the recorded video.
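  • As an illustrative sketch (the gain curve is an assumption; the patent only states that the gain follows the zoom factor), the beam can be boosted logarithmically with zoom:

```python
import numpy as np

def apply_zoom_gain(beam: np.ndarray, zoom: float,
                    db_per_doubling: float = 3.0) -> np.ndarray:
    """Boost the stereo beam as the zoom factor grows so a distant
    target source does not fade."""
    gain_db = db_per_doubling * np.log2(max(zoom, 1.0))
    return beam * (10.0 ** (gain_db / 20.0))
```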
  • the number of the microphones is 3 to 6, wherein at least one microphone is arranged on the front of the screen of the terminal device or the back of the terminal device.
  • When the number of the microphones is three, one microphone is disposed at each of the top and the bottom of the terminal device, and one microphone is disposed on the front of the screen of the terminal device or on the back of the terminal device.
  • When the number of the microphones is six, two microphones are disposed at each of the top and the bottom of the terminal device, and one microphone is disposed on each of the front of the screen of the terminal device and the back of the terminal device.
  • an embodiment of the present invention provides a stereo sound pickup device, which is applied to a terminal device, the terminal device includes a plurality of microphones, and the device includes:
  • a pickup data acquisition module configured to acquire multiple target pickup data from the pickup data of the multiple microphones
  • a device parameter acquisition module which is used to acquire the posture data and camera data of the terminal device
  • the beam parameter determination module is configured to determine a target beam parameter group corresponding to the multiple target sound pickup data from a plurality of pre-stored beam parameter groups according to the attitude data and the camera data, wherein the target beam parameter group includes the beam parameters corresponding to each of the multiple target sound pickup data;
  • the beam forming module is configured to form a stereo beam according to the target beam parameter group and the multiple target sound pickup data.
  • an embodiment of the present invention provides a terminal device, including a memory storing a computer program and a processor; when the computer program is read and executed by the processor, the method described in any of the foregoing embodiments is implemented.
  • an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is read and executed by a processor, the method according to any one of the foregoing embodiments is implemented.
  • embodiments of the present invention also provide a computer program product which, when run on a computer, causes the computer to execute the method described in any one of the foregoing embodiments.
  • an embodiment of the present invention also provides a chip system, which includes a processor and may also include a memory, configured to implement the method according to any one of the foregoing embodiments.
  • the chip system can be composed of chips, or it can include chips and other discrete devices.
  • FIG. 1 shows a schematic diagram of a hardware structure of a terminal device provided by an embodiment of the present invention
  • FIG. 2 shows a schematic diagram of the layout when the number of microphones on the terminal device is 3 according to an embodiment of the present invention
  • FIG. 3 shows a schematic diagram of the layout when the number of microphones on the terminal device is 6 according to an embodiment of the present invention
  • FIG. 4 shows a schematic flowchart of a stereo sound pickup method provided by an embodiment of the present invention
  • FIG. 5 is a schematic diagram of another flow chart of a stereo sound pickup method provided by an embodiment of the present invention.
  • FIG. 6 shows a schematic diagram of the corresponding first stereo beam when the terminal device is in a landscape state and the rear camera is enabled
  • FIG. 7 shows a schematic diagram of the corresponding first stereo beam when the terminal device is in a landscape state and the front camera is enabled
  • FIG. 8 shows a schematic diagram of the corresponding first stereo beam when the terminal device is in a vertical screen state and the rear camera is enabled
  • FIG. 9 shows a schematic diagram of the corresponding first stereo beam when the terminal device is in a vertical screen state and the front camera is enabled
  • FIG. 10 shows another schematic flowchart of a stereo sound pickup method provided by an embodiment of the present invention.
  • FIGS. 11a-11c show schematic diagrams of the width of the second stereo beam varying with the zoom factor of the activated camera;
  • FIG. 12 shows a schematic flowchart of a sub-step of S201 in FIG. 4;
  • FIG. 13 shows a schematic flowchart of another sub-step of S201 in FIG. 4;
  • FIG. 14 shows a schematic flowchart of another sub-step of S201 in FIG. 4;
  • FIG. 15 shows another schematic flowchart of a stereo sound pickup method provided by an embodiment of the present invention.
  • FIG. 16 shows another schematic flowchart of a stereo sound pickup method provided by an embodiment of the present invention.
  • FIG. 17 shows a schematic diagram of a functional module of a stereo sound pickup device provided by an embodiment of the present invention.
  • FIG. 18 shows a schematic diagram of another functional module of a stereo sound pickup device provided by an embodiment of the present invention.
  • FIG. 19 shows a schematic diagram of another functional module of the stereo sound pickup device provided by an embodiment of the present invention.
  • Relational terms such as “first” and “second” are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations.
  • The terms “include”, “comprise”, or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further restriction, an element defined by the phrase “including a …” does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
  • FIG. 1 shows a schematic diagram of a hardware structure of a terminal device.
  • the terminal device may include a processor 110, an internal memory 120, an external memory interface 130, a sensor module 140, a camera 150, a display screen 160, an audio module 170, a speaker 171, a microphone 172, a receiver 173, a headset interface 174, a mobile communication module 180, Wireless communication module 190, USB (Universal Serial Bus) interface 101, charging management module 102, power management module 103, battery 104, buttons 105, motor 106, indicator 107, subscriber identification module (Subscriber Identification Module, SIM) card interface 108, antenna 1, antenna 2, etc.
  • FIG. 1 is only an example.
  • the terminal device of the embodiment of the present invention may have more or fewer components than the terminal device shown in FIG. 1, may combine two or more components, or may have different component configurations.
  • the various components shown in FIG. 1 may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (Application Processor, AP), a modem processor, a graphics processing unit (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a memory, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a neural-network processing unit (Neural-network Processing Unit, NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller can be the nerve center and command center of the terminal device.
  • the controller can generate operation control signals according to the instruction operation code and timing signals, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to use the instruction or data again, it can be directly called from the memory, which avoids repeated access and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • the internal memory 120 may be used to store computer programs and/or data.
  • the internal memory 120 may include a storage program area and a storage data area.
  • the storage program area can store the operating system, at least one application program required by the function (such as sound playback function, image playback function, face recognition function), etc.;
  • the storage data area can store data created during the use of the terminal device (such as audio data, image data) and so on.
  • the processor 110 may execute various functional applications and data processing of the terminal device by running a computer program and/or data stored in the internal memory 120.
  • the terminal device when the computer program and/or data stored in the internal memory 120 are read and run by the processor 110, the terminal device can be enabled to execute the stereo sound pickup method provided in the embodiment of the present invention, so that the terminal device can record different videos Better stereo recording effect can be obtained in the scene.
  • the internal memory 120 may include a high-speed random access memory, and may also include a non-volatile memory.
  • the non-volatile memory may include at least one magnetic disk storage device, flash memory device, Universal Flash Storage (UFS), and so on.
  • the external memory interface 130 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal device.
  • the external memory card communicates with the processor 110 through the external memory interface 130 to realize the data storage function. For example, save audio, video and other files in an external memory card.
  • the sensor module 140 may include one or more sensors.
  • For example, the sensor module 140 may include an acceleration sensor 140A, a gyroscope sensor 140B, a distance sensor 140C, a pressure sensor 140D, a touch sensor 140E, a fingerprint sensor 140F, an ambient light sensor 140G, a bone conduction sensor 140H, a proximity light sensor 140J, a temperature sensor 140K, an air pressure sensor 140L, a magnetic sensor 140M, etc., without being limited thereto.
  • the acceleration sensor 140A can perceive changes in acceleration force, such as shaking, falling, rising, falling, and changes in the angle of the handheld terminal device, etc., which can be converted into electrical signals by the acceleration sensor 140A.
  • the acceleration sensor 140A can detect whether the terminal device is in a horizontal screen state or a vertical screen state.
  • the gyro sensor 140B may be used to determine the motion posture of the terminal device.
  • For example, the angular velocity of the terminal device around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 140B.
  • the gyro sensor 140B can be used for image stabilization.
  • the gyro sensor 140B detects the shake angle of the terminal device, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal device through reverse movement to achieve anti-shake.
  • the gyro sensor 140B can also be used for navigation and somatosensory game scenes.
  • the distance sensor 140C may be used to measure distance.
  • the terminal device can measure the distance by infrared or laser. Exemplarily, in a shooting scene, the terminal device may use the distance sensor 140C to measure distance to achieve rapid focusing.
  • the pressure sensor 140D can be used to sense pressure signals and convert the pressure signals into electrical signals.
  • the pressure sensor 140D may be provided on the display screen 160.
  • the capacitive pressure sensor may include at least two parallel plates with conductive materials.
  • the touch sensor 140E is also called “touch panel”.
  • the touch sensor 140E may be disposed on the display screen 160, and the touch sensor 140E and the display screen 160 form a touch screen, which is also called a “touch screen”.
  • the touch sensor 140E is used to detect touch operations acting on or near it.
  • the touch sensor 140E may transmit the detected touch operation to the application processor to determine the type of the touch event, and may provide visual output related to the touch operation through the display screen 160.
  • the touch sensor 140E may also be disposed on the surface of the terminal device, which is different from the position of the display screen 160.
  • the fingerprint sensor 140F can be used to collect fingerprints.
  • the terminal device can use the collected fingerprint characteristics to realize functions such as fingerprint unlocking, accessing the application lock, fingerprint taking pictures, and fingerprint answering calls.
  • the ambient light sensor 140G can be used to sense the brightness of the ambient light.
  • the terminal device can adaptively adjust the brightness of the display screen 160 according to the perceived brightness of the ambient light.
  • the ambient light sensor 140G can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 140G can also cooperate with the proximity light sensor 140J to detect whether the terminal device is in the pocket to prevent accidental touch.
  • the bone conduction sensor 140H may be used to obtain vibration signals.
  • the bone conduction sensor 140H can acquire the vibration signal of the bone mass that vibrates when a person speaks.
  • the bone conduction sensor 140H can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 140H may also be provided in the earphone, combined with the bone conduction earphone.
  • The audio module 170 can parse out a voice signal based on the vibration signal of the vocal-part bone mass obtained by the bone conduction sensor 140H, realizing a voice function.
  • The application processor can parse heart rate information based on the blood pressure pulse signal obtained by the bone conduction sensor 140H, realizing a heart rate detection function.
  • the proximity light sensor 140J may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal device emits infrared light to the outside through the light emitting diode.
  • Terminal equipment uses photodiodes to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal device. When insufficient reflected light is detected, the terminal device can determine that there is no object near the terminal device.
  • the terminal device can use the proximity light sensor 140J to detect that the user holds the terminal device close to the ear to talk, so as to automatically turn off the screen to save power.
  • the temperature sensor 140K can be used to detect temperature.
  • the terminal device uses the temperature detected by the temperature sensor 140K to execute the temperature processing strategy. For example, when the temperature reported by the temperature sensor 140K exceeds a threshold value, the terminal device executes to reduce the performance of the processor located near the temperature sensor 140K, so as to reduce power consumption and implement thermal protection.
  • the terminal device when the temperature is lower than another threshold, the terminal device heats the battery 104 to avoid abnormal shutdown of the terminal device due to low temperature.
  • the terminal device boosts the output voltage of the battery 104 to avoid abnormal shutdown caused by low temperature.
  • the air pressure sensor 140L can be used to measure air pressure.
  • the terminal device calculates the altitude based on the air pressure value measured by the air pressure sensor 140L to assist positioning and navigation.
  • the magnetic sensor 140M may include a Hall sensor.
  • the terminal device can use the magnetic sensor 140M to detect the opening and closing of the flip holster.
  • When the terminal device is a flip phone, the terminal device can detect the opening and closing of the flip cover according to the magnetic sensor 140M, and then set features such as automatic unlocking upon flip-open according to the detected opening and closing state of the holster or the flip cover.
  • the camera 150 is used to capture images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element can be a Charge Coupled Device (CCD) or a Complementary Metal-Oxide-Semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the light signal into an electric signal, and then transfers the electric signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing, and the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the terminal device may include one or more cameras 150, which is not limited.
  • the terminal device includes two cameras 150, such as one front camera and one rear camera; in another example, the terminal device includes five cameras 150, such as three rear cameras and two front cameras .
  • the terminal device can realize the shooting function through the ISP, the camera 150, the video codec, the GPU, the display screen 160, and the application processor.
  • the display screen 160 is used to display images, videos, and the like.
  • the display screen 160 includes a display panel, and the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (Active-Matrix Organic Light-Emitting Diode, AMOLED), a flexible light-emitting diode (Flexible Light-Emitting Diode, FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (Quantum Dot Light-Emitting Diodes, QLED), etc.
  • the terminal device may implement a display function through a GPU, a display screen 160, an application processor, and the like.
  • the terminal device can implement audio functions through the audio module 170, the speaker 171, the microphone 172, the receiver 173, the earphone interface 174, and the application processor. For example, audio playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 171, also called a “loudspeaker”, is used to convert audio electrical signals into sound signals.
  • the terminal device can play music, give voice prompts, etc. through the speaker 171.
  • the microphone 172, also called a “mic” or “mike”, is used to collect sounds (such as ambient sounds, including sounds made by people and sounds made by equipment) and convert sound signals into audio electrical signals, i.e., the sound pickup data in this embodiment.
  • the terminal device can be provided with multiple microphones 172. By arranging multiple microphones 172 on the terminal device, the user can obtain high-quality stereo recording effects when using the terminal device to record video.
  • the number of microphones 172 provided on the terminal device can be 3 to 6, wherein at least one microphone 172 is provided on the front of the screen of the terminal device or the back of the terminal device, to ensure that a stereo beam pointing toward the front and back of the terminal device can be formed.
  • As shown in FIG. 2, when the number of microphones is three, one microphone is disposed at each of the top and the bottom of the terminal device (i.e., m1 and m2), and one microphone is disposed on the front of the screen of the terminal device or on the back of the terminal device (i.e., m3). As shown in FIG. 3, when the number of microphones is six, two microphones are disposed at each of the top and the bottom of the terminal device (i.e., m1, m2 and m3, m4), and one microphone is disposed on each of the front of the screen of the terminal device and the back of the terminal device (i.e., m5 and m6). It can be understood that, in other embodiments, the number of microphones 172 may also be 4 or 5, with at least one microphone 172 disposed on the front of the screen of the terminal device or the back of the terminal device.
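  • For bookkeeping purposes only (the labels merely restate the layouts of FIGS. 2 and 3; they are not an interface from the patent), the two layouts can be recorded as:

```python
# Illustrative description of the microphone layouts in FIGS. 2 and 3.
MIC_LAYOUTS = {
    3: {"m1": "top", "m2": "bottom", "m3": "screen front or back"},
    6: {"m1": "top", "m2": "top", "m3": "bottom", "m4": "bottom",
        "m5": "screen front", "m6": "back"},
}
```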
  • the receiver 173, also called an “earpiece”, is used to convert audio electrical signals into sound signals.
  • When the terminal device answers a call or plays a voice message, the receiver 173 can be brought close to the human ear to hear the voice.
  • the earphone interface 174 is used to connect wired earphones.
  • the earphone interface 174 may be a USB interface, or a 3.5mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association (Cellular Telecommunications Industry Association of the USA, CTIA) standard interface.
  • the wireless communication function of the terminal device can be implemented by the antenna 1, the antenna 2, the mobile communication module 180, the wireless communication module 190, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal device can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 180 can provide wireless communication solutions including 2G/3G/4G/5G, etc., which are applied to terminal devices.
  • the mobile communication module 180 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 180 can receive electromagnetic waves by the antenna 1, and perform processing such as filtering and amplifying the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 180 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 180 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 180 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 171, the receiver 173, etc.), or displays an image or video through the display screen 160.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 180 or other functional modules.
  • the wireless communication module 190 can provide wireless communication solutions applied to terminal devices, including Wireless Local Area Networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), etc.
  • the wireless communication module 190 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 190 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 190 may also receive the signal to be sent from the processor 110, perform frequency modulation and amplification processing on it, and convert it into electromagnetic waves to radiate through the antenna 2.
  • the antenna 1 of the terminal device is coupled with the mobile communication module 180, and the antenna 2 is coupled with the wireless communication module 190, so that the terminal device can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include Global System for Mobile Communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), and broadband code Division Multiple Access (Wideband Code Division Multiple Access, WCDMA), Time Division Code Division Multiple Access (Time Division-Synchronous Code Division Multiple Access, TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc.
  • GNSS can include Global Positioning System (GPS), Global Navigation Satellite System (GLONASS), BeiDou Navigation Satellite System (BDS), Quasi-Zenith Satellite System (Quasi-Zenith Satellite System, QZSS) and/or Satellite Based Augmentation System (SBAS).
  • the USB interface 101 is an interface that complies with the USB standard specifications, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 101 can be used to connect a charger to charge the terminal device, and can also be used to transfer data between the terminal device and peripheral devices. It can also be used to connect earphones and play sound through earphones.
  • the USB interface 101 can also be used to connect to other terminal devices, such as AR (Augmented Reality) devices, computers, and so on.
  • the charging management module 102 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 102 may receive the charging input of the wired charger through the USB interface 101.
  • the charging management module 102 may receive a wireless charging input through a wireless charging coil of the terminal device. While the charging management module 102 charges the battery 104, it can also supply power to the terminal device through the power management module 103.
  • the power management module 103 is used to connect the battery 104, the charging management module 102, and the processor 110.
  • the power management module 103 receives input from the battery 104 and/or the charging management module 102, and supplies power to the processor 110, the internal memory 120, the camera 150, the display screen 160, and the like.
  • the power management module 103 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 103 may be provided in the processor 110. In other embodiments, the power management module 103 and the charging management module 102 may also be provided in the same device.
  • the button 105 includes a power-on button, a volume button, and so on.
  • the button 105 may be a mechanical button or a touch button.
  • the terminal device can receive key input, and generate key signal input related to the user settings and function control of the terminal device.
  • the motor 106 can generate vibration prompts.
  • the motor 106 can be used for incoming call vibrating prompts, and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • Different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 107 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 108 is used to connect to a SIM card.
  • the SIM card can be inserted into the SIM card interface 108 or pulled out from the SIM card interface 108 to achieve contact and separation with the terminal device.
  • the terminal device can support one or more SIM card interfaces.
  • the SIM card interface 108 may support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 108 can insert multiple cards at the same time. The types of multiple cards can be the same or different.
  • the SIM card interface 108 can also be compatible with different types of SIM cards.
  • the SIM card interface 108 may also be compatible with external memory cards.
  • the terminal equipment interacts with the network through the SIM card to realize functions such as call and data communication.
  • the terminal device adopts an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal device and cannot be separated from the terminal device.
  • The stereo sound pickup method provided by the embodiments of the present invention uses the posture data and camera data of the terminal device to determine the target beam parameter group, and combines it with the target sound pickup data picked up by the microphones to form a stereo beam. Since different posture data and camera data determine different target beam parameter groups, the different target beam parameter groups can be used to adjust the direction of the stereo beam, thereby effectively reducing the influence of noise in the recording environment and allowing the terminal device to obtain a better stereo recording effect in different video recording scenes. In addition, by detecting microphone blockage, eliminating various abnormal sound data, correcting the timbre of the stereo beam, and adjusting the gain of the stereo beam, recording robustness is further enhanced while a good stereo recording effect is ensured.
  • FIG. 4 is a schematic flowchart of a stereo sound pickup method provided by an embodiment of the present invention.
  • the stereo sound pickup method can be implemented on a terminal device having the above-mentioned hardware structure. Please refer to FIG. 4, the stereo sound pickup method may include the following steps:
  • S201 Acquire multiple target sound pickup data from the sound pickup data of multiple microphones.
  • the terminal device may collect sound through multiple microphones provided thereon, and then obtain multiple target sound pickup data from the sound pickup data of the multiple microphones.
  • The multiple target sound pickup data can be obtained directly from the sound pickup data of the multiple microphones, by selecting the sound pickup data of some of the multiple microphones according to a certain rule, or by processing the sound pickup data of the multiple microphones in a certain way; there is no restriction on this.
  • S202 Acquire posture data and camera data of the terminal device.
  • The posture data of the terminal device can be obtained through the aforementioned acceleration sensor 140A and can indicate whether the terminal device is in the landscape state or the portrait state; the camera data can be understood as the usage of the cameras disposed on the terminal device while the user uses the terminal device to record video.
  • S203 Determine a target beam parameter group corresponding to the multiple target sound pickup data from the multiple pre-stored beam parameter groups according to the attitude data and the camera data, wherein the target beam parameter group includes the beam parameters corresponding to each of the multiple target sound pickup data.
  • the beam parameter group can be obtained through pre-training and stored in the terminal device, and it includes several parameters that affect stereo beam forming.
  • The posture data and camera data corresponding to each video recording scene the terminal device may be in can be determined in advance, and a matching beam parameter group can be set based on that posture data and camera data.
  • multiple beam parameter groups can be obtained, respectively corresponding to different video recording scenes, and the multiple beam parameter groups are stored in the terminal device for subsequent use in video recording. For example, when a user uses a terminal device to take a picture or record a video, the terminal device can determine a matching target beam parameter group from multiple beam parameter groups based on the currently acquired posture data and camera data.
  • When the video recording scene changes, the posture data and camera data corresponding to the terminal device change accordingly, so different target beam parameter groups can be determined from the multiple beam parameter groups based on the posture data and camera data; that is, the beam parameters corresponding to the multiple target sound pickup data change as the video recording scene changes.
  • S204 Form a stereo beam according to the target beam parameter group and the multiple target sound pickup data.
  • The beam parameters in the target beam parameter group can be understood as weight values: each target sound pickup data is weighted by its corresponding weight value and the results are summed to finally obtain a stereo beam.
  • The stereo beam has spatial directivity, and sound pickup data outside the spatial direction the stereo beam points to can be suppressed to different degrees, thereby effectively reducing the influence of noise in the recording environment. Since the beam parameters corresponding to the multiple target sound pickup data change with the video recording scene, the direction of the stereo beam formed from the target beam parameter group and the multiple target sound pickup data also changes with the video recording scene, so that the terminal device can obtain a better stereo recording effect in different video recording scenes.
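  • A minimal sketch of the weighted summation described above (frequency-dependent complex weights are an assumption; the patent only states a weighted summation over the target sound pickup data):

```python
import numpy as np

def form_stereo_beam(target_pickups: np.ndarray,
                     weights: np.ndarray) -> np.ndarray:
    """target_pickups: (num_mics, num_bins) frequency-domain pickup data.
    weights: (2, num_mics, num_bins) beam parameters for the left and
    right beams. Returns (2, num_bins): the left/right stereo beam.
    """
    return np.einsum("cmf,mf->cf", weights, target_pickups)
```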
  • In some embodiments, the camera data of the terminal device may include activation data, and the activation data is used to characterize the activated camera, as shown in FIG. 5.
  • the above step S203 may include sub-step S203-1: determining a first target beam parameter group corresponding to multiple target sound pickup data from a plurality of pre-stored beam parameter groups according to the attitude data and the activation data;
  • the above step S204 may include a sub-step S204-1: forming a first stereo beam according to the first target beam parameter group and multiple target sound pickup data, where the first stereo beam points to the shooting direction of the activated camera.
  • When the terminal device is in different video recording scenes, it needs to correspond to different beam parameter groups, so multiple beam parameter groups can be pre-stored in the terminal device.
  • The plurality of beam parameter groups may include a first beam parameter group, a second beam parameter group, a third beam parameter group, and a fourth beam parameter group, and the beam parameters in the first, second, third, and fourth beam parameter groups differ from one another.
  • When the posture data characterizes that the terminal device is in the landscape state and the activation data characterizes that the rear camera is activated, the first target beam parameter group is the first beam parameter group; when the terminal device is in the landscape state and the front camera is activated, the first target beam parameter group is the second beam parameter group; when the terminal device is in the portrait state and the rear camera is activated, the first target beam parameter group is the third beam parameter group; and when the terminal device is in the portrait state and the front camera is activated, the first target beam parameter group is the fourth beam parameter group.
  • the direction of the first stereo beam changes according to the switching of the horizontal and vertical screen states of the terminal device and the activation of the front and rear cameras.
  • In FIG. 6 the terminal device is in the landscape state with the rear camera activated for shooting; in FIG. 7 it is in the landscape state with the front camera activated for shooting; in FIG. 8 it is in the portrait state with the rear camera activated for shooting; and in FIG. 9 it is in the portrait state with the front camera activated for shooting.
  • the left and right arrows indicate the directions of the left and right beams respectively.
  • The first stereo beam can be understood as a composite beam of the left and right beams; the horizontal plane refers to the plane perpendicular to the vertical side of the terminal device in its current shooting posture (landscape state or portrait state), and the main axis of the formed first stereo beam lies in this horizontal plane.
  • When the landscape/portrait state of the terminal device changes, the direction of the first stereo beam also changes accordingly. For example, the main axis of the first stereo beam shown in FIG. 6 lies in the horizontal plane perpendicular to the vertical side of the terminal device in the landscape state; when the terminal device is in the portrait state, as shown in FIG. 8, the main axis of the first stereo beam lies in the horizontal plane perpendicular to the vertical side of the terminal device in the portrait state.
  • Since the shooting direction of the activated camera is generally the direction in which the user needs to focus sound pickup, the direction of the first stereo beam also changes with the shooting direction of the activated camera. For example, in FIGS. 6 and 8 the first stereo beam points to the shooting direction of the rear camera, and in FIGS. 7 and 9 it points to the shooting direction of the front camera.
  • In different video recording scenes, the multiple target sound pickup data correspond to different first target beam parameter groups and thus form first stereo beams in different directions, so that the direction of the first stereo beam adapts to the switching of the terminal device's landscape/portrait state and the activation of the front or rear camera, ensuring that the terminal device can obtain a better stereo recording effect when recording video.
In some embodiments, the camera data may include the aforementioned enable data as well as zoom data, where the zoom data is the zoom factor of the enabled camera indicated by the enable data.
In this case, the above step S203 may include sub-step S203-2: determining, according to the posture data, the enable data, and the zoom data, a second target beam parameter group corresponding to the multiple target sound pickup data from the plurality of pre-stored beam parameter groups. Step S204 may include sub-step S204-2: forming a second stereo beam according to the second target beam parameter group and the multiple target sound pickup data, where the second stereo beam points in the shooting direction of the enabled camera and the width of the second stereo beam narrows as the zoom factor increases.
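Since the document describes the beam parameters as weights that are applied to each target pickup signal and summed, the beam formation step can be sketched as a weighted sum per output channel (a simplified time-domain sketch; a real implementation would typically apply per-frequency complex weights):

```python
import numpy as np

def form_stereo_beam(pickup_data: np.ndarray,
                     left_weights: np.ndarray,
                     right_weights: np.ndarray):
    """Weighted-sum beamforming sketch.

    pickup_data: shape (n_mics, n_samples), the multiple target pickup data.
    left_weights / right_weights: length-n_mics weight vectors taken from
    the selected target beam parameter group.
    Returns the left and right beam signals of the stereo beam.
    """
    left = left_weights @ pickup_data    # weighted sum across microphones
    right = right_weights @ pickup_data
    return left, right
```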
Narrowing the width of the second stereo beam as the zoom factor of the enabled camera increases makes the sound image more concentrated. When the user zooms, the scene is usually a long-distance sound pickup scene in which the target's signal-to-noise ratio is lower; narrowing the second stereo beam improves the signal-to-noise ratio, giving the terminal device better recording robustness under low signal-to-noise conditions and thus a better stereo recording effect.
In this embodiment, to make the width of the second stereo beam narrow as the zoom factor of the enabled camera increases, target shapes of the second stereo beam can be preset for the different combinations of posture data, enable data, and zoom data, and a matching beam parameter group can then be trained with the least squares method, so that the second stereo beam formed from that parameter group approximates the preset target shape. In this way, the beam parameter groups corresponding to the different combinations of posture data, enable data, and zoom data are obtained.
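A minimal sketch of this least-squares fit, assuming a free-field model, far-field plane waves, and a single frequency (the microphone geometry, the Gaussian target lobe, and all numeric values below are illustrative assumptions, not the patent's training setup):

```python
import numpy as np

def fit_beam_weights(mic_positions, target_pattern, angles_deg,
                     freq=1000.0, c=343.0):
    """Least-squares fit of complex beamformer weights so that the array
    response over the candidate angles approximates a preset target shape."""
    angles = np.deg2rad(angles_deg)
    k = 2.0 * np.pi * freq / c                         # wavenumber
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    # Steering matrix: phase response of each mic for each arrival direction.
    A = np.exp(1j * k * directions @ mic_positions.T)  # (n_angles, n_mics)
    weights, *_ = np.linalg.lstsq(A, target_pattern, rcond=None)
    return weights

# Three microphones (top, bottom, back), positions in metres.
mics = np.array([[0.0, 0.07], [0.0, -0.07], [0.01, 0.0]])
angles = np.linspace(-180.0, 180.0, 73)
target = np.exp(-(angles / 30.0) ** 2)                 # ~30-degree lobe at 0 deg
weights = fit_beam_weights(mics, target.astype(complex), angles)
```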
When the user records video with the terminal device, as the zoom factor is increased or decreased, the terminal device matches the second target beam parameter group corresponding to the current zoom factor and then forms second stereo beams of different widths from that parameter group and the multiple target sound pickup data, meeting the user's video recording needs.
As shown in FIGS. 11a to 11c, which are schematic diagrams of the width of the second stereo beam changing with the zoom factor of the enabled camera, the second stereo beam is a composite of the left and right beams, and the 0-degree direction is the shooting direction of the camera enabled when the user records video (also called the target direction). When the user records video at a low zoom factor, the terminal device matches the second target beam parameter group corresponding to the low zoom factor and forms the wider second stereo beam shown in FIG. 11a, in which the left and right beams point 45 degrees to the left and right of the shooting direction. At a medium zoom factor, the terminal device matches the corresponding second target beam parameter group and forms the narrowed second stereo beam shown in FIG. 11b, in which the left and right beams narrow to about 30 degrees on either side of the shooting direction. At a high zoom factor, the terminal device matches the corresponding second target beam parameter group and forms the still narrower second stereo beam shown in FIG. 11c, in which the left and right beams narrow to about 10 degrees on either side of the shooting direction.
In this way, different second target beam parameter groups form second stereo beams with different directions and widths, so that the direction and width of the second stereo beam adapt to changes in the posture of the terminal device, the enabled camera, and the zoom factor; better recording robustness can therefore be achieved in noisy environments and under long-distance sound pickup conditions.
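One possible tiering of the zoom factor is sketched below (the thresholds and tier names are purely illustrative assumptions; the document only states that a higher zoom factor selects a narrower pre-trained beam):

```python
# Placeholder table; real entries would be the trained parameter groups for
# each posture/camera/width-tier combination.
ZOOM_TIER_GROUPS = {
    ("landscape", "rear", "wide"):   "wide_group",    # cf. FIG. 11a, ~45 deg
    ("landscape", "rear", "medium"): "medium_group",  # cf. FIG. 11b, ~30 deg
    ("landscape", "rear", "narrow"): "narrow_group",  # cf. FIG. 11c, ~10 deg
}

def select_second_target_group(posture: str, camera: str, zoom: float) -> str:
    """Map the zoom factor to a beam-width tier (assumed thresholds)."""
    tier = "wide" if zoom < 2.0 else "medium" if zoom < 5.0 else "narrow"
    return ZOOM_TIER_GROUPS[(posture, camera, tier)]
```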
In practical use, when a user records video with a terminal device, the stereo recording effect is not only disturbed by environmental noise. It is also easily affected by microphone blockage, which occurs when the user's fingers or other body parts cover a microphone while holding the device, or when dirt enters a sound inlet hole. Moreover, as terminal devices become more and more powerful, the self-noise of the terminal device (that is, the noise generated by its internal circuits) is increasingly likely to be picked up by the microphones, for example camera motor noise, WiFi interference, and noise caused by capacitor charging and discharging. In addition, during shooting, the user's fingers or other parts may touch the screen or rub near a microphone hole when zooming or performing other operations, producing abnormal sounds that the user does not expect to record. These self-noise and abnormal sound interferences degrade the stereo recording effect of the video to a certain extent.
For this reason, this embodiment proposes that, after the sound pickup data of the multiple microphones is acquired, microphone blockage detection is performed on the multiple microphones and abnormal sound processing is performed on their pickup data in order to determine the multiple target sound pickup data used to form the stereo beam, so that good recording robustness, and hence a good stereo recording effect, can still be achieved in the presence of abnormal sound interference and/or blocked microphone holes.
The process of acquiring the multiple target sound pickup data is described in detail below. As shown in FIG. 12, S201 includes the following sub-steps:
S2011-A: obtain the serial numbers of the microphones that are not blocked, based on the sound pickup data of the multiple microphones.
Optionally, after the terminal device obtains the sound pickup data of the multiple microphones, it performs time-domain framing and frequency-domain transformation on each microphone's pickup data to obtain the time-domain information and frequency-domain information corresponding to each microphone's pickup data. The time-domain information and the frequency-domain information of the different microphones are then compared separately to obtain a time-domain comparison result and a frequency-domain comparison result; the serial numbers of the blocked microphones are determined from these two results, and the serial numbers of the unblocked microphones are determined from those of the blocked ones. Because identical time-domain information does not mean that two signals are exactly the same, the signals must be further analyzed from the frequency-domain perspective; analyzing the microphones' pickup data from these two different angles effectively improves the accuracy of blockage detection and avoids the misjudgments that analysis from a single angle can cause.
In one example, the time-domain information may be the RMS (root mean square) value of the time-domain signal corresponding to the pickup data, and the frequency-domain information may be the RMS value of the high-frequency part of the corresponding frequency-domain signal above a set frequency (for example, 2 kHz); the high-frequency RMS value exhibits a more distinct signature when a microphone hole is blocked.
In practice, when a blocked microphone exists in the terminal device, the RMS value of the time-domain signal and the RMS value of the high-frequency part differ between the pickup data of blocked and unblocked microphones. Even among unblocked microphones, slight differences exist in these RMS values because of factors such as the microphones' own structure and shielding by the device housing. Therefore, during the development of the terminal device, the differences between blocked and unblocked microphones need to be identified, and corresponding time-domain and frequency-domain thresholds set according to those differences; these thresholds are used to compare the time-domain RMS values of different microphones' pickup data, yielding the time-domain comparison result, and to compare the high-frequency RMS values, yielding the frequency-domain comparison result, after which the two results are combined to judge whether any microphone is blocked. The time-domain threshold and the frequency-domain threshold may be empirical values obtained through experiments by those skilled in the art.
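A minimal sketch of these per-frame features (the frame length, the FFT-based high-frequency split, and the 48 kHz sample rate are assumptions; only the RMS definitions and the 2 kHz example cutoff come from the text):

```python
import numpy as np

def blockage_features(frame: np.ndarray, sample_rate: int = 48000,
                      cutoff_hz: float = 2000.0):
    """Return (time-domain RMS, RMS of the spectrum above the cutoff)."""
    time_rms = np.sqrt(np.mean(frame ** 2))
    spectrum = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    high_rms = np.sqrt(np.mean(spectrum[freqs >= cutoff_hz] ** 2))
    return time_rms, high_rms
```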
Taking a terminal device with three microphones as an example, let their serial numbers be m1, m2, and m3, let the RMS values of the time-domain signals corresponding to their pickup data be A1, A2, and A3, and let the RMS values of the corresponding high-frequency parts be B1, B2, and B3. When the time-domain information of the three microphones' pickup data is compared in the time domain, the differences between A1 and A2, between A1 and A3, and between A2 and A3 can be calculated and compared with the set time-domain threshold. When a difference does not exceed the time-domain threshold, the time-domain information of the two microphones concerned is considered consistent; when the difference exceeds the time-domain threshold, their time-domain information is considered inconsistent, and the magnitude relationship between the two is determined.
Similarly, when the frequency-domain information of the three microphones' pickup data is compared in the frequency domain, the differences between B1 and B2, between B1 and B3, and between B2 and B3 can be calculated and compared with the set frequency-domain threshold. When a difference does not exceed the frequency-domain threshold, the frequency-domain information of the two microphones concerned is considered consistent; when the difference exceeds the frequency-domain threshold, their frequency-domain information is considered inconsistent, and the magnitude relationship between the two is determined.
When the time-domain comparison result and the frequency-domain comparison result are combined to judge whether a microphone is blocked, and the goal is to detect blocked microphones as aggressively as possible, a microphone can be identified as blocked when either its time-domain or its frequency-domain information is inconsistent with that of the other microphones; conversely, to avoid false detections, a microphone can be identified as blocked only when both its time-domain and its frequency-domain information are inconsistent. For example, if the comparisons yield A1 = A2 = A3 in the time domain and B1 < B2, B1 < B3, and B2 = B3 in the frequency domain, the blocked microphone can be determined to be m1, and the unblocked microphones m2 and m3.
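A sketch of the aggressive criterion above (the dict-based interface is an assumption, as is treating the markedly quieter microphone of an inconsistent pair as the blocked one, which matches the m1 example but is not spelled out as a general rule):

```python
def detect_blocked_mics(time_rms: dict, high_rms: dict,
                        t_thresh: float, f_thresh: float):
    """Flag a mic as blocked when its time-domain OR high-frequency RMS is
    inconsistent with (markedly lower than) another mic's. Inputs are dicts
    keyed by microphone serial, e.g. {"m1": A1, "m2": A2, "m3": A3}."""
    blocked = set()
    serials = list(time_rms)
    for i, a in enumerate(serials):
        for b in serials[i + 1:]:
            if (time_rms[b] - time_rms[a] > t_thresh
                    or high_rms[b] - high_rms[a] > f_thresh):
                blocked.add(a)        # a is markedly quieter than b
            elif (time_rms[a] - time_rms[b] > t_thresh
                    or high_rms[a] - high_rms[b] > f_thresh):
                blocked.add(b)
    unblocked = [s for s in serials if s not in blocked]
    return blocked, unblocked
```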
S2012-A: detect whether abnormal sound data exists in each microphone's pickup data. In this embodiment, frequency-domain transformation can be performed on each microphone's pickup data to obtain the corresponding frequency-domain information, and whether abnormal sound data exists in each microphone's pickup data is then detected according to a pre-trained abnormal sound detection network and that frequency-domain information.
The pre-trained abnormal sound detection network can be obtained during the development of the terminal device by collecting a large amount of abnormal sound data (for example, sound data with specific frequencies) and performing feature learning with an AI (Artificial Intelligence) algorithm. In the detection stage, the frequency-domain information corresponding to each microphone's pickup data is input into the pre-trained abnormal sound detection network, which outputs a detection result indicating whether abnormal sound data is present.
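A sketch of that detection stage, treating the trained network as an opaque callable (the framing parameters and the 0.5 decision threshold are assumptions):

```python
import numpy as np

def detect_abnormal_frames(pickup: np.ndarray, abnormal_net,
                           frame_len: int = 1024, hop: int = 512):
    """Feed per-frame magnitude spectra to the pre-trained detector.
    `abnormal_net` is assumed to map a spectrum to an abnormality score."""
    flags = []
    for start in range(0, len(pickup) - frame_len + 1, hop):
        spectrum = np.abs(np.fft.rfft(pickup[start:start + frame_len]))
        flags.append(abnormal_net(spectrum) > 0.5)   # assumed threshold
    return flags
```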
S2013-A: if abnormal sound data exists, eliminate the abnormal sound data from the pickup data of the multiple microphones to obtain initial target sound pickup data. The abnormal sound data may include the self-noise of the terminal device and abnormal sounds such as the user's fingers touching the screen or rubbing a microphone hole, and it can be removed using AI algorithms combined with time-domain and frequency-domain filtering. Optionally, when abnormal sound data is detected, the gain of its frequency bins can be reduced, that is, multiplied by a value between 0 and 1, to eliminate the abnormal sound data or reduce its intensity.
In one example, a pre-trained sound detection network can be used to detect whether preset sound data exists in the abnormal sound data. The pre-trained sound detection network can likewise be obtained through feature learning with an AI algorithm, and the preset sound data can be understood as non-noise data that the user expects to record, such as speech or music. When the pre-trained sound detection network finds non-noise data that the user expects to record, the abnormal sound data is not eliminated and only its intensity is reduced (for example, it is multiplied by 0.5); when no such non-noise data is found, the abnormal sound data is eliminated directly (for example, multiplied by 0).
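A minimal sketch of this gain rule (the function signature and the bin-mask interface are assumptions; the 0.5 and 0 factors come from the example above):

```python
import numpy as np

def suppress_abnormal_bins(spectrum: np.ndarray, abnormal_bins: np.ndarray,
                           contains_preset_sound: bool) -> np.ndarray:
    """Attenuate abnormal frequency bins: halve them when expected speech or
    music is present in the abnormal sound data, zero them otherwise."""
    gain = 0.5 if contains_preset_sound else 0.0
    out = spectrum.copy()
    out[abnormal_bins] *= gain
    return out
```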
S2014-A: select, from the initial target sound pickup data, the pickup data corresponding to the serial numbers of the unblocked microphones as the multiple target sound pickup data.
For example, among the microphones with serial numbers m1, m2, and m3, if the blocked microphone is m1 and the unblocked microphones are m2 and m3, the pickup data corresponding to serial numbers m2 and m3 can be selected from the initial target sound pickup data as the target sound pickup data, yielding the multiple target sound pickup data used subsequently to form the stereo beam.
It should be noted that S2011-A may be executed before S2012-A, after S2012-A, or simultaneously with S2012-A; that is, this embodiment does not restrict the order of blockage detection and abnormal sound data processing.
In this embodiment, by combining microphone blockage detection with abnormal sound processing of the microphones' pickup data, the multiple target sound pickup data used to form the stereo beam can be determined. When the user records video with the terminal device, a good stereo recording effect can still be ensured even if a microphone hole is blocked or abnormal sound data exists in the microphones' pickup data, thereby achieving better recording robustness. In practical applications, the multiple target sound pickup data may also be determined through blockage detection alone, or through abnormal sound processing alone.
As shown in FIG. 13, when the multiple target sound pickup data used to form the stereo beam are determined through microphone blockage detection alone, S201 includes the following sub-steps. S2011-B: obtain the serial numbers of the unblocked microphones based on the sound pickup data of the multiple microphones; for details, refer to the aforementioned S2011-A, which is not repeated here. S2012-B: select, from the pickup data of the multiple microphones, the pickup data corresponding to the serial numbers of the unblocked microphones as the multiple target sound pickup data. For example, among the microphones with serial numbers m1, m2, and m3, if the blocked microphone is m1 and the unblocked microphones are m2 and m3, the pickup data of the microphones with serial numbers m2 and m3 is selected as the target sound pickup data, yielding the multiple target sound pickup data.
In this way, for the cases where a microphone hole may be blocked while the user records video, the terminal device, after obtaining the pickup data of the multiple microphones, performs blockage detection on the multiple microphones according to that pickup data, obtains the serial numbers of the unblocked microphones, and selects the pickup data corresponding to those serial numbers for the subsequent formation of the stereo beam. Blocked microphone holes therefore do not cause an obvious degradation in sound quality or an obvious stereo imbalance when video is recorded; that is, the stereo recording effect is guaranteed, and recording robustness is good, even when a microphone hole is blocked.
As shown in FIG. 14, when the multiple target sound pickup data used to form the stereo beam are determined through abnormal sound processing alone, S201 includes the following sub-steps. S2011-C: detect whether abnormal sound data exists in each microphone's pickup data; for details, refer to the aforementioned S2012-A, which is not repeated here. S2012-C: if abnormal sound data exists, eliminate it from the pickup data of the multiple microphones to obtain the multiple target sound pickup data. That is, after the terminal device obtains the pickup data of the multiple microphones, it performs abnormal sound detection and abnormal sound elimination on that pickup data to obtain relatively "clean" pickup data (i.e., the multiple target sound pickup data) for the subsequent formation of the stereo beam. In this way, when the terminal device records video, the influence on the stereo recording effect of abnormal sound data such as fingers rubbing a microphone and the device's various self-noises is effectively reduced.
In practical applications, the frequency response changes that arise between the sound wave entering the microphone hole and analog-to-digital conversion, for example a non-flat frequency response of the microphone body, microphone duct resonance, and filter circuits, also affect the stereo recording effect to a certain extent. For this reason, as shown in FIG. 15, after the stereo beam is formed according to the target beam parameter group and the multiple target sound pickup data (that is, after step S204), the stereo sound pickup method further includes step S301: correcting the timbre of the stereo beam. By correcting the timbre, the frequency response can be flattened, so as to obtain a better stereo recording effect.
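A minimal sketch of such a correction by inverse filtering, assuming the deviation from a flat response has been measured offline (the FFT-division approach and the regularization constant are assumptions; a production equalizer would typically be a designed, regularized filter):

```python
import numpy as np

def correct_timbre(beam: np.ndarray, measured_response: np.ndarray,
                   eps: float = 1e-6) -> np.ndarray:
    """Flatten the frequency response by dividing the beam's spectrum by the
    measured magnitude response (len(beam) // 2 + 1 bins expected)."""
    spectrum = np.fft.rfft(beam)
    corrected = spectrum / (measured_response + eps)   # inverse equalization
    return np.fft.irfft(corrected, n=len(beam))
```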
In some embodiments, to adjust the recorded sound to an appropriate volume, gain control may also be performed on the generated stereo beam. As shown in FIG. 16, after step S204 the stereo sound pickup method further includes step S401: adjusting the gain of the stereo beam. By adjusting the gain, low-volume pickup data can be heard clearly and high-volume pickup data does not suffer clipping distortion, so the sound recorded by the user is adjusted to an appropriate volume and the user's video recording experience is improved.
In practical applications, users generally use zoom in long-distance sound pickup scenes, where the volume of the target sound source decreases with distance and degrades the recorded sound. This embodiment therefore proposes adjusting the gain of the stereo beam according to the camera's zoom factor: in a long-distance sound pickup scene, as the zoom factor increases, the amount of gain amplification also increases, ensuring that the volume of the target sound source remains clear and loud.
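A possible shape for this zoom-dependent gain (the dB-per-zoom-step slope and the ceiling are illustrative assumptions; the document only states that amplification grows with the zoom factor):

```python
def apply_zoom_gain(beam, zoom: float, db_per_step: float = 2.0,
                    max_db: float = 18.0):
    """Raise the beam gain with the zoom factor so distant sources stay loud."""
    gain_db = min(db_per_step * max(zoom - 1.0, 0.0), max_db)
    return beam * (10.0 ** (gain_db / 20.0))
```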
It should be noted that, in an actual video recording process, after the terminal device forms the stereo beam according to the target beam parameter group and the multiple target sound pickup data, it may first perform timbre correction on the stereo beam and then adjust the gain of the stereo beam, so as to obtain a better stereo recording effect.
To execute the corresponding steps in the above embodiments and their possible implementations, an implementation of a stereo sound pickup device is given below. FIG. 17 is a functional block diagram of a stereo sound pickup device provided by an embodiment of the present invention. It should be noted that the basic principles and technical effects of the device provided by this embodiment are the same as those of the above embodiments; for brevity, refer to the corresponding content of the above embodiments for anything not mentioned here.
The stereo sound pickup device includes a pickup data acquisition module 510, a device parameter acquisition module 520, a beam parameter determination module 530, and a beam forming module 540. The pickup data acquisition module 510 is used to acquire multiple target sound pickup data from the pickup data of the multiple microphones, and it can execute the above S201. The device parameter acquisition module 520 is used to acquire the posture data and camera data of the terminal device, and it can execute the foregoing S202.
The beam parameter determination module 530 is configured to determine, according to the posture data and the camera data, a target beam parameter group corresponding to the multiple target sound pickup data from the plurality of pre-stored beam parameter groups, where the target beam parameter group includes the beam parameters corresponding to each of the multiple target sound pickup data; it can perform the foregoing S203.
The beam forming module 540 is used to form a stereo beam according to the target beam parameter group and the multiple target sound pickup data, and it can perform the foregoing S204.
In some embodiments, the camera data may include enable data, which indicates the enabled camera. The beam parameter determination module 530 is then configured to determine, according to the posture data and the enable data, the first target beam parameter group corresponding to the multiple target sound pickup data from the plurality of pre-stored beam parameter groups, and the beam forming module 540 may form a first stereo beam according to the first target beam parameter group and the multiple target sound pickup data, where the first stereo beam points in the shooting direction of the enabled camera.
Optionally, the plurality of beam parameter groups include a first beam parameter group, a second beam parameter group, a third beam parameter group, and a fourth beam parameter group, and the beam parameters in these four groups differ from one another. When the posture data indicates that the terminal device is in the landscape state and the enable data indicates that the rear camera is enabled, the first target beam parameter group is the first beam parameter group; when the terminal device is in the landscape state and the front camera is enabled, it is the second beam parameter group; when the terminal device is in the portrait state and the rear camera is enabled, it is the third beam parameter group; and when the terminal device is in the portrait state and the front camera is enabled, it is the fourth beam parameter group.
It can be understood that the beam parameter determination module 530 can perform the above S203-1, and the beam forming module 540 can perform the above S204-1.
In other embodiments, the camera data may include enable data and zoom data, where the zoom data is the zoom factor of the enabled camera indicated by the enable data. The beam parameter determination module 530 is then configured to determine, according to the posture data, the enable data, and the zoom data, a second target beam parameter group corresponding to the multiple target sound pickup data from the plurality of pre-stored beam parameter groups, and the beam forming module 540 can form a second stereo beam according to the second target beam parameter group and the multiple target sound pickup data, where the second stereo beam points in the shooting direction of the enabled camera and its width narrows as the zoom factor increases.
It can be understood that the beam parameter determination module 530 can perform the above S203-2, and the beam forming module 540 can perform the above S204-2.
Referring to FIG. 18, the pickup data acquisition module 510 may include a blockage detection module 511 and/or an abnormal sound processing module 512, together with a target pickup data selection module 513; through these modules, the multiple target sound pickup data can be obtained from the pickup data of the multiple microphones.
Optionally, when the multiple target sound pickup data are obtained through the blockage detection module 511, the abnormal sound processing module 512, and the target pickup data selection module 513, the blockage detection module 511 is configured to obtain the serial numbers of the unblocked microphones according to the pickup data of the multiple microphones; the abnormal sound processing module 512 is configured to detect whether abnormal sound data exists in each microphone's pickup data and, if it does, to eliminate the abnormal sound data from the pickup data of the multiple microphones to obtain initial target sound pickup data; and the target pickup data selection module 513 is configured to select, from the initial target sound pickup data, the pickup data corresponding to the serial numbers of the unblocked microphones as the multiple target sound pickup data.
Specifically, the blockage detection module 511 is used to perform time-domain framing and frequency-domain transformation on each microphone's pickup data to obtain the time-domain information and frequency-domain information corresponding to each microphone's pickup data, to compare the time-domain information and frequency-domain information of the different microphones separately to obtain the time-domain comparison result and the frequency-domain comparison result, to determine the serial numbers of the blocked microphones from those results, and to determine the serial numbers of the unblocked microphones based on the serial numbers of the blocked ones.
The abnormal sound processing module 512 is used to perform frequency-domain transformation on each microphone's pickup data to obtain the corresponding frequency-domain information and to detect, according to the pre-trained abnormal sound detection network and that frequency-domain information, whether abnormal sound data exists in each microphone's pickup data. When abnormal sound data needs to be eliminated, the pre-trained sound detection network can be used to detect whether preset sound data exists in the abnormal sound data; if no preset sound data exists, the abnormal sound data is eliminated, and if preset sound data exists, the intensity of the abnormal sound data is reduced.
Optionally, when the multiple target sound pickup data are obtained through the blockage detection module 511 and the target pickup data selection module 513 alone, the blockage detection module 511 is used to obtain the serial numbers of the unblocked microphones based on the pickup data of the multiple microphones, and the target pickup data selection module 513 selects, from the pickup data of the multiple microphones, the pickup data corresponding to those serial numbers as the multiple target sound pickup data.
Optionally, when the multiple target sound pickup data are obtained through the abnormal sound processing module 512 and the target pickup data selection module 513 alone, the abnormal sound processing module 512 is used to detect whether abnormal sound data exists in each microphone's pickup data and, if it does, to eliminate the abnormal sound data from the pickup data of the multiple microphones to obtain the multiple target sound pickup data.
It can be understood that the blockage detection module 511 can execute the aforementioned S2011-A and S2011-B; the abnormal sound processing module 512 can execute the aforementioned S2012-A, S2013-A, and S2011-C; and the target pickup data selection module 513 can perform the above S2014-A, S2012-B, and S2012-C.
Referring to FIG. 19, the stereo sound pickup device may further include a timbre correction module 550 and a gain control module 560. The timbre correction module 550 is used to correct the timbre of the stereo beam and can execute the above S301. The gain control module 560 is used to adjust the gain of the stereo beam, optionally according to the zoom factor of the camera, and can execute the foregoing S401.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and run by a processor, the stereo sound pickup method disclosed in each of the foregoing embodiments is implemented.
An embodiment of the present invention also provides a computer program product which, when run on a computer, causes the computer to execute the stereo sound pickup method disclosed in each of the foregoing embodiments. An embodiment of the present invention further provides a chip system, which includes a processor and may also include a memory, for implementing the stereo sound pickup method disclosed in each of the foregoing embodiments; the chip system may consist of a chip, or may include a chip and other discrete devices.
In summary, in the stereo sound pickup method and device, terminal device, and computer-readable storage medium provided by the embodiments of the present invention, because the target beam parameter group is determined according to the posture data and camera data of the terminal device, different posture data and camera data, and hence different target beam parameter groups, are obtained when the terminal device is in different video recording scenes. When the stereo beam is then formed according to the target beam parameter group and the multiple target sound pickup data, the different target beam parameter groups can be used to adjust the direction of the stereo beam, effectively reducing the influence of noise in the recording environment, so that the terminal device obtains a better stereo recording effect in different video recording scenes. In addition, by detecting blocked microphone holes and eliminating various kinds of abnormal sound data, a good stereo recording effect and good recording robustness can still be ensured when video is recorded with blocked microphone holes or abnormal sound data present.
In this regard, each block in the flowcharts or block diagrams may represent a module, program segment, or part of code, and that module, program segment, or part of code contains one or more executable instructions for realizing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that marked in the drawings; for example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the various embodiments of the present invention may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part. If the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, the part that contributes to the prior art, or a part of the technical solution can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a mobile phone, a tablet computer, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Studio Devices (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Embodiments of the present invention provide a stereo sound pickup method and device, a terminal device, and a computer-readable storage medium. The terminal device acquires multiple target sound pickup data from the pickup data of multiple microphones, acquires posture data and camera data of the terminal device, determines, according to the posture data and the camera data, a target beam parameter group corresponding to the multiple target sound pickup data from a plurality of pre-stored beam parameter groups, and forms a stereo beam according to the target beam parameter group and the multiple target sound pickup data. In this way, when the terminal device is in different video recording scenes, different target beam parameter groups are determined according to the different posture data and camera data, and the direction of the stereo beam is then adjusted using the different target beam parameter groups; the influence of noise in the recording environment can therefore be effectively reduced, so that the terminal device obtains a better stereo recording effect in different video recording scenes.

Description

立体声拾音方法、装置、终端设备和计算机可读存储介质
本申请要求在2020年1月16日提交中国国家知识产权局、申请号为202010048851.9、发明名称为“立体声拾音方法、装置、终端设备和计算机可读存储介质”的中国专利申请的优先权，其全部内容通过引用结合在本申请中。
技术领域
本发明涉及音频处理领域,具体而言,涉及一种立体声拾音方法、装置、终端设备和计算机可读存储介质。
背景技术
随着终端技术的发展,视频录制已成为手机、平板等终端设备中的一项重要应用,用户对视频的录音效果的要求也越来越高。
目前,在使用终端设备录制视频时,一方面因视频录制场景复杂多变以及录制过程中环境噪声的影响,另一方面终端设备生成的立体声波束的方向往往因配置参数的固化而无法调节,导致终端设备难以适应各种场景需求,从而无法获得较佳的立体声录音效果。
发明内容
有鉴于此,本发明的目的在于提供一种立体声拾音方法、装置、终端设备和计算机可读存储介质,以使终端设备在不同的视频录制场景中均能获得较佳的立体声录音效果。
为了实现上述目的,本发明实施例采用的技术方案如下:
第一方面,本发明实施例提供一种立体声拾音方法,应用于终端设备,所述终端设备包括多个麦克风,所述方法包括:
从所述多个麦克风的拾音数据中获取多个目标拾音数据;
获取所述终端设备的姿态数据和摄像头数据;
根据所述姿态数据和所述摄像头数据从预先存储的多个波束参数组中确定与所述多个目标拾音数据对应的目标波束参数组;其中,所述目标波束参数组包括所述多个目标拾音数据各自对应的波束参数;
根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束。
本发明实施例提供的立体声拾音方法中,由于目标波束参数组是根据终端设备的姿态数据和摄像头数据来确定的,当终端设备处于不同的视频录制场景时, 将获得不同的姿态数据和摄像头数据,进而确定出不同的目标波束参数组,这样在根据目标波束参数组和多个目标拾音数据形成立体声波束时,利用不同的目标波束参数组可以调整立体声波束的方向,从而有效降低录制环境中的噪声影响,使得终端设备在不同的视频录制场景中均能获得较佳的立体声录音效果。在可选的实施方式中,所述摄像头数据包括启用数据,所述启用数据表征被启用的摄像头;
所述根据所述姿态数据和所述摄像头数据从预先存储的多个波束参数组中确定与所述多个目标拾音数据对应的目标波束参数组的步骤包括:根据所述姿态数据和所述启用数据从预先存储的多个波束参数组中确定与所述多个目标拾音数据对应的第一目标波束参数组;
根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束的步骤包括:根据所述第一目标波束参数组和所述多个目标拾音数据形成第一立体声波束;其中,所述第一立体声波束指向被启用的摄像头的拍摄方向。
本发明实施例中,通过终端设备的姿态数据和表征被启用的摄像头的启用数据来确定第一目标波束参数组,并根据第一目标波束参数组和多个目标拾音数据形成第一立体声波束,实现了在不同的视频录制场景下,第一立体声波束的方向根据姿态数据和启用数据进行适应性地调整,确保终端设备录制视频时可以获得较佳的立体声录音效果。
在可选的实施方式中,所述多个波束参数组包括第一波束参数组、第二波束参数组、第三波束参数组和第四波束参数组,所述第一波束参数组、所述第二波束参数组、所述第三波束参数组和所述第四波束参数组中的所述波束参数不同;
其中,当所述姿态数据表征所述终端设备处于横屏状态,且所述启用数据表征后置摄像头被启用时,所述第一目标波束参数组为所述第一波束参数组;
当所述姿态数据表征所述终端设备处于横屏状态,且所述启用数据表征前置摄像头被启用时,所述第一目标波束参数组为所述第二波束参数组;
当所述姿态数据表征所述终端设备处于竖屏状态,且所述启用数据表征后置摄像头被启用时,所述第一目标波束参数组为所述第三波束参数组;
当所述姿态数据表征所述终端设备处于竖屏状态,且所述启用数据表征前置摄像头被启用时,所述第一目标波束参数组为所述第四波束参数组。
在可选的实施方式中,所述摄像头数据包括启用数据和变焦数据,其中所述变焦数据为所述启用数据表征的被启用的摄像头的变焦倍数;
所述根据所述姿态数据和所述摄像头数据从预先存储的多个波束参数组中确定与所述多个目标拾音数据对应的目标波束参数组的步骤包括:根据所述姿态数据、所述启用数据和所述变焦数据从预先存储的多个波束参数组中确定与所述 多个目标拾音数据对应的第二目标波束参数组;
根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束的步骤包括:根据所述第二目标波束参数组和所述多个目标拾音数据形成第二立体声波束;其中,所述第二立体声波束指向被启用的摄像头的拍摄方向,且所述第二立体声波束的宽度随着所述变焦倍数的增大而收窄。
本发明实施例中,通过终端设备的姿态数据、表征被启用的摄像头的启用数据以及变焦数据来确定第二目标波束参数组,并根据第二目标波束参数组和多个目标拾音数据形成第二立体声波束,实现了在不同的视频录制场景下,第二立体声波束的方向和宽度根据姿态数据、启用数据以及变焦数据进行适应性地调整,从而在嘈杂环境以及远距离拾音条件下,能够实现较好的录音鲁棒性。
在可选的实施方式中,所述从所述多个麦克风的拾音数据中获取多个目标拾音数据的步骤包括:
根据所述多个麦克风的拾音数据获取未发生堵麦的麦克风的序号;
检测每个所述麦克风的拾音数据中是否存在异常音数据;
若存在异常音数据,则消除所述多个麦克风的拾音数据中的异常音数据,得到初始目标拾音数据;
从所述初始目标拾音数据中选取所述未发生堵麦的麦克风的序号对应的拾音数据作为所述多个目标拾音数据。
本发明实施例中,通过对多个麦克风进行堵麦检测以及对多个麦克风的拾音数据进行异常音处理,来确定用于形成立体声波束的多个目标拾音数据,实现了在有异常声音干扰和麦克风堵孔的情况下,仍具有较好的录音鲁棒性,从而保证良好的立体声录音效果。
在可选的实施方式中,所述根据所述多个麦克风的拾音数据获取未发生堵麦的麦克风的序号的步骤包括:
对每个所述麦克风的拾音数据均进行时域分帧处理和频域变换处理,以得到每个所述麦克风的拾音数据对应的时域信息和频域信息;
将不同麦克风的拾音数据对应的时域信息和频域信息分别进行比较,得到时域比较结果和频域比较结果;
根据所述时域比较结果和所述频域比较结果确定发生堵麦的麦克风的序号;
基于所述发生堵麦的麦克风的序号确定未发生堵麦的麦克风的序号。
本发明实施例中,通过比较不同麦克风的拾音数据对应的时域信息和频域信息,能够得到比较准确的堵麦检测结果,有利于后续确定用于形成立体声波束的 多个目标拾音数据,从而保证良好的立体声录音效果。
在可选的实施方式中,所述检测每个所述麦克风的拾音数据中是否存在异常音数据的步骤包括:
对每个所述麦克风的拾音数据进行频域变换处理,得到每个所述麦克风的拾音数据对应的频域信息;
根据预先训练的异常音检测网络和每个所述麦克风的拾音数据对应的频域信息检测每个所述麦克风的拾音数据中是否存在异常音数据。
本发明实施例中,通过将麦克风的拾音数据进行频域变换处理,并利用预先训练的异常音检测网络及麦克风的拾音数据对应的频域信息来检测麦克风的拾音数据中是否存在异常音数据,便于后续得到比较干净的拾音数据,从而保证良好的立体声录音效果。
在可选的实施方式中,所述消除所述多个麦克风的拾音数据中的异常音数据的步骤包括:
利用预先训练的声音检测网络检测所述异常音数据中是否存在预设的声音数据;
若不存在预设的声音数据,则消除所述异常音数据;
若存在预设的声音数据,则降低所述异常音数据的强度。
本发明实施例中,在对异常音进行消除处理时,通过检测异常音数据中是否存在预设的声音数据,并基于检测结果采取不同的消除措施,既能保证获得比较干净的拾音数据,又能避免用户期望录到的声音数据被完全消除。
在可选的实施方式中,所述从所述多个麦克风的拾音数据中获取多个目标拾音数据的步骤包括:
根据所述多个麦克风的拾音数据获取未发生堵麦的麦克风的序号;
从所述多个麦克风的拾音数据中选取所述未发生堵麦的麦克风的序号对应的拾音数据作为所述多个目标拾音数据。
本发明实施例中,通过对多个麦克风进行堵麦检测,进而选取未发生堵塞的麦克风的序号对应的拾音数据,用于后续形成立体声波束,可使终端设备录制视频时不会因为麦克风堵孔导致音质的明显降低,或者立体声的明显不平衡,即在有麦克风堵孔的情况下,可以保证立体声录音效果,录音鲁棒性好。
在可选的实施方式中,所述从所述多个麦克风的拾音数据中获取多个目标拾音数据的步骤包括:
检测每个所述麦克风的拾音数据中是否存在异常音数据;
若存在异常音数据,则消除所述多个麦克风的拾音数据中的异常音数据,得到多个目标拾音数据。
本发明实施例中,通过对该多个麦克风的拾音数据进行异常音检测和异常音消除处理,可以得到比较干净的拾音数据,用于后续形成立体声波束。如此,实现了在终端设备录制视频时,有效降低异常音数据对立体声录音效果的影响。在可选的实施方式中,所述根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束的步骤之后,所述方法还包括:
修正所述立体声波束的音色。
本发明实施例中,通过修正立体声波束的音色,可将频响修正平直,从而获得较好的立体声录音效果。
在可选的实施方式中,所述根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束的步骤之后,所述方法还包括:
调节所述立体声波束的增益。
本发明实施例中,通过调节立体声波束的增益,可使小音量的拾音数据能够听得清,大音量的拾音数据不会产生削波失真,从而将用户录到的声音调整到合适音量,提高用户的视频录制体验。
在可选的实施方式中,所述摄像头数据包括被启用的摄像头的变焦倍数,所述调节所述立体声波束的增益的步骤包括:
根据所述摄像头的变焦倍数调节所述立体声波束的增益。
本发明实施例中,根据摄像头的变焦倍数调节立体声波束的增益,可使目标声源的音量不会因为距离远而降低,从而提升录制视频的声音效果。
在可选的实施方式中,所述麦克风的数量为3至6个,其中至少一个麦克风设置在所述终端设备的屏幕正面或所述终端设备的背面。
本发明实施例中,通过设置至少一个麦克风在终端设备的屏幕正面或终端设备的背面,以确保能够形成指向终端设备前后方向的立体声波束。
在可选的实施方式中,所述麦克风的数量为3个,所述终端设备的顶部和底部分别设置一个麦克风,所述终端设备的屏幕正面或所述终端设备的背面设置一个麦克风。
在可选的实施方式中,所述麦克风的数量为6个,所述终端设备的顶部和底部分别设置两个麦克风,所述终端设备的屏幕正面和所述终端设备的背面分别设置一个麦克风。
第二方面,本发明实施例提供一种立体声拾音装置,应用于终端设备,所述 终端设备包括多个麦克风,所述装置包括:
拾音数据获取模块,用于从所述多个麦克风的拾音数据中获取多个目标拾音数据;
设备参数获取模块,用于获取所述终端设备的姿态数据和摄像头数据;
波束参数确定模块,用于根据所述姿态数据和所述摄像头数据从预先存储的多个波束参数组中确定与所述多个目标拾音数据对应的目标波束参数组;其中,所述目标波束参数组包括所述多个目标拾音数据各自对应的波束参数;
波束形成模块,用于根据所述目标波束参数组和所述多个目标拾音数据形成立体声波束。
第三方面,本发明实施例提供一种终端设备,包括存储有计算机程序的存储器和处理器,所述计算机程序被所述处理器读取并运行时,实现如前述实施方式中任一项所述的方法。
第四方面,本发明实施例提供一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器读取并运行时,实现如前述实施方式中任一项所述的方法。
第五方面,本发明实施例还提供一种计算机程序产品,当计算机程序产品在计算机上运行时,使得计算机执行前述实施方式中任一项所述的方法。
第六方面,本发明实施例还提供一种芯片系统,该芯片系统包括处理器,还可以包括存储器,用于实现如前述实施方式中任一项所述的方法。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件。
为使本发明的上述目的、特征和优点能更明显易懂,下文特举较佳实施例,并配合所附附图,作详细说明如下。
附图说明
为了更清楚地说明本发明实施例的技术方案,下面将对实施例中所需要使用的附图作简单地介绍,应当理解,以下附图仅示出了本发明的某些实施例,因此不应被看作是对范围的限定,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他相关的附图。
图1示出了本发明实施例提供的终端设备的一种硬件结构示意图;
图2示出了本发明实施例提供的终端设备上的麦克风数量为3个时的布局示意图;
图3示出了本发明实施例提供的终端设备上的麦克风数量为6个时的布局示意图;
图4示出了本发明实施例提供的立体声拾音方法的一种流程示意图;
图5本发明实施例提供的立体声拾音方法的另一种流程示意图;
图6示出了终端设备处于横屏状态且启用后置摄像头时对应的第一立体声波束的示意图;
图7示出了终端设备处于横屏状态且启用前置摄像头时对应的第一立体声波束的示意图;
图8示出了终端设备处于竖屏状态且启用后置摄像头时对应的第一立体声波束的示意图;
图9示出了终端设备处于竖屏状态且启用前置摄像头时对应的第一立体声波束的示意图;
图10示出了本发明实施例提供的立体声拾音方法的又一种流程示意图;
图11a-11c示出了第二立体声波束的宽度随被启用的摄像头的变焦倍数的变化而变化的示意图;
图12示出了图4中S201的一种子步骤流程示意图;
图13示出了图4中S201的另一种子步骤流程示意图;
图14示出了图4中S201的又一种子步骤流程示意图;
图15示出了本发明实施例提供的立体声拾音方法的又一种流程示意图;
图16示出了本发明实施例提供的立体声拾音方法的又一种流程示意图;
图17示出了本发明实施例提供的立体声拾音装置的一种功能模块示意图;
图18示出了本发明实施例提供的立体声拾音装置的另一种功能模块示意图;
图19示出了本发明实施例提供的立体声拾音装置的又一种功能模块示意图。
具体实施方式
下面将结合本发明实施例中附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。通常在此处附图中描述和示出的本发明实施例的组件可以以各种不同的配置来布置和设计。
因此,以下对在附图中提供的本发明的实施例的详细描述并非旨在限制要求保护的本发明的范围,而是仅仅表示本发明的选定实施例。基于本发明的实施例, 本领域技术人员在没有做出创造性劳动的前提下所获得的所有其他实施例,都属于本发明保护的范围。
需要说明的是,术语“第一”和“第二”等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。
本发明实施例提供的立体声拾音方法及装置可以应用于手机、平板电脑等终端设备中。示例性的,图1示出了终端设备的一种硬件结构示意图。终端设备可以包括处理器110、内部存储器120、外部存储器接口130、传感器模块140、摄像头150、显示屏160、音频模块170、扬声器171、麦克风172、受话器173、耳机接口174、移动通信模块180、无线通信模块190、USB(Universal Serial Bus,通用串行总线)接口101、充电管理模块102、电源管理模块103、电池104、按键105、马达106、指示器107、用户标识模块(Subscriber Identification Module,SIM)卡接口108、天线1、天线2等。
应当理解的是,图1所示的硬件结构仅是一个示例。本发明实施例的终端设备可以具有比图1中所示终端设备更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图1中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
其中,处理器110可以包括一个或多个处理单元。例如,处理器110可以包括应用处理器(Application Processor,AP),调制解调处理器,图形处理器(Graphics Processing Unit,GPU),图像信号处理器(Image Signal Processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(Digital Signal Processor,DSP),基带处理器,和/或神经网络处理器(Neural-network Processing Unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。控制器可以是终端设备的神经中枢和指挥中心,控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从存储器中直接调用,避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
内部存储器120可以用于存储计算机程序和/或数据。在一些实施例中,内部存储器120可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能,图像播放功能、人脸识别功能)等;存储数据区可存储终端设备使用过程中所创建的数据(比如音频数据、图像数据)等。示例性的,处理器110可以通过运行存储在内部存储器120的计算机程序和/或数据,从而执行终端设备的各种功能应用以及数据处理。例如,当内部存储器120中存储的计算机程序和/或数据被处理器110读取并运行时,可使终端设备执行本发明实施例所提供的立体声拾音方法,使得终端设备在不同的视频录制场景中均能获得较佳的立体声录音效果。此外,内部存储器120可以包括高速随机存取存储器,还可以包括非易失性存储器。例如,非易失性存储器可以包括至少一个磁盘存储器件、闪存器件、通用闪存存储器(Universal Flash Storage,UFS)等。
外部存储器接口130可以用于连接外部存储卡,例如Micro SD卡,实现扩展终端设备的存储能力。外部存储卡通过外部存储器接口130与处理器110通信,实现数据存储功能。例如将音频、视频等文件保存在外部存储卡中。
传感器模块140可以包括一个或多个传感器。例如,加速度传感器140A、陀螺仪传感器140B、距离传感器140C、压力传感器140D、触摸传感器140E、指纹传感器140F、环境光传感器140G、骨传导传感器140H、接近光传感器140J、温度传感器140K、气压传感器140L、磁传感器140M等,对此不作限定。
其中,该加速度传感器140A能够感知到加速力的变化,比如晃动、跌落、上升、下降以及手持终端设备的角度的变化等各种移动变化,都能被加速度传感器140A转化为电信号。在本实施例中,通过加速度传感器140A可以检测终端设备处于横屏状态或者是竖屏状态。
陀螺仪传感器140B可以用于确定终端设备的运动姿态。在一些实施例中,可以通过陀螺仪传感器140B确定终端设备围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器140B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器140B检测终端设备抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消终端设备的抖动,实现防抖。陀螺仪传感器140B还可以用于导航,体感游戏场景。
距离传感器140C可以用于测量距离。终端设备可以通过红外或激光测量距离。示例性的,终端设备在拍摄场景下,可以利用距离传感器140C测距以实现快速对焦。
压力传感器140D可以用于感受压力信号,将压力信号转换成电信号。在一些实施例中,压力传感器140D可以设置于显示屏160。压力传感器140D的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传 感器140D时,电极之间的电容改变,终端设备根据电容的变化确定压力的强度。当有触摸操作作用于显示屏160时,终端设备可以通过压力传感器140D检测触摸操作强度,还可以根据压力传感器140D的检测信号计算触摸的位置。
触摸传感器140E,也称“触控面板”。触摸传感器140E可以设置于显示屏160,由触摸传感器140E与显示屏160组成触摸屏,也称“触控屏”。触摸传感器140E用于检测作用于其上或附近的触摸操作。触摸传感器140E可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型,可以通过显示屏160提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器140E也可以设置于终端设备的表面,与显示屏160所处的位置不同。
指纹传感器140F可以用于采集指纹。终端设备可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等功能。
环境光传感器140G可以用于感知环境光亮度。终端设备可以根据感知的环境光亮度自适应调节显示屏160亮度。环境光传感器140G也可用于拍照时自动调节白平衡。环境光传感器140G还可以与接近光传感器140J配合,检测终端设备是否在口袋里,以防误触。骨传导传感器140H可以用于获取振动信号。在一些实施例中,骨传导传感器140H可以获取人体声部振动骨块的振动信号。骨传导传感器140H也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器140H也可以设置于耳机中,结合成骨传导耳机。音频模块170可以基于骨传导传感器140H获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于骨传导传感器140H获取的血压跳动信号解析心率信息,实现心率检测功能。
接近光传感器140J可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。终端设备通过发光二极管向外发射红外光。终端设备使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定终端设备附近有物体。当检测到不充分的反射光时,终端设备可以确定终端设备附近没有物体。终端设备可以利用接近光传感器140J检测用户手持终端设备贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。
温度传感器140K可以用于检测温度。在一些实施例中,终端设备利用温度传感器140K检测的温度,执行温度处理策略。例如,当温度传感器140K上报的温度超过阈值,终端设备执行降低位于温度传感器140K附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,终端设备对电池104加热,以避免低温导致终端设备异常关机。在其他一些实施例中,当温度低于又一阈值时,终端设备对电池104的输出电压执行升压,以避免低温导致的异常关机。
气压传感器140L可以用于测量气压。在一些实施例中,终端设备通过气压传感器140L测得的气压值计算海拔高度,辅助定位和导航。
磁传感器140M可以包括霍尔传感器。终端设备可以利用磁传感器140M检测翻盖皮套的开合。在一些实施例中,当终端设备是翻盖机时,终端设备可以根据磁传感器140M检测翻盖的开合,进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
摄像头150用于捕获图像或视频。物体通过镜头生成光学图像投射到感光元件,感光元件可以是电荷耦合器件(Charge Coupled Device,CCD)或互补金属氧化物半导体(Complementary Metal-Oxide-Semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号,ISP将数字图像信号输出到DSP加工处理,DSP将数字图像信号转换成标准的RGB、YUV等格式的图像信号。在一些实施例中,终端设备可以包括1个或多个摄像头150,对此不作限定。一个示例中,终端设备包括2个摄像头150,例如1个前置摄像头和1个后置摄像头;又一个示例中,终端设备包括5个摄像头150,例如3个后置摄像头和2个前置摄像头。终端设备可以通过ISP、摄像头150、视频编解码器、GPU、显示屏160以及应用处理器等实现拍摄功能。
显示屏160用于显示图像、视频等。显示屏160包括显示面板,显示面板可以采用液晶显示屏(Liquid Crystal Display,LCD)、有机发光二极管(Organic Light-Emitting Diode,OLED)、有源矩阵有机发光二极体或主动矩阵有机发光二极体(Active-Matrix Organic Light Emitting Diode的,AMOLED),柔性发光二极管(Flex Light-Emitting Diode,FLED)、Miniled、MicroLed、Micro-oLed、量子点发光二极管(Quantum Dot Light Emitting Diodes,QLED)等。示例性的,终端设备可以通过GPU、显示屏160、应用处理器等实现显示功能。
在本实施例中,终端设备可以通过音频模块170、扬声器171、麦克风172、受话器173、耳机接口174,以及应用处理器等实现音频功能。例如音频播放、录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器171,也称“喇叭”,用于将音频电信号转换为声音信号。例如,终端设备可以通过扬声器171播放音乐、发出语音提示等。
麦克风172,也称“话筒”、“传声器”,用于采集声音(例如周围环境声音,包括人发出的声音、设备发出的声音等),并将声音信号转换为音频电信号,即本实施例中的拾音数据。需要说明的是,终端设备可以设置多个麦克风172,通过在终端设备上布置多个麦克风172,可使用户在使用终端设备录制视频时,获得优质的立体声录音效果。
在本实施例中,终端设备上设置的麦克风172的数量可以为3至6个,其中,至少一个麦克风172设置在终端设备的屏幕正面或终端设备的背面,以确保能够形成指向终端设备前后方向的立体声波束。
示例性的,如图2所示,当麦克风的数量为3个时,终端设备的顶部和底部分别设置一个麦克风(即m1和m2),终端设备的屏幕正面或终端设备的背面设置一个麦克风(即m3);如图3所示,当麦克风的数量为6个时,终端设备的顶部和底部分别设置两个麦克风(即m1、m2,和m3、m4),终端设备的屏幕正面和终端设备的背面分别设置一个麦克风(即m5和m6)。可以理解,在其他实施例中,麦克风172的数量还可以为4个或者5个,且至少一个麦克风172设置在终端设备的屏幕正面或终端设备的背面。
受话器173,也称“听筒”,用于将音频电信号转换为声音信号。当终端设备接听电话或语音信息时,可以通过将受话器173靠近人耳接听语音。
耳机接口174用于连接有线耳机。耳机接口174可以是USB接口,也可以是3.5mm的开放移动终端设备平台(Open Mobile Terminal Platform,OMTP)标准接口,美国蜂窝电信工业协会(Cellular Telecommunications Industry Association of the USA,CTIA)标准接口。
终端设备的无线通信功能可以通过天线1、天线2、移动通信模块180、无线通信模块190、调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。终端设备中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如,可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块180可以提供应用在终端设备上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块180可以包括至少一个滤波器、开关、功率放大器、低噪声放大器(Low Noise Amplifier,LNA)等。移动通信模块180可以由天线1接收电磁波,并对接收的电磁波进行滤波、放大等处理,传送至调制解调处理器进行解调。移动通信模块180还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块180的至少部分功能模块可以被设置于处理器110中。在另一些实施例中,移动通信模块180的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号,解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不 限于扬声器171,受话器173等)输出声音信号,或通过显示屏160显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器110,与移动通信模块180或其他功能模块设置在同一个器件中。
无线通信模块190可以提供应用在终端设备上的包括无线局域网(Wireless Local Area Networks,WLAN)(如无线保真(Wireless Fidelity,Wi-Fi)网络),蓝牙(BitTorrent,BT),全球导航卫星系统(Global Navigation Satellite System,GNSS),调频(Frequency Modulation,FM),近距离无线通信技术(Near Field Communication,NFC),红外技术(Infrared Radiation,IR)等无线通信的解决方案。无线通信模块190可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块190经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块190还可以从处理器110接收待发送的信号,对其进行调频、放大处理,经天线2转为电磁波辐射出去。
在一些实施例中,终端设备的天线1和移动通信模块180耦合,天线2和无线通信模块190耦合,使得终端设备可以通过无线通信技术与网络以及其他设备通信。该无线通信技术可以包括全球移动通讯系统(Global System for Mobile Communication,GSM),通用分组无线服务(General Packet Radio Service,GPRS),码分多址接入(Code Division Multiple Access,CDMA),宽带码分多址(Wideband Code Division Multiple Access,WCDMA),时分码分多址(Time Division-Synchronous Code Division Multiple Access,TD-SCDMA),长期演进(Long Term Evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。GNSS可以包括全球卫星定位系统(Global Positioning System,GPS),全球导航卫星系统(Global Navigation Satellite System,GLONASS),北斗卫星导航系统(BeiDou Navigation Satellite System,BDS),准天顶卫星系统(Quasi-Zenith Satellite System,QZSS)和/或星基增强系统(Satellite Based Augmentation System,SBAS)。
USB接口101是符合USB标准规范的接口,具体可以是Mini USB接口、Micro USB接口、USB Type C接口等。USB接口101可以用于连接充电器为终端设备充电,也可以用于终端设备与外围设备之间传输数据。还可以用于连接耳机,通过耳机播放声音。示例性的,USB接口101除了可以为耳机接口174以外,还可以用于连接其他终端设备,例如AR(Augmented Reality,增强现实)设备、计算机等。
充电管理模块102用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块102可以通过USB接口101接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块102可以通过终端设备的无线充电线圈接收无线充电输入。充电管理模块102为电池104充电的同时,还可以通过电源管理模块103为终端设备供电。
电源管理模块103用于连接电池104、充电管理模块102与处理器110。电源管理模块103接收电池104和/或充电管理模块102的输入,为处理器110、内部存储器120、摄像头150、显示屏160等供电。电源管理模块103还可以用于监测电池容量、电池循环次数、电池健康状态(漏电、阻抗)等参数。在一些实施例中,电源管理模块103可以设置于处理器110中。在另一些实施例中,电源管理模块103和充电管理模块102也可以设置于同一个器件中。
按键105包括开机键,音量键等。按键105可以是机械按键,也可以是触摸式按键。终端设备可以接收按键输入,产生与终端设备的用户设置以及功能控制有关的按键信号输入。
马达106可以产生振动提示。马达106可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如摄像,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏160不同区域的触摸操作,马达106也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器107可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口108用于连接SIM卡。SIM卡可以通过插入SIM卡接口108,或从SIM卡接口108拔出,实现和终端设备的接触和分离。终端设备可以支持一个或多个SIM卡接口。SIM卡接口108可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口108可以同时插入多张卡。多张卡的类型可以相同,也可以不同。SIM卡接口108也可以兼容不同类型的SIM卡。SIM卡接口108也可以兼容外部存储卡。终端设备通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,终端设备采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在终端设备中,不能和终端设备分离。
本发明实施例提供的立体声拾音方法,利用终端设备的姿态数据和摄像头数据确定目标波束参数组,并结合麦克风拾取的目标拾音数据形成立体声波束。由于不同的姿态数据和摄像头数据决定了不同的目标波束参数组,因此可以利用不同的目标波束参数组调整立体声波束的方向,从而有效降低录制环境中的噪声影响,使得终端设备在不同的视频录制场景中均能获得较佳的立体声录音效果。此外,通过检测麦克风的堵孔情况、消除各种异常音数据、修正立体声波束的音色以及调节立体声波束的增益,在保证良好的立体声录音效果的同时,进一步增强了录音的鲁棒性。
图4为本发明实施例提供的立体声拾音方法的一种流程示意图,该立体声拾音方法可以在具有上述硬件结构的终端设备上实现。请参照图4,该立体声拾音方法可以包括以下步骤:
S201,从多个麦克风的拾音数据中获取多个目标拾音数据。
在本实施例中,当用户使用终端设备摄像或者录制视频时,终端设备可以通过其上设置的多个麦克风采集声音,然后从该多个麦克风的拾音数据中获得多个目标拾音数据。
其中,该多个目标拾音数据既可以根据该多个麦克风的拾音数据直接获得,也可以按照一定规则选取该多个麦克风中的部分麦克风的拾音数据得到,还可以是将多个麦克风的拾音数据按照一定方式进行处理后得到,对此不作限制。
S202,获取终端设备的姿态数据和摄像头数据。
在本实施例中,该终端设备的姿态数据可以通过上述的加速度传感器140A获得,该姿态数据可以表征终端设备处于横屏状态或者是竖屏状态;该摄像头数据可以理解为用户使用终端设备录制视频的过程中,终端设备上设置的摄像头所对应的使用情况。
S203,根据姿态数据和摄像头数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的目标波束参数组;其中,目标波束参数组包括多个目标拾音数据各自对应的波束参数。
在本实施例中,该波束参数组可以预先训练得到并存储在终端设备中,其包括若干影响立体声波束形成的参数。在一个示例中,可以预先针对终端设备可能处于的视频录制场景,确定终端设备所对应的姿态数据和摄像头数据,并基于该姿态数据和摄像头数据设置相匹配的波束参数组。如此,可以得到多个波束参数组,分别对应不同的视频录制场景,将该多个波束参数组存储在终端设备中以供后续录制视频时使用。例如,当用户使用终端设备摄像或者录制视频时,终端设备基于当前获取的姿态数据和摄像头数据,可以从多个波束参数组中确定匹配的目标波束参数组。
可以理解,当终端设备处于不同的视频录制场景时,终端设备对应的姿态数据和摄像头数据会相应地发生变化,故基于姿态数据和摄像头数据可从多个波束参数组中确定出不同的目标波束参数组,即多个目标拾音数据各自对应的波束参数会随着视频录制场景的不同而发生改变。
S204,根据目标波束参数组和多个目标拾音数据形成立体声波束。
在本实施例中,目标波束参数组中的波束参数可以理解为权重值,在根据目标波束参数组和多个目标拾音数据形成立体声波束时,可以利用每个目标拾音数据和对应的权重值进行加权求和运算,最终得到立体声波束。
由于立体声波束具备空间指向性,故通过对多个目标拾音数据进行波束形成处理,可对立体声波束指向的空间方向之外的拾音数据实现不同程度的抑制作用,从而有效降低录制环境中的噪声影响。同时,由于多个目标拾音数据各自对应的 波束参数会随着视频录制场景的不同而发生改变,故根据目标波束参数组和多个目标拾音数据形成的立体声波束的方向,也将随着视频录制场景的变化而变化,使得终端设备在不同的视频录制场景中均能获得较佳的立体声录音效果。
在一些实施例中,用户使用终端设备录制视频时,会根据录制场景的不同选用不同的摄像头进行拍摄,还可能调整终端设备的姿态使其处于横屏状态或者竖屏状态。在此情形下,终端设备的摄像头数据可以包括启用数据,该启用数据用于表征被启用的摄像头。如图5所示,上述步骤S203可以包括子步骤S203-1:根据姿态数据和启用数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的第一目标波束参数组;上述步骤S204可以包括子步骤S204-1:根据第一目标波束参数组和多个目标拾音数据形成第一立体声波束,其中,第一立体声波束指向被启用的摄像头的拍摄方向。
在实际应用中,当终端设备处于不同的视频录制场景时,需要对应不同的波束参数组,故终端设备中可以预先存储多个波束参数组。在一个示例中,该多个波束参数组可以包括第一波束参数组、第二波束参数组、第三波束参数组和第四波束参数组,第一波束参数组、第二波束参数组、第三波束参数组和第四波束参数组中的波束参数不同。
以视频录制场景包括终端设备的横、竖屏状态以及前、后置摄像头的使用情况为例,当姿态数据表征终端设备处于横屏状态,且启用数据表征后置摄像头被启用时,第一目标波束参数组为第一波束参数组;当姿态数据表征终端设备处于横屏状态,且启用数据表征前置摄像头被启用时,第一目标波束参数组为第二波束参数组;当姿态数据表征终端设备处于竖屏状态,且启用数据表征后置摄像头被启用时,第一目标波束参数组为第三波束参数组;当姿态数据表征终端设备处于竖屏状态,且启用数据表征前置摄像头被启用时,第一目标波束参数组为第四波束参数组。
示例性的,如图6~图9所示,为第一立体声波束的方向根据终端设备的横、竖屏状态的切换以及前、后置摄像头的启用而变化的示意图。其中,图6中的终端设备处于横屏状态且启用后置摄像头进行拍摄,图7中的终端设备处于横屏状态且启用前置摄像头进行拍摄,图8中的终端设备处于竖屏状态且启用后置摄像头进行拍摄,图9中的终端设备处于竖屏状态且启用前置摄像头进行拍摄。
在图6~图9中,左、右箭头分别表示左、右波束的方向,该第一立体声波束可以理解为左、右波束的合成波束;水平面指的是与终端设备的当前拍摄姿态(横屏状态或竖屏状态)下的竖边垂直的平面,所形成的第一立体声波束的主轴位于该水平面内。当终端设备发生横、竖屏切换时,第一立体声波束的方向也会随之变化。例如,图6所示的第一立体声波束的主轴位于与终端设备的横屏状态下的竖边垂直的水平面上,当终端设备发生横、竖屏切换后,第一立体声波束的主轴则位于与竖屏状态下的竖边垂直的水平面上,如图8所示。
此外,由于被启用的摄像头的拍摄方向一般为用户重点需要拾音的方向,故第一立体声波束的方向还会跟随被启用的摄像头的拍摄方向而变化。例如,在图6和图8中,第一立体声波束的方向均指向后置摄像头的拍摄方向,在图7和图9中,第一立体声波束的方向均指向前置摄像头的拍摄方向。
由此可见,在不同的视频录制场景下,该多个目标拾音数据将对应不同的第一目标波束参数组,进而形成不同方向的第一立体声波束,使得第一立体声波束的方向根据终端设备的横、竖屏状态的切换以及前、后置摄像头的启用进行适应性地调整,确保终端设备录制视频时可以获得较佳的立体声录音效果。
在一些实施例中,用户使用终端设备录制视频时,不仅会对终端设备进行横、竖屏切换以及选用不同的摄像头进行拍摄,而且还会根据拍摄目标的距离远近使用变焦。在此情形下,该摄像头数据可以包括上述的启用数据和变焦数据,其中变焦数据为该启用数据表征的被启用的摄像头的变焦倍数。如图10所示,上述步骤S203可以包括子步骤S203-2:根据姿态数据、启用数据和变焦数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的第二目标波束参数组;上述步骤S204可以包括子步骤S204-2:根据第二目标波束参数组和多个目标拾音数据形成第二立体声波束;其中,第二立体声波束指向被启用的摄像头的拍摄方向,且第二立体声波束的宽度随着变焦倍数的增大而收窄。
其中,该第二立体声波束的宽度随着被启用的摄像头的变焦倍数的增大而变窄,可以使声像更加集中,因为在用户使用变焦的时候,往往是远距离拾音场景,目标的信噪比更低,通过第二立体声波束的收窄可以提升信噪比,使得终端设备在低信噪比的情况下录音鲁棒性更好,从而获得较佳的立体声录音效果。
在本实施例中,为了实现第二立体声波束的宽度随着被启用的摄像头的变焦倍数的增大而变窄,可以预先设定第二立体声波束在不同姿态数据、启用数据和变焦数据情况下对应的目标形状,然后利用最小二乘法训练得到匹配的波束参数组,使得根据该波束参数组形成的第二立体声波束近似于设定的目标形状,从而得到不同姿态数据、启用数据和变焦数据情况下对应的波束参数组。
当用户使用终端设备录制视频时,随着变焦倍数的调大或者调小,终端设备可以匹配到不同变焦倍数对应的第二目标波束参数组,进而基于第二目标波束参数组和多个目标拾音数据形成不同宽度的第二立体声波束,以适应用户的视频录制需求。示例性的,如图11a-11c所示,为第二立体声波束的宽度随被启用的摄像头的变焦倍数的变化而变化的示意图。在图11a-11c中,第二立体声波束为左、右波束的合成波束,0度方向为用户录制视频时被启用的摄像头的拍摄方向(也可称作目标方向)。当用户使用低变焦倍数录制视频时,终端设备可以匹配到低变焦倍数对应的第二目标波束参数组,进而形成图11a所示的较宽的第二立体声波束;其中,图11a中的左、右波束分别指向拍摄方向的左右45度。当用户使用中等变焦倍数录制视频时,终端设备可以匹配到中等变焦倍数对应的第二目标 波束参数组,进而形成图11b所示收窄的第二立体声波束;其中,图11b中的左、右波束的指向收窄到拍摄方向的左右30度附近。当用户使用高等变焦倍数录制视频时,终端设备可以匹配到高等变焦倍数对应的第二目标波束参数组,进而形成图11c所示进一步较窄的第二立体声波束;其中,图11c中的左、右波束的指向进一步收窄到拍摄方向的左右10度附近。
从图11a-11c中可以看出,第二立体声波束的宽度随着被启用的摄像头的变焦倍数的增大而变窄,可以提高非目标方向上的降噪能力。以左波束为例,在图11a中,其对60度方向上的拾音数据几乎没有抑制作用;在图11b中,对60度方向上的拾音数据有一定的抑制作用;在图11c中,对60度方向上的拾音数据有较大的抑制作用。
可见,在用户使用终端设备录制视频且有使用变焦时,根据终端设备的横、竖屏状态的切换,前、后置摄像头的启用,以及被启用的摄像头的变焦倍数的变化,可确定出不同的第二目标波束参数组,进而形成不同方向和宽度的第二立体声波束,使得第二立体声波束的方向和宽度能够随着终端设备的姿态、被启用的摄像头以及变焦倍数的变化而自适应调整,故在嘈杂环境以及远距离拾音条件下,能够实现较好的录音鲁棒性。
在实际应用中,用户使用终端设备录制视频时,立体声录音效果除了会受到环境噪声的干扰,还很容易因为用户手持终端设备而发生手指或其它部位堵住麦克风的情况,或者由于脏污进入导声孔而产生的堵麦问题而受到影响;以及随着终端设备的功能越来越强大,终端设备的自噪声(即终端设备内部电路产生的噪声)也越来越容易被麦克风拾取到,比如摄像头的马达噪声、WiFi干扰声、电容充放电导致的杂音等;此外,用户在摄像时因为变焦或其它操作,手指或其他部位会触碰屏幕或者摩擦到麦克孔附近,从而产生一些不是用户期望录到的异常声音。这些自噪声或者异常声音的干扰,在一定程度上影响了视频的立体声录音效果。
基于此,本实施例提出在获取到多个麦克风的拾音数据后,通过对多个麦克风进行堵麦检测以及对多个麦克风的拾音数据进行异常音处理,来确定用于形成立体声波束的多个目标拾音数据,以在有异常声音干扰和/或麦克风堵孔的情况下,仍能实现较好的录音鲁棒性,从而保证良好的立体声录音效果。下面,对获取多个目标拾音数据的过程进行详细说明。
如图12所示,S201包括如下子步骤:
S2011-A,根据多个麦克风的拾音数据获取未发生堵麦的麦克风的序号。
可选地,终端设备在获取多个麦克风的拾音数据后,通过对每个麦克风的拾音数据均进行时域分帧处理和频域变换处理,可以得到每个麦克风的拾音数据对应的时域信息和频域信息,将不同麦克风的拾音数据对应的时域信息和频域信息 分别进行比较,可得到时域比较结果和频域比较结果,根据时域比较结果和频域比较结果确定发生堵麦的麦克风的序号,基于发生堵麦的麦克风的序号确定未发生堵麦的麦克风的序号。由于在对信号进行时域分析时,时域信息相同并不能说明两个信号完全相同,需要从频域角度对信号进一步分析,故本实施例通过对麦克风的拾音数据从时域和频域这两个不同角度进行分析,可以有效提高麦克风堵麦检测的准确性,避免从单一角度分析导致麦克风堵麦的误判。在一个示例中,时域信息可以是拾音数据对应的时域信号的RMS(Root-Mean-Square,均方根)值,频域信息可以是拾音数据对应的频域信号在设定频率(例如2KHz)以上高频部分的RMS值,该高频部分的RMS值在麦克风出现堵孔时的特征更加明显。
在实际应用中,当终端设备中存在发生堵麦的麦克风时,发生堵麦的麦克风和未发生堵麦的麦克风的拾音数据中,时域信号的RMS值和高频部分的RMS值,都会存在差别,即便是未发生堵麦的麦克风之间,由于麦克风自身结构以及终端设备壳体遮挡等因素的影响,时域信号的RMS值和高频部分的RMS值也会存在细微差异。因此,可在终端设备研发阶段,需要找出发生堵麦和未发生堵麦的麦克风之间的差异,并根据该差异设定对应的时域阈值和频域阈值,分别用于在时域对不同麦克风的拾音数据对应的时域信号的RMS值进行比较,得到时域比较结果,以及在频域对不同麦克风的拾音数据对应的高频部分的RMS值进行比较,得到频域比较结果,进而结合时域比较结果和频域比较结果判断是否存在发生堵麦的麦克风。在本实施例中,该时域阈值和频域阈值可为本领域技术人员通过实验获得的经验值。
以终端设备包括3个麦克风为例,该3个麦克风的序号分别为m1、m2、m3,该3个麦克风的拾音数据对应的时域信号的RMS值分别为A1、A2、A3,该3个麦克风的拾音数据对应的高频部分的RMS值分别为B1、B2、B3;当在时域对该3个麦克的拾音数据对应的时域信息进行比较时,可分别计算A1与A2、A1与A3、A2与A3的差值,并将该差值与设定的时域阈值进行比较,当差值未超过时域阈值时,则认为两个麦克风的拾音数据对应的时域信息一致;当差值高于时域阈值时,则认为两个麦克风的拾音数据对应的时域信息不一致,并确定两个麦克风的拾音数据对应的时域信息的大小关系;同理,在频域对该3个麦克的拾音数据对应的频域信息进行比较时,可分别计算B1与B2、B1与B3、B2与B3的差值,并将该差值与设定的频域阈值进行比较,当差值未超过频域阈值时,则认为两个麦克风的拾音数据对应的频域信息一致;当差值高于频域阈值时,则认为两个麦克风的拾音数据对应的频域信息不一致,并确定两个麦克风的拾音数据对应的频域信息的大小关系。
在本实施例中,当结合时域比较结果和频域比较结果判断麦克风是否发生堵麦时,若想尽量将堵麦的麦克风检测出来,则可以根据两个麦克风的时域信息和频域信息其中之一不一致,来确定发生堵麦的麦克风。例如,当将不同麦克风的拾音数据对应的时域信息和频域信息分别进行比较,得到的时域比较结果为: A1=A2=A3,得到的频域比较结果为:B1<B2、B1<B3、B2=B3;则基于该时域比较结果和频域比较结果可以确定发生堵麦的麦克风的序号为m1,未发生堵麦的麦克风的序号为m2和m3。
若想避免发生误检,则可以根据两个麦克风的时域信息和频域信息均不一致,来确定发生堵麦的麦克风。例如,当将不同麦克风的拾音数据对应的时域信息和频域信息分别进行比较,得到的时域比较结果为:A1<A2、A1<A3、A2=A3,得到的频域比较结果为:B1<B2、B1<B3、B2=B3;则基于该时域比较结果和频域比较结果可以确定发生堵麦的麦克风的序号为m1,未发生堵麦的麦克风的序号为m2和m3。
S2012-A,检测每个麦克风的拾音数据中是否存在异常音数据。
在本实施例中,可以对每个麦克风的拾音数据进行频域变换处理,得到每个麦克风的拾音数据对应的频域信息,根据预先训练的异常音检测网络和每个麦克风的拾音数据对应的频域信息检测每个麦克风的拾音数据中是否存在异常音数据。
其中,该预先训练的异常音检测网络可以是在终端设备研发阶段,通过收集大量的异常音数据(例如,一些具有特定频率的声音数据),并采用AI(Artificial Intelligence,人工智能)算法进行特征学习得到。在检测阶段,将每个麦克风的拾音数据对应的频域信息输入该预先训练的异常音检测网络,即可得到是否存在异常音数据的检测结果。
S2013-A,若存在异常音数据,则消除多个麦克风的拾音数据中的异常音数据,得到初始目标拾音数据。
在本实施例中,异常音数据可以包括终端设备的自噪声、用户手指触碰屏幕或摩擦麦克孔等异常声音,异常音数据的消除可以采用AI算法并结合时域滤波、频域滤波的方式进行处理。可选地,当检测到异常音数据时,可以对异常音数据的频点降低增益,即乘以0~1之间的数值,达到消除异常音数据或者降低异常音数据的强度的目的。
在一个示例中,可以利用预先训练的声音检测网络检测异常音数据中是否存在预设的声音数据,其中,该预先训练的声音检测网络可以采用AI算法进行特征学习得到,该预设的声音数据可以理解为用户期望录到的非噪声数据,例如说话声、音乐等,当利用预先训练的声音检测网络存在用户期望录到的非噪声数据时,则不对该异常音数据进行消除,只需降低该异常音数据的强度(例如,乘以数值0.5);当利用预先训练的声音检测网络不存在用户期望录到的非噪声数据时,则直接消除该异常音数据(例如,乘以数值0)。
S2014-A,从初始目标拾音数据中选取未发生堵麦的麦克风的序号对应的拾音数据作为多个目标拾音数据。
例如,在序号分别为m1、m2、m3的麦克风中,若发生堵麦的麦克风的序号为m1,未发生堵麦的麦克风的序号为m2和m3,则可从初始目标拾音数据中选取序号m2和m3对应的拾音数据作为目标拾音数据,得到多个目标拾音数据,用于后续形成立体声波束。
需要说明的是,上述S2011-A可以在S2012-A之前执行,也可以在S2012-A之后执行,还可以和S2012-A同时执行;也即是说,本实施例不对堵麦检测和异常音数据处理的顺序进行限制。
在本实施例中,通过结合麦克风的堵麦检测和麦克风的拾音数据的异常音处理,可以确定用于形成立体声波束的多个目标拾音数据,当用户使用终端设备录制视频时,即使有麦克风发生堵孔以及麦克风的拾音数据中存在异常音数据,仍能保证良好的立体声录音效果,从而实现较好的录音鲁棒性。在实际应用中,还可以仅通过对麦克风进行堵麦检测或者对麦克风的拾音数据进行异常音处理,来确定用于形成立体声波束的多个目标拾音数据。
如图13所示,当通过对麦克风进行堵麦检测来确定用于形成立体声波束的多个目标拾音数据时,S201包括如下子步骤:
S2011-B,根据多个麦克风的拾音数据获取未发生堵麦的麦克风的序号。
其中,S2011-B的具体内容可以参考前述S2011-A,此处不再赘述。
S2012-B,从多个麦克风的拾音数据中选取未发生堵麦的麦克风的序号对应的拾音数据作为多个目标拾音数据。
例如,在序号分别为m1、m2、m3的麦克风中,若发生堵麦的麦克风的序号为m1,未发生堵麦的麦克风的序号为m2和m3,则在该3个麦克风的拾音数据中选择序号为m2和m3的麦克风的拾音数据为目标拾音数据,得到多个目标拾音数据。
可见,针对用户录制视频时可能出现麦克风堵孔的情况,终端设备在获取到多个麦克风的拾音数据后,根据该多个麦克风的拾音数据对多个麦克风进行堵麦检测,得出未发生堵塞的麦克风的序号,并选取未发生堵塞的麦克风的序号对应的拾音数据,用于后续形成立体声波束。如此,可使终端设备录制视频时不会因为麦克风堵孔导致音质的明显降低,或者立体声的明显不平衡,即在有麦克风堵孔的情况下,可以保证立体声录音效果,录音鲁棒性好。
如图14所示,当通过对麦克风的拾音数据进行异常音处理来确定用于形成立体声波束的多个目标拾音数据时,S201包括如下子步骤:
S2011-C,检测每个麦克风的拾音数据中是否存在异常音数据。
其中,S2011-C的具体内容可以参考前述S2012-A,此处不再赘述。
S2012-C,若存在异常音数据,则消除多个麦克风的拾音数据中的异常音数据,得到多个目标拾音数据。
也即是说,终端设备在获取到多个麦克风的拾音数据后,通过对该多个麦克风的拾音数据进行异常音检测和异常音消除处理,则可得到比较“干净”的拾音数据(即多个目标拾音数据),用于后续形成立体声波束。如此,实现了在终端设备录制视频时,有效降低手指摩擦麦克风、终端设备的各种自噪声等异常音数据对立体声录音效果的影响。
在实际应用中,由于声波从终端设备的麦克孔到模数转换过程中产生的频响变化,例如麦克本体频响不平直、麦克管道共振效应、滤波电路等因素,也会在一定程度上影响立体声录音效果。基于此,请参照图15,在根据目标波束参数组和多个目标拾音数据形成立体声波束后(即步骤S204后),该立体声拾音方法还包括以下步骤:
S301,修正立体声波束的音色。
通过修正立体声波束的音色,可将频响修正平直,从而获得较好的立体声录音效果。
在一些实施例中,为了将用户录到的声音调整到合适的音量,还可以对生成的立体声波束进行增益控制。请参照图16,在根据目标波束参数组和多个目标拾音数据形成立体声波束后(即步骤S204后),该立体声拾音方法还包括以下步骤:
S401,调节立体声波束的增益。
通过调节立体声波束的增益,可使小音量的拾音数据能够听得清,大音量的拾音数据不会产生削波失真,从而将用户录到的声音调整到合适音量,提高用户的视频录制体验。
在实际应用中,用户一般会在远距离拾音的场景下使用变焦,此时目标声源的音量会因为距离远而降低,从而影响录制的声音效果。基于此,本实施例提出根据摄像头的变焦倍数调节立体声波束的增益,在远距离拾音场景下,随着变焦倍数的增大,增益放大量也随之增加,从而保证远距离拾音场景目标声源的音量仍旧清晰大声。
需要说明的是,在实际的视频录制过程中,终端设备在根据目标波束参数组和多个目标拾音数据形成立体声波束后,可以先对该立体声波束进行音色修正,然后调节该立体声波束的增益,以得到更好的立体声录音效果。
为了执行上述实施例及各个可能的方式中的相应步骤,下面给出一种立体声拾音装置的实现方式。请参阅图17,为本发明实施例提供的一种立体声拾音装置的功能模块图。需要说明的是,本实施例所提供的立体声拾音装置,其基本原 理及产生的技术效果和上述实施例相同,为简要描述,本实施例部分未提及之处,可参考上述的实施例中相应内容。该立体声拾音装置包括:拾音数据获取模块510、设备参数获取模块520、波束参数确定模块530、波束形成模块540。
该拾音数据获取模块510用于从多个麦克风的拾音数据中获取多个目标拾音数据。
可以理解,该拾音数据获取模块510可以执行上述S201。
该设备参数获取模块520用于获取终端设备的姿态数据和摄像头数据。
可以理解,该设备参数获取模块520可以执行上述S202。
该波束参数确定模块530用于根据姿态数据和摄像头数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的目标波束参数组;其中,目标波束参数组包括多个目标拾音数据各自对应的波束参数。
可以理解,该波束参数确定模块530可以执行上述S203。
该波束形成模块540用于根据目标波束参数组和多个目标拾音数据形成立体声波束。
可以理解,该波束形成模块540可以执行上述S204。
在一些实施例中,该摄像头数据可以包括启用数据,启用数据表征被启用的摄像头,该波束参数确定模块530用于根据姿态数据和启用数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的第一目标波束参数组。该波束形成模块540可以根据第一目标波束参数组和多个目标拾音数据形成第一立体声波束;其中,第一立体声波束指向被启用的摄像头的拍摄方向。
可选地,多个波束参数组包括第一波束参数组、第二波束参数组、第三波束参数组和第四波束参数组,第一波束参数组、第二波束参数组、第三波束参数组和第四波束参数组中的波束参数不同。
其中,当姿态数据表征终端设备处于横屏状态,且启用数据表征后置摄像头被启用时,第一目标波束参数组为第一波束参数组;当姿态数据表征终端设备处于横屏状态,且启用数据表征前置摄像头被启用时,第一目标波束参数组为第二波束参数组;当姿态数据表征终端设备处于竖屏状态,且启用数据表征后置摄像头被启用时,第一目标波束参数组为第三波束参数组;当姿态数据表征终端设备处于竖屏状态,且启用数据表征前置摄像头被启用时,第一目标波束参数组为第四波束参数组。
可以理解,该波束参数确定模块530可以执行上述S203-1,该波束形成模块540可以执行上述S204-1。
在另一些实施例中,该摄像头数据可以包括启用数据和变焦数据,其中变焦 数据为启用数据表征的被启用的摄像头的变焦倍数,该波束参数确定模块530用于根据姿态数据、启用数据和变焦数据从预先存储的多个波束参数组中确定与多个目标拾音数据对应的第二目标波束参数组。该波束形成模块540可以根据第二目标波束参数组和多个目标拾音数据形成第二立体声波束;其中,第二立体声波束指向被启用的摄像头的拍摄方向,且第二立体声波束的宽度随着变焦倍数的增大而收窄。
可以理解,该波束参数确定模块530可以执行上述S203-2,该波束形成模块540可以执行上述S204-2。
请参照图18,该拾音数据获取模块510可以包括堵麦检测模块511和/或异常音处理模块512,以及目标拾音数据选取模块513,通过堵麦检测模块511和/或异常音处理模块512,以及目标拾音数据选取模块513可以从多个麦克风的拾音数据中获取多个目标拾音数据。
可选地,当通过堵麦检测模块511、异常音处理模块512和目标拾音数据选取模块513来获取多个目标拾音数据时,该堵麦检测模块511用于根据多个麦克风的拾音数据获取未发生堵麦的麦克风的序号,该异常音处理模块512用于检测每个麦克风的拾音数据中是否存在异常音数据,若存在异常音数据,则消除多个麦克风的拾音数据中的异常音数据,得到初始目标拾音数据,该目标拾音数据选取模块513用于从初始目标拾音数据中选取未发生堵麦的麦克风的序号对应的拾音数据作为多个目标拾音数据。
其中,该堵麦检测模块511用于对每个麦克风的拾音数据均进行时域分帧处理和频域变换处理,以得到每个麦克风的拾音数据对应的时域信息和频域信息,将不同麦克风的拾音数据对应的时域信息和频域信息分别进行比较,得到时域比较结果和频域比较结果,根据时域比较结果和频域比较结果确定发生堵麦的麦克风的序号,基于发生堵麦的麦克风的序号确定未发生堵麦的麦克风的序号。
The abnormal sound processing module 512 is configured to perform frequency-domain transform processing on each microphone's pickup data to obtain the frequency-domain information corresponding to each microphone's pickup data, and to detect, based on a pre-trained abnormal sound detection network and that frequency-domain information, whether abnormal sound data exists in each microphone's pickup data. When the abnormal sound data needs to be eliminated, a pre-trained sound detection network may be used to detect whether preset sound data exists in the abnormal sound data; if no preset sound data exists, the abnormal sound data is eliminated; if preset sound data exists, the intensity of the abnormal sound data is reduced.
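The eliminate-versus-attenuate branch can be sketched per frequency-domain frame as below; the two networks are modeled as opaque callables, and the 12 dB attenuation is an assumed value:

```python
import numpy as np

def process_frame(frame_fft, abnormal_net, sound_net, atten_db=12.0):
    """Per-frame decision logic of the abnormal sound processing module.

    abnormal_net / sound_net: stand-ins for the pre-trained detection
    networks; any callables returning True/False on a spectrum work here.
    """
    if not abnormal_net(frame_fft):
        return frame_fft                               # no abnormal sound
    if sound_net(frame_fft):                           # preset sound present:
        return frame_fft * 10.0 ** (-atten_db / 20.0)  # reduce intensity only
    return np.zeros_like(frame_fft)                    # otherwise eliminate
```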
Optionally, when the plurality of target pickup data are obtained through the blocked-microphone detection module 511 and the target pickup data selection module 513, the blocked-microphone detection module 511 is configured to obtain, based on the pickup data of the plurality of microphones, the sequence numbers of the unblocked microphones, and the target pickup data selection module 513 selects, from the pickup data of the plurality of microphones, the pickup data corresponding to the sequence numbers of the unblocked microphones as the plurality of target pickup data.
Optionally, when the plurality of target pickup data are obtained through the abnormal sound processing module 512 and the target pickup data selection module 513, the abnormal sound processing module 512 is configured to detect whether abnormal sound data exists in each microphone's pickup data and, if so, eliminate the abnormal sound data from the pickup data of the plurality of microphones to obtain the plurality of target pickup data.
It can be understood that the blocked-microphone detection module 511 can perform the foregoing S2011-A and S2011-B; the abnormal sound processing module 512 can perform the foregoing S2012-A, S2013-A, and S2011-C; and the target pickup data selection module 513 can perform the foregoing S2014-A, S2012-B, and S2012-C.
Referring to FIG. 19, the stereo sound pickup apparatus may further include a timbre correction module 550 and a gain control module 560.
The timbre correction module 550 is configured to correct the timbre of the stereo beams.
It can be understood that the timbre correction module 550 can perform the foregoing S301.
The gain control module 560 is configured to adjust the gain of the stereo beams.
The gain control module 560 may adjust the gain of the stereo beams according to the camera's zoom factor.
It can be understood that the gain control module 560 can perform the foregoing S401.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is read and run by a processor, the stereo sound pickup method disclosed in the foregoing embodiments is implemented.
An embodiment of the present invention further provides a computer program product; when the computer program product runs on a computer, the computer is caused to perform the stereo sound pickup method disclosed in the foregoing embodiments.
An embodiment of the present invention further provides a chip system. The chip system includes a processor, may further include a memory, and is configured to implement the stereo sound pickup method disclosed in the foregoing embodiments. The chip system may consist of a chip, or may include a chip and other discrete components.
In summary, in the stereo sound pickup method and apparatus, terminal device, and computer-readable storage medium provided by the embodiments of the present invention, the target beam parameter group is determined based on the terminal device's posture data and camera data. When the terminal device is in different video recording scenarios, different posture data and camera data are obtained, and thus different target beam parameter groups are determined. Consequently, when the stereo beams are formed based on the target beam parameter group and the plurality of target pickup data, the direction of the stereo beams can be adjusted through the different target beam parameter groups, which effectively reduces the influence of noise in the recording environment, so that the terminal device achieves a good stereo recording effect in different video recording scenarios. Furthermore, by detecting blocked microphone holes and eliminating the various kinds of abnormal sound data, a good stereo recording effect is maintained even when a microphone hole is blocked or abnormal sound data is present during video recording, giving good recording robustness.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form an independent part, each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a mobile phone, a tablet computer, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The foregoing descriptions are merely preferred embodiments of the present invention and are not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (19)

1. A stereo sound pickup method, applied to a terminal device, the terminal device comprising a plurality of microphones, wherein the method comprises:
    obtaining a plurality of target pickup data from pickup data of the plurality of microphones;
    obtaining posture data and camera data of the terminal device;
    determining, based on the posture data and the camera data, a target beam parameter group corresponding to the plurality of target pickup data from a plurality of pre-stored beam parameter groups, wherein the target beam parameter group comprises beam parameters respectively corresponding to the plurality of target pickup data; and
    forming stereo beams based on the target beam parameter group and the plurality of target pickup data.
2. The method according to claim 1, wherein the camera data comprises enable data, the enable data representing an enabled camera;
    the step of determining, based on the posture data and the camera data, the target beam parameter group corresponding to the plurality of target pickup data from the plurality of pre-stored beam parameter groups comprises: determining, based on the posture data and the enable data, a first target beam parameter group corresponding to the plurality of target pickup data from the plurality of pre-stored beam parameter groups; and
    the step of forming the stereo beams based on the target beam parameter group and the plurality of target pickup data comprises: forming first stereo beams based on the first target beam parameter group and the plurality of target pickup data, wherein the first stereo beams point in a shooting direction of the enabled camera.
3. The method according to claim 2, wherein the plurality of beam parameter groups comprise a first beam parameter group, a second beam parameter group, a third beam parameter group, and a fourth beam parameter group, and the beam parameters in the first beam parameter group, the second beam parameter group, the third beam parameter group, and the fourth beam parameter group differ;
    when the posture data indicates that the terminal device is in a landscape state and the enable data indicates that a rear-facing camera is enabled, the first target beam parameter group is the first beam parameter group;
    when the posture data indicates that the terminal device is in a landscape state and the enable data indicates that a front-facing camera is enabled, the first target beam parameter group is the second beam parameter group;
    when the posture data indicates that the terminal device is in a portrait state and the enable data indicates that the rear-facing camera is enabled, the first target beam parameter group is the third beam parameter group; and
    when the posture data indicates that the terminal device is in a portrait state and the enable data indicates that the front-facing camera is enabled, the first target beam parameter group is the fourth beam parameter group.
4. The method according to claim 1, wherein the camera data comprises enable data and zoom data, the zoom data being a zoom factor of the enabled camera represented by the enable data;
    the step of determining, based on the posture data and the camera data, the target beam parameter group corresponding to the plurality of target pickup data from the plurality of pre-stored beam parameter groups comprises: determining, based on the posture data, the enable data, and the zoom data, a second target beam parameter group corresponding to the plurality of target pickup data from the plurality of pre-stored beam parameter groups; and
    the step of forming the stereo beams based on the target beam parameter group and the plurality of target pickup data comprises: forming second stereo beams based on the second target beam parameter group and the plurality of target pickup data, wherein the second stereo beams point in a shooting direction of the enabled camera, and a width of the second stereo beams narrows as the zoom factor increases.
5. The method according to any one of claims 1 to 4, wherein the step of obtaining the plurality of target pickup data from the pickup data of the plurality of microphones comprises:
    obtaining, based on the pickup data of the plurality of microphones, sequence numbers of microphones that are not blocked;
    detecting whether abnormal sound data exists in the pickup data of each of the microphones;
    if abnormal sound data exists, eliminating the abnormal sound data from the pickup data of the plurality of microphones to obtain initial target pickup data; and
    selecting, from the initial target pickup data, the pickup data corresponding to the sequence numbers of the unblocked microphones as the plurality of target pickup data.
6. The method according to claim 5, wherein the step of obtaining, based on the pickup data of the plurality of microphones, the sequence numbers of the microphones that are not blocked comprises:
    performing time-domain framing processing and frequency-domain transform processing on the pickup data of each of the microphones to obtain time-domain information and frequency-domain information corresponding to the pickup data of each of the microphones;
    separately comparing the time-domain information and the frequency-domain information corresponding to the pickup data of different microphones to obtain a time-domain comparison result and a frequency-domain comparison result;
    determining, based on the time-domain comparison result and the frequency-domain comparison result, sequence numbers of blocked microphones; and
    determining, based on the sequence numbers of the blocked microphones, the sequence numbers of the unblocked microphones.
7. The method according to claim 5, wherein the step of detecting whether abnormal sound data exists in the pickup data of each of the microphones comprises:
    performing frequency-domain transform processing on the pickup data of each of the microphones to obtain frequency-domain information corresponding to the pickup data of each of the microphones; and
    detecting, based on a pre-trained abnormal sound detection network and the frequency-domain information corresponding to the pickup data of each of the microphones, whether abnormal sound data exists in the pickup data of each of the microphones.
8. The method according to claim 5, wherein the step of eliminating the abnormal sound data from the pickup data of the plurality of microphones comprises:
    detecting, by using a pre-trained sound detection network, whether preset sound data exists in the abnormal sound data;
    if no preset sound data exists, eliminating the abnormal sound data; and
    if preset sound data exists, reducing an intensity of the abnormal sound data.
9. The method according to any one of claims 1 to 4, wherein the step of obtaining the plurality of target pickup data from the pickup data of the plurality of microphones comprises:
    obtaining, based on the pickup data of the plurality of microphones, sequence numbers of microphones that are not blocked; and
    selecting, from the pickup data of the plurality of microphones, the pickup data corresponding to the sequence numbers of the unblocked microphones as the plurality of target pickup data.
10. The method according to any one of claims 1 to 4, wherein the step of obtaining the plurality of target pickup data from the pickup data of the plurality of microphones comprises:
    detecting whether abnormal sound data exists in the pickup data of each of the microphones; and
    if abnormal sound data exists, eliminating the abnormal sound data from the pickup data of the plurality of microphones to obtain the plurality of target pickup data.
11. The method according to any one of claims 1 to 4, wherein after the step of forming the stereo beams based on the target beam parameter group and the plurality of target pickup data, the method further comprises:
    correcting a timbre of the stereo beams.
12. The method according to any one of claims 1 to 4, wherein after the step of forming the stereo beams based on the target beam parameter group and the plurality of target pickup data, the method further comprises:
    adjusting a gain of the stereo beams.
13. The method according to claim 12, wherein the camera data comprises a zoom factor of an enabled camera, and the step of adjusting the gain of the stereo beams comprises:
    adjusting the gain of the stereo beams according to the zoom factor of the camera.
14. The method according to any one of claims 1 to 4, wherein the number of the microphones is 3 to 6, and at least one microphone is arranged on the screen front of the terminal device or on the back of the terminal device.
15. The method according to claim 14, wherein the number of the microphones is 3, one microphone is arranged at each of the top and the bottom of the terminal device, and one microphone is arranged on the screen front of the terminal device or on the back of the terminal device.
16. The method according to claim 14, wherein the number of the microphones is 6, two microphones are arranged at each of the top and the bottom of the terminal device, and one microphone is arranged on each of the screen front of the terminal device and the back of the terminal device.
17. A stereo sound pickup apparatus, applied to a terminal device, the terminal device comprising a plurality of microphones, wherein the apparatus comprises:
    a pickup data obtaining module, configured to obtain a plurality of target pickup data from pickup data of the plurality of microphones;
    a device parameter obtaining module, configured to obtain posture data and camera data of the terminal device;
    a beam parameter determining module, configured to determine, based on the posture data and the camera data, a target beam parameter group corresponding to the plurality of target pickup data from a plurality of pre-stored beam parameter groups, wherein the target beam parameter group comprises beam parameters respectively corresponding to the plurality of target pickup data; and
    a beamforming module, configured to form stereo beams based on the target beam parameter group and the plurality of target pickup data.
18. A terminal device, comprising a memory storing a computer program and a processor, wherein when the computer program is read and run by the processor, the method according to any one of claims 1 to 16 is implemented.
19. A computer-readable storage medium, on which a computer program is stored, wherein when the computer program is read and run by a processor, the method according to any one of claims 1 to 16 is implemented.
PCT/CN2021/071156 2020-01-16 2021-01-12 Stereo sound pickup method and apparatus, terminal device, and computer-readable storage medium WO2021143656A1






Legal Events

121 — EP: The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 21740899; country of ref document: EP; kind code of ref document: A1.
ENP — Entry into the national phase. Ref document number: 2022543511; country of ref document: JP; kind code of ref document: A.
REG — Reference to national code. Ref country code: BR; ref legal event code: B01A; ref document number: 112022013690; country of ref document: BR.
ENP — Entry into the national phase. Ref document number: 2021740899; country of ref document: EP; effective date: 20220714.
NENP — Non-entry into the national phase. Ref country code: DE.
ENP — Entry into the national phase. Ref document number: 112022013690; country of ref document: BR; kind code of ref document: A2; effective date: 20220708.