WO2022007030A1 - Audio signal processing method and apparatus, device and readable medium - Google Patents

Audio signal processing method and apparatus, device and readable medium Download PDF

Info

Publication number
WO2022007030A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
microphone
audio signal
audio
signal processing
Prior art date
Application number
PCT/CN2020/104772
Other languages
French (fr)
Chinese (zh)
Inventor
张金宇
Original Assignee
瑞声声学科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 瑞声声学科技(深圳)有限公司 filed Critical 瑞声声学科技(深圳)有限公司
Publication of WO2022007030A1 publication Critical patent/WO2022007030A1/en

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/01 Correction of time axis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • H04N23/675 Focus control based on electronic image sensor signals comprising setting of focusing regions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • the present invention relates to the field of computer data processing, and in particular, to an audio signal processing method, apparatus, device and readable medium.
  • the video recording function simultaneously acquires the image information and the audio information corresponding to the target object, which is mainly realized by the camera and the microphone device provided in the equipment.
  • in the prior art, the microphone devices in the equipment are generally omnidirectional, that is, they cannot be zoomed. When recording video, the zoom camera is aimed at the target object by zooming, but the signal collection range of the microphone remains relatively large, which makes the display ranges of the audio and the image inconsistent and degrades the user's video recording experience.
  • An audio signal processing method is based on a target device, the target device includes a microphone array, and the microphone array includes a plurality of microphone devices arranged in different positions;
  • the method includes:
  • the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signal, and the target adjustment parameter.
  • the target device also includes a zoom camera device
  • the audio signal processing method further includes:
  • the target audio adjustment parameter is adjusted according to the focal length parameter of the zoom camera device.
  • the target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array
  • the obtaining target audio adjustment parameters includes:
  • the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the target audio adjustment parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  • the adjusting of the target audio adjustment parameter according to the focal length parameter of the zoom camera device includes:
  • when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1;
  • when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • a target terminal, where the target terminal includes a body and an accessory module, the accessory module is rotatably connected to the body, and the accessory module includes a zoom camera device and a microphone array;
  • the zoom camera and the microphone array are located on two adjacent sides of the accessory module, and the photosensitive direction of the zoom camera is the same as the sound collection direction of the microphone array.
  • the microphone array is a linear array including a plurality of microphone devices, and the line along which the plurality of microphone devices are arranged is perpendicular to the photosensitive surface of the zoom camera.
  • An audio signal processing apparatus, the apparatus comprising:
  • an acquisition unit, used to acquire the sub-audio signals collected by each microphone device;
  • a determining unit, used to obtain target audio adjustment parameters and obtain target audio adjustment values according to the target audio adjustment parameters;
  • a combining unit, used to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment parameters.
  • a computer device comprising a memory and a processor
  • the memory stores a computer program
  • the computer program, when executed by the processor, causes the processor to perform the steps described above
  • a computer-readable storage medium storing a computer program, when executed by a processor, causes the processor to perform the steps described above.
  • the sub-audio signals collected by each microphone device are obtained respectively; then the target audio adjustment parameters are determined, and finally the target combined audio signal is determined according to the preset beamforming algorithm, the foregoing sub-audio signals, and the target adjustment parameters.
  • the invention can obtain suitable target adjustment parameters for different audio source application scenarios, improve the sound quality of the audio signal of the target device, and meet the different usage requirements of users.
  • Fig. 1 shows the flow chart of the audio signal processing method in one embodiment
  • Fig. 2 shows the receiving beam angle required by the microphone array corresponding to the sound source in one embodiment
  • Fig. 3 shows the receiving beam angle required by the microphone array corresponding to the sound source in another embodiment
  • Fig. 4 shows the receiving beam angle required by the microphone array corresponding to the sound source in yet another embodiment
  • Fig. 5 shows the flow chart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment
  • Fig. 6 shows the front structure schematic diagram of the target terminal in one embodiment
  • FIG. 7 shows a schematic diagram of a rear view structure of a target terminal in one embodiment
  • Fig. 8 shows the flow chart of the audio signal processing method in still another embodiment
  • Fig. 9 shows the structural block diagram of the audio signal processing apparatus in one embodiment
  • Figure 10 shows an internal structure diagram of a computer device in one embodiment.
  • the present invention provides an audio signal processing method.
  • the present invention may be based on a target device, wherein the target device includes a microphone array, and the microphone array includes a plurality of microphone devices arranged in different positions.
  • the target device may be, for example, a mobile phone, a tablet computer, etc., or a photographing auxiliary tool for connecting with other devices such as a mobile phone.
  • an embodiment of the present invention provides an audio signal processing method.
  • FIG. 1 shows a flowchart of an audio signal processing method in one embodiment.
  • the audio signal processing method described in the present invention may include steps S1022-S1026 as shown in FIG. 1 , which are described in detail as follows:
  • step S1022 the sub-audio signals collected by each microphone device are acquired respectively.
  • the microphone array used to collect the audio signal is first introduced.
  • A microphone array is an array formed by a group of omnidirectional microphones located at different positions in space according to a certain shape and rule; it is a device for spatial sampling of spatially propagated sound signals, and the collected signals contain the spatial location information of the sound source. According to the distance between the sound source and the microphone array, microphone array models can be divided into a near-field model and a far-field model. According to the topology of the microphone array, arrays can be divided into linear arrays, planar arrays, volume arrays and so on.
  • the near-field model regards the sound wave as a spherical wave and considers the amplitude differences between the signals received by the microphone array elements; the far-field model regards the sound wave as a plane wave, ignores the amplitude differences between the received signals of the array elements, and approximates the relationship between them as a simple time delay. The far-field model is clearly a simplification of the actual model, which greatly reduces the processing difficulty.
  • the general speech enhancement method is based on the far-field model.
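The far-field simplification can be made concrete with a short sketch (this is illustrative, not code from the patent; the 4-element array, 30 mm spacing, and sound speed are assumed values). For a uniform linear array, a plane wave arriving at angle θ measured from the array axis reaches element n later than element 0 by n·d·cos(θ)/c:

```python
import numpy as np

# Far-field (plane-wave) delay model for a uniform linear array.
# Illustrative assumptions: 4 elements, 30 mm spacing, sound speed 343 m/s.
C = 343.0  # m/s

def far_field_delays(n_mics: int, spacing_m: float, angle_deg: float, c: float = C):
    """Arrival-time delay of each element relative to element 0 for a plane
    wave arriving from direction angle_deg, measured from the array axis."""
    theta = np.deg2rad(angle_deg)
    return np.arange(n_mics) * spacing_m * np.cos(theta) / c

delays = far_field_delays(4, 0.030, 60.0)
# Under the far-field model, the relationship between the elements' received
# signals is exactly this time delay; amplitude differences are ignored.
```

Note how the delays grow linearly along the array, which is what makes the far-field model so much cheaper to process than the spherical-wave near-field model.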
  • the design methods (topologies) of the microphone arrays included in equipment of different types and purposes differ considerably; that is, the number of microphone devices in the microphone array and the distances between them also differ.
  • a common microphone array structure is the one-dimensional microphone array, that is, the linear microphone array, in which the centers of the array elements are located on the same straight line. According to whether the distance between adjacent array elements is the same, it can be divided into the uniform linear array (ULA) and the nested linear array; a linear array can only obtain the horizontal direction angle information of the signal.
  • a two-dimensional microphone array, that is, a planar microphone array, in which the centers of the array elements are distributed on a plane. According to the geometric shape of the array, it can be divided into the equilateral triangle array, T-shaped array, uniform circular array, uniform square array, coaxial circular array, circular or rectangular area array, and so on. A planar array can obtain both the horizontal azimuth and the vertical azimuth information of the signal.
  • the spacing between the elements in the microphone array also varies. For example, in a linear four-microphone configuration, four microphone devices are set at equal distances, with a spacing of 20-60 mm between adjacent devices; in a ring-shaped six-microphone configuration, the six microphones are evenly distributed clockwise on a circumference whose radius is generally 20-60 mm.
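The two layouts just described can be written down concretely. The 30 mm spacing and 40 mm radius below are illustrative picks from the quoted 20-60 mm ranges:

```python
import numpy as np

# Linear four-mic array: equal spacing, here 30 mm (from the 20-60 mm range).
spacing = 0.030  # metres
linear_positions = np.array([[i * spacing, 0.0] for i in range(4)])

# Ring six-mic array: evenly spaced on a circle, radius 40 mm (from 20-60 mm).
radius = 0.040  # metres
angles = -2.0 * np.pi * np.arange(6) / 6  # negative sign: clockwise layout
ring_positions = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
```

These element coordinates are what a beamforming algorithm ultimately consumes, since the inter-element distances determine the signal delays.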
  • the microphone array is a linear array, and the distance between each microphone device and the target sound source differs, so the spatial and timing information of the sound waves they receive also differ.
  • the sub-audio signals of each microphone in the linear array are combined to obtain audio data corresponding to the target sound source object.
  • step S1024, the target audio adjustment parameter is obtained, and the target audio adjustment value is obtained according to the target audio adjustment parameter.
  • the target audio adjustment parameter includes a spatial phase difference. Considering the differences in the positions of the microphone devices in the microphone array, after obtaining the above sub-audio signals, it is necessary to calculate the spatial phase difference corresponding to each microphone device in order to perform phase compensation on the sub-audio collected by each microphone device.
  • steps S1032-S1034 shown in FIG. 5 may also be included after the process of acquiring the sub-audio signals collected by each microphone device respectively.
  • FIG. 5 shows a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment.
  • step S1032, the signal delay time of each microphone device is determined according to the preset distance information between the microphone devices and the sound speed.
  • the standard sound speed is about 340 m/s, but in different real-time collection environments (affected by factors such as wind speed, air pressure and temperature), the actual sound speed at different collection devices varies. Therefore, it is necessary to obtain the current sound speed information in real time, so as to calculate the signal delay time of each microphone device from the current sound speed and the distance between the microphone devices.
  • the specific signal delay time can be obtained according to the ratio of the distance information and the current sound speed.
  • the preset distance information between the microphone devices may be stored in the device memory, from which it can be obtained directly.
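A minimal sketch of step S1032 under stated assumptions: the signal delay time of each microphone is the ratio of its preset distance to the current sound speed. The distances and the real-time sound-speed reading below are illustrative values, not values from the patent:

```python
import numpy as np

# Step S1032 (sketch): signal delay time of each microphone device as the
# ratio of its preset distance to the current sound speed.
# Distances (to a reference mic) and the measured sound speed are assumptions.
preset_distances_m = np.array([0.00, 0.03, 0.06, 0.09])
current_sound_speed = 346.2  # m/s, obtained in real time for this environment

signal_delay_s = preset_distances_m / current_sound_speed
```

Because the sound speed is read in real time rather than fixed at the nominal 340 m/s, the delays track the current wind, pressure, and temperature conditions.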
  • step S1034 the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the corresponding phase compensation value can be determined according to the signal delay time of each microphone device.
  • the delay difference between at least two microphones in the microphone array can be described in the frequency domain by a phase difference function, commonly referred to as the differential phase, which takes a value between -180 degrees and +180 degrees.
  • the spatial phase difference can be calculated from the distance between two adjacent microphone devices in the microphone array and the speed of sound.
  • the target audio adjustment value is the product of the phase compensation value of each microphone device and the spatial phase difference.
  • the target audio adjustment value of microphone 1 is phase compensation value 1 × spatial phase difference ω;
  • the target audio adjustment value of microphone 2 is phase compensation value 2 × spatial phase difference ω, and so on.
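One possible reading of step S1034 and the product above, sketched with assumed values: the phase compensation value of each microphone is derived from its signal delay at an analysis frequency f (2π·f·τ), the spatial phase difference ω is derived from the adjacent-microphone distance and the sound speed (2π·f·d/c), and the target audio adjustment value is their per-microphone product. The frequency, spacing, and delays are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Sketch of step S1034 and the per-microphone adjustment-value product.
F = 1000.0  # Hz, analysis frequency (assumed)
C = 343.0   # m/s, sound speed
D = 0.03    # m, distance between adjacent microphones (assumed)

# Signal delay of each microphone in a uniform linear array.
delays = np.array([0.0, 1.0, 2.0, 3.0]) * D / C

# Phase compensation value per microphone, from its signal delay.
phase_compensation = 2.0 * np.pi * F * delays

# Spatial phase difference from the adjacent-mic distance and sound speed.
spatial_phase_diff = 2.0 * np.pi * F * D / C

# Target audio adjustment value: product of the two, per microphone.
target_adjustment = phase_compensation * spatial_phase_diff
```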
  • in step S1026, a target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment values.
  • Beamforming refers to the delay or phase compensation and amplitude weighting of the output of each array element in the microphone array to form a beam pointing in a specific direction. Unlike omnidirectional microphones, the beam in this specific direction represents the direction of signal acquisition, so that signal data in a specific direction can be collected in a more targeted manner.
  • the preset beamforming algorithm can be beamforming with fixed weights, or adaptive beamforming according to signal characteristics.
  • the target audio adjustment parameters of each microphone device determined in the previous step and the sub-audio signals of each microphone can be combined, according to the beamforming algorithm, into a directional target combined audio signal with a minimum beam angle. As can be seen from the above description, the minimum beam angle in this implementation scenario is related to the number of microphone devices and the distance between two adjacent microphone devices.
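As an illustration of the combining step, the following is a basic frequency-domain delay-and-sum beamformer, one common fixed-weight instance of a preset beamforming algorithm. The sampling rate, frame length, and steering delays are assumptions for the example, not the patent's implementation:

```python
import numpy as np

# Delay-and-sum sketch: phase-align each sub-audio signal in the frequency
# domain, then average, steering the beam toward the compensated direction.
FS = 16000  # Hz, sampling rate (assumed)
N = 1024    # frame length (assumed)
F_BINS = np.fft.rfftfreq(N, d=1.0 / FS)

def delay_and_sum(sub_signals: np.ndarray, steer_delays_s: np.ndarray):
    """sub_signals: (n_mics, N) frame of sub-audio signals.
    steer_delays_s: per-mic arrival delays to compensate, in seconds."""
    spectra = np.fft.rfft(sub_signals, axis=1)
    # Phase compensation: advance each channel by its steering delay.
    phase = np.exp(2j * np.pi * F_BINS[None, :] * steer_delays_s[:, None])
    aligned = spectra * phase
    return np.fft.irfft(aligned.mean(axis=0), n=N)

# Toy check: four copies of one 500 Hz tone, delayed per mic, recombined.
t = np.arange(N) / FS
delays = np.array([0.0, 1.0, 2.0, 3.0]) * 0.03 / 343.0
subs = np.stack([np.sin(2 * np.pi * 500.0 * (t - d)) for d in delays])
combined = delay_and_sum(subs, delays)
```

After alignment, the channels add coherently for sound from the steered direction, while sound from other directions adds with mismatched phases and is attenuated, which is what narrows the effective pickup beam.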
  • FIG. 6 is a schematic diagram of a front view structure of a target terminal in an embodiment
  • FIG. 7 is a schematic diagram of a rear view structure of the target terminal in an embodiment
  • the target terminal 10 includes a main body 11 and an accessory module 12.
  • the accessory module 12 is rotatably connected to the main body 11, for example, connected by a rotating shaft.
  • the rotating shaft connects the center position of the accessory module 12 and the main body 11.
  • the rotating shaft may also connect the accessory module 12 and the edge position of the main body 11 .
  • the accessory module 12 includes a zoom camera device 121 and a microphone array 122.
  • the zoom camera device 121 and the microphone array 122 are located on two adjacent sides of the accessory module 12.
  • the microphone array 122 is located on the side close to the user, and the zoom camera device is located on the side of the accessory module 12 with the smallest area.
  • the shooting direction of the zoom camera device 121 is the same as the sound collection direction of the microphone array 122 .
  • the microphone array 122 is a linear array including a plurality of microphone devices, and the arrangement direction of the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121, so that the zoom camera device 121 and the microphone array 122 point in the same direction, which better ensures that the subject of the sound pickup is the same as the subject being shot.
  • the accessory module 12 is a rectangular parallelepiped
  • the microphone array 122 is located on the rectangular surface formed by the long side and the wide side of the rectangular parallelepiped, and the arrangement direction of the plurality of microphone devices is parallel to the long side of the rectangular parallelepiped.
  • the zoom camera device 121 is located on the rectangular surface formed by the wide side and the high side of the rectangular parallelepiped, and its photosensitive surface is parallel to that rectangular surface. Therefore, the arrangement direction of the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121.
  • the arrangement direction of the plurality of microphone devices is the sound collection direction of the microphone array 122 , and the sound collection direction of the microphone array 122 is the same as the light receiving direction of the zoom camera.
  • FIG. 8 shows a flowchart of an audio signal processing method in one embodiment.
  • the audio signal processing method described in the present invention may include steps S2022-S2026 as shown in FIG. 8, which are described in detail as follows:
  • step S2022 the sub-audio signals collected by each microphone device are acquired respectively.
  • This step is basically the same as step S1022 of the audio signal processing method in the embodiment shown in FIG. 1 , and will not be repeated here.
  • step S2024, the target audio adjustment parameter is obtained according to the focal length parameter of the zoom camera device, and the target audio adjustment value is obtained according to the target audio adjustment parameter.
  • when the zoom camera device is used for recording, the focal length parameter reflects the acquisition range of the image data of the target object. As the focal length parameter of the camera is adjusted, the captured image range is adjusted accordingly.
  • a lens with a focal length below 24mm is called an "ultra-wide-angle lens". This lens has a large viewing angle and can obtain a large range of images.
  • when the focal length is 100 mm or above, the lens is generally a macro lens that captures a small image range, and is typically used for macro photography and extreme close-ups.
  • therefore, the shooting range, and correspondingly the range of the sound source, can be inferred from the focal length parameter.
  • when the focal length parameter is smaller, the range to be shot is larger, and the range of the sound source is correspondingly larger;
  • when the focal length parameter is larger, the range to be shot is smaller, and the range of the sound source is correspondingly smaller. The target audio adjustment parameter can therefore be adjusted according to the focal length parameter, so that the quality of the audio signal received by the target device is higher.
  • the target audio adjustment parameter further includes a compensation coefficient
  • the size of the compensation coefficient is proportional to the focal length parameter of the zoom camera device. Specifically, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • when shooting at telephoto to super-telephoto (for example, when the focal length parameter is 100mm), the compensation coefficient can be set to 1, that is, no additional scaling of the spatial phase difference is performed for each microphone device. This case is similar to fixed-beam-angle far-field sound pickup, so that only the sound of the subject in the frame is collected, avoiding interference from the surrounding environment.
  • conversely, when the focal length parameter is less than or equal to the preset threshold, a smaller compensation coefficient, such as 0.5, can be used.
  • the target audio adjustment value is equal to the product of the compensation coefficient, the phase compensation value corresponding to each microphone position, and the spatial phase difference.
  • the target audio adjustment value of microphone 1 is phase compensation value 1 × compensation coefficient k × spatial phase difference ω;
  • the target audio adjustment value of microphone 2 is phase compensation value 2 × compensation coefficient k × spatial phase difference ω, and so on.
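The focal-length-dependent coefficient described in this embodiment can be sketched as follows. The 85 mm preset threshold is an assumed value chosen so that the 100 mm telephoto example yields a coefficient of 1; the wide-angle coefficient of 0.5 follows the example in the text:

```python
# Sketch of the compensation coefficient rule from this embodiment:
# coefficient 1 above a preset focal-length threshold (telephoto),
# a smaller value such as 0.5 at or below it (wide angle).
PRESET_THRESHOLD_MM = 85.0  # assumed threshold for illustration

def compensation_coefficient(focal_length_mm: float) -> float:
    if focal_length_mm > PRESET_THRESHOLD_MM:
        return 1.0
    return 0.5  # "a smaller compensation coefficient, such as 0.5"

def target_adjustment_value(phase_comp: float, spatial_phase_diff: float,
                            focal_length_mm: float) -> float:
    # phase compensation value × compensation coefficient k × spatial
    # phase difference, per the product stated above.
    k = compensation_coefficient(focal_length_mm)
    return phase_comp * k * spatial_phase_diff
```

A usage sketch: at 100 mm (telephoto) the full phase adjustment is applied, while at 24 mm (ultra-wide) the adjustment is halved, widening the effective pickup range to match the wider shot.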
  • in order to further improve the user's audio experience, and considering the hardware limitations of the capture device (just as an image acquired at close focus may be blurred or out of focus, degrading the user's recording experience), the sub-audio signals may also be denoised according to a preset preprocessing algorithm before being combined into the target combined audio signal.
  • in some embodiments, the target adjustment parameters may also be determined from user input: the adjustment parameters input through a preset interface or device are acquired, and the target adjustment parameters are determined according to those adjustment parameters.
  • the adjustment parameter here can be a preset recording mode selected by the user, such as "concert mode", "indoor mode", "sports mode", etc., and the target adjustment parameters are then determined according to the selected preset recording mode.
  • the target adjustment parameter for audio zoom can be reduced appropriately, for example adjusted from the value of 0.6 determined according to the focal length parameter down to 0.4.
  • a target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signal, and the target adjustment value.
  • This step is basically the same as step S1026 of the audio signal processing method in the embodiment shown in FIG. 1 , and will not be repeated here.
  • FIG. 9 shows a structural block diagram of an audio signal processing apparatus in an embodiment.
  • an audio signal processing apparatus 1060 includes: an obtaining unit 1062 , a determining unit 1064 , and a combining unit 1066 .
  • the obtaining unit 1062 is used to obtain the sub-audio signals collected by each microphone device respectively;
  • Determining unit 1064 configured to obtain a focal length parameter through the zoom camera device, and determine a target audio adjustment parameter according to the focal length parameter;
  • Combining unit 1066 configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signal, and the target adjustment parameter.
  • the target device further includes a zoom camera device
  • the determining unit 1064 is further configured to:
  • the target audio adjustment parameter is adjusted according to the focal length parameter of the zoom camera device.
  • the target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.
  • the determination unit 1064 is also used to:
  • the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the target audio adjustment parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  • when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • the target terminal includes a body and an accessory module, the accessory module is rotatably connected to the body, the accessory module includes a zoom camera device and the microphone array, and the microphone array and the zoom camera device are located on the same side of the accessory module and point in the same direction.
  • the microphone array is a linear array.
  • Figure 10 shows a diagram of the internal structure of a computer device in one embodiment.
  • the computer device may be a terminal or a server.
  • the computer device includes a processor, a memory, an output module, an acquisition module, and a processing module connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and also stores a computer program, which, when executed by the processor, enables the processor to implement the audio signal processing method.
  • a computer program can also be stored in the internal memory. When the computer program is executed by the processor, the processor can execute the audio signal processing method.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present invention, and does not constitute a limitation on the computer equipment to which the solution of the present invention is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps shown in FIG. 1, FIG. 5 and FIG. 8.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Disclosed in the embodiments of the present invention are an audio signal processing method and apparatus, a device and a readable medium. The method is based on a target device. The target device comprises a microphone array, and the microphone array comprises a plurality of microphone apparatuses disposed at different positions. The method comprises: separately acquiring sub-audio signals collected by the microphone apparatuses; acquiring target audio adjustment parameters; and determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment parameters. The present invention improves the quality of the recorded audio.

Description

音频信号处理方法、装置、设备及可读介质Audio signal processing method, apparatus, device and readable medium 技术领域technical field
本发明涉及计算机数据处理领域,尤其涉及一种音频信号处理方法、装置、设备及可读介质。The present invention relates to the field of computer data processing, and in particular, to an audio signal processing method, apparatus, device and readable medium.
背景技术Background technique
随着智能设备和移动终端的日益普及,越来越多的设备所具备的录像功能成为被用户广泛使用的功能之一。录像功能主要用于同时获取目标对象所对应的图像信息和音频信息,这主要是通过设备中设置的摄像头和麦克风装置实现的。With the increasing popularity of smart devices and mobile terminals, the video recording function provided by more and more devices has become one of the functions widely used by users. The video recording function is mainly used to obtain the image information and audio information corresponding to the target object at the same time, which is mainly realized by the camera and microphone device set in the device.
Technical Problem

With the emergence of variable-focus optical cameras and the development of related optical processing technologies, the cameras of most devices now support a considerable zoom range: they can capture both nearby objects (short focal length) and distant objects (long focal length).

At the same time, however, the microphone devices in such equipment are generally omnidirectional, that is, they cannot be zoomed. As a result, during video recording the zoom camera can be trained on the target object, but the signal acquisition range of the microphone remains comparatively wide. The presentation ranges of the audio and the image therefore become inconsistent, which degrades the user's recording experience.
Technical Solution

In view of the above problems, it is necessary to provide an audio signal processing method, an apparatus, a computer device and a readable medium.
An audio signal processing method, the method being based on a target device, where the target device includes a microphone array and the microphone array includes a plurality of microphone devices arranged at different positions.

The method includes:

separately acquiring the sub-audio signals collected by each microphone device;

acquiring target audio adjustment parameters, and obtaining a target audio adjustment value according to the target audio adjustment parameters; and

determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters.
Further, the target device also includes a variable-focus (zoom) camera device.

The audio signal processing method further includes:

adjusting the target audio adjustment parameters according to the focal length parameter of the zoom camera device.

Further, the target audio adjustment parameters include a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.

Acquiring the target audio adjustment parameters includes:

determining the signal delay time of each microphone device according to the spacing between the microphone devices and the speed of sound; and

determining the phase compensation value and the spatial phase difference corresponding to each microphone device according to its signal delay time.

Further, the target parameters also include a compensation coefficient, the magnitude of which is proportional to the focal length parameter of the zoom camera device.

Further, adjusting the target audio adjustment parameters according to the focal length parameter of the zoom camera device includes:

when the focal length parameter is greater than a preset threshold, setting the compensation coefficient to 1; and

when the focal length parameter is less than or equal to the preset threshold, setting the compensation coefficient to a value less than 1.
A target terminal, the target terminal including a body and an accessory module, where the accessory module is rotatably connected to the body and includes a zoom camera device and a microphone array.

The zoom camera and the microphone array are located on two adjacent faces of the accessory module, and the light-receiving direction of the zoom camera is the same as the sound-pickup direction of the microphone array.

Further, the microphone array is a linear array including a plurality of microphone devices, and the line connecting the microphone devices is perpendicular to the photosensitive surface of the zoom camera.
An audio signal processing apparatus, the apparatus including:

an acquisition unit, configured to acquire the sub-audio signals collected by each microphone device;

a determination unit, configured to acquire target audio adjustment parameters and obtain a target audio adjustment value according to the target audio adjustment parameters; and

a combination unit, configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters.
A computer device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps described above.

A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the steps described above.
Beneficial Effects

In the embodiments of the present invention, the sub-audio signals collected by each microphone device are first acquired separately; the target audio adjustment parameters are then determined; and finally the target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters. The present invention can obtain suitable target adjustment parameters for different sound-source application scenarios, improving the sound quality of the audio signal of the target device and meeting the different usage requirements of users.
附图说明Description of drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.
其中:in:
Fig. 1 is a flowchart of an audio signal processing method in one embodiment;

Fig. 2 shows the receiving beam angle required by the microphone array for a sound source in one embodiment;

Fig. 3 shows the receiving beam angle required by the microphone array for a sound source in another embodiment;

Fig. 4 shows the receiving beam angle required by the microphone array for a sound source in yet another embodiment;

Fig. 5 is a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment;

Fig. 6 is a schematic front view of the structure of a target terminal in one embodiment;

Fig. 7 is a schematic rear view of the structure of a target terminal in one embodiment;

Fig. 8 is a flowchart of an audio signal processing method in yet another embodiment;

Fig. 9 is a structural block diagram of an audio signal processing apparatus in one embodiment;

Fig. 10 is an internal structure diagram of a computer device in one embodiment.
本发明的实施方式Embodiments of the present invention
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
The present invention provides an audio signal processing method. In one embodiment, the method is based on a target device, where the target device includes a microphone array and the microphone array includes a plurality of microphone devices arranged at different positions. In an optional embodiment, the target device may be a mobile phone, a tablet computer or the like, or it may be a shooting accessory used in connection with another device such as a mobile phone.

Referring to Fig. 1, an embodiment of the present invention provides an audio signal processing method.

Fig. 1 is a flowchart of the audio signal processing method in one embodiment. The audio signal processing method of the present invention may include steps S1022 to S1026 shown in Fig. 1, described in detail as follows.

In step S1022, the sub-audio signals collected by each microphone device are acquired separately.

Before the audio signal processing method is described in detail, the microphone array used to collect the audio signals is first introduced.
A microphone array is an array formed by a group of omnidirectional microphones located at different positions in space and arranged according to a regular geometry. It is a device for spatially sampling sound signals propagating through space, and the signals it collects contain the spatial position information of those sound signals. According to the distance between the sound source and the microphone array, microphone array models can be divided into a near-field model and a far-field model. According to the topology of the microphone array, arrays can be divided into linear arrays, planar arrays, volumetric arrays and so on.

The near-field model treats the sound wave as a spherical wave and takes into account the amplitude differences between the signals received by the array elements. The far-field model treats the sound wave as a plane wave, ignores the amplitude differences between the signals received by the elements, and approximates the relationship between the received signals as a simple time delay. The far-field model is clearly a simplification of the actual model and greatly reduces the processing difficulty; general speech enhancement methods are based on the far-field model.

It is therefore easy to understand that, in order to obtain different sound pickup effects, the design (topology) of the microphone arrays included in devices of different types and purposes differs considerably; that is, the number of microphones in the array and the distance between the individual microphone devices also differ.
A common microphone array structure is the one-dimensional, or linear, microphone array, whose element centers lie on a single straight line. Depending on whether the spacing between adjacent elements is uniform, it can be further divided into the uniform linear array (ULA) and the nested linear array. A linear array can only obtain the horizontal direction angle of a signal.

Alternatively, a two-dimensional, or planar, microphone array has its element centers distributed on a plane. According to the geometry of the array, it can be divided into the equilateral triangle array, T-shaped array, uniform circular array, uniform square array, coaxial circular array, circular or rectangular planar array, and so on. A planar array can obtain both the horizontal azimuth and the vertical azimuth of a signal.

Regarding the spacing between the elements of a microphone array: in a linear four-microphone configuration, for example, the four microphone devices are equally spaced, with a spacing of 20 to 60 mm between adjacent devices; in a circular six-microphone array, the six microphones are evenly distributed clockwise around the circumference, with a radius generally in the range of 20 to 60 mm.
In this implementation scenario, the microphone array is a linear array. Since each microphone device is at a different distance from the target sound source, the spatial and timing information of the received sound waves differs between devices. The sub-audio signals collected by each microphone device are first acquired, and the sub-audio signals of the individual microphones in the linear array are then combined to obtain audio data corresponding to the target sound source.

In step S1024, the target audio adjustment parameters are acquired, and a target audio adjustment value is obtained according to the target audio adjustment parameters.

In this implementation scenario, the target audio adjustment parameters include a spatial phase difference. Considering the differences in the positions of the microphone devices in the array, after the above sub-audio signals are acquired, the spatial phase difference corresponding to each microphone device must also be calculated so that phase compensation can be applied to the sub-audio signal collected by each device.
Therefore, after the sub-audio signals collected by each microphone device are acquired, the process may further include steps S1032 and S1034 shown in Fig. 5. Fig. 5 is a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment.

In step S1032, the signal delay time of each microphone device is determined according to the preset spacing information between the microphone devices and the speed of sound.

At one standard atmosphere and 15 °C, the speed of sound is approximately 340 m/s, but in different real-time acquisition environments (affected by factors such as wind speed, air pressure and temperature) the effective speed of sound varies between acquisition devices. It is therefore necessary to obtain the current speed of sound in real time, so that the signal delay time of each microphone device can be calculated from the current speed of sound and the spacing between the microphone devices.

Specifically, the signal delay time can be obtained as the ratio of the spacing to the current speed of sound.
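The delay computation just described can be sketched as follows. This is a minimal illustration; the microphone spacings and the speed of sound used here are example values consistent with the ranges given above, not values taken from the embodiments:

```python
# Per-microphone signal delay for a linear array: delay = distance to the
# reference element divided by the current speed of sound, as described
# above. Spacing values are illustrative (the text gives a 20-60 mm range).

def signal_delays(distances_m, speed_of_sound_mps=340.0):
    """Return the signal delay time (seconds) of each microphone device,
    given each device's distance (metres) from the reference element."""
    return [d / speed_of_sound_mps for d in distances_m]

# Four equally spaced microphones, 40 mm apart, at the standard ~340 m/s.
delays = signal_delays([0.0, 0.04, 0.08, 0.12])
```

In a real device the speed-of-sound argument would be updated from the live environment rather than left at the 340 m/s default.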
In addition, the preset spacing information between the microphone devices may be stored in the device memory and simply read when needed.

In step S1034, the phase compensation value and the spatial phase difference corresponding to each microphone device are determined according to the signal delay time of each microphone device.

First, the sound waves generated by the target sound source arrive at the microphone devices located at different positions in the array at different times; this is the signal delay time referred to here. The different arrival times correspond to phase differences between the sound-wave signals collected by the individual microphone devices (for example, at the same instant the crests and troughs of the sound wave reach different positions and are collected by different microphone devices). The corresponding phase compensation value can therefore be determined according to the signal delay time of each microphone device.

The delay difference between at least two microphones in the array can be described in the frequency domain by a phase difference function, commonly referred to as the differential phase, which takes a value between -180 and +180 degrees. The spatial phase difference can be calculated from the spacing between two adjacent microphone devices in the array and the speed of sound.
具体地说,目标音频调节值为每个麦克风装置的相位补偿值和空间相位差的乘积。例如,麦克风1的目标音频调节值为相位补偿1*空间相位差φ,麦克风2的目标音频调节值为相位补偿2*空间相位差φ,等等,以此类推。Specifically, the target audio adjustment value is the product of the phase compensation value of each microphone device and the spatial phase difference. For example, the target audio adjustment value of microphone 1 is phase compensation 1*spatial phase difference φ, the target audio adjustment value of microphone 2 is phase compensation 2*spatial phase difference φ, and so on.
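As a concrete sketch of the product described above: the narrowband relation φ = 2πfd/c used below for the spatial phase difference is a standard textbook assumption supplied for illustration, since the embodiment gives no explicit formula, and the function and variable names are hypothetical:

```python
import math

# Target audio adjustment value of each microphone device, computed as
# (phase compensation value) * (spatial phase difference), per the text
# above. The spatial phase difference is derived here from the adjacent
# microphone spacing d and the speed of sound c using the narrowband
# relation phi = 2*pi*f*d/c (an illustrative assumption).

def spatial_phase_difference(freq_hz, spacing_m, speed_of_sound_mps=340.0):
    return 2.0 * math.pi * freq_hz * spacing_m / speed_of_sound_mps

def target_adjustment_values(phase_compensations, phi):
    # adjustment_i = phase_compensation_i * phi
    return [pc * phi for pc in phase_compensations]

phi = spatial_phase_difference(1000.0, 0.04)          # 1 kHz, 40 mm spacing
values = target_adjustment_values([0.0, 1.0, 2.0, 3.0], phi)
```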
In step S1026, the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signals and the target adjustment value.

The principle of beamforming is first introduced. Beamforming refers to applying time-delay or phase compensation and amplitude weighting to the output of each element of the microphone array so as to form a beam pointing in a specific direction. In contrast to an omnidirectional microphone, this directional beam represents the orientation of signal acquisition, so that signal data from a specific direction can be collected in a more targeted manner.

The preset beamforming algorithm may use fixed weights, or it may perform adaptive beamforming according to the signal characteristics. In the latter case, a preset criterion function is first determined, for example the maximum signal-to-noise ratio (SNR) criterion, the minimum mean square error (MSE) criterion, the linearly constrained minimum variance (LCMV) criterion or the maximum likelihood (ML) criterion, and the criterion function is then solved to obtain the signal combination for the target beam. Figs. 2 to 4 are schematic diagrams of the sound pickup range of the microphone array for different receiving beam angles.

Specifically, the target audio adjustment parameters of each microphone device determined in the previous step and the individual sub-audio signals of each microphone can first be combined, according to the beamforming algorithm, into a directional target combined audio signal with a minimum beam angle. From the above description it can be seen that, in this implementation scenario, the minimum beam angle is related to the number of microphone devices and the spacing between adjacent microphone devices.
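One simple, hedged instance of such a combination is a delay-and-sum beamformer, which aligns each sub-audio signal by its delay compensation and sums the results. This is illustrative only; the embodiments do not fix a particular beamforming algorithm and may use adaptive weights instead:

```python
# Minimal delay-and-sum beamforming sketch: each microphone's sub-audio
# signal is shifted by its (integer-sample) delay compensation, and the
# aligned samples are averaged into one combined signal.

def delay_and_sum(sub_signals, sample_delays):
    """Combine per-microphone sample lists given per-microphone delays."""
    usable = min(len(s) - d for s, d in zip(sub_signals, sample_delays))
    combined = []
    for i in range(usable):
        total = sum(s[i + d] for s, d in zip(sub_signals, sample_delays))
        combined.append(total / len(sub_signals))
    return combined

# Two microphones hearing the same ramp, the second one sample later.
mic1 = [0.0, 1.0, 2.0, 3.0]
mic2 = [9.0, 0.0, 1.0, 2.0]   # same signal, arriving one sample later
out = delay_and_sum([mic1, mic2], [0, 1])
# After alignment both channels agree, so the average reproduces the ramp.
```

Signals arriving from other directions do not line up after the shift and are attenuated by the averaging, which is what gives the array its directional pickup.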
Referring to Figs. 6 and 7, Fig. 6 is a schematic front view and Fig. 7 a schematic rear view of the structure of the target terminal in one embodiment. The target terminal 10 includes a body 11 and an accessory module 12. The accessory module 12 is rotatably connected to the body 11, for example by a rotating shaft. In this implementation scenario, the rotating shaft connects the accessory module 12 to the center of the body 11; in other implementation scenarios, the rotating shaft may connect the accessory module 12 to an edge of the body 11. The accessory module 12 includes a zoom camera device 121 and a microphone array 122, located on two adjacent faces of the accessory module 12; for example, the microphone array 122 is on the face close to the user, and the zoom camera device is on the face of the accessory module 12 with the smallest area. The shooting direction of the zoom camera device 121 is the same as the sound-pickup direction of the microphone array 122.

For example, the microphone array 122 is a linear array including a plurality of microphone devices, and the arrangement direction of these microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121, so that the zoom camera device 121 and the microphone array 122 point in the same direction, better ensuring that the subject picked up acoustically is the same as the subject being photographed.

As shown in Figs. 6 and 7, the accessory module 12 is a cuboid. The microphone array 122 is located on the rectangular face formed by the long side and the wide side of the cuboid, and the microphone devices are arranged parallel to the long side. The zoom camera device 121 is located on the rectangular face formed by the wide side and the high side of the cuboid, with its photosensitive surface parallel to that face. The arrangement direction of the microphone devices is therefore perpendicular to the photosensitive surface of the zoom camera device 121. Since the arrangement direction of the microphone devices is the sound-pickup direction of the microphone array 122, the sound-pickup direction of the microphone array 122 is the same as the light-receiving direction of the zoom camera.
Referring to Fig. 8, which is a flowchart of an audio signal processing method in one embodiment, the audio signal processing method of the present invention may include steps S2022 to S2026 shown in Fig. 8, described in detail as follows.

In step S2022, the sub-audio signals collected by each microphone device are acquired separately.

This step is essentially the same as step S1022 of the audio signal processing method in the embodiment shown in Fig. 1 and is not repeated here.

In step S2024, the target audio adjustment parameters are acquired according to the focal length parameter of the zoom camera device, and a target audio adjustment value is obtained according to the target audio adjustment parameters.
The focal length parameter is obtained here because, when recording with a zoom camera, it reflects the acquisition range of the image data of the target object. As the focal length of the camera is adjusted, the captured image range changes accordingly. For example, by common photographic knowledge, a lens with a focal length below 24 mm is called an ultra-wide-angle lens; such a lens has a large angle of view and captures a large image range. A lens with a focal length of 100 mm or more is generally a macro lens that captures a small image range and is typically used for macro photography and very close close-ups.

It can therefore be inferred from the focal length parameter that the smaller the focal length in use, the larger the range to be photographed and hence the larger the range of the sound source; conversely, the larger the focal length, the smaller the range to be photographed and hence the smaller the range of the sound source. The target audio adjustment parameters can accordingly be adjusted according to the focal length parameter, so that the audio signal received by the target device is of higher quality.
In a specific implementation scenario, the target audio adjustment parameters further include a compensation coefficient, the magnitude of which is proportional to the focal length parameter of the zoom camera device. Specifically, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes the value 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value less than 1.

For example, for telephoto and super-telephoto shooting (e.g. a focal length parameter of 100 mm), the compensation coefficient may be set to 1, that is, the spatial phase difference of the microphone devices is applied without further attenuation; this corresponds to directional, fixed-beam-angle far-field pickup, so that only the sound of the subject within the frame is collected, avoiding interference from the surrounding environment. When shooting wide-angle scenes such as multi-person conversations or interaction between the subject and the environment (e.g. a focal length parameter of 24 mm), a smaller compensation coefficient (e.g. 0.5) may be used, so that sound is collected over a wider range and necessary sound information is not lost.
When the compensation coefficient k takes the value 0 there is no phase compensation, and the system degenerates to omnidirectional pickup, the "ultra-wide-angle" limit. When k takes a value within [0, 1], the beam angle varies within [θ, 2π].

In this implementation scenario, the target audio adjustment value is equal to the product of the compensation coefficient, the phase compensation value corresponding to each microphone position and the spatial phase difference. For example, the target audio adjustment value of microphone 1 is phase compensation 1 * compensation coefficient k * spatial phase difference φ, the target audio adjustment value of microphone 2 is phase compensation 2 * compensation coefficient k * spatial phase difference φ, and so on.
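The compensation-coefficient rule of this embodiment can be sketched as follows. Only the k = 1 branch above the threshold and the proportionality below it are stated in the text; the 100 mm threshold and the specific sub-threshold mapping (factor 0.5) are illustrative assumptions:

```python
# Compensation coefficient k as a function of the focal length parameter:
# k = 1 above a preset threshold; at or below the threshold, k < 1 and
# proportional to focal length. The 100 mm threshold and the 0.5 factor
# in the sub-threshold mapping are illustrative choices.

def compensation_coefficient(focal_length_mm, threshold_mm=100.0):
    if focal_length_mm > threshold_mm:
        return 1.0
    return 0.5 * focal_length_mm / threshold_mm   # proportional and < 1

def target_adjustment_value(phase_compensation, k, phi):
    # adjustment = phase compensation * compensation coefficient * phi
    return phase_compensation * k * phi

k_tele = compensation_coefficient(120.0)   # telephoto -> full compensation
k_wide = compensation_coefficient(24.0)    # wide angle -> reduced compensation
```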
In an optional embodiment, in order to further improve the user's audio experience, and considering that hardware limitations of the acquisition device may degrade the recording experience (much as an image captured at close focus may be blurred or out of focus), the sub-audio signals may also be denoised according to a preset preprocessing algorithm before the target combined audio signal is assembled.

Likewise, considering that in practical applications users may have their own preferences for the sound during recording, for example deliberately including ambient sound, or capturing ambient sound over a range that does not exactly match the range shown in the picture, as with certain special shooting techniques, in an optional embodiment, after the target combined audio signal is determined according to the preset beamforming algorithm, the spatial phase difference of each microphone device, the sub-audio signals and the target adjustment parameters, the method further includes:

acquiring adjustment parameters input through a preset interface or input device, and determining the target adjustment parameters according to the input adjustment parameters.

For example, the input adjustment parameter here may be a preset recording mode selected by the user, such as a concert mode, an indoor mode or a sports mode, and the target adjustment parameters are then determined according to the selected preset recording mode. For instance, when the concert mode is the input adjustment parameter, the target adjustment parameter used for audio zooming may be reduced appropriately, e.g. from the value 0.6 determined from the focal length parameter to 0.4.
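A sketch of this mode-based override, assuming a hypothetical lookup table: only the concert-mode example (0.6 reduced to 0.4) appears in the text, and the other mode entries and the scaling approach are invented for illustration:

```python
# Override the focal-length-derived target adjustment parameter with a
# user-selected recording mode. The scale table is hypothetical except
# that "concert" reproduces the 0.6 -> 0.4 example given above.

MODE_SCALE = {
    "concert": 0.4 / 0.6,   # reduce the audio-zoom parameter
    "indoor": 1.0,
    "sports": 1.0,
}

def apply_recording_mode(target_parameter, mode):
    """Scale the target adjustment parameter for the selected mode."""
    return target_parameter * MODE_SCALE.get(mode, 1.0)

adjusted = apply_recording_mode(0.6, "concert")
```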
In step S2026, the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signals and the target adjustment value.
This step is substantially the same as step S1026 of the audio signal processing method in the embodiment shown in FIG. 1 and is not repeated here.
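As context for the beamforming step, a minimal delay-and-sum sketch is given below. It shows one common form such a preset beamforming algorithm can take, not necessarily the exact algorithm of the embodiment; the way the target adjustment value blends the steered beam with a plain average is an assumption for illustration:

```python
import numpy as np

def delay_and_sum(sub_signals, delays_s, sample_rate, zoom=1.0):
    """Align each microphone's sub-audio signal by its delay and average.

    sub_signals : list of equal-length 1-D arrays, one per microphone
    delays_s    : per-microphone steering delays in seconds
    sample_rate : sampling rate in Hz
    zoom        : target adjustment value in [0, 1]; 1.0 = fully steered
                  beam, 0.0 = plain average of the raw signals (assumed
                  blending scheme, for illustration only)
    """
    n = len(sub_signals[0])
    steered = np.zeros(n)
    for x, d in zip(sub_signals, delays_s):
        shift = int(round(d * sample_rate))            # delay in samples
        steered += np.roll(np.asarray(x, float), -shift)
    steered /= len(sub_signals)
    plain = np.mean(np.asarray(sub_signals, float), axis=0)
    # Blend between the un-steered average and the steered beam
    return (1.0 - zoom) * plain + zoom * steered
```

Signals aligned toward the target direction add coherently, while off-axis sound is attenuated by the averaging, which is the basic mechanism behind the audio-zoom effect.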
FIG. 9 shows a structural block diagram of an audio signal processing apparatus in an embodiment.
Referring to FIG. 9, an audio signal processing apparatus 1060 according to an embodiment of the present invention includes an obtaining unit 1062, a determining unit 1064 and a combining unit 1066.
The obtaining unit 1062 is configured to respectively obtain the sub-audio signals collected by each microphone device.
The determining unit 1064 is configured to obtain a focal length parameter through the zoom camera device and determine a target audio adjustment parameter according to the focal length parameter.
The combining unit 1066 is configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameter.
Further, the target device further includes a zoom camera device, and the determining unit 1064 is further configured to:
adjust the target audio adjustment parameter according to the focal length parameter of the zoom camera device.
The target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.
The determining unit 1064 is further configured to:
determine the signal delay time of each microphone device according to the spacing between the microphone devices and speed-of-sound information; and
determine, according to the signal delay time of each microphone device, the phase compensation value and the spatial phase difference corresponding to that microphone device.
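The delay and phase quantities above can be sketched for a linear array under a far-field assumption. The steering-angle parameterization and the nominal speed of sound are illustrative assumptions, not values fixed by the embodiment:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C (assumed nominal value)

def steering_delays(spacings_m, angle_deg=0.0, c=SPEED_OF_SOUND):
    """Far-field signal delay of each microphone relative to the first.

    spacings_m : cumulative distance of each microphone from the first (m)
    angle_deg  : assumed direction of the target source relative to
                 broadside of the linear array
    """
    proj = math.sin(math.radians(angle_deg))
    return [d * proj / c for d in spacings_m]

def phase_compensation(delay_s, freq_hz):
    """Phase compensation value (radians) for one microphone at one frequency:
    a delay of tau seconds corresponds to a phase of 2*pi*f*tau."""
    return 2.0 * math.pi * freq_hz * delay_s
```

For two microphones 2 cm apart with a source at endfire (90 degrees), the second microphone's delay is about 0.02 / 343 ≈ 58 microseconds, from which its per-frequency phase compensation follows directly.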
Further, the target parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
Further, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes the value 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value less than 1.
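One concrete realization of this rule can be sketched as a capped linear ramp. The text only requires proportionality below the threshold and a value of 1 above it; the specific slope used here is an assumption:

```python
def compensation_coefficient(focal_length, threshold, slope=0.9):
    """Compensation coefficient for audio zoom.

    Above the threshold: exactly 1. At or below it: proportional to the
    focal length and strictly below 1 (the factor ``slope``/threshold is
    an assumed concrete choice, not specified by the embodiment).
    """
    if focal_length > threshold:
        return 1.0
    return slope * focal_length / threshold
```

This keeps the beam fully compensated at long focal lengths while backing off the compensation, in proportion to the focal length, at shorter ones.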
The target terminal includes a body and an accessory module; the accessory module is rotatably connected to the body and includes a zoom camera device and the microphone array. The microphone array and the zoom camera device are located on the same face of the accessory module and point in the same direction.
The microphone array is a linear array.
FIG. 10 shows a diagram of the internal structure of a computer device in an embodiment. The computer device may specifically be a terminal or a server. As shown in FIG. 10, the computer device includes a processor, a memory, an output module, an acquisition module and a processing module connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the audio signal processing method. A computer program may also be stored in the internal memory and, when executed by the processor, causes the processor to perform the audio signal processing method. Those skilled in the art will understand that the structure shown in FIG. 10 is merely a block diagram of a partial structure related to the solution of the present invention and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps shown in FIG. 1, FIG. 5 and FIG. 8.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent of the present invention. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.

Claims (10)

  1. An audio signal processing method, wherein the method is based on a target device, the target device comprises a microphone array, and the microphone array comprises a plurality of microphone devices arranged at different positions;
    the method comprising:
    respectively acquiring sub-audio signals collected by each microphone device;
    acquiring a target audio adjustment parameter, and acquiring a target audio adjustment value according to the target audio adjustment parameter; and
    determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment value.
  2. The audio signal processing method according to claim 1, wherein the target device further comprises a zoom camera device;
    the audio signal processing method further comprising:
    acquiring the target audio adjustment parameter according to a focal length parameter of the zoom camera device.
  3. The audio signal processing method according to claim 2, wherein the target audio adjustment parameter comprises a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array;
    the acquiring a target audio adjustment parameter comprising:
    determining a signal delay time of each microphone device according to the spacing between the microphone devices and speed-of-sound information; and
    determining, according to the signal delay time of each microphone device, the phase compensation value and the spatial phase difference corresponding to that microphone device.
  4. The audio signal processing method according to claim 2, wherein the target parameter further comprises a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  5. The audio signal processing method according to claim 4, wherein
    the acquiring the target audio adjustment parameter according to the focal length parameter of the zoom camera device comprises:
    when the focal length parameter is greater than a preset threshold, the compensation coefficient taking the value 1; and
    when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient taking a value less than 1.
  6. A target terminal, wherein the target terminal comprises a body and an accessory module, the accessory module is rotatably connected to the body, and the accessory module comprises a zoom camera device and a microphone array;
    the zoom camera and the microphone array are located on two adjacent faces of the accessory module, and the light-sensing direction of the zoom camera is the same as the sound-pickup direction of the microphone array.
  7. The target terminal according to claim 6, wherein the microphone array is a linear array comprising a plurality of microphone devices, and the line connecting the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera.
  8. An audio signal processing apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to respectively acquire sub-audio signals collected by each microphone device;
    a determining unit, configured to acquire a target audio adjustment parameter and acquire a target audio adjustment value according to the target audio adjustment parameter; and
    a combining unit, configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameter.
  9. A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/104772 2020-07-10 2020-07-27 Audio signal processing method and apparatus, device and readable medium WO2022007030A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010663763.X 2020-07-10
CN202010663763.XA CN111916094B (en) 2020-07-10 2020-07-10 Audio signal processing method, device, equipment and readable medium

Publications (1)

Publication Number Publication Date
WO2022007030A1 true WO2022007030A1 (en) 2022-01-13

Family

ID=73226324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104772 WO2022007030A1 (en) 2020-07-10 2020-07-27 Audio signal processing method and apparatus, device and readable medium

Country Status (2)

Country Link
CN (1) CN111916094B (en)
WO (1) WO2022007030A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631758A (en) * 2022-12-21 2023-01-20 无锡沐创集成电路设计有限公司 Audio signal processing method, apparatus, device and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111641794B (en) * 2020-05-25 2023-03-28 维沃移动通信有限公司 Sound signal acquisition method and electronic equipment
CN112929606A (en) * 2021-01-29 2021-06-08 世邦通信股份有限公司 Audio and video acquisition method and device and storage medium
CN113225646B (en) * 2021-04-28 2022-09-20 世邦通信股份有限公司 Audio and video monitoring method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104053088A (en) * 2013-03-11 2014-09-17 联想(北京)有限公司 Microphone array adjustment method, microphone array and electronic device
CN104699445A (en) * 2013-12-06 2015-06-10 华为技术有限公司 Audio information processing method and device
CN107181845A (en) * 2016-03-10 2017-09-19 中兴通讯股份有限公司 A kind of microphone determines method and terminal
CN108766457A (en) * 2018-05-30 2018-11-06 北京小米移动软件有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
WO2020037983A1 (en) * 2018-08-20 2020-02-27 华为技术有限公司 Audio processing method and apparatus
CN210518437U (en) * 2019-11-27 2020-05-12 维沃移动通信有限公司 Electronic equipment


Also Published As

Publication number Publication date
CN111916094A (en) 2020-11-10
CN111916094B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
WO2022007030A1 (en) Audio signal processing method and apparatus, device and readable medium
JP6023779B2 (en) Audio information processing method and apparatus
CN103026734B (en) Electronic apparatus for generating beamformed audio signals with steerable nulls
Farina et al. A spherical microphone array for synthesizing virtual directive microphones in live broadcasting and in post production
EP2875624B1 (en) Portable electronic device with directional microphones for stereo recording
JP2013543987A (en) System, method, apparatus and computer readable medium for far-field multi-source tracking and separation
US9967660B2 (en) Signal processing apparatus and method
CN112492445B (en) Method and processor for realizing signal equalization by using ear-covering type earphone
CN112686824A (en) Image correction method, image correction device, electronic equipment and computer readable medium
CN104244137A (en) Method and system for improving long-shot recording effect during videoing
US20170188138A1 (en) Microphone beamforming using distance and enrinonmental information
WO2022000174A1 (en) Audio processing method, audio processing apparatus, and electronic device
WO2016197444A1 (en) Method and terminal for achieving shooting
CN115547354A (en) Beam forming method, device and equipment
WO2021237565A1 (en) Audio processing method, electronic device and computer-readable storage medium
CN114554154A (en) Audio and video pickup position selection method and system, audio and video acquisition terminal and storage medium
CN110268705A (en) Image pick up equipment and image picking system
JP2013135373A (en) Zoom microphone device
CN115884038A (en) Audio acquisition method, electronic device and storage medium
WO2023088156A1 (en) Sound velocity correction method and apparatus
CN113824916A (en) Image display method, device, equipment and storage medium
US20220030353A1 (en) Flexible differential microphone arrays with fractional order
CN111629126A (en) Audio and video acquisition device and method
CN205028652U (en) System for reduce motor vibration noise in video system of shooting with video -corder
US20230105785A1 (en) Video content providing method and video content providing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20944173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20944173

Country of ref document: EP

Kind code of ref document: A1