WO2022007030A1 - Audio signal processing method and apparatus, device and readable medium - Google Patents

Audio signal processing method and apparatus, device and readable medium Download PDF

Info

Publication number
WO2022007030A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
microphone
audio signal
audio
signal processing
Prior art date
Application number
PCT/CN2020/104772
Other languages
French (fr)
Chinese (zh)
Inventor
张金宇
Original Assignee
瑞声声学科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 瑞声声学科技(深圳)有限公司 filed Critical 瑞声声学科技(深圳)有限公司
Publication of WO2022007030A1 publication Critical patent/WO2022007030A1/en

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 Changing voice quality, e.g. pitch or formants
    • G10L21/007 Changing voice quality, e.g. pitch or formants characterised by the process used
    • G10L21/01 Correction of time axis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/67 Focus control based on electronic image sensor signals
    • H04N23/675 Focus control based on electronic image sensor signals comprising setting of focusing regions
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • the present invention relates to the field of computer data processing, and in particular, to an audio signal processing method, apparatus, device and readable medium.
  • the video recording function simultaneously acquires the image information and the audio information corresponding to the target object, which is mainly realized by the camera and the microphone device provided in the equipment.
  • in the prior art, the microphone devices in the equipment are generally omnidirectional, that is, they cannot be zoomed. When recording video, the zoom camera is aimed at the target object by zooming, but the signal collection range of the microphone remains relatively large, which makes the display ranges of the audio and the image inconsistent and degrades the user's video recording experience.
  • An audio signal processing method is based on a target device, the target device includes a microphone array, and the microphone array includes a plurality of microphone devices arranged in different positions;
  • the method includes:
  • the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signal, and the target adjustment parameter.
  • the target device also includes a zoom camera device
  • the audio signal processing method further includes:
  • the target audio adjustment parameter is adjusted according to the focal length parameter of the zoom camera device.
  • the target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array
  • the obtaining target audio adjustment parameters includes:
  • the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the target audio adjustment parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  • the adjusting of the target audio adjustment parameter according to the focal length parameter of the zoom camera device includes:
  • when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1;
  • when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • a target terminal, where the target terminal includes a body and an accessory module, the accessory module is rotatably connected to the body, and the accessory module includes a zoom camera device and a microphone array;
  • the zoom camera and the microphone array are located on two adjacent sides of the accessory module, and the photosensitive direction of the zoom camera is the same as the sound collection direction of the microphone array.
  • the microphone array is a linear array including a plurality of microphone devices, and the line along which the plurality of microphone devices are arranged is perpendicular to the photosensitive surface of the zoom camera.
  • An audio signal processing apparatus, the apparatus comprising:
  • an acquisition unit, used to acquire the sub-audio signals collected by each microphone device;
  • a determining unit, used to obtain target audio adjustment parameters and obtain target audio adjustment values according to the target audio adjustment parameters;
  • a combining unit, used to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment parameters.
  • a computer device comprising a memory and a processor
  • the memory stores a computer program
  • the computer program, when executed by the processor, causes the processor to perform the steps described above
  • a computer-readable storage medium storing a computer program, when executed by a processor, causes the processor to perform the steps described above.
  • the sub-audio signals collected by each microphone device are obtained respectively; then the target audio adjustment parameters are determined, and finally the target combined audio signal is determined according to the preset beamforming algorithm, the foregoing sub-audio signals, and the target adjustment parameters.
  • the invention can obtain suitable target adjustment parameters for different audio source application scenarios, improve the sound quality of the audio signal of the target device, and meet the different usage requirements of users.
  • Fig. 1 shows the flow chart of the audio signal processing method in one embodiment
  • Fig. 2 shows the receiving beam angle required by the microphone array corresponding to the sound source in one embodiment
  • Fig. 3 shows the receiving beam angle required by the microphone array corresponding to the sound source in another embodiment
  • Fig. 4 shows the receiving beam angle required by the microphone array corresponding to the sound source in yet another embodiment
  • Fig. 5 shows the flow chart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment
  • Fig. 6 shows the front structure schematic diagram of the target terminal in one embodiment
  • FIG. 7 shows a schematic diagram of a rear view structure of a target terminal in one embodiment
  • Fig. 8 shows the flow chart of the audio signal processing method in still another embodiment
  • Fig. 9 shows the structural block diagram of the audio signal processing apparatus in one embodiment
  • Figure 10 shows an internal structure diagram of a computer device in one embodiment.
  • the present invention provides an audio signal processing method.
  • the present invention may be based on a target device, wherein the target device includes a microphone array, and the microphone array includes a plurality of microphone devices arranged in different positions.
  • the target device may be, for example, a mobile phone, a tablet computer, etc., or a photographing auxiliary tool for connecting with other devices such as a mobile phone.
  • an embodiment of the present invention provides an audio signal processing method.
  • FIG. 1 shows a flowchart of an audio signal processing method in one embodiment.
  • the audio signal processing method described in the present invention may include steps S1022-S1026 as shown in FIG. 1 , which are described in detail as follows:
  • step S1022 the sub-audio signals collected by each microphone device are acquired respectively.
  • the microphone array used to collect the audio signal is first introduced.
  • A microphone array is an array formed by a group of omnidirectional microphones located at different positions in space according to a certain shape and rule; it is a device for spatial sampling of spatially propagated sound signals, and the collected signals contain the spatial location information of the sound source. According to the distance between the sound source and the microphone array, microphone array models can be divided into a near-field model and a far-field model. According to the topology of the microphone array, arrays can be divided into linear arrays, planar arrays, volume arrays and so on.
  • the near-field model regards the sound wave as a spherical wave and considers the amplitude differences between the signals received by the microphone array elements; the far-field model regards the sound wave as a plane wave, ignores the amplitude differences between the received signals of the array elements, and approximates the relationship between them as a simple time delay. The far-field model is clearly a simplification of the actual model, which greatly reduces the processing difficulty.
  • the general speech enhancement method is based on the far-field model.
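The far-field simplification can be made concrete with a short sketch (this is illustrative, not code from the patent; the 4-element array, 30 mm spacing, and sound speed are assumed values). For a uniform linear array, a plane wave arriving at angle θ measured from the array axis reaches element n later than element 0 by n·d·cos(θ)/c:

```python
import numpy as np

# Far-field (plane-wave) delay model for a uniform linear array.
# Illustrative assumptions: 4 elements, 30 mm spacing, sound speed 343 m/s.
C = 343.0  # m/s

def far_field_delays(n_mics: int, spacing_m: float, angle_deg: float, c: float = C):
    """Arrival-time delay of each element relative to element 0 for a plane
    wave arriving from direction angle_deg, measured from the array axis."""
    theta = np.deg2rad(angle_deg)
    return np.arange(n_mics) * spacing_m * np.cos(theta) / c

delays = far_field_delays(4, 0.030, 60.0)
# Under the far-field model, the relationship between the elements' received
# signals is exactly this time delay; amplitude differences are ignored.
```

Note how the delays grow linearly along the array, which is what makes the far-field model so much cheaper to process than the spherical-wave near-field model.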
  • the design methods (topologies) of the microphone arrays included in equipment of different types and purposes differ considerably; that is, the number of microphone devices in the microphone array and the distances between them also differ.
  • a common microphone array structure is the one-dimensional microphone array, that is, the linear microphone array, in which the centers of the array elements are located on the same straight line. According to whether the distance between adjacent array elements is the same, it can be divided into the uniform linear array (ULA) and the nested linear array; a linear array can only obtain the horizontal direction angle information of the signal.
  • a two-dimensional microphone array, that is, a planar microphone array, in which the centers of the array elements are distributed on a plane. According to the geometric shape of the array, it can be divided into the equilateral triangle array, T-shaped array, uniform circular array, uniform square array, coaxial circular array, circular or rectangular area array, and so on. A planar array can obtain both the horizontal azimuth and the vertical azimuth information of the signal.
  • the spacing between the elements in the microphone array also varies. For example, in a linear four-microphone configuration, four microphone devices are set at equal distances, with a spacing of 20-60 mm between adjacent devices; in a ring-shaped six-microphone configuration, the six microphones are evenly distributed clockwise on a circumference whose radius is generally 20-60 mm.
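The two layouts just described can be written down concretely. The 30 mm spacing and 40 mm radius below are illustrative picks from the quoted 20-60 mm ranges:

```python
import numpy as np

# Linear four-mic array: equal spacing, here 30 mm (from the 20-60 mm range).
spacing = 0.030  # metres
linear_positions = np.array([[i * spacing, 0.0] for i in range(4)])

# Ring six-mic array: evenly spaced on a circle, radius 40 mm (from 20-60 mm).
radius = 0.040  # metres
angles = -2.0 * np.pi * np.arange(6) / 6  # negative sign: clockwise layout
ring_positions = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)
```

These element coordinates are what a beamforming algorithm ultimately consumes, since the inter-element distances determine the signal delays.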
  • the microphone array is a linear array, and the distance between each microphone device and the target sound source differs, so the spatial and timing information of the sound waves they receive also differ.
  • the sub-audio signals of each microphone in the linear array are combined to obtain audio data corresponding to the target sound source object.
  • step S1024, the target audio adjustment parameter is obtained, and the target audio adjustment value is obtained according to the target audio adjustment parameter.
  • the target audio adjustment parameter includes a spatial phase difference. Considering the differences in the positions of the microphone devices in the microphone array, after obtaining the above sub-audio signals, it is necessary to calculate the spatial phase difference corresponding to each microphone device in order to perform phase compensation on the sub-audio collected by each microphone device.
  • steps S1032-S1034 shown in FIG. 5 may also be included after the process of acquiring the sub-audio signals collected by each microphone device respectively.
  • FIG. 5 shows a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment.
  • step S1032, the signal delay time of each microphone device is determined according to the preset distance information between the microphone devices and the sound speed.
  • the standard sound speed is about 340 m/s, but in different real-time collection environments (affected by factors such as wind speed, air pressure and temperature), the actual sound speed at different collection devices varies. Therefore, it is necessary to obtain the current sound speed information in real time, so as to calculate the signal delay time of each microphone device from the current sound speed and the distance between the microphone devices.
  • the specific signal delay time can be obtained according to the ratio of the distance information and the current sound speed.
  • the preset distance information between the microphone devices may be stored in the device memory, from which it can be obtained directly.
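A minimal sketch of step S1032 under stated assumptions: the signal delay time of each microphone is the ratio of its preset distance to the current sound speed. The distances and the real-time sound-speed reading below are illustrative values, not values from the patent:

```python
import numpy as np

# Step S1032 (sketch): signal delay time of each microphone device as the
# ratio of its preset distance to the current sound speed.
# Distances (to a reference mic) and the measured sound speed are assumptions.
preset_distances_m = np.array([0.00, 0.03, 0.06, 0.09])
current_sound_speed = 346.2  # m/s, obtained in real time for this environment

signal_delay_s = preset_distances_m / current_sound_speed
```

Because the sound speed is read in real time rather than fixed at the nominal 340 m/s, the delays track the current wind, pressure, and temperature conditions.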
  • step S1034 the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the corresponding phase compensation value can be determined according to the signal delay time of each microphone device.
  • the delay difference between at least two microphones in the microphone array can be described in the frequency domain by a phase difference function, commonly referred to as the differential phase, which takes a value between -180 degrees and +180 degrees.
  • the spatial phase difference can be calculated from the distance between two adjacent microphone devices in the microphone array and the speed of sound.
  • the target audio adjustment value is the product of the phase compensation value of each microphone device and the spatial phase difference.
  • the target audio adjustment value of microphone 1 is phase compensation value 1 × spatial phase difference ω;
  • the target audio adjustment value of microphone 2 is phase compensation value 2 × spatial phase difference ω, and so on.
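One possible reading of step S1034 and the product above, sketched with assumed values: the phase compensation value of each microphone is derived from its signal delay at an analysis frequency f (2π·f·τ), the spatial phase difference ω is derived from the adjacent-microphone distance and the sound speed (2π·f·d/c), and the target audio adjustment value is their per-microphone product. The frequency, spacing, and delays are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Sketch of step S1034 and the per-microphone adjustment-value product.
F = 1000.0  # Hz, analysis frequency (assumed)
C = 343.0   # m/s, sound speed
D = 0.03    # m, distance between adjacent microphones (assumed)

# Signal delay of each microphone in a uniform linear array.
delays = np.array([0.0, 1.0, 2.0, 3.0]) * D / C

# Phase compensation value per microphone, from its signal delay.
phase_compensation = 2.0 * np.pi * F * delays

# Spatial phase difference from the adjacent-mic distance and sound speed.
spatial_phase_diff = 2.0 * np.pi * F * D / C

# Target audio adjustment value: product of the two, per microphone.
target_adjustment = phase_compensation * spatial_phase_diff
```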
  • in step S1026, a target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment values.
  • Beamforming refers to the delay or phase compensation and amplitude weighting of the output of each array element in the microphone array to form a beam pointing in a specific direction. Unlike omnidirectional microphones, the beam in this specific direction represents the direction of signal acquisition, so that signal data in a specific direction can be collected in a more targeted manner.
  • the preset beamforming algorithm can be beamforming with fixed weights, or adaptive beamforming according to signal characteristics.
  • the target audio adjustment parameters of each microphone device determined in the previous step and the sub-audio signals of each microphone can be combined, according to the beamforming algorithm, into a directional target combined audio signal with a minimum beam angle. As can be seen from the above description, the minimum beam angle in this implementation scenario is related to the number of microphone devices and the distance between two adjacent microphone devices.
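As an illustration of the combining step, the following is a basic frequency-domain delay-and-sum beamformer, one common fixed-weight instance of a preset beamforming algorithm. The sampling rate, frame length, and steering delays are assumptions for the example, not the patent's implementation:

```python
import numpy as np

# Delay-and-sum sketch: phase-align each sub-audio signal in the frequency
# domain, then average, steering the beam toward the compensated direction.
FS = 16000  # Hz, sampling rate (assumed)
N = 1024    # frame length (assumed)
F_BINS = np.fft.rfftfreq(N, d=1.0 / FS)

def delay_and_sum(sub_signals: np.ndarray, steer_delays_s: np.ndarray):
    """sub_signals: (n_mics, N) frame of sub-audio signals.
    steer_delays_s: per-mic arrival delays to compensate, in seconds."""
    spectra = np.fft.rfft(sub_signals, axis=1)
    # Phase compensation: advance each channel by its steering delay.
    phase = np.exp(2j * np.pi * F_BINS[None, :] * steer_delays_s[:, None])
    aligned = spectra * phase
    return np.fft.irfft(aligned.mean(axis=0), n=N)

# Toy check: four copies of one 500 Hz tone, delayed per mic, recombined.
t = np.arange(N) / FS
delays = np.array([0.0, 1.0, 2.0, 3.0]) * 0.03 / 343.0
subs = np.stack([np.sin(2 * np.pi * 500.0 * (t - d)) for d in delays])
combined = delay_and_sum(subs, delays)
```

After alignment, the channels add coherently for sound from the steered direction, while sound from other directions adds with mismatched phases and is attenuated, which is what narrows the effective pickup beam.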
  • FIG. 6 is a schematic diagram of a front view structure of a target terminal in an embodiment
  • FIG. 7 is a schematic diagram of a rear view structure of the target terminal in an embodiment
  • the target terminal 10 includes a main body 11 and an accessory module 12.
  • the accessory module 12 is rotatably connected to the main body 11, for example, connected by a rotating shaft.
  • the rotating shaft connects the center position of the accessory module 12 and the main body 11.
  • the rotating shaft may also connect the accessory module 12 and the edge position of the main body 11 .
  • the accessory module 12 includes a zoom camera device 121 and a microphone array 122.
  • the zoom camera device 121 and the microphone array 122 are located on two adjacent sides of the accessory module 12.
  • the microphone array 122 is located on the side close to the user, and the zoom camera device is located on the side of the accessory module 12 with the smallest area.
  • the shooting direction of the zoom camera device 121 is the same as the sound collection direction of the microphone array 122 .
  • the microphone array 122 is a linear array including a plurality of microphone devices, and the arrangement direction of the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121, so that the zoom camera device 121 and the microphone array 122 point in the same direction, which better ensures that the subject of the sound pickup is the same as the subject being shot.
  • the accessory module 12 is a rectangular parallelepiped
  • the microphone array 122 is located on the rectangular surface formed by the long side and the wide side of the rectangular parallelepiped, and the arrangement direction of the plurality of microphone devices is parallel to the long side of the rectangular parallelepiped.
  • the zoom camera device 121 is located on the rectangular surface formed by the wide side and the high side of the rectangular parallelepiped, and its photosensitive surface is parallel to that rectangular surface. Therefore, the arrangement direction of the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121.
  • the arrangement direction of the plurality of microphone devices is the sound collection direction of the microphone array 122 , and the sound collection direction of the microphone array 122 is the same as the light receiving direction of the zoom camera.
  • FIG. 8 shows a flowchart of an audio signal processing method in one embodiment.
  • the audio signal processing method described in the present invention may include steps S2022-S2026 as shown in FIG. 8, which are described in detail as follows:
  • step S2022 the sub-audio signals collected by each microphone device are acquired respectively.
  • This step is basically the same as step S1022 of the audio signal processing method in the embodiment shown in FIG. 1 , and will not be repeated here.
  • step S2024, the target audio adjustment parameter is obtained according to the focal length parameter of the zoom camera device, and the target audio adjustment value is obtained according to the target audio adjustment parameter.
  • when the zoom camera device is used for recording, the focal length parameter reflects the acquisition range of the image data of the target object. As the focal length parameter of the camera is adjusted, the captured image range is adjusted accordingly.
  • a lens with a focal length below 24mm is called an "ultra-wide-angle lens". This lens has a large viewing angle and can obtain a large range of images.
  • when the focal length is 100 mm or above, the lens is generally a macro lens that captures a small image range, and is typically used for macro photography and extreme close-ups.
  • therefore, the shooting range, and correspondingly the range of the sound source, can be inferred from the focal length parameter.
  • when the focal length parameter is smaller, the range to be shot is larger, and the range of the sound source is correspondingly larger;
  • when the focal length parameter is larger, the range to be shot is smaller, and the range of the sound source is correspondingly smaller. The target audio adjustment parameter can therefore be adjusted according to the focal length parameter, so that the quality of the audio signal received by the target device is higher.
  • the target audio adjustment parameter further includes a compensation coefficient
  • the size of the compensation coefficient is proportional to the focal length parameter of the zoom camera device. Specifically, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • when shooting at telephoto to super-telephoto (for example, when the focal length parameter is 100mm), the compensation coefficient can be set to 1, that is, no additional scaling of the spatial phase difference is performed for each microphone device. This case is similar to fixed-beam-angle far-field sound pickup, so that only the sound of the subject in the frame is collected, avoiding interference from the surrounding environment.
  • conversely, when the focal length parameter is less than or equal to the preset threshold, a smaller compensation coefficient, such as 0.5, can be used.
  • the target audio adjustment value is equal to the product of the compensation coefficient, the phase compensation value corresponding to each microphone position, and the spatial phase difference.
  • the target audio adjustment value of microphone 1 is phase compensation value 1 × compensation coefficient k × spatial phase difference ω;
  • the target audio adjustment value of microphone 2 is phase compensation value 2 × compensation coefficient k × spatial phase difference ω, and so on.
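The focal-length-dependent coefficient described in this embodiment can be sketched as follows. The 85 mm preset threshold is an assumed value chosen so that the 100 mm telephoto example yields a coefficient of 1; the wide-angle coefficient of 0.5 follows the example in the text:

```python
# Sketch of the compensation coefficient rule from this embodiment:
# coefficient 1 above a preset focal-length threshold (telephoto),
# a smaller value such as 0.5 at or below it (wide angle).
PRESET_THRESHOLD_MM = 85.0  # assumed threshold for illustration

def compensation_coefficient(focal_length_mm: float) -> float:
    if focal_length_mm > PRESET_THRESHOLD_MM:
        return 1.0
    return 0.5  # "a smaller compensation coefficient, such as 0.5"

def target_adjustment_value(phase_comp: float, spatial_phase_diff: float,
                            focal_length_mm: float) -> float:
    # phase compensation value × compensation coefficient k × spatial
    # phase difference, per the product stated above.
    k = compensation_coefficient(focal_length_mm)
    return phase_comp * k * spatial_phase_diff
```

A usage sketch: at 100 mm (telephoto) the full phase adjustment is applied, while at 24 mm (ultra-wide) the adjustment is halved, widening the effective pickup range to match the wider shot.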
  • in order to further improve the user's audio experience, and considering the hardware limitations of the capture device (just as an image acquired at close focus may be blurred or out of focus, degrading the user's recording experience), the sub-audio signals may also be denoised according to a preset preprocessing algorithm before being combined into the target combined audio signal.
  • in some embodiments, the target adjustment parameters may also be determined from user input: the adjustment parameters input through a preset interface or device are acquired, and the target adjustment parameters are determined according to those adjustment parameters.
  • the adjustment parameter here can be a preset recording mode selected by the user, such as "concert mode", "indoor mode", "sports mode", etc., and the target adjustment parameters are then determined according to the selected preset recording mode.
  • the target adjustment parameter for audio zoom can be reduced appropriately, for example adjusted from the value of 0.6 determined according to the focal length parameter down to 0.4.
  • a target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signal, and the target adjustment value.
  • This step is basically the same as step S1026 of the audio signal processing method in the embodiment shown in FIG. 1 , and will not be repeated here.
  • FIG. 9 shows a structural block diagram of an audio signal processing apparatus in an embodiment.
  • an audio signal processing apparatus 1060 includes: an obtaining unit 1062 , a determining unit 1064 , and a combining unit 1066 .
  • the obtaining unit 1062 is used to obtain the sub-audio signals collected by each microphone device respectively;
  • Determining unit 1064 configured to obtain a focal length parameter through the zoom camera device, and determine a target audio adjustment parameter according to the focal length parameter;
  • Combining unit 1066 configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signal, and the target adjustment parameter.
  • the target device further includes a zoom camera device
  • the determining unit 1064 is further configured to:
  • the target audio adjustment parameter is adjusted according to the focal length parameter of the zoom camera device.
  • the target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.
  • the determination unit 1064 is also used to:
  • the phase compensation value and the spatial phase difference corresponding to each of the microphone devices are respectively determined according to the signal delay time of each of the microphone devices.
  • the target audio adjustment parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  • when the focal length parameter is greater than a preset threshold, the compensation coefficient takes a value of 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value of less than 1.
  • the target terminal includes a body and an accessory module, the accessory module is rotatably connected to the body, the accessory module includes a zoom camera device and the microphone array, and the microphone array and the zoom camera device are located on the same side of the accessory module and point in the same direction.
  • the microphone array is a linear array.
  • Figure 10 shows a diagram of the internal structure of a computer device in one embodiment.
  • the computer device may be a terminal or a server.
  • the computer device includes a processor, a memory, an output module, an acquisition module, and a processing module connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and also stores a computer program, which, when executed by the processor, enables the processor to implement the audio signal processing method.
  • a computer program can also be stored in the internal memory. When the computer program is executed by the processor, the processor can execute the audio signal processing method.
  • FIG. 10 is only a block diagram of a partial structure related to the solution of the present invention, and does not constitute a limitation on the computer equipment to which the solution of the present invention is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a computer device including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the steps shown in FIG. 1, FIG. 5 and FIG. 8.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

Disclosed in the embodiments of the present invention are an audio signal processing method and apparatus, a device and a readable medium. The method is based on a target device. The target device comprises a microphone array, and the microphone array comprises a plurality of microphone apparatuses disposed at different positions. The method comprises: separately acquiring sub-audio signals collected by the microphone apparatuses; acquiring target audio adjustment parameters; and determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals, and the target adjustment parameters. The present invention improves the quality of the recorded audio.

Description

音频信号处理方法、装置、设备及可读介质Audio signal processing method, apparatus, device and readable medium 技术领域technical field
本发明涉及计算机数据处理领域,尤其涉及一种音频信号处理方法、装置、设备及可读介质。The present invention relates to the field of computer data processing, and in particular, to an audio signal processing method, apparatus, device and readable medium.
背景技术Background technique
随着智能设备和移动终端的日益普及,越来越多的设备所具备的录像功能成为被用户广泛使用的功能之一。录像功能主要用于同时获取目标对象所对应的图像信息和音频信息,这主要是通过设备中设置的摄像头和麦克风装置实现的。With the increasing popularity of smart devices and mobile terminals, the video recording function provided by more and more devices has become one of the functions widely used by users. The video recording function is mainly used to obtain the image information and audio information corresponding to the target object at the same time, which is mainly realized by the camera and microphone device set in the device.
Technical Problem

With the emergence of variable-focus optical cameras and the development of related optical processing technologies, the cameras of most devices now support a considerable zoom range: they can capture both nearby objects (short focal length) and distant objects (long focal length).

At the same time, however, the microphone devices in such equipment are generally omnidirectional, that is, they cannot be zoomed. As a result, during video recording the zoom camera can be trained on the target object, but the signal acquisition range of the microphone remains comparatively wide. The presentation ranges of the audio and the image therefore become inconsistent, which degrades the user's recording experience.
Technical Solution

In view of the above problems, it is necessary to provide an audio signal processing method, an apparatus, a computer device and a readable medium.
An audio signal processing method, the method being based on a target device, where the target device includes a microphone array and the microphone array includes a plurality of microphone devices arranged at different positions.

The method includes:

separately acquiring the sub-audio signals collected by each microphone device;

acquiring target audio adjustment parameters, and obtaining a target audio adjustment value according to the target audio adjustment parameters; and

determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters.
Further, the target device also includes a variable-focus (zoom) camera device.

The audio signal processing method further includes:

adjusting the target audio adjustment parameters according to the focal length parameter of the zoom camera device.

Further, the target audio adjustment parameters include a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.

Acquiring the target audio adjustment parameters includes:

determining the signal delay time of each microphone device according to the spacing between the microphone devices and the speed of sound; and

determining the phase compensation value and the spatial phase difference corresponding to each microphone device according to its signal delay time.

Further, the target parameters also include a compensation coefficient, the magnitude of which is proportional to the focal length parameter of the zoom camera device.

Further, adjusting the target audio adjustment parameters according to the focal length parameter of the zoom camera device includes:

when the focal length parameter is greater than a preset threshold, setting the compensation coefficient to 1; and

when the focal length parameter is less than or equal to the preset threshold, setting the compensation coefficient to a value less than 1.
A target terminal, the target terminal including a body and an accessory module, where the accessory module is rotatably connected to the body and includes a zoom camera device and a microphone array.

The zoom camera and the microphone array are located on two adjacent faces of the accessory module, and the light-receiving direction of the zoom camera is the same as the sound-pickup direction of the microphone array.

Further, the microphone array is a linear array including a plurality of microphone devices, and the line connecting the microphone devices is perpendicular to the photosensitive surface of the zoom camera.
An audio signal processing apparatus, the apparatus including:

an acquisition unit, configured to acquire the sub-audio signals collected by each microphone device;

a determination unit, configured to acquire target audio adjustment parameters and obtain a target audio adjustment value according to the target audio adjustment parameters; and

a combination unit, configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters.
A computer device, including a memory and a processor, where the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps described above.

A computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform the steps described above.
Beneficial Effects

In the embodiments of the present invention, the sub-audio signals collected by each microphone device are first acquired separately; the target audio adjustment parameters are then determined; and finally the target combined audio signal is determined according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameters. The present invention can obtain suitable target adjustment parameters for different sound-source application scenarios, improving the sound quality of the audio signal of the target device and meeting the different usage requirements of users.
附图说明Description of drawings
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.
其中:in:
Fig. 1 is a flowchart of an audio signal processing method in one embodiment;

Fig. 2 shows the receiving beam angle required by the microphone array for a sound source in one embodiment;

Fig. 3 shows the receiving beam angle required by the microphone array for a sound source in another embodiment;

Fig. 4 shows the receiving beam angle required by the microphone array for a sound source in yet another embodiment;

Fig. 5 is a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment;

Fig. 6 is a schematic front view of the structure of a target terminal in one embodiment;

Fig. 7 is a schematic rear view of the structure of a target terminal in one embodiment;

Fig. 8 is a flowchart of an audio signal processing method in yet another embodiment;

Fig. 9 is a structural block diagram of an audio signal processing apparatus in one embodiment;

Fig. 10 is an internal structure diagram of a computer device in one embodiment.
本发明的实施方式Embodiments of the present invention
下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.
The present invention provides an audio signal processing method. In one embodiment, the method is based on a target device, where the target device includes a microphone array and the microphone array includes a plurality of microphone devices arranged at different positions. In an optional embodiment, the target device may be a mobile phone, a tablet computer or the like, or it may be a shooting accessory used in connection with another device such as a mobile phone.

Referring to Fig. 1, an embodiment of the present invention provides an audio signal processing method.

Fig. 1 is a flowchart of the audio signal processing method in one embodiment. The audio signal processing method of the present invention may include steps S1022 to S1026 shown in Fig. 1, described in detail as follows.

In step S1022, the sub-audio signals collected by each microphone device are acquired separately.

Before the audio signal processing method is described in detail, the microphone array used to collect the audio signals is first introduced.
A microphone array is an array formed by a group of omnidirectional microphones located at different positions in space and arranged according to a regular geometry. It is a device for spatially sampling sound signals propagating through space, and the signals it collects contain the spatial position information of those sound signals. According to the distance between the sound source and the microphone array, microphone array models can be divided into a near-field model and a far-field model. According to the topology of the microphone array, arrays can be divided into linear arrays, planar arrays, volumetric arrays and so on.

The near-field model treats the sound wave as a spherical wave and takes into account the amplitude differences between the signals received by the array elements. The far-field model treats the sound wave as a plane wave, ignores the amplitude differences between the signals received by the elements, and approximates the relationship between the received signals as a simple time delay. The far-field model is clearly a simplification of the actual model and greatly reduces the processing difficulty; general speech enhancement methods are based on the far-field model.

It is therefore easy to understand that, in order to obtain different sound pickup effects, the design (topology) of the microphone arrays included in devices of different types and purposes differs considerably; that is, the number of microphones in the array and the distance between the individual microphone devices also differ.
A common microphone array structure is the one-dimensional, or linear, microphone array, whose element centers lie on a single straight line. Depending on whether the spacing between adjacent elements is uniform, it can be further divided into the uniform linear array (ULA) and the nested linear array. A linear array can only obtain the horizontal direction angle of a signal.

Alternatively, a two-dimensional, or planar, microphone array has its element centers distributed on a plane. According to the geometry of the array, it can be divided into the equilateral triangle array, T-shaped array, uniform circular array, uniform square array, coaxial circular array, circular or rectangular planar array, and so on. A planar array can obtain both the horizontal azimuth and the vertical azimuth of a signal.

Regarding the spacing between the elements of a microphone array: in a linear four-microphone configuration, for example, the four microphone devices are equally spaced, with a spacing of 20 to 60 mm between adjacent devices; in a circular six-microphone array, the six microphones are evenly distributed clockwise around the circumference, with a radius generally in the range of 20 to 60 mm.
In this implementation scenario, the microphone array is a linear array. Since each microphone device is at a different distance from the target sound source, the spatial and timing information of the received sound waves differs between devices. The sub-audio signals collected by each microphone device are first acquired, and the sub-audio signals of the individual microphones in the linear array are then combined to obtain audio data corresponding to the target sound source.

In step S1024, the target audio adjustment parameters are acquired, and a target audio adjustment value is obtained according to the target audio adjustment parameters.

In this implementation scenario, the target audio adjustment parameters include a spatial phase difference. Considering the differences in the positions of the microphone devices in the array, after the above sub-audio signals are acquired, the spatial phase difference corresponding to each microphone device must also be calculated so that phase compensation can be applied to the sub-audio signal collected by each device.
Therefore, after the sub-audio signals collected by each microphone device are acquired, the process may further include steps S1032 and S1034 shown in Fig. 5. Fig. 5 is a flowchart of determining the phase compensation value and the spatial phase difference corresponding to each microphone device in one embodiment.

In step S1032, the signal delay time of each microphone device is determined according to the preset spacing information between the microphone devices and the speed of sound.

At one standard atmosphere and 15 °C, the speed of sound is approximately 340 m/s, but in different real-time acquisition environments (affected by factors such as wind speed, air pressure and temperature) the effective speed of sound varies between acquisition devices. It is therefore necessary to obtain the current speed of sound in real time, so that the signal delay time of each microphone device can be calculated from the current speed of sound and the spacing between the microphone devices.

Specifically, the signal delay time can be obtained as the ratio of the spacing to the current speed of sound.
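The delay computation just described can be sketched as follows. This is a minimal illustration; the microphone spacings and the speed of sound used here are example values consistent with the ranges given above, not values taken from the embodiments:

```python
# Per-microphone signal delay for a linear array: delay = distance to the
# reference element divided by the current speed of sound, as described
# above. Spacing values are illustrative (the text gives a 20-60 mm range).

def signal_delays(distances_m, speed_of_sound_mps=340.0):
    """Return the signal delay time (seconds) of each microphone device,
    given each device's distance (metres) from the reference element."""
    return [d / speed_of_sound_mps for d in distances_m]

# Four equally spaced microphones, 40 mm apart, at the standard ~340 m/s.
delays = signal_delays([0.0, 0.04, 0.08, 0.12])
```

In a real device the speed-of-sound argument would be updated from the live environment rather than left at the 340 m/s default.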
In addition, the preset spacing information between the microphone devices may be stored in the device memory and simply read when needed.

In step S1034, the phase compensation value and the spatial phase difference corresponding to each microphone device are determined according to the signal delay time of each microphone device.

First, the sound waves generated by the target sound source arrive at the microphone devices located at different positions in the array at different times; this is the signal delay time referred to here. The different arrival times correspond to phase differences between the sound-wave signals collected by the individual microphone devices (for example, at the same instant the crests and troughs of the sound wave reach different positions and are collected by different microphone devices). The corresponding phase compensation value can therefore be determined according to the signal delay time of each microphone device.

The delay difference between at least two microphones in the array can be described in the frequency domain by a phase difference function, commonly referred to as the differential phase, which takes a value between -180 and +180 degrees. The spatial phase difference can be calculated from the spacing between two adjacent microphone devices in the array and the speed of sound.
具体地说,目标音频调节值为每个麦克风装置的相位补偿值和空间相位差的乘积。例如,麦克风1的目标音频调节值为相位补偿1*空间相位差φ,麦克风2的目标音频调节值为相位补偿2*空间相位差φ,等等,以此类推。Specifically, the target audio adjustment value is the product of the phase compensation value of each microphone device and the spatial phase difference. For example, the target audio adjustment value of microphone 1 is phase compensation 1*spatial phase difference φ, the target audio adjustment value of microphone 2 is phase compensation 2*spatial phase difference φ, and so on.
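As a concrete sketch of the product described above: the narrowband relation φ = 2πfd/c used below for the spatial phase difference is a standard textbook assumption supplied for illustration, since the embodiment gives no explicit formula, and the function and variable names are hypothetical:

```python
import math

# Target audio adjustment value of each microphone device, computed as
# (phase compensation value) * (spatial phase difference), per the text
# above. The spatial phase difference is derived here from the adjacent
# microphone spacing d and the speed of sound c using the narrowband
# relation phi = 2*pi*f*d/c (an illustrative assumption).

def spatial_phase_difference(freq_hz, spacing_m, speed_of_sound_mps=340.0):
    return 2.0 * math.pi * freq_hz * spacing_m / speed_of_sound_mps

def target_adjustment_values(phase_compensations, phi):
    # adjustment_i = phase_compensation_i * phi
    return [pc * phi for pc in phase_compensations]

phi = spatial_phase_difference(1000.0, 0.04)          # 1 kHz, 40 mm spacing
values = target_adjustment_values([0.0, 1.0, 2.0, 3.0], phi)
```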
In step S1026, the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signals and the target adjustment value.

The principle of beamforming is first introduced. Beamforming refers to applying time-delay or phase compensation and amplitude weighting to the output of each element of the microphone array so as to form a beam pointing in a specific direction. In contrast to an omnidirectional microphone, this directional beam represents the orientation of signal acquisition, so that signal data from a specific direction can be collected in a more targeted manner.

The preset beamforming algorithm may use fixed weights, or it may perform adaptive beamforming according to the signal characteristics. In the latter case, a preset criterion function is first determined, for example the maximum signal-to-noise ratio (SNR) criterion, the minimum mean square error (MSE) criterion, the linearly constrained minimum variance (LCMV) criterion or the maximum likelihood (ML) criterion, and the criterion function is then solved to obtain the signal combination for the target beam. Figs. 2 to 4 are schematic diagrams of the sound pickup range of the microphone array for different receiving beam angles.

Specifically, the target audio adjustment parameters of each microphone device determined in the previous step and the individual sub-audio signals of each microphone can first be combined, according to the beamforming algorithm, into a directional target combined audio signal with a minimum beam angle. From the above description it can be seen that, in this implementation scenario, the minimum beam angle is related to the number of microphone devices and the spacing between adjacent microphone devices.
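One simple, hedged instance of such a combination is a delay-and-sum beamformer, which aligns each sub-audio signal by its delay compensation and sums the results. This is illustrative only; the embodiments do not fix a particular beamforming algorithm and may use adaptive weights instead:

```python
# Minimal delay-and-sum beamforming sketch: each microphone's sub-audio
# signal is shifted by its (integer-sample) delay compensation, and the
# aligned samples are averaged into one combined signal.

def delay_and_sum(sub_signals, sample_delays):
    """Combine per-microphone sample lists given per-microphone delays."""
    usable = min(len(s) - d for s, d in zip(sub_signals, sample_delays))
    combined = []
    for i in range(usable):
        total = sum(s[i + d] for s, d in zip(sub_signals, sample_delays))
        combined.append(total / len(sub_signals))
    return combined

# Two microphones hearing the same ramp, the second one sample later.
mic1 = [0.0, 1.0, 2.0, 3.0]
mic2 = [9.0, 0.0, 1.0, 2.0]   # same signal, arriving one sample later
out = delay_and_sum([mic1, mic2], [0, 1])
# After alignment both channels agree, so the average reproduces the ramp.
```

Signals arriving from other directions do not line up after the shift and are attenuated by the averaging, which is what gives the array its directional pickup.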
Referring to Figs. 6 and 7, Fig. 6 is a schematic front view and Fig. 7 a schematic rear view of the structure of the target terminal in one embodiment. The target terminal 10 includes a body 11 and an accessory module 12. The accessory module 12 is rotatably connected to the body 11, for example by a rotating shaft. In this implementation scenario, the rotating shaft connects the accessory module 12 to the center of the body 11; in other implementation scenarios, the rotating shaft may connect the accessory module 12 to an edge of the body 11. The accessory module 12 includes a zoom camera device 121 and a microphone array 122, located on two adjacent faces of the accessory module 12; for example, the microphone array 122 is on the face close to the user, and the zoom camera device is on the face of the accessory module 12 with the smallest area. The shooting direction of the zoom camera device 121 is the same as the sound-pickup direction of the microphone array 122.

For example, the microphone array 122 is a linear array including a plurality of microphone devices, and the arrangement direction of these microphone devices is perpendicular to the photosensitive surface of the zoom camera device 121, so that the zoom camera device 121 and the microphone array 122 point in the same direction, better ensuring that the subject picked up acoustically is the same as the subject being photographed.

As shown in Figs. 6 and 7, the accessory module 12 is a cuboid. The microphone array 122 is located on the rectangular face formed by the long side and the wide side of the cuboid, and the microphone devices are arranged parallel to the long side. The zoom camera device 121 is located on the rectangular face formed by the wide side and the high side of the cuboid, with its photosensitive surface parallel to that face. The arrangement direction of the microphone devices is therefore perpendicular to the photosensitive surface of the zoom camera device 121. Since the arrangement direction of the microphone devices is the sound-pickup direction of the microphone array 122, the sound-pickup direction of the microphone array 122 is the same as the light-receiving direction of the zoom camera.
Referring to Fig. 8, which is a flowchart of an audio signal processing method in one embodiment, the audio signal processing method of the present invention may include steps S2022 to S2026 shown in Fig. 8, described in detail as follows.

In step S2022, the sub-audio signals collected by each microphone device are acquired separately.

This step is essentially the same as step S1022 of the audio signal processing method in the embodiment shown in Fig. 1 and is not repeated here.

In step S2024, the target audio adjustment parameters are acquired according to the focal length parameter of the zoom camera device, and a target audio adjustment value is obtained according to the target audio adjustment parameters.
The focal length parameter is obtained here because, when recording with a zoom camera, it reflects the acquisition range of the image data of the target object. As the focal length of the camera is adjusted, the captured image range changes accordingly. For example, by common photographic knowledge, a lens with a focal length below 24 mm is called an ultra-wide-angle lens; such a lens has a large angle of view and captures a large image range. A lens with a focal length of 100 mm or more is generally a macro lens that captures a small image range and is typically used for macro photography and very close close-ups.

It can therefore be inferred from the focal length parameter that the smaller the focal length in use, the larger the range to be photographed and hence the larger the range of the sound source; conversely, the larger the focal length, the smaller the range to be photographed and hence the smaller the range of the sound source. The target audio adjustment parameters can accordingly be adjusted according to the focal length parameter, so that the audio signal received by the target device is of higher quality.
In a specific implementation scenario, the target audio adjustment parameters further include a compensation coefficient, the magnitude of which is proportional to the focal length parameter of the zoom camera device. Specifically, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes the value 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value less than 1.

For example, for telephoto and super-telephoto shooting (e.g. a focal length parameter of 100 mm), the compensation coefficient may be set to 1, that is, the spatial phase difference of the microphone devices is applied without further attenuation; this corresponds to directional, fixed-beam-angle far-field pickup, so that only the sound of the subject within the frame is collected, avoiding interference from the surrounding environment. When shooting wide-angle scenes such as multi-person conversations or interaction between the subject and the environment (e.g. a focal length parameter of 24 mm), a smaller compensation coefficient (e.g. 0.5) may be used, so that sound is collected over a wider range and necessary sound information is not lost.
When the compensation coefficient k takes the value 0 there is no phase compensation, and the system degenerates to omnidirectional pickup, the "ultra-wide-angle" limit. When k takes a value within [0, 1], the beam angle varies within [θ, 2π].

In this implementation scenario, the target audio adjustment value is equal to the product of the compensation coefficient, the phase compensation value corresponding to each microphone position and the spatial phase difference. For example, the target audio adjustment value of microphone 1 is phase compensation 1 * compensation coefficient k * spatial phase difference φ, the target audio adjustment value of microphone 2 is phase compensation 2 * compensation coefficient k * spatial phase difference φ, and so on.
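The compensation-coefficient rule of this embodiment can be sketched as follows. Only the k = 1 branch above the threshold and the proportionality below it are stated in the text; the 100 mm threshold and the specific sub-threshold mapping (factor 0.5) are illustrative assumptions:

```python
# Compensation coefficient k as a function of the focal length parameter:
# k = 1 above a preset threshold; at or below the threshold, k < 1 and
# proportional to focal length. The 100 mm threshold and the 0.5 factor
# in the sub-threshold mapping are illustrative choices.

def compensation_coefficient(focal_length_mm, threshold_mm=100.0):
    if focal_length_mm > threshold_mm:
        return 1.0
    return 0.5 * focal_length_mm / threshold_mm   # proportional and < 1

def target_adjustment_value(phase_compensation, k, phi):
    # adjustment = phase compensation * compensation coefficient * phi
    return phase_compensation * k * phi

k_tele = compensation_coefficient(120.0)   # telephoto -> full compensation
k_wide = compensation_coefficient(24.0)    # wide angle -> reduced compensation
```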
In an optional embodiment, in order to further improve the user's audio experience, and considering that hardware limitations of the acquisition device may degrade the recording experience (much as an image captured at close focus may be blurred or out of focus), the sub-audio signals may also be denoised according to a preset preprocessing algorithm before the target combined audio signal is assembled.

Likewise, considering that in practical applications users may have their own preferences for the sound during recording, for example deliberately including ambient sound, or capturing ambient sound over a range that does not exactly match the range shown in the picture, as with certain special shooting techniques, in an optional embodiment, after the target combined audio signal is determined according to the preset beamforming algorithm, the spatial phase difference of each microphone device, the sub-audio signals and the target adjustment parameters, the method further includes:

acquiring adjustment parameters input through a preset interface or input device, and determining the target adjustment parameters according to the input adjustment parameters.

For example, the input adjustment parameter here may be a preset recording mode selected by the user, such as a concert mode, an indoor mode or a sports mode, and the target adjustment parameters are then determined according to the selected preset recording mode. For instance, when the concert mode is the input adjustment parameter, the target adjustment parameter used for audio zooming may be reduced appropriately, e.g. from the value 0.6 determined from the focal length parameter to 0.4.
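A sketch of this mode-based override, assuming a hypothetical lookup table: only the concert-mode example (0.6 reduced to 0.4) appears in the text, and the other mode entries and the scaling approach are invented for illustration:

```python
# Override the focal-length-derived target adjustment parameter with a
# user-selected recording mode. The scale table is hypothetical except
# that "concert" reproduces the 0.6 -> 0.4 example given above.

MODE_SCALE = {
    "concert": 0.4 / 0.6,   # reduce the audio-zoom parameter
    "indoor": 1.0,
    "sports": 1.0,
}

def apply_recording_mode(target_parameter, mode):
    """Scale the target adjustment parameter for the selected mode."""
    return target_parameter * MODE_SCALE.get(mode, 1.0)

adjusted = apply_recording_mode(0.6, "concert")
```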
In step S2026, the target combined audio signal is determined according to the preset beamforming algorithm, the sub-audio signals and the target adjustment value.
This step is substantially the same as step S1026 of the audio signal processing method in the embodiment shown in FIG. 1 and is not repeated here.
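As context for the beamforming step, a minimal delay-and-sum sketch is given below. It shows one common form such a preset beamforming algorithm can take, not necessarily the exact algorithm of the embodiment; the way the target adjustment value blends the steered beam with a plain average is an assumption for illustration:

```python
import numpy as np

def delay_and_sum(sub_signals, delays_s, sample_rate, zoom=1.0):
    """Align each microphone's sub-audio signal by its delay and average.

    sub_signals : list of equal-length 1-D arrays, one per microphone
    delays_s    : per-microphone steering delays in seconds
    sample_rate : sampling rate in Hz
    zoom        : target adjustment value in [0, 1]; 1.0 = fully steered
                  beam, 0.0 = plain average of the raw signals (assumed
                  blending scheme, for illustration only)
    """
    n = len(sub_signals[0])
    steered = np.zeros(n)
    for x, d in zip(sub_signals, delays_s):
        shift = int(round(d * sample_rate))            # delay in samples
        steered += np.roll(np.asarray(x, float), -shift)
    steered /= len(sub_signals)
    plain = np.mean(np.asarray(sub_signals, float), axis=0)
    # Blend between the un-steered average and the steered beam
    return (1.0 - zoom) * plain + zoom * steered
```

Signals aligned toward the target direction add coherently, while off-axis sound is attenuated by the averaging, which is the basic mechanism behind the audio-zoom effect.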
FIG. 9 shows a structural block diagram of an audio signal processing apparatus in an embodiment.
Referring to FIG. 9, an audio signal processing apparatus 1060 according to an embodiment of the present invention includes an obtaining unit 1062, a determining unit 1064 and a combining unit 1066.
The obtaining unit 1062 is configured to respectively obtain the sub-audio signals collected by each microphone device.
The determining unit 1064 is configured to obtain a focal length parameter through the zoom camera device and determine a target audio adjustment parameter according to the focal length parameter.
The combining unit 1066 is configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameter.
Further, the target device further includes a zoom camera device, and the determining unit 1064 is further configured to:
adjust the target audio adjustment parameter according to the focal length parameter of the zoom camera device.
The target audio adjustment parameter includes a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array.
The determining unit 1064 is further configured to:
determine the signal delay time of each microphone device according to the spacing between the microphone devices and speed-of-sound information; and
determine, according to the signal delay time of each microphone device, the phase compensation value and the spatial phase difference corresponding to that microphone device.
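The delay and phase quantities above can be sketched for a linear array under a far-field assumption. The steering-angle parameterization and the nominal speed of sound are illustrative assumptions, not values fixed by the embodiment:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C (assumed nominal value)

def steering_delays(spacings_m, angle_deg=0.0, c=SPEED_OF_SOUND):
    """Far-field signal delay of each microphone relative to the first.

    spacings_m : cumulative distance of each microphone from the first (m)
    angle_deg  : assumed direction of the target source relative to
                 broadside of the linear array
    """
    proj = math.sin(math.radians(angle_deg))
    return [d * proj / c for d in spacings_m]

def phase_compensation(delay_s, freq_hz):
    """Phase compensation value (radians) for one microphone at one frequency:
    a delay of tau seconds corresponds to a phase of 2*pi*f*tau."""
    return 2.0 * math.pi * freq_hz * delay_s
```

For two microphones 2 cm apart with a source at endfire (90 degrees), the second microphone's delay is about 0.02 / 343 ≈ 58 microseconds, from which its per-frequency phase compensation follows directly.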
Further, the target parameter further includes a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
Further, when the focal length parameter is greater than a preset threshold, the compensation coefficient takes the value 1; when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient takes a value less than 1.
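One concrete realization of this rule can be sketched as a capped linear ramp. The text only requires proportionality below the threshold and a value of 1 above it; the specific slope used here is an assumption:

```python
def compensation_coefficient(focal_length, threshold, slope=0.9):
    """Compensation coefficient for audio zoom.

    Above the threshold: exactly 1. At or below it: proportional to the
    focal length and strictly below 1 (the factor ``slope``/threshold is
    an assumed concrete choice, not specified by the embodiment).
    """
    if focal_length > threshold:
        return 1.0
    return slope * focal_length / threshold
```

This keeps the beam fully compensated at long focal lengths while backing off the compensation, in proportion to the focal length, at shorter ones.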
The target terminal includes a body and an accessory module; the accessory module is rotatably connected to the body and includes a zoom camera device and the microphone array. The microphone array and the zoom camera device are located on the same face of the accessory module and point in the same direction.
The microphone array is a linear array.
FIG. 10 shows a diagram of the internal structure of a computer device in an embodiment. The computer device may specifically be a terminal or a server. As shown in FIG. 10, the computer device includes a processor, a memory, an output module, an acquisition module and a processing module connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, causes the processor to implement the audio signal processing method. A computer program may also be stored in the internal memory and, when executed by the processor, causes the processor to perform the audio signal processing method. Those skilled in the art will understand that the structure shown in FIG. 10 is merely a block diagram of a partial structure related to the solution of the present invention and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps shown in FIG. 1, FIG. 5 and FIG. 8.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the flows of the embodiments of the above methods. Any reference to memory, storage, a database or another medium used in the embodiments provided by the present invention may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent of the present invention. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the concept of the present invention, all of which fall within the protection scope of the present invention. Therefore, the protection scope of the patent of the present invention shall be subject to the appended claims.

Claims (10)

  1. An audio signal processing method, wherein the method is based on a target device, the target device comprises a microphone array, and the microphone array comprises a plurality of microphone devices arranged at different positions;
    the method comprising:
    respectively acquiring sub-audio signals collected by each microphone device;
    acquiring a target audio adjustment parameter, and acquiring a target audio adjustment value according to the target audio adjustment parameter; and
    determining a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment value.
  2. The audio signal processing method according to claim 1, wherein the target device further comprises a zoom camera device;
    the audio signal processing method further comprising:
    acquiring the target audio adjustment parameter according to a focal length parameter of the zoom camera device.
  3. The audio signal processing method according to claim 2, wherein the target audio adjustment parameter comprises a phase compensation value and a spatial phase difference corresponding to each microphone position in the microphone array;
    the acquiring a target audio adjustment parameter comprising:
    determining a signal delay time of each microphone device according to the spacing between the microphone devices and speed-of-sound information; and
    determining, according to the signal delay time of each microphone device, the phase compensation value and the spatial phase difference corresponding to that microphone device.
  4. The audio signal processing method according to claim 2, wherein the target parameter further comprises a compensation coefficient, and the magnitude of the compensation coefficient is proportional to the focal length parameter of the zoom camera device.
  5. The audio signal processing method according to claim 4, wherein
    the acquiring the target audio adjustment parameter according to the focal length parameter of the zoom camera device comprises:
    when the focal length parameter is greater than a preset threshold, the compensation coefficient taking the value 1; and
    when the focal length parameter is less than or equal to the preset threshold, the compensation coefficient taking a value less than 1.
  6. A target terminal, wherein the target terminal comprises a body and an accessory module, the accessory module is rotatably connected to the body, and the accessory module comprises a zoom camera device and a microphone array;
    the zoom camera and the microphone array are located on two adjacent faces of the accessory module, and the light-sensing direction of the zoom camera is the same as the sound-pickup direction of the microphone array.
  7. The target terminal according to claim 6, wherein the microphone array is a linear array comprising a plurality of microphone devices, and the line connecting the plurality of microphone devices is perpendicular to the photosensitive surface of the zoom camera.
  8. An audio signal processing apparatus, wherein the apparatus comprises:
    an obtaining unit, configured to respectively acquire sub-audio signals collected by each microphone device;
    a determining unit, configured to acquire a target audio adjustment parameter and acquire a target audio adjustment value according to the target audio adjustment parameter; and
    a combining unit, configured to determine a target combined audio signal according to a preset beamforming algorithm, the sub-audio signals and the target adjustment parameter.
  9. A readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
  10. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
PCT/CN2020/104772 2020-07-10 2020-07-27 Audio signal processing method and apparatus, device and readable medium WO2022007030A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010663763.X 2020-07-10
CN202010663763.XA CN111916094B (en) 2020-07-10 2020-07-10 Audio signal processing method, device, equipment and readable medium

Publications (1)

Publication Number Publication Date
WO2022007030A1 true WO2022007030A1 (en) 2022-01-13

Family

ID=73226324

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/104772 WO2022007030A1 (en) 2020-07-10 2020-07-27 Audio signal processing method and apparatus, device and readable medium

Country Status (2)

Country Link
CN (1) CN111916094B (en)
WO (1) WO2022007030A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115631758A (en) * 2022-12-21 2023-01-20 无锡沐创集成电路设计有限公司 Audio signal processing method, apparatus, device and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111641794B (en) * 2020-05-25 2023-03-28 维沃移动通信有限公司 Sound signal acquisition method and electronic equipment
CN112929606A (en) * 2021-01-29 2021-06-08 世邦通信股份有限公司 Audio and video acquisition method and device and storage medium
CN113225646B (en) * 2021-04-28 2022-09-20 世邦通信股份有限公司 Audio and video monitoring method and device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104053088A (en) * 2013-03-11 2014-09-17 联想(北京)有限公司 Microphone array adjustment method, microphone array and electronic device
CN104699445A (en) * 2013-12-06 2015-06-10 华为技术有限公司 Audio information processing method and device
CN107181845A (en) * 2016-03-10 2017-09-19 中兴通讯股份有限公司 A kind of microphone determines method and terminal
CN108766457A (en) * 2018-05-30 2018-11-06 北京小米移动软件有限公司 Acoustic signal processing method, device, electronic equipment and storage medium
WO2020037983A1 (en) * 2018-08-20 2020-02-27 华为技术有限公司 Audio processing method and apparatus
CN210518437U (en) * 2019-11-27 2020-05-12 维沃移动通信有限公司 Electronic equipment


Also Published As

Publication number Publication date
CN111916094A (en) 2020-11-10
CN111916094B (en) 2024-02-23

Similar Documents

Publication Publication Date Title
WO2022007030A1 (en) Audio signal processing method and apparatus, device and readable medium
JP6023779B2 (en) Audio information processing method and apparatus
CN103026734B (en) Electronic apparatus for generating beamformed audio signals with steerable nulls
Farina et al. A spherical microphone array for synthesizing virtual directive microphones in live broadcasting and in post production
EP2875624B1 (en) Portable electronic device with directional microphones for stereo recording
JP2013543987A (en) System, method, apparatus and computer readable medium for far-field multi-source tracking and separation
US9967660B2 (en) Signal processing apparatus and method
CN112492445B (en) Method and processor for realizing signal equalization by using ear-covering type earphone
CN112686824A (en) Image correction method, image correction device, electronic equipment and computer readable medium
CN104244137A (en) Method and system for improving long-shot recording effect during videoing
US20170188138A1 (en) Microphone beamforming using distance and enrinonmental information
WO2022000174A1 (en) Audio processing method, audio processing apparatus, and electronic device
WO2016197444A1 (en) Method and terminal for achieving shooting
CN115547354A (en) Beam forming method, device and equipment
WO2021237565A1 (en) Audio processing method, electronic device and computer-readable storage medium
CN114554154A (en) Audio and video pickup position selection method and system, audio and video acquisition terminal and storage medium
CN110268705A (en) Image pick up equipment and image picking system
JP2013135373A (en) Zoom microphone device
CN115884038A (en) Audio acquisition method, electronic device and storage medium
WO2023088156A1 (en) Sound velocity correction method and apparatus
CN113824916A (en) Image display method, device, equipment and storage medium
US20220030353A1 (en) Flexible differential microphone arrays with fractional order
CN111629126A (en) Audio and video acquisition device and method
CN205028652U (en) System for reduce motor vibration noise in video system of shooting with video -corder
US20230105785A1 (en) Video content providing method and video content providing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20944173

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20944173

Country of ref document: EP

Kind code of ref document: A1