WO2023066213A1

WO2023066213A1 - Microphone array and signal processing method and apparatus therefor, and device and medium

Info

Publication number: WO2023066213A1
Application number: PCT/CN2022/125739
Authority: WO
Inventors: 李天宇
Original assignee: 广州视源电子科技股份有限公司; 广州视源人工智能创新研究院有限公司
Priority date: 2021-10-21
Filing date: 2022-10-17
Publication date: 2023-04-27
Also published as: CN116017230A

Abstract

The present application provides a non-uniform linear microphone array. The microphone array comprises a central microphone pair, and a plurality of extended microphones, which are symmetrically arranged on two sides of the central microphone pair, the central microphone pair and the extended microphones being arranged on the same straight line, and the distances between adjacent microphones being unequal, wherein the larger the distance between the extended microphone and the central microphone pair, the larger the distance between the extended microphone and an adjacent microphone at the side close to the center of the array. By means of the structure, a better sound pickup effect can be achieved when the same number of microphones is used, or the number of microphones in an array can be reduced while the sound pickup effect is ensured. By means of an audio signal processing method and apparatus for a microphone array, and a medium of the present application, weighting coefficient optimization solving can be performed on an audio signal that is collected by the non-uniform linear microphone array, and spatial filtering is performed on the audio signal by means of a weighting coefficient, thereby improving the quality of the signal.

Description

Microphone array and its signal processing method, device, equipment and medium

This application claims priority to Chinese patent application No. 202111228311X, filed with the China Patent Office on October 21, 2021 and entitled "Microphone array and its signal processing method, device, equipment and medium", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of circuit technology, for example, to a microphone array and to a signal processing method, apparatus, device and medium therefor.

Background art

In an existing uniform linear microphone array, the microphones are evenly arranged on a straight line, and the distance between adjacent microphones is equal. To improve the sound pickup quality of such arrays, beamforming techniques are usually applied, and these usually require the distance between the microphones to be comparable to the signal wavelength. A voice signal has a wide frequency band. Effective beamforming for high-frequency signals requires a sufficiently small spacing between microphone array elements, and effective beamforming for low-frequency signals requires a sufficiently large array aperture. The inventors found that, if the existing array structure with evenly arranged microphones is used, a large number of microphones is required to meet the high-frequency and low-frequency beamforming requirements at the same time, which increases not only the hardware cost and structural complexity but also the computational load of the beamforming algorithm.

In addition, the existing beamforming algorithm is usually a delay-and-sum method. When this method is used to perform beamforming on wideband speech signals, there are three problems. First, the shape of the beam pattern depends on frequency, and the width of the main lobe of the beam varies with frequency. Second, the attenuation of noise is non-uniform across the frequency band, resulting in artifacts in the beam output. Third, when the incident direction of the sound wave deviates from the main lobe direction, the beamforming process introduces a low-pass filtering effect, resulting in distortion of the output signal.

Technical problem

The purpose of this application is to provide a non-uniform linear microphone array structure and a corresponding audio signal processing method, so as to obtain a better sound pickup effect with the same number of microphones, or to reduce the number of microphones in the array while maintaining the sound pickup effect.

Technical solution

The application provides a non-uniform linear microphone array, which includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair. The central microphone pair and the extended microphones are arranged on the same straight line, and the spacings between adjacent microphones are unequal, wherein the larger the distance between an extended microphone and the central microphone pair, the larger the spacing between that extended microphone and its adjacent microphone on the side near the center of the array.

The present application provides an audio signal processing method of a microphone array, wherein the method includes:

Calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

According to the frequency band to which each frequency point belongs, select the corresponding constraint condition and cost function, and optimize and solve the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and, after the weighting coefficients have been optimized independently at each frequency point, they are smoothed in the frequency domain;

Extract the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal;

Perform weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points respectively by the weighting coefficients to obtain time-frequency domain beam output signals, and complete spatial filtering;

Inverse discrete Fourier transform is performed on the time-frequency domain beam output signal to calculate and obtain a target audio signal.

The present application also proposes an audio signal processing device for a microphone array, wherein the device includes:

The array steering vector group calculation unit is used to calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

The weighting coefficient solving unit is used to select corresponding constraints and cost functions according to the frequency band to which each frequency point belongs, and to optimize and solve the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is performed after the weighting coefficients have been optimized independently at each frequency point;

A signal extraction unit, configured to extract a time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain a corresponding multi-channel time-frequency domain signal;

The spatial filtering unit is configured to perform weighted summation of the multi-channel time-frequency domain signals of corresponding frequency points through the weighting coefficients to obtain a time-frequency domain beam output signal and complete spatial filtering;

A signal generating unit, configured to perform inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate and obtain a target audio signal.

The present application also proposes a computer device, including a memory and a processor, the memory storing a computer program, wherein, when the processor executes the computer program, an audio signal processing method of a microphone array is implemented, which is applied to the above-mentioned microphone array. The audio signal processing method of the microphone array includes: calculating a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; according to the frequency band to which each frequency point belongs, selecting corresponding constraints and cost functions, and optimizing and solving the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and performing discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal; performing weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points by means of the weighting coefficients to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate the target audio signal.

The present application also proposes a computer-readable storage medium, on which a computer program is stored, wherein, when the computer program is executed by a processor, an audio signal processing method of a microphone array is implemented, which is applied to the above-mentioned microphone array. The audio signal processing method of the microphone array includes: calculating a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; according to the frequency band to which each frequency point belongs, selecting corresponding constraints and cost functions, and optimizing and solving the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and performing discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal; performing weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points by means of the weighting coefficients to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate the target audio signal.

Beneficial effect

In the non-uniform linear microphone array of the present application, the extended microphones are arranged non-uniformly around the central microphone pair, and the farther a microphone is from the center, the larger the spacing between adjacent extended microphones. The array thereby satisfies both the minimum array element spacing required for high-frequency beamforming and the maximum array aperture required for low-frequency beamforming. When the number of array elements is the same, the wavelength range that can be covered is wider; when the array size is the same, fewer microphone array elements are required. In addition, the audio signal processing method for a microphone array of the present application uses different cost functions, according to different angle ranges and frequency ranges, to attenuate the output power of the original audio signal acquired by the above-mentioned non-uniform linear microphone array, thereby improving the quality of the obtained target audio signal.

Description of drawings

Fig. 1 is a structural schematic diagram of a non-uniform linear microphone array of an embodiment;

Fig. 2 is a specific structural schematic diagram of a non-uniform linear microphone array of an embodiment;

Fig. 3 is a specific structural schematic diagram of a non-uniform linear microphone array of an embodiment;

FIG. 4 is a schematic flowchart of an audio signal processing method of a microphone array in an embodiment;

FIG. 5 is a schematic flowchart of a method for processing an audio signal of a microphone array according to an embodiment;

FIG. 6 is a broadband beam diagram of an audio signal in the prior art;

Fig. 7 is the broadband beam diagram of the target audio signal of an embodiment;

FIG. 8 is a schematic block diagram of the structure of an audio signal processing device for a microphone array according to an embodiment of the present application;

FIG. 9 is a schematic block diagram of a computer device according to an embodiment of the present application.

In Figure 1 to Figure 3:

1. Microphone array; 101. Center microphone pair; 102. Extended microphones.

The realization, functional features and advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

BEST MODE FOR CARRYING OUT THE INVENTION

Referring to FIG. 1, a non-uniform linear microphone array disclosed in the present application includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair. The central microphone pair and the extended microphones are arranged on the same straight line, and the spacings between adjacent microphones are unequal, wherein the larger the distance between an extended microphone and the central microphone pair, the larger the spacing between that extended microphone and its adjacent microphone on the side near the center of the array.

In one embodiment, the non-uniform linear microphone array is typically used in large audio and video conference screens, smart blackboards, and other devices with certain requirements for sound pickup quality, where it collects voice information around the device. The non-uniform linear microphone array includes a central microphone pair and at least two extended microphones, wherein the central microphone pair includes two central microphones, and the central microphone pair and the extended microphones are arranged on the same straight line. Specifically, referring to FIG. 2, when an installation position for a camera or other device must be reserved in the microphone array, the spacing d0 of the central microphone pair can be set according to the size of that device. Taking the perpendicular bisector of the line connecting the central microphone pair as the axis of symmetry, several extended microphones are arranged symmetrically along the extension of the line through the two central microphones, and the spacings between each extended microphone and its adjacent microphone on the center side of the array are d1, d2 and d3 from inside to outside. Since the microphone array is often used together with a camera module, in this embodiment the value of d0 can be selected according to the actual situation, and the remaining spacings are designed to satisfy d1<d2<d3.
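The layout rule above can be illustrated with a short sketch that turns the spacings d0, d1, d2, d3 into microphone coordinates along the array axis. The function name `mic_positions` and the numeric spacings used in the example are illustrative assumptions, not part of the application.

```python
def mic_positions(d0, outer_spacings):
    """Return sorted microphone coordinates along the array axis.

    d0 is the spacing of the central pair; outer_spacings lists d1, d2, ...
    from inside to outside. The origin is the midpoint of the central pair.
    (Hypothetical helper, for illustration only.)
    """
    # Enforce the design rule d1 < d2 < d3 < ... for the extended microphones.
    for inner, outer in zip(outer_spacings, outer_spacings[1:]):
        if outer <= inner:
            raise ValueError("outer spacings must strictly increase outward")
    right = [d0 / 2.0]                  # right-hand central microphone
    for d in outer_spacings:            # walk outward, adding each spacing
        right.append(right[-1] + d)
    left = [-x for x in reversed(right)]  # mirror image about the array centre
    return left + right


# Illustrative spacings in millimetres (assumed values).
positions_mm = mic_positions(30.0, [25.0, 35.0, 90.0])
aperture_mm = positions_mm[-1] - positions_mm[0]
```

With three extended microphones on each side, the sketch yields eight coordinates, and the aperture equals d0 + 2(d1 + d2 + d3).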

In one embodiment, in order to reduce the number of microphones while ensuring signal quality, the spacing between adjacent extended microphones increases with the distance from the central microphone pair. Compared with an array structure in which the same number of microphones is arranged uniformly, the non-uniform microphone array structure arranged according to this principle achieves both a smaller spacing between some adjacent microphone units and a larger overall aperture of the microphone array. That is, the spacing between the central microphone pair and its adjacent extended microphones is smaller than the average spacing, while the spacing between the microphones at the two ends of the array and their neighbors is larger than the average spacing. With the same number of microphones, this widens the frequency range over which effective beamforming is possible and improves the pickup quality.

In one embodiment, referring to FIG. 3, the aforementioned non-uniform linear microphone array structure may be a single hardware structure, or a combination of several non-uniform and asymmetric microphone array hardware structures in the form of sub-arrays.

Referring to Fig. 4, which is a schematic flowchart of an audio signal processing method of a microphone array disclosed in the present application, the method includes:

S1. Calculate a corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

S2. According to the frequency band to which each frequency point belongs, select the corresponding constraint condition and cost function, and optimize and solve the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and, after the weighting coefficients have been optimized independently at each frequency point, they are smoothed in the frequency domain;

S3. Extract the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal;

S4. Perform weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points respectively by using the weighting coefficients to obtain time-frequency domain beam output signals, and complete spatial filtering;

S5. Perform inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate and obtain the target audio signal.

In actual execution, steps S1 and S2 usually need to be performed only once. The resulting frequency-domain beamforming weighting coefficients are stored and are not modified as the received signal changes, until the structural parameters of the microphone array change. The structural parameters include the number of microphones in the microphone array.

As described in the above step S1, the array steering vector group is first calculated according to the structural parameters of the microphone array and the signal acquisition channels. Specifically, the actual audio signal is modeled by a hypothetical signal, and one array steering vector is calculated for each analysis frequency point and each incoming wave direction according to the above-mentioned non-uniform linear microphone array structure, the signal acquisition channels, the signal sampling rate, and the number of analysis frequency points. For example, if the microphone array consists of 8 microphones in total, all 8 channels are designated as signal acquisition channels, the incoming wave directions are discretized into 181 directions at 1° intervals from 0° to 180°, and the number of analysis frequency points is 512, then the array steering vector group consists of 512 array steering vector matrices of dimension 8×181, where each array steering vector matrix includes 181 steering vectors corresponding to the different incoming wave directions. If instead the 6 channels other than the first and the last channel are designated as signal acquisition channels, the dimension of each array steering vector matrix is 6×181.
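The following is a minimal sketch of how such an array steering vector group might be built for the 8-microphone, 181-direction, 512-frequency-point example. The far-field plane-wave model exp(-j·2π·f·x·cos θ / c), its sign convention, the 16 kHz sampling rate, the mapping of analysis frequency points to frequencies, and the microphone coordinates are all assumptions made for illustration; the application does not state them in this section.

```python
import cmath
import math


def steering_matrix(positions_m, freq_hz, angles_deg, c=330.0):
    """Return an M x D matrix (list of rows) of steering coefficients.

    Column d is the steering vector for incoming-wave direction angles_deg[d].
    Assumes a far-field plane wave and positions measured from the array centre.
    """
    cosines = [math.cos(math.radians(theta)) for theta in angles_deg]
    matrix = []
    for x in positions_m:
        # Relative propagation delay of this microphone for each direction.
        row = [cmath.exp(-2j * math.pi * freq_hz * x * cos_t / c)
               for cos_t in cosines]
        matrix.append(row)
    return matrix


# Assumed example geometry (metres) and analysis grid.
positions_m = [-0.165, -0.075, -0.040, -0.015, 0.015, 0.040, 0.075, 0.165]
angles_deg = list(range(181))            # 0 deg .. 180 deg in 1 deg steps
sample_rate_hz = 16000.0                 # assumed sampling rate
num_bins = 512                           # number of analysis frequency points
freqs_hz = [k * (sample_rate_hz / 2.0) / num_bins for k in range(num_bins)]

# One 8 x 181 matrix per analysis frequency point.
steering_group = [steering_matrix(positions_m, f, angles_deg) for f in freqs_hz]
```

Selecting only a subset of channels amounts to passing the corresponding subset of `positions_m`, which reduces the number of rows in each matrix accordingly.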

As described in the above step S2, optimizing the weighting coefficients by frequency band means the following: according to the signal sampling rate and the number of analysis frequency points, the entire processing band is divided into three processing bands, namely low, middle and high. Different constraints and cost functions are used when the frequency-domain beamforming weighting coefficients are optimized and solved in the low, middle and high frequency bands, and, after the weighting coefficients have been optimized independently at each frequency point, they are smoothed in the frequency domain.

As described in step S3 above, the beamforming process is performed in the frequency domain, so the original signals acquired by the microphones must be transformed into the time-frequency domain. In a specific application, depending on the beamforming purpose, the signals of all channels of the microphone array can be extracted, or only the signals of some channels, and the positions of the microphones corresponding to the extracted channels may be asymmetrical. Referring to Fig. 5, assuming that the number of channels currently extracted is M, the sound pressure signals at the M microphone positions are denoted x_1(t), ..., x_M(t). After sampling and buffering, the corresponding multi-channel time-domain audio signals x_1(l), ..., x_M(l) are obtained, and after discrete Fourier transform (DFT) a multi-channel time-frequency domain signal with K analysis frequency points is obtained, that is, x_1(1), ..., x_1(K), ..., x_M(1), ..., x_M(K). In practical applications, a specific analysis window is usually used for the time-frequency domain conversion. For example, if the microphone array is composed of 8 microphones, the number of analysis frequency points is 512, and all channels are selected for beamforming, the multi-channel time-frequency domain signal of one frame is expressed as an 8×512 matrix.

As described in the above step S4, after the multi-channel audio signal is transformed into the frequency domain, it is weighted and summed according to the weighting coefficient matrix calculated in the above step S2 to obtain the beam output frequency domain signals Y(1), ..., Y(K).

As described in step S5 above, in accordance with the windowing strategy used in step S3, the beam output frequency domain signals Y(1), ..., Y(K) corresponding to the K frequency points are subjected to inverse discrete Fourier transform (IDFT) to finally obtain the time-domain target audio signal y(l).

In summary, by performing time-frequency domain transformation and beamforming processing on the original audio signal collected by the non-uniform linear microphone array, a sound pickup function with specific directivity is obtained, which improves the signal-to-noise ratio (SNR) of the picked-up audio signal.
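Steps S3 to S5 can be sketched for a single frame as follows. This is a simplified illustration under stated assumptions: a naive DFT over a rectangular frame is used, the analysis window and overlap handling mentioned above are omitted, and the weighted sum is taken as Y(k) = Σ_m conj(w_m(k))·x_m(k), a common convention that the application does not confirm. All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def beamform_frame(frame, weights):
    """frame: M channels of N samples; weights[k][m]: weight of channel m at bin k."""
    spectra = [dft(channel) for channel in frame]              # S3: M x K
    num_bins = len(spectra[0])
    beam = [sum(weights[k][m].conjugate() * spectra[m][k]      # S4: weighted sum
                for m in range(len(frame)))
            for k in range(num_bins)]
    # S5: the real part is the output when the weights are conjugate-symmetric.
    return [value.real for value in idft(beam)]


# Example: three identical channels and uniform weights return the input frame.
n = 16
tone = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
frame = [tone, tone, tone]
weights = [[complex(1.0 / 3.0)] * 3 for _ in range(n)]
output = beamform_frame(frame, weights)
```

In practice a fast Fourier transform would replace the naive transforms; they are written out here only to keep the sketch self-contained.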

In one embodiment, the designated manner of the signal acquisition channel is one of the following manners:

Selecting all channel signals in the microphone array and designating them as the signal acquisition channels;

In the microphone array, a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;

In the microphone array, some channels are asymmetrically selected and designated as the signal acquisition channels, for example, an asymmetric multi-channel signal composed of the first seven microphones is extracted from the microphone array shown in FIG. 2 .

As mentioned above, in the multi-channel signal selection stage, all channel signals can be selected and designated as the signal acquisition channels; the time-domain audio signals corresponding to the signal acquisition channels are extracted from the multi-channel signal collected by the microphone array and subjected to discrete Fourier transform to obtain the corresponding multi-channel time-frequency domain signals. In this case, the number of selected microphones is equal to the total number of microphones in the array, and their positions are symmetrical. Correspondingly, the dimension of the array steering vector matrix determined by this channel selection method equals the number of microphone units in the array, and the desired beam response used when optimizing the weighting coefficients is symmetric.

In the multi-channel signal selection stage, instead of designating all channels as the signal acquisition channels, a symmetrical subset of channels may be designated as the signal acquisition channels for subsequent beamforming. In this case, the number of selected microphones is less than the total number of microphones in the array, and their positions are symmetrical. The dimension of the array steering vector matrix determined by this channel selection method is smaller than the number of microphone units in the array, and the desired beam response used when optimizing the weighting coefficients is symmetric. Compared with selecting all channel signals, fewer channels participate in beamforming, so fewer weighting coefficients are used and the computational load of the beamforming processing is reduced.

In the multi-channel signal selection stage, in addition to selecting all channels or a symmetrical subset of channels, an asymmetrical subset of channel signals can also be designated as the signal acquisition channels for subsequent beamforming. In this case, the number of selected microphones is less than the number of microphone units in the array, and their positions are asymmetrical. Correspondingly, the dimension of the array steering vector matrix determined by this channel selection method is less than the number of microphone units in the array, and the desired beam response used when optimizing the weighting coefficients is asymmetric.

To sum up, by combining the above three schemes for designating signal acquisition channels, several sets of weighting coefficients, one for each signal acquisition channel scheme, can be obtained, and multiple beamforming results can be produced, providing multi-channel signals for subsequent signal processing.
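The three designation schemes can be expressed as a small channel-index selector. This is a sketch only: the function name, the mode strings, and the choice to take the first `count` channels in the asymmetric case (mirroring the "first seven microphones" example) are assumptions for illustration.

```python
def select_channels(num_mics, mode, count=None):
    """Return the zero-based channel indices used as signal acquisition channels.

    mode is "all", "symmetric" (a subset centred on the central pair), or
    "asymmetric" (here: simply the first `count` channels).
    """
    indices = list(range(num_mics))
    if mode == "all":
        return indices
    if count is None or count <= 0 or count >= num_mics:
        raise ValueError("count must satisfy 0 < count < num_mics")
    if mode == "symmetric":
        # Dropping the same number of channels at both ends keeps symmetry.
        if (num_mics - count) % 2 != 0:
            raise ValueError("a symmetric subset must drop channels in pairs")
        start = (num_mics - count) // 2
        return indices[start:start + count]
    if mode == "asymmetric":
        return indices[:count]
    raise ValueError("unknown mode: " + mode)


all_channels = select_channels(8, "all")
inner_six = select_channels(8, "symmetric", 6)     # drops the first and last channel
first_seven = select_channels(8, "asymmetric", 7)  # asymmetric subset
```

The length of the returned list is the row dimension of the array steering vector matrix for that scheme, so each scheme leads to its own set of weighting coefficients.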

In one embodiment, the frequency bands include low frequency bands, middle frequency bands and high frequency bands;

Wherein, the frequency band is obtained by dividing all processing frequency bands according to the signal sampling rate and the number of analysis frequency points; the low frequency band, middle frequency band and high frequency band respectively correspond to different constraint conditions and cost functions.

As mentioned above, in one embodiment, the entire processing band is divided into three processing bands, namely low, middle and high, and different constraint conditions and cost functions are used when the frequency-domain beamforming weighting coefficients are optimized and solved in each band. The processing bands are divided according to whether the spacing between microphone array elements is close to the half-wavelength of the signal. The mid-frequency band is the frequency range in which the array structure can form a near-ideal desired beam response through optimization of the weighting coefficients. That is, the cut-off frequency between the mid-frequency band and the low-frequency band is the frequency whose half-wavelength is close to the maximum microphone spacing of the array, and the cut-off frequency between the mid-frequency band and the high-frequency band is the frequency whose half-wavelength is close to the minimum microphone spacing of the array. For example, suppose the non-uniform linear microphone array structure is determined by the following microphone spacing design parameters: d0=30mm, d1=25mm, d2=35mm, d3=90mm. The distance between the first and the last microphone is then the largest, d0+2×(d1+d2+d3)=330mm, and the spacing between the central microphone pair and the adjacent extended microphone is the smallest, 25mm. According to the band division principle, the half-wavelengths of the lower and upper limit frequencies of the mid-frequency band are 330mm and 25mm respectively; when the sound velocity is 330m/s, the corresponding frequencies are 500Hz and 6600Hz. Therefore, a division of the low, middle and high frequency bands suitable for this microphone array structure is shown in Table 1 below.

Table 1
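The band-edge arithmetic follows from f = c / (2·d), where d is the half-wavelength matched to a spacing. A minimal sketch, with a hypothetical function name, using the spacings and sound velocity from the example:

```python
def band_edges_hz(d0_mm, outer_spacings_mm, sound_speed_m_s=330.0):
    """Return (aperture_mm, min_spacing_mm, low_mid_edge_hz, mid_high_edge_hz).

    The low/mid edge matches the half-wavelength to the end-to-end distance of
    the array; the mid/high edge matches it to the smallest adjacent spacing.
    """
    aperture_mm = d0_mm + 2 * sum(outer_spacings_mm)
    min_spacing_mm = min([d0_mm] + list(outer_spacings_mm))
    speed_mm_s = sound_speed_m_s * 1000.0
    low_mid_edge_hz = speed_mm_s / (2 * aperture_mm)       # f = c / (2 d)
    mid_high_edge_hz = speed_mm_s / (2 * min_spacing_mm)
    return aperture_mm, min_spacing_mm, low_mid_edge_hz, mid_high_edge_hz


aperture_mm, min_spacing_mm, f_low_mid, f_mid_high = band_edges_hz(
    30, [25, 35, 90])
```

For these spacings the end-to-end distance is 330 mm and the band edges come out at 500 Hz and 6600 Hz.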

In one embodiment, after optimizing and solving the weighting coefficients, further comprising:

Calculate the first-order difference of the weighting coefficient of each channel at each frequency point, and the first-order difference average of the adjacent frequency points of each frequency point;

If the relative deviation between the first-order difference value at a certain frequency point and the corresponding first-order difference average value is greater than a preset deviation threshold, then use the frequency point as a discontinuity point of the weighting coefficient;

Select the discontinuity point and the adjacent frequency points of the discontinuity point as the interval to be smoothed, smooth the weighting coefficients in the interval to be smoothed, and update the weighting coefficients in the interval to be smoothed to the smoothed weighting coefficients.

As mentioned above, since the optimization of the cost function is carried out independently at each frequency point, the weighting coefficients obtained by the optimization are discontinuous between frequency points; in particular, there is a series of obvious discontinuities between the low, middle and high frequency bands, which causes a certain degree of artifacts in the corresponding beam output signal. The main discontinuity points of the weighting coefficients of each channel in the frequency domain are determined first; a transition band covering several frequency points near each discontinuity point is then set, and the weighting coefficients of the frequency points covered by the transition band are smoothed, which reduces the artifacts in the beam output audio signal.

Specifically, the adjacent frequency points of a given frequency point are the frequency point immediately before it and the frequency point immediately after it. For example, for the 256th of 512 analysis frequency points, the adjacent frequency points are the 255th and the 257th frequency points, and the corresponding first-order difference average is the average of the first-order differences at the 255th and 257th frequency points. When the relative deviation between the first-order difference at the 256th frequency point and the corresponding first-order difference average is greater than the preset deviation threshold, the weighting coefficient changes too quickly at the 256th frequency point, that is, it is not smooth enough. The 255th to 257th frequency points are therefore taken as the interval to be smoothed, and the weighting coefficients in this interval are smoothed, for example by setting the weighting coefficient in accordance with the first-order difference average.
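The detection and smoothing steps for one channel might be sketched as below. The relative-deviation formula (absolute difference divided by the magnitude of the neighbour average, with a small floor `eps`) and the linear interpolation used for smoothing are assumptions chosen for illustration; the application names the steps but does not fix these details. Function names are hypothetical.

```python
def find_discontinuities(w, threshold, eps=1e-12):
    """Return indices k where the first-order difference of w deviates strongly
    from the average first-order difference of its two adjacent points."""
    diff = [None] + [w[k] - w[k - 1] for k in range(1, len(w))]
    points = []
    for k in range(2, len(w) - 1):
        neighbour_mean = (diff[k - 1] + diff[k + 1]) / 2
        relative = abs(diff[k] - neighbour_mean) / max(abs(neighbour_mean), eps)
        if relative > threshold:
            points.append(k)
    return points


def smooth_intervals(w, points, half_width=1):
    """Replace w on [k - half_width, k + half_width] by a straight line between
    the two coefficients just outside that interval (assumed smoothing rule)."""
    out = list(w)
    for k in points:
        lo, hi = k - half_width - 1, k + half_width + 1
        if lo < 0 or hi >= len(w):
            continue                      # not enough room for a transition band
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)
            out[i] = (1 - t) * w[lo] + t * w[hi]
    return out


# One channel's weighting coefficients with a step between two bands.
w = [0j, 0j, 0j, 1 + 0j, 1 + 0j, 1 + 0j]
points = find_discontinuities(w, threshold=2.0)
smoothed = smooth_intervals(w, points)
```

In this example the step at index 3 is flagged, and the three coefficients around it are replaced by a ramp from 0 to 1.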

In one embodiment, the array steering vector matrix is calculated based on hypothetical signals of all directions of arrival, and the method further includes:

Obtain the frequency range, beam main lobe angle range, beam attenuation angle range and weighting coefficient norm threshold of each frequency band, and obtain the expected main lobe response of the mid-frequency band and the main lobe deviation thresholds of the mid- and high-frequency bands;

The constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the low-band beam attenuation angle range;

The constraint conditions of the mid-frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the expected main lobe response of the mid-frequency band is smaller than the mid-band main lobe deviation threshold; the cost function of the mid-frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the mid-band beam attenuation angle range;

The constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the high-band beam output main lobe and the expected main lobe response of the high frequency band is smaller than the high-band main lobe deviation threshold; the cost function of the high frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the high-band beam attenuation angle range;

Wherein, the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency points, by performing a weighted summation of the array steering vector matrix according to the weighting coefficients;

The constraint conditions of the low, middle and high frequency bands also include: the output gain, after beamforming, of the hypothetical signal arriving from the direction of the main lobe center angle is 1.

As mentioned above, when the weighting coefficients are optimized band by band, the frequency range, beam main lobe angle range, beam attenuation angle range and weighting coefficient norm threshold of each frequency band are specified first, together with the expected main lobe response of the mid-frequency band and the main lobe deviation thresholds of the mid- and high-frequency bands; the specific constraint conditions and cost function of each frequency band are then specified.

Exemplarily, suppose the microphone array consists of 8 microphones in total and the number of analysis frequency points is 512. A frequency point to be optimized is selected first; the beam outputs for all directions of arrival are calculated from the array steering vector matrix of that frequency point and the weighting coefficients to be optimized; the cost function is then calculated according to its definition and optimized under the constraints. Repeating this for every frequency point finally yields an 8×512 weighting coefficient matrix, which contains the frequency-domain weighting coefficients of the 8 microphone channels at the 512 frequency points and is used for the weighted summation of the multi-channel time-frequency domain signals in beamforming.
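As a rough illustration of the shapes involved, the sketch below builds one array steering vector matrix per frequency point for 8 microphones and evaluates the beam output for every direction of arrival. Several things here are assumptions for the example and are not stated in this passage: a far-field plane-wave model with the angle measured from the array axis, a sound speed of 343 m/s, a 16 kHz sampling rate with bins spaced fs/1024 apart, the conjugate weighting convention w^H a, and microphone positions derived from the spacing example given later in this description (reading d0 as the spacing of the central pair and d1 to d3 as the successive outward spacings).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

# Assumed positions (m) for d0=30, d1=25, d2=35, d3=90 mm, centred on the array.
POSITIONS_M = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3

def steering_matrices(positions_m, freqs_hz, angles_deg):
    """Return A with shape (bins, mics, directions) under a far-field model."""
    x = np.asarray(positions_m, dtype=float)[None, :, None]
    f = np.asarray(freqs_hz, dtype=float)[:, None, None]
    cos_t = np.cos(np.deg2rad(np.asarray(angles_deg, dtype=float)))[None, None, :]
    return np.exp(-2j * np.pi * f * x * cos_t / SPEED_OF_SOUND)

def beam_output(w, A):
    """w: (mics, bins), A: (bins, mics, directions) -> w^H a, shape (bins, directions)."""
    return np.einsum("mk,kmq->kq", np.conj(w), A)

freqs = np.arange(512) * 16000.0 / 1024      # assumed bin frequencies
angles = np.arange(0, 181)                   # hypothetical directions of arrival, 1 degree grid
A = steering_matrices(POSITIONS_M, freqs, angles)
w = np.full((8, 512), 1.0 / 8, dtype=complex)  # placeholder 8x512 weights (uniform)
B = beam_output(w, A)
```

Here `w` is only a uniform placeholder with the 8×512 shape described above; in the method it would be replaced, bin by bin, with the optimized weighting coefficients.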

In one embodiment, different constraint conditions and cost functions are used when optimizing the frequency-domain beamforming weighting coefficients in the low, middle and high frequency bands. Exemplarily, the frequency ranges of the frequency bands are as shown in Table 1 above. Consider the actual usage scenario of the microphone array: in a conference room, the microphone array is usually placed directly in front of the user's seat, so when the user speaks toward the microphone the incident angle of the signal is 90°. However, since a meeting usually has more than one speaker and the speakers may be seated along both sides of a long rectangular table, the original audio signals collected by the microphone array usually arrive from directions within 60°~120°. Therefore, 60°~120° can be taken as the expected beam main lobe angle range D₂ of the mid-frequency band. Since it is difficult to form an obvious beam in the low frequency band, 50°~130° is taken as the expected beam main lobe angle range D₁ of the low frequency band. The expected beam main lobe angle range D₃ of the high frequency band will be given later. In every frequency band, the angles outside the expected beam main lobe angle range are designated as the beam attenuation angle range (C₁, C₂, C₃).

When optimizing the weighting coefficients, the norm of the weighting coefficients needs to be constrained to ensure a certain robustness. Since it is difficult to obtain an ideal beam pattern in the low frequency band, a reasonable design approach is to tighten the constraint gradually as the frequency increases. Therefore, the weighting coefficient norm thresholds (α₁, α₂, α₃) of the low, middle and high frequency bands are chosen as 1.5, 1.2 and 1.0 respectively.
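For the low frequency band, where the description lists only the norm bound and the unit gain toward the main lobe center as constraints, one well-known way to solve such a problem is diagonal loading: the loading level is increased until the weight norm falls below the threshold. The sketch below uses that technique purely as an illustration; the source does not say which solver is used. The names `norm_constrained_weights` and `output_power`, the 300 Hz test frequency, the microphone positions and the far-field steering model are assumptions for the example.

```python
import numpy as np

def norm_constrained_weights(A_atten, a0, alpha, iters=60):
    """Minimise mean |w^H a|^2 over the attenuation directions (columns of A_atten)
    subject to w^H a0 = 1 and ||w|| <= alpha, via bisection on the diagonal loading."""
    M = len(a0)
    R = A_atten @ A_atten.conj().T / A_atten.shape[1]

    def weights(mu):
        v = np.linalg.solve(R + mu * np.eye(M), a0)
        return v / np.vdot(a0, v)          # enforces w^H a0 = 1

    lo, hi = 1e-8, 1e8                     # loading range; norm shrinks as mu grows
    w = weights(lo)
    if np.linalg.norm(w) <= alpha:
        return w
    for _ in range(iters):
        mid = np.sqrt(lo * hi)             # bisection on a logarithmic scale
        if np.linalg.norm(weights(mid)) > alpha:
            lo = mid
        else:
            hi = mid
    return weights(hi)

def output_power(w, A):
    """Mean beamformer output power over the directions in A."""
    return float(np.mean(np.abs(np.conj(w) @ A) ** 2))

# Assumed geometry and frequency, for illustration only.
x = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3
f = 300.0

def steer(angles_deg):
    return np.exp(-2j * np.pi * f * np.outer(x, np.cos(np.deg2rad(angles_deg))) / 343.0)

atten = np.concatenate([np.arange(0, 50), np.arange(131, 181)])  # outside 50..130 degrees
A_atten = steer(atten)
a0 = steer([90.0])[:, 0]
w = norm_constrained_weights(A_atten, a0, alpha=1.5)
w_uniform = a0 / len(a0)
```

The sketch assumes the threshold is at least 1/√M (about 0.35 for 8 microphones), since that is the norm the weights approach under very heavy loading. A general-purpose convex solver would be needed once the main lobe deviation constraint of the middle and high bands is added.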

When optimizing the weighting coefficients, the beam main lobe deviation threshold constrains the deviation between the optimized beam main lobe response and the expected beam main lobe response. The beam main lobe deviation threshold β₂ of the mid-frequency band is specified as 0.7. The beam main lobe deviation threshold β₃ of the high frequency band will be given later. Since it is difficult to form a clear beam in the low frequency band, this constraint is not applied there.
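The three band-wise problems can be written compactly as below. This is only a notational sketch: the weight vector w at one frequency point, the steering vector a(θ), the main lobe center angle θ₀, the symbol y₂ for the mid-band expected main lobe response, and the choice of norm in the deviation constraint are assumptions introduced here, while Cᵢ, Dᵢ, αᵢ, βᵢ and y₃ follow the index parameters named in this description.

```latex
\begin{aligned}
\text{Low band:}\quad
 & \min_{\mathbf{w}} \sum_{\theta \in C_1} \bigl|\mathbf{w}^{H}\mathbf{a}(\theta)\bigr|^{2}
 && \text{s.t.}\ \ \|\mathbf{w}\| < \alpha_1,\quad \mathbf{w}^{H}\mathbf{a}(\theta_0) = 1 \\[4pt]
\text{Mid band:}\quad
 & \min_{\mathbf{w}} \sum_{\theta \in C_2} \bigl|\mathbf{w}^{H}\mathbf{a}(\theta)\bigr|^{2}
 && \text{s.t.}\ \ \|\mathbf{w}\| < \alpha_2,\quad
    \bigl\| \bigl[\mathbf{w}^{H}\mathbf{a}(\theta)\bigr]_{\theta \in D_2} - y_2 \bigr\| < \beta_2,\quad
    \mathbf{w}^{H}\mathbf{a}(\theta_0) = 1 \\[4pt]
\text{High band:}\quad
 & \min_{\mathbf{w}} \sum_{\theta \in C_3} \bigl|\mathbf{w}^{H}\mathbf{a}(\theta)\bigr|^{2}
 && \text{s.t.}\ \ \|\mathbf{w}\| < \alpha_3,\quad
    \bigl\| \bigl[\mathbf{w}^{H}\mathbf{a}(\theta)\bigr]_{\theta \in D_3} - y_3 \bigr\| < \beta_3,\quad
    \mathbf{w}^{H}\mathbf{a}(\theta_0) = 1
\end{aligned}
```

Each problem is solved separately at every frequency point of its band, which is why the resulting coefficients can be discontinuous across frequency.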

When optimizing the weighting coefficients in the mid-frequency band, the shape of the expected beam response main lobe must be given. Since the arrangement of the non-uniform linear microphone array is flexible and changeable, this embodiment, in order to provide a more general design method, does not give an analytical formula for the expected main lobe response directly. Instead, the expected main lobe response is specified as the beam pattern obtained by uniform weighting of the microphone array at a certain frequency, and this frequency is specified here as twice the lower limit frequency f_2L of the mid-frequency band.
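A minimal sketch of such a uniformly weighted reference pattern is given below. The function name `uniform_weight_pattern`, the far-field steering model, the sound speed, the microphone positions and the placeholder value used for f_2L are all assumptions for illustration; the actual band limits are those of Table 1.

```python
import numpy as np

def uniform_weight_pattern(positions_m, freq_hz, angles_deg, c=343.0):
    """Beam pattern of uniform weights 1/M at one frequency, one value per angle."""
    x = np.asarray(positions_m, dtype=float)
    cos_t = np.cos(np.deg2rad(np.asarray(angles_deg, dtype=float)))
    a = np.exp(-2j * np.pi * freq_hz * np.outer(x, cos_t) / c)  # (mics, directions)
    return a.mean(axis=0)  # equals w^H a for real weights w = 1/M

positions = np.array([-165, -75, -40, -15, 15, 40, 75, 165]) * 1e-3  # assumed geometry
f_2L = 1000.0                        # hypothetical placeholder, not a value from the source
main_lobe_angles = np.arange(60, 121)  # mid-band main lobe range, 60..120 degrees
expected_response = uniform_weight_pattern(positions, 2 * f_2L, main_lobe_angles)
```

Because the reference is generated from the array geometry itself, the same procedure applies unchanged to a different microphone arrangement.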

In one embodiment, the acquisition method of the high frequency band expected main lobe response is:

After the optimization of the mid-frequency band weighting coefficients is completed, the array steering vector matrix is weighted and summed according to the optimized weighting coefficients to obtain the beam main lobe shape, and this beam main lobe shape is used as the expected main lobe response of the high frequency band.

As mentioned above, in the high frequency band the signal wavelength becomes smaller than the minimum element spacing in the microphone array, so it is difficult to specify a concrete expected main lobe response before optimization. Therefore, the main lobe shape of the optimized mid-band beam output can be used as the expected main lobe response y₃ when the high-band weighting coefficients are optimized. Since the main lobe shape of the optimized mid-band beam output is not known in advance, the main lobe angle range D₃ of the high-band beam can be specified as the angle range corresponding to the -6 dB beamwidth of the optimized mid-band beam output main lobe.
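Reading the -6 dB beamwidth off a sampled beam pattern can be sketched as follows. The function name `minus_6db_range`, the 1° angle grid and the synthetic bell-shaped pattern used in the example are assumptions; in practice the input would be the optimized mid-band beam output.

```python
import numpy as np

def minus_6db_range(angles_deg, beam, center_deg=90.0):
    """Return (low, high) angles of the contiguous region around the centre
    whose level is within 6 dB of the level at the centre."""
    angles = np.asarray(angles_deg, dtype=float)
    level_db = 20.0 * np.log10(np.maximum(np.abs(beam), 1e-12))
    i0 = int(np.argmin(np.abs(angles - center_deg)))
    inside = level_db >= level_db[i0] - 6.0
    lo = hi = i0
    while lo > 0 and inside[lo - 1]:
        lo -= 1
    while hi < len(angles) - 1 and inside[hi + 1]:
        hi += 1
    return float(angles[lo]), float(angles[hi])

# Synthetic pattern whose level is -((angle - 90) / 10)^2 dB.
angles = np.arange(0, 181)
beam = 10.0 ** (-((angles - 90) / 10.0) ** 2 / 20.0)
d3 = minus_6db_range(angles, beam)
```

Walking outward from the center, rather than thresholding the whole pattern, keeps side lobes that happen to exceed the -6 dB level from being merged into the main lobe range.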

The above example selection of index parameters applies to the non-uniform linear microphone array with the following spacings: d0 = 30 mm, d1 = 25 mm, d2 = 35 mm, d3 = 90 mm. All index parameters are summarized in Table 2 below.

Table 2

The full-band beam pattern obtained with the optimized weighting coefficients is shown in Figure 6, and Figure 7 shows the full-band beam pattern obtained in the prior art. It can be seen that, with the audio signal processing method of the microphone array provided by this embodiment, within the main speech frequency band (0.5 kHz~6.0 kHz) the main lobe width of the beam pattern is approximately constant, the gain in the 90° direction is maintained at 0 dB, and signals from the 0°~60° and 120°~180° directions are strongly attenuated; that is, noise and interference are filtered out more effectively.

In summary, the audio signal processing method of the microphone array provided in the embodiments of the present application uses the norm of the weighting coefficients and the expected main lobe response as constraints, so that the output signal within the beam main lobe angle range deviates little from the preset beam main lobe response and the method is highly robust. Using the output signal power within the beam attenuation angle range as the cost function to be optimized suppresses incoming signals from directions outside the beam main lobe to the greatest extent. Designing different constraint conditions and cost functions for the low, middle and high frequency bands avoids the situation in which the optimization problem has no solution. Setting a transition band covering several frequency points near each discontinuity point of the weighting coefficients, and smoothing the weighting coefficients of the frequency points covered by the transition band, reduces the artificial noise in the beam output audio signal. The weighting coefficient design method of first optimizing each frequency band independently and then smoothing in the frequency domain yields a set of multi-channel weighting coefficients; spatially filtering the multi-channel audio signal collected by the non-uniform linear microphone array with these weighting coefficients improves the signal-to-noise ratio of the resulting target audio signal. In addition, the channels of the multi-channel signal used for the processing can be selected in different ways, so as to meet the requirements of other subsequent multi-channel audio processing algorithms: by replacing the real channel signals in a subsequent multi-channel algorithm with spatially filtered signals, the number of channels can be reduced and the signal quality improved.

Referring to FIG. 8, the present application also proposes an audio signal processing device for a microphone array, including:

The array steering vector group calculation unit 100 is used to calculate the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

The weighting coefficient solving unit 200 is used to select the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs, and to optimize the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is performed after the weighting coefficients have been optimized independently at each frequency point;

The signal extraction unit 300 is configured to extract the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and to perform a discrete Fourier transform on these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signals;

The spatial filtering unit 400 is configured to perform a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point using the weighting coefficients, to obtain a time-frequency domain beam output signal and complete spatial filtering;

The signal generation module 500 is configured to perform inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
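The chain formed by units 300 to 500 (transform, per-bin weighted summation, inverse transform) can be sketched as a short frame-based routine. The framing details are assumptions not given in this passage: a periodic Hann analysis window, 50% overlap with overlap-add reconstruction, a one-sided FFT for real signals, and the conjugate weighting convention w^H x. The function name `beamform` is hypothetical.

```python
import numpy as np

def beamform(x, w, n_fft=1024):
    """x: real signals, shape (channels, samples).
    w: complex weights, shape (channels, n_fft // 2 + 1).
    Returns the single-channel time-domain beam output."""
    hop = n_fft // 2
    # Periodic Hann window: shifted copies at 50% overlap sum to one.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_samples = x.shape[1]
    y = np.zeros(n_samples)
    for start in range(0, n_samples - n_fft + 1, hop):
        frame = x[:, start:start + n_fft] * win
        X = np.fft.rfft(frame, axis=1)              # multi-channel time-frequency frame
        Y = np.sum(np.conj(w) * X, axis=0)          # weighted summation per frequency bin
        y[start:start + n_fft] += np.fft.irfft(Y, n=n_fft)  # inverse transform, overlap-add
    return y

# Sanity check: identical channels and uniform weights should pass the signal through.
rng = np.random.default_rng(0)
s = rng.standard_normal(512)
x = np.tile(s, (4, 1))
w = np.full((4, 33), 0.25, dtype=complex)
y = beamform(x, w, n_fft=64)
```

With these settings the first and last half-frames are covered by only one window and are therefore attenuated; the interior samples reproduce the input when the weights sum to one in every bin.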

Selecting some channels in the microphone array in an asymmetric manner and designating them as the signal acquisition channels.

In an embodiment, the spatial domain filtering unit 400 is further configured to:

In one embodiment, the array steering vector matrix is calculated according to hypothetical signals from all directions of arrival, and the weighting coefficient solving unit 200 is further used for:

The constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the low-band beam attenuation angle range;

The constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the high-band beam output main lobe and the expected main lobe response of the high frequency band is smaller than the high-band main lobe deviation threshold; the cost function of the high frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the high-band beam attenuation angle range;

Wherein, the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-band beam output main lobe is obtained by performing a weighted summation of the array steering vector matrix according to the weighting coefficients optimized in the mid-frequency band, and the high-band beam output main lobe is obtained by performing a weighted summation of the array steering vector matrix according to the weighting coefficients optimized in the high frequency band;

Referring to FIG. 9, an embodiment of the present application also provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as the data of the audio signal processing method of the microphone array. The network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer program is executed by the processor, an audio signal processing method of a microphone array is implemented. The audio signal processing method of the microphone array is applied to a microphone array, wherein the microphone array includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the distances between adjacent microphones are unequal, wherein the greater the distance between an extended microphone and the central microphone pair, the greater the distance between that extended microphone and its adjacent microphone on the side near the center of the array. The method includes: calculating the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channels, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients, and frequency-domain smoothing is performed after the weighting coefficients have been optimized independently at each frequency point; extracting the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point using the weighting coefficients to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain the target audio signal.

After completing the optimization of the mid-frequency band weighting coefficients, the array steering vector matrix is weighted and summed according to the optimized weighting coefficients to obtain the beam main lobe shape, and the beam main lobe shape is used as the high-frequency band desired main lobe response.

An embodiment of the present application also provides a computer-readable storage medium, which may be a volatile storage medium or a non-volatile storage medium and on which a computer program is stored. When the computer program is executed by a processor, an audio signal processing method of a microphone array is implemented, the method being applied to a microphone array. The microphone array includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the distances between adjacent microphones are unequal, wherein the greater the distance between an extended microphone and the central microphone pair, the greater the distance between that extended microphone and its adjacent microphone on the side near the center of the array. The method includes: calculating the corresponding array steering vector group according to the structural parameters and signal acquisition channels of the microphone array, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points; selecting, according to the frequency band to which each frequency point belongs, the corresponding constraint conditions and cost function, and optimizing the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients; extracting the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and performing a discrete Fourier transform on them to obtain the corresponding multi-channel time-frequency domain signals; performing a weighted summation of the multi-channel time-frequency domain signals at each corresponding frequency point using the weighting coefficients to obtain a time-frequency domain beam output signal, thereby completing spatial filtering; and performing an inverse discrete Fourier transform on the time-frequency domain beam output signal to obtain the target audio signal.

The audio signal processing method of the microphone array implemented above uses the norm of the weighting coefficients and the expected main lobe response as constraints, so that the output signal within the beam main lobe angle range deviates little from the preset beam main lobe response and the method is highly robust. Using the output signal power within the beam attenuation angle range as the cost function to be optimized suppresses incoming signals from directions outside the beam main lobe to the greatest extent. Designing different constraint conditions and cost functions for the low, middle and high frequency bands avoids the situation in which the optimization problem has no solution. Setting a transition band covering several frequency points near each discontinuity point of the weighting coefficients, and smoothing the weighting coefficients of the frequency points covered by the transition band, reduces the artificial noise in the beam output audio signal. The weighting coefficient design method of first optimizing each frequency band independently and then smoothing in the frequency domain yields a set of multi-channel weighting coefficients; spatially filtering the multi-channel audio signal collected by the non-uniform linear microphone array with these weighting coefficients improves the signal-to-noise ratio of the resulting target audio signal. In addition, the channels of the multi-channel signal used for the processing can be selected in different ways, so as to meet the requirements of other subsequent multi-channel audio processing algorithms: by replacing the real channel signals in a subsequent multi-channel algorithm with spatially filtered signals, the number of channels can be reduced and the signal quality improved.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Claims

A non-uniform linear microphone array, which includes a central microphone pair and several extended microphones arranged symmetrically on both sides of the central microphone pair; the central microphone pair and the extended microphones are arranged on the same straight line, and the distances between adjacent microphones are unequal, wherein the greater the distance between an extended microphone and the central microphone pair, the greater the distance between that extended microphone and its adjacent microphone on the side near the center of the array.
A method for processing audio signals of a microphone array, applied to a microphone array as claimed in claim 1, wherein the method comprises:

Calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

According to the frequency band to which each frequency point belongs, select corresponding constraints and cost functions, and optimize and solve the weighting coefficients, wherein the cost function is calculated by the array steering vector matrix and the weighting coefficients;

Extract the time-domain audio signals corresponding to the signal acquisition channels from the multi-channel signal collected by the microphone array, and perform a discrete Fourier transform on these time-domain audio signals to obtain the corresponding multi-channel time-frequency domain signals;

Perform weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points respectively by the weighting coefficients to obtain time-frequency domain beam output signals, and complete spatial filtering;

Inverse discrete Fourier transform is performed on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
The audio signal processing method of the microphone array according to claim 2, wherein the specified mode of the signal acquisition channel is one of the following modes:

Selecting all channel signals in the microphone array and designating them as the signal acquisition channels;

In the microphone array, a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;

Selecting some channels in the microphone array in an asymmetric manner and designating them as the signal acquisition channels.
The audio signal processing method of a microphone array according to claim 2, wherein the frequency bands include a low frequency band, a middle frequency band and a high frequency band;

Wherein, the frequency band is obtained by dividing all processing frequency bands according to the signal sampling rate and the number of analysis frequency points; the low frequency band, middle frequency band and high frequency band respectively correspond to different constraint conditions and cost functions.
The audio signal processing method of a microphone array according to claim 2, wherein, after optimizing and solving the weighting coefficients, further comprising:

Calculate the first-order difference of the weighting coefficient of each channel at each frequency point, and the first-order difference average of the adjacent frequency points of each frequency point;

If the relative deviation between the first-order difference value at a certain frequency point and the corresponding first-order difference average value is greater than a preset deviation threshold, then use the frequency point as a discontinuity point of the weighting coefficient;

Select the discontinuity point and its adjacent frequency points as the interval to be smoothed, smooth the weighting coefficients within that interval, and replace them with the smoothed weighting coefficients.
The audio signal processing method of the microphone array according to claim 4, wherein the array steering vector matrix is calculated according to hypothetical signals from all directions of arrival, and the method further comprises:

Obtain the frequency range, beam main lobe angle range, beam attenuation angle range and weighting coefficient norm threshold of each frequency band, and obtain the expected main lobe response of the mid-frequency band and the main lobe deviation thresholds of the mid- and high-frequency bands;

The constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low-band weighting coefficient norm threshold; the cost function of the low frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the low-band beam attenuation angle range;

The constraint conditions of the mid-frequency band include: the norm of the weighting coefficients is less than the mid-band weighting coefficient norm threshold, and the deviation between the mid-band beam output main lobe and the expected main lobe response of the mid-frequency band is smaller than the mid-band main lobe deviation threshold; the cost function of the mid-frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the mid-band beam attenuation angle range;

The constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high-band weighting coefficient norm threshold, and the deviation between the high-band beam output main lobe and the expected main lobe response of the high frequency band is smaller than the high-band main lobe deviation threshold; the cost function of the high frequency band is: the output power, after beamforming, of the hypothetical signals from all incoming-wave directions within the high-band beam attenuation angle range;

Wherein, the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-band beam output main lobe and the high-band beam output main lobe are obtained, at the corresponding frequency points, by performing a weighted summation of the array steering vector matrix according to the weighting coefficients;

The constraint conditions of the low, middle and high frequency bands also include: the output gain, after beamforming, of the hypothetical signal arriving from the direction of the main lobe center angle is 1.
The audio signal processing method of the microphone array according to claim 5, wherein, the acquisition method of the expected main lobe response of the high frequency band is:

After completing the optimization of the mid-frequency band weighting coefficients, the array steering vector matrix is weighted and summed according to the optimized weighting coefficients to obtain the beam main lobe shape, and the beam main lobe shape is used as the high-frequency band desired main lobe response.
An audio signal processing device for a microphone array, wherein the device includes:

The array steering vector group calculation unit is used to calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

The weighting coefficient solving unit is used to select the corresponding constraint conditions and cost function according to the frequency band to which each frequency point belongs, and to optimize the weighting coefficients, wherein the cost function is calculated from the array steering vector matrix and the weighting coefficients;

A signal extraction unit, configured to extract a time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain a corresponding multi-channel time-frequency domain signal;

The spatial filtering unit is configured to perform weighted summation of the multi-channel time-frequency domain signals of corresponding frequency points through the weighting coefficients to obtain a time-frequency domain beam output signal and complete spatial filtering;

A signal generating module, configured to perform inverse discrete Fourier transform on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
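By way of non-limiting illustration only, the signal extraction, spatial filtering and signal generation stages may be sketched for a single frame as follows. The function name `beamform_frame`, the frame length, the channel count and the uniform placeholder weights are assumptions made for this sketch; windowing and overlap-add between frames are omitted.

```python
import numpy as np


def beamform_frame(frame, weights):
    """Spatially filter one frame.

    frame:   real array, shape (num_channels, frame_len)
    weights: complex array, shape (frame_len // 2 + 1, num_channels)
    """
    spectrum = np.fft.rfft(frame, axis=1)                 # DFT per channel
    beam = np.sum(np.conj(weights).T * spectrum, axis=0)  # weighted sum per bin
    return np.fft.irfft(beam, n=frame.shape[1])           # back to time domain


rng = np.random.default_rng(0)
source = rng.standard_normal(64)
frame = np.tile(source, (4, 1))                  # same signal on 4 channels
weights = np.full((33, 4), 0.25, dtype=complex)  # uniform placeholder weights
output = beamform_frame(frame, weights)
```

With identical channels and uniform weights summing to 1 at every frequency point, the beam output reproduces the source frame, which is a convenient sanity check before substituting optimized weighting coefficients.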
A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that, when the processor executes the computer program, an audio signal processing method for a microphone array is implemented, the method being applied to the microphone array according to claim 1; wherein the audio signal processing method for the microphone array comprises:

Calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

According to the frequency band to which each frequency point belongs, select corresponding constraints and cost functions, and optimize and solve the weighting coefficients, wherein the cost function is calculated by the array steering vector matrix and the weighting coefficients;

Extract the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal;

Perform weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points respectively by the weighting coefficients to obtain time-frequency domain beam output signals, and complete spatial filtering;

Inverse discrete Fourier transform is performed on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
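By way of non-limiting illustration only, the first step above (calculating an array steering vector group holding one steering matrix per frequency point) may be sketched as follows. The far-field plane-wave model, the speed of sound, the microphone coordinates, the 5 degree angle grid and the function name `steering_vector_group` are assumptions made for this sketch.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed


def steering_vector_group(mic_x, channels, sample_rate_hz, fft_size, angles_deg):
    """Return an array of shape (num_bins, num_channels, num_angles).

    Slice [k] is the steering matrix for frequency point k; mic_x holds
    microphone coordinates in metres along the array axis.
    """
    positions = np.asarray(mic_x, dtype=float)[list(channels)]
    freqs = np.fft.rfftfreq(fft_size, d=1.0 / sample_rate_hz)
    cos_theta = np.cos(np.deg2rad(np.asarray(angles_deg, dtype=float)))
    delays = np.outer(positions, cos_theta) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freqs[:, None, None] * delays[None, :, :])


mic_x = [-0.15, -0.07, -0.02, 0.02, 0.07, 0.15]  # hypothetical layout
group = steering_vector_group(mic_x, range(6), 16000, 512, range(0, 181, 5))
```

Here `channels` plays the role of the signal acquisition channel selection, so that only the coordinates of the selected microphones enter the steering matrices.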
The computer device according to claim 9, wherein the signal acquisition channel is designated in one of the following manners:

Selecting all channel signals in the microphone array and designating them as the signal acquisition channels;

In the microphone array, a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;

Selecting some channels in the microphone array in an asymmetric manner and designating them as the signal acquisition channels.
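By way of non-limiting illustration only, the three designation manners may be sketched as follows. The function name `select_channels`, the mode strings and the "pair rank" convention (rank 0 being the central microphone pair) are assumptions made for this sketch.

```python
def select_channels(num_mics, mode, pair_ranks=(), indices=()):
    """Return the sorted channel indices used as signal acquisition channels.

    mode "all":        every channel.
    mode "symmetric":  mirrored pairs; rank 0 is the central pair, rank 1 the
                       next pair outward, and so on.
    mode "asymmetric": the caller-supplied indices, used as given.
    """
    if num_mics < 2 or num_mics % 2:
        raise ValueError("expected an even number of microphones")
    if mode == "all":
        return list(range(num_mics))
    if mode == "symmetric":
        left, right = num_mics // 2 - 1, num_mics // 2
        chosen = set()
        for rank in pair_ranks:
            if not 0 <= rank <= left:
                raise ValueError(f"pair rank {rank} is outside the array")
            chosen.update((left - rank, right + rank))
        return sorted(chosen)
    if mode == "asymmetric":
        if any(not 0 <= i < num_mics for i in indices):
            raise ValueError("channel index outside the array")
        return sorted(set(indices))
    raise ValueError(f"unknown mode: {mode}")
```

For an eight-microphone array, for example, `select_channels(8, "symmetric", pair_ranks=[0, 2])` selects the central pair together with the third pair counted outward from the center.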
The computer device according to claim 9, wherein the frequency bands include a low frequency band, a mid-frequency band and a high frequency band;

Wherein the frequency bands are obtained by dividing the entire processing frequency range according to the signal sampling rate and the number of analysis frequency points; the low frequency band, the mid-frequency band and the high frequency band respectively correspond to different constraint conditions and cost functions.
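By way of non-limiting illustration only, the band division may be sketched as follows. The function name `split_bands`, the band edges of 1000 Hz and 4000 Hz, and the interpretation of the number of analysis frequency points as a transform length `fft_size` are assumptions made for this sketch.

```python
def split_bands(sample_rate_hz, fft_size, low_edge_hz, high_edge_hz):
    """Assign each analysis bin 0..fft_size//2 to "low", "mid" or "high".

    Bin k is assumed to sit at k * sample_rate_hz / fft_size hertz.
    """
    bands = {"low": [], "mid": [], "high": []}
    for k in range(fft_size // 2 + 1):
        freq = k * sample_rate_hz / fft_size
        if freq < low_edge_hz:
            name = "low"
        elif freq < high_edge_hz:
            name = "mid"
        else:
            name = "high"
        bands[name].append(k)
    return bands


# 16 kHz sampling and 512 points give a 31.25 Hz bin spacing.
bands = split_bands(16000, 512, low_edge_hz=1000.0, high_edge_hz=4000.0)
```

Each frequency point can then be routed to the constraint conditions and cost function of the band whose list contains its index.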
The computer device according to claim 9, wherein, after the weighting coefficients are optimized and solved, the method further comprises:

Calculate the first-order difference of the weighting coefficient of each channel at each frequency point, and the average of the first-order differences at the frequency points adjacent to each frequency point;

If the relative deviation between the first-order difference at a certain frequency point and the corresponding first-order difference average is greater than a preset deviation threshold, the frequency point is taken as a discontinuity point of the weighting coefficients;

Select the discontinuity point and the frequency points adjacent to the discontinuity point as the interval to be smoothed, smooth the weighting coefficients within the interval to be smoothed, and update the weighting coefficients within the interval to be smoothed to the smoothed weighting coefficients.
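By way of non-limiting illustration only, the discontinuity detection and smoothing may be sketched for one channel as follows. The function name `smooth_discontinuities`, the neighbourhood `radius`, the threshold value and the use of linear interpolation as the smoothing operation are assumptions made for this sketch; the claims do not fix a particular smoothing method.

```python
def smooth_discontinuities(weights, threshold=2.0, radius=1, eps=1e-12):
    """Detect and smooth jumps in one channel's weights across frequency.

    weights: sequence of (complex) weights indexed by frequency point.
    Returns (smoothed_weights, discontinuity_points).
    """
    n = len(weights)
    # diffs[k - 1] is the first-order difference at frequency point k.
    diffs = [weights[k] - weights[k - 1] for k in range(1, n)]
    jumps = []
    for k in range(1, n):
        lo, hi = max(1, k - radius), min(n - 1, k + radius)
        neighbours = [diffs[j - 1] for j in range(lo, hi + 1) if j != k]
        if not neighbours:
            continue
        mean = sum(neighbours) / len(neighbours)
        if abs(diffs[k - 1] - mean) / (abs(mean) + eps) > threshold:
            jumps.append(k)

    # Merge each discontinuity and its neighbours into intervals to smooth.
    intervals = []
    for k in jumps:
        lo, hi = max(0, k - radius), min(n - 1, k + radius)
        if intervals and lo <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], hi)
        else:
            intervals.append([lo, hi])

    smoothed = list(weights)
    for lo, hi in intervals:
        if hi <= lo:
            continue
        start, end = weights[lo], weights[hi]
        for k in range(lo, hi + 1):  # linear interpolation (assumed)
            smoothed[k] = start + (end - start) * (k - lo) / (hi - lo)
    return smoothed, jumps


example = [1 + 0j, 2, 3, 10, 5, 6, 7]  # one outlier at frequency point 3
smoothed, jumps = smooth_discontinuities(example)
```

In this example the outlier produces two large first-order differences, so frequency points 3 and 4 are flagged and the interval from point 2 to point 5 is replaced by a straight line between its end values.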
The computer device according to claim 11, wherein the array steering vector matrix is calculated based on hypothetical signals of all directions of arrival, the method further comprising:

Obtain the frequency range, beam main lobe angle range, beam attenuation angle range and weighting coefficient norm threshold of each frequency band, and obtain the expected main lobe response of the mid-frequency band and the main lobe deviation thresholds of the mid-frequency band and the high frequency band;

The constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low frequency band threshold of the weighting coefficient norm; the cost function of the low frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the low frequency band;

The constraint conditions of the mid-frequency band include: the norm of the weighting coefficients is less than the mid-frequency band threshold of the weighting coefficient norm, and the deviation between the mid-frequency band beam output main lobe and the expected main lobe response of the mid-frequency band is less than the mid-frequency band threshold of the main lobe deviation; the cost function of the mid-frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the mid-frequency band;

The constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high frequency band threshold of the weighting coefficient norm, and the deviation between the high frequency band beam output main lobe and the expected main lobe response of the high frequency band is less than the high frequency band threshold of the main lobe deviation; the cost function of the high frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the high frequency band;

Wherein the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-frequency band beam output main lobe and the high frequency band beam output main lobe are obtained, at the corresponding frequency points, by performing a weighted summation on the array steering vector matrix according to the weighting coefficients;

The constraint conditions corresponding to the low frequency band, the mid-frequency band and the high frequency band further include: the output gain, after beamforming, of the hypothetical signal arriving from the angular direction of the main lobe center is 1.
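By way of non-limiting illustration only, the cost function and constraint conditions at a single frequency point may be evaluated for a candidate weight vector as sketched below. The function name `evaluate_candidate`, the Euclidean norm used for both the weight norm and the main lobe deviation, the comparison of magnitudes only, and the two-microphone test values are assumptions made for this sketch; the sketch checks a candidate and does not itself perform the optimization.

```python
import numpy as np


def evaluate_candidate(weights, steer_stop, steer_main, steer_center,
                       norm_limit, expected_main=None, deviation_limit=None):
    """Return (cost, feasible) for one frequency point.

    steer_stop:   steering matrix over the beam attenuation angle range
    steer_main:   steering matrix over the beam main lobe angle range
    steer_center: steering vector of the main lobe center direction
    """
    # Cost: total output power of hypothetical signals from the attenuation range.
    cost = float(np.sum(np.abs(np.conj(weights) @ steer_stop) ** 2))
    # Shared constraint: unit output gain toward the main lobe center.
    unit_gain = np.isclose(np.conj(weights) @ steer_center, 1.0)
    # Shared constraint: bounded weight norm.
    norm_ok = np.linalg.norm(weights) < norm_limit
    # Mid and high bands only: main lobe close to the expected response.
    shape_ok = True
    if expected_main is not None:
        main_lobe = np.abs(np.conj(weights) @ steer_main)
        shape_ok = np.linalg.norm(main_lobe - expected_main) < deviation_limit
    return cost, bool(unit_gain and norm_ok and shape_ok)


weights = np.array([0.5, 0.5], dtype=complex)
steer_center = np.array([1.0, 1.0], dtype=complex)     # main lobe center
steer_main = steer_center[:, None]                     # one main lobe angle
steer_stop = np.array([[1.0], [-1.0]], dtype=complex)  # one attenuation angle
cost, feasible = evaluate_candidate(
    weights, steer_stop, steer_main, steer_center,
    norm_limit=1.0, expected_main=np.array([1.0]), deviation_limit=0.1,
)
```

For the low frequency band, `expected_main` would be left as `None`, so that only the norm constraint and the unit-gain constraint are checked.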
The computer device according to claim 12, wherein the expected main lobe response of the high frequency band is obtained as follows:

After the optimization of the mid-frequency band weighting coefficients is completed, a weighted summation is performed on the array steering vector matrix according to the optimized weighting coefficients to obtain the beam main lobe shape, and the beam main lobe shape is used as the expected main lobe response of the high frequency band.
A computer-readable storage medium, on which a computer program is stored, characterized in that, when the computer program is executed by a processor, an audio signal processing method for a microphone array is implemented, the method being applied to the microphone array according to claim 1; wherein the audio signal processing method for the microphone array comprises:

Calculate the corresponding array steering vector group according to the structural parameters of the microphone array and the signal acquisition channel, wherein the array steering vector group includes several array steering vector matrices corresponding to different frequency points;

According to the frequency band to which each frequency point belongs, select corresponding constraints and cost functions, and optimize and solve the weighting coefficients, wherein the cost function is calculated by the array steering vector matrix and the weighting coefficients;

Extract the time-domain audio signal corresponding to the signal acquisition channel from the multi-channel signal collected by the microphone array, and perform discrete Fourier transform on the time-domain audio signal corresponding to the signal acquisition channel to obtain the corresponding multi-channel time-frequency domain signal;

Perform weighted summation on the multi-channel time-frequency domain signals of corresponding frequency points respectively by the weighting coefficients to obtain time-frequency domain beam output signals, and complete spatial filtering;

Inverse discrete Fourier transform is performed on the time-frequency domain beam output signal to calculate and obtain a target audio signal.
The computer-readable storage medium according to claim 15, wherein the signal acquisition channel is designated in one of the following manners:

Selecting all channel signals in the microphone array and designating them as the signal acquisition channels;

In the microphone array, a part of channels is symmetrically selected with the center microphone pair as the center, and designated as the signal acquisition channel;

Selecting some channels in the microphone array in an asymmetric manner and designating them as the signal acquisition channels.
The computer-readable storage medium according to claim 15, wherein the frequency bands include a low frequency band, a mid-frequency band and a high frequency band;

Wherein the frequency bands are obtained by dividing the entire processing frequency range according to the signal sampling rate and the number of analysis frequency points; the low frequency band, the mid-frequency band and the high frequency band respectively correspond to different constraint conditions and cost functions.
The computer-readable storage medium according to claim 15, wherein, after the weighting coefficients are optimized and solved, the method further comprises:

Calculate the first-order difference of the weighting coefficient of each channel at each frequency point, and the average of the first-order differences at the frequency points adjacent to each frequency point;

If the relative deviation between the first-order difference at a certain frequency point and the corresponding first-order difference average is greater than a preset deviation threshold, the frequency point is taken as a discontinuity point of the weighting coefficients;

Select the discontinuity point and the frequency points adjacent to the discontinuity point as the interval to be smoothed, smooth the weighting coefficients within the interval to be smoothed, and update the weighting coefficients within the interval to be smoothed to the smoothed weighting coefficients.
The computer-readable storage medium according to claim 17, wherein the array steering vector matrix is calculated based on hypothetical signals of all directions of arrival, the method further comprising:

Obtain the frequency range, beam main lobe angle range, beam attenuation angle range and weighting coefficient norm threshold of each frequency band, and obtain the expected main lobe response of the mid-frequency band and the main lobe deviation thresholds of the mid-frequency band and the high frequency band;

The constraint conditions of the low frequency band include: the norm of the weighting coefficients is less than the low frequency band threshold of the weighting coefficient norm; the cost function of the low frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the low frequency band;

The constraint conditions of the mid-frequency band include: the norm of the weighting coefficients is less than the mid-frequency band threshold of the weighting coefficient norm, and the deviation between the mid-frequency band beam output main lobe and the expected main lobe response of the mid-frequency band is less than the mid-frequency band threshold of the main lobe deviation; the cost function of the mid-frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the mid-frequency band;

The constraint conditions of the high frequency band include: the norm of the weighting coefficients is less than the high frequency band threshold of the weighting coefficient norm, and the deviation between the high frequency band beam output main lobe and the expected main lobe response of the high frequency band is less than the high frequency band threshold of the main lobe deviation; the cost function of the high frequency band is: the output power, after beamforming, of the hypothetical signals from all directions of arrival within the beam attenuation angle range of the high frequency band;

Wherein the output power is calculated from the array steering vector matrix and the weighting coefficients; the mid-frequency band beam output main lobe and the high frequency band beam output main lobe are obtained, at the corresponding frequency points, by performing a weighted summation on the array steering vector matrix according to the weighting coefficients;

The constraint conditions corresponding to the low frequency band, the mid-frequency band and the high frequency band further include: the output gain, after beamforming, of the hypothetical signal arriving from the angular direction of the main lobe center is 1.
The computer-readable storage medium according to claim 18, wherein the expected main lobe response of the high frequency band is obtained as follows:

After the optimization of the mid-frequency band weighting coefficients is completed, a weighted summation is performed on the array steering vector matrix according to the optimized weighting coefficients to obtain the beam main lobe shape, and the beam main lobe shape is used as the expected main lobe response of the high frequency band.