WO2011153904A1

WO2011153904A1 - Speech signal processing method and device based on microphone array

Info

Publication number: WO2011153904A1
Application number: PCT/CN2011/074794
Authority: WO
Inventors: 何宏森; 黄志宏; 邱小军; 袁浩
Original assignee: 中兴通讯股份有限公司
Priority date: 2010-06-08
Filing date: 2011-05-27
Publication date: 2011-12-15
Also published as: CN101867853A; CN101867853B

Abstract

The present invention discloses a speech signal processing method based on a microphone array, the microphone array being composed of two or more directional microphones. The method comprises the following steps: determining the energy values of the speech signals of the same frame received by each directional microphone; determining an adjustment parameter for each speech signal of the same frame according to the energy values; determining the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal, multiplying each sampling point signal in each speech signal by its weight, accumulating the products of the corresponding sampling point signals of all speech signals, and outputting the accumulated sampling point signals in sequence. The present invention also discloses a speech signal processing device based on the microphone array. The invention is computationally simple, requires no complex calculation or circuitry, and provides good reverberation resistance and directional pickup.

Description

Voice signal processing method and device based on microphone array

The present invention relates to voice signal processing technologies, and in particular, to a voice signal processing method and apparatus based on a microphone array.

Background Art

At conference venues, interference and noise sources such as reverberation corrupt the speech signal and can dramatically degrade the performance of a speech processing system, so speech enhancement techniques are important. Multi-channel speech enhancement algorithms based on microphone arrays combine the spatial and temporal information of the signal and exploit the differences between noise and speech to remove noise. In recent years this has become an important technology underpinning multimedia conferencing, communication, voice control and other systems. Sound quality and performance strongly affect the overall effect and market competitiveness of an audio conferencing system. Noise cancellation is therefore often achieved with microphone array technology, which frees the participants of an audio conferencing system from handheld and fixed, pointed microphones and greatly improves the practicality of the system. For speech signal processing, the speech fed to the encoder should be of the best possible quality, for example with low reverberation and low noise, and the microphone array is designed to ensure that the speech signal has low reverberation and low noise.

Chinese Patent Application Publication No. CN101496417A discloses a "voice conference system", the disclosure of which is hereby incorporated by reference. ... a signal, and thereafter the signal level of the voice beam signal corresponding to the direction of arrival of the voice becomes higher; the sound collecting section selects the voice beam signals whose signal level exceeds a set threshold and sends them to the communication section. In this technical solution, more than one voice beam signal may exceed the threshold, which increases reverberation in a small room and lowers sound clarity.

U.S. Patent Application Publication No. US20050195988A1, published on September 8, 2005, discloses a "System and method for beamforming using a microphone array". The essence of that solution is the design of a beamformer: parameter information describing the characteristics and structure of the microphone array is first used to compute a frequency-domain weight matrix, and one or more noise models, automatically generated or computed for the surroundings of the array, are combined with it to optimize the design of the array's fixed beams. Then, when the audio signals received by the array are focused in the frequency domain, the weight matrix is used to apply frequency-domain weighting to the output of each microphone in the array. Because this method must compute the weight matrix in the frequency domain from the characteristics and structure of the array in order to form a beam, it increases system complexity and development difficulty and reduces system reliability.

Summary of the Invention

In view of the above, the main object of the present invention is to provide a method and apparatus for processing a voice signal based on a microphone array, in which an array of strongly directional microphones amplifies the voice signal of the microphone closest to the speaker and thereby dynamically tracks the speaker.

In order to achieve the above object, the technical solution of the present invention is achieved as follows:

A voice signal processing method based on a microphone array, the microphone array being composed of two or more directional microphones; the method comprising:

Determining an energy value of a speech signal of the same frame received by each directional microphone;

Determining, according to the energy value, an adjustment parameter of each voice signal of the same frame;

Determining the weight of each sampling point signal in each speech signal according to the adjustment parameter of that speech signal, multiplying each sampling point signal in each speech signal by its weight, accumulating the products of the corresponding sampling point signals of all speech signals, and outputting the accumulated sampling point signals in sequence.

Preferably, determining the adjustment parameter of each voice signal of the same frame according to the energy value comprises:

Dividing the energy value of each speech signal of the same frame by the maximum energy value, and applying exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.

Preferably, applying exponential adjustment to each quotient to obtain the adjustment parameter of each voice signal comprises:

The E-th power of each quotient is used as an adjustment parameter of each speech signal; wherein E is a positive number greater than or equal to 2 and less than or equal to 10.

Preferably, the weight of each sampling point signal in the voice signal is determined from the adjustment parameter of each voice signal according to the following formula:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame.

Preferably, the weight of each sampling point signal in the voice signal is determined from the adjustment parameter of each voice signal as follows:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the initial weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame;

the initial weight $w_i(n)$ is then normalized as follows to give the final weight $\hat{w}_i(n)$ of the n-th sampling point signal in the current speech signal frame of microphone i:

$$\hat{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)}$$

where max( ) denotes the maximum value.

Preferably, the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 4 to 16.

A voice signal processing device based on a microphone array, the microphone array being composed of two or more directional microphones; the device comprising a first determining unit, a second determining unit, a calculating unit and an output unit, wherein:

a first determining unit, configured to determine an energy value of a voice signal of the same frame received by each directional microphone;

a second determining unit, configured to determine, according to the energy value, an adjustment parameter of each voice signal of the same frame;

a calculating unit, configured to determine, according to an adjustment parameter of each voice signal, a weight of each sampling point signal in the voice signal, multiply each sampling point signal in each voice signal by a respective weight, and corresponding sampling points of each voice signal The product value of the signal is accumulated;

The output unit is configured to sequentially output the accumulated sampling point signals.

Preferably, the second determining unit is further configured to divide the energy value of each voice signal of the same frame by the maximum energy value, and to apply exponential adjustment to each quotient to obtain the adjustment parameter of each voice signal.

Preferably, the second determining unit further uses the E-th power of each quotient as an adjustment parameter of each voice signal; wherein, E is a positive number greater than or equal to 2 and less than or equal to 10.

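Preferably, the calculating unit is further configured to calculate the weight of each sampling point signal in the voice signal according to the following formula: $w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame.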

Preferably, the calculating unit further calculates the weight of each sampling point signal in the voice signal as follows:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the initial weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame;

the final weight $\hat{w}_i(n)$ of the n-th sampling point signal is then obtained as

$$\hat{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)}$$

where max( ) denotes the maximum value.

Preferably, the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.

In the present invention, N strongly directional microphones form a circular array whose pickup covers all 360 degrees. First, the energy of the speech signal received by each microphone in the array is computed, and this energy information is used to determine the adjustment parameter of the current speech frame received by each microphone. The adjustment parameter is used to calculate the weight of each sample point signal of the current speech frame, each weight is multiplied by the corresponding sample point signal, and the products at the same sample position are accumulated across channels and output in sample order. The invention uses the energy of the speech signal received by each microphone in the array to determine the adjustment parameter of each speech signal, and smooths the weight of each sample point signal with a forgetting factor so that the output speech signal is more consistent. The calculation is simple, requires no complicated computation or circuitry, and provides good anti-reverberation and directional pickup performance.

Brief Description of the Drawings

FIG. 1 is a flow chart of the microphone-array-based voice signal processing method of the present invention;

FIG. 2 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the microphone array when two sound sources in a reverberation chamber alternate;

FIG. 3 is a schematic diagram of the average weight of the speech frames of each channel in the output signal of the microphone array when two sound sources in a reverberation chamber alternate;

FIG. 4 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the microphone array when two sound sources in a reverberation chamber sound simultaneously;

FIG. 5 is a schematic diagram of the average weight of the speech frames of each channel in the output signal of the microphone array when two sound sources in a reverberation chamber sound simultaneously;

FIG. 6 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the microphone array when two sound sources in an ordinary room alternate;

FIG. 7 is a schematic diagram of the average weight of the speech frames of each channel in the output signal of the microphone array when two sound sources in an ordinary room alternate;

FIG. 8 is a schematic diagram of the normalized energy of the speech frames picked up by each microphone in the microphone array when two sound sources in an ordinary room sound simultaneously;

FIG. 9 is a schematic diagram of the average weight of the speech frames of each channel in the output signal of the microphone array when two sound sources in an ordinary room sound simultaneously;

FIG. 10 is a schematic structural diagram of the microphone-array-based voice signal processing apparatus of the present invention.

Detailed Description

The basic idea of the present invention is to form a circular array from N strongly directional microphones so that the pickup of the array covers all 360 degrees; to compute the energy of the signal picked up by each microphone; and, by comparing these energies, to keep the amplitude of the speech signal of the maximum-energy channel unchanged while attenuating the speech signals of the other channels, the degree of attenuation being controlled by the adjustment parameter. In addition, so that the speech remains smooth and natural, without switching noise, when the output switches between channels on the basis of the energy comparison, a smoothing mechanism, the forgetting factor, is introduced: the weight at the current sampling point is combined with the weight at the previous sampling point, so the switch happens gradually.

The present invention will be further described in detail below with reference to the accompanying drawings.

In the method of the present invention, all microphones in the array are strongly directional rather than omnidirectional. A strongly directional microphone picks up voice signals mainly from the direction in which it points, which effectively reduces the reverberation entering each microphone. The invention exploits this directional pickup: the energy of the same speech frame picked up by each microphone is used to determine the weight of each sample signal in that frame, so that a better voice signal is output. The microphone array of the present invention uses a circular or spherical layout to collect speech from all directions. The number of strongly directional microphones in the array is generally 3 to 16; they are evenly distributed on the chosen circle or sphere so that each direction has a corresponding microphone. The radius of the circle or sphere is generally 3 to 20 cm, and the diaphragm of each microphone faces outward along the radial direction of the circle or sphere.
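As a rough illustration of this layout, the sketch below computes microphone positions for a circular array with diaphragms facing radially outward. It assumes microphone 0 sits at angle 0 and the others follow counter-clockwise; the function name `circular_array_layout` is hypothetical, and only the 3 to 16 microphone count and the 3 to 20 cm radius come from the text.

```python
import math


def circular_array_layout(num_mics, radius_m):
    """Return (x, y, facing_deg) for num_mics microphones evenly spaced on a circle.

    Assumption: microphone 0 is placed at angle 0 and the rest follow
    counter-clockwise. Each diaphragm faces radially outward, so its facing
    direction equals its angular position on the circle.
    """
    layout = []
    for i in range(num_mics):
        theta = 2.0 * math.pi * i / num_mics  # angular position of microphone i
        x = radius_m * math.cos(theta)
        y = radius_m * math.sin(theta)
        layout.append((x, y, math.degrees(theta)))
    return layout


# Four microphones on a 5 cm circle, matching the example given later in the text.
for x, y, facing in circular_array_layout(4, 0.05):
    print(f"x={x:+.3f} m  y={y:+.3f} m  facing {facing:5.1f} deg")
```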

For this microphone array, the k-th frame received by the i-th (i = 1, 2, ..., N) microphone, with a frame length of L samples, is given by equation (1):

$$x_i(n) = x_i\big((k-1)L + j\big), \quad j = 1, 2, \ldots, L \qquad (1)$$
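Equation (1) indexes samples from 1, while Python lists index from 0. The sketch below, with hypothetical helper names, extracts frame k of one channel and converts a frame length in milliseconds into L samples; the sampling rate passed in is an assumption, since the description does not state one.

```python
def frame_length_samples(frame_ms, sample_rate_hz):
    """Convert a frame length in milliseconds to L, the number of samples per frame."""
    return int(round(frame_ms * sample_rate_hz / 1000.0))


def get_frame(signal, k, frame_len):
    """Return frame k (1-based, as in equation (1)) of one channel.

    Sample number (k-1)L + j with j = 1..L is stored at 0-based list index
    (k-1)L + j - 1, so the frame is the slice [(k-1)L, kL).
    """
    start = (k - 1) * frame_len
    frame = signal[start:start + frame_len]
    if len(frame) != frame_len:
        raise ValueError("frame k extends past the end of the signal")
    return frame


L = frame_length_samples(400, 16000)     # 400 ms at an assumed 16 kHz sampling rate
print(L)                                 # 6400
print(get_frame(list(range(10)), 2, 4))  # [4, 5, 6, 7]
```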

FIG. 1 is a flowchart of the microphone-array-based voice signal processing method of the present invention. As shown in FIG. 1, the method includes the following steps:

Step 101: Calculate the energy of the k-th frame signal received by the i-th (i = 1, 2, ..., N) microphone. Because the microphone facing the sound source picks up a relatively strong speech signal, the energy of the speech signal can serve as a preliminary indication of the direction of the sound source; the computed energy is also the basis for the weights used in processing, as described in the following steps. The energy of the k-th frame signal received by the i-th microphone is given by equation (2):

$$E_i(k) = \sum_{j=1}^{L} x_i^2\big((k-1)L + j\big) \qquad (2)$$

In the present invention, the length of the speech frame of each channel used for calculating energy can be 400 ms, and the system response time for adaptive switching between channels is also 400 ms. The frame length is determined by the processing speed of the processor; other lengths, such as 450 ms or 500 ms, can also be used.
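Equation (2) is read here as the sum of squared samples in the frame; because equation (3) divides by the largest frame energy, scaling by 1/L (a mean instead of a sum) would give the same normalized values. A minimal sketch, with a hypothetical function name and toy data:

```python
def frame_energy(signal, k, frame_len):
    """E_i(k): sum of squared samples of frame k (1-based) of one channel."""
    start = (k - 1) * frame_len
    return sum(x * x for x in signal[start:start + frame_len])


# Energies of frame 1 for a toy 3-channel recording with 3-sample frames.
channels = [[0.5, -0.5, 0.5], [1.0, -1.0, 1.0], [0.0, 0.1, 0.0]]
energies = [frame_energy(ch, 1, 3) for ch in channels]
print(energies)
```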

Step 102: Normalize the energy values obtained from equation (2) by the maximum frame energy over the N channels. Normalization converts the frame energy of each channel to a value between 0 and 1 for subsequent processing, as shown in equation (3), where $e_i(k)$ is the normalized value of $E_i(k)$:

$$e_i(k) = \frac{E_i(k)}{\max\big(E_1(k), E_2(k), \ldots, E_N(k)\big)} \qquad (3)$$

where max( ) denotes the maximum value.
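Step 102 amounts to dividing every channel's frame energy by the largest one. The sketch below (hypothetical function name) adds one assumption the text does not cover: if every channel is silent, so the maximum energy is 0, all channels are treated equally with a value of 1.

```python
def normalize_energies(energies):
    """e_i(k) = E_i(k) / max_j E_j(k), equation (3); results lie in [0, 1]."""
    peak = max(energies)
    if peak <= 0.0:
        # Assumed handling of an all-silent frame: favour no channel.
        return [1.0] * len(energies)
    return [e / peak for e in energies]


print(normalize_energies([2.0, 8.0, 4.0, 1.0]))  # [0.25, 1.0, 0.5, 0.125]
```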

Step 103: Calculate the adjustment parameter from the normalized energy of the k-th frame signal received by the i-th (i = 1, 2, ..., N) microphone. The adjustment parameter makes the speech signal on a high-energy channel relatively larger and the speech signal on a low-energy channel relatively smaller, thereby widening the difference between high-energy and low-energy speech signals. This emphasizes the signal from the direction of the sound source and suppresses signals from other directions, making the sound clearer and reducing reverberation. Specifically, each normalized energy value is raised to a power. The adjustment index is a positive number greater than or equal to 2 and less than or equal to 10; to simplify calculation while preserving the differences between speech signals, it is generally chosen as 4, 5 or 6. The adjustment parameter $C_i(k)$ is determined by equation (4):

$$C_i(k) = \big[e_i(k)\big]^{\alpha} \qquad (4)$$

where $\alpha$ is called the adjustment index; it sets the proportion of each channel's signal in the output signal according to the energy relationship between the speech frames of the channels.
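Raising the normalized energy to the power $\alpha$ is what sharpens the contrast between channels. In the sketch below (function name hypothetical), a channel whose frame energy is half the maximum, about 3 dB down, receives an adjustment parameter of $0.5^5 = 0.03125$ with $\alpha = 5$. Because the weights of equation (5) settle toward $C_i(k)$ when it is held constant, that is roughly 30 dB below the loudest channel in amplitude terms.

```python
def adjustment_parameters(normalized, alpha=5.0):
    """C_i(k) = e_i(k) ** alpha, equation (4).

    alpha is the adjustment index; the description allows 2 <= alpha <= 10
    and typically uses 4, 5 or 6.
    """
    if not 2.0 <= alpha <= 10.0:
        raise ValueError("adjustment index must lie in [2, 10]")
    return [e ** alpha for e in normalized]


print(adjustment_parameters([1.0, 0.5, 0.25]))  # [1.0, 0.03125, 0.0009765625]
```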

Step 104: Calculate the weight of the n-th sample point signal of the i-th (i = 1, 2, ..., N) microphone in the array output signal. The weight is updated sample by sample; specifically, the weight $w_i(n)$ of the n-th sample point signal is determined by equation (5):

$$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C_i(k) \qquad (5)$$

Here, the forgetting factor $\lambda$ smooths the volume of the speech frames before and after a switch, avoiding flicker in the speech signal and suppressing the switching noise that the change in channel frame energy would otherwise cause at the switch. $\lambda$ is a preset parameter greater than 0 and less than 1; to keep the speech signal smooth it is chosen close to 1, for example $\lambda = 0.9998$ in the present invention. Other values, such as 0.9996, 0.9992 or 0.9990, can also be used; the specific value depends on the smoothness desired by the user.

Step 105: Normalize the weight of each sample point of the signal of the i-th (i = 1, 2, ..., N) microphone by the maximum weight. This makes the volume of the maximum-energy channel in the array output equal to the volume of the signal collected by the microphone of that channel. The normalization is given by equation (6):

$$\hat{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)} \qquad (6)$$

where max( ) denotes the maximum value.
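Equations (5) and (6) are applied once per sample, with $C_i(k)$ held fixed for the whole of frame k and the weights carried over from one frame to the next. The description does not say how the weights are initialized, so the sketch below leaves that to the caller; its function names and the toy values are hypothetical.

```python
def update_weights(prev_weights, adjust, lam=0.9998):
    """Equation (5) for all N channels: w_i(n) = lam * w_i(n-1) + (1 - lam) * C_i(k)."""
    return [lam * w + (1.0 - lam) * c for w, c in zip(prev_weights, adjust)]


def normalize_weights(weights):
    """Equation (6): divide every channel weight by the largest weight at sample n."""
    peak = max(weights)
    if peak <= 0.0:
        return list(weights)  # assumed: leave all-zero weights unchanged
    return [w / peak for w in weights]


# With lam = 0.5 the smoothing is fast enough to follow by hand.
w = [1.0, 1.0]     # assumed initial weights
C = [1.0, 0.0625]  # adjustment parameters of the current frame
for n in range(3):
    w = update_weights(w, C, lam=0.5)
    print(normalize_weights(w))
```

The second channel's weight decays toward its adjustment parameter rather than jumping there, which is the gradual switching the forgetting factor is meant to provide.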

Step 106: Calculate the output sample signal of the microphone array, and output them in sequence. The output of each sample signal is as shown in equation (7):

$$s(n) = \sum_{i=1}^{N} \hat{w}_i(n)\, x_i(n) \qquad (7)$$

Equation (7) multiplies each sample point in the same frame of each microphone's speech signal by its corresponding weight and accumulates the corresponding sample point signals of all microphones to form the output sample point signal.
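Putting steps 101 to 106 together, the following end-to-end sketch processes N equal-length channels frame by frame. It is an illustration under stated assumptions rather than the patent's own implementation: energy is a sum of squares, all weights start at `init_weight`, trailing samples that do not fill a whole frame are dropped, and an all-silent frame gives every channel C = 1. The name `process_array` is hypothetical.

```python
def process_array(channels, frame_len, alpha=5.0, lam=0.9998, init_weight=1.0):
    """Sketch of steps 101-106 for N equal-length channels (lists of floats)."""
    n_ch = len(channels)
    n_frames = len(channels[0]) // frame_len  # assumed: drop a trailing partial frame
    weights = [init_weight] * n_ch            # assumed initial weights
    output = []
    for k in range(n_frames):
        start = k * frame_len
        frames = [ch[start:start + frame_len] for ch in channels]
        # Step 101: frame energy per channel, equation (2).
        energies = [sum(x * x for x in f) for f in frames]
        # Step 102: normalize by the largest energy, equation (3).
        peak = max(energies)
        norm = [e / peak for e in energies] if peak > 0 else [1.0] * n_ch
        # Step 103: adjustment parameters, equation (4).
        adjust = [e ** alpha for e in norm]
        for j in range(frame_len):
            # Step 104: recursive weight smoothing, equation (5).
            weights = [lam * w + (1.0 - lam) * c for w, c in zip(weights, adjust)]
            # Step 105: normalize by the largest weight, equation (6).
            w_max = max(weights)
            w_hat = [w / w_max for w in weights]
            # Step 106: weighted sum across channels, equation (7).
            output.append(sum(wh * f[j] for wh, f in zip(w_hat, frames)))
    return output


# Channel 0 carries the talker, channel 1 only weaker pickup.
out = process_array([[2.0, 2.0], [1.0, 1.0]], frame_len=2, alpha=2.0, lam=0.5)
print(out)  # [2.53125, 2.296875]
```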

In practice, the typical front-end processing before the algorithm is applied is as follows: each microphone converts the voice signal into an electrical signal, which is amplified, converted into a digital signal by analog-to-digital conversion, and fed into a digital signal processor (DSP) for processing.

The following example, using an array of four microphones evenly distributed on a circle, illustrates the results of speech signal processing in several application environments. The radius of the circle is 5 cm, the forgetting factor is 0.9998, and the adjustment index is α = 5.0.
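The forgetting factor sets how quickly the output follows a change of talker. With C held constant, equation (5) gives $w(n) - C = \lambda^n \big(w(0) - C\big)$, so the remaining gap shrinks by a factor $\lambda$ per sample. The sketch below (hypothetical names) solves $\lambda^n \le 1 - \text{fraction}$ for n; for $\lambda = 0.9998$, closing about 63% of the gap takes roughly $1/(1-\lambda) = 5000$ samples, which is about 0.31 s at an assumed 16 kHz sampling rate. The sampling rate is not given in the description.

```python
import math


def samples_to_settle(lam, fraction=0.632):
    """Smallest n with lam**n <= 1 - fraction, i.e. the number of samples for
    the weight in equation (5) to cover `fraction` of a step change in C."""
    return math.ceil(math.log(1.0 - fraction) / math.log(lam))


def settle_time_s(lam, sample_rate_hz, fraction=0.632):
    """Same quantity in seconds; sample_rate_hz is an assumption."""
    return samples_to_settle(lam, fraction) / sample_rate_hz


for lam in (0.9998, 0.9996, 0.9992, 0.9990):
    print(lam, samples_to_settle(lam), round(settle_time_s(lam, 16000), 3))
```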

FIG. 2 shows the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in a reverberation chamber alternate, computed by the method of the present invention from the energy of the speech frames picked up by the individual microphones. FIG. 3 shows the corresponding average weight of the speech frames of each channel in the array output signal, also calculated by the method of the present invention. The figures show that the invention switches automatically according to the speech frame energy picked up by each microphone, and that the switching is natural and stable. After the signals picked up by the microphones are processed by the method of the present invention, the output of the array sounds smooth and natural, and reverberation is greatly reduced.

FIG. 4 shows the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in a reverberation chamber sound simultaneously, computed by the method of the present invention from the energy of the speech frames picked up by the individual microphones.

FIG. 5 shows the average weight of the speech frames of each channel in the array output signal when two sound sources in a reverberation chamber sound simultaneously, calculated by the method of the present invention. The invention again switches automatically according to the speech frame energy picked up by each microphone, the switching is natural and stable, and after processing by the method of the present invention the output of the array sounds smooth and natural.

FIG. 6 shows the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in an ordinary room alternate, computed by the method of the present invention from the energy of the speech frames picked up by the individual microphones.

FIG. 7 shows the average weight of the speech frames of each channel in the array output signal when two sound sources in an ordinary room alternate, calculated by the method of the present invention from the energy of the speech frames picked up by the individual microphones. The invention switches automatically according to the speech frame energy picked up by each microphone, and the switching is natural and stable. After processing by the method of the present invention, the output of the array sounds smooth and natural, and reverberation is reduced.

FIG. 8 shows the normalized energy of the speech frames picked up by each microphone in the array when two sound sources in an ordinary room sound simultaneously, computed by the method of the present invention from the energy of the speech frames picked up by the individual microphones.

FIG. 9 shows the average weight of the speech frames of each channel in the array output signal when two sound sources in an ordinary room sound simultaneously, calculated by the method of the present invention. The invention switches automatically according to the speech frame energy picked up by each microphone, the switching is natural and stable, and after processing by the method of the present invention the output of the array sounds smooth and natural.

The speech signal processed by the above steps can be output as a digital signal or as an analog signal after digital-to-analog conversion.

FIG. 10 is a schematic structural diagram of the microphone-array-based voice signal processing apparatus of the present invention. As shown in FIG. 10, the apparatus includes a first determining unit 100, a second determining unit 101, a calculating unit 102 and an output unit 103.

a first determining unit 100, configured to determine an energy value of a voice signal of the same frame received by each directional microphone;

a second determining unit 101, configured to determine, according to the energy value, an adjustment parameter of each voice signal of the same frame;

a calculating unit 102, configured to determine the weight of each sampling point signal in each voice signal according to the adjustment parameter of that voice signal, multiply each sampling point signal in each voice signal by its weight, and accumulate the products of the corresponding sampling point signals of all voice signals;

The output unit 103 is configured to sequentially output the accumulated sampling point signals.

In the present invention, the microphone array is composed of two or more directional microphones.

The second determining unit 101 is further configured to divide the energy value of each speech signal of the same frame by the maximum energy value, and to apply exponential adjustment to each quotient to obtain the adjustment parameter of each speech signal.

The second determining unit 101 further uses the E-th power of each quotient value as an adjustment parameter of each speech signal; wherein E is a positive number greater than or equal to 2 and less than or equal to 10.

The calculating unit 102 is further configured to calculate the weight of each sampling point signal in the speech signal according to the formula $w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame.

The above calculating unit 102 further calculates the weight of each sampling point signal in the speech signal as follows:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the initial weight of the n-th sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)-th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0<\lambda<1$, and $C$ is the adjustment parameter of the current speech signal frame;

the final weight $\hat{w}_i(n)$ of the n-th sampling point signal is then obtained as

$$\hat{w}_i(n) = \frac{w_i(n)}{\max\big(w_1(n), w_2(n), \ldots, w_N(n)\big)}$$

where max( ) denotes the maximum value.

The microphone array described above is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.

Those skilled in the art should understand that the microphone-array-based voice signal processing apparatus shown in FIG. 10 is designed to implement the microphone-array-based voice signal processing method described above. The functions of the processing units of the apparatus in FIG. 10 can be understood with reference to the description of the method, and they can be implemented by a program running on a processor or by dedicated logic circuits.

The above is only the preferred embodiment of the present invention and is not intended to limit the scope of the present invention.

Claims

1. A voice signal processing method based on a microphone array, characterized in that the microphone array is composed of two or more directional microphones; the method comprises:

2. The method according to claim 1, wherein determining the adjustment parameter of each voice signal of the same frame according to the energy values comprises:

dividing the energy value of each speech signal of the same frame by the maximum energy value among them, subjecting each resulting quotient to exponential adjustment processing, and using the results as the adjustment parameters of the respective speech signals.

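3. The method according to claim 2, wherein subjecting each quotient to exponential adjustment processing to obtain the adjustment parameter of each voice signal comprises: taking the E-th power of each quotient as the adjustment parameter of the corresponding voice signal, where E is a positive number satisfying $2 \le E \le 10$.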

4. The method according to claim 1, wherein the weight of each sampling point signal in the voice signal is determined from the adjustment parameter of each voice signal according to the following formula:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the weight of the nth sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0 < \lambda < 1$, and C is the adjustment parameter of the current speech signal frame.

5. The method according to claim 1, wherein determining the weight of each sampling point signal in the voice signal according to the adjustment parameter of each voice signal comprises:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the initial weight of the nth sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0 < \lambda < 1$, and C is the adjustment parameter of the current speech signal frame;

the initial weight $w_i(n)$ is processed as follows, and the result is used as the final weight of the nth sampling point signal in the current speech signal frame of microphone i:

$\hat{w}_i(n) = \frac{w_i(n)}{\max\left(w_1(n), w_2(n), \ldots, w_N(n)\right)}$, where max( ) returns the maximum value and N is the number of microphones.

6. The method according to any one of claims 1 to 5, characterized in that the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.

7. A speech signal processing apparatus based on a microphone array, wherein the microphone array is composed of two or more directional microphones; the apparatus comprises a first determining unit, a second determining unit, a calculating unit and an output unit, wherein:

8. The device according to claim 7, wherein the second determining unit further divides the energy value of each voice signal of the same frame by the maximum energy value among them, subjects each resulting quotient to exponential adjustment processing, and uses the results as the adjustment parameters of the respective voice signals.

9. The device according to claim 8, wherein the second determining unit further uses the E-th power of each quotient as the adjustment parameter of the corresponding voice signal, where E is a number satisfying $2 \le E \le 10$.

10. The apparatus according to claim 7, wherein the calculating unit further calculates a weight of each sampling point signal in the voice signal according to the following formula:

$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the weight of the nth sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the weight of the (n-1)th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0 < \lambda < 1$, and C is the adjustment parameter of the current speech signal frame.

11. The apparatus according to claim 7, wherein the calculating unit further calculates the weight of each sampling point signal in the voice signal as follows:

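$w_i(n) = \lambda w_i(n-1) + (1-\lambda) C$, where $w_i(n)$ is the initial weight of the nth sampling point signal in the current speech signal frame of microphone i, $w_i(n-1)$ is the initial weight of the (n-1)th sampling point signal in the current speech signal frame of microphone i, $\lambda$ is a predetermined forgetting factor with $0 < \lambda < 1$, and C is the adjustment parameter of the current speech signal frame; the initial weight $w_i(n)$ is processed as follows, and the result is used as the final weight of the nth sampling point signal in the current speech signal frame of microphone i:

$\hat{w}_i(n) = \frac{w_i(n)}{\max\left(w_1(n), w_2(n), \ldots, w_N(n)\right)}$, where max( ) returns the maximum value and N is the number of microphones.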

12. The apparatus according to any one of claims 7 to 11, wherein the microphone array is a circular array or a spherical array, and the number of microphones in the microphone array is 3 to 16.