CN106710603A

CN106710603A - Speech recognition method and system based on linear microphone array

Info

Publication number: CN106710603A
Application number: CN201611202169.0A
Authority: CN
Inventors: 贺来朋
Original assignee: SHANGHAI YUZHIYI INFORMATION TECHNOLOGY Co Ltd
Current assignee: Xiamen Yunzhixin Intelligent Technology Co Ltd; Unisound Shanghai Intelligent Technology Co Ltd
Priority date: 2016-12-23
Filing date: 2016-12-23
Publication date: 2017-05-24
Anticipated expiration: 2036-12-23
Also published as: CN106710603B

Abstract

The invention discloses a speech recognition method based on a linear microphone array. The method comprises the following steps: recording environment sound by utilizing the linear microphone array to form audio data; for a sound obtaining region in front of the linear microphone array, setting beam formers, and forming a main wave beam area in the center and a first noise wave beam area and a second noise wave beam area at the two sides in the sound obtaining region by utilizing the beam formers; inputting the audio data to the beam formers to obtain a main wave beam corresponding to the main wave beam area, a first noise wave beam corresponding to the first noise wave beam area and a second noise wave beam corresponding to the second noise wave beam area respectively; filtering out the first noise wave beam and the second noise wave beam in the main wave beam to obtain speech data to be identified; and carrying out speech recognition on the speech data to be identified to obtain corresponding text data and outputting the text data. The method and system are small in calculation amount, are high in quality of the obtained speech data and can improve speech recognition accuracy.

Description

Speech recognition method and system based on a linear microphone array

Technical field

The present invention relates to the field of human-machine speech recognition, and in particular to a speech recognition method and system based on a linear microphone array.

Background art

In a speech recognition system, the audio signal captured by the microphone is usually subjected to noise reduction to suppress the ambient noise component and thereby improve recognition accuracy. Depending on the number of microphones used in the system, noise reduction algorithms can be roughly divided into single-microphone noise reduction, dual-microphone noise reduction, and microphone-array noise reduction.

With the rapid development of hardware, microphone arrays are being applied more and more widely. According to the topology of the array elements, microphone arrays can generally be divided into linear arrays and circular arrays. For either type, noise reduction typically requires a sound source localization algorithm to obtain the spatial direction of the desired signal, after which a fixed beamforming algorithm forms a receive beam of a given shape whose main-lobe center points in the direction of the desired signal.

However, performing sound source localization and adaptive beamforming at the same time is computationally very expensive, and when the localization is off, the desired signal is easily suppressed or distorted, which in turn degrades the performance of the speech recognition system.

Summary of the invention

The object of the present invention is to overcome the defects of the prior art by providing a speech recognition method and system based on a linear microphone array. It addresses the heavy computation, complexity, and high implementation cost of existing microphone array configurations, and aims to achieve good noise reduction with a microphone array so as to obtain high-quality speech data and improve speech recognition accuracy.

To achieve this object, the invention provides a speech recognition method based on a linear microphone array, the method comprising:

recording ambient sound with a linear microphone array to form audio data; setting beamformers for the sound acquisition region in front of the linear microphone array, and using the beamformers to form, within the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on the two sides; inputting the audio data into the beamformers to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region; filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and performing speech recognition on the speech data to be recognized to obtain and output the corresponding text data.

The beneficial effects of the invention are as follows. Three beam regions are designed within the sound acquisition region: two beams capture noise and the third captures the desired signal. The beamformers output the corresponding noise beams and main beam, and an adaptive filter module then filters the noise beams out of the main beam. The method does not need to track the sound source direction in real time, which avoids the suppression or distortion of the desired signal that conventional algorithms may suffer when the source position estimate is biased. The algorithm also requires little computation, is simple to implement, and has low cost, while the speech data obtained is of high quality, which improves speech recognition accuracy. In addition, adapting the speech recognizer with this speech data can further improve recognition accuracy.

In a further improvement of the invention, setting beamformers for the sound acquisition region in front of the linear microphone array comprises: the sound acquisition region comprises a planar region spanning angles from 0° to 180°; a first beamformer is set to form the first noise beam region, with the center of its beam pointing in the 20° direction of the sound acquisition region; a second beamformer is set to form the main beam region, with the center of its beam pointing in the 90° direction of the sound acquisition region; and a third beamformer is set to form the second noise beam region, with the center of its beam pointing in the 160° direction of the sound acquisition region.

In a further improvement of the invention, when the beamformers are set, each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and a fixed beamforming algorithm is used to compute the filter coefficients for the filters in each beamformer;

The fixed beamforming algorithm comprises:

y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)

In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the desired signal and the additive noise it collects, respectively. Formula two gives the output of the beamformer, which approximates the desired signal received by one of the microphones in the linear microphone array, in terms of the filter coefficients corresponding to the n-th microphone;

In formula three, e_m(k) is the error between the output signal of the beamformer and the collected desired signal; it equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by formula four and formula five, respectively;

Formula six and formula seven follow from minimizing the mean square error: the mean square error of formula six is minimized so that the additive noise is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the matrix of filter coefficients of all filters in the beamformer, and h_{m,o} is the corresponding optimal value of those coefficients.

In a further improvement of the invention, performing speech recognition on the speech data to be recognized comprises: first performing an adaptation operation on an acoustic model using the speech data to be recognized; and then performing speech recognition on the speech data to be recognized using the adapted acoustic model.

In a further improvement of the invention, performing the adaptation operation on the acoustic model using the speech data to be recognized comprises: extracting a set quantity of the speech data to be recognized, and annotating the extracted speech data with text labels;

extracting the acoustic features corresponding to the set quantity of speech data to be recognized, and combining the text labels with the acoustic features to form adaptive training data; and

performing adaptive training on the acoustic model using the adaptive training data.

The invention also provides a linear microphone array speech recognition system, the system comprising: beamformers communicatively connected to the linear microphone array, the beamformers forming, in the sound acquisition region in front of the linear microphone array, a main beam region in the middle and a first noise beam region and a second noise beam region on the two sides, and being configured to process the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;

an adaptive filter module, communicatively connected to the beamformers, which receives the main beam, the first noise beam, and the second noise beam, and is configured to filter the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and

a speech recognizer, communicatively connected to the adaptive filter module, which receives the speech data to be recognized and is configured to perform speech recognition on it to obtain and output the corresponding text data.

In a further improvement of the invention, the sound acquisition region comprises a planar region spanning angles from 0° to 180°, and the beamformers comprise: a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;

a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region; and

a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.

In a further improvement of the invention, each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and each filter in each beamformer has corresponding filter coefficients; the filter coefficients are computed by a fixed beamforming algorithm;

The fixed beamforming algorithm comprises:

y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)

In a further improvement of the invention, the speech recognizer comprises an acoustic model, and the acoustic model is used to recognize the speech data to be recognized only after it has been adaptively trained with the speech data to be recognized.

In a further improvement of the invention, the speech recognizer further comprises a feature extraction module, a text input module, a training data storage module, and a training module;

the feature extraction module is communicatively connected to the adaptive filter module, receives the speech data to be recognized, and extracts acoustic features from the received speech data;

the text input module is used to input the text labels corresponding to the speech data to be recognized;

the training data storage module is communicatively connected to the feature extraction module and the text input module, and stores the acoustic features and the corresponding text labels, which are combined to form adaptive training data; and

the training module is communicatively connected to the training data storage module, reads the adaptive training data stored there, and uses the data it reads to adaptively train the acoustic model.

Brief description of the drawings

Fig. 1 is a schematic diagram of the sound acquisition region;

Fig. 2 is a flow diagram of the linear microphone array speech recognition method;

Fig. 3 is a schematic diagram of the multi-channel adaptive filter.

Detailed description of the embodiments

The invention is described in further detail below with reference to the accompanying drawings. With the rapid development of hardware, microphone arrays are being applied more and more widely. In human-machine voice interaction scenarios in particular, conventional techniques perform sound source localization and adaptive beamforming at the same time, which is computationally very expensive, and when the localization is off, the desired signal is easily suppressed or distorted, which in turn degrades speech recognition performance. The invention applies to the 180° planar region in front of the linear microphone array, and its target scenario is human-machine voice interaction. A speaker who controls a machine by voice stands facing the machine, so when the linear microphone array captures the speaker's voice it only needs to consider sound from in front of the machine, not from behind it. The invention partitions the planar region in front of the microphones: a wider beam in the middle captures as much of the speaker's voice as possible while suppressing ambient noise as far as possible, and narrower beams on the two sides capture as much ambient noise as possible while suppressing the desired voice signal. An adaptive filtering algorithm then further removes the ambient noise component from the main beam output. The linear microphone array speech recognition method and system of the invention are described below with reference to the drawings.

As shown in Fig. 2, the invention discloses a linear microphone array speech recognition system comprising a linear microphone array 1, beamformers, an adaptive filter module 3, and a speech recognizer 4.

The linear microphone array records sound from the external environment and digitizes the sound signal into audio data. The recorded sound acquisition region lies in front of the linear microphone array. Within this region the beamformers form a main beam region in the middle, with a first noise beam region and a second noise beam region on either side of it. The beamformers are communicatively connected to the linear microphone array; they receive the audio data from the main beam region, the first noise beam region, and the second noise beam region, and after processing produce a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region.

As shown in Fig. 3, the adaptive filter module 3 is a multi-channel filter communicatively connected to the beamformers. It receives the main beam, the first noise beam, and the second noise beam from the beamformers. The first noise beam is adaptively filtered by a first adaptive filter 31 and the second noise beam by a second adaptive filter 32; the two filtered noise beams are then removed from the main beam, and the result is output. The output is fed back to adaptive filters 31 and 32, whose coefficients are continuously updated according to the normalized least mean square (NLMS) algorithm. The final result is output from the adaptive filter module 3 as the speech data to be recognized.
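The feedback structure described above can be sketched as a two-reference NLMS noise canceller. This is only a minimal illustration under assumptions: the function name `nlms_cancel`, the tap count, the step size `mu`, and the joint normalization over both noise beams are choices made here, not values given in the patent.

```python
import numpy as np

def nlms_cancel(main, noise1, noise2, taps=16, mu=0.5, eps=1e-8):
    """Remove two adaptively filtered noise beams from the main beam (NLMS sketch)."""
    main = np.asarray(main, dtype=float)
    refs = [np.asarray(noise1, dtype=float), np.asarray(noise2, dtype=float)]
    weights = [np.zeros(taps), np.zeros(taps)]  # one adaptive filter per noise beam
    out = np.zeros_like(main)
    for k in range(len(main)):
        # Latest `taps` samples of each noise beam, newest first, zero-padded at start-up.
        inputs = []
        for ref in refs:
            seg = ref[max(0, k - taps + 1):k + 1][::-1]
            inputs.append(np.pad(seg, (0, taps - len(seg))))
        # Output sample = main beam minus both noise estimates; it is also the error fed back.
        err = main[k] - weights[0] @ inputs[0] - weights[1] @ inputs[1]
        out[k] = err
        # Normalized update: step size scaled by the combined input power (assumed choice).
        power = eps + inputs[0] @ inputs[0] + inputs[1] @ inputs[1]
        for i in range(2):
            weights[i] += mu * err * inputs[i] / power
    return out
```

The returned signal plays the role of the speech data to be recognized. In practice the step size trades convergence speed against residual noise, and the tap count should cover the delay spread between the noise beams and the noise leaking into the main beam.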

The speech recognizer 4 is communicatively connected to the adaptive filter module. It receives the speech data to be recognized output by the adaptive filter module, performs speech recognition on it, and obtains and outputs the corresponding text data.

Sound of the angle from 0 ° to 180 ° is included before linear microphone array 1 and obtains region, wave beam shape The number grown up to be a useful person is 3, respectively the first Beam-former 21, the second Beam-former 22 and the 3rd Beam-former 23, First Beam-former 21 is used to form the first noise beam area, and the center of the wave beam that the first Beam-former 21 is formed refers to 20 ° of region of direction is obtained to sound；Second Beam-former 22 is used to form main beam region, the second Beam-former 22 Point to the direction that sound obtains 90 ° of region in the center of the wave beam for being formed；3rd Beam-former 23 is used for formation second and makes an uproar Point to the direction that sound obtains 160 ° of region in beam of sound region, the center of the wave beam that the 3rd Beam-former 23 is formed.Wherein, The width of the main lobe of the wave beam that the second Beam-former 22 is formed is greater than the first Beam-former and the 3rd Beam-former The width of the main lobe of the wave beam for being formed, it is preferred that the second Beam-former 22 formed wave beam main lobe width less than etc. In 90 °, the width of the main lobe of the wave beam that the first Beam-former and the 3rd Beam-former are formed is less than or equal to 40 °.

As a preferred embodiment of the invention, the first beamformer 21, the second beamformer 22, and the third beamformer 23 are each provided with filters connected one-to-one to the microphones of the linear microphone array. Each filter filters with its own filter coefficients, which are computed by a fixed beamforming algorithm. The fixed beamforming algorithm comprises:

y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)

In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the desired signal and the additive noise it collects, respectively. Formula two gives the estimated output of the beamformer: the audio data is filtered, using the filter coefficients corresponding to the n-th microphone, so that the beamformer output approximates the desired signal received by one of the microphones in the linear microphone array before it is output.

In formula three, e_m(k) is the error between the output signal of the beamformer and the collected desired signal; it equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by formula four and formula five. The desired-signal error equals the difference between the output of the beamformer and the input of the beamformer, and the additive-noise error equals the sum of all additive noise terms.

Because the additive noise in the final beamformer output should be as small as possible, an error criterion is used to obtain the mean square error shown in formula six. When this mean square error is at its minimum, the optimal filter coefficients h_{m,o} are obtained, as shown in formula seven. To keep the distortion of the desired signal to a minimum, the constraint e_{x,m}(k) = 0 is added when estimating the optimal filter coefficients h_{m,o}.
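Formulas two to seven do not appear in this text. One reading that is consistent with the description above is sketched below; it is an assumption, not the patent's own notation. The symbols introduced here for illustration are: \hat{x}_m(k), the beamformer output approximating the desired signal at a reference microphone m; \mathbf{h}_{n,m}, the length-L coefficient vector of the filter on microphone n; and \mathbf{y}_n(k), \mathbf{x}_n(k), \mathbf{v}_n(k), the vectors of the L most recent samples of y_n, x_n, v_n.

```latex
\begin{align}
\hat{x}_m(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{y}_n(k)
  && \text{(cf. formula two)} \\
e_m(k) &= \hat{x}_m(k) - x_m(k) = e_{x,m}(k) + e_{v,m}(k)
  && \text{(cf. formula three)} \\
e_{x,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{x}_n(k) - x_m(k)
  && \text{(cf. formula four)} \\
e_{v,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{v}_n(k)
  && \text{(cf. formula five)} \\
J(\mathbf{h}_m) &= E\!\left\{ e_{v,m}^{2}(k) \right\}
  && \text{(cf. formula six)} \\
\mathbf{h}_{m,o} &= \arg\min_{\mathbf{h}_m} J(\mathbf{h}_m)
  \quad \text{subject to } e_{x,m}(k) = 0
  && \text{(cf. formula seven)}
\end{align}
```

Substituting formula one into the first line gives the split in the second line directly: the filtered desired-signal terms minus x_m(k) form e_{x,m}(k), and the filtered noise terms form e_{v,m}(k). Here \mathbf{h}_m stacks all \mathbf{h}_{n,m}, n = 1, ..., N.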

The speech recognizer 4 contains an acoustic model, which it uses to perform speech recognition on the input speech data to be recognized and identify the corresponding text. Because the filtering performed by the microphone array inevitably distorts the main beam, recognition accuracy can suffer when the acoustic model processes the speech data. To reduce the effect of this distortion on recognition accuracy, the acoustic model is adaptively trained, before it performs speech recognition, with speech data that has been processed by the microphone array. The speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its accuracy on this speech data and reduces the effect of the distortion.

As a preferred embodiment of the invention, the speech recognizer is further provided with a feature extraction module, a text input module, a training data storage module, and a training module. The feature extraction module is communicatively connected to the adaptive filter module; it receives the speech data to be recognized output by the beamformer and extracts acoustic features from the received speech data. The text input module receives manually entered text labels corresponding to the speech data to be recognized. The training data storage module is communicatively connected to both the feature extraction module and the text input module; it stores the acoustic features extracted by the feature extraction module and the corresponding text labels output by the text input module, and combines them to form adaptive training data. The training module is communicatively connected to the training data storage module; it reads the adaptive training data stored there and uses the data it reads to adaptively train the acoustic model.

Take a voice-controlled automatic teller machine as an example. Several microphones, at least three, are mounted horizontally on the automatic teller machine at a certain spacing and capture the sound in front of the machine to form a sound signal. The horizontally mounted microphones form a 180° sound acquisition region in front of the machine, and the sound in this region includes both the speaker's spoken control instructions and ambient noise. Three beamformers form three beams in the sound acquisition region: the main beam, the first noise beam, and the second noise beam. The sound signals captured by the microphones are digitized into audio data and input to the beamformers; the first beamformer outputs the first noise beam, the second beamformer outputs the main beam, and the third beamformer outputs the second noise beam. The main beam, the first noise beam, and the second noise beam are input together to the adaptive filter module, which filters the first and second noise beams out of the main beam and outputs the speech data to be recognized; this speech data is the output of the microphone array. The speech data to be recognized is input to the speech recognizer, where an adaptation operation is first performed to improve the recognizer's accuracy on this speech data; the speech data is then recognized, and the corresponding text data is formed and output. The text data can be sent to the automatic teller machine so that it performs the corresponding action.

The invention also provides a linear microphone array speech recognition method. The microphones are first arranged as a linear array and convert the speaker's audio signal into audio data. The audio data is then fed to the beamformers, which filter it to form the main beam and the noise beams and output them to the adaptive filter module. The adaptive filter module takes the noise beams and the main beam and removes the noise beams from the main beam to form the speech data to be recognized. Finally, the speech data to be recognized is input to the speech recognizer, which recognizes it and outputs text data. Specifically:

In the first step, three or more microphones are selected and arranged horizontally at intervals in a straight line to form the linear microphone array 1, in front of which a sound acquisition region spanning 0° to 180° is formed, as shown in Fig. 1. The sound captured in this region includes both human speech and noise from the surrounding environment. The microphones receive the sound from the sound acquisition region, digitize the captured audio signal into audio data, and output it.

In the second step, with reference to Fig. 2, beamformers are set for the sound acquisition region in front of the linear microphone array 1. Within this region the beamformers form a main beam region in the middle, with a first noise beam region and a second noise beam region on either side of it. The audio data is then input to the beamformers, which output a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region.

In the third step, the first noise beam, the main beam, and the second noise beam output by the beamformers are filtered: based on the noise information in the input first and second noise beams, the two noise beams are removed from the main beam to form the speech data to be recognized. Specifically, the main beam, the first noise beam, and the second noise beam are received from the beamformers. The first noise beam is adaptively filtered by the first adaptive filter 31 and the second noise beam by the second adaptive filter 32, and the two filtered noise beams are then removed from the main beam and the result is output. A judging module compares the output with a manually set standard. If the quality of the adaptively filtered beam does not meet the standard, the output is returned to the first adaptive filter 31 and the second adaptive filter 32 for further adaptation, and this loop repeats until the output meets the set standard. The final result is output from the adaptive filter module 3 as the speech data to be recognized.

In the fourth step, the speech data to be recognized is recognized, and the resulting text data is output to the machine.

As a preferred embodiment of the invention, a sound acquisition region spanning 0° to 180° lies in front of the linear microphone array, and the microphones input the audio data they form into the beamformers. There are three beamformers: a first beamformer 21, a second beamformer 22, and a third beamformer 23. The first beamformer 21 forms the first noise beam region, with the center of its beam pointing in the 20° direction of the 180° sound acquisition region; it obtains the audio data in the first noise beam region of the linear microphone array and outputs the first noise beam. The second beamformer 22 forms the main beam region, with the center of its beam pointing in the 90° direction of the 180° sound acquisition region; it obtains the audio data in the main beam region of the linear microphone array and outputs the main beam. The third beamformer 23 forms the second noise beam region, with the center of its beam pointing in the 160° direction of the 180° sound acquisition region; it obtains the noise in the second noise beam region of the linear microphone array and outputs the second noise beam. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main-lobe width of the beam formed by the second beamformer 22 is less than or equal to 90°, and the main-lobe widths of the beams formed by the first and third beamformers are less than or equal to 40°.

y_n(k)=x_n(k)+v_n(k), n=1,2 ..., N (formula one)

As a preferred embodiment of the invention, performing the adaptation operation on the acoustic model using the speech data to be recognized further comprises the following steps. The speech recognizer first extracts a set quantity of the speech data to be recognized and annotates the extracted speech data with text labels. It then extracts the acoustic features corresponding to that set quantity of speech data and combines the text labels with the acoustic features to form adaptive training data. Finally, it adaptively trains the acoustic model with the adaptive training data. Once adaptive training is complete, the adapted acoustic model performs speech recognition on the speech data to be recognized.
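The data-preparation part of these steps can be sketched as follows. This is an illustrative assumption rather than the patent's implementation: the patent does not name an acoustic feature, so a log-power spectrum stands in for it, and `AdaptationExample`, `log_power_features`, and `build_adaptation_set` are hypothetical names. The adaptive training of the acoustic model itself is left out.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class AdaptationExample:
    features: np.ndarray  # shape (num_frames, num_bins), stand-in acoustic features
    transcript: str       # manually entered text label

def log_power_features(samples, frame_len=400, hop=160):
    """Frame the signal and return a log-power spectrum per frame (assumed feature)."""
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        samples = np.pad(samples, (0, frame_len - len(samples)))
    num_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([samples[i * hop:i * hop + frame_len] * window
                       for i in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # small floor avoids log(0) on silent frames

def build_adaptation_set(utterances, transcripts, max_items):
    """Pair features of the first max_items utterances with their text labels."""
    if len(utterances) != len(transcripts):
        raise ValueError("each utterance needs exactly one transcript")
    pairs = list(zip(utterances, transcripts))[:max_items]
    return [AdaptationExample(log_power_features(u), t) for u, t in pairs]
```

Here `max_items` plays the role of the set quantity, and the returned list corresponds to the adaptive training data that a training module would read.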

The invention targets a specific human-machine voice interaction scenario and does not need to track the sound source direction in real time, which avoids the suppression or distortion of the desired signal that conventional algorithms may suffer when the source position estimate is biased. The algorithm also requires little computation, is simple to implement, and has low cost, while the speech data obtained is of high quality, which improves speech recognition accuracy.

The invention has been described in detail above with reference to the drawings and embodiments, and those skilled in the art can make many variations to it based on this description. Accordingly, certain details of the embodiments should not be construed as limiting the invention; the scope of protection of the invention is the scope defined by the appended claims.

Claims

1. A speech recognition method using a linear microphone array, characterized in that the method comprises the following steps:

recording ambient sound with the linear microphone array to form audio data;

setting beamformers for a sound acquisition region in front of the linear microphone array, and using the beamformers to form, within the sound acquisition region, a main beam region located in the middle and a first noise beam region and a second noise beam region located on the two sides;

inputting the audio data into the beamformers to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region and a second noise beam corresponding to the second noise beam region;

filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized;

performing speech recognition on the speech data to be recognized to obtain corresponding text data, and outputting the text data.

2. The method of claim 1, characterized in that the sound acquisition region comprises a planar region spanning angles from 0° to 180°, and setting beamformers for the sound acquisition region in front of the linear microphone array comprises: providing a first beamformer for forming the first noise beam region, and pointing the center of the beam formed by the first beamformer in the 20° direction of the sound acquisition region;

providing a second beamformer for forming the main beam region, and pointing the center of the beam formed by the second beamformer in the 90° direction of the sound acquisition region;

providing a third beamformer for forming the second noise beam region, and pointing the center of the beam formed by the third beamformer in the 160° direction of the sound acquisition region.

3. The method of claim 2, characterized in that, when the beamformers are set, each beamformer is provided with filters connected in correspondence with the respective microphones of the linear microphone array, and a fixed beamforming algorithm is used to calculate filter coefficients for the filters in each beamformer;

the fixed beamforming algorithm comprises:

y_n(k)=x_n(k)+v_n(k), n=1,2 ..., N (formula one)

In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are respectively the collected desired signal and additive noise; formula two defines the output of the beamformer, which is made to approximate the desired signal received by a given microphone in the linear microphone array, together with the filter coefficients corresponding to the n-th microphone;

In formula three, e_m(k) denotes the error between the output signal of the beamformer and the collected desired signal; it equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which can be expressed by formula four and formula five respectively;

Formula six and formula seven are obtained by minimizing the mean square error: the additive noise is minimized subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}, where h_m is the filter coefficient matrix corresponding to all filters in the beamformer and h_{m,o} is the optimal filter coefficient value corresponding to all filters in the beamformer.
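As a hedged illustration only, a minimization of the kind described in this claim (minimum noise error power subject to zero desired-signal error) is commonly written in the linearly constrained minimum variance form below. The symbols R_v (covariance matrix of the additive noise), C (constraint matrix) and u (constraint response vector) are assumptions introduced for this sketch and do not appear in the claim; the block is not a reconstruction of formula six or formula seven.

```latex
% Generic linearly constrained minimum variance problem (assumed notation)
\min_{h_m} \; h_m^{T} R_v \, h_m
\quad \text{subject to} \quad C^{T} h_m = u

% Closed-form solution, valid when R_v is invertible and C has full column rank
h_{m,o} = R_v^{-1} C \left( C^{T} R_v^{-1} C \right)^{-1} u
```

Under these assumptions the constraint plays the role of e_{x,m}(k) = 0 and the objective plays the role of the additive-noise error power.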

4. The method of claim 1, characterized in that performing speech recognition on the speech data to be recognized comprises: first performing an adaptation operation on an acoustic model using the speech data to be recognized; and then performing speech recognition on the speech data to be recognized using the adapted acoustic model.

5. The method of claim 4, characterized in that performing the adaptation operation on the acoustic model using the speech data to be recognized comprises:

extracting a set quantity of speech data to be recognized, and text-annotating the extracted speech data to be recognized;

extracting the acoustic features corresponding to the set quantity of speech data to be recognized, and combining the corresponding text annotations with the acoustic features to form adaptive training data;

6. A speech recognition system based on a linear microphone array, characterized in that the system comprises:

a linear microphone array for recording ambient sound to form audio data;

beamformers communicatively connected to the linear microphone array, the beamformers forming, in a sound acquisition region in front of the linear microphone array, a main beam region located in the middle and a first noise beam region and a second noise beam region located on the two sides, and being used to process the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region and a second noise beam corresponding to the second noise beam region;

an adaptive filter module communicatively connected to the beamformers, which receives the main beam, the first noise beam and the second noise beam as outputs of the beamformers and filters the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized;

a speech recognizer communicatively connected to the adaptive filter module, which receives the speech data to be recognized and performs speech recognition on it to obtain corresponding text data and output the text data.
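The role of the adaptive filter module can be sketched with a two-reference noise canceller. The claim does not name an adaptation rule, so the normalized LMS (NLMS) update used here is an assumption, as are the function name `nlms_cancel`, the tap count, the step size and the synthetic signals.

```python
import random


def nlms_cancel(main, noise_refs, taps=8, mu=0.5, eps=1e-8):
    """Subtract from `main` the part that is linearly predictable from the
    reference beams, adapting the filter weights with a normalized LMS update.
    Every reference sequence must have the same length as `main`."""
    num_refs = len(noise_refs)
    weights = [[0.0] * taps for _ in range(num_refs)]
    history = [[0.0] * taps for _ in range(num_refs)]
    output = []
    for k, desired in enumerate(main):
        # Push the newest reference samples into the tap-delay lines.
        for r in range(num_refs):
            history[r] = [noise_refs[r][k]] + history[r][:-1]
        # Current estimate of the noise that leaked into the main beam.
        estimate = sum(
            w * x for r in range(num_refs) for w, x in zip(weights[r], history[r])
        )
        err = desired - estimate  # the error signal is the cleaned output sample
        power = eps + sum(x * x for line in history for x in line)
        step = mu * err / power
        for r in range(num_refs):
            weights[r] = [w + step * x for w, x in zip(weights[r], history[r])]
        output.append(err)
    return output


# Synthetic check: the main beam contains only leaked noise (no speech),
# so an ideal canceller would drive its output towards silence.
rng = random.Random(0)
first_noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
second_noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
main_beam = [0.8 * a + 0.5 * b for a, b in zip(first_noise, second_noise)]
cleaned = nlms_cancel(main_beam, [first_noise, second_noise])
```

When speech is present in the main beam it is not predictable from the noise beams, so it remains in the error signal, which is why the error sequence is taken as the cleaned output.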

7. The system of claim 6, characterized in that the sound acquisition region comprises a planar region spanning angles from 0° to 180°;

the beamformers comprise: a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;

a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region;

a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.

8. The system of claim 6, characterized in that each beamformer is provided with filters connected in correspondence with the respective microphones of the linear microphone array, and the filters in each beamformer are provided with corresponding filter coefficients; the filter coefficients are calculated by a fixed beamforming algorithm;

the fixed beamforming algorithm comprises:

y_n(k)=x_n(k)+v_n(k), n=1,2 ..., N (formula one)

In formula three, e_m(k) denotes the error between the output signal of the beamformer and the collected desired signal; it equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which can be expressed by formula four and formula five respectively;

Formula six and formula seven are obtained by minimizing the mean square error: the additive noise is minimized subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}, where h_m is the filter coefficient matrix corresponding to all filters in the beamformer and h_{m,o} is the optimal filter coefficient value corresponding to all filters in the beamformer.

9. The system of claim 6, characterized in that the speech recognizer comprises an acoustic model, and the acoustic model is used to recognize the speech data to be recognized after being adaptively trained with the speech data to be recognized.

10. The system of claim 9, characterized in that the speech recognizer further comprises a feature extraction module, a text input module, a training data storage module and a training module;

the feature extraction module is communicatively connected to the adaptive filter module, receives the speech data to be recognized, and is used to extract acoustic features from the received speech data to be recognized;

the training data storage module is communicatively connected to the feature extraction module and the text input module and is used to store the acoustic features and the corresponding text annotations, the acoustic features and the corresponding text annotations being combined to form adaptive training data;

the training module is communicatively connected to the training data storage module, reads the adaptive training data stored in the training data storage module, and uses the adaptive training data read to adaptively train the acoustic model.