Summary of the Invention
The object of the invention is to overcome the defects of the prior art by providing a speech recognition method and system using a linear microphone array. It addresses the heavy computation, complexity, and high implementation cost of existing microphone array configurations. The aim is to achieve good noise reduction with the microphone array, so as to obtain high-quality speech data and improve the accuracy of speech recognition.
To achieve this object, the invention provides a speech recognition method using a linear microphone array, the method comprising:
recording ambient sound with the linear microphone array to form audio data; providing a beamformer for the sound acquisition region in front of the linear microphone array, and using the beamformer to form, within the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on either side; inputting the audio data into the beamformer to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region; filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and performing speech recognition on the speech data to be recognized to obtain and output the corresponding text data.
The beneficial effects of the invention are as follows. Three beam regions are designed within the sound acquisition region: two beams capture noise and the third captures the desired signal. The beamformer outputs the corresponding noise beams and main beam, and the adaptive filter module then filters the noise beams out of the main beam. The method does not need to track the direction of the sound source in real time, which avoids the suppression or distortion of the desired signal that conventional algorithms may introduce when the sound source position estimate is biased. The algorithm also requires little computation, is simple to implement, and has a relatively low cost, while the speech data obtained is of high quality, which improves the accuracy of speech recognition. In addition, adapting the speech recognizer with this speech data can further improve the accuracy of speech recognition.
In a further improvement of the invention, providing a beamformer for the sound acquisition region in front of the linear microphone array includes: the sound acquisition region comprises the planar region spanning angles from 0° to 180°; a first beamformer is provided to form the first noise beam region, with the center of the beam it forms pointing in the 20° direction of the sound acquisition region; a second beamformer is provided to form the main beam region, with the center of the beam it forms pointing in the 90° direction of the sound acquisition region; and a third beamformer is provided to form the second noise beam region, with the center of the beam it forms pointing in the 160° direction of the sound acquisition region.
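The three beam directions above can be illustrated with a minimal sketch. It uses plain delay-and-sum steering, which is a stand-in chosen for illustration and not the filter design described in this document; the function name `steering_delays`, the microphone count, the spacing, and the speed of sound are all assumed values. Only the beam centers (20°, 90°, 160°) come from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value (not from the text)

def steering_delays(num_mics, spacing_m, angle_deg):
    """Per-microphone delays (seconds) that align a plane wave arriving from angle_deg.

    Angles are measured in the plane in front of the array, from 0 to 180 degrees,
    with 90 degrees being directly in front (broadside).
    """
    cos_theta = math.cos(math.radians(angle_deg))
    return [n * spacing_m * cos_theta / SPEED_OF_SOUND for n in range(num_mics)]

# Beam centers from the text; 4 microphones at 5 cm spacing are hypothetical.
BEAM_CENTERS_DEG = {"noise_1": 20.0, "main": 90.0, "noise_2": 160.0}
delays = {
    name: steering_delays(num_mics=4, spacing_m=0.05, angle_deg=angle)
    for name, angle in BEAM_CENTERS_DEG.items()
}
```

Under these assumptions the main beam needs essentially no inter-microphone delay, while the two noise beams use delays of equal magnitude and opposite sign, reflecting their symmetric placement about the 90° direction.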
In a further improvement of the invention, when the beamformers are provided, each beamformer contains a filter connected to each microphone of the linear microphone array, and a fixed beamforming algorithm is used to calculate the filter coefficients of the filters in each beamformer.
The fixed beamforming algorithm includes:

$$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N \qquad \text{(Formula 1)}$$

In Formula 1, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are, respectively, the desired signal and the additive noise that it collects. Formula 2 gives the output of the beamformer, which approximates the desired signal received at a given microphone of the linear microphone array, in terms of the filter coefficients corresponding to each n-th microphone.
In Formula 3, $e_m(k)$ is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$, which are given by Formula 4 and Formula 5.
Formula 6 and Formula 7 are obtained by minimizing the mean square error: the mean square error is minimized so that the additive noise is as small as possible, and this is combined with the constraint $e_{x,m}(k) = 0$ to obtain the optimal filter coefficients $h_{m,o}$. Here $h_m$ is the filter coefficient matrix of all filters in the beamformer, and $h_{m,o}$ is the corresponding optimal filter coefficient value for all filters in the beamformer.
In a further improvement of the invention, performing speech recognition on the speech data to be recognized includes: first adapting an acoustic model using the speech data to be recognized, and then using the adapted acoustic model to perform speech recognition on the speech data to be recognized.
In a further improvement of the invention, adapting the acoustic model using the speech data to be recognized includes:
extracting a set quantity of the speech data to be recognized and annotating the extracted speech data with text;
extracting the acoustic features of that set quantity of speech data to be recognized, and combining the text annotations with the corresponding acoustic features to form adaptive training data; and
performing adaptive training on the acoustic model using the adaptive training data.
The invention also provides a linear microphone array speech recognition system, the system comprising:
a beamformer communicatively connected to the linear microphone array, which forms, in the sound acquisition region in front of the linear microphone array, a main beam region in the middle and a first noise beam region and a second noise beam region on either side, and which processes the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
an adaptive filter module communicatively connected to the beamformer, which receives the main beam, the first noise beam, and the second noise beam, and filters the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and
a speech recognizer communicatively connected to the adaptive filter module, which receives the speech data to be recognized and performs speech recognition on it to obtain and output the corresponding text data.
In a further improvement of the invention, the sound acquisition region comprises the planar region spanning angles from 0° to 180°, and the beamformer comprises:
a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;
a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region; and
a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.
In a further improvement of the invention, each beamformer contains a filter connected to each microphone of the linear microphone array, and each filter in each beamformer has corresponding filter coefficients, which are calculated by a fixed beamforming algorithm.
The fixed beamforming algorithm includes:

$$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N \qquad \text{(Formula 1)}$$

In Formula 1, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are, respectively, the desired signal and the additive noise that it collects. Formula 2 gives the output of the beamformer, which approximates the desired signal received at a given microphone of the linear microphone array, in terms of the filter coefficients corresponding to each n-th microphone.
In Formula 3, $e_m(k)$ is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$, which are given by Formula 4 and Formula 5.
Formula 6 and Formula 7 are obtained by minimizing the mean square error: the mean square error is minimized so that the additive noise is as small as possible, and this is combined with the constraint $e_{x,m}(k) = 0$ to obtain the optimal filter coefficients $h_{m,o}$. Here $h_m$ is the filter coefficient matrix of all filters in the beamformer, and $h_{m,o}$ is the corresponding optimal filter coefficient value for all filters in the beamformer.
In a further improvement of the invention, the speech recognizer includes an acoustic model, which is used to recognize the speech data to be recognized only after it has been adaptively trained on the speech data to be recognized.
In a further improvement of the invention, the speech recognizer also includes a feature extraction module, a text input module, a training data storage module, and a training module.
The feature extraction module is communicatively connected to the adaptive filter module; it receives the speech data to be recognized and extracts acoustic features from it.
The text input module is used to input the text annotations corresponding to the speech data to be recognized.
The training data storage module is communicatively connected to the feature extraction module and the text input module; it stores the acoustic features and the corresponding text annotations, which are combined to form adaptive training data.
The training module is communicatively connected to the training data storage module; it reads the adaptive training data stored in the training data storage module and uses it to adaptively train the acoustic model.
Detailed Description of the Embodiments
The invention is described in further detail below with reference to the accompanying drawings. With the rapid development of hardware systems, microphone arrays are being applied ever more widely, particularly in human-machine voice interaction. Conventional techniques that perform sound source localization together with adaptive beamforming require a large amount of computation, and when the localization is biased they easily suppress or distort the desired signal, which in turn degrades the performance of the speech recognition system. The invention applies to the 180° planar region in front of the linear microphone array, and its intended scenario is human-machine voice interaction. A speaker controlling a machine by voice stands facing the machine, so when the linear microphone array captures the speaker's voice, only sound from in front of the machine needs to be considered, not sound from behind it. The invention divides the planar region in front of the microphones. A wider beam in the middle captures as much of the speaker's voice as possible while suppressing as much ambient noise as possible; narrower beams on both sides capture as much ambient noise as possible while suppressing the desired voice signal. An adaptive filtering algorithm then further removes the ambient noise component from the output of the main beam. The method and system for linear microphone array speech recognition of the invention are described below with reference to the drawings.
As shown in Fig. 2, the invention discloses a linear microphone array speech recognition system comprising a linear microphone array 1, a beamformer, an adaptive filter module 3, and a speech recognizer 4.
The linear microphone array records sound from the external environment and digitizes the sound signal into audio data. The sound acquisition region being recorded is formed in front of the linear microphone array. Within the sound acquisition region, the beamformer forms a main beam region in the middle, with a first noise beam region and a second noise beam region on either side of the main beam region. The beamformer is communicatively connected to the linear microphone array. It receives the audio data from the main beam region, the first noise beam region, and the second noise beam region and, after processing, obtains the main beam corresponding to the main beam region, the first noise beam corresponding to the first noise beam region, and the second noise beam corresponding to the second noise beam region.
As shown in Fig. 3, the adaptive filter module 3 is a multi-channel filter communicatively connected to the beamformer. It receives the main beam, the first noise beam, and the second noise beam from the beamformer. The first noise beam is adapted by the first adaptive filter 31 and the second noise beam is adapted by the second adaptive filter 32; the adapted first and second noise beams are then filtered out of the main beam and the result is output. The output is fed back to the adaptive filters 31 and 32, whose coefficients are continually updated according to the normalized least mean square (NLMS) algorithm. The final result is output from the adaptive filter module 3 as the speech data to be recognized.
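The feedback structure described above can be sketched as a two-reference NLMS noise canceller. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `nlms_cancel`, the tap count, the step size `mu`, the joint normalization across both reference filters, and the synthetic test signals are all hypothetical choices.

```python
import math
import random

def nlms_cancel(main, ref1, ref2, taps=8, mu=0.1, eps=1e-8):
    """Subtract adaptively filtered noise references from the main beam.

    Returns the error signal e(k), which is both the cleaned output and the
    feedback that drives the coefficient updates of the two filters.
    """
    w1 = [0.0] * taps  # coefficients of the filter on the first noise beam
    w2 = [0.0] * taps  # coefficients of the filter on the second noise beam
    cleaned = []
    for k in range(len(main)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        u1 = [ref1[k - i] if k >= i else 0.0 for i in range(taps)]
        u2 = [ref2[k - i] if k >= i else 0.0 for i in range(taps)]
        noise_estimate = (sum(w * u for w, u in zip(w1, u1))
                          + sum(w * u for w, u in zip(w2, u2)))
        error = main[k] - noise_estimate
        # Normalize the step by the reference power (the "N" in NLMS).
        power = eps + sum(u * u for u in u1) + sum(u * u for u in u2)
        gain = mu * error / power
        w1 = [w + gain * u for w, u in zip(w1, u1)]
        w2 = [w + gain * u for w, u in zip(w2, u2)]
        cleaned.append(error)
    return cleaned

def mean_square(values):
    return sum(v * v for v in values) / len(values)

# Synthetic demo: a sine stands in for speech, white noise for the two noise beams.
random.seed(0)
n = 4000
speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
noise1 = [random.gauss(0.0, 1.0) for _ in range(n)]
noise2 = [random.gauss(0.0, 1.0) for _ in range(n)]
main_beam = [s + 0.8 * a + 0.5 * b for s, a, b in zip(speech, noise1, noise2)]

cleaned = nlms_cancel(main_beam, noise1, noise2)

# Compare residual noise power over the last 1000 samples, after convergence.
tail = slice(n - 1000, n)
noise_power_before = mean_square([m - s for m, s in zip(main_beam[tail], speech[tail])])
noise_power_after = mean_square([c - s for c, s in zip(cleaned[tail], speech[tail])])
```

In this sketch the noise references are assumed to contain no speech. In practice some speech leaks into the side beams, which is one reason the side beams are designed to suppress the desired signal.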
The speech recognizer 4 is communicatively connected to the adaptive filter module. It receives the speech data to be recognized output by the adaptive filter module and performs speech recognition on it to obtain and output the corresponding text data.
A sound acquisition region spanning angles from 0° to 180° lies in front of the linear microphone array 1. There are three beamformers: the first beamformer 21, the second beamformer 22, and the third beamformer 23. The first beamformer 21 forms the first noise beam region, and the center of the beam it forms points in the 20° direction of the sound acquisition region. The second beamformer 22 forms the main beam region, and the center of the beam it forms points in the 90° direction of the sound acquisition region. The third beamformer 23 forms the second noise beam region, and the center of the beam it forms points in the 160° direction of the sound acquisition region. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe width of the beam formed by the second beamformer 22 is less than or equal to 90°, and the main lobe widths of the beams formed by the first and third beamformers are less than or equal to 40°.
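To make the notion of main lobe width concrete, the following sketch estimates it for a narrowband delay-and-sum beam. It rests on several assumptions that are not in the text: delay-and-sum stands in for the patent's designed filters, the width is taken at the half-power (-3 dB) points, and the microphone count, spacing, and 2 kHz frequency are hypothetical. The function names `beam_response` and `main_lobe_width_deg` are illustrative.

```python
import cmath
import math

def beam_response(num_mics, spacing_m, freq_hz, steer_deg, arrival_deg, c=343.0):
    """Normalized magnitude response of a delay-and-sum beam steered to steer_deg
    for a plane wave arriving from arrival_deg (1.0 at the steering direction)."""
    delta = math.cos(math.radians(arrival_deg)) - math.cos(math.radians(steer_deg))
    phase_step = 2.0 * math.pi * freq_hz * spacing_m * delta / c
    total = sum(cmath.exp(1j * n * phase_step) for n in range(num_mics))
    return abs(total) / num_mics

def main_lobe_width_deg(num_mics, spacing_m, freq_hz, steer_deg, step_deg=0.1):
    """Approximate half-power main-lobe width by scanning outward from steer_deg."""
    threshold = 1.0 / math.sqrt(2.0)  # -3 dB in amplitude

    def scan(direction):
        angle = steer_deg
        while True:
            nxt = angle + direction * step_deg
            if nxt < 0.0 or nxt > 180.0:
                return angle  # reached the edge of the 0-180 degree region
            if beam_response(num_mics, spacing_m, freq_hz, steer_deg, nxt) < threshold:
                return angle  # response dropped below half power
            angle = nxt

    return scan(+1) - scan(-1)

# Hypothetical arrays: same spacing and frequency, different microphone counts.
width_4 = main_lobe_width_deg(4, 0.05, 2000.0, 90.0)
width_8 = main_lobe_width_deg(8, 0.05, 2000.0, 90.0)
```

With these assumed parameters, the longer array yields the narrower main lobe, which is the usual trade-off a designer would weigh when choosing a wide main beam and narrow noise beams.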
In a preferred embodiment of the invention, the first beamformer 21, the second beamformer 22, and the third beamformer 23 each contain a filter connected to each microphone of the linear microphone array. Each filter filters using its own filter coefficients, which are calculated by a fixed beamforming algorithm. The fixed beamforming algorithm includes:

$$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N \qquad \text{(Formula 1)}$$

In Formula 1, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are, respectively, the desired signal and the additive noise that it collects. Formula 2 gives the estimated output of the beamformer in terms of the filter coefficients corresponding to each n-th microphone: the audio data is filtered so that the beamformer output approximates the desired signal received at a given microphone of the linear microphone array before being output.
In Formula 3, $e_m(k)$ is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$, which are given by Formula 4 and Formula 5. The desired-signal error equals the difference between the output of the beamformer and the input of the beamformer, and the additive-noise error equals the sum of all the additive noise terms.
Formula 6 and Formula 7 are obtained by minimizing the mean square error: the mean square error is minimized so that the additive noise is as small as possible, and this is combined with the constraint $e_{x,m}(k) = 0$ to obtain the optimal filter coefficients $h_{m,o}$. Here $h_m$ is the filter coefficient matrix of all filters in the beamformer, and $h_{m,o}$ is the corresponding optimal filter coefficient value for all filters in the beamformer.
Because the additive noise in the final output of the beamformer should be as small as possible, the minimum mean square error shown in Formula 6 is derived from the error criterion. When this mean square error is at its minimum, the optimal filter coefficients $h_{m,o}$ are obtained, as shown in Formula 7. To keep the distortion of the desired signal to a minimum, the constraint $e_{x,m}(k) = 0$ is added when estimating the optimal filter coefficients $h_{m,o}$.
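For orientation, minimizing residual noise power subject to a zero desired-signal-error constraint has the general shape of a linearly constrained minimum variance problem. The block below is a sketch of that standard form, not a reproduction of Formulas 6 and 7: the noise covariance matrix $\mathbf{R}_v$, the constraint matrix $\mathbf{C}$, and the constraint vector $\mathbf{u}$ are assumed symbols introduced here, with $\mathbf{C}^{T}\mathbf{h}_m = \mathbf{u}$ standing in for the condition $e_{x,m}(k) = 0$.

```latex
% Assumed generic form: minimize residual noise power under a linear
% distortionless constraint that encodes e_{x,m}(k) = 0.
\min_{\mathbf{h}_m} \; \mathbf{h}_m^{T} \mathbf{R}_v \mathbf{h}_m
\quad \text{subject to} \quad \mathbf{C}^{T} \mathbf{h}_m = \mathbf{u}

% Closed-form solution of that constrained problem (Lagrange multipliers):
\mathbf{h}_{m,o} = \mathbf{R}_v^{-1} \mathbf{C}
\left( \mathbf{C}^{T} \mathbf{R}_v^{-1} \mathbf{C} \right)^{-1} \mathbf{u}
```

Because the coefficients depend only on the assumed noise statistics and the fixed constraint, they can be computed once in advance, which is consistent with the fixed (non-tracking) beamformers described here.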
The speech recognizer 4 contains an acoustic model, which is used to perform speech recognition on the input speech data to be recognized and identify the corresponding text. The filtering performed by the microphone array inevitably distorts the main beam, and this distortion can reduce recognition accuracy when the acoustic model recognizes the speech data. To reduce this effect, before the acoustic model is used for recognition it is adaptively trained on speech data that has been processed by the microphone array. The speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its recognition accuracy on the speech data and reduces the effect of the distortion on recognition accuracy.
In a preferred embodiment of the invention, the speech recognizer also includes a feature extraction module, a text input module, a training data storage module, and a training module. The feature extraction module is communicatively connected to the adaptive filter module; it receives the speech data to be recognized that the adaptive filter module outputs and extracts acoustic features from it. The text input module receives manually entered text annotations corresponding to the speech data to be recognized. The training data storage module is communicatively connected to both the feature extraction module and the text input module; it stores the acoustic features extracted by the feature extraction module and the corresponding text annotations output by the text input module, and combines them to form adaptive training data. The training module is communicatively connected to the training data storage module; it reads the adaptive training data stored there and uses it to adaptively train the acoustic model.
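The pairing of acoustic features with text annotations can be sketched as a small data structure. The text does not specify which acoustic features are used, so per-frame log energy serves here purely as a placeholder; the class names `AdaptationSample` and `TrainingDataStore`, the frame sizes, and the example transcripts are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List

def frame_log_energy(samples, frame_len=400, hop=160):
    """Placeholder feature extractor: one log-energy value per frame."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        features.append([math.log(energy + 1e-10)])  # small floor avoids log(0)
    return features

@dataclass
class AdaptationSample:
    features: List[List[float]]  # one feature vector per frame
    transcript: str              # manually entered text annotation

@dataclass
class TrainingDataStore:
    """Stores (features, transcript) pairs for later adaptive training."""
    samples: List[AdaptationSample] = field(default_factory=list)

    def add(self, audio, transcript):
        self.samples.append(AdaptationSample(frame_log_energy(audio), transcript))

    def read_all(self):
        return list(self.samples)

# Hypothetical utterances already cleaned by the adaptive filter module.
store = TrainingDataStore()
store.add([math.sin(0.05 * k) for k in range(1000)], "withdraw cash")
store.add([0.0] * 800, "check balance")
```

A training module would then iterate over `store.read_all()` and update the acoustic model; that step depends on the recognizer in use and is left out of this sketch.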
Take a voice-controlled automatic teller machine (ATM) as an example. At least three microphones are arranged horizontally on the ATM at a certain spacing, and they capture sound in front of the ATM to form sound signals. The horizontally arranged microphones form a 180° sound acquisition region in front of the ATM, and the sound in this region includes both the speaker's control instructions and ambient noise. Within the sound acquisition region, three beamformers form three beams: the main beam, the first noise beam, and the second noise beam. The sound signals captured by the microphones are digitized into audio data and input into the beamformers; the first beamformer outputs the first noise beam, the second beamformer outputs the main beam, and the third beamformer outputs the second noise beam. The main beam, the first noise beam, and the second noise beam are input together into the adaptive filter module, which filters the first and second noise beams out of the main beam and outputs the speech data to be recognized. This speech data is the output of the microphone array. The speech data to be recognized is input into the speech recognizer, where the adaptation operation is carried out first to improve the recognizer's accuracy on this speech data; the speech data is then recognized and the corresponding text data is output. The text data can be sent to the ATM so that the ATM performs the corresponding action.
The invention also provides a linear microphone array speech recognition method. The microphones are first arranged as a linear array and convert the speaker's audio signal into audio data. The audio data is then fed into the beamformer, which forms the main beam and the noise beams by filtering and outputs them to the adaptive filter module. The adaptive filter module takes the noise beams and the main beam and removes the noise beams from the main beam to form the speech data to be recognized. Finally, the speech data to be recognized is input into the speech recognizer, which recognizes it and outputs the text data. Specifically:
First step: three or more microphones are selected and arranged horizontally at intervals in a straight line to form the linear microphone array 1, so that a sound acquisition region spanning 0° to 180° is formed in front of the array, as shown in Fig. 1. The sound captured in this region includes both the speaker's voice and noise from the surrounding environment. The microphones receive the sound from the sound acquisition region, digitize the captured audio signal into audio data, and output it.
Second step: referring to Fig. 2, a beamformer is provided for the sound acquisition region in front of the linear microphone array 1. Within the sound acquisition region the beamformer forms a main beam region in the middle, with a first noise beam region and a second noise beam region on either side of the main beam region. The audio data is then input into the beamformer, which outputs the main beam corresponding to the main beam region, the first noise beam corresponding to the first noise beam region, and the second noise beam corresponding to the second noise beam region.
Third step: the first noise beam, the main beam, and the second noise beam output by the beamformer are filtered. Based on the noise information in the input first and second noise beams, the first noise beam and the second noise beam are removed from the main beam to form the speech data to be recognized. Specifically, the main beam, the first noise beam, and the second noise beam are received from the beamformer. The first noise beam is adapted by the first adaptive filter 31 and the second noise beam is adapted by the second adaptive filter 32; the adapted first and second noise beams are then filtered out of the main beam and the result is output. A judgment module compares the output with a manually set standard. If the quality of the adaptively filtered beam does not meet the standard, the output is returned to the first adaptive filter 31 and the second adaptive filter 32 for further adaptation. This cycle repeats until the output meets the set standard, and the final result is output from the adaptive filter module 3 as the speech data to be recognized.
Fourth step: the speech data to be recognized is recognized, and the resulting text data is output to the machine.
In a preferred embodiment of the invention, a sound acquisition region spanning angles from 0° to 180° lies in front of the linear microphone array, and the microphones input the audio data they form into the beamformers. There are three beamformers: the first beamformer 21, the second beamformer 22, and the third beamformer 23. The first beamformer 21 forms the first noise beam region; the center of its beam points in the 20° direction of the 180° sound acquisition region, and it obtains the audio data in the first noise beam region of the linear microphone array and outputs the first noise beam. The second beamformer 22 forms the main beam region; the center of its beam points in the 90° direction of the 180° sound acquisition region, and it obtains the audio data in the main beam region of the linear microphone array and outputs the main beam. The third beamformer 23 forms the second noise beam region; the center of its beam points in the 160° direction of the 180° sound acquisition region, and it obtains the noise in the second noise beam region of the linear microphone array and outputs the second noise beam. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe width of the beam formed by the second beamformer 22 is less than or equal to 90°, and the main lobe widths of the beams formed by the first and third beamformers are less than or equal to 40°.
In a preferred embodiment of the invention, the first beamformer 21, the second beamformer 22, and the third beamformer 23 each contain a filter connected to each microphone of the linear microphone array. Each filter filters using its own filter coefficients, which are calculated by a fixed beamforming algorithm. The fixed beamforming algorithm includes:

$$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N \qquad \text{(Formula 1)}$$

In Formula 1, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are, respectively, the desired signal and the additive noise that it collects. Formula 2 gives the estimated output of the beamformer in terms of the filter coefficients corresponding to each n-th microphone: the audio data is filtered so that the beamformer output approximates the desired signal received at a given microphone of the linear microphone array before being output.
In Formula 3, $e_m(k)$ is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$, which are given by Formula 4 and Formula 5. The desired-signal error equals the difference between the output of the beamformer and the input of the beamformer, and the additive-noise error equals the sum of all the additive noise terms.
Because the additive noise in the final output of the beamformer should be as small as possible, the minimum mean square error shown in Formula 6 is derived from the error criterion. When this mean square error is at its minimum, the optimal filter coefficients $h_{m,o}$ are obtained, as shown in Formula 7. To keep the distortion of the desired signal to a minimum, the constraint $e_{x,m}(k) = 0$ is added when estimating the optimal filter coefficients $h_{m,o}$.
The speech recognizer 4 contains an acoustic model, which is used to perform speech recognition on the input speech data to be recognized and identify the corresponding text. The filtering performed by the microphone array inevitably distorts the main beam, and this distortion can reduce recognition accuracy when the acoustic model recognizes the speech data. To reduce this effect, before the acoustic model is used for recognition it is adaptively trained on speech data that has been processed by the microphone array. The speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its recognition accuracy on the speech data and reduces the effect of the distortion on recognition accuracy.
In a preferred embodiment of the invention, adapting the acoustic model using the speech data to be recognized further comprises the following steps. The speech recognizer first extracts a set quantity of the speech data to be recognized, and the extracted speech data is annotated with text. The acoustic features of that set quantity of speech data are then extracted, and the text annotations are combined with the corresponding acoustic features to form adaptive training data. Finally, the acoustic model is adaptively trained using the adaptive training data. Once adaptive training is complete, the adapted acoustic model performs speech recognition on the speech data to be recognized.
Take a voice-controlled automatic teller machine (ATM) as an example. At least three microphones are arranged horizontally on the ATM at a certain spacing, and they capture sound in front of the ATM to form sound signals. The horizontally arranged microphones form a 180° sound acquisition region in front of the ATM, and the sound in this region includes both the speaker's control instructions and ambient noise. Within the sound acquisition region, three beamformers form three beams: the main beam, the first noise beam, and the second noise beam. The sound signals captured by the microphones are digitized into audio data and input into the beamformers; the first beamformer outputs the first noise beam, the second beamformer outputs the main beam, and the third beamformer outputs the second noise beam. The main beam, the first noise beam, and the second noise beam are input together into the adaptive filter module, which filters the first and second noise beams out of the main beam and outputs the speech data to be recognized. This speech data is the output of the microphone array. The speech data to be recognized is input into the speech recognizer, where the adaptation operation is carried out first to improve the recognizer's accuracy on this speech data; the speech data is then recognized and the corresponding text data is output. The text data can be sent to the ATM so that the ATM performs the corresponding action.
The invention targets a specific form of human-machine voice interaction. It does not need to track the direction of the sound source in real time, which avoids the suppression or distortion of the desired signal that conventional algorithms may introduce when the sound source position estimate is biased. The algorithm also requires little computation, is simple to implement, and has a relatively low cost, while the speech data obtained is of high quality, which improves the accuracy of speech recognition.
The invention has been described in detail above with reference to the drawings and embodiments, and those skilled in the art can make many variations to the invention based on this description. Accordingly, certain details of the embodiments should not be construed as limiting the invention; the scope of protection of the invention is the scope defined by the appended claims.