Summary of the invention
An object of the invention is to overcome the deficiencies of the prior art by providing a speech recognition method and system using a linear microphone array. The invention addresses the heavy computation, complexity, and high implementation cost of existing microphone-array configurations. It aims to achieve good noise reduction with the microphone array so as to obtain high-quality audio data and improve the accuracy of speech recognition.
To achieve the above object, the invention provides a speech recognition method using a linear microphone array, the method comprising:
recording ambient sound with the linear microphone array to form audio data;
providing a beamformer for a sound acquisition region in front of the linear microphone array, the beamformer forming, within the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on either side;
inputting the audio data into the beamformer to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and
performing speech recognition on the speech data to be recognized to obtain and output corresponding text data.
The beneficial effects of the invention are as follows. The sound acquisition region is divided into three beam regions: two beams capture noise and the third captures the desired signal. The beamformer outputs the corresponding noise beams and main beam, and the adaptive filter module then filters the noise beams out of the main beam. The method does not need to track the direction of the sound source in real time, so it avoids the suppression or distortion of the desired signal that conventional algorithms can suffer when the estimated source position is inaccurate. The algorithm is also computationally light, simple to implement, and low in cost, and the speech data obtained is of high quality, which improves the accuracy of speech recognition. In addition, adapting the speech recognizer with the speech data can further improve recognition accuracy.
In a further improvement of the invention, providing a beamformer for the sound acquisition region in front of the linear microphone array comprises: the sound acquisition region being a planar region spanning angles from 0° to 180°; providing a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region; providing a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region; and providing a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.
In a further improvement of the invention, when the beamformers are provided, each beamformer contains a filter connected to each microphone of the linear microphone array, and a fixed beamforming algorithm is used to compute the filter coefficients for the filters in each beamformer.
The fixed beamforming algorithm comprises:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (Formula 1)
In Formula 1, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the collected desired signal and additive noise, respectively. Formula 2 gives the output of the beamformer, which approximates the desired signal received by one of the microphones in the linear microphone array, in terms of the filter coefficients corresponding to the n-th microphone.
In Formula 3, e_m(k) denotes the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by Formula 4 and Formula 5, respectively.
Formula 6 and Formula 7 are obtained by minimizing the mean-square error. The additive-noise term is minimized so that the additive noise is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the matrix of filter coefficients for all filters in the beamformer, and h_{m,o} is the corresponding optimal value of those coefficients.
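Formulas 2 through 7 are referred to in the text but are not reproduced in it. One reading that is consistent with the verbal definitions above is sketched below; it is an assumption, not the original formulas. The symbol z_m(k) for the beamformer output, the bold vectors of recent samples, and the per-microphone coefficient vectors h_{n,m} are names introduced here for illustration only.

```latex
% Hedged reconstruction: every formula below is an assumption based on the surrounding text.
\begin{align}
z_m(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{y}_n(k) \tag{2, assumed}\\
e_m(k) &= z_m(k) - x_m(k) = e_{x,m}(k) + e_{v,m}(k) \tag{3, assumed}\\
e_{x,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{x}_n(k) - x_m(k) \tag{4, assumed}\\
e_{v,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_{n,m}^{T}\,\mathbf{v}_n(k) \tag{5, assumed}\\
J(\mathbf{h}_m) &= E\{e_m^{2}(k)\} \tag{6, assumed}\\
\mathbf{h}_{m,o} &= \arg\min_{\mathbf{h}_m} E\{e_{v,m}^{2}(k)\}
  \quad \text{subject to } e_{x,m}(k) = 0 \tag{7, assumed}
\end{align}
```

Substituting Formula 1 into the assumed Formula 2 splits the output into a filtered desired-signal part and a filtered noise part, which is what makes the decomposition of e_m(k) into e_{x,m}(k) and e_{v,m}(k) possible.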
In a further improvement of the invention, performing speech recognition on the speech data to be recognized comprises: first adapting an acoustic model using the speech data to be recognized, and then performing speech recognition on the speech data to be recognized using the adapted acoustic model.
In a further improvement of the invention, adapting the acoustic model using the speech data to be recognized comprises:
extracting a set quantity of the speech data to be recognized and annotating the extracted speech data with text;
extracting the acoustic features of the set quantity of speech data to be recognized and combining the corresponding text annotations with the acoustic features to form adaptive training data; and
performing adaptive training on the acoustic model using the adaptive training data.
The invention also provides a linear microphone array speech recognition system, the system comprising:
a beamformer communicatively connected to the linear microphone array, the beamformer forming, in a sound acquisition region in front of the linear microphone array, a main beam region in the middle and a first noise beam region and a second noise beam region on either side, and being used to process the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
an adaptive filter module communicatively connected to the beamformer, which receives the main beam, the first noise beam, and the second noise beam and filters the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and
a speech recognizer communicatively connected to the adaptive filter module, which receives the speech data to be recognized and performs speech recognition on it to obtain and output corresponding text data.
In a further improvement of the invention, the sound acquisition region is a planar region spanning angles from 0° to 180°, and the beamformer comprises:
a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;
a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region; and
a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.
In a further improvement of the invention, each beamformer contains a filter connected to each microphone of the linear microphone array, and each filter in each beamformer is assigned corresponding filter coefficients, which are computed by a fixed beamforming algorithm.
The fixed beamforming algorithm comprises:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (Formula 1)
In Formula 1, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the collected desired signal and additive noise, respectively. Formula 2 gives the output of the beamformer, which approximates the desired signal received by one of the microphones in the linear microphone array, in terms of the filter coefficients corresponding to the n-th microphone.
In Formula 3, e_m(k) denotes the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by Formula 4 and Formula 5, respectively.
Formula 6 and Formula 7 are obtained by minimizing the mean-square error. The additive-noise term is minimized so that the additive noise is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the matrix of filter coefficients for all filters in the beamformer, and h_{m,o} is the corresponding optimal value of those coefficients.
In a further improvement of the invention, the speech recognizer includes an acoustic model, and the acoustic model is adaptively trained on the speech data to be recognized before being used to recognize that speech data.
In a further improvement of the invention, the speech recognizer further includes a feature extraction module, a text input module, a training data storage module, and a training module.
The feature extraction module is communicatively connected to the adaptive filter module, receives the speech data to be recognized, and extracts acoustic features from the received speech data.
The text input module is used to input text annotations corresponding to the speech data to be recognized.
The training data storage module is communicatively connected to the feature extraction module and the text input module and stores the acoustic features and the corresponding text annotations, which together form adaptive training data.
The training module is communicatively connected to the training data storage module, reads the adaptive training data stored there, and uses it to adaptively train the acoustic model.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings. With the rapid development of hardware, microphone arrays are being used more and more widely. In human-machine voice interaction in particular, conventional techniques perform sound source localization together with adaptive beamforming, which is computationally expensive; when the localization is inaccurate, the desired signal is easily suppressed or distorted, which in turn degrades the performance of the speech recognition system. The invention applies to the 180° planar region in front of the linear microphone array, and the applicable scenario is human-machine voice interaction. A speaker who controls a machine by voice stands facing the machine, so when the linear microphone array captures the speaker's voice it only needs to consider sound from in front of the machine, not from behind it. The invention divides the planar region in front of the microphones. A wider beam in the middle captures as much of the speaker's voice as possible while suppressing ambient noise as far as possible, and narrower beams on the two sides capture as much ambient noise as possible while suppressing the desired voice signal. An adaptive filtering algorithm then further removes the ambient noise component from the output of the main beam. The linear microphone array speech recognition method and system of the invention are described below with reference to the drawings.
As shown in Fig. 2, the invention discloses a linear microphone array speech recognition system comprising a linear microphone array 1, a beamformer, an adaptive filter module 3, and a speech recognizer 4.
The linear microphone array records sound from the external environment and digitizes the sound signal into audio data. A sound acquisition region for the recorded sound is formed in front of the linear microphone array. Within this region the beamformer forms a main beam region in the middle and a first noise beam region and a second noise beam region on either side of the main beam region. The beamformer is communicatively connected to the linear microphone array. It receives the audio data from the main beam region, the first noise beam region, and the second noise beam region and processes it to obtain the main beam corresponding to the main beam region, the first noise beam corresponding to the first noise beam region, and the second noise beam corresponding to the second noise beam region.
As shown in Fig. 3, the adaptive filter module 3 is a multi-channel filter that is communicatively connected to the beamformer and receives the main beam, the first noise beam, and the second noise beam output by the beamformer. The first noise beam is adapted by a first adaptive filter 31 and the second noise beam is adapted by a second adaptive filter 32. The adapted first and second noise beams are then filtered out of the main beam and the result is output. The output is fed back to the adaptive filters 31 and 32, whose coefficients are continuously updated according to the normalized least mean square (NLMS) algorithm. The final result is output from the adaptive filter module 3 as the speech data to be recognized.
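The feedback arrangement described above can be sketched as a two-reference NLMS noise canceller. This is a minimal sketch under stated assumptions rather than the module's actual implementation: the function name `nlms_cancel`, the 16-tap filter length, the step size of 0.5, and the choice to normalize by the combined power of both reference frames are all illustrative and do not come from the source.

```python
import numpy as np

def nlms_cancel(main, noise_refs, taps=16, mu=0.5, eps=1e-8):
    """Subtract adaptively filtered noise references from the main beam.

    main:       1-D array of main-beam samples
    noise_refs: list of 1-D arrays (for example the two noise beams), same length as main
    Returns the error signal, i.e. the main beam with the estimated noise removed.
    Filter length, step size and joint normalization are assumptions of this sketch.
    """
    main = np.asarray(main, dtype=float)
    refs = [np.asarray(r, dtype=float) for r in noise_refs]
    weights = [np.zeros(taps) for _ in refs]      # one adaptive filter per noise beam
    out = np.zeros_like(main)
    for k in range(len(main)):
        # Most recent `taps` samples of each reference, newest first, zero-padded at the start.
        frames = []
        for r in refs:
            frame = r[max(0, k - taps + 1):k + 1][::-1]
            frames.append(np.pad(frame, (0, taps - len(frame))))
        noise_est = sum(w @ x for w, x in zip(weights, frames))
        e = main[k] - noise_est                   # output sample, also the feedback signal
        out[k] = e
        norm = sum(x @ x for x in frames) + eps   # combined reference power
        for w, x in zip(weights, frames):
            w += mu * e * x / norm                # normalized LMS update, in place
    return out
```

The output sample doubles as the error that drives the coefficient update, which mirrors the feedback path from the module output to the adaptive filters 31 and 32.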
The speech recognizer 4 is communicatively connected to the adaptive filter module. It receives the speech data to be recognized output by the adaptive filter module, performs speech recognition on it, and obtains and outputs the corresponding text data.
A sound acquisition region spanning angles from 0° to 180° lies in front of the linear microphone array 1. There are three beamformers: a first beamformer 21, a second beamformer 22, and a third beamformer 23. The first beamformer 21 forms the first noise beam region, and the center of the beam it forms points in the 20° direction of the sound acquisition region. The second beamformer 22 forms the main beam region, and the center of the beam it forms points in the 90° direction of the sound acquisition region. The third beamformer 23 forms the second noise beam region, and the center of the beam it forms points in the 160° direction of the sound acquisition region. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe of the beam formed by the second beamformer 22 is no wider than 90°, and the main lobes of the beams formed by the first and third beamformers are no wider than 40°.
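A main-lobe width such as the 90° and 40° limits above can be checked numerically once a set of beamformer weights is available. The sketch below is illustrative only and rests on several assumptions not found in the source: a single analysis frequency, a uniform array, a half-power (-3 dB) definition of main-lobe width, and plain delay-and-sum weights standing in for the filter coefficients that the fixed beamforming algorithm would actually produce. The function names are hypothetical.

```python
import numpy as np

def array_response(weights, spacing_m, freq_hz, angles_deg, c=343.0):
    """Magnitude response of a weighted uniform linear array to far-field plane waves."""
    n = np.arange(len(weights))
    angles = np.deg2rad(np.atleast_1d(angles_deg))
    # One steering vector per look angle; phase follows the path difference n*d*cos(theta).
    steering = np.exp(2j * np.pi * freq_hz * spacing_m * np.outer(n, np.cos(angles)) / c)
    return np.abs(weights.conj() @ steering)

def delay_and_sum_weights(num_mics, spacing_m, freq_hz, center_deg, c=343.0):
    """Stand-in weights that phase-align a plane wave from center_deg (unit gain there)."""
    n = np.arange(num_mics)
    phase = 2 * np.pi * freq_hz * spacing_m * n * np.cos(np.deg2rad(center_deg)) / c
    return np.exp(1j * phase) / num_mics

def mainlobe_width_deg(weights, spacing_m, freq_hz, center_deg, c=343.0):
    """Width of the contiguous region around center_deg at or above half power (assumed -3 dB rule)."""
    angles = np.arange(0.0, 180.5, 0.5)            # scan the 0-180 degree region
    resp = array_response(weights, spacing_m, freq_hz, angles, c)
    idx = int(np.argmin(np.abs(angles - center_deg)))
    threshold = resp[idx] / np.sqrt(2.0)
    lo = idx
    while lo > 0 and resp[lo - 1] >= threshold:
        lo -= 1
    hi = idx
    while hi < len(angles) - 1 and resp[hi + 1] >= threshold:
        hi += 1
    return angles[hi] - angles[lo]
```

For example, `mainlobe_width_deg(delay_and_sum_weights(8, 0.04, 3000.0, 90.0), 0.04, 3000.0, 90.0)` measures the broadside beam of a hypothetical eight-microphone array with 4 cm spacing at 3 kHz. Plain delay-and-sum beams do not by themselves satisfy the wide-center, narrow-sides relationship stated above, so the helper is meant for checking candidate coefficients, not for producing them.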
In a preferred embodiment of the invention, the first beamformer 21, the second beamformer 22, and the third beamformer 23 each contain a filter connected to each microphone of the linear microphone array. Each filter applies its own filter coefficients, which are computed by a fixed beamforming algorithm comprising:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (Formula 1)
In Formula 1, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the collected desired signal and additive noise, respectively. Formula 2 gives the estimated output of the beamformer: the audio data is filtered so that the beamformer output approximates the desired signal received by one of the microphones in the linear microphone array before it is output, using the filter coefficients corresponding to the n-th microphone.
In Formula 3, e_m(k) denotes the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by Formula 4 and Formula 5, respectively. The desired-signal error equals the difference between the output of the beamformer and the input of the beamformer, and the additive-noise error equals the sum of all the additive noise terms.
Formula 6 and Formula 7 are obtained by minimizing the mean-square error. The additive-noise term is minimized so that the additive noise is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the matrix of filter coefficients for all filters in the beamformer, and h_{m,o} is the corresponding optimal value of those coefficients.
Because the additive noise in the final beamformer output should be as small as possible, the minimum mean-square error is derived from an error criterion, as shown in Formula 6. When it is at its minimum, the optimal filter coefficients h_{m,o} of the filters are obtained, as shown in Formula 7. To keep distortion of the desired signal to a minimum, the constraint e_{x,m}(k) = 0 is added when estimating the optimal filter coefficients h_{m,o}.
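The idea of minimizing the noise term while forcing the desired-signal error to zero has a well-known narrowband counterpart, the minimum-variance distortionless-response (MVDR, or Capon) closed form. The sketch below uses that swapped-in technique to illustrate the principle; it is not the time-domain filter design of the source, and the function name, the noise covariance matrix `noise_cov`, and the steering vector `steering` are assumptions of the example.

```python
import numpy as np

def min_noise_distortionless_weights(noise_cov, steering):
    """Minimize w^H R w subject to w^H d = 1 (MVDR-style closed form, an illustrative stand-in).

    noise_cov: (N, N) Hermitian positive-definite noise covariance matrix R
    steering:  (N,) steering vector d of the desired direction
    """
    r_inv_d = np.linalg.solve(noise_cov, steering)     # R^{-1} d without forming the inverse
    return r_inv_d / (steering.conj() @ r_inv_d)       # scale so the desired signal passes unchanged
```

The constraint w^H d = 1 plays the role of e_{x,m}(k) = 0: the desired component passes through undistorted, and the remaining degrees of freedom are spent on lowering the noise power w^H R w.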
The speech recognizer 4 contains an acoustic model, which is used to perform speech recognition on the input speech data to be recognized and so identify the corresponding text. The filtering performed by the microphone array inevitably distorts the main beam, and this distortion affects recognition accuracy when the acoustic model performs speech recognition on the speech data. To reduce this effect, before the acoustic model is used for recognition it is adaptively trained on speech data that has been processed by the microphone array. The speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its recognition accuracy on the speech data and reduces the effect of the distortion.
In a preferred embodiment of the invention, the speech recognizer further contains a feature extraction module, a text input module, a training data storage module, and a training module. The feature extraction module is communicatively connected to the adaptive filter module; it receives the speech data to be recognized output by the adaptive filter module and extracts acoustic features from it. The text input module receives manually entered text annotations corresponding to the speech data to be recognized. The training data storage module is communicatively connected to both the feature extraction module and the text input module; it stores the acoustic features extracted by the feature extraction module and the corresponding text annotations from the text input module, and the acoustic features and their text annotations together form the adaptive training data. The training module is communicatively connected to the training data storage module; it reads the stored adaptive training data and uses it to adaptively train the acoustic model.
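The pairing of acoustic features with text annotations can be sketched as follows. This is only an illustration: the `AdaptationExample` record, the `build_adaptation_set` helper, and the per-frame log-energy feature are hypothetical stand-ins, since the source does not say which acoustic features are used or how the training data is laid out.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class AdaptationExample:
    features: List[List[float]]   # one acoustic feature vector per frame
    transcript: str               # manually entered text annotation

def frame_log_energy(samples: Sequence[float], frame_len: int = 400, hop: int = 160):
    """Toy acoustic feature: log energy per frame (a placeholder for real features)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append([math.log(energy + 1e-10)])
    return feats

def build_adaptation_set(utterances, transcripts) -> List[AdaptationExample]:
    """Combine extracted features with their text annotations into adaptive training data."""
    if len(utterances) != len(transcripts):
        raise ValueError("each utterance needs exactly one text annotation")
    return [AdaptationExample(frame_log_energy(u), t) for u, t in zip(utterances, transcripts)]
```

A training module would then iterate over the returned list and update the acoustic model. That step depends on the recognizer in use and is left out here.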
Take a voice-controlled automated teller machine as an example. Several microphones, at least three, are arranged horizontally on the teller machine at a set spacing to pick up sound from in front of the machine and form a sound signal. The horizontally arranged microphones form a 180° sound acquisition region in front of the teller machine, and the sound in this region includes the speaker's spoken control instructions and ambient noise. Three beamformers form three beams within the sound acquisition region: the main beam, the first noise beam, and the second noise beam. The sound signals picked up by the microphones are digitized into audio data and input to the beamformers. The first beamformer outputs the first noise beam, the second beamformer outputs the main beam, and the third beamformer outputs the second noise beam. The main beam, the first noise beam, and the second noise beam are input together to the adaptive filter module, which filters the first and second noise beams out of the main beam and outputs the speech data to be recognized. This speech data is the output of the microphone array. The speech data to be recognized is first input to the speech recognizer for adaptation, to improve the recognizer's accuracy on this speech data, and is then recognized to produce the corresponding text data, which is output. The text data can be sent to the teller machine so that it performs the corresponding action.
The invention also provides a linear microphone array speech recognition method. In this method, microphones are first arranged in a linear array and convert the speaker's audio signal into audio data. The audio data is then fed to the beamformer, which filters it to form the main beam and the noise beams and outputs them to the adaptive filter module. The adaptive filter module takes the noise beams and the main beam and removes the noise beams from the main beam to form the speech data to be recognized. Finally, the speech data to be recognized is input to the speech recognizer, which recognizes it and outputs text data.
Specifically:
First step: three or more microphones are selected and arranged horizontally at intervals to form the linear microphone array 1. A sound acquisition region spanning 0° to 180° is formed in front of the linear microphone array, as shown in Fig. 1. The sound obtained from this region includes both the speaker's voice and noise from the surroundings. The microphones receive the sound from the sound acquisition region, digitize the resulting audio signal into audio data, and output it.
Second step: with reference to Fig. 2, a beamformer is provided for the sound acquisition region in front of the linear microphone array 1. Within the sound acquisition region the beamformer forms a main beam region in the middle and a first noise beam region and a second noise beam region on either side of the main beam region. The audio data is then input to the beamformer, which outputs the main beam corresponding to the main beam region, the first noise beam corresponding to the first noise beam region, and the second noise beam corresponding to the second noise beam region.
Third step: the first noise beam, the main beam, and the second noise beam output by the beamformer are filtered. Using the noise information in the input first and second noise beams, the first and second noise beams are removed from the main beam to form the speech data to be recognized. Specifically, the main beam, the first noise beam, and the second noise beam output by the beamformer are received. The first noise beam is adapted by the first adaptive filter 31 and the second noise beam by the second adaptive filter 32, and the adapted first and second noise beams are then filtered out of the main beam and the result is output. A judgment module compares the output with a manually set standard. If the quality of the adaptively filtered beam does not meet the standard, the output is returned to the first adaptive filter 31 and the second adaptive filter 32 for further adaptation. This cycle repeats until the output meets the set standard, at which point the final result is output from the adaptive filter module 3 as the speech data to be recognized.
Fourth step: the speech data to be recognized is recognized, and the resulting text data is output on the machine.
In a preferred embodiment of the invention, a sound acquisition region spanning angles from 0° to 180° lies in front of the linear microphone array, and the microphones input the audio data they form to the beamformers. There are three beamformers: a first beamformer 21, a second beamformer 22, and a third beamformer 23. The first beamformer forms the first noise beam region, and the center of its beam points in the 20° direction of the 180° sound acquisition region; the first beamformer 21 obtains the audio data in the first noise beam region of the linear microphone array and outputs the first noise beam. The second beamformer 22 forms the main beam region, and the center of its beam points in the 90° direction of the 180° sound acquisition region; the second beamformer 22 obtains the audio data in the main beam region of the linear microphone array and outputs the main beam. The third beamformer 23 forms the second noise beam region, and the center of its beam points in the 160° direction of the 180° sound acquisition region; the third beamformer 23 obtains the noise in the second noise beam region of the linear microphone array and outputs the second noise beam. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe of the beam formed by the second beamformer 22 is no wider than 90°, and the main lobes of the beams formed by the first and third beamformers are no wider than 40°.
In a preferred embodiment of the invention, adapting the acoustic model using the speech data to be recognized further comprises the following steps. The speech recognizer first extracts a set quantity of the speech data to be recognized, and the extracted speech data is annotated with text. The acoustic features of the set quantity of speech data to be recognized are then extracted, and the corresponding text annotations are combined with the acoustic features to form adaptive training data. Finally, the adaptive training data is used to adaptively train the acoustic model. Once adaptive training is complete, the adapted acoustic model performs speech recognition on the speech data to be recognized.
The invention targets a specific human-machine voice interaction scenario and does not need to track the direction of the sound source in real time, so it avoids the suppression or distortion of the desired signal that conventional algorithms can suffer when the estimated source position is inaccurate. The algorithm is also computationally light, simple to implement, and low in cost, and the speech data obtained is of high quality, which improves the accuracy of speech recognition.
The invention has been described in detail above with reference to the drawings and embodiments, and those skilled in the art can make many variations to the invention based on the above description. Accordingly, certain details of the embodiments should not be construed as limiting the invention, whose scope of protection is defined by the appended claims.