CN106710603A - Speech recognition method and system based on linear microphone array - Google Patents

Speech recognition method and system based on linear microphone array

Info

Publication number
CN106710603A
Authority
CN
China
Prior art keywords
former
noise
identified
wave
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611202169.0A
Other languages
Chinese (zh)
Other versions
CN106710603B (en)
Inventor
贺来朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yunzhixin Intelligent Technology Co Ltd
Unisound Shanghai Intelligent Technology Co Ltd
Original Assignee
SHANGHAI YUZHIYI INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI YUZHIYI INFORMATION TECHNOLOGY Co Ltd
Priority to CN201611202169.0A
Publication of CN106710603A
Application granted
Publication of CN106710603B
Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a speech recognition method based on a linear microphone array. The method comprises the following steps: recording ambient sound with the linear microphone array to form audio data; setting beamformers for a sound acquisition region in front of the linear microphone array, and using the beamformers to form, within the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on the two sides; inputting the audio data to the beamformers to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region; filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and performing speech recognition on the speech data to be recognized to obtain and output the corresponding text data. The method and system require little computation, yield high-quality speech data, and can improve speech recognition accuracy.

Description

Speech recognition method and system using a linear microphone array
Technical field
The present invention relates to the field of human-machine speech recognition, and in particular to a speech recognition method and system using a linear microphone array.
Background
In a speech recognition system, the audio signal captured by the microphone usually undergoes noise reduction to suppress the ambient noise component in the signal and thereby improve the recognition accuracy of the system. Depending on the number of microphones used, noise reduction algorithms can be roughly divided into single-microphone, dual-microphone, and microphone-array algorithms.
With the rapid development of hardware, microphone arrays are being used more and more widely. According to the topology of the array elements, microphone arrays can generally be divided into linear arrays and circular arrays. For either kind, noise reduction typically requires first obtaining the spatial direction of the desired signal with a sound source localization algorithm, and then using a fixed beamforming algorithm to form a receiving beam of a given shape whose main lobe is centered on the direction of the desired signal.
However, performing sound source localization and adaptive beamforming at the same time is computationally very expensive, and when the localization is off, the desired signal is easily suppressed or distorted, which in turn degrades the performance of the speech recognition system.
Summary of the invention
The object of the present invention is to overcome the defects of the prior art by providing a speech recognition method and system using a linear microphone array. It addresses the heavy computation, complexity, and high implementation cost of existing microphone array configurations, and aims to achieve good noise reduction with a microphone array so as to obtain high-quality speech data and improve speech recognition accuracy.
To achieve the above object, the invention provides a speech recognition method using a linear microphone array, the method comprising:
recording ambient sound with a linear microphone array to form audio data; setting beamformers for the sound acquisition region in front of the linear microphone array, and using the beamformers to form, in the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on the two sides; inputting the audio data into the beamformers to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region; filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized; and performing speech recognition on the speech data to be recognized to obtain and output the corresponding text data.
The beneficial effects of the invention are as follows. Three beam regions are designed within the sound acquisition region: two beams capture noise and the third captures the desired signal. The beamformers output the corresponding noise beams and main beam, and an adaptive filter module then further filters the noise beams out of the main beam. The method does not need to track the direction of the sound source in real time, which avoids the suppression or distortion of the desired signal that traditional algorithms may suffer when the source position estimate is biased. The algorithm also requires little computation, is simple to implement, and is low in cost, while the speech data obtained is of high quality, so speech recognition accuracy can be improved. In addition, adapting the speech recognizer with this speech data can further improve recognition accuracy.
In a further improvement of the invention, setting beamformers for the sound acquisition region in front of the linear microphone array comprises: the sound acquisition region is a planar region spanning angles from 0° to 180°; a first beamformer is set to form the first noise beam region, with the center of its beam pointing to the 20° direction of the sound acquisition region; a second beamformer is set to form the main beam region, with the center of its beam pointing to the 90° direction of the sound acquisition region; and a third beamformer is set to form the second noise beam region, with the center of its beam pointing to the 160° direction of the sound acquisition region.
In a further improvement of the invention, when the beamformers are set, each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and a fixed beamforming algorithm is used to calculate the filter coefficients of the filters in each beamformer.
The fixed beamforming algorithm includes:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)
[Formulas two through seven: the equation images are not reproduced in this text.]
In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the desired signal and the additive noise it collects, respectively. Formula two gives the output of the beamformer in terms of the filter coefficients corresponding to the n-th microphone; this output approximates the desired signal received by one microphone of the linear microphone array.
In formula three, e_m(k) is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by formula four and formula five, respectively.
Formula six and formula seven follow from minimizing the mean square error: the mean square error of the additive noise is minimized so that the additive noise in the output is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the filter coefficient matrix of all filters in the beamformer, and h_{m,o} is the optimal value of those coefficients.
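The constrained minimization described above can be illustrated with a deliberately reduced case. The following is a minimal sketch, assuming two microphones, a single real-valued coefficient per microphone instead of a full filter, and a known gain vector g relating the desired signal at each microphone to the reference microphone. These simplifications, and the names `distortionless_weights`, `noise_power`, `R_v`, and `g`, are illustrative assumptions and do not come from the patent. Under them, minimizing the noise power h^T R_v h subject to h^T g = 1 has the closed form h = R_v^{-1} g / (g^T R_v^{-1} g).

```python
from typing import List


def distortionless_weights(noise_cov: List[List[float]],
                           gains: List[float]) -> List[float]:
    """Two-microphone, single-tap case: minimise h^T R_v h subject to h^T g = 1."""
    (a, b), (c, d) = noise_cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("noise covariance must be invertible")
    # R_v^{-1} g for a 2x2 matrix, written out explicitly.
    rinv_g = [(d * gains[0] - b * gains[1]) / det,
              (-c * gains[0] + a * gains[1]) / det]
    # Normalise by g^T R_v^{-1} g so that the constraint h^T g = 1 holds.
    norm = gains[0] * rinv_g[0] + gains[1] * rinv_g[1]
    return [w / norm for w in rinv_g]


def noise_power(h: List[float], noise_cov: List[List[float]]) -> float:
    """Output noise power h^T R_v h."""
    return sum(h[i] * noise_cov[i][j] * h[j] for i in range(2) for j in range(2))


# Hypothetical numbers: uncorrelated noise, microphone 2 four times noisier.
R_v = [[1.0, 0.0], [0.0, 4.0]]
g = [1.0, 1.0]
h = distortionless_weights(R_v, g)
print(h, noise_power(h, R_v))
```

In this toy case the quieter microphone receives the larger weight, and the output noise power falls below that of either microphone alone while the desired signal passes with unit gain.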
In a further improvement of the invention, performing speech recognition on the speech data to be recognized comprises: first performing an adaptation operation on an acoustic model using the speech data to be recognized; and then performing speech recognition on the speech data to be recognized using the adapted acoustic model.
In a further improvement of the invention, performing the adaptation operation on the acoustic model using the speech data to be recognized comprises: extracting a set quantity of speech data to be recognized, and annotating the extracted speech data with text labels;
extracting the acoustic features of the set quantity of speech data to be recognized, and combining the corresponding text labels with the acoustic features to form adaptive training data;
performing adaptive training on the acoustic model using the adaptive training data.
The invention also provides a linear microphone array speech recognition system, the system comprising: beamformers communicatively connected to the linear microphone array, the beamformers forming, in the sound acquisition region in front of the linear microphone array, a main beam region in the middle and a first noise beam region and a second noise beam region on the two sides, and being used to process the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
an adaptive filter module, communicatively connected to the beamformers, which receives the main beam, the first noise beam, and the second noise beam, and filters the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized;
a speech recognizer, communicatively connected to the adaptive filter module, which receives the speech data to be recognized and performs speech recognition on it to obtain and output the corresponding text data.
In a further improvement of the invention, the sound acquisition region is a planar region spanning angles from 0° to 180°, and the beamformers comprise: a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing to the 20° direction of the sound acquisition region;
a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing to the 90° direction of the sound acquisition region;
a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing to the 160° direction of the sound acquisition region.
In a further improvement of the invention, each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and each filter in each beamformer has corresponding filter coefficients, which are calculated by a fixed beamforming algorithm.
The fixed beamforming algorithm includes:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)
[Formulas two through seven: the equation images are not reproduced in this text.]
In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the desired signal and the additive noise it collects, respectively. Formula two gives the output of the beamformer in terms of the filter coefficients corresponding to the n-th microphone; this output approximates the desired signal received by one microphone of the linear microphone array.
In formula three, e_m(k) is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by formula four and formula five, respectively.
Formula six and formula seven follow from minimizing the mean square error: the mean square error of the additive noise is minimized so that the additive noise in the output is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the filter coefficient matrix of all filters in the beamformer, and h_{m,o} is the optimal value of those coefficients.
In a further improvement of the invention, the speech recognizer includes an acoustic model, which is used to recognize the speech data to be recognized after it has been adaptively trained on that speech data.
In a further improvement of the invention, the speech recognizer also includes a feature extraction module, a text input module, a training data storage module, and a training module;
the feature extraction module is communicatively connected to the adaptive filter module, receives the speech data to be recognized, and extracts acoustic features from the received speech data;
the text input module is used to input the text labels corresponding to the speech data to be recognized;
the training data storage module is communicatively connected to the feature extraction module and the text input module and stores the acoustic features and the corresponding text labels, which are combined to form adaptive training data;
the training module is communicatively connected to the training data storage module, reads the adaptive training data stored there, and uses the data it reads to adaptively train the acoustic model.
Brief description of the drawings
Fig. 1 is a schematic diagram of the sound acquisition region;
Fig. 2 is a flow diagram of speech recognition with the linear microphone array;
Fig. 3 is a schematic diagram of the multi-channel adaptive filter.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings. With the rapid development of hardware, microphone arrays are being used more and more widely. In human-machine voice interaction in particular, the conventional approach of performing sound source localization and adaptive beamforming at the same time is computationally very expensive, and when the localization is off, the desired signal is easily suppressed or distorted, which degrades the performance of the speech recognition system. The invention applies to the 180° planar region in front of the linear microphone array, and its target scenario is human-machine voice interaction. A speaker who controls a machine by voice stands in front of the machine, so when the linear microphone array captures the speaker's voice it only needs to consider speech arriving from in front of the machine, not from behind it. The invention divides the planar region in front of the microphones: a wider beam in the middle captures as much of the speaker's voice as possible while suppressing as much ambient noise as possible, and narrower beams on the two sides capture as much ambient noise as possible while suppressing the desired voice signal. An adaptive filtering algorithm then further removes the ambient noise component from the output of the main beam. The method and system for linear microphone array speech recognition are described below with reference to the drawings.
As shown in Fig. 2, the invention discloses a linear microphone array speech recognition system comprising a linear microphone array 1, beamformers, an adaptive filter module 3, and a speech recognizer 4.
The linear microphone array records the sound of the external environment and digitizes the sound signal into audio data. The sound acquisition region being recorded lies in front of the linear microphone array. Within it, the beamformers form a main beam region in the middle and a first noise beam region and a second noise beam region on either side of the main beam region. The beamformers are communicatively connected to the linear microphone array; they receive the audio data from the main beam region, the first noise beam region, and the second noise beam region and, after processing, produce a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region.
As shown in Fig. 3, the adaptive filter module 3 is a multi-channel filter communicatively connected to the beamformers. It receives the main beam, the first noise beam, and the second noise beam sent by the beamformers. The first noise beam is adapted by the first adaptive filter 31 and the second noise beam by the second adaptive filter 32; the adapted first and second noise beams are then filtered out of the main beam and the result is output. The output is fed back to adaptive filters 31 and 32, whose coefficients are continually updated according to the normalized least mean square (Normalized Least Mean Square, NLMS) algorithm. The final result is output from the adaptive filter module 3 as the speech data to be recognized.
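The feedback structure described for adaptive filters 31 and 32 resembles a two-reference adaptive noise canceller. The following is a minimal sketch under stated assumptions: the tap count, step size `mu`, regularizer `eps`, the joint normalization over both reference buffers, and the synthetic test signals are all illustrative choices that do not come from the patent.

```python
import math
import random
from typing import List, Sequence


def cancel_noise(main: Sequence[float], noise1: Sequence[float],
                 noise2: Sequence[float], taps: int = 4,
                 mu: float = 0.1, eps: float = 1e-8) -> List[float]:
    """Subtract adaptively filtered noise beams from the main beam (NLMS)."""
    w1, w2 = [0.0] * taps, [0.0] * taps   # coefficients of filters 31 and 32
    b1, b2 = [0.0] * taps, [0.0] * taps   # most recent reference samples
    out = []
    for m, n1, n2 in zip(main, noise1, noise2):
        b1 = [n1] + b1[:-1]
        b2 = [n2] + b2[:-1]
        estimate = (sum(w * x for w, x in zip(w1, b1))
                    + sum(w * x for w, x in zip(w2, b2)))
        error = m - estimate               # output sample, also the feedback
        power = sum(x * x for x in b1) + sum(x * x for x in b2) + eps
        step = mu * error / power          # normalised step size
        w1 = [w + step * x for w, x in zip(w1, b1)]
        w2 = [w + step * x for w, x in zip(w2, b2)]
        out.append(error)
    return out


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-ins: a sine as "speech", Gaussian noise as the noise beams.
rng = random.Random(0)
num = 4000
speech = [math.sin(2 * math.pi * 0.01 * k) for k in range(num)]
noise1 = [rng.gauss(0.0, 1.0) for _ in range(num)]
noise2 = [rng.gauss(0.0, 1.0) for _ in range(num)]
main = [s + 0.8 * a + 0.5 * b for s, a, b in zip(speech, noise1, noise2)]
cleaned = cancel_noise(main, noise1, noise2)

half = num // 2
before = mse(main[half:], speech[half:])
after = mse(cleaned[half:], speech[half:])
print(before, after)
```

The error sample serves both as the module output and as the feedback that drives the coefficient update, which mirrors the loop in Fig. 3. The sketch assumes the noise beams contain no leaked speech; real noise beams would, and the step size would need tuning accordingly.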
The speech recognizer 4 is communicatively connected to the adaptive filter module. It receives the speech data to be recognized output by the adaptive filter module, performs speech recognition on it to obtain the corresponding text data, and outputs the text data.
A sound acquisition region spanning angles from 0° to 180° lies in front of the linear microphone array 1. There are three beamformers: the first beamformer 21, the second beamformer 22, and the third beamformer 23. The first beamformer 21 forms the first noise beam region, and the center of its beam points to the 20° direction of the sound acquisition region. The second beamformer 22 forms the main beam region, and the center of its beam points to the 90° direction of the sound acquisition region. The third beamformer 23 forms the second noise beam region, and the center of its beam points to the 160° direction of the sound acquisition region. The main lobe of the beam formed by the second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe width of the beam formed by the second beamformer 22 is less than or equal to 90°, and the main lobe widths of the beams formed by the first and third beamformers are less than or equal to 40°.
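To make the three look directions concrete, the following sketch steers three beams of a uniform linear array to 20°, 90°, and 160° and evaluates their gain toward each direction. It uses plain narrowband delay-and-sum weights, which is a simpler technique swapped in for illustration and not the constrained filter design the patent describes. The microphone count, the 4 cm spacing, the 2 kHz analysis frequency, and all function names are assumptions.

```python
import cmath
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_weights(num_mics: int, spacing_m: float,
                     freq_hz: float, steer_deg: float) -> List[complex]:
    """Delay-and-sum weights; angles are measured from the array axis."""
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    cos_steer = math.cos(math.radians(steer_deg))
    return [cmath.exp(1j * wavenumber * n * spacing_m * cos_steer) / num_mics
            for n in range(num_mics)]


def response(weights: List[complex], spacing_m: float,
             freq_hz: float, arrival_deg: float) -> float:
    """Magnitude of the beam's gain for a plane wave from arrival_deg."""
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    cos_arrival = math.cos(math.radians(arrival_deg))
    total = sum(w.conjugate()
                * cmath.exp(1j * wavenumber * n * spacing_m * cos_arrival)
                for n, w in enumerate(weights))
    return abs(total)


NUM_MICS, SPACING_M, FREQ_HZ = 4, 0.04, 2000.0  # hypothetical array
beams = {name: steering_weights(NUM_MICS, SPACING_M, FREQ_HZ, angle)
         for name, angle in (("noise_1", 20.0), ("main", 90.0), ("noise_2", 160.0))}

for name, weights in beams.items():
    gains = [round(response(weights, SPACING_M, FREQ_HZ, a), 3)
             for a in (20.0, 90.0, 160.0)]
    print(name, gains)
```

Each beam has unit gain toward its own look direction and a reduced gain toward the other two. Delay-and-sum does not by itself produce the wider main lobe and narrower side lobes that the text calls for; shaping the main lobe widths would require the filter design described in the surrounding paragraphs.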
In a preferred embodiment of the invention, the first beamformer 21, the second beamformer 22, and the third beamformer 23 are each provided with filters connected one-to-one to the microphones of the linear microphone array. Each filter filters with its own filter coefficients, which are calculated by a fixed beamforming algorithm. The fixed beamforming algorithm includes:
y_n(k) = x_n(k) + v_n(k), n = 1, 2, ..., N (formula one)
[Formulas two through seven: the equation images are not reproduced in this text.]
In formula one, y_n(k) is the audio data collected by the n-th microphone, and x_n(k) and v_n(k) are the desired signal and the additive noise it collects, respectively. Formula two gives the estimated output of the beamformer in terms of the filter coefficients corresponding to the n-th microphone: the audio data is filtered so that the output of the beamformer approximates the desired signal received by one microphone of the linear microphone array before it is output.
In formula three, e_m(k) is the error between the output signal of the beamformer and the collected desired signal. It equals the sum of the desired-signal error e_{x,m}(k) and the additive-noise error e_{v,m}(k), which are given by formula four and formula five, respectively. The desired-signal error equals the difference between the output of the beamformer and the input of the beamformer, and the additive-noise error equals the sum of all additive noises.
Formula six and formula seven follow from minimizing the mean square error: the mean square error of the additive noise is minimized so that the additive noise in the output is as small as possible, subject to the constraint e_{x,m}(k) = 0, which yields the optimal filter coefficients h_{m,o}. Here h_m is the filter coefficient matrix of all filters in the beamformer, and h_{m,o} is the optimal value of those coefficients.
Because the additive noise in the final beamformer output should be as small as possible, the mean square error shown in formula six is derived from an error criterion; minimizing it gives the optimal filter coefficients h_{m,o}, as shown in formula seven. To keep the distortion of the desired signal minimal, the constraint e_{x,m}(k) = 0 is added when estimating the optimal filter coefficients h_{m,o}.
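Since only formula one survives in the text, the following LaTeX block offers one reading of formulas two through seven that is consistent with the verbal definitions above. It is a hedged sketch, not the patent's own notation: the output symbol z(k), the per-microphone coefficient vectors \mathbf{h}_n, the sample vectors \mathbf{y}_n(k), \mathbf{x}_n(k), \mathbf{v}_n(k), and the cost symbol J are assumptions, and the reference microphone index m is inferred from the subscripts of e_m(k).

```latex
\begin{align}
z(k) &= \sum_{n=1}^{N} \mathbf{h}_n^{T}\,\mathbf{y}_n(k) \tag{2} \\
e_m(k) &= z(k) - x_m(k) = e_{x,m}(k) + e_{v,m}(k) \tag{3} \\
e_{x,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_n^{T}\,\mathbf{x}_n(k) - x_m(k) \tag{4} \\
e_{v,m}(k) &= \sum_{n=1}^{N} \mathbf{h}_n^{T}\,\mathbf{v}_n(k) \tag{5} \\
J(\mathbf{h}_m) &= E\left\{ e_{v,m}^{2}(k) \right\} \tag{6} \\
\mathbf{h}_{m,o} &= \arg\min_{\mathbf{h}_m} J(\mathbf{h}_m)
  \quad \text{subject to} \quad e_{x,m}(k) = 0 \tag{7}
\end{align}
```

Under this reading, \mathbf{h}_m stacks the vectors \mathbf{h}_1, ..., \mathbf{h}_N. Substituting formula one into (2) and subtracting x_m(k) reproduces the split in (3), so the six lines are mutually consistent, but the original formulas may differ in form.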
The speech recognizer 4 contains an acoustic model, which is used to perform speech recognition on the input speech data to be recognized and produce the corresponding text. The filtering performed by the microphone array inevitably distorts the main beam, and this distortion can reduce recognition accuracy when the acoustic model recognizes the speech data. To reduce this effect, before the acoustic model is used for recognition it is adaptively trained on speech data that has been processed by the microphone array. The speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its recognition accuracy on the speech data and reduces the influence of the distortion.
In a preferred embodiment of the invention, the speech recognizer also contains a feature extraction module, a text input module, a training data storage module, and a training module. The feature extraction module is communicatively connected to the adaptive filter module; it receives the speech data to be recognized that the adaptive filter module outputs and extracts acoustic features from it. The text input module receives manually entered text labels corresponding to the speech data to be recognized. The training data storage module is communicatively connected to both the feature extraction module and the text input module; it stores the acoustic features extracted by the feature extraction module and the corresponding text labels from the text input module, and combines them to form adaptive training data. The training module is communicatively connected to the training data storage module; it reads the adaptive training data stored there and uses it to adaptively train the acoustic model.
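The pairing of acoustic features with text labels can be sketched as a small data structure. This is an illustrative outline only: the patent does not say which acoustic features are extracted, so a per-frame log energy stands in as a placeholder, and the names `AdaptationExample`, `frame_log_energy`, and `build_adaptation_set`, the frame length, and the sample transcripts are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = List[List[float]]


@dataclass
class AdaptationExample:
    """One stored item: acoustic features plus the manually entered text."""
    features: Features
    transcript: str


def frame_log_energy(samples: Sequence[float], frame_len: int = 160) -> Features:
    """Placeholder feature extractor: one log-energy value per full frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        feats.append([math.log(energy + 1e-10)])
    return feats


def build_adaptation_set(
        utterances: Sequence[Sequence[float]],
        transcripts: Sequence[str],
        extract: Callable[[Sequence[float]], Features] = frame_log_energy,
) -> List[AdaptationExample]:
    """Combine each utterance's features with its text label."""
    if len(utterances) != len(transcripts):
        raise ValueError("each utterance needs exactly one transcript")
    return [AdaptationExample(extract(u), t)
            for u, t in zip(utterances, transcripts)]


# Hypothetical data standing in for filtered main-beam audio and its labels.
utterances = [[0.1] * 320, [0.2] * 480]
transcripts = ["check balance", "withdraw cash"]
adaptation_set = build_adaptation_set(utterances, transcripts)
print(len(adaptation_set), len(adaptation_set[0].features))
```

The resulting list plays the role of the training data storage module's contents; a training module would iterate over it to adapt the acoustic model, a step left out here because the patent does not specify the model or the adaptation procedure.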
Take a voice-controlled automatic teller machine as an example. Several microphones, at least three, are arranged horizontally on the machine at a fixed spacing and capture the sound in front of the machine to form sound signals. The horizontally arranged microphones form a 180° sound acquisition region in front of the machine, and the sound in this region includes both the speaker's control commands and ambient noise. Three beamformers form three beams in the sound acquisition region: the main beam, the first noise beam, and the second noise beam. The sound signals captured by the microphones are digitized into audio data and input to the beamformers; the first beamformer outputs the first noise beam, the second beamformer outputs the main beam, and the third beamformer outputs the second noise beam. The main beam, the first noise beam, and the second noise beam are input together to the adaptive filter module, which filters the first and second noise beams out of the main beam and outputs the speech data to be recognized; this is the output of the microphone array processing. The speech data to be recognized is input to the speech recognizer for the adaptation operation, which improves the recognizer's accuracy on this data, and the speech data is then recognized and the corresponding text data is output. The text data can be sent to the automatic teller machine so that it performs the corresponding action.
Present invention also offers a kind of linear Microphone Array Speech recognition methods, the method is first by microphone according to line Property array is configured, the microphone and audio signal for saying people is converted into voice data;Then voice data is sent into ripple Beamformer forms main beam and noise wave beam and export through filtering gives sef-adapting filter module, then by sef-adapting filter The information of module acquisition noise wave beam and main beam simultaneously rejects noise wave beam from main beam, forms voice number to be identified According to, finally by formed speech data to be identified be input in speech recognition device be identified and formed text data output. Specifically,
The microphone of first step, selection three and the above, be horizontally arranged at interval being aligned microphone array 1, Region is obtained in the sound for being previously formed one 0 ° to 180 ° of linear microphone array, as shown in Figure 1.The sound obtains area The sound that domain obtains both includes people's one's voice in speech and the noise from surrounding environment.Microphone receives the sound and obtains region The sound of acquisition, and the audio signal of acquisition is digitized treatment one voice data of formation, and export.
Second step, with reference to Fig. 2, the sound in the front of linear microphone array 1 obtains region and sets Beam-former, Beam-former obtains region and is formed with positioned at the main beam region at middle part in sound, and the first of main beam region both sides Noise beam area and the second noise beam area,;Then voice data is input in Beam-former, and is exported and main ripple The corresponding main beam in beam region, the first noise wave beam corresponding with the first noise beam area, and with the second noise beam zone The corresponding second noise wave beam in domain.
Third step: the first noise beam, main beam, and second noise beam output by the beamformers are filtered. Based on the noise information carried by the first and second noise beams, the two noise beams are removed from the main beam to form the speech data to be recognized. Specifically, the adaptive filter module receives the main beam, first noise beam, and second noise beam from the beamformers. The first noise beam is adapted by first adaptive filter 31 and the second noise beam is adapted by second adaptive filter 32; the adapted first and second noise beams are then filtered out of the main beam and the result is output. A judgment module compares the output with a manually set standard. If the beam quality after adaptive filtering does not meet the standard, the output is returned to first adaptive filter 31 and second adaptive filter 32 for further adaptation. This loop repeats until the output meets the set standard, at which point the final result is output from adaptive filter module 3 as the speech data to be recognized.
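The noise-removal part of the third step can be illustrated with a small sketch. The description does not name the adaptation rule, so the normalized LMS (NLMS) update used here is an assumption, as are the function name `nlms_cancel`, the tap count, and the step size. The sketch treats the two noise beams as reference inputs and subtracts from the main beam whatever can be linearly predicted from them; the judgment-module loop is not modeled.

```python
import numpy as np


def nlms_cancel(main, refs, n_taps=8, mu=0.1, eps=1e-8):
    """Remove from `main` the part that is linearly predictable from `refs`.

    main: 1-D array of main-beam samples, shape (T,).
    refs: 2-D array of noise-beam samples, shape (R, T), one row per noise beam.
    Returns the enhanced signal (the NLMS error signal), shape (T,).
    NLMS is an assumed adaptation rule, not one stated in the description.
    """
    main = np.asarray(main, dtype=float)
    refs = np.atleast_2d(np.asarray(refs, dtype=float))
    n_refs, n_samples = refs.shape
    weights = np.zeros((n_refs, n_taps))   # one FIR filter per noise beam
    history = np.zeros((n_refs, n_taps))   # newest reference sample in column 0
    enhanced = np.zeros(n_samples)
    for t in range(n_samples):
        # Shift the reference history by one sample and insert the newest one.
        history = np.roll(history, 1, axis=1)
        history[:, 0] = refs[:, t]
        # Estimate the noise leaking into the main beam and subtract it.
        noise_estimate = np.sum(weights * history)
        enhanced[t] = main[t] - noise_estimate
        # Normalized LMS update: step size scaled by the reference power.
        weights += mu * enhanced[t] * history / (np.sum(history ** 2) + eps)
    return enhanced
```

Calling `nlms_cancel(main_beam, np.stack([noise_beam_1, noise_beam_2]))` returns a signal of the same length as the main beam. A smaller `mu` adapts more slowly but perturbs the speech less once the filters have converged.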
Fourth step: recognize the speech data to be recognized, form text data, and output it to the machine.
In a preferred embodiment of the present invention, the area in front of the linear microphone array includes a sound acquisition region spanning 0° to 180°. The microphones input the audio data to the beamformers, of which there are three: first beamformer 21, second beamformer 22, and third beamformer 23. First beamformer 21 forms the first noise beam region: the center of its beam points in the 20° direction of the 180° sound acquisition region, and it captures the audio data in the first noise beam region of the linear microphone array and outputs the first noise beam. Second beamformer 22 forms the main beam region: the center of its beam points in the 90° direction of the 180° sound acquisition region, and it captures the audio data in the main beam region of the linear microphone array and outputs the main beam. Third beamformer 23 forms the second noise beam region: the center of its beam points in the 160° direction of the 180° sound acquisition region, and it captures the noise in the second noise beam region of the linear microphone array and outputs the second noise beam. The main lobe of the beam formed by second beamformer 22 is wider than the main lobes of the beams formed by the first and third beamformers. Preferably, the main lobe width of the beam formed by second beamformer 22 is at most 90°, and the main lobe widths of the beams formed by the first and third beamformers are at most 40°.
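To make the three look directions concrete, the following sketch computes narrowband delay-and-sum weights for a uniform linear array steered to 20°, 90°, and 160°. Delay-and-sum is used only as the simplest fixed beamformer; the description computes its coefficients with a different, constrained design. The far-field model, the speed of sound, the angle convention (measured from the array axis), and all function names are assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def steering_vector(angle_deg, n_mics, spacing_m, freq_hz):
    """Far-field narrowband steering vector of a uniform linear array.

    The angle is measured from the array axis, so 90 degrees is broadside
    (straight ahead) and 0/180 degrees are the two endfire directions.
    """
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.cos(np.deg2rad(angle_deg)) / SPEED_OF_SOUND
    return np.exp(-2j * np.pi * freq_hz * delays)


def delay_and_sum_weights(angle_deg, n_mics, spacing_m, freq_hz):
    """Weights that give unit gain toward `angle_deg`."""
    return steering_vector(angle_deg, n_mics, spacing_m, freq_hz) / n_mics


def beam_response(weights, angle_deg, spacing_m, freq_hz):
    """Magnitude of the beam's gain toward `angle_deg`, i.e. |w^H d|."""
    d = steering_vector(angle_deg, len(weights), spacing_m, freq_hz)
    return abs(np.vdot(weights, d))


# Hypothetical geometry: 4 microphones, 5 cm apart, evaluated at 2 kHz.
BEAM_CENTERS_DEG = {"first_noise": 20.0, "main": 90.0, "second_noise": 160.0}
BEAM_WEIGHTS = {
    name: delay_and_sum_weights(angle, 4, 0.05, 2000.0)
    for name, angle in BEAM_CENTERS_DEG.items()
}
```

Evaluating `beam_response` over a grid of angles for each entry of `BEAM_WEIGHTS` gives the beam pattern, from which the main lobe widths can be read off and compared against the preferred limits of 90° and 40°.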
In a preferred embodiment of the present invention, first beamformer 21, second beamformer 22, and third beamformer 23 each contain filters connected one-to-one to the microphones of the linear microphone array. Each filter applies its own filter coefficients, which are calculated by a fixed beamforming algorithm. The fixed beamforming algorithm includes:
$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N$ (formula one)
In formula one, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are the collected desired signal and additive noise, respectively. Formula two gives the estimated output of the beamformer: the audio data is filtered so that the beamformer output approximates the desired signal received by one of the microphones in the linear microphone array, using the filter coefficients corresponding to the n-th microphone.
In formula three, $e_m(k)$ denotes the error between the beamformer output signal and the collected desired signal; it equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$. The desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$ are given by formula four and formula five: the desired-signal error equals the difference between the beamformer output and the beamformer input, and the additive-noise error equals the sum of all additive noise.
Because the additive noise in the final beamformer output should be as small as possible, the mean squared error shown in formula six is derived from an error criterion. When this mean squared error is minimized, the optimal filter coefficients $h_{m,o}$ are obtained, as shown in formula seven. To keep distortion of the desired signal to a minimum, the constraint $e_{x,m}(k) = 0$ is added when estimating the optimal filter coefficients $h_{m,o}$.
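The idea of minimizing the noise error under a zero-distortion constraint can be illustrated in a simplified narrowband form. The sketch below is not a reproduction of formulas six and seven: it swaps in the standard frequency-domain minimum-variance distortionless-response solution, $w = R_v^{-1} d / (d^H R_v^{-1} d)$, where the noise covariance $R_v$, the steering vector $d$, and the function name are all assumptions introduced for illustration.

```python
import numpy as np


def distortionless_min_noise_weights(noise_cov, steering):
    """Minimize output noise power w^H R w subject to w^H d = 1.

    noise_cov: Hermitian positive-definite noise covariance R, shape (N, N).
    steering:  steering vector d toward the desired direction, shape (N,).
    Returns w = R^{-1} d / (d^H R^{-1} d). This is a narrowband analogue of
    the constrained design in the text, not the patent's own formula.
    """
    noise_cov = np.asarray(noise_cov, dtype=complex)
    steering = np.asarray(steering, dtype=complex)
    solved = np.linalg.solve(noise_cov, steering)   # R^{-1} d
    return solved / np.vdot(steering, solved)       # divide by d^H R^{-1} d
```

With spatially white noise (`noise_cov` equal to the identity) the result reduces to delay-and-sum weights `d / N`. Because the noise covariance is fixed in advance rather than estimated from the incoming signal, the resulting beamformer stays fixed at run time.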
Speech recognizer 4 contains an acoustic model, which is used to perform speech recognition on the input speech data to be recognized and to identify the corresponding text. Because the filtering performed by the microphone array inevitably distorts the main beam, recognition accuracy can suffer when the acoustic model recognizes the speech data. To reduce the effect of this distortion on recognition accuracy, the acoustic model is adaptively trained on speech data processed by the microphone array before it is used for recognition. Speech recognizer 4 then performs recognition with the adaptively trained acoustic model, which improves its recognition accuracy on the speech data and reduces the effect of the distortion.
In a preferred embodiment of the present invention, performing the adaptation operation on the acoustic model using the speech data to be recognized further comprises the following steps. The speech recognizer first extracts a set quantity of speech data to be recognized and annotates the extracted speech data with text labels. It then extracts the acoustic features of that set quantity of speech data and combines the text labels with the acoustic features to form adaptive training data. Finally, it adaptively trains the acoustic model on the adaptive training data. Once adaptive training is complete, the adapted acoustic model performs speech recognition on the speech data to be recognized.
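The data-preparation part of these steps can be sketched as follows. The description does not say which acoustic features are used, so the log power spectrum below is an assumption, as are the frame length, hop size, and function names. The sketch only assembles the (feature, text label) pairs; the adaptive training of the acoustic model itself is not shown.

```python
import numpy as np


def log_spectral_features(samples, frame_len=400, hop=160):
    """Frame a waveform and return log power spectra, shape (frames, bins).

    An assumed stand-in for the unspecified acoustic features.
    """
    samples = np.asarray(samples, dtype=float)
    if len(samples) < frame_len:
        # Zero-pad short utterances so that at least one frame exists.
        samples = np.pad(samples, (0, frame_len - len(samples)))
    n_frames = 1 + (len(samples) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [samples[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)


def build_adaptation_set(utterances, transcripts, max_items):
    """Pair features of the first `max_items` utterances with their labels."""
    if len(utterances) != len(transcripts):
        raise ValueError("each utterance needs exactly one text label")
    pairs = []
    for samples, text in list(zip(utterances, transcripts))[:max_items]:
        pairs.append((log_spectral_features(samples), text))
    return pairs
```

Here `max_items` plays the role of the set quantity of speech data, and `utterances` would be outputs of the adaptive filter module so that the training data carries the same distortion the recognizer will see at run time.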
The present invention targets specific human-machine voice interaction scenarios and does not need to track the sound source direction in real time, which avoids the suppression or distortion of the desired signal that conventional algorithms may introduce when the sound source position estimate is biased. At the same time, the algorithm has a low computational cost, is simple to implement, and is inexpensive, while the speech data obtained is of high quality, which can improve speech recognition accuracy.
The present invention has been described in detail above with reference to the drawings and embodiments, and those of ordinary skill in the art can make many variations to the invention based on the above description. Accordingly, certain details of the embodiments should not be construed as limiting the invention, and the scope of protection of the invention is the scope defined by the appended claims.

Claims (10)

1. A speech recognition method using a linear microphone array, characterized in that the method comprises the following steps:
recording ambient sound with a linear microphone array to form audio data;
setting up beamformers for the sound acquisition region in front of the linear microphone array, and using the beamformers to form, within the sound acquisition region, a main beam region in the middle and a first noise beam region and a second noise beam region on either side;
inputting the audio data to the beamformers to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
filtering the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized;
performing speech recognition on the speech data to be recognized to obtain corresponding text data, and outputting the text data.
2. The method of claim 1, characterized in that the sound acquisition region comprises a planar region spanning angles from 0° to 180°, and setting up beamformers for the sound acquisition region in front of the linear microphone array comprises: providing a first beamformer for forming the first noise beam region, with the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;
providing a second beamformer for forming the main beam region, with the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region;
providing a third beamformer for forming the second noise beam region, with the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.
3. The method of claim 2, characterized in that, when the beamformers are set up, each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and a fixed beamforming algorithm is used to calculate the filter coefficients of the filters in each beamformer;
the fixed beamforming algorithm includes:
$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N$ (formula one)
in formula one, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are the collected desired signal and additive noise, respectively; formula two gives the output of the beamformer, which is made to approximate the desired signal received by one of the microphones in the linear microphone array, using the filter coefficients corresponding to the n-th microphone;
in formula three, $e_m(k)$ denotes the error between the beamformer output signal and the collected desired signal, and equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$; the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$ are given by formula four and formula five;
formula six and formula seven are obtained by minimizing the mean squared error: the mean squared error is minimized so that the additive noise is minimal and, combined with the constraint $e_{x,m}(k) = 0$, this yields the optimal filter coefficients $h_{m,o}$, where $h_m$ is the filter coefficient matrix corresponding to all filters in the beamformer and $h_{m,o}$ is the optimal filter coefficient value corresponding to all filters in the beamformer.
4. The method of claim 1, characterized in that performing speech recognition on the speech data to be recognized comprises: first performing an adaptation operation on an acoustic model using the speech data to be recognized; then performing speech recognition on the speech data to be recognized using the adapted acoustic model.
5. The method of claim 4, characterized in that performing the adaptation operation on the acoustic model using the speech data to be recognized comprises:
extracting a set quantity of speech data to be recognized, and annotating the extracted speech data with text labels;
extracting the acoustic features corresponding to the set quantity of speech data to be recognized, and combining the corresponding text labels with the acoustic features to form adaptive training data;
adaptively training the acoustic model using the adaptive training data.
6. A speech recognition system using a linear microphone array, characterized in that the system comprises:
a linear microphone array for recording ambient sound to form audio data;
beamformers communicatively connected to the linear microphone array, the beamformers forming, in the sound acquisition region in front of the linear microphone array, a main beam region in the middle and a first noise beam region and a second noise beam region on either side, and being used to process the received audio data to obtain a main beam corresponding to the main beam region, a first noise beam corresponding to the first noise beam region, and a second noise beam corresponding to the second noise beam region;
an adaptive filter module communicatively connected to the beamformers, which receives the main beam, the first noise beam, and the second noise beam output by the beamformers, and filters the first noise beam and the second noise beam out of the main beam to obtain speech data to be recognized;
a speech recognizer communicatively connected to the adaptive filter module, which receives the speech data to be recognized and performs speech recognition on it to obtain corresponding text data and output the text data.
7. The system of claim 6, characterized in that the sound acquisition region comprises a planar region spanning angles from 0° to 180°;
the beamformers comprise: a first beamformer for forming the first noise beam region, the center of the beam formed by the first beamformer pointing in the 20° direction of the sound acquisition region;
a second beamformer for forming the main beam region, the center of the beam formed by the second beamformer pointing in the 90° direction of the sound acquisition region;
a third beamformer for forming the second noise beam region, the center of the beam formed by the third beamformer pointing in the 160° direction of the sound acquisition region.
8. The system of claim 6, characterized in that each beamformer is provided with filters connected one-to-one to the microphones of the linear microphone array, and the filters in each beamformer are provided with corresponding filter coefficients; the filter coefficients are calculated by a fixed beamforming algorithm;
the fixed beamforming algorithm includes:
$y_n(k) = x_n(k) + v_n(k), \quad n = 1, 2, \ldots, N$ (formula one)
in formula one, $y_n(k)$ is the audio data collected by the n-th microphone, and $x_n(k)$ and $v_n(k)$ are the collected desired signal and additive noise, respectively; formula two gives the output of the beamformer, which is made to approximate the desired signal received by one of the microphones in the linear microphone array, using the filter coefficients corresponding to the n-th microphone;
in formula three, $e_m(k)$ denotes the error between the beamformer output signal and the collected desired signal, and equals the sum of the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$; the desired-signal error $e_{x,m}(k)$ and the additive-noise error $e_{v,m}(k)$ are given by formula four and formula five;
formula six and formula seven are obtained by minimizing the mean squared error: the mean squared error is minimized so that the additive noise is minimal and, combined with the constraint $e_{x,m}(k) = 0$, this yields the optimal filter coefficients $h_{m,o}$, where $h_m$ is the filter coefficient matrix corresponding to all filters in the beamformer and $h_{m,o}$ is the optimal filter coefficient value corresponding to all filters in the beamformer.
9. The system of claim 6, characterized in that the speech recognizer comprises an acoustic model, and the acoustic model is adaptively trained on the speech data to be recognized before being used to recognize the speech data to be recognized.
10. The system of claim 9, characterized in that the speech recognizer further comprises a feature extraction module, a text input module, a training data storage module, and a training module;
the feature extraction module is communicatively connected to the adaptive filter module, receives the speech data to be recognized, and extracts acoustic features from the received speech data to be recognized;
the text input module is used to input text labels corresponding to the speech data to be recognized;
the training data storage module is communicatively connected to the feature extraction module and the text input module, and stores the acoustic features and the corresponding text labels, which are combined to form adaptive training data;
the training module is communicatively connected to the training data storage module, reads the adaptive training data stored in the training data storage module, and adaptively trains the acoustic model using the adaptive training data that it has read.
CN201611202169.0A 2016-12-23 2016-12-23 Utilize the audio recognition method and system of linear microphone array Active CN106710603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611202169.0A CN106710603B (en) 2016-12-23 2016-12-23 Utilize the audio recognition method and system of linear microphone array

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611202169.0A CN106710603B (en) 2016-12-23 2016-12-23 Utilize the audio recognition method and system of linear microphone array

Publications (2)

Publication Number Publication Date
CN106710603A true CN106710603A (en) 2017-05-24
CN106710603B CN106710603B (en) 2019-08-06

Family

ID=58903066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611202169.0A Active CN106710603B (en) 2016-12-23 2016-12-23 Utilize the audio recognition method and system of linear microphone array

Country Status (1)

Country Link
CN (1) CN106710603B (en)

Also Published As

Publication number Publication date
CN106710603B (en) 2019-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20171010

Address after: 200233 Shanghai City, Xuhui District Guangxi 65 No. 1 Jinglu room 702 unit 03

Applicant after: Cloud known sound (Shanghai) Technology Co. Ltd.

Address before: 200233 Shanghai, Qinzhou, North Road, No. 82, building 2, layer 1198,

Applicant before: SHANGHAI YUZHIYI INFORMATION TECHNOLOGY CO., LTD.
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200415

Address after: 200233 Shanghai City, Xuhui District Guangxi 65 No. 1 Jinglu room 702 unit 03

Co-patentee after: Xiamen yunzhixin Intelligent Technology Co., Ltd

Patentee after: YUNZHISHENG (SHANGHAI) INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 200233 Shanghai City, Xuhui District Guangxi 65 No. 1 Jinglu room 702 unit 03

Patentee before: YUNZHISHENG (SHANGHAI) INTELLIGENT TECHNOLOGY Co.,Ltd.