CN105096941A - Voice recognition method and device - Google Patents


Info

Publication number
CN105096941A
Authority
CN
China
Prior art keywords
speaker
acoustic model
corpus information
corpus
speech recognition
Prior art date
Legal status
Granted
Application number
CN201510558047.4A
Other languages
Chinese (zh)
Other versions
CN105096941B (en)
Inventor
杜念冬
邹赛赛
谢延
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510558047.4A
Publication of CN105096941A
Application granted
Publication of CN105096941B
Legal status: Active
Anticipated expiration


Landscapes

  • Telephonic Communication Services (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention discloses a voice recognition method and device. The method comprises: first, acquiring the voice information input by a speaker and acquiring the speaker's information; second, judging from the speaker information whether a personal acoustic model corresponding to the speaker exists; third, if it exists, acquiring the personal acoustic model and performing voice recognition on the voice information according to it; fourth, if it does not exist, performing voice recognition on the voice information according to a basic acoustic model, and generating and storing corpus information of the speaker from the voice information; fifth, generating a personal acoustic model for the speaker from the basic acoustic model and the stored corpus information. With this method, an acoustic model can be customized to each speaker's characteristics through speaker-adaptive recognition, improving recognition accuracy for every speaker and improving the user experience.

Description

Speech recognition method and device
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device.
Background art
In recent years, speech recognition technology has developed rapidly; in particular, since deep neural networks were applied to speech recognition, recognition performance has improved substantially. With the growth of the mobile Internet, voice input has become increasingly common and its user base increasingly broad. How to improve the accuracy of speech recognition has therefore become a pressing problem.
In the related art, speech recognition relies on training an acoustic model and a language model on a large amount of speech, and then using these models to recognize the speech data input by a speaker. Evidently, the larger the training sample, the more accurate the resulting acoustic model, and the higher the recognition accuracy.
The problem, however, is that the acoustic model trained on this large, generic sample is applied to the recognition of all speakers. For a speaker with a heavy dialectal accent or unclear articulation, such a model may fail to recognize the speaker's input well, reducing recognition accuracy and degrading the user experience.
Summary of the invention
The object of the present invention is to solve at least one of the above technical problems at least to some extent.
To this end, a first object of the present invention is to propose a speech recognition method. Through speaker-adaptive recognition, the method customizes an acoustic model to each speaker's characteristics, thereby improving recognition accuracy for each speaker and improving the user experience.
A second object of the present invention is to propose a speech recognition device.
To achieve these objects, the speech recognition method of the first-aspect embodiment of the present invention comprises: acquiring the voice information input by a speaker, and acquiring the speaker information of the speaker; judging, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists; if it exists, acquiring the personal acoustic model and performing speech recognition on the voice information according to the speaker's personal acoustic model; if it does not exist, performing speech recognition on the voice information according to a basic acoustic model, and generating and storing corpus information of the speaker from the voice information; and generating the speaker's personal acoustic model according to the basic acoustic model and the stored corpus information.
With the speech recognition method of this embodiment of the present invention, the voice information input by a speaker and the speaker information are first acquired, and whether a personal acoustic model corresponding to the speaker exists is judged from the speaker information. If it exists, the personal acoustic model is acquired and used to recognize the voice information; if not, the basic acoustic model is used instead, the voice information is stored as corpus information of the speaker, and the speaker's personal acoustic model is generated from the basic acoustic model and the stored corpus information. In other words, on the basis of the speaker-independent acoustic model (the basic acoustic model above), further training on a given speaker's historical speech data yields a personal acoustic model that captures the speaker's own characteristics, and this personal model is used during recognition. Each individual's recognition accuracy can thus be improved, which amounts to providing every recognition user with a privately customized recognition service, improving the user experience.
To achieve these objects, the speech recognition device of the second-aspect embodiment of the present invention comprises: a first acquisition module, for acquiring the voice information input by a speaker and acquiring the speaker information of the speaker; a judgment module, for judging, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists; a speech recognition module, for acquiring the personal acoustic model and performing speech recognition on the voice information according to it when the judgment module judges that it exists, and for performing speech recognition on the voice information according to a basic acoustic model when the judgment module judges that it does not; a first generation module, for generating and storing corpus information of the speaker from the voice information; and a second generation module, for generating the speaker's personal acoustic model according to the basic acoustic model and the stored corpus information.
With the speech recognition device of this embodiment of the present invention, the first acquisition module acquires the voice information input by the speaker and the speaker information; the judgment module judges from the speaker information whether a personal acoustic model corresponding to the speaker exists; if it exists, the speech recognition module acquires the personal acoustic model and recognizes the voice information with it, and if not, recognizes the voice information with the basic acoustic model; the first generation module generates and stores corpus information of the speaker from the voice information; and the second generation module generates the speaker's personal acoustic model from the basic acoustic model and the stored corpus information. As above, starting from the speaker-independent basic acoustic model, further training on the speaker's historical speech data yields a personal acoustic model that captures the speaker's own characteristics and is used during recognition, improving each individual's recognition accuracy and the user experience.
Additional aspects and advantages of the present invention will be set forth in part in the description below, will in part become apparent from that description, or may be learned through practice of the present invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention;
Fig. 2 is a flowchart of generating a personal acoustic model according to an embodiment of the present invention;
Fig. 3 is a flowchart of a speech recognition method according to another embodiment of the present invention;
Fig. 4 is a structural block diagram of a speech recognition device according to an embodiment of the present invention;
Fig. 5 is a structural block diagram of a second generation module according to an embodiment of the present invention; and
Fig. 6 is a structural block diagram of a speech recognition device according to another embodiment of the present invention.
Detailed description
Embodiments of the present invention are described in detail below; examples of the embodiments are shown in the drawings, in which the same or similar reference numbers denote throughout the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and must not be construed as limiting it.
The speech recognition method and device according to embodiments of the present invention are described below with reference to the drawings.
It should be noted that speech recognition means automatically converting human speech into the corresponding text by machine. In recent years, speech recognition technology has developed rapidly; in particular, since deep neural networks were applied to speech recognition, system performance has improved substantially. With the growth of the mobile Internet, voice input has become increasingly common and its user base increasingly broad. Since each user's pronunciation has its own acoustic characteristics, exploiting this in the recognition process is bound to further improve the recognition system.
To this end, the present invention proposes a speech recognition method, comprising: acquiring the voice information input by a speaker, and acquiring the speaker information of the speaker; judging, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists; if it exists, acquiring the personal acoustic model and performing speech recognition on the voice information according to it; if it does not exist, performing speech recognition on the voice information according to a basic acoustic model, and generating and storing corpus information of the speaker from the voice information; and generating the speaker's personal acoustic model according to the basic acoustic model and the stored corpus information.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention. As shown in Fig. 1, the method may comprise:
S101: acquire the voice information input by a speaker, and acquire the speaker information of the speaker.
It should be noted that in an embodiment of the present invention the speaker information may be an ID (identity number) corresponding to the speaker. The speaker ID may be an identifier allocated by a server for the speaker, and it corresponds one-to-one with the speaker's voiceprint features.
Specifically, the voice information input by the speaker is collected through a microphone in the terminal, voiceprint features may be extracted from it, and the speaker information corresponding to those features (the speaker ID, etc.) may then be obtained from the correspondence between voiceprint features and speaker information.
It will be appreciated that in another embodiment of the present invention the speaker information may instead be the ID or MAC address of the terminal the speaker uses. That is, in this step, after acquiring the voice information input by the speaker, the ID or MAC address of the speaker's terminal may also be obtained.
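The lookup described in S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the registry class, the `spk-…` ID format, and the hash-based stand-in for real voiceprint matching are all assumptions made for the example.

```python
import hashlib

# Hypothetical registry: maps a voiceprint fingerprint (or, as a fallback,
# a terminal ID / MAC address) to the speaker ID allocated by the server.
class SpeakerRegistry:
    def __init__(self):
        self._by_key = {}      # fingerprint or terminal ID -> speaker ID
        self._next_id = 1

    def _fingerprint(self, voiceprint_features):
        # Stand-in for real voiceprint matching: hash the quantized features.
        raw = ",".join(f"{x:.1f}" for x in voiceprint_features)
        return hashlib.sha1(raw.encode()).hexdigest()

    def resolve(self, voiceprint_features=None, terminal_id=None):
        """Return the existing speaker ID, allocating one on first sight."""
        key = (self._fingerprint(voiceprint_features)
               if voiceprint_features is not None else terminal_id)
        if key not in self._by_key:
            self._by_key[key] = f"spk-{self._next_id:06d}"
            self._next_id += 1
        return self._by_key[key]

registry = SpeakerRegistry()
a = registry.resolve(voiceprint_features=[0.12, 0.84, 0.31])
b = registry.resolve(voiceprint_features=[0.12, 0.84, 0.31])  # same speaker
c = registry.resolve(terminal_id="AA:BB:CC:DD:EE:FF")         # device-ID fallback
```

The one-to-one correspondence between voiceprint and speaker ID is what the dictionary encodes; the terminal-ID path mirrors the alternative embodiment above.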
S102: judge, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists.
In an embodiment of the present invention, the personal acoustic model can be understood as the speaker's own acoustic model, one that captures the speaker's voice characteristics.
S103: if it exists, acquire the personal acoustic model, and perform speech recognition on the voice information according to the speaker's personal acoustic model.
Specifically, when it is judged that a personal acoustic model corresponding to the speaker exists, that model may be acquired and then sent, together with the voice information, to a decoder, which may be located in a server. The decoder first performs feature extraction on the voice information to obtain the corresponding acoustic features, and then matches those features against the personal acoustic model according to a given criterion to obtain the recognition result.
S104: if it does not exist, perform speech recognition on the voice information according to the basic acoustic model, and generate and store corpus information of the speaker from the voice information.
In an embodiment of the present invention, the basic acoustic model can be understood as a model suited to speakers in general, i.e. an acoustic model obtained by collecting speech data from many speakers and training on that data.
That is, when it is judged that no personal acoustic model corresponding to the speaker exists, the basic acoustic model and the voice information may be sent to the decoder to recognize the voice information; afterwards, the input voice information may be stored as corpus information of the speaker, forming historical corpus information.
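The dispatch logic of S102–S104 can be sketched in a few lines. All names here (`personal_models`, `corpus_store`, `decode`) are illustrative assumptions; the real decoder performs feature extraction and model matching where the stub below merely tags its output.

```python
# Minimal sketch of S102-S104: use the speaker's personal acoustic model when
# one exists, otherwise fall back to the basic model and archive the utterance
# as corpus information for later adaptation.
personal_models = {}          # speaker ID -> personal acoustic model
corpus_store = {}             # speaker ID -> stored utterances (the corpus)
BASIC_MODEL = "basic-acoustic-model"

def decode(model, voice_info):
    # Stand-in for the server-side decoder described in the text.
    return f"[decoded with {model}] {voice_info}"

def recognize(speaker_id, voice_info):
    model = personal_models.get(speaker_id)
    if model is not None:                      # S103: personal model exists
        return decode(model, voice_info)
    # S104: no personal model yet - use the basic model and store the corpus
    result = decode(BASIC_MODEL, voice_info)
    corpus_store.setdefault(speaker_id, []).append(voice_info)
    return result

r1 = recognize("spk-000001", "ni hao")        # falls back to the basic model
personal_models["spk-000001"] = "personal-model-spk-000001"
r2 = recognize("spk-000001", "ni hao")        # now uses the personal model
```

Note that corpus accumulation only happens on the fallback path, matching the flow of Fig. 1.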
S105: generate the speaker's personal acoustic model according to the basic acoustic model and the stored corpus information.
Specifically, in an embodiment of the present invention, as shown in Fig. 2, generating the personal acoustic model may comprise the following steps:
S201: judge whether the quantity of stored corpus information has reached a preset threshold.
Specifically, judge whether the stored corpus information has accumulated to a certain quantity (the preset threshold).
S202: if the preset threshold has been reached, screen the stored corpus information to obtain the corresponding valid corpus information.
Specifically, in an embodiment of the present invention, when the stored corpus information is judged to have accumulated to the preset threshold, the screening parameters of each stored corpus item may first be obtained and each item scored against them. A final score for each item may then be generated from the scoring results and the weights of the screening parameters. Finally, the stored corpus information is screened by each item's score to obtain the corresponding valid corpus information. In an embodiment of the present invention, the screening parameters may include, but are not limited to, confidence, speech energy, speech length, and recognized content.
More specifically, once a speaker's corpus information has accumulated to a certain quantity, the stored historical corpus may be screened: for each stored item, confidence, speech energy, speech length, recognized content, and so on are computed and each result scored separately; the final score of each item is then computed from these scores and the weight of each screening parameter; finally, low-scoring items are filtered out and the remaining high-scoring items are kept as the valid corpus information.
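The weighted screening of S202 can be sketched as below. The weights, the pass threshold, and the assumption that each per-parameter score is already normalized to [0, 1] are illustrative choices; the patent specifies only that per-parameter scores are combined by weight.

```python
# Hedged sketch of S202: score each stored corpus item on the screening
# parameters named in the text (confidence, speech energy, speech length,
# recognized content) and keep the high scorers as valid corpus information.
WEIGHTS = {"confidence": 0.4, "energy": 0.2, "length": 0.2, "content": 0.2}

def item_score(item):
    # Final score = weighted sum of per-parameter scores (assumed in [0, 1]).
    return sum(WEIGHTS[k] * item[k] for k in WEIGHTS)

def screen_corpus(items, threshold=0.6):
    """Return the valid corpus information: items whose weighted score passes."""
    return [it for it in items if item_score(it) >= threshold]

stored = [
    {"id": 1, "confidence": 0.9, "energy": 0.8, "length": 0.7, "content": 0.9},
    {"id": 2, "confidence": 0.3, "energy": 0.5, "length": 0.4, "content": 0.2},
]
valid = screen_corpus(stored)   # item 1 scores 0.84 and is kept; item 2 is dropped
```

Adjusting `WEIGHTS` is how one would tune which screening parameter dominates the filtering.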
S203: perform primary/secondary speaker judgment on each item of valid corpus information, to filter out the valid corpus information belonging to the same speaker.
Specifically, all the valid corpus information may be analyzed: for instance, the signal-to-noise ratio, phonetic features, and speaker gender of each item may be computed to judge which user is primary and which secondary. If the current corpus is found to contain multiple natural persons, all the valid corpus information is screened by cues such as phonetic features and speaker gender, so as to retain only the items belonging to the same speaker.
It should be noted that in another embodiment of the present invention, after the primary/secondary user judgment based on the signal-to-noise ratio, phonetic features, and speaker gender of each item, if the current corpus is found to contain multiple natural persons, the current corpus item may instead be rejected outright; that is, any item containing multiple natural persons is discarded and excluded from model training.
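A toy version of the S203 majority filter might look as follows. The "signature" (gender plus a coarse pitch bucket) is a deliberately crude stand-in for the signal-to-noise ratio, phonetic-feature, and gender analysis the text describes; the bucket width is an assumption.

```python
from collections import Counter

# Sketch of S203 under stated assumptions: each valid corpus item carries a
# crude speaker signature. The dominant signature is treated as the primary
# speaker; items from other natural persons are filtered out (the variant in
# the text would instead reject such items outright).
def speaker_signature(item):
    return (item["gender"], round(item["pitch_hz"] / 20))  # coarse pitch bucket

def keep_primary_speaker(items):
    if not items:
        return []
    dominant, _ = Counter(speaker_signature(it) for it in items).most_common(1)[0]
    return [it for it in items if speaker_signature(it) == dominant]

valid = [
    {"id": 1, "gender": "F", "pitch_hz": 210.0},
    {"id": 2, "gender": "F", "pitch_hz": 205.0},
    {"id": 3, "gender": "M", "pitch_hz": 120.0},   # a second natural person
]
primary = keep_primary_speaker(valid)   # items 1 and 2 survive
```

A real system would cluster on speaker embeddings rather than two hand-picked cues, but the keep-the-majority logic is the same.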
S204: perform model training on the valid corpus information belonging to the same speaker, according to the basic acoustic model, to generate the speaker's personal acoustic model.
Specifically, in an embodiment of the present invention, acoustic features may first be extracted from the valid corpus information belonging to the same speaker, and the recognized content corresponding to that corpus analyzed, so as to obtain high-confidence word-level recognition results and the corresponding acoustic features. The personal acoustic model may then be randomly initialized, and gradients computed from the basic acoustic model, the word-level recognition results, and the acoustic features. Finally, the computed gradients may be iterated on to generate the personal acoustic model.
That is, the corpus information is used for acoustic feature extraction while the recognized content is analyzed, retaining the high-confidence word-level recognition results and their corresponding acoustic features. The personal acoustic model is then randomly initialized, and the retained acoustic features and word-level results, combined with the basic acoustic model, are used to compute gradients. Finally, the resulting gradients are used to iteratively optimize the personal acoustic model.
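The gradient-based adaptation of S204 can be illustrated with a tiny softmax "acoustic model". This is a schematic stand-in, not the patent's DNN training: the model shape, the synthetic data, the learning rate, and initializing the personal model from the basic model's weights are all assumptions for the sketch.

```python
import numpy as np

# Schematic S204: adapt a tiny softmax classifier (features -> acoustic units)
# to one speaker's data by gradient descent, starting from the basic model.
rng = np.random.default_rng(0)
n_feat, n_units = 8, 4

W_basic = rng.normal(scale=0.1, size=(n_feat, n_units))   # basic acoustic model

# High-confidence word-level results reduced to (acoustic feature, unit label).
X = rng.normal(size=(64, n_feat))
y = rng.integers(0, n_units, size=64)

def _softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def loss(W):
    p = _softmax(X @ W)
    return -np.log(p[np.arange(len(y)), y]).mean()   # cross-entropy

def grad(W):
    p = _softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0                   # dL/dlogits
    return X.T @ p / len(y)

W_personal = W_basic.copy()
for _ in range(200):                     # iterate on the computed gradients
    W_personal -= 0.3 * grad(W_personal)

improved = loss(W_personal) < loss(W_basic)   # fit to this speaker's data
```

The point mirrored from the text is that the personal model is produced by repeated gradient steps driven by the speaker's retained features and transcripts, with the basic model as the starting point.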
Thus, by training a model on each speaker's corpus information, a corresponding personal acoustic model is obtained. This model captures the user's voice characteristics and can be used during recognition to improve accuracy.
With the speech recognition method of this embodiment of the present invention, the voice information input by a speaker and the speaker information are first acquired, and whether a personal acoustic model corresponding to the speaker exists is judged from the speaker information. If it exists, the personal acoustic model is acquired and used to recognize the voice information; if not, the basic acoustic model is used instead, the voice information is stored as corpus information of the speaker, and the speaker's personal acoustic model is generated from the basic acoustic model and the stored corpus information. In other words, on the basis of the speaker-independent acoustic model (the basic acoustic model above), further training on a given speaker's historical speech data yields a personal acoustic model that captures the speaker's own characteristics, and this personal model is used during recognition. Each individual's recognition accuracy can thus be improved, which amounts to providing every recognition user with a privately customized recognition service, improving the user experience.
Fig. 3 is a flowchart of a speech recognition method according to another embodiment of the present invention.
To further improve recognition accuracy, in an embodiment of the present invention the personal acoustic model may also be optimized. Specifically, as shown in Fig. 3, the method may comprise:
S301: acquire the voice information input by a speaker, and acquire the speaker information of the speaker.
S302: judge, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists.
S303: if it exists, acquire the personal acoustic model, and perform speech recognition on the voice information according to the speaker's personal acoustic model.
S304: optimize the personal acoustic model according to the voice information.
It will be appreciated that speaker-adaptive recognition is transparent to the user: the user does not perceive the adaptive training and adaptive recognition flows. As the number of times a user performs speech recognition grows, the recognition server continuously optimizes the user's adaptive model, and the user's recognition accuracy keeps improving. Speaker-adaptive training is therefore not a one-off job: as the speaker's corpus accumulates, the adaptive training process is repeated continually. In an embodiment of the present invention, each round of adaptive training may build on the previous personal acoustic model, so that recognition accuracy improves continuously.
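The continual re-adaptation described above can be caricatured in a few lines. The "model" here is just a running scalar estimate and the blend factor is an invented constant; the only property the sketch shares with the real system is that each round starts from the previous personal model rather than from scratch.

```python
# Sketch of continual adaptation: each round builds on the previous personal
# model, so the model drifts toward the speaker as corpus batches accumulate.
def adapt(previous_model, new_corpus_batch):
    """One adaptation round: blend the previous model toward the new batch."""
    batch_mean = sum(new_corpus_batch) / len(new_corpus_batch)
    return 0.7 * previous_model + 0.3 * batch_mean   # assumed blend factor

basic_model = 0.0                 # speaker-independent starting point
model = basic_model
history = []
for batch in ([1.0, 1.2], [1.1, 0.9], [1.0, 1.0]):   # successive corpus batches
    model = adapt(model, batch)   # transparent to the user
    history.append(model)
```

Each element of `history` is closer to the speaker's statistics (around 1.0) than the last, mirroring the claim that accuracy "improves continuously" as corpus accumulates.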
S305: if it does not exist, perform speech recognition on the voice information according to the basic acoustic model, and generate and store corpus information of the speaker from the voice information.
S306: generate the speaker's personal acoustic model according to the basic acoustic model and the stored corpus information.
With the speech recognition method of this embodiment of the present invention, after the voice information is recognized with the speaker's personal acoustic model, that model may also be optimized according to the voice information, further improving the accuracy of individual speech recognition.
To realize the above embodiments, the present invention also proposes a speech recognition device.
Fig. 4 is a structural block diagram of a speech recognition device according to an embodiment of the present invention. As shown in Fig. 4, the device may comprise: a first acquisition module 10, a judgment module 20, a speech recognition module 30, a first generation module 40, and a second generation module 50.
Specifically, the first acquisition module 10 may be used to acquire the voice information input by a speaker and to acquire the speaker information of the speaker. It should be noted that in an embodiment of the present invention the speaker information may be an ID corresponding to the speaker; the speaker ID may be an identifier allocated by a server for the speaker, corresponding one-to-one with the speaker's voiceprint features.
More specifically, the first acquisition module 10 collects the voice information input by the speaker through a microphone in the terminal, may extract voiceprint features from it, and may then obtain the corresponding speaker information (the speaker ID, etc.) from the correspondence between voiceprint features and speaker information.
It will be appreciated that in another embodiment of the present invention the speaker information may instead be the ID or MAC address of the terminal the speaker uses; that is, after acquiring the input voice information, the first acquisition module 10 may also obtain the ID or MAC address of the speaker's terminal.
The judgment module 20 may be used to judge, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists. In an embodiment of the present invention, the personal acoustic model can be understood as the speaker's own acoustic model, one that captures the speaker's voice characteristics.
The speech recognition module 30 is used to acquire the personal acoustic model when the judgment module 20 judges that it exists and to perform speech recognition on the voice information according to the speaker's personal acoustic model, and to perform speech recognition on the voice information according to the basic acoustic model when the judgment module 20 judges that no personal acoustic model exists.
More specifically, when the judgment module 20 judges that a personal acoustic model corresponding to the speaker exists, the speech recognition module 30 may acquire that model, first perform feature extraction on the voice information to obtain the corresponding acoustic features, and then match those features against the personal acoustic model according to a given criterion to obtain the recognition result.
In an embodiment of the present invention, the basic acoustic model can be understood as a model suited to speakers in general, i.e. an acoustic model obtained by collecting speech data from many speakers and training on that data.
The first generation module 40 may be used to generate and store corpus information of the speaker from the voice information. More specifically, after the speech recognition module 30 recognizes the voice information with the basic acoustic model, the first generation module 40 may store the input voice information as corpus information of the speaker, forming historical corpus information.
Second generation module 50 can be used for the individual acoustic model generating speaker according to the language material information of basic acoustic model and storage.Specifically, in one embodiment of the invention, as shown in Figure 5, this second generation module 50 can comprise: judging unit 51, first screens unit 52, second and screens unit 53 and generation unit 54.Particularly, judging unit 51 can be used for judging whether the quantity of the language material information stored reaches predetermined threshold value.More specifically, judging unit 51 can judge whether the quantity of the language material information stored saves bit by bit some (as predetermined threshold value).
First screening unit 52 is used in judging unit 51 when judging that the quantity of language material information stored reaches predetermined threshold value, screens to obtain corresponding effective language material information to the language material information stored.Specifically, in an embodiment of the present invention, first screening unit 52 first can obtain the screening parameter of the language material information that every bar stores, and according to screening parameter, the language material information that every bar stores is marked, afterwards, generate the mark of the language material information that every bar stores according to the weight of appraisal result and screening parameter, and the mark of the language material information stored according to every bar screens to obtain corresponding effective language material information to the language material information stored.Wherein, in an embodiment of the present invention, screening parameter can include but not limited to degree of confidence, speech energy, voice length and speech recognition content.
More specifically, when the language material information accumulation of speaker is to after some, first screening unit 52 can carry out screening strength to the history language material information stored, the language material information that can store every bar carries out confidence calculations respectively, speech energy calculates, voice length computation, speech recognition content calculating etc., can give a mark respectively to result of calculation afterwards, then the final mark of the language material information that every bar stores is gone out according to the weight calculation of give a mark result and each screening parameter, finally, the language material information that the low every bar of mark stores can be filtered out, and using language material information high for remaining mark as effective language material information.
The second screening unit 53 can be used to perform primary/secondary speaker judgment according to each piece of valid corpus information, so as to filter out the valid corpus information belonging to the same speaker. More specifically, the second screening unit 53 can analyze all the valid corpus information, for example by calculating the signal-to-noise ratio, voice features and speaker gender of each piece of corpus information to perform the primary/secondary user judgment. If the current speaker is found to comprise multiple natural persons, all the valid corpus information is screened by information such as voice features and speaker gender, so as to filter out the valid corpus information belonging to the same speaker.
It should be noted that, in another embodiment of the present invention, after performing the primary/secondary user judgment by calculating the signal-to-noise ratio, voice features and speaker gender of each piece of corpus information, if the current speaker is found to comprise multiple natural persons, the second screening unit 53 can also reject the current corpus information, that is, reject the corpus information containing multiple natural persons and exclude it from model training.
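One way to read the primary/secondary speaker judgment above: group the valid utterances by simple cues such as estimated gender and a voice-feature signature, keep only the dominant (primary) speaker's group, and optionally, as in the alternative embodiment, reject utterances flagged as containing several natural persons outright. The grouping key and field names below are illustrative assumptions, not the patent's actual features.

```python
from collections import Counter

def primary_speaker_corpus(valid, reject_mixed=False):
    """Keep only utterances attributed to the dominant (primary) speaker.

    Each utterance carries cues assumed to come from upstream analysis
    (voice-feature signature, estimated gender, a multiple-persons flag).
    reject_mixed=True mirrors the alternative embodiment that discards
    utterances flagged as containing several natural persons.
    """
    if reject_mixed:
        valid = [u for u in valid if not u.get("multiple_persons", False)]
    groups = Counter((u["gender"], u["voice_sig"]) for u in valid)
    if not groups:
        return []
    primary = groups.most_common(1)[0][0]  # most frequent (gender, signature) pair
    return [u for u in valid if (u["gender"], u["voice_sig"]) == primary]
```

A real system would cluster continuous voice-feature vectors rather than compare exact signatures; the Counter-based grouping is only a stand-in for that clustering step.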
The generation unit 54 can be used to perform model training on the valid corpus information belonging to the same speaker according to the basic acoustic model, so as to generate the personal acoustic model of the speaker. Specifically, in an embodiment of the present invention, the generation unit 54 can first perform acoustic feature extraction on the valid corpus information belonging to the same speaker, and analyze the speech recognition content corresponding to the valid corpus information of the same speaker, so as to obtain high-confidence word-level recognition results and the corresponding acoustic features. Afterwards, the personal acoustic model can be randomly initialized, and gradient calculation can be performed according to the basic acoustic model, the word-level recognition results and the acoustic features. Finally, iteration can be performed on the calculated gradients to generate the personal acoustic model.
That is, the generation unit 54 can use the corpus information to perform acoustic feature extraction while analyzing the speech recognition content, retaining the high-confidence word-level recognition results and the corresponding acoustic features. Afterwards, the personal acoustic model can be randomly initialized, and the acoustic features and word-level recognition results retained in the previous step can be combined with the basic acoustic model to perform gradient calculation. Finally, the obtained gradients are used to iteratively optimize the personal acoustic model.
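The training procedure above (initialize a personal model, then iterate gradient updates computed from the base model, the retained word-level labels and the acoustic features) follows the usual gradient-descent pattern. The toy linear "acoustic model" below is purely illustrative: a real system would adapt deep-network parameters, and the regularization term pulling the weights toward the base model is an assumed stand-in for "gradient calculation according to the basic acoustic model".

```python
import random

def adapt(base_w, feats, targets, lr=0.1, reg=0.1, iters=200, seed=0):
    """Randomly initialize a personal model, then iterate gradient updates.

    feats: acoustic feature vectors; targets: numeric stand-ins for the
    retained high-confidence word-level recognition results.
    """
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in base_w]  # random initialization
    for _ in range(iters):
        for x, y in zip(feats, targets):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                # squared-error gradient plus a pull toward the base model,
                # keeping the personal model close to the speaker-independent
                # one when adaptation data is scarce (assumed regularizer)
                w[i] -= lr * (err * xi + reg * (w[i] - base_w[i]))
    return w

def mse(w, feats, targets):
    """Mean squared prediction error of a weight vector on the corpus."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in zip(feats, targets)) / len(feats)
```

On a small synthetic corpus the adapted weights fit the speaker's data much better than the base model while staying near it, which is the intended effect of speaker-adaptive training.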
Thus, by performing model training on the corpus information of each speaker, a corresponding personal acoustic model is obtained. This personal acoustic model contains the voice characteristics of the user and can be used to improve recognition accuracy during the speech recognition process.
Further, in one embodiment of the present invention, as shown in Figure 6, the speech recognition device can also comprise an optimization module 60. The optimization module 60 is used to perform model optimization on the personal acoustic model according to the voice information, after speech recognition has been performed on the voice information according to the personal acoustic model of the speaker. It can be understood that the speaker-adaptive speech recognition process is transparent to the user: the user is not aware of the adaptive training and adaptive estimation flow. As the number of times the user uses speech recognition increases, the speech recognition server can continuously optimize the user's adaptive model, and the user's speech recognition accuracy will also continuously improve. It can thus be seen that speaker-adaptive training is not a one-off job; rather, the adaptive training process is repeated continuously as the speaker's corpus accumulates. In an embodiment of the present invention, the optimization module 60 performs each round of adaptive training on the basis of the previous personal acoustic model, so as to continuously improve speech recognition accuracy. In this way, the accuracy of personal speech recognition can be further improved.
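The continuous-optimization behaviour above (each new round of adaptive training starts from the previously adapted personal model rather than from scratch, so the model keeps improving as corpus accumulates) can be sketched as a simple loop. The single-number "model" and the nudge-toward-the-batch-mean update are illustrative stand-ins for a real retraining round.

```python
def optimize_round(personal_model, new_corpus):
    """One round of model optimization: move the model state halfway toward
    the statistics of the newly accumulated corpus (illustrative stand-in
    for a real adaptive-training round)."""
    target = sum(new_corpus) / len(new_corpus)
    return personal_model + 0.5 * (target - personal_model)

model = 0.0  # model state as a single number, purely illustrative
for batch in [[1.0, 1.0], [2.0, 2.0], [2.0, 2.0]]:  # corpus keeps accumulating
    model = optimize_round(model, batch)  # each round builds on the previous model
```

The key property the sketch preserves is that `optimize_round` takes the previous personal model as its starting point, so optimization is incremental rather than disposable, matching the "not a one-off job" description.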
In the speech recognition device of the embodiment of the present invention, the first acquisition module obtains the voice information input by the speaker and obtains the speaker information of the speaker; the judging module judges, according to the speaker information, whether a personal acoustic model corresponding to the speaker exists; if it exists, the speech recognition module obtains the personal acoustic model and performs speech recognition on the voice information according to the personal acoustic model of the speaker; if it does not exist, the speech recognition module performs speech recognition on the voice information according to the basic acoustic model, the first generation module generates and stores the corpus information of the speaker according to the voice information, and the second generation module generates the personal acoustic model of the speaker according to the basic acoustic model and the stored corpus information. That is, on the basis of the speaker-independent acoustic model (namely the above basic acoustic model), the historical speech data of a given speaker is used for further training to obtain a personal acoustic model reflecting the speaker's own characteristics, and this personal acoustic model is used during the speech recognition process. The speech recognition accuracy for each individual can thereby be improved, which is equivalent to providing a privately customized speech recognition service to every speech recognition user, thus improving the user experience.
In the description of the present invention, it should be understood that the terms "first" and "second" are used for descriptive purposes only and cannot be interpreted as indicating or implying relative importance or implicitly indicating the quantity of the indicated technical feature. Thus, a feature qualified by "first" or "second" may explicitly or implicitly comprise at least one such feature. In the description of the present invention, "multiple" means at least two, for example two, three, etc., unless otherwise expressly and specifically limited.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example", "some examples" and the like means that the specific features, structures, materials or characteristics described in conjunction with that embodiment or example are contained in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics can be combined in an appropriate manner in any one or more embodiments or examples. In addition, where there is no conflict, those skilled in the art can combine the features of different embodiments or examples described in this specification.
Any process or method description in the flowcharts, or otherwise described herein, can be understood as representing a module, fragment or portion of code comprising one or more executable instructions for realizing the steps of a specific logical function or process, and the scope of the preferred embodiments of the present invention comprises other realizations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in the flowcharts or otherwise described herein can be considered, for example, a sequenced list of executable instructions for realizing logical functions, and can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus or device (such as a computer-based system, a system comprising a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" can be any means that can contain, store, communicate, propagate or transmit the program for use by, or in conjunction with, the instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) with one or more wirings, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium can even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the various parts of the present invention can be realized by hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be realized by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if realized by hardware, as in another embodiment, they can be realized by any one of the following technologies well known in the art, or a combination thereof: a discrete logic circuit with logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), etc.
Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program, and the program can be stored in a computer-readable storage medium; when executed, the program performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention can be integrated in one processing module, or each unit can exist physically alone, or two or more units can be integrated in one module. The above integrated module can be realized in the form of hardware or in the form of a software function module. If realized in the form of a software function module and sold or used as an independent product, the integrated module can also be stored in a computer-readable storage medium.
The above-mentioned storage medium can be a read-only memory, a magnetic disk, an optical disc, etc. Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and cannot be interpreted as limitations of the present invention, and those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the present invention.

Claims (12)

1. A speech recognition method, characterized in that it comprises the following steps:
obtaining the voice information input by a speaker, and obtaining the speaker information of said speaker;
judging, according to said speaker information, whether a personal acoustic model corresponding to said speaker exists;
if it exists, obtaining said personal acoustic model, and performing speech recognition on said voice information according to the personal acoustic model of said speaker;
if it does not exist, performing speech recognition on said voice information according to a basic acoustic model, and generating and storing the corpus information of said speaker according to said voice information; and
generating the personal acoustic model of said speaker according to said basic acoustic model and the stored corpus information.
2. The speech recognition method according to claim 1, characterized in that generating the personal acoustic model of said speaker according to said basic acoustic model and the stored corpus information specifically comprises:
judging whether the quantity of the stored corpus information reaches a predetermined threshold value;
if the predetermined threshold value is reached, screening said stored corpus information to obtain corresponding valid corpus information;
performing primary/secondary speaker judgment according to each piece of valid corpus information to filter out the valid corpus information belonging to the same speaker; and
performing model training on said valid corpus information belonging to the same speaker according to said basic acoustic model to generate the personal acoustic model of said speaker.
3. The speech recognition method according to claim 2, characterized in that screening said stored corpus information to obtain corresponding valid corpus information specifically comprises:
obtaining the screening parameters of each piece of stored corpus information, and scoring each piece of stored corpus information according to said screening parameters;
generating the score of each piece of stored corpus information according to the scoring results and the weights of said screening parameters; and
screening said stored corpus information according to the score of each piece of stored corpus information to obtain corresponding valid corpus information.
4. The speech recognition method according to claim 3, characterized in that said screening parameters comprise confidence, speech energy, speech length and speech recognition content.
5. The speech recognition method according to claim 2, characterized in that performing model training on said valid corpus information belonging to the same speaker according to said basic acoustic model to generate the personal acoustic model of said speaker specifically comprises:
performing acoustic feature extraction on said valid corpus information belonging to the same speaker, and analyzing the speech recognition content corresponding to the valid corpus information of said same speaker, so as to obtain high-confidence word-level recognition results and corresponding acoustic features;
randomly initializing said personal acoustic model, and performing gradient calculation according to said basic acoustic model, said word-level recognition results and said acoustic features; and
iterating on the calculated gradients to generate said personal acoustic model.
6. The speech recognition method according to any one of claims 1 to 5, characterized in that, after performing speech recognition on said voice information according to the personal acoustic model of said speaker, said method further comprises:
performing model optimization on said personal acoustic model according to said voice information.
7. A speech recognition device, characterized in that it comprises:
a first acquisition module, for obtaining the voice information input by a speaker and obtaining the speaker information of said speaker;
a judging module, for judging, according to said speaker information, whether a personal acoustic model corresponding to said speaker exists;
a speech recognition module, for obtaining said personal acoustic model and performing speech recognition on said voice information according to the personal acoustic model of said speaker when said judging module judges that said personal acoustic model exists, and for performing speech recognition on said voice information according to a basic acoustic model when said judging module judges that said personal acoustic model does not exist;
a first generation module, for generating and storing the corpus information of said speaker according to said voice information; and
a second generation module, for generating the personal acoustic model of said speaker according to said basic acoustic model and the stored corpus information.
8. The speech recognition device according to claim 7, characterized in that said second generation module comprises:
a judging unit, for judging whether the quantity of the stored corpus information reaches a predetermined threshold value;
a first screening unit, for screening said stored corpus information to obtain corresponding valid corpus information when said judging unit judges that the quantity of said stored corpus information reaches said predetermined threshold value;
a second screening unit, for performing primary/secondary speaker judgment according to each piece of valid corpus information to filter out the valid corpus information belonging to the same speaker; and
a generation unit, for performing model training on said valid corpus information belonging to the same speaker according to said basic acoustic model to generate the personal acoustic model of said speaker.
9. The speech recognition device according to claim 8, characterized in that said first screening unit is specifically used for:
obtaining the screening parameters of each piece of stored corpus information, and scoring each piece of stored corpus information according to said screening parameters;
generating the score of each piece of stored corpus information according to the scoring results and the weights of said screening parameters; and
screening said stored corpus information according to the score of each piece of stored corpus information to obtain corresponding valid corpus information.
10. The speech recognition device according to claim 9, characterized in that said screening parameters comprise confidence, speech energy, speech length and speech recognition content.
11. The speech recognition device according to claim 8, characterized in that said generation unit is specifically used for:
performing acoustic feature extraction on said valid corpus information belonging to the same speaker, and analyzing the speech recognition content corresponding to the valid corpus information of said same speaker, so as to obtain high-confidence word-level recognition results and corresponding acoustic features;
randomly initializing said personal acoustic model, and performing gradient calculation according to said basic acoustic model, said word-level recognition results and said acoustic features; and
iterating on the calculated gradients to generate said personal acoustic model.
12. The speech recognition device according to any one of claims 7 to 11, characterized in that it further comprises:
an optimization module, for performing model optimization on said personal acoustic model according to said voice information after speech recognition has been performed on said voice information according to the personal acoustic model of said speaker.
CN201510558047.4A 2015-09-02 2015-09-02 Audio recognition method and device Active CN105096941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510558047.4A CN105096941B (en) 2015-09-02 2015-09-02 Audio recognition method and device


Publications (2)

Publication Number Publication Date
CN105096941A true CN105096941A (en) 2015-11-25
CN105096941B CN105096941B (en) 2017-10-31

Family

ID=54577227


Country Status (1)

Country Link
CN (1) CN105096941B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1655235A (en) * 2004-02-12 2005-08-17 微软公司 Automatic identification of telephone callers based on voice characteristics
US20070198257A1 (en) * 2006-02-20 2007-08-23 Microsoft Corporation Speaker authentication
CN103187053A (en) * 2011-12-31 2013-07-03 联想(北京)有限公司 Input method and electronic equipment



Also Published As

Publication number Publication date
CN105096941B (en) 2017-10-31

Similar Documents

Publication Publication Date Title
CN105096941A (en) Voice recognition method and device
KR101922776B1 (en) Method and device for voice wake-up
CN108320733B (en) Voice data processing method and device, storage medium and electronic equipment
CN108305641B (en) Method and device for determining emotion information
CN105261357B (en) Voice endpoint detection method and device based on statistical model
CN108305643B (en) Method and device for determining emotion information
CN104575490B (en) Spoken language pronunciation evaluating method based on deep neural network posterior probability algorithm
CN102982811B (en) Voice endpoint detection method based on real-time decoding
KR102128926B1 (en) Method and device for processing audio information
CN108417201B (en) Single-channel multi-speaker identity recognition method and system
CN108899047B (en) Masking threshold estimation method, apparatus and storage medium for audio signals
CN110136749A (en) Speaker-dependent end-to-end speech endpoint detection method and device
CN106098059A (en) Customizable voice wake-up method and system
CN105529028A (en) Voice analysis method and apparatus
CN105118502A (en) Endpoint detection method and system for a speech recognition system
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN104036774A (en) Method and system for recognizing Tibetan dialects
CN108269567A (en) Method, apparatus, computing device and computer-readable storage medium for generating far-field voice data
CN108364650B (en) Device and method for adjusting voice recognition result
CN107731233A (en) Voiceprint recognition method based on RNN
CN104978963A (en) Speech recognition apparatus, method and electronic equipment
CN108986798B (en) Voice data processing method, device and equipment
CN104934028A (en) Deep neural network model training method and device for speech synthesis
CN106504768A (en) Telephone test audio classification method and device based on artificial intelligence
CN108922521A (en) Voice keyword retrieval method, apparatus, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant