CN103680500A - Speech recognition method and device - Google Patents

Speech recognition method and device

Info

Publication number
CN103680500A
Authority
CN
China
Prior art keywords
model
sil
hmm
decoding network
context
Legal status
Granted
Application number
CN201210314129.0A
Other languages
Chinese (zh)
Other versions
CN103680500B (en)
Inventor
钱胜 (Qian Sheng)
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
2012-08-29
Filing date
2012-08-29
Publication date
2014-03-26
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210314129.0A
Publication of CN103680500A
Application granted
Publication of CN103680500B
Legal status: Active
Anticipated expiration


Abstract

The invention provides a speech recognition method and device. The method comprises: using a context-dependent hidden Markov model (HMM) when training a decoding network, adding a silence (sil) model at word ends in the decoding network, and adjusting the acoustic context of the HMM states before and after the sil model; and obtaining the HMM state transition sequence of the speech to be recognized through the decoding network. Furthermore, a transition from the end of the language model back to its head is added in the decoding network to simulate the effect of inter-sentence pauses on the context information of the language model. The method and device improve speech recognition performance.

Description

Speech recognition method and device
[technical field]
The present invention relates to the field of computer application technology, and in particular to a speech recognition method and device.
[background art]
Speech recognition technology enables a machine to convert a speech signal into the corresponding text or command through recognition and understanding. With the maturity and continued refinement of hidden Markov model (HMM) technology, HMMs have become the mainstream approach to speech recognition.
An HMM builds a statistical model of the time-series structure of a speech signal and treats it mathematically as a doubly stochastic process. The first process is hidden: a Markov chain with a finite number of states models how the statistical properties of the speech signal change over time. The second is the stochastic process of the observation sequence associated with each state of the Markov chain. The former is manifested only through the latter, and its specific parameters cannot be measured directly. Human speech production is in fact such a doubly stochastic process, with the speech signal itself being an observable time-varying sequence. The HMM imitates this process reasonably well and is therefore a fairly ideal speech model.
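The doubly stochastic view can be made concrete with the standard forward algorithm. It computes the probability of an observation sequence by summing over every hidden state path, none of which the observer ever sees. This is a minimal sketch built on a made-up two-state, left-to-right HMM over a toy two-symbol alphabet. The state names, symbols, and probabilities are hypothetical and do not come from the patent.

```python
from typing import Dict, Sequence

def forward_likelihood(obs: Sequence[str],
                       states: Sequence[str],
                       start: Dict[str, float],
                       trans: Dict[str, Dict[str, float]],
                       emit: Dict[str, Dict[str, float]]) -> float:
    """Return P(obs) under a discrete HMM, summed over all hidden state paths."""
    # alpha[s] = P(o_1..o_t, hidden state at time t = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        # Hidden process: one Markov-chain step; observable process: emission from the new state.
        alpha = {s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())

# Hypothetical toy model: two hidden states with a left-to-right topology.
STATES = ["s1", "s2"]
START = {"s1": 0.8, "s2": 0.2}
TRANS = {"s1": {"s1": 0.6, "s2": 0.4},
         "s2": {"s1": 0.0, "s2": 1.0}}
EMIT = {"s1": {"a": 0.7, "b": 0.3},
        "s2": {"a": 0.2, "b": 0.8}}

if __name__ == "__main__":
    print(forward_likelihood(["a", "a", "b"], STATES, START, TRANS, EMIT))
```

Only the symbols `a`/`b` are observed. The state sequence that produced them stays hidden and is marginalized out, which is exactly the "former is manifested only through the latter" relationship described above.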
HMM-based speech recognition works by finding the optimal transition sequence among all possible HMM state transition sequences and taking the corresponding text as the recognition result. The decoding network describes all possible HMM state transitions. Speech recognition is therefore the process of searching the decoding network for the best transition sequence, and the recognition result must be one of the possibilities that the decoding network can describe. During recognition, a sequence of HMM state transitions is called a path. Take the recognition of the simple isolated words 中 (zhong) and 国 (guo) as an example; the decoding network is shown in Fig. 1. The HMM state transition sequence for 中 is "zh", "ong", and the sequence for 国 is "g", "uo". <s> and </s> are the start and end symbols of the language model, respectively.
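Searching the decoding network for the best path is classically done with the Viterbi algorithm. The sketch below is a minimal log-domain Viterbi over the two-word network of Fig. 1. For brevity it assumes one emitting state per phone. The transition probabilities and per-frame acoustic scores are invented for illustration rather than produced by a real acoustic model, and the names (`viterbi`, `FRAMES`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

NEG_INF = -math.inf

def viterbi(states: Sequence[str],
            log_start: Dict[str, float],
            log_trans: Dict[str, Dict[str, float]],
            frames: Sequence[Dict[str, float]],
            finals: Sequence[str]) -> Tuple[float, List[str]]:
    """Return (best log score, best state path) through a decoding network."""
    # delta[s]: best log score of any partial path ending in state s at this frame.
    delta = {s: log_start.get(s, NEG_INF) + frames[0].get(s, NEG_INF) for s in states}
    backpointers: List[Dict[str, str]] = []
    for frame in frames[1:]:
        new_delta: Dict[str, float] = {}
        pointer: Dict[str, str] = {}
        for s in states:
            # Best predecessor of s; arcs absent from log_trans are not in the network.
            prev, best = max(((r, delta[r] + log_trans.get(r, {}).get(s, NEG_INF))
                              for r in states), key=lambda item: item[1])
            new_delta[s] = best + frame.get(s, NEG_INF)
            pointer[s] = prev
        delta = new_delta
        backpointers.append(pointer)
    # Only states connected to </s> may end the path.
    end = max(finals, key=lambda s: delta[s])
    if delta[end] == NEG_INF:
        raise ValueError("no complete path through the network")
    path = [end]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return delta[end], path

# Hypothetical network for the isolated words zhong (zh, ong) and guo (g, uo).
LOG = math.log
STATES = ["zh", "ong", "g", "uo"]
LOG_START = {"zh": LOG(0.5), "g": LOG(0.5)}      # <s> -> first phone of each word
LOG_TRANS = {"zh": {"zh": LOG(0.5), "ong": LOG(0.5)},
             "ong": {"ong": LOG(0.5)},
             "g": {"g": LOG(0.5), "uo": LOG(0.5)},
             "uo": {"uo": LOG(0.5)}}
FINALS = ["ong", "uo"]                           # last phones connect to </s>
WORD_OF_FINAL = {"ong": "zhong", "uo": "guo"}

# Invented per-frame acoustic log-likelihoods for four frames of audio.
FRAMES = [{"zh": -1.0, "ong": -5.0, "g": -3.0, "uo": -6.0},
          {"zh": -1.2, "ong": -4.0, "g": -3.0, "uo": -6.0},
          {"zh": -4.0, "ong": -0.8, "g": -3.0, "uo": -3.0},
          {"zh": -5.0, "ong": -0.9, "g": -5.0, "uo": -2.0}]

if __name__ == "__main__":
    score, path = viterbi(STATES, LOG_START, LOG_TRANS, FRAMES, FINALS)
    print(path, WORD_OF_FINAL[path[-1]], score)
```

With these made-up scores the best path is `zh zh ong ong`, so the recognized word is zhong. The result is necessarily one of the paths the network allows, as the paragraph above states.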
When people speak, they often pause because of thinking, hesitation, coughing, surprise, stuttering, and similar causes. In the speech signal, a pause appears as a period with no sound, or with sound that is not speech, such as a cough or a sneeze. Pauses in speech are divided into intra-sentence pauses and inter-sentence pauses. As the names suggest, an intra-sentence pause occurs within a single sentence, and an inter-sentence pause occurs between sentences when a person says several sentences in succession.
Existing speech recognition generally assumes that silence occurs only at the beginning and end of an utterance and that there are no pauses in the middle. As a result, a pause that does occur may be misrecognized as a word carrying semantic content. More seriously, speech recognition extends the search forward from the current state, so this error directly affects subsequent recognition and causes further errors in the result. The key to solving this problem is to recognize pauses in speech correctly, so that subsequent recognition proceeds from a correct result. The prerequisite for recognizing pauses correctly is that the decoding network correctly describes all possible HMM state transitions.
The existing common method is to add a silence model (sil model) at word ends in the decoding network. When a pause is encountered during recognition, the sil model competes with the other models that carry semantic content. If the sil model wins, the segment is recognized as a pause; this is also described as the pause being absorbed by the sil model. Fig. 2 is a schematic diagram of adding the sil model to a decoding network, where <s> and </s> are again the start and end symbols of the language model.
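One way to picture the prior-art arrangement of Fig. 2 is as a graph. Every word end has an optional arc into a shared sil node, in parallel with the direct arcs to the next word or to </s>, and a pause can be absorbed only along a path that passes through sil. The sketch below builds a hypothetical phone-level word-loop network with illustrative node names, omitting self-loops and HMM sub-states. The figure in the patent may be drawn differently.

```python
from typing import Dict, List, Sequence, Set

def build_network(lexicon: Dict[str, List[str]], add_sil: bool) -> Dict[str, Set[str]]:
    """Adjacency map of a phone-level word-loop decoding network."""
    graph: Dict[str, Set[str]] = {"<s>": set(), "</s>": set()}
    firsts: List[str] = []
    lasts: List[str] = []
    for word, phones in lexicon.items():
        # Qualify nodes by word and position so shared phones stay distinct.
        nodes = [f"{word}:{i}:{p}" for i, p in enumerate(phones)]
        for node in nodes:
            graph.setdefault(node, set())
        for a, b in zip(nodes, nodes[1:]):
            graph[a].add(b)
        firsts.append(nodes[0])
        lasts.append(nodes[-1])
    graph["<s>"].update(firsts)
    if add_sil:
        # After a pause the search may start another word or finish.
        graph["sil"] = set(firsts) | {"</s>"}
    for last in lasts:
        graph[last].update(firsts)   # direct word-to-word arcs
        graph[last].add("</s>")
        if add_sil:
            graph[last].add("sil")   # optional pause, competing with the arcs above
    return graph

def accepts(graph: Dict[str, Set[str]], path: Sequence[str]) -> bool:
    """True if path runs from <s> to </s> using only arcs present in graph."""
    return (len(path) >= 2 and path[0] == "<s>" and path[-1] == "</s>"
            and all(b in graph.get(a, set()) for a, b in zip(path, path[1:])))

LEXICON = {"zhong": ["zh", "ong"], "guo": ["g", "uo"]}
WITH_PAUSE = ["<s>", "zhong:0:zh", "zhong:1:ong", "sil", "guo:0:g", "guo:1:uo", "</s>"]
NO_PAUSE = ["<s>", "zhong:0:zh", "zhong:1:ong", "guo:0:g", "guo:1:uo", "</s>"]

if __name__ == "__main__":
    for add_sil in (False, True):
        net = build_network(LEXICON, add_sil)
        print(add_sil, accepts(net, NO_PAUSE), accepts(net, WITH_PAUSE))
```

Without the sil arcs a pause between the two words has no path at all, so the decoder is forced to explain it with some word. With them, the pause path and the direct path both exist and their scores decide which wins.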
In practical applications, however, a pause in speech affects the acoustic pronunciation near it, and the longer the pause, the larger the effect. In addition, at an inter-sentence pause the context information of the language model changes abruptly. Prior-art recognition methods cannot handle these problems, so their recognition performance is limited.
[summary of the invention]
The invention provides a speech recognition method and device to improve speech recognition performance.
The specific technical solutions are as follows:
A speech recognition method, comprising:
when training a decoding network, using a context-dependent hidden Markov model (HMM), adding a silence (sil) model at word ends in the decoding network, and adjusting the acoustic context of the HMM states before and after the sil model; and
using the decoding network to obtain the HMM state transition sequence of the speech to be recognized.
According to a preferred embodiment of the present invention, in the context-dependent HMM, the HMM states depend on the phoneme context;
adjusting the acoustic context of the HMM states before and after the sil model specifically comprises: replacing the right (following) context phoneme of the HMM state before the sil model in the decoding network with sil, and replacing the left (preceding) context phoneme of the HMM state after the sil model in the decoding network with sil.
According to a preferred embodiment of the present invention, the method further comprises: adding, in the decoding network, a transition from the end of the language model back to the head of the language model.
According to a preferred embodiment of the present invention, the method further comprises: after querying the language model on the basis of the HMM state transition sequence to determine the optimal path, determining that an inter-sentence pause exists if the optimal path contains a transition from the end of the language model to its head.
According to a preferred embodiment of the present invention, the method further comprises:
adding a punctuation mark at the position of the inter-sentence pause according to the optimal path of the speech to be recognized.
A speech recognition device, comprising:
a network training unit, configured to use a context-dependent HMM when training a decoding network, add a sil model at word ends in the decoding network, and adjust the acoustic context of the HMM states before and after the sil model; and
a path determining unit, configured to use the decoding network to obtain the HMM state transition sequence of the speech to be recognized.
According to a preferred embodiment of the present invention, in the context-dependent HMM, the HMM states depend on the phoneme context;
when adjusting the acoustic context of the HMM states before and after the sil model, the network training unit specifically replaces the right context phoneme of the HMM state before the sil model in the decoding network with sil, and replaces the left context phoneme of the HMM state after the sil model in the decoding network with sil.
According to a preferred embodiment of the present invention, the network training unit is further configured to add, in the decoding network, a transition from the end of the language model back to the head of the language model.
According to a preferred embodiment of the present invention, the path determining unit is further configured to query the language model on the basis of the HMM state transition sequence to determine the optimal path;
the device further comprises:
a pause recognition unit, configured to determine that an inter-sentence pause exists if the optimal path determined by the path determining unit contains a transition from the end of the language model to its head.
According to a preferred embodiment of the present invention, the pause recognition unit is further configured to add a punctuation mark at the position of the inter-sentence pause according to the optimal path of the speech to be recognized.
As the above technical solutions show, the present invention uses a context-dependent HMM when training the decoding network, adds a sil model at word ends in the decoding network, and adjusts the acoustic context of the HMM states before and after the sil model. This simulates the effect of pauses on the acoustic-model context, and speech recognition based on this decoding network achieves better performance.
[description of drawings]
Fig. 1 is a simplified example of a decoding network;
Fig. 2 is a schematic diagram of adding a sil model to a decoding network in the prior art;
Fig. 3 is a schematic diagram of a decoding network provided by an embodiment of the present invention;
Fig. 4 is another schematic diagram of a decoding network provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the speech recognition device provided by an embodiment of the present invention.
[embodiment]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
Speech recognition depends on a trained decoding network. It therefore comprises at least two processes: first, training the decoding network; second, recognizing the speech to be recognized on the basis of that network. The recognition process involves an acoustic-model query and a language-model query. The acoustic-model query obtains the HMM state transition sequence of the speech to be recognized by querying the acoustic model through the decoding network. In the embodiments of the present invention, the acoustic model comprises the HMMs and the sil model. The language-model query queries the language model through the decoding network to determine the optimal path, which yields the recognition result.
When training the decoding network, the embodiments of the present invention use a context-dependent HMM, add a sil model at word ends in the decoding network, and adjust the acoustic context of the HMM states before and after the sil model.
First, a brief introduction to context-dependent HMMs. In a context-dependent HMM, the HMM describing a given phoneme differs according to the acoustic phoneme context in which it appears. Take 中国 ("China") as an example. With a context-independent HMM, the HMM state transition sequence is "zh", "ong", "g", "uo", and the decoding network is as shown in Fig. 1. With a context-dependent HMM, the sequence is "zh+ong", "zh-ong+g", "ong-g+uo", "g-uo". Here "+" marks the right (following) context and "-" marks the left (preceding) context. For example, "zh+ong" denotes the state of "zh" when followed by "ong"; "zh-ong+g" denotes the state of "ong" when preceded by "zh" and followed by "g"; and "g-uo" denotes the state of "uo" when preceded by "g".
In the embodiments of the present invention, a sil model is added at word ends in the decoding network. The sil model is the HMM used in speech recognition to describe silence, noise, non-speech, pauses, and the like. Because a pause in speech affects the nearby acoustic pronunciation, the acoustic context must be adjusted after the sil model is added at word ends, so that the acoustic context around the newly added sil model conforms to the context-dependency principle. Specifically, the right context phoneme of the HMM state before the sil model in the decoding network is replaced with sil, and the left context phoneme of the HMM state after the sil model is replaced with sil. As shown in Fig. 3, the right context of "ong" is replaced with "sil", the right context of "uo" is likewise replaced with "sil", the left context of "zh" is replaced with "sil", and the left context of "g" is replaced with "sil".
After the sil model is added and the acoustic context adjusted as above, again taking "China" as an example, the HMM state transition sequence is "sil-zh+ong", "zh-ong+sil", "sil", "sil-g+uo", "g-uo+sil".
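The context expansion and the sil adjustment can be sketched as a small function that writes each phone in the document's left-center+right notation. The sketch applies the replacement rule described above: the phone before a sil takes sil as its right context, and the phone after it takes sil as its left context. Several points are assumptions rather than details from the patent: utterance boundaries are treated as sil context once the sil model is in place, the inserted sil is a single context-independent unit, and each phone maps to one context-dependent unit. The names `expand`, `cd_label`, `pause_after`, and `boundary` are hypothetical.

```python
from typing import List, Optional, Sequence

SIL = "sil"

def cd_label(left: Optional[str], center: str, right: Optional[str]) -> str:
    """Format a unit as left-center+right, omitting missing contexts."""
    label = center
    if left is not None:
        label = f"{left}-{label}"
    if right is not None:
        label = f"{label}+{right}"
    return label

def expand(words: Sequence[Sequence[str]],
           pause_after: Sequence[bool],
           boundary: Optional[str] = None) -> List[str]:
    """Context-dependent unit sequence for a word path.

    pause_after[i] is True when the path takes the sil model after word i;
    boundary is the context used at the utterance start and end."""
    if len(pause_after) != max(len(words) - 1, 0):
        raise ValueError("need one pause flag per word boundary")
    units: List[str] = []
    for i, phones in enumerate(words):
        units.extend(phones)
        if i < len(words) - 1 and pause_after[i]:
            units.append(SIL)
    labels: List[str] = []
    for k, unit in enumerate(units):
        if unit == SIL:
            labels.append(SIL)   # unified silence model, no context of its own
            continue
        # Neighbouring units are the context, so a neighbouring sil becomes "sil".
        left = units[k - 1] if k > 0 else boundary
        right = units[k + 1] if k + 1 < len(units) else boundary
        labels.append(cd_label(left, unit, right))
    return labels

ZHONGGUO = [["zh", "ong"], ["g", "uo"]]

if __name__ == "__main__":
    # ['zh+ong', 'zh-ong+g', 'ong-g+uo', 'g-uo']
    print(expand(ZHONGGUO, [False]))
    # ['sil-zh+ong', 'zh-ong+sil', 'sil', 'sil-g+uo', 'g-uo+sil']
    print(expand(ZHONGGUO, [True], boundary=SIL))
```

Only the contexts of the neighbouring phones change, while the sil unit itself stays context-free. This matches the point made next, that a pause matters mainly through its effect on the surrounding pronunciations.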
The sil model added in this way is a unified silence model. A pause in speech affects the acoustic context mainly through its influence on the pronunciation of the surrounding phonemes, not through the silence phoneme itself. Adjusting the acoustic context around the inserted sil model, as described above, captures exactly this effect and therefore improves recognition performance.
During speech recognition, the acoustic-model query, the language-model query, and the way the optimal path is determined are all unchanged. While the optimal path is being determined, the sil model competes with the other HMMs at each word end; if the sil model wins, the speech at that point is recognized as sil.
For the special case of an inter-sentence pause, the context information of the language model changes abruptly at the pause. Suppose a segment of speech consists of W1, W2, W3, W4, with a pause between W2 and W3. If the segment is a single sentence, the pause is an intra-sentence pause, and the corresponding optimal path is <s> W1 W2 W3 W4 </s>. If the segment consists of two sentences, the pause is an inter-sentence pause, and the corresponding optimal path is <s> W1 W2 </s> <s> W3 W4 </s>. That is, the language-model right context of W2 changes from W3 to </s>, and the language-model left context of W3 changes from W2 to <s>. To recognize inter-sentence pauses, the embodiments of the present invention can further add, in the decoding network, a transition from the language-model end </s> back to the language-model head <s>, as shown in Fig. 4.
During speech recognition, at the end of a word, the language model's candidates for the following context compete with one another: the end symbol </s> against the other words. For an inter-sentence pause, </s> wins. Continuing the W1, W2, W3, W4 example, when the language model reaches W2, the following context of W2 is contested between </s> and W3. If the pause is an inter-sentence pause, </s> wins; if it is an intra-sentence pause, W3 wins.
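The competition at W2 can be illustrated with a toy bigram language model that scores both candidate paths from the example. The illustration rests on several assumptions. The bigram probabilities are invented, and the end-to-head arc is given zero cost. Acoustic scores are left out, although a real decoder would add them to these language-model scores. The function and variable names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Bigram = Dict[Tuple[str, str], float]

def lm_log_prob(tokens: Sequence[str], lm: Bigram, end_to_head: bool) -> float:
    """Bigram log-probability of a token path through the decoding network."""
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        if prev == "</s>" and cur == "<s>":
            if not end_to_head:
                return -math.inf   # the network has no </s> -> <s> arc
            continue               # assumed zero-cost loop arc
        total += lm.get((prev, cur), -math.inf)
    return total

# Hypothetical bigram probabilities for the W1..W4 example.
LM: Bigram = {pair: math.log(p) for pair, p in {
    ("<s>", "W1"): 0.5, ("W1", "W2"): 0.6,
    ("W2", "W3"): 0.1, ("W2", "</s>"): 0.5,   # the competition at W2
    ("<s>", "W3"): 0.4, ("W3", "W4"): 0.7, ("W4", "</s>"): 0.6,
}.items()}

ONE_SENTENCE = ["<s>", "W1", "W2", "W3", "W4", "</s>"]
TWO_SENTENCES = ["<s>", "W1", "W2", "</s>", "<s>", "W3", "W4", "</s>"]

if __name__ == "__main__":
    for path in (ONE_SENTENCE, TWO_SENTENCES):
        print(" ".join(path), lm_log_prob(path, LM, end_to_head=True))
```

Without the end-to-head arc, the two-sentence reading cannot be produced at all. With it, and with these made-up numbers, the two-sentence path scores higher because P(</s> | W2) exceeds P(W3 | W2). Swapping those two probabilities makes the one-sentence path win instead.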
Once the transition from the language-model end </s> to the head <s> has been added to the decoding network, recognition with this network proceeds as follows. The language model is queried on the basis of the HMM state transition sequence obtained from the acoustic-model query to determine the optimal path. If the optimal path contains a transition from the end of the language model to its head, an inter-sentence pause is determined to exist. Take the decoding network of Fig. 4 as an example. Because a transition from the language-model end </s> to the head <s> has been added, the optimal-path computation considers a transition from 中 to a pause in addition to the transition from 中 to 国. If the transition from 中 to the pause wins, 中 is the end of a sentence and the pause between 中 and 国 is an inter-sentence pause. In the optimal path this appears as a transition from the language-model end </s> after 中 to the head <s>, which is marked in the recognition result by "</s><s>".
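Reading inter-sentence pauses off the best path then amounts to scanning for the </s> to <s> jump. The sketch assumes the decoder outputs a flat token list in which <s>, </s>, and any absorbed sil appear as tokens. For each inter-sentence pause, it returns the number of words that precede it. The token format and the function name `inter_sentence_pauses` are assumptions.

```python
from typing import List, Sequence

NON_WORDS = {"<s>", "</s>", "sil"}

def inter_sentence_pauses(path: Sequence[str]) -> List[int]:
    """Word counts preceding each </s> -> <s> jump in a best-path token list."""
    pauses: List[int] = []
    n_words = 0
    for i, token in enumerate(path):
        if token == "</s>" and i + 1 < len(path) and path[i + 1] == "<s>":
            pauses.append(n_words)   # end-to-head jump: inter-sentence pause
        elif token not in NON_WORDS:
            n_words += 1
    return pauses

if __name__ == "__main__":
    print(inter_sentence_pauses(["<s>", "W1", "W2", "</s>", "<s>", "W3", "W4", "</s>"]))  # [2]
    print(inter_sentence_pauses(["<s>", "W1", "W2", "sil", "W3", "W4", "</s>"]))          # []
```

A pause absorbed by sil inside a sentence does not count, because it produces no end-to-head jump. Only the </s> to <s> transition marks a sentence boundary.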
Once inter-sentence pauses have been recognized, punctuation marks can be added at their positions on the basis of the recognition result. The present invention does not limit how punctuation is added. For example, different punctuation marks can be chosen according to the pause duration, such as a comma for a shorter pause and a period for a longer one.
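A duration-based rule of the kind mentioned above might look like the following. The patent leaves the punctuation scheme open, so several choices here are hypothetical: the 500 ms threshold, the ASCII comma and period, and the assumption that each inter-sentence pause comes with a measured duration (for example, from the frames aligned to it). The function name `punctuate` is also hypothetical.

```python
from typing import Sequence

def punctuate(sentences: Sequence[Sequence[str]],
              pause_ms: Sequence[float],
              long_pause_ms: float = 500.0) -> str:
    """Join sentences, choosing ',' or '.' from the length of each pause.

    pause_ms[i] is the duration of the inter-sentence pause after sentences[i]."""
    if not sentences:
        return ""
    if len(pause_ms) != len(sentences) - 1:
        raise ValueError("need one pause duration per sentence boundary")
    pieces = []
    for i, words in enumerate(sentences):
        is_last = i == len(sentences) - 1
        # The final sentence always closes with a period; otherwise use the duration.
        mark = "." if is_last or pause_ms[i] >= long_pause_ms else ","
        pieces.append(" ".join(words) + mark)
    return " ".join(pieces)

if __name__ == "__main__":
    print(punctuate([["W1", "W2"], ["W3", "W4"]], [200.0]))  # W1 W2, W3 W4.
    print(punctuate([["W1", "W2"], ["W3", "W4"]], [800.0]))  # W1 W2. W3 W4.
```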
The method provided by the present invention has been described in detail above. The device provided by the present invention is described in detail below.
Fig. 5 is a structural diagram of the speech recognition device provided by an embodiment of the present invention. As shown in Fig. 5, the device may comprise a network training unit 500 and a path determining unit 510.
The network training unit 500 uses a context-dependent HMM when training the decoding network, adds a sil model at word ends in the decoding network, and adjusts the acoustic context of the HMM states before and after the sil model.
The path determining unit 510 uses the decoding network to obtain the HMM state transition sequence of the speech to be recognized.
In the context-dependent HMM above, the HMM states depend on the phoneme context; that is, the HMM describing a given phoneme differs according to its acoustic phoneme context. In this case, when adjusting the acoustic context of the HMM states before and after the sil model, the network training unit 500 replaces the right context phoneme of the HMM state before the sil model in the decoding network with sil, and replaces the left context phoneme of the HMM state after the sil model with sil.
Besides mitigating the effect of pauses on the acoustic context as described above, the device also handles the abrupt change in language-model context at an inter-sentence pause. For this purpose, the network training unit 500 is further configured to add, in the decoding network, a transition from the language-model end </s> to the language-model head <s>.
Obtaining the HMM state transition sequence as described above is the acoustic-model query. The optimal path can additionally be determined in combination with a language-model query: the path determining unit 510 is further configured to query the language model on the basis of the HMM state transition sequence to determine the optimal path.
Further, the device may also comprise a pause recognition unit 520. It is configured to determine that an inter-sentence pause exists if the optimal path determined by the path determining unit 510 contains a transition from the language-model end </s> to the head <s>. One further application is adding punctuation marks at the positions of inter-sentence pauses.
As the above description shows, the method and device provided by the present invention have the following advantages:
1) The present invention uses a context-dependent HMM when training the decoding network, adds a sil model at word ends in the decoding network, and adjusts the acoustic context of the HMM states before and after the sil model. This simulates the effect of pauses on the acoustic-model context, and speech recognition based on this decoding network achieves better performance.
2) The present invention adds, in the decoding network, a transition from the end of the language model to its head to simulate inter-sentence pauses. This addresses the abrupt change in language-model context caused by inter-sentence pauses and further improves speech recognition performance.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A speech recognition method, characterized in that the method comprises:
when training a decoding network, using a context-dependent hidden Markov model (HMM), adding a silence (sil) model at word ends in the decoding network, and adjusting the acoustic context of the HMM states before and after the sil model; and
using the decoding network to obtain the HMM state transition sequence of the speech to be recognized.
2. The method according to claim 1, characterized in that, in the context-dependent HMM, the HMM states depend on the phoneme context;
adjusting the acoustic context of the HMM states before and after the sil model specifically comprises: replacing the right context phoneme of the HMM state before the sil model in the decoding network with sil, and replacing the left context phoneme of the HMM state after the sil model in the decoding network with sil.
3. The method according to claim 1, characterized in that the method further comprises: adding, in the decoding network, a transition from the end of the language model to the head of the language model.
4. The method according to claim 3, characterized in that the method further comprises: after querying the language model on the basis of the HMM state transition sequence to determine an optimal path, determining that an inter-sentence pause exists if the optimal path contains a transition from the end of the language model to its head.
5. The method according to claim 4, characterized in that the method further comprises:
adding a punctuation mark at the position of the inter-sentence pause according to the optimal path of the speech to be recognized.
6. A speech recognition device, characterized in that the device comprises:
a network training unit, configured to use a context-dependent hidden Markov model (HMM) when training a decoding network, add a silence (sil) model at word ends in the decoding network, and adjust the acoustic context of the HMM states before and after the sil model; and
a path determining unit, configured to use the decoding network to obtain the HMM state transition sequence of the speech to be recognized.
7. The device according to claim 6, characterized in that, in the context-dependent HMM, the HMM states depend on the phoneme context;
when adjusting the acoustic context of the HMM states before and after the sil model, the network training unit specifically replaces the right context phoneme of the HMM state before the sil model in the decoding network with sil, and replaces the left context phoneme of the HMM state after the sil model in the decoding network with sil.
8. The device according to claim 6, characterized in that the network training unit is further configured to add, in the decoding network, a transition from the end of the language model to the head of the language model.
9. The device according to claim 8, characterized in that the path determining unit is further configured to query the language model on the basis of the HMM state transition sequence to determine an optimal path;
the device further comprises:
a pause recognition unit, configured to determine that an inter-sentence pause exists if the optimal path determined by the path determining unit contains a transition from the end of the language model to its head.
10. The device according to claim 9, characterized in that the pause recognition unit is further configured to add a punctuation mark at the position of the inter-sentence pause according to the optimal path of the speech to be recognized.
CN201210314129.0A 2012-08-29 2012-08-29 Speech recognition method and device Active CN103680500B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210314129.0A CN103680500B (en) 2012-08-29 2012-08-29 Speech recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210314129.0A CN103680500B (en) 2012-08-29 2012-08-29 Speech recognition method and device

Publications (2)

Publication Number Publication Date
CN103680500A true CN103680500A (en) 2014-03-26
CN103680500B CN103680500B (en) 2018-10-16

Family

ID=50317854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210314129.0A Active CN103680500B (en) 2012-08-29 2012-08-29 Speech recognition method and device

Country Status (1)

Country Link
CN (1) CN103680500B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006243213A (en) * 2005-03-02 2006-09-14 Advanced Telecommunication Research Institute International Language model conversion device, sound model conversion device, and computer program
CN101399036A (en) * 2007-09-30 2009-04-01 三星电子株式会社 Device and method for conversing voice to be rap music
CN101740024A (en) * 2008-11-19 2010-06-16 中国科学院自动化研究所 Method for automatic evaluation based on generalized fluent spoken language fluency
CN101887725A (en) * 2010-04-30 2010-11-17 中国科学院声学研究所 Phoneme confusion network-based phoneme posterior probability calculation method
CN101894552A (en) * 2010-07-16 2010-11-24 安徽科大讯飞信息科技股份有限公司 Speech spectrum segmentation based singing evaluating system
CN102231278A (en) * 2011-06-10 2011-11-02 安徽科大讯飞信息科技股份有限公司 Method and system for realizing automatic addition of punctuation marks in speech recognition

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN105427870A (en) * 2015-12-23 2016-03-23 北京奇虎科技有限公司 Voice recognition method and device aiming at pauses
CN105427870B (en) * 2015-12-23 2019-08-30 北京奇虎科技有限公司 A kind of audio recognition method and device for pause
CN106101094A (en) * 2016-06-08 2016-11-09 联想(北京)有限公司 Audio-frequency processing method, sending ending equipment, receiving device and audio frequency processing system
CN106710606B (en) * 2016-12-29 2019-11-08 百度在线网络技术(北京)有限公司 Method of speech processing and device based on artificial intelligence
CN106710606A (en) * 2016-12-29 2017-05-24 百度在线网络技术(北京)有限公司 Method and device for treating voice based on artificial intelligence
CN108389573A (en) * 2018-02-09 2018-08-10 北京易真学思教育科技有限公司 Language Identification and device, training method and device, medium, terminal
CN108389573B (en) * 2018-02-09 2022-03-08 北京世纪好未来教育科技有限公司 Language identification method and device, training method and device, medium and terminal
CN109274845A (en) * 2018-08-31 2019-01-25 平安科技(深圳)有限公司 Intelligent sound pays a return visit method, apparatus, computer equipment and storage medium automatically
CN109448704A (en) * 2018-11-20 2019-03-08 北京智能管家科技有限公司 Construction method, device, server and the storage medium of tone decoding figure
CN109377998A (en) * 2018-12-11 2019-02-22 科大讯飞股份有限公司 A kind of voice interactive method and device
CN111985208A (en) * 2020-08-18 2020-11-24 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN111985208B (en) * 2020-08-18 2024-03-26 沈阳东软智能医疗科技研究院有限公司 Method, device and equipment for realizing punctuation mark filling
CN116312485A (en) * 2023-05-23 2023-06-23 广州小鹏汽车科技有限公司 Voice recognition method and device and vehicle
CN116312485B (en) * 2023-05-23 2023-08-25 广州小鹏汽车科技有限公司 Voice recognition method and device and vehicle

Also Published As

Publication number Publication date
CN103680500B (en) 2018-10-16

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant