CN104916283A - Voice recognition method and device - Google Patents
Voice recognition method and device
- Publication number
- CN104916283A CN104916283A CN201510319421.5A CN201510319421A CN104916283A CN 104916283 A CN104916283 A CN 104916283A CN 201510319421 A CN201510319421 A CN 201510319421A CN 104916283 A CN104916283 A CN 104916283A
- Authority
- CN
- China
- Prior art keywords
- recognition
- buffer storage
- speech
- speech buffer
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a voice recognition method and device. The voice recognition method comprises the following steps: S1) receiving input voice information and dividing the voice information into a plurality of voice cache fragments; S2) sequentially carrying out online recognition on the plurality of voice cache fragments; S3) when the online recognition fails, obtaining a plurality of first recognition results corresponding to a plurality of voice cache fragments which have finished online recognition, carrying out off-line recognition on a plurality of voice cache fragments which fail to finish online recognition, and obtaining a plurality of second recognition results corresponding to the off-line recognition; and S4) combining the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result. According to the voice recognition method and device, stability and precision of voice recognition are improved, and furthermore, use experience of a user is improved.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition method and device.
Background technology
Speech recognition is a cross-disciplinary technology involving multiple technical fields. As science advances, its range of applications keeps widening; a voice input method, for example, can convert a user's speech into text, saving the time the user would spend typing.
At present, speech recognition falls into two categories: recognition based on a cloud engine (online recognition) and recognition based on a local engine (offline recognition). Online recognition offers high accuracy, good real-time performance, and low consumption of client-device resources, but it places high demands on the network environment: if the network is too slow, online recognition becomes sluggish or fails entirely, so its stability is poor. Offline recognition relies mainly on a local engine to recognize speech and is therefore independent of the network, which guarantees stability, but its accuracy is lower.
Some existing products can use both online and offline recognition, but they all rely on a retry strategy: if a network problem occurs during online recognition, the user is notified that online recognition has failed and must re-enter the voice input, after which the product decides, based on network conditions, whether to use online or offline recognition. This is inconvenient to operate and gives a poor user experience.
A recognition method or device with high stability and high accuracy is therefore urgently needed.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the related art to some extent. To this end, one object of the present invention is to propose a speech recognition method that improves the stability and accuracy of recognition and thereby improves the user experience.
A second object of the present invention is to propose a speech recognition device.
To achieve these objects, an embodiment of the first aspect of the present invention proposes a speech recognition method comprising: S1, receiving input voice information and splitting the voice information into multiple speech cache segments; S2, performing online recognition on the multiple speech cache segments in sequence; S3, when the online recognition fails, obtaining multiple first recognition results corresponding to the speech cache segments that have completed online recognition, performing offline recognition on the speech cache segments that have not completed online recognition, and obtaining multiple second recognition results corresponding to the offline recognition; and S4, merging the multiple first recognition results and the multiple second recognition results to generate a final recognition result.
In the speech recognition method of this embodiment of the present invention, the voice information is split into multiple speech cache segments, online recognition is performed on the segments in sequence, and when online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
An embodiment of the second aspect of the present invention proposes a speech recognition device comprising: a splitting module for receiving input voice information and splitting it into multiple speech cache segments; an online recognition module for performing online recognition on the multiple speech cache segments in sequence; an acquisition module for obtaining, when the online recognition fails, multiple first recognition results corresponding to the speech cache segments that have completed online recognition; an offline recognition module for performing, when the online recognition fails, offline recognition on the speech cache segments that have not completed online recognition, the acquisition module also being used to obtain multiple second recognition results corresponding to the offline recognition; and a merging module for merging the multiple first recognition results and the multiple second recognition results to generate a final recognition result.
In the speech recognition device of this embodiment of the present invention, the voice information is split into multiple speech cache segments, online recognition is performed on the segments in sequence, and when online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a speech recognition method according to a specific embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a speech recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements with identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they are intended to explain the present invention and must not be construed as limiting it.
The speech recognition method and device of embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a speech recognition method according to an embodiment of the present invention.
As shown in Fig. 1, the speech recognition method may comprise the following steps.
S1: receive input voice information, and split the voice information into multiple speech cache segments.
In an embodiment of the present invention, voice information entered by the user through an input device such as a microphone may be received, and the received voice information may then be split into multiple speech cache segments.
Specifically, multiple pairs of speech endpoints may be obtained using speech endpoint detection, and the speech data between each pair of endpoints may then be cached to generate the multiple speech cache segments. Each pair of endpoints comprises a speech start point and a speech end point corresponding to that start point.
For example, the speech entered by the user may be analyzed to obtain the endpoints s1, e1, s2, e2, s3, e3, ..., where s1 is the first speech start point and e1 the first speech end point; the speech data between s1 and e1 is cached to generate the first speech cache segment v1. Likewise, s2 is the second start point and e2 the second end point, and the data between them is cached to generate the second segment v2, and so on, until the multiple speech cache segments have all been generated.
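A minimal sketch of this segmentation step is shown below. The patent does not specify a particular endpoint-detection algorithm, so a simple per-frame energy threshold is assumed here; the function names and the threshold value are illustrative only.

```python
def detect_endpoints(frames, threshold=0.1):
    """Return (start, end) frame-index pairs -- one pair per detected utterance."""
    endpoints, start = [], None
    for i, energy in enumerate(frames):
        if energy > threshold and start is None:
            start = i                       # speech start point s_k
        elif energy <= threshold and start is not None:
            endpoints.append((start, i))    # speech end point e_k
            start = None
    if start is not None:                   # speech continues to the end of input
        endpoints.append((start, len(frames)))
    return endpoints

def split_into_segments(samples, endpoints):
    """Cache the speech data between each endpoint pair as one segment."""
    return [samples[s:e] for s, e in endpoints]
```

With frame energies `[0.0, 0.5, 0.6, 0.0, 0.7, 0.8, 0.0]`, two endpoint pairs are found and two cache segments (v1 and v2) are produced, matching the s1/e1, s2/e2 example above.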
S2: perform online recognition on the multiple speech cache segments in sequence.
After the voice information has been split into multiple speech cache segments, online recognition may be performed on the segments in sequence.
Specifically, the cloud engine performs feature extraction on a speech cache segment to generate an acoustic feature sequence, then decodes the acoustic feature sequence according to an acoustic model and a dictionary to obtain the acoustic model sequence matching the feature sequence, and finally obtains, according to a language model, the word sequence corresponding to that acoustic model sequence, which serves as the first recognition result for the segment. This is consistent with online recognition in the prior art and is not repeated here.
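The three-stage pipeline described above (feature extraction, then acoustic decoding against an acoustic model and dictionary, then word lookup via a language model) can be sketched as follows. The engine internals are not disclosed in the patent, so the `ToyEngine` class and its toy models are purely illustrative assumptions, not the actual cloud engine.

```python
class ToyEngine:
    """Illustrative stand-in for a recognition engine; not the patent's engine."""

    def __init__(self, acoustic_model, language_model):
        self.acoustic_model = acoustic_model    # toy list of acoustic units
        self.language_model = language_model    # toy map: unit sequence -> word sequence

    def extract_features(self, segment):
        # Stage 1: turn raw samples into an "acoustic feature sequence" (here: rounding)
        return [round(sample, 1) for sample in segment]

    def decode(self, features):
        # Stage 2: match each feature to the nearest acoustic-model unit
        return tuple(min(self.acoustic_model, key=lambda u: abs(u - f)) for f in features)

    def recognize(self, segment):
        # Stage 3: look up the word sequence for the acoustic model sequence
        units = self.decode(self.extract_features(segment))
        return self.language_model.get(units, "<unk>")
```

The same skeleton serves for the offline path in S3; per the patent's description, only the engine (cloud vs. local) differs, not the stages.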
S3: when the online recognition fails, obtain the multiple first recognition results corresponding to the speech cache segments that have completed online recognition, perform offline recognition on the speech cache segments that have not completed online recognition, and obtain the multiple second recognition results corresponding to the offline recognition.
During online recognition the network may behave abnormally, for example slowing down or disconnecting, causing online recognition to fail. At that point the method can switch directly to offline recognition instead of retrying online recognition. For example, if an error occurs while recognizing the third speech cache segment v3, the recognition results a1 and a2 for the first two completed segments can be obtained, and the segments v1 and v2 deleted. Offline recognition then starts from the uncompleted segment v3, guaranteeing a seamless hand-over from online to offline recognition and preserving the completeness and accuracy of the recognition.
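The hand-over logic above — recognize segment by segment online and, on the first failure, pass the remaining segments to the offline engine without retrying — can be sketched as follows. `recognize_online` and `recognize_offline` stand in for the cloud and local engines, and treating a `ConnectionError` as the failure signal is an assumption for illustration.

```python
def recognize(segments, recognize_online, recognize_offline):
    """Return (first_results, second_results); on the first online failure,
    hand the remaining segments to offline recognition instead of retrying."""
    first_results = []                          # results of completed online recognition
    for i, segment in enumerate(segments):
        try:
            first_results.append(recognize_online(segment))
        except ConnectionError:                 # network slow-down or outage
            # offline recognition picks up seamlessly from the failed segment
            second_results = [recognize_offline(s) for s in segments[i:]]
            return first_results, second_results
    return first_results, []                    # no failure: everything online
```

If the network drops while v3 is being recognized, v1 and v2 keep their online results a1 and a2, and v3 onward are recognized offline; the user never has to re-speak.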
The offline recognition technique is the same as the online one, except that offline recognition uses the local engine. Specifically, the local engine performs feature extraction on a speech cache segment to generate an acoustic feature sequence, decodes that sequence according to an acoustic model and a dictionary to obtain the matching acoustic model sequence, and finally obtains, according to a language model, the corresponding word sequence, which serves as the second recognition result for the segment. This is consistent with offline recognition in the prior art and is not repeated here.
After offline recognition completes, the corresponding multiple second recognition results can be obtained.
In this embodiment, the results of online recognition are first recognition results and the results of offline recognition are second recognition results; both may be text. For example, a1 and a2 are results of online recognition and hence first recognition results, whereas a3 is obtained by offline recognition and is therefore a second recognition result.
S4: merge the multiple first recognition results and the multiple second recognition results to generate the final recognition result.
After the multiple first and second recognition results have been obtained, a merge operation can be performed to generate the final recognition result, for example A = a1 + a2 + a3 + ...
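The merge in S4 is an ordered concatenation (A = a1 + a2 + a3 + ...); since both kinds of result are text, a simple join, sketched here, is sufficient:

```python
def merge_results(first_results, second_results):
    """Concatenate online results, then offline results, in order,
    to form the final recognition result A = a1 + a2 + a3 + ..."""
    return "".join(first_results + second_results)
```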
In the speech recognition method of this embodiment of the present invention, the voice information is split into multiple speech cache segments, online recognition is performed on the segments in sequence, and when online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
Fig. 2 is a flowchart of a speech recognition method according to a specific embodiment of the present invention.
As shown in Fig. 2, the speech recognition method may comprise the following steps.
S201: receive the input voice information and perform speech endpoint detection on it.
For example, if the user's voice input is "hello, this is Baidu, may I ask who you are looking for?", speech endpoint detection can be performed on this voice information to obtain the endpoints s1, e1, s2, e2, s3, e3, where s1 and e1 are the start and end points of "hello", s2 and e2 those of "this is Baidu", and s3 and e3 those of "may I ask who you are looking for".
S202: generate multiple speech caches according to the detected endpoints.
According to the above endpoints, three speech caches can be generated: "hello", "this is Baidu", and "may I ask who you are looking for".
S203: perform online recognition on the multiple speech caches.
Online recognition is performed in sequence on "hello", "this is Baidu", and "may I ask who you are looking for".
S204: when a network error occurs, obtain the first recognition results corresponding to the speech caches that have completed recognition, and perform offline recognition on the speech caches that have not completed recognition, to obtain the corresponding second recognition results.
Suppose that when the network error occurs, the speech caches "hello" and "this is Baidu" have already been recognized; the corresponding first recognition results "hello" and "this is Baidu", which are text obtained by online recognition, can then be retrieved. Offline recognition then starts from the speech cache "may I ask who you are looking for", finally yielding the corresponding second recognition result "may I ask who you are looking for", which is text obtained by offline recognition.
S205: splice the first recognition results and the second recognition result to generate the final recognition result.
The text "hello", "this is Baidu", and "may I ask who you are looking for" is spliced together to generate the final recognition result, namely the text "hello, this is Baidu, may I ask who you are looking for?"
In the speech recognition method of this embodiment of the present invention, speech endpoint detection is performed on the input voice information, multiple speech caches are generated according to the detected endpoints, and online recognition is performed on them; when a network error occurs, the first recognition results already obtained are retrieved, offline recognition is performed on the speech caches not yet recognized to obtain the corresponding second recognition results, and finally the first and second recognition results are spliced to obtain the final recognition result. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
To achieve the above objects, the present invention also proposes a speech recognition device.
Fig. 3 is a schematic structural diagram of a speech recognition device according to an embodiment of the present invention.
As shown in Fig. 3, the speech recognition device may comprise a splitting module 110, an online recognition module 120, an acquisition module 130, an offline recognition module 140, and a merging module 150.
The splitting module 110 receives the input voice information and splits it into multiple speech cache segments.
In an embodiment of the present invention, the splitting module 110 may receive voice information entered by the user through an input device such as a microphone, and then split the received voice information into multiple speech cache segments.
Specifically, the splitting module 110 may obtain multiple pairs of speech endpoints using speech endpoint detection and then cache the speech data between each pair of endpoints to generate the multiple speech cache segments. Each pair of endpoints comprises a speech start point and a speech end point corresponding to that start point.
For example, the speech entered by the user may be analyzed to obtain the endpoints s1, e1, s2, e2, s3, e3, ..., where s1 is the first speech start point and e1 the first speech end point; the speech data between s1 and e1 is cached to generate the first speech cache segment v1. Likewise, s2 is the second start point and e2 the second end point, and the data between them is cached to generate the second segment v2, and so on, until the multiple speech cache segments have all been generated.
The online recognition module 120 performs online recognition on the multiple speech cache segments in sequence.
After the splitting module 110 has split the voice information into multiple speech cache segments, the online recognition module 120 can perform online recognition on the segments in sequence.
Specifically, the online recognition module 120 performs, by means of the cloud engine, feature extraction on a speech cache segment to generate an acoustic feature sequence, then decodes the acoustic feature sequence according to an acoustic model and a dictionary to obtain the matching acoustic model sequence, and finally obtains, according to a language model, the word sequence corresponding to that acoustic model sequence, which serves as the first recognition result for the segment. This is consistent with online recognition in the prior art and is not repeated here.
The acquisition module 130 obtains, when the online recognition fails, the multiple first recognition results corresponding to the speech cache segments that have completed online recognition.
The offline recognition module 140 performs, when the online recognition fails, offline recognition on the speech cache segments that have not completed online recognition; the acquisition module 130 also obtains the multiple second recognition results corresponding to the offline recognition.
During online recognition the network may behave abnormally, for example slowing down or disconnecting, causing online recognition to fail. At that point the device can switch directly to offline recognition instead of retrying online recognition. For example, if an error occurs while recognizing the third speech cache segment v3, the recognition results a1 and a2 for the first two completed segments can be obtained, and the segments v1 and v2 deleted. Offline recognition then starts from the uncompleted segment v3.
Specifically, the offline recognition module 140 performs, by means of the local engine, feature extraction on a speech cache segment to generate an acoustic feature sequence, then decodes the acoustic feature sequence according to an acoustic model and a dictionary to obtain the matching acoustic model sequence, and finally obtains, according to a language model, the word sequence corresponding to that acoustic model sequence, which serves as the second recognition result for the segment. This is consistent with offline recognition in the prior art and is not repeated here.
After offline recognition completes, the acquisition module 130 can obtain the corresponding multiple second recognition results.
In this embodiment, the results of online recognition are first recognition results and the results of offline recognition are second recognition results; both may be text. For example, a1 and a2 are results of online recognition and hence first recognition results, whereas a3 is obtained by offline recognition and is therefore a second recognition result.
The merging module 150 merges the multiple first recognition results and the multiple second recognition results to generate the final recognition result.
After the acquisition module 130 has obtained the multiple first and second recognition results, the merging module 150 can perform a merge operation to generate the final recognition result, for example A = a1 + a2 + a3 + ...
In the speech recognition device of this embodiment of the present invention, the voice information is split into multiple speech cache segments, online recognition is performed on the segments in sequence, and when online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential", are based on the orientations or positional relationships shown in the drawings, are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention.
In addition, the terms "first" and "second" are used only for descriptive purposes and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "multiple" means at least two, for example two or three, unless otherwise specifically limited.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled", and "fixed" should be understood broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, indirect through an intermediary, internal to two elements, or an interaction between two elements, unless otherwise expressly limited. For those of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the circumstances.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediary. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.
In the description of this specification, reference to the terms "an embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to these terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not conflict.
Although embodiments of the present invention have been shown and described above, it should be understood that these embodiments are exemplary and cannot be construed as limiting the present invention; those of ordinary skill in the art may change, modify, replace, and vary the above embodiments within the scope of the present invention.
Claims (10)
1. A speech recognition method, characterized by comprising:
S1: receiving input voice information, and splitting the voice information into multiple speech cache segments;
S2: performing online recognition on the multiple speech cache segments in sequence;
S3: when the online recognition fails, obtaining multiple first recognition results corresponding to the speech cache segments that have completed online recognition, performing offline recognition on the speech cache segments that have not completed online recognition, and obtaining multiple second recognition results corresponding to the offline recognition; and
S4: merging the multiple first recognition results and the multiple second recognition results to generate a final recognition result.
2. The method as claimed in claim 1, characterized in that splitting the voice information into multiple speech cache segments comprises:
obtaining multiple pairs of speech endpoints based on speech endpoint detection, wherein each pair of endpoints comprises a speech start point and a speech end point corresponding to the start point; and
caching the speech data between each pair of endpoints to generate the multiple speech cache segments.
3. The method as claimed in claim 1, characterized in that performing online recognition on the multiple speech cache segments in sequence comprises:
performing, by a cloud engine, feature extraction on a speech cache segment to generate an acoustic feature sequence;
decoding the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and
obtaining, according to a language model, the word sequence corresponding to the acoustic model sequence as the first recognition result corresponding to the speech cache segment.
4. The method as claimed in claim 1, characterized in that performing offline recognition on the speech cache segments that have not completed online recognition comprises:
performing, by a local engine, feature extraction on a speech cache segment to generate an acoustic feature sequence;
decoding the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and
obtaining, according to a language model, the word sequence corresponding to the acoustic model sequence as the second recognition result corresponding to the speech cache segment.
5. The method as claimed in claim 1, characterized in that the first recognition results and the second recognition results are text.
6. A speech recognition device, characterized by comprising:
a splitting module for receiving input voice information and splitting the voice information into multiple speech cache segments;
an online recognition module for performing online recognition on the multiple speech cache segments in sequence;
an acquisition module for obtaining, when the online recognition fails, multiple first recognition results corresponding to the speech cache segments that have completed online recognition;
an offline recognition module for performing, when the online recognition fails, offline recognition on the speech cache segments that have not completed online recognition, the acquisition module also being used to obtain multiple second recognition results corresponding to the offline recognition; and
a merging module for merging the multiple first recognition results and the multiple second recognition results to generate a final recognition result.
7. The device as claimed in claim 6, characterized in that the splitting module is specifically configured to:
obtain multiple pairs of speech endpoints based on speech endpoint detection, and cache the speech data between each pair of endpoints to generate the multiple speech cache segments, wherein each pair of endpoints comprises a speech start point and a speech end point corresponding to the start point.
8. The device of claim 6, wherein the online recognition module is specifically configured to:
perform, through a cloud engine, feature extraction on the speech buffer segment to generate an acoustic feature sequence; decode the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and obtain, according to a language model, a word sequence corresponding to the acoustic model sequence, as the first recognition result corresponding to the speech buffer segment.
9. The device of claim 6, wherein the offline recognition module is specifically configured to:
perform, through a local engine, feature extraction on the speech buffer segment to generate an acoustic feature sequence; decode the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and obtain, according to a language model, a word sequence corresponding to the acoustic model sequence, as the second recognition result corresponding to the speech buffer segment.
10. The device of claim 6, wherein the first recognition result and the second recognition result are text information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510319421.5A CN104916283A (en) | 2015-06-11 | 2015-06-11 | Voice recognition method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104916283A true CN104916283A (en) | 2015-09-16 |
Family
ID=54085312
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510319421.5A Pending CN104916283A (en) | 2015-06-11 | 2015-06-11 | Voice recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104916283A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101409072A (en) * | 2007-10-10 | 2009-04-15 | 松下电器产业株式会社 | Embedded equipment, bimodule voice synthesis system and method |
CN102682770A (en) * | 2012-02-23 | 2012-09-19 | 西安雷迪维护系统设备有限公司 | Cloud-computing-based voice recognition system |
CN102708865A (en) * | 2012-04-25 | 2012-10-03 | 北京车音网科技有限公司 | Method, device and system for voice recognition |
CN103079258A (en) * | 2013-01-09 | 2013-05-01 | 广东欧珀移动通信有限公司 | Method for improving speech recognition accuracy and mobile intelligent terminal |
WO2014186143A1 (en) * | 2013-05-13 | 2014-11-20 | Facebook, Inc. | Hybrid, offline/online speech translation system |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105632487A (en) * | 2015-12-31 | 2016-06-01 | 北京奇艺世纪科技有限公司 | Voice recognition method and device |
CN105632487B (en) * | 2015-12-31 | 2020-04-21 | 北京奇艺世纪科技有限公司 | Voice recognition method and device |
CN105719642A (en) * | 2016-02-29 | 2016-06-29 | 黄博 | Continuous and long voice recognition method and system and hardware equipment |
WO2017219495A1 (en) * | 2016-06-21 | 2017-12-28 | 宇龙计算机通信科技(深圳)有限公司 | Speech recognition method and system |
CN110060687A (en) * | 2016-09-05 | 2019-07-26 | 北京金山软件有限公司 | A kind of conversion of voice messaging, information generating method and device |
CN107170450A (en) * | 2017-06-14 | 2017-09-15 | 上海木爷机器人技术有限公司 | Audio recognition method and device |
CN107767873A (en) * | 2017-10-20 | 2018-03-06 | 广东电网有限责任公司惠州供电局 | A kind of fast and accurately offline speech recognition equipment and method |
CN108172212A (en) * | 2017-12-25 | 2018-06-15 | 横琴国际知识产权交易中心有限公司 | A kind of voice Language Identification and system based on confidence level |
CN108172212B (en) * | 2017-12-25 | 2020-09-11 | 横琴国际知识产权交易中心有限公司 | Confidence-based speech language identification method and system |
WO2019134474A1 (en) * | 2018-01-08 | 2019-07-11 | 珠海格力电器股份有限公司 | Voice control method and device |
CN109065037A (en) * | 2018-07-10 | 2018-12-21 | 福州瑞芯微电子股份有限公司 | A kind of audio method of flow control based on interactive voice |
CN109065037B (en) * | 2018-07-10 | 2023-04-25 | 瑞芯微电子股份有限公司 | Audio stream control method based on voice interaction |
CN109064815A (en) * | 2018-09-04 | 2018-12-21 | 北京粉笔未来科技有限公司 | Online testing method and apparatus calculate equipment and storage medium |
CN109410927A (en) * | 2018-11-29 | 2019-03-01 | 北京蓦然认知科技有限公司 | Offline order word parses the audio recognition method combined, device and system with cloud |
CN109410927B (en) * | 2018-11-29 | 2020-04-03 | 北京蓦然认知科技有限公司 | Voice recognition method, device and system combining offline command word and cloud analysis |
CN109741753A (en) * | 2019-01-11 | 2019-05-10 | 百度在线网络技术(北京)有限公司 | A kind of voice interactive method, device, terminal and server |
CN109840052A (en) * | 2019-01-31 | 2019-06-04 | 成都超有爱科技有限公司 | A kind of audio-frequency processing method, device, electronic equipment and storage medium |
CN109840052B (en) * | 2019-01-31 | 2022-03-18 | 成都超有爱科技有限公司 | Audio processing method and device, electronic equipment and storage medium |
CN111210822A (en) * | 2020-02-12 | 2020-05-29 | 支付宝(杭州)信息技术有限公司 | Speech recognition method and device |
CN111445911A (en) * | 2020-03-28 | 2020-07-24 | 大连鼎创科技开发有限公司 | Home offline online voice recognition switching logic method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104916283A (en) | Voice recognition method and device | |
CN107195295B (en) | Voice recognition method and device based on Chinese-English mixed dictionary | |
CN107731228B (en) | Text conversion method and device for English voice information | |
US9390711B2 (en) | Information recognition method and apparatus | |
CN111309889A (en) | Method and device for text processing | |
CN112487173B (en) | Man-machine conversation method, device and storage medium | |
CN112100349A (en) | Multi-turn dialogue method and device, electronic equipment and storage medium | |
US9601107B2 (en) | Speech recognition system, recognition dictionary registration system, and acoustic model identifier series generation apparatus | |
CN107168546B (en) | Input prompting method and device | |
CN105513590A (en) | Voice recognition method and device | |
CN108710704B (en) | Method and device for determining conversation state, electronic equipment and storage medium | |
CN109243468B (en) | Voice recognition method and device, electronic equipment and storage medium | |
CN104992704A (en) | Speech synthesizing method and device | |
CN102322866B (en) | Navigation method and system based on natural speech recognition | |
CN112509566B (en) | Speech recognition method, device, equipment, storage medium and program product | |
CN103177721A (en) | Voice recognition method and system | |
CN103594085A (en) | Method and system providing speech recognition result | |
CN103514882A (en) | Voice identification method and system | |
CN109410923B (en) | Speech recognition method, apparatus, system and storage medium | |
CN113282736B (en) | Dialogue understanding and model training method, device, equipment and storage medium | |
CN112861548A (en) | Natural language generation and model training method, device, equipment and storage medium | |
CN104714954A (en) | Information searching method and system based on context understanding | |
CN113793599B (en) | Training method of voice recognition model, voice recognition method and device | |
CN105070289A (en) | English name recognition method and device | |
CN105469801A (en) | Input speech restoring method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20150916 |
|
RJ01 | Rejection of invention patent application after publication |