CN104916283A - Voice recognition method and device - Google Patents

Voice recognition method and device

Info

Publication number
CN104916283A
Authority
CN
China
Prior art keywords
recognition
buffer storage
speech
speech buffer
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510319421.5A
Other languages
Chinese (zh)
Inventor
段弘
唐立亮
谢延
彭守业
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510319421.5A priority Critical patent/CN104916283A/en
Publication of CN104916283A publication Critical patent/CN104916283A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a voice recognition method and device. The voice recognition method comprises the following steps: S1) receiving input voice information and dividing the voice information into a plurality of speech buffer segments; S2) sequentially carrying out online recognition on the plurality of speech buffer segments; S3) when the online recognition fails, obtaining a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed, carrying out offline recognition on the speech buffer segments for which online recognition has not been completed, and obtaining a plurality of second recognition results corresponding to the offline recognition; and S4) merging the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result. The voice recognition method and device improve the stability and accuracy of voice recognition and thereby improve the user experience.

Description

Voice recognition method and device
Technical field
The present invention relates to the technical field of voice recognition, and in particular to a voice recognition method and device.
Background art
Speech recognition technology is an interdisciplinary field that involves many technical areas. As science continues to advance, the range of applications of speech recognition keeps widening; for example, a voice input method can convert a user's speech into text and thus save the user the time needed to type.
At present, speech recognition can be divided into two kinds: recognition based on a cloud engine (online speech recognition) and recognition based on a local engine (offline speech recognition). Online speech recognition offers high accuracy and good real-time performance and does not consume client device resources, but it places high demands on the network environment; if the network is not fast enough, the online recognition process becomes slow or even fails, so its stability is poor. Offline speech recognition relies mainly on the local engine to recognize speech, so it does not depend on the network and its stability is guaranteed, but its recognition accuracy is lower.
At present there are also products that can use both online and offline recognition, but they are all based on a retry strategy: if a network problem occurs during online recognition, the failure of the online recognition process is reported, the user has to re-enter the voice information, recognition is performed again, and online or offline recognition is then chosen according to the network condition. This is inconvenient to operate and gives a poor user experience.
Therefore, a recognition method or device with high stability and high recognition accuracy is urgently needed.
Summary of the invention
The present invention aims to solve at least one of the technical problems in the related art to at least some extent. To this end, one object of the present invention is to propose a voice recognition method that can improve the stability and accuracy of recognition and thereby improve the user experience.
A second object of the present invention is to propose a voice recognition device.
To achieve these objects, an embodiment of the first aspect of the present invention proposes a voice recognition method, comprising: S1, receiving input voice information and cutting the voice information into a plurality of speech buffer segments; S2, performing online recognition on the plurality of speech buffer segments in sequence; S3, when the online recognition fails, obtaining a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed, performing offline recognition on the speech buffer segments for which online recognition has not been completed, and obtaining a plurality of second recognition results corresponding to the offline recognition; and S4, merging the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result.
In the voice recognition method of the embodiment of the present invention, the voice information is cut into a plurality of speech buffer segments, online recognition is performed on the segments in sequence, and when the online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
An embodiment of the second aspect of the present invention proposes a voice recognition device, comprising: a cutting module, configured to receive input voice information and cut the voice information into a plurality of speech buffer segments; an online recognition module, configured to perform online recognition on the plurality of speech buffer segments in sequence; an acquisition module, configured to obtain, when the online recognition fails, a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed; an offline recognition module, configured to perform offline recognition on the speech buffer segments for which online recognition has not been completed when the online recognition fails, the acquisition module being further configured to obtain a plurality of second recognition results corresponding to the offline recognition; and a merging module, configured to merge the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result.
In the voice recognition device of the embodiment of the present invention, the voice information is cut into a plurality of speech buffer segments, online recognition is performed on the segments in sequence, and when the online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a voice recognition method according to a specific embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below, and examples of these embodiments are shown in the drawings, in which the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and are intended to explain the present invention; they should not be construed as limiting the present invention.
The voice recognition method and device of the embodiments of the present invention are described below with reference to the drawings.
Fig. 1 is a flowchart of a voice recognition method according to an embodiment of the present invention.
As shown in Fig. 1, the voice recognition method may comprise:
S1: receiving input voice information, and cutting the voice information into a plurality of speech buffer segments.
In an embodiment of the present invention, the voice information input by the user through an input device such as a microphone may be received, and the received voice information is then cut into a plurality of speech buffer segments.
Specifically, a plurality of pairs of voice endpoints may be obtained based on voice endpoint detection, and the voice data between each pair of voice endpoints is then buffered to generate the plurality of speech buffer segments. Each pair of voice endpoints comprises a voice start point and the corresponding voice end point.
For example, the voice input by the user may be analyzed to obtain the voice endpoints s1, e1, s2, e2, s3, e3, ..., where s1 is the first voice start point and e1 is the first voice end point, so the voice data between s1 and e1 is buffered to generate the first speech buffer segment v1; s2 is the second voice start point and e2 is the second voice end point, so the voice data between s2 and e2 is buffered to generate the second speech buffer segment v2, and so on, finally generating the plurality of speech buffer segments.
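To make the segmentation step concrete, the following is a minimal sketch of endpoint detection and buffering; the energy-threshold detector, its parameters, and the function names are illustrative assumptions rather than the patent's actual implementation.

    from typing import List, Tuple

    def detect_endpoints(samples: List[float], threshold: float = 0.01,
                         min_silence: int = 1600) -> List[Tuple[int, int]]:
        """Toy energy-based voice endpoint detector: returns (start, end) index pairs."""
        endpoints, start, silence = [], None, 0
        for i, s in enumerate(samples):
            if abs(s) >= threshold:
                if start is None:
                    start = i                                # voice start point s_k
                silence = 0
            elif start is not None:
                silence += 1
                if silence >= min_silence:                   # enough trailing silence closes the segment
                    endpoints.append((start, i - silence))   # voice end point e_k
                    start, silence = None, 0
        if start is not None:
            endpoints.append((start, len(samples) - 1))
        return endpoints

    def cut_into_segments(samples: List[float]) -> List[List[float]]:
        """Buffer the voice data between each endpoint pair to form segments v1, v2, ..."""
        return [samples[s:e + 1] for s, e in detect_endpoints(samples)]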
S2: performing online recognition on the plurality of speech buffer segments in sequence.
After the voice information has been cut into a plurality of speech buffer segments, online recognition may be performed on the segments in sequence.
Specifically, the cloud engine performs feature extraction on a speech buffer segment to generate an acoustic feature sequence, the acoustic feature sequence is then decoded according to an acoustic model and a dictionary to obtain the acoustic model sequence matching the acoustic feature sequence, and finally the word sequence corresponding to the acoustic model sequence is obtained according to a language model and taken as the first recognition result corresponding to the speech buffer segment. This is consistent with existing online recognition techniques and is therefore not described further here.
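As a rough illustration of this pipeline, the helper below strings the three stages together; the engine object and its method names (extract_features, decode, to_word_sequence) are placeholders assumed for illustration, since the patent refers only to standard existing techniques.

    def recognize_segment_online(segment, cloud_engine):
        """Online recognition of one speech buffer segment via a hypothetical cloud-engine API."""
        features = cloud_engine.extract_features(segment)    # acoustic feature sequence
        phones = cloud_engine.decode(features)               # acoustic model sequence (acoustic model + dictionary)
        return cloud_engine.to_word_sequence(phones)         # word sequence from the language model = first result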
S3: when the online recognition fails, obtaining a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed, performing offline recognition on the speech buffer segments for which online recognition has not been completed, and obtaining a plurality of second recognition results corresponding to the offline recognition.
During online recognition the network may behave abnormally, for example becoming slow or disconnecting, causing the online recognition to fail. In that case the process can switch directly to offline recognition instead of retrying online recognition. For example, if an error occurs while the third speech buffer segment v3 is being recognized online, the recognition results a1 and a2 corresponding to the first two segments, which have already been recognized, can be obtained, and the first and second speech buffer segments v1 and v2 can be deleted at the same time. Offline recognition then starts from the unfinished segment v3, ensuring a seamless hand-over from online to offline recognition and guaranteeing the completeness and accuracy of the speech recognition.
The offline recognition technique is the same as the online recognition technique; the difference is that offline recognition uses the local engine. Specifically, the local engine performs feature extraction on a speech buffer segment to generate an acoustic feature sequence, the acoustic feature sequence is then decoded according to an acoustic model and a dictionary to obtain the acoustic model sequence matching the acoustic feature sequence, and finally the word sequence corresponding to the acoustic model sequence is obtained according to a language model and taken as the second recognition result corresponding to the speech buffer segment. This is consistent with existing offline recognition techniques and is therefore not described further here.
After offline recognition is complete, the corresponding plurality of second recognition results can be obtained.
In this embodiment, the recognition results corresponding to online recognition are the first recognition results, the recognition results corresponding to offline recognition are the second recognition results, and both may be text information. For example, a1 and a2 are results of online recognition and are therefore first recognition results, while a3, obtained by offline recognition, is a second recognition result.
S4: merging the plurality of first recognition results and the plurality of second recognition results to generate the final recognition result.
After the plurality of first recognition results and the plurality of second recognition results have been obtained, a merge operation can be performed to generate the final recognition result, for example A = a1 + a2 + a3 + ...
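Putting steps S2 to S4 together, a minimal control-flow sketch could look as follows; it reuses the hypothetical recognize_segment_online helper above, assumes an analogous offline helper, and treats a ConnectionError as the online recognition failure.

    def recognize_segment_offline(segment, local_engine):
        """Offline counterpart of the online helper: the same steps, through the local engine."""
        features = local_engine.extract_features(segment)
        return local_engine.to_word_sequence(local_engine.decode(features))

    def recognize(segments, cloud_engine, local_engine):
        """S2-S4: sequential online recognition, direct switch to offline on failure, then merging."""
        first_results, second_results = [], []
        failed_at = None
        for i, segment in enumerate(segments):
            try:
                first_results.append(recognize_segment_online(segment, cloud_engine))  # a1, a2, ...
            except ConnectionError:            # network slows down or drops: online recognition fails
                failed_at = i
                break
        if failed_at is not None:
            # Offline recognition starts from the unfinished segment; online recognition is not retried.
            second_results = [recognize_segment_offline(s, local_engine)
                              for s in segments[failed_at:]]
        return "".join(first_results + second_results)   # A = a1 + a2 + a3 + ...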
In the voice recognition method of the embodiment of the present invention, the voice information is cut into a plurality of speech buffer segments, online recognition is performed on the segments in sequence, and when the online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
Fig. 2 is a flowchart of a voice recognition method according to a specific embodiment of the present invention.
As shown in Fig. 2, the voice recognition method may comprise:
S201: receiving input voice information and performing voice endpoint detection on the voice information.
For example, the voice information input by the user is "Hello, this is Baidu, may I ask who you are looking for?". Voice endpoint detection can be performed on this voice information, yielding the voice endpoints s1, e1, s2, e2, s3, e3, where s1 is the voice start point of "Hello", e1 is the voice end point of "Hello", s2 is the voice start point of "this is Baidu", e2 is the voice end point of "this is Baidu", s3 is the voice start point of "may I ask who you are looking for", and e3 is the voice end point of "may I ask who you are looking for".
S202: generating a plurality of speech buffers according to the detected voice endpoints.
According to the above voice endpoints, three speech buffers can be generated: "Hello", "this is Baidu", and "may I ask who you are looking for".
S203: performing online recognition on the plurality of speech buffers.
Online recognition is performed in sequence on "Hello", "this is Baidu", and "may I ask who you are looking for".
S204: when the network fails, obtaining the first recognition results corresponding to the speech buffers that have been recognized, and performing offline recognition on the speech buffers that have not been recognized to obtain the corresponding second recognition results.
Suppose that when the network fails, the speech buffers "Hello" and "this is Baidu" have already been recognized; the corresponding first recognition results "Hello" and "this is Baidu", which are obtained by online recognition and are text information, can then be obtained. Offline recognition then starts from the speech buffer "may I ask who you are looking for", finally obtaining the corresponding second recognition result "may I ask who you are looking for", which is obtained by offline recognition and is also text information.
S205: splicing the first recognition results and the second recognition results to generate the final recognition result.
The text "Hello", "this is Baidu", and "may I ask who you are looking for" is spliced, generating the final recognition result, namely the text "Hello, this is Baidu, may I ask who you are looking for?".
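For the Fig. 2 example, a hypothetical end-to-end run of the recognize() sketch above can be simulated with trivial stand-in engines; the strings below stand in for the buffered audio of the three segments.

    class MockEngine:
        """Stand-in engine that simply passes the segment text through (illustration only)."""
        def extract_features(self, segment): return segment
        def decode(self, features): return features
        def to_word_sequence(self, phones): return phones

    class FlakyCloudEngine(MockEngine):
        """Simulates the network dropping after two segments have been recognized online."""
        def __init__(self): self.calls = 0
        def extract_features(self, segment):
            self.calls += 1
            if self.calls > 2:
                raise ConnectionError("network dropped")
            return segment

    segments = ["Hello", "this is Baidu", "may I ask who you are looking for"]  # stand-ins for v1-v3
    print(recognize(segments, FlakyCloudEngine(), MockEngine()))
    # first results (online): "Hello", "this is Baidu"; second result (offline): the third segment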
In the voice recognition method of this embodiment of the present invention, voice endpoint detection is performed on the input voice information, a plurality of speech buffers are generated according to the detected voice endpoints, and online recognition is performed on the speech buffers; when the network fails, the first recognition results that have already been obtained are retrieved, offline recognition is then performed on the speech buffers that have not yet been recognized to obtain the corresponding second recognition results, and finally the first recognition results and the second recognition results are spliced to obtain the final recognition result. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
To achieve the above objects, the present invention further proposes a voice recognition device.
Fig. 3 is a schematic structural diagram of a voice recognition device according to an embodiment of the present invention.
As shown in Fig. 3, the voice recognition device may comprise: a cutting module 110, an online recognition module 120, an acquisition module 130, an offline recognition module 140, and a merging module 150.
The cutting module 110 is configured to receive input voice information and cut the voice information into a plurality of speech buffer segments.
In an embodiment of the present invention, the cutting module 110 may receive the voice information input by the user through an input device such as a microphone, and then cut the received voice information into a plurality of speech buffer segments.
Specifically, the cutting module 110 may obtain a plurality of pairs of voice endpoints based on voice endpoint detection, and then buffer the voice data between each pair of voice endpoints to generate the plurality of speech buffer segments. Each pair of voice endpoints comprises a voice start point and the corresponding voice end point.
For example, the voice input by the user may be analyzed to obtain the voice endpoints s1, e1, s2, e2, s3, e3, ..., where s1 is the first voice start point and e1 is the first voice end point, so the voice data between s1 and e1 is buffered to generate the first speech buffer segment v1; s2 is the second voice start point and e2 is the second voice end point, so the voice data between s2 and e2 is buffered to generate the second speech buffer segment v2, and so on, finally generating the plurality of speech buffer segments.
The online recognition module 120 is configured to perform online recognition on the plurality of speech buffer segments in sequence.
After the cutting module 110 has cut the voice information into a plurality of speech buffer segments, the online recognition module 120 may perform online recognition on the segments in sequence.
Specifically, the online recognition module 120 performs feature extraction on a speech buffer segment through the cloud engine to generate an acoustic feature sequence, decodes the acoustic feature sequence according to an acoustic model and a dictionary to obtain the acoustic model sequence matching the acoustic feature sequence, and finally obtains, according to a language model, the word sequence corresponding to the acoustic model sequence as the first recognition result corresponding to the speech buffer segment. This is consistent with existing online recognition techniques and is therefore not described further here.
The acquisition module 130 is configured to obtain, when the online recognition fails, a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed.
The offline recognition module 140 is configured to perform offline recognition on the speech buffer segments for which online recognition has not been completed when the online recognition fails, and the acquisition module 130 is further configured to obtain a plurality of second recognition results corresponding to the offline recognition.
During online recognition the network may behave abnormally, for example becoming slow or disconnecting, causing the online recognition to fail. In that case the process can switch directly to offline recognition instead of retrying online recognition. For example, if an error occurs while the third speech buffer segment v3 is being recognized online, the recognition results a1 and a2 corresponding to the first two segments, which have already been recognized, can be obtained, and the first and second speech buffer segments v1 and v2 can be deleted at the same time. Offline recognition then starts from the unfinished segment v3.
Specifically, the offline recognition module 140 performs feature extraction on a speech buffer segment through the local engine to generate an acoustic feature sequence, decodes the acoustic feature sequence according to an acoustic model and a dictionary to obtain the acoustic model sequence matching the acoustic feature sequence, and finally obtains, according to a language model, the word sequence corresponding to the acoustic model sequence as the second recognition result corresponding to the speech buffer segment. This is consistent with existing offline recognition techniques and is therefore not described further here.
After offline recognition is complete, the acquisition module 130 can obtain the corresponding plurality of second recognition results.
In this embodiment, the recognition results corresponding to online recognition are the first recognition results, the recognition results corresponding to offline recognition are the second recognition results, and both may be text information. For example, a1 and a2 are results of online recognition and are therefore first recognition results, while a3, obtained by offline recognition, is a second recognition result.
The merging module 150 is configured to merge the plurality of first recognition results and the plurality of second recognition results to generate the final recognition result.
After the acquisition module 130 has obtained the plurality of first recognition results and the plurality of second recognition results, the merging module 150 can perform a merge operation to generate the final recognition result, for example A = a1 + a2 + a3 + ...
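To mirror the module structure of Fig. 3 in code, a minimal sketch reusing the hypothetical helpers above might look like the following; the class and method names are illustrative assumptions, not the patent's implementation.

    class VoiceRecognitionDevice:
        """Sketch of the five modules of Fig. 3: cutting, online recognition, acquisition, offline recognition, merging."""

        def __init__(self, cloud_engine, local_engine):
            self.cloud_engine = cloud_engine     # used by the online recognition module 120
            self.local_engine = local_engine     # used by the offline recognition module 140

        def cut(self, samples):
            # Cutting module 110: endpoint detection and buffering (see cut_into_segments above).
            return cut_into_segments(samples)

        def recognize(self, samples):
            segments = self.cut(samples)
            first, second, failed_at = [], [], None
            for i, segment in enumerate(segments):             # online recognition module 120
                try:
                    first.append(recognize_segment_online(segment, self.cloud_engine))
                except ConnectionError:
                    failed_at = i                               # acquisition module 130 keeps a1..ai
                    break
            if failed_at is not None:                           # offline recognition module 140
                second = [recognize_segment_offline(s, self.local_engine)
                          for s in segments[failed_at:]]
            return "".join(first + second)                      # merging module 150: A = a1 + a2 + ...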
In the voice recognition device of the embodiment of the present invention, the voice information is cut into a plurality of speech buffer segments, online recognition is performed on the segments in sequence, and when the online recognition fails, offline recognition is performed directly on the segments that have not yet been recognized; the first recognition results from online recognition and the second recognition results from offline recognition are then merged. This improves the stability and accuracy of speech recognition and thereby improves the user experience.
In the description of the present invention, it should be understood that the orientation or positional relationships indicated by terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be construed as limiting the present invention.
In addition, the terms "first" and "second" are used only for purposes of description and cannot be understood as indicating or implying relative importance or implying the number of the indicated technical features. Thus, a feature limited by "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless expressly and specifically limited otherwise.
In the present invention, unless otherwise expressly specified and limited, terms such as "mounted", "connected", "coupled", and "fixed" should be understood broadly: for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; and it may be an internal connection between two elements or an interaction between two elements, unless otherwise expressly limited. For a person of ordinary skill in the art, the specific meanings of the above terms in the present invention can be understood according to the specific situation.
In the present invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Moreover, a first feature being "on", "above", or "over" a second feature may mean that the first feature is directly above or obliquely above the second feature, or may merely mean that the level of the first feature is higher than that of the second feature. A first feature being "under", "below", or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or may merely mean that the level of the first feature is lower than that of the second feature.
In the description of this specification, a description with reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic references to the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, where no contradiction arises, those skilled in the art may combine the different embodiments or examples described in this specification and the features of the different embodiments or examples.
Although embodiments of the present invention have been shown and described above, it should be understood that the above embodiments are exemplary and cannot be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims (10)

1. A voice recognition method, characterized by comprising:
S1, receiving input voice information, and cutting the voice information into a plurality of speech buffer segments;
S2, performing online recognition on the plurality of speech buffer segments in sequence;
S3, when the online recognition fails, obtaining a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed, performing offline recognition on the speech buffer segments for which online recognition has not been completed, and obtaining a plurality of second recognition results corresponding to the offline recognition; and
S4, merging the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result.
2. The method of claim 1, characterized in that cutting the voice information into a plurality of speech buffer segments comprises:
obtaining a plurality of pairs of voice endpoints based on voice endpoint detection, wherein each pair of voice endpoints comprises a voice start point and a voice end point corresponding to the voice start point; and
buffering the voice data between each pair of voice endpoints to generate the plurality of speech buffer segments.
3. The method of claim 1, characterized in that performing online recognition on the plurality of speech buffer segments in sequence comprises:
performing feature extraction on the speech buffer segment through a cloud engine to generate an acoustic feature sequence;
decoding the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and
obtaining, according to a language model, the word sequence corresponding to the acoustic model sequence as the first recognition result corresponding to the speech buffer segment.
4. The method of claim 1, characterized in that performing offline recognition on the speech buffer segments for which online recognition has not been completed comprises:
performing feature extraction on the speech buffer segment through a local engine to generate an acoustic feature sequence;
decoding the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence; and
obtaining, according to a language model, the word sequence corresponding to the acoustic model sequence as the second recognition result corresponding to the speech buffer segment.
5. The method of claim 1, characterized in that the first recognition results and the second recognition results are text information.
6. A voice recognition device, characterized by comprising:
a cutting module, configured to receive input voice information and cut the voice information into a plurality of speech buffer segments;
an online recognition module, configured to perform online recognition on the plurality of speech buffer segments in sequence;
an acquisition module, configured to obtain, when the online recognition fails, a plurality of first recognition results corresponding to the speech buffer segments for which online recognition has been completed;
an offline recognition module, configured to perform offline recognition, when the online recognition fails, on the speech buffer segments for which online recognition has not been completed, the acquisition module being further configured to obtain a plurality of second recognition results corresponding to the offline recognition; and
a merging module, configured to merge the plurality of first recognition results and the plurality of second recognition results to generate a final recognition result.
7. The device of claim 6, characterized in that the cutting module is specifically configured to:
obtain a plurality of pairs of voice endpoints based on voice endpoint detection, and buffer the voice data between each pair of voice endpoints to generate the plurality of speech buffer segments, wherein each pair of voice endpoints comprises a voice start point and a voice end point corresponding to the voice start point.
8. The device of claim 6, characterized in that the online recognition module is specifically configured to:
perform feature extraction on the speech buffer segment through a cloud engine to generate an acoustic feature sequence, decode the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence, and obtain, according to a language model, the word sequence corresponding to the acoustic model sequence as the first recognition result corresponding to the speech buffer segment.
9. The device of claim 6, characterized in that the offline recognition module is specifically configured to:
perform feature extraction on the speech buffer segment through a local engine to generate an acoustic feature sequence, decode the acoustic feature sequence according to an acoustic model and a dictionary to obtain an acoustic model sequence matching the acoustic feature sequence, and obtain, according to a language model, the word sequence corresponding to the acoustic model sequence as the second recognition result corresponding to the speech buffer segment.
10. The device of claim 6, characterized in that the first recognition results and the second recognition results are text information.
CN201510319421.5A 2015-06-11 2015-06-11 Voice recognition method and device Pending CN104916283A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510319421.5A CN104916283A (en) 2015-06-11 2015-06-11 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510319421.5A CN104916283A (en) 2015-06-11 2015-06-11 Voice recognition method and device

Publications (1)

Publication Number Publication Date
CN104916283A true CN104916283A (en) 2015-09-16

Family

ID=54085312

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510319421.5A Pending CN104916283A (en) 2015-06-11 2015-06-11 Voice recognition method and device

Country Status (1)

Country Link
CN (1) CN104916283A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101409072A (en) * 2007-10-10 2009-04-15 松下电器产业株式会社 Embedded equipment, bimodule voice synthesis system and method
CN102682770A (en) * 2012-02-23 2012-09-19 西安雷迪维护系统设备有限公司 Cloud-computing-based voice recognition system
CN102708865A (en) * 2012-04-25 2012-10-03 北京车音网科技有限公司 Method, device and system for voice recognition
CN103079258A (en) * 2013-01-09 2013-05-01 广东欧珀移动通信有限公司 Method for improving speech recognition accuracy and mobile intelligent terminal
WO2014186143A1 (en) * 2013-05-13 2014-11-20 Facebook, Inc. Hybrid, offline/online speech translation system

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632487A (en) * 2015-12-31 2016-06-01 北京奇艺世纪科技有限公司 Voice recognition method and device
CN105632487B (en) * 2015-12-31 2020-04-21 北京奇艺世纪科技有限公司 Voice recognition method and device
CN105719642A (en) * 2016-02-29 2016-06-29 黄博 Continuous and long voice recognition method and system and hardware equipment
WO2017219495A1 (en) * 2016-06-21 2017-12-28 宇龙计算机通信科技(深圳)有限公司 Speech recognition method and system
CN110060687A (en) * 2016-09-05 2019-07-26 北京金山软件有限公司 A kind of conversion of voice messaging, information generating method and device
CN107170450A (en) * 2017-06-14 2017-09-15 上海木爷机器人技术有限公司 Audio recognition method and device
CN107767873A (en) * 2017-10-20 2018-03-06 广东电网有限责任公司惠州供电局 A kind of fast and accurately offline speech recognition equipment and method
CN108172212A (en) * 2017-12-25 2018-06-15 横琴国际知识产权交易中心有限公司 A kind of voice Language Identification and system based on confidence level
CN108172212B (en) * 2017-12-25 2020-09-11 横琴国际知识产权交易中心有限公司 Confidence-based speech language identification method and system
WO2019134474A1 (en) * 2018-01-08 2019-07-11 珠海格力电器股份有限公司 Voice control method and device
CN109065037A (en) * 2018-07-10 2018-12-21 福州瑞芯微电子股份有限公司 A kind of audio method of flow control based on interactive voice
CN109065037B (en) * 2018-07-10 2023-04-25 瑞芯微电子股份有限公司 Audio stream control method based on voice interaction
CN109064815A (en) * 2018-09-04 2018-12-21 北京粉笔未来科技有限公司 Online testing method and apparatus calculate equipment and storage medium
CN109410927A (en) * 2018-11-29 2019-03-01 北京蓦然认知科技有限公司 Offline order word parses the audio recognition method combined, device and system with cloud
CN109410927B (en) * 2018-11-29 2020-04-03 北京蓦然认知科技有限公司 Voice recognition method, device and system combining offline command word and cloud analysis
CN109741753A (en) * 2019-01-11 2019-05-10 百度在线网络技术(北京)有限公司 A kind of voice interactive method, device, terminal and server
CN109840052A (en) * 2019-01-31 2019-06-04 成都超有爱科技有限公司 A kind of audio-frequency processing method, device, electronic equipment and storage medium
CN109840052B (en) * 2019-01-31 2022-03-18 成都超有爱科技有限公司 Audio processing method and device, electronic equipment and storage medium
CN111210822A (en) * 2020-02-12 2020-05-29 支付宝(杭州)信息技术有限公司 Speech recognition method and device
CN111445911A (en) * 2020-03-28 2020-07-24 大连鼎创科技开发有限公司 Home offline online voice recognition switching logic method

Similar Documents

Publication Publication Date Title
CN104916283A (en) Voice recognition method and device
CN107195295B (en) Voice recognition method and device based on Chinese-English mixed dictionary
CN107731228B (en) Text conversion method and device for English voice information
US9390711B2 (en) Information recognition method and apparatus
CN111309889A (en) Method and device for text processing
CN112487173B (en) Man-machine conversation method, device and storage medium
CN112100349A (en) Multi-turn dialogue method and device, electronic equipment and storage medium
US9601107B2 (en) Speech recognition system, recognition dictionary registration system, and acoustic model identifier series generation apparatus
CN107168546B (en) Input prompting method and device
CN105513590A (en) Voice recognition method and device
CN108710704B (en) Method and device for determining conversation state, electronic equipment and storage medium
CN109243468B (en) Voice recognition method and device, electronic equipment and storage medium
CN104992704A (en) Speech synthesizing method and device
CN102322866B (en) Navigation method and system based on natural speech recognition
CN112509566B (en) Speech recognition method, device, equipment, storage medium and program product
CN103177721A (en) Voice recognition method and system
CN103594085A (en) Method and system providing speech recognition result
CN103514882A (en) Voice identification method and system
CN109410923B (en) Speech recognition method, apparatus, system and storage medium
CN113282736B (en) Dialogue understanding and model training method, device, equipment and storage medium
CN112861548A (en) Natural language generation and model training method, device, equipment and storage medium
CN104714954A (en) Information searching method and system based on context understanding
CN113793599B (en) Training method of voice recognition model, voice recognition method and device
CN105070289A (en) English name recognition method and device
CN105469801A (en) Input speech restoring method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150916

RJ01 Rejection of invention patent application after publication