CN103514882A - Voice identification method and system - Google Patents

Voice identification method and system

Info

Publication number
CN103514882A
CN103514882A
Authority
CN
China
Prior art keywords
voice instruction
recognition result
unknown variable
speaker
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201210227158.3A
Other languages
Chinese (zh)
Other versions
CN103514882B (en)
Inventor
贾磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201210227158.3A priority Critical patent/CN103514882B/en
Publication of CN103514882A publication Critical patent/CN103514882A/en
Application granted granted Critical
Publication of CN103514882B publication Critical patent/CN103514882B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention provides a voice recognition method and system. The method comprises the steps that: A, a client module sends an acquired user voice instruction to a server module; B, the server module performs preliminary recognition of the voice instruction using an instruction template set and a named entity set, obtains a preliminary recognition result, and sends the preliminary recognition result to the client module, the preliminary recognition result being a recognition result that contains unknown variable information; C, the client module recognizes the unknown variable using named entity information stored on the client, thereby obtaining the complete recognition result of the voice instruction. In this way, the computing resources of the server are fully utilized and voice recognition accuracy is improved.

Description

Voice recognition method and system
[Technical field]
The present invention relates to speech recognition technology, and in particular to a speech recognition method and system.
[Background art]
With the development of software and hardware technologies for mobile terminals, mobile terminals are becoming increasingly intelligent. Operating a mobile terminal by voice command is the direction in which mobile terminal technology is developing. The core of voice-command control of a mobile terminal is correctly recognizing the user's voice command; only when the command is correctly recognized can the terminal be triggered to perform the corresponding action. In the prior art, there are typically two methods of speech recognition on a mobile terminal:
In the first method, a speech recognition system is built into the mobile terminal, and when the user issues a voice instruction, this built-in system recognizes it. This method can make full use of the personal information stored on the terminal (for example, the contact list), and is fairly effective for voice operations such as voice dialing. However, the computing power of a mobile terminal is limited, so a built-in speech recognition system has difficulty recognizing complex voice commands, such as those involved in web login, map operations, song queries, or information search on the terminal. Moreover, because of the limited computing power, the built-in system cannot apply sophisticated recognition algorithms, so even when applied to voice dialing this prior-art method suffers from low recognition accuracy.
In the second method, the mobile terminal captures the user's voice instruction and sends it to a server, where a pre-built speech recognition system recognizes it and finally returns the result to the terminal. This method can exploit the server's powerful computing capability to recognize complex voice instructions. Its drawback is that it cannot make full use of the personal information stored on the terminal, which degrades recognition accuracy for the speech segments of the instruction that relate to that personal information.
[Summary of the invention]
The technical problem to be solved by the present invention is to provide a speech recognition method and system that make full use of the computing resources of a server while improving recognition accuracy.
The technical solution adopted by the present invention is a speech recognition system comprising a client module and a server module. The client module comprises: a voice acquisition unit for obtaining the user's voice instruction; and a client communication unit for sending the voice instruction to the server module. The server module comprises: a first recognition unit for performing preliminary recognition of the voice instruction using an instruction template set and a named entity set to obtain a preliminary recognition result, the preliminary recognition result being a recognition result that contains unknown variable information, where the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client; and a server communication unit for sending the preliminary recognition result to the client module. The client module further comprises a second recognition unit for recognizing the unknown variable using the named entity information stored on the client, so as to obtain the complete recognition result of the voice instruction.
According to a preferred embodiment of the present invention, the first recognition unit comprises: a first decoding space generation unit for compiling the instruction template set and the named entity set in advance into two independent WFST networks that form a first decoding space; and a first decoding unit for, upon receiving the voice instruction, decoding it in the first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the instruction, and taking these as the preliminary recognition result.
According to a preferred embodiment of the present invention, the second recognition unit comprises: a second decoding space generation unit for compiling the named entity information stored on the client in advance into a WFST network that forms a second decoding space; and a second decoding unit for, upon receiving the preliminary recognition result, locating the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decoding that segment in the second decoding space to obtain the recognition result of the unknown variable.
According to a preferred embodiment of the present invention, the server module further comprises a feature extraction unit for extracting speaker-related acoustic features from the voice instruction; and the server communication unit is further configured to send the speaker-related acoustic features to the client module.
According to a preferred embodiment of the present invention, the client module further comprises an acoustic model training unit for training a speaker-related acoustic model in advance using the speaker's speech samples; and when the second decoding unit decodes the speech segment to be recognized, it uses the speaker-related acoustic features, the second decoding space, and the speaker-related acoustic model.
The present invention also provides a speech recognition method, comprising: A. a client module sends an acquired user voice instruction to a server module; B. the server module performs preliminary recognition of the voice instruction using an instruction template set and a named entity set, obtains a preliminary recognition result, and sends it to the client module, the preliminary recognition result being a recognition result that contains unknown variable information, where the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client; C. the client module recognizes the unknown variable using the named entity information stored on the client, so as to obtain the complete recognition result of the voice instruction.
According to a preferred embodiment of the present invention, the step in which the server module performs preliminary recognition of the voice instruction using the instruction template set and the named entity set comprises: upon receiving the voice instruction, the server module decodes it in a first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the instruction, and takes these as the preliminary recognition result, the first decoding space being formed by compiling the instruction template set and the named entity set in advance into two independent WFST networks.
According to a preferred embodiment of the present invention, the step in which the client module recognizes the unknown variable using the named entity information stored on the client comprises: upon receiving the preliminary recognition result, the client module locates the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decodes that segment in a second decoding space to obtain the recognition result of the unknown variable, the second decoding space being formed by compiling the named entity information stored on the client in advance into a WFST network.
According to a preferred embodiment of the present invention, step B further comprises: the server module extracts speaker-related acoustic features from the voice instruction and sends them to the client module.
According to a preferred embodiment of the present invention, when the client module decodes the speech segment to be recognized, it uses the speaker-related acoustic features, the second decoding space, and a speaker-related acoustic model, the speaker-related acoustic model having been trained in advance using the speaker's speech samples.
As can be seen from the above technical solutions, the present invention divides the recognition of a voice instruction into two stages: in the server recognition stage, a preliminary recognition result containing unknown variable information is obtained; in the client recognition stage, the unknown variable is recognized, yielding the complete recognition result of the voice instruction. This makes full use of the computing resources of the server while also exploiting the information stored on the client to improve recognition accuracy.
[Description of the drawings]
Fig. 1 is a schematic block diagram of an embodiment of the speech recognition system of the present invention;
Fig. 2 is a schematic block diagram of an embodiment of the first recognition unit of the present invention;
Fig. 3 is a schematic block diagram of an embodiment of the second recognition unit of the present invention;
Fig. 4 is a schematic block diagram of another embodiment of the server module in the speech recognition system of the present invention;
Fig. 5 is a schematic block diagram of another embodiment of the client module in the speech recognition system of the present invention;
Fig. 6 is a schematic flowchart of an embodiment of the speech recognition method of the present invention.
[Detailed description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described below in conjunction with the drawings and specific embodiments.
Referring to Fig. 1, a schematic block diagram of an embodiment of the speech recognition system of the present invention. As shown in Fig. 1, the system comprises a client module 101 and a server module 201. The client module 101 comprises: a voice acquisition unit 1011, a client communication unit 1012, and a second recognition unit 1013. The server module 201 comprises: a first recognition unit 2011 and a server communication unit 2012.
The voice acquisition unit 1011 obtains the user's voice instruction. The client communication unit 1012 sends the obtained voice instruction to the server module 201. The first recognition unit 2011 performs preliminary recognition of the voice instruction using a collected instruction template set and named entity set to obtain a preliminary recognition result, the preliminary recognition result being a recognition result that contains unknown variable information, where the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client. The server communication unit 2012 sends the preliminary recognition result to the client module 101. The second recognition unit 1013 recognizes the unknown variable using the named entity information stored on the client, so as to obtain the complete recognition result of the voice instruction.
The system is described below through specific embodiments.
Referring to Fig. 2, a schematic block diagram of an embodiment of the first recognition unit of the present invention. As shown in Fig. 2, the first recognition unit 2011 comprises: a first decoding space generation unit 2011_1 and a first decoding unit 2011_2.
The first decoding space generation unit 2011_1 compiles the collected instruction template set and named entity set in advance into two independent WFST (weighted finite state transducer) networks that form the first decoding space. The first decoding unit 2011_2, upon receiving the voice instruction sent by the client communication unit 1012, decodes it in the first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the instruction, and takes these as the preliminary recognition result.
An instruction template expresses the action indicated by an instruction. For example, "make a phone call to ***" is an instruction template, where "***" is a template slot that can be filled by a named entity. Combining an instruction template with a named entity of the type defined by its slot forms a complete instruction. For example, the slot "***" in the template "make a phone call to ***" is defined as a person's name, and combining the named entity "Zhang San" with this template yields the complete instruction "make a phone call to Zhang San".
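For illustration only (this sketch is not part of the original disclosure), the template/slot mechanism can be expressed in a few lines of Python; the names INSTRUCTION_TEMPLATES, NAMED_ENTITIES, and fill_template are hypothetical:

INSTRUCTION_TEMPLATES = {
    # template key -> (pattern with a typed slot, slot type)
    "call": ("make a phone call to <name>", "name"),
}

NAMED_ENTITIES = {"name": {"Zhang San", "Li Si"}}

def fill_template(template_key: str, entity: str) -> str:
    """Combine an instruction template with a named entity of the slot's type
    to form a complete instruction."""
    pattern, slot_type = INSTRUCTION_TEMPLATES[template_key]
    if entity not in NAMED_ENTITIES[slot_type]:
        raise ValueError("entity does not match the slot type")
    return pattern.replace("<" + slot_type + ">", entity)

print(fill_template("call", "Zhang San"))  # -> make a phone call to Zhang San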
In the present invention, instruction templates and named entities can be obtained in advance through data mining. This can be achieved with existing techniques and, as it is not the focus of the present invention, is not described in detail here.
The named entities in the named entity set of the present invention may include person names, place names, song titles, application names, and other entities relevant to applications on a mobile terminal that those skilled in the art may conceive of.
In the present invention, the first decoding space generation unit 2011_1 compiles the instruction template set and the named entity set into two independent WFST networks, which together form the first decoding space. A WFST network is the network formed by all possible paths available during decoding. Using the WFST networks, the first decoding unit 2011_2 dynamically expands paths for each frame of the voice instruction, and an acoustic model provides a probability score for each path expanded for each frame; during this process the first decoding unit 2011_2 prunes expansion paths according to their scores. When the last frame of the voice instruction has been decoded, the highest-scoring path among all expansion paths is the recognition result obtained at the server. In the present invention, because the first decoding space is composed of two independent WFST networks, one containing instruction template information and one containing named entity information, backtracking from the last word of the server-side recognition result determines the instruction template to which the result belongs and the portion of the result that matches a named entity; the speech segment corresponding to that matching portion is the unknown variable in the voice instruction. It will be appreciated that once the unknown variable is determined, its start and end times within the voice instruction are also determined. The first decoding unit 2011_2 takes the instruction template to which the voice instruction belongs and the start and end times of the unknown variable as the preliminary recognition result, and the server communication unit 2012 in Fig. 1 sends it to the client module 101.
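A much-simplified sketch (added for illustration, not from the patent) of the backtracking step: a real decoder expands WFST paths frame by frame under acoustic scores with pruning; here that stage is assumed done, and the sketch only shows how a best path whose words are tagged with their source network yields the template and the slot's start and end frames (all names hypothetical):

from dataclasses import dataclass

@dataclass
class PathWord:
    word: str
    source: str       # which WFST the word came from: "template" or "entity"
    start_frame: int
    end_frame: int

def preliminary_result(best_path):
    """Backtrack over the best decoding path: recover the instruction template
    and the start/end frames of the unknown variable (the entity portion)."""
    template_words, slot = [], None
    for w in best_path:
        if w.source == "entity":
            if slot is None:
                slot = [w.start_frame, w.end_frame]
                template_words.append("***")   # the template slot position
            else:
                slot[1] = w.end_frame          # extend over later entity words
        else:
            template_words.append(w.word)
    return " ".join(template_words), tuple(slot)

path = [PathWord("make", "template", 0, 20), PathWord("a", "template", 20, 28),
        PathWord("phone", "template", 28, 55), PathWord("call", "template", 55, 80),
        PathWord("to", "template", 80, 90),
        PathWord("Zhang", "entity", 90, 130), PathWord("San", "entity", 130, 170)]
print(preliminary_result(path))   # -> ('make a phone call to ***', (90, 170))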
For methods of compiling WFST networks and decoding with them, see Mehryar Mohri, Fernando Pereira, and Michael Riley, "Weighted Finite-State Transducers in Speech Recognition", Computer Speech & Language, Volume 16, Issue 1, January 2002, Pages 69-88 (hereinafter Document 1); the details are not repeated here.
It will be appreciated that the role of the acoustic model in decoding is to estimate the probability of the occurrence of acoustic signals. Therefore, the acoustic model in the present invention may be of any type, for example an existing acoustic model or one provided by a third party; the present invention places no limitation on this.
Referring to Fig. 3, a schematic block diagram of an embodiment of the second recognition unit of the present invention. As shown in Fig. 3, the second recognition unit 1013 comprises: a second decoding space generation unit 1013_1 and a second decoding unit 1013_2.
The second decoding space generation unit 1013_1 compiles the named entity information stored on the client in advance into a WFST network that forms the second decoding space. The second decoding unit 1013_2, upon receiving the preliminary recognition result, locates the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decodes that segment in the second decoding space to obtain the recognition result of the unknown variable.
The named entity information stored on the client includes information in the client's contact list, song titles stored on the client, application names stored on the client, and the named entities involved in the various applications that those skilled in the art may conceive of. The second decoding space generation unit 1013_1 builds the second decoding space in a manner similar to that of the first decoding space introduced above; see Document 1 for details.
In the present invention, the second decoding unit 1013_2 decodes the unknown variable in the second decoding space. This is a decoding process of limited length (the length of the unknown variable's speech segment), so compared with the prior art the decoding time is greatly shortened; and because the amount of computation is reduced, the limited computing resources of the client can bear it well.
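The limited-length property can be seen in a hedged sketch of the client-side second pass (added for illustration; not from the patent): only the frames between the reported start and end times are decoded, and only against the locally stored entities. The scoring function below is a stand-in; a real client would score each entity against the segment through the second decoding space with an acoustic model:

def acoustic_score(segment, entity):
    """Placeholder score; a real system decodes the segment through the entity
    WFST with an acoustic model. This toy version merely prefers entities
    whose length roughly matches the segment duration."""
    return -abs(len(segment) / 10.0 - len(entity))

def recognize_unknown_variable(frames, start, end, local_entities):
    segment = frames[start:end]   # limited-length decode, as described above
    return max(local_entities, key=lambda e: acoustic_score(segment, e))

frames = [0.0] * 200              # stand-in for acoustic feature frames
print(recognize_unknown_variable(frames, 90, 170, ["Zhang San", "Li Si"]))
# -> Zhang San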
In this embodiment, the acoustic model used by the second decoding unit 1013_2 during decoding, like that used by the first decoding unit 2011_2, may be any acoustic model.
Referring to Fig. 4, a schematic block diagram of another embodiment of the server module in the speech recognition system of the present invention. As shown in Fig. 4, in this embodiment the server module 201 further comprises a feature extraction unit 2013 for extracting speaker-related acoustic features from the voice instruction, and the server communication unit 2012 is further configured to send the speaker-related acoustic features to the client module 101.
Referring to Fig. 5, a schematic block diagram of another embodiment of the client module in the speech recognition system of the present invention. As shown in Fig. 5, in this embodiment the client module 101 further comprises an acoustic model training unit 1014 for training a speaker-related acoustic model in advance using the speaker's speech samples. When the second decoding unit 1013_2 decodes the speech segment to be recognized (i.e., the speech segment corresponding to the unknown variable), it uses the speaker-related acoustic features sent by the server module 201, the second decoding space, and the speaker-related acoustic model.
In the embodiments of the speech recognition system shown in Figs. 4 and 5, the server not only performs preliminary recognition of the voice instruction but also extracts speaker-related acoustic features from it, and the client builds a speaker-related acoustic model in advance. With the system of this embodiment, the computing resources of the server are fully used to perform the computation of extracting speaker-related acoustic features, and the client, having a speaker-related acoustic model, can decode more effectively after receiving those features. This considerably accelerates decoding on the client, and because the client's acoustic model is speaker-adaptive, the accuracy of decoding the speech segment corresponding to the unknown variable also improves greatly once the speaker-related acoustic features are obtained. Methods for training a speaker-related acoustic model and decoding with speaker-related acoustic features exist in the prior art; see in particular Tasos Anastasakos, John McDonough, Richard Schwartz, and John Makhoul, "A Compact Model for Speaker-Adaptive Training", Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 1996), Volume 2, Pages 1137-1140.
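The division of labor in Figs. 4 and 5 can be sketched as follows (illustrative only; the class and field names are assumptions, and the adapted model is a stub rather than an implementation):

from dataclasses import dataclass

@dataclass
class PreliminaryResult:
    template: str
    slot_start: int
    slot_end: int
    speaker_features: list   # e.g., adaptation statistics computed server-side

class SpeakerAdaptedModel:
    """Stub for the client's speaker-related acoustic model, trained in
    advance from the user's own speech samples."""
    def decode(self, segment, speaker_features):
        # A real model would condition on the speaker features before
        # scoring local entities against the segment.
        return "Zhang San"

def client_second_pass(frames, result, model):
    segment = frames[result.slot_start:result.slot_end]
    return model.decode(segment, result.speaker_features)

result = PreliminaryResult("make a phone call to ***", 90, 170, [0.1, -0.3])
print(client_second_pass([0.0] * 200, result, SpeakerAdaptedModel()))
# -> Zhang San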
Referring to Fig. 6, a schematic flowchart of an embodiment of the speech recognition method of the present invention. As shown in Fig. 6, the method comprises:
Step S301: the client module sends the acquired user voice instruction to the server module.
Step S302: the server module performs preliminary recognition of the voice instruction using an instruction template set and a named entity set, obtains a preliminary recognition result, and sends it to the client module, the preliminary recognition result being a recognition result that contains unknown variable information, where the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client.
Step S303: the client module recognizes the unknown variable using the named entity information stored on the client, and obtains the complete recognition result of the voice instruction.
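End to end, steps S301-S303 amount to the pipeline below; this is a schematic sketch added for illustration (transport, decoding, and error handling omitted; all names hypothetical, with the server and the second pass stubbed):

def server_recognize(frames):
    # S302: first-pass WFST decode against the template and entity networks,
    # stubbed here to a fixed preliminary recognition result
    return {"template": "make a phone call to ***", "slot": (90, 170)}

def second_pass_on_client(frames, prelim, local_entities):
    start, end = prelim["slot"]
    segment = frames[start:end]
    # S303: score local entities against the segment; stubbed to the first
    return local_entities[0]

def recognize(frames, local_entities):
    prelim = server_recognize(frames)                            # S301 + S302
    entity = second_pass_on_client(frames, prelim, local_entities)  # S303
    return prelim["template"].replace("***", entity)

print(recognize([0.0] * 200, ["Zhang San"]))
# -> make a phone call to Zhang San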
According to one embodiment, the step in step S302 in which the server module performs preliminary recognition of the voice instruction using the instruction template set and the named entity set comprises:
Upon receiving the voice instruction, the server module decodes it in a first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the instruction, and takes these as the preliminary recognition result, the first decoding space being formed by compiling the instruction template set and the named entity set in advance into two independent WFST networks.
Correspondingly, in step S303, the step in which the client module recognizes the unknown variable using the named entity information stored on the client comprises:
Upon receiving the preliminary recognition result, the client module locates the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decodes that segment in a second decoding space to obtain the recognition result of the unknown variable, the second decoding space being formed by compiling the named entity information stored on the client in advance into a WFST network.
According to another embodiment, step S302 further comprises:
The server module extracts speaker-related acoustic features from the voice instruction and sends them to the client module.
Correspondingly, in step S303, when the client module decodes the speech segment to be recognized (i.e., the speech segment corresponding to the unknown variable), it uses the speaker-related acoustic features, the second decoding space, and a speaker-related acoustic model, the speaker-related acoustic model having been trained in advance using the speaker's speech samples.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A speech recognition system, comprising:
a client module and a server module, wherein
the client module comprises:
a voice acquisition unit for obtaining a user's voice instruction;
a client communication unit for sending the voice instruction to the server module;
the server module comprises:
a first recognition unit for performing preliminary recognition of the voice instruction using an instruction template set and a named entity set to obtain a preliminary recognition result, wherein the preliminary recognition result is a recognition result containing unknown variable information, and the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client;
a server communication unit for sending the preliminary recognition result to the client module;
and the client module further comprises:
a second recognition unit for recognizing the unknown variable using the named entity information stored on the client, so as to obtain the complete recognition result of the voice instruction.
2. The system according to claim 1, wherein the first recognition unit comprises:
a first decoding space generation unit for compiling the instruction template set and the named entity set in advance into two independent WFST networks to form a first decoding space;
a first decoding unit for, upon receiving the voice instruction, decoding it in the first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the voice instruction, and taking these as the preliminary recognition result.
3. The system according to claim 2, wherein the second recognition unit comprises:
a second decoding space generation unit for compiling the named entity information stored on the client in advance into a WFST network to form a second decoding space;
a second decoding unit for, upon receiving the preliminary recognition result, locating the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decoding that segment in the second decoding space to obtain the recognition result of the unknown variable.
4. The system according to claim 3, wherein the server module further comprises:
a feature extraction unit for extracting speaker-related acoustic features from the voice instruction;
and the server communication unit is further configured to send the speaker-related acoustic features to the client module.
5. The system according to claim 4, wherein the client module further comprises:
an acoustic model training unit for training a speaker-related acoustic model in advance using the speaker's speech samples;
and, when the second decoding unit decodes the speech segment to be recognized, it uses the speaker-related acoustic features, the second decoding space, and the speaker-related acoustic model.
6. A speech recognition method, comprising:
A. a client module sends an acquired user voice instruction to a server module;
B. the server module performs preliminary recognition of the voice instruction using an instruction template set and a named entity set, obtains a preliminary recognition result, and sends the preliminary recognition result to the client module, wherein the preliminary recognition result is a recognition result containing unknown variable information, and the unknown variable is the speech segment of the voice instruction that relates to named entity information stored on the client;
C. the client module recognizes the unknown variable using the named entity information stored on the client, so as to obtain the complete recognition result of the voice instruction.
7. The method according to claim 6, wherein the step in which the server module performs preliminary recognition of the voice instruction using the instruction template set and the named entity set comprises:
upon receiving the voice instruction, the server module decodes it in a first decoding space to determine the instruction template to which the voice instruction belongs and the start and end times of the unknown variable within the voice instruction, and takes these as the preliminary recognition result, the first decoding space being formed by compiling the instruction template set and the named entity set in advance into two independent WFST networks.
8. The method according to claim 7, wherein the step in which the client module recognizes the unknown variable using the named entity information stored on the client comprises:
upon receiving the preliminary recognition result, the client module locates the speech segment to be recognized in the voice instruction according to the start and end times of the unknown variable, and decodes that segment in a second decoding space to obtain the recognition result of the unknown variable, the second decoding space being formed by compiling the named entity information stored on the client in advance into a WFST network.
9. The method according to claim 8, wherein step B further comprises:
the server module extracts speaker-related acoustic features from the voice instruction and sends them to the client module.
10. The method according to claim 9, wherein, when the client module decodes the speech segment to be recognized, it uses the speaker-related acoustic features, the second decoding space, and a speaker-related acoustic model, the speaker-related acoustic model having been trained in advance using the speaker's speech samples.
CN201210227158.3A 2012-06-30 2012-06-30 Voice recognition method and system Active CN103514882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210227158.3A CN103514882B (en) 2012-06-30 2012-06-30 Voice recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210227158.3A CN103514882B (en) 2012-06-30 2012-06-30 Voice recognition method and system

Publications (2)

Publication Number Publication Date
CN103514882A true CN103514882A (en) 2014-01-15
CN103514882B CN103514882B (en) 2017-11-10

Family

ID=49897508

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210227158.3A Active CN103514882B (en) 2012-06-30 2012-06-30 Voice recognition method and system

Country Status (1)

Country Link
CN (1) CN103514882B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN105632487A (en) * 2015-12-31 2016-06-01 北京奇艺世纪科技有限公司 Voice recognition method and device
CN106057201A (en) * 2016-04-25 2016-10-26 北京市动感生活科技有限公司 Household electrical appliance intelligent voice interaction control method and apparatus
CN106373566A (en) * 2016-08-25 2017-02-01 深圳市元征科技股份有限公司 Data transmission control method and device
CN106529384A (en) * 2015-09-11 2017-03-22 英特尔公司 Technologies for object recognition for internet-of-things edge devices
CN106710592A (en) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment
CN108694939A (en) * 2018-05-23 2018-10-23 广州视源电子科技股份有限公司 Phonetic search optimization method, device and system
CN110634472A (en) * 2018-06-21 2019-12-31 中兴通讯股份有限公司 Voice recognition method, server and computer readable storage medium
CN112269556A (en) * 2020-09-21 2021-01-26 北京达佳互联信息技术有限公司 Information display method, device, system, equipment, server and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114277A2 (en) * 2003-06-12 2004-12-29 Motorola, Inc. System and method for distributed speech recognition with a cache feature
US20090253463A1 (en) * 2008-04-08 2009-10-08 Jong-Ho Shin Mobile terminal and menu control method thereof
CN101557432A (en) * 2008-04-08 2009-10-14 Lg电子株式会社 Mobile terminal and menu control method thereof
CN101971250A (en) * 2008-03-13 2011-02-09 索尼爱立信移动通讯有限公司 Mobile electronic device with active speech recognition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004114277A2 (en) * 2003-06-12 2004-12-29 Motorola, Inc. System and method for distributed speech recognition with a cache feature
CN101971250A (en) * 2008-03-13 2011-02-09 索尼爱立信移动通讯有限公司 Mobile electronic device with active speech recognition
US20090253463A1 (en) * 2008-04-08 2009-10-08 Jong-Ho Shin Mobile terminal and menu control method thereof
CN101557432A (en) * 2008-04-08 2009-10-14 Lg电子株式会社 Mobile terminal and menu control method thereof

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104008132A (en) * 2014-05-04 2014-08-27 深圳市北科瑞声科技有限公司 Voice map searching method and system
CN104008132B (en) * 2014-05-04 2018-09-25 深圳市北科瑞声科技股份有限公司 Voice map searching method and system
CN106529384A (en) * 2015-09-11 2017-03-22 英特尔公司 Technologies for object recognition for internet-of-things edge devices
CN105632487A (en) * 2015-12-31 2016-06-01 北京奇艺世纪科技有限公司 Voice recognition method and device
CN105632487B (en) * 2015-12-31 2020-04-21 北京奇艺世纪科技有限公司 Voice recognition method and device
CN106057201A (en) * 2016-04-25 2016-10-26 北京市动感生活科技有限公司 Household electrical appliance intelligent voice interaction control method and apparatus
CN106373566A (en) * 2016-08-25 2017-02-01 深圳市元征科技股份有限公司 Data transmission control method and device
CN106710592A (en) * 2016-12-29 2017-05-24 北京奇虎科技有限公司 Speech recognition error correction method and speech recognition error correction device used for intelligent hardware equipment
CN108694939A (en) * 2018-05-23 2018-10-23 广州视源电子科技股份有限公司 Phonetic search optimization method, device and system
CN110634472A (en) * 2018-06-21 2019-12-31 中兴通讯股份有限公司 Voice recognition method, server and computer readable storage medium
CN110634472B (en) * 2018-06-21 2024-06-04 中兴通讯股份有限公司 Speech recognition method, server and computer readable storage medium
CN112269556A (en) * 2020-09-21 2021-01-26 北京达佳互联信息技术有限公司 Information display method, device, system, equipment, server and storage medium

Also Published As

Publication number Publication date
CN103514882B (en) 2017-11-10

Similar Documents

Publication Publication Date Title
CN103514882A (en) Voice identification method and system
US9564127B2 (en) Speech recognition method and system based on user personalized information
CN113327609B (en) Method and apparatus for speech recognition
CN112183120A (en) Speech translation method, device, equipment and storage medium
CN102543071A (en) Voice recognition system and method used for mobile equipment
CN109840052B (en) Audio processing method and device, electronic equipment and storage medium
JP7365985B2 (en) Methods, devices, electronic devices, computer-readable storage media and computer programs for recognizing speech
US11393458B2 (en) Method and apparatus for speech recognition
CN110995943B (en) Multi-user streaming voice recognition method, system, device and medium
JP7375089B2 (en) Method, device, computer readable storage medium and computer program for determining voice response speed
JP2023162265A (en) Text echo cancellation
CN113611316A (en) Man-machine interaction method, device, equipment and storage medium
CN112863496B (en) Voice endpoint detection method and device
CN113724698B (en) Training method, device, equipment and storage medium of voice recognition model
CN112306560B (en) Method and apparatus for waking up an electronic device
CN114743540A (en) Speech recognition method, system, electronic device and storage medium
CN112712793A (en) ASR (error correction) method based on pre-training model under voice interaction and related equipment
CN112002325A (en) Multi-language voice interaction method and device
CN111414748A (en) Traffic data processing method and device
CN111899738A (en) Dialogue generating method, device and storage medium
CN112542157A (en) Voice processing method and device, electronic equipment and computer readable storage medium
CN112151073B (en) Voice processing method, system, equipment and medium
CN113763921B (en) Method and device for correcting text
CN113066507B (en) End-to-end speaker separation method, system and equipment
CN116264078A (en) Speech recognition processing method and device, electronic equipment and readable medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant