CN103020165A - Browser capable of performing voice recognition processing and processing method - Google Patents
- Publication number
- CN103020165A, CN103020165B, CN201210488924A, CN2012104889241A
- Authority
- CN
- China
- Prior art keywords
- web
- browser
- local
- voice
- input
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Transfer Between Computers (AREA)
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a browser capable of performing voice recognition processing and a method of performing voice recognition processing in a browser. The browser comprises a processor, a voice receiver, a local recognition unit and a WEB recognition unit. The processor is suitable for setting the voice control mode of the browser to a command input mode or a text input mode; the voice receiver is suitable for receiving the user's voice input and converting it into voice data; the local recognition unit is suitable for calling a local voice recognition engine on the browser side to recognize the voice data, receiving a local voice recognition result, and performing corresponding browser-side processing according to the local voice recognition result; and the WEB recognition unit is used for calling a WEB voice recognition engine to recognize the voice data, receiving a WEB voice recognition result from the WEB voice recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result. In the browser and the method, the local voice recognition engine and the WEB voice recognition engine are integrated to recognize the voice data, so that recognition efficiency and accuracy are improved.
Description
Technical field
The present invention relates to the field of Internet communication technology, and in particular to a browser capable of performing voice recognition processing and a method of performing voice recognition processing in a browser.
Background art
Speech recognition is a cross-disciplinary field. Over the past two decades, speech recognition technology has made marked progress and has begun to move from the laboratory to the market. It is expected that within the next ten years, speech recognition technology will enter fields such as industry, household appliances, communications, automotive electronics, medical care, home services, and consumer electronics. The fields related to speech recognition technology include signal processing, pattern recognition, probability theory and information theory, speech production and auditory mechanisms, and artificial intelligence.
A browser is software that can display the content of HTML (Hypertext Markup Language) files on a web server or in a file system and allow a user to interact with these files. A web browser mainly interacts with web servers via HTTP (Hypertext Transfer Protocol) to retrieve web pages, which are specified by URLs (Uniform/Universal Resource Locators); the file format is generally HTML, and it is indicated by a MIME (Multipurpose Internet Mail Extensions) type in the HTTP protocol.
Most existing browsers do not integrate a speech recognition function, and the few browsers that do integrate one generally rely only on a local speech recognition engine or only on a WEB (network-based) speech recognition engine, which results in low recognition efficiency and poor accuracy.
Summary of the invention
In view of the above problems, the present invention is proposed in order to provide a browser capable of performing voice recognition processing and a corresponding method of performing voice recognition processing in a browser that overcome the above problems or at least partially solve them.
According to one aspect of the present invention, a browser capable of performing voice recognition processing is provided, comprising:
a processor, adapted to receive a user input instruction and set the voice control mode of the browser to a command input mode or a text input mode;
a voice receiver, adapted to receive a user's voice input and convert it into voice data;
a local recognition unit, adapted to call a browser-side local speech recognition engine to recognize the voice data and receive a local voice recognition result; if recognition succeeds, corresponding browser-side processing is performed according to the local voice recognition result under the command input mode or the text input mode;
a WEB recognition unit, adapted to, in the case that the local recognition unit fails to recognize the voice data, call a WEB speech recognition engine to recognize the voice data, receive a WEB voice recognition result from the WEB speech recognition engine, and perform corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode.
Optionally, the local recognition unit comprises:
a local calling module, adapted to call the local speech recognition engine to recognize the voice data and receive the local voice recognition result;
a local judging module, adapted to judge whether the local voice recognition result is empty; if so, recognition is considered to have failed and a start notification is sent to the WEB recognition unit; otherwise, it judges whether the current voice control mode is the text input mode; if so, recognition is considered successful and the input text corresponding to the local voice recognition result is entered into the browser; otherwise, a local matching instruction is sent to a local matching module;
the local matching module, adapted to match the local voice recognition result against the command set of the browser according to the local matching instruction; if the match succeeds, a corresponding browser operation is performed according to the local voice recognition result; otherwise, recognition is considered to have failed and a start notification is sent to the WEB recognition unit.
Optionally, the WEB recognition unit comprises:
a notification receiving module, adapted to receive the start notification sent by the local recognition unit;
a WEB calling module, adapted to call the WEB speech recognition engine to recognize the voice data and receive the WEB voice recognition result from the WEB speech recognition engine;
a WEB judging module, adapted to judge whether the current voice control mode is the text input mode; if so, the input text corresponding to the WEB voice recognition result is entered into the browser; otherwise, a WEB matching instruction is sent to a WEB matching module;
the WEB matching module, adapted to match the WEB voice recognition result against the command set of the browser according to the WEB matching instruction and perform a corresponding browser operation according to the WEB voice recognition result.
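As an informal illustration of how these units could cooperate (not the patented implementation), the following TypeScript sketch shows a local-first pipeline that falls back to a WEB engine when local recognition fails; the engine and browser interfaces and all identifiers are assumptions introduced for this sketch.

```typescript
// Minimal sketch of the local-first / WEB-fallback pipeline (all names are illustrative).
type VoiceMode = "command" | "text";

interface RecognitionEngine {
  // Returns recognized text, or null when recognition fails.
  recognize(voiceData: ArrayBuffer): Promise<string | null>;
}

interface BrowserActions {
  insertText(text: string): void;           // text input mode
  executeCommand(command: string): boolean; // command input mode; false if no command matches
}

class VoiceRecognitionController {
  constructor(
    private mode: VoiceMode,                // set by the "processor" from a user instruction
    private localEngine: RecognitionEngine, // browser-side local engine
    private webEngine: RecognitionEngine,   // network (WEB) engine
    private browser: BrowserActions,
  ) {}

  setMode(mode: VoiceMode): void {
    this.mode = mode;
  }

  async handleVoiceData(voiceData: ArrayBuffer): Promise<void> {
    // 1. Try the local engine first.
    const localResult = await this.localEngine.recognize(voiceData);
    if (localResult && this.apply(localResult)) {
      return;                               // local recognition succeeded
    }
    // 2. Local recognition failed (empty result or no command match): fall back to the WEB engine.
    const webResult = await this.webEngine.recognize(voiceData);
    if (webResult) {
      this.apply(webResult);
    }
  }

  // Performs the browser-side processing for a recognition result.
  private apply(result: string): boolean {
    if (this.mode === "text") {
      this.browser.insertText(result);
      return true;
    }
    // Command mode: match against the browser's command set.
    return this.browser.executeCommand(result);
  }
}
```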
Optionally, the command set comprises the operating commands of the browser.
Optionally, the browser further comprises a training speech recognition unit.
The training speech recognition unit comprises:
a training speech library building module, adapted to train the user's voice data in advance and generate a training speech library containing training speech and corresponding input text or corresponding operating commands;
a training speech matching module, adapted to match the voice data against the training speech in the training speech library; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
Optionally, the training speech recognition unit further comprises a prompting module.
The prompting module is adapted to prompt the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
Optionally, the training speech recognition unit further comprises a table building module and a table lookup module.
The table building module is adapted to build a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles;
wherein the web page titles also serve as input text in the training speech library and correspond to the corresponding training speech;
the training speech matching module is further adapted to, when the user performs voice input in the address bar, match the user's input voice against the training speech to obtain the corresponding web page title;
the table lookup module is adapted to query the mapping table according to the matched web page title to obtain the corresponding URL and enter the obtained URL into the address bar of the browser.
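As a rough sketch of the table building and table lookup modules, the snippet below keeps a title-to-URL mapping table and resolves a matched web page title to a URL for the address bar. It assumes some other component has already matched the spoken input to a page title, and all names are illustrative rather than taken from the patent.

```typescript
// Illustrative mapping table between web page titles and URLs (names are assumptions).
class AddressMappingTable {
  private titleToUrl = new Map<string, string>();

  // Called when the user enters a frequently used address or accepts a recommended one.
  register(pageTitle: string, url: string): void {
    this.titleToUrl.set(pageTitle, url);
  }

  // Table lookup: query the table with the title obtained from speech matching.
  lookup(pageTitle: string): string | undefined {
    return this.titleToUrl.get(pageTitle);
  }
}

// Hypothetical glue: the training-speech matcher yields a page title; the lookup
// turns it into a URL that is placed into the address bar.
function fillAddressBarFromSpeech(
  matchedTitle: string,
  table: AddressMappingTable,
  setAddressBar: (url: string) => void,
): boolean {
  const url = table.lookup(matchedTitle);
  if (url === undefined) {
    return false;            // no entry for this title; fall through to normal recognition
  }
  setAddressBar(url);
  return true;
}
```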
Optionally, the browser further comprises a historical speech recognition unit.
The historical speech recognition unit comprises:
a historical speech library building module, adapted to store the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generate a historical speech library;
a historical speech matching module, adapted to match the voice data against the historical speech in the historical speech library; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
Optionally, the training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side.
Optionally, the local recognition unit is further adapted to, in the case that only part of the voice data is successfully recognized, send the unsuccessfully recognized voice data to the WEB recognition unit;
the WEB recognition unit is further adapted to call the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receive the WEB voice recognition result from the WEB speech recognition engine, integrate the local voice recognition result and the WEB voice recognition result into a final recognition result, and perform corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
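How the partial local result and the supplementary WEB result are combined is not spelled out at this level. Purely as an illustration, the sketch below assumes the local engine can report which audio segments it failed on, so that only those segments are sent to the WEB engine and the texts are merged in order; every type and name here is an assumption, not the patented integration method.

```typescript
// Illustrative merge of a partial local result with WEB recognition of the remainder.
interface SegmentResult {
  audio: ArrayBuffer;        // the audio for this segment
  text: string | null;       // null when the local engine could not recognize the segment
}

interface RemoteRecognizer {
  recognize(audio: ArrayBuffer): Promise<string>;
}

async function mergeLocalAndWebResults(
  localSegments: SegmentResult[],   // output of the local engine, in utterance order
  webEngine: RemoteRecognizer,
): Promise<string> {
  const parts: string[] = [];
  for (const segment of localSegments) {
    if (segment.text !== null) {
      parts.push(segment.text);                            // keep the locally recognized part
    } else {
      parts.push(await webEngine.recognize(segment.audio)); // fill the gap via the WEB engine
    }
  }
  return parts.join("");     // final recognition result used for browser-side processing
}
```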
The present invention also provides a method of performing voice recognition processing in a browser, comprising the steps of:
receiving a user input instruction and setting the voice control mode of the browser to a command input mode or a text input mode;
receiving a user's voice input and converting it into voice data;
calling a browser-side local speech recognition engine to recognize the voice data and receiving a local voice recognition result; if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling a WEB speech recognition engine to recognize the voice data, receiving a WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode.
Optionally, the step of calling the browser-side local speech recognition engine to recognize the voice data, receiving the local voice recognition result, and, if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode comprises:
calling the local speech recognition engine to recognize the voice data and receiving the local voice recognition result;
in the case that the local voice recognition result is not empty, judging whether the current voice control mode is the text input mode; if so, considering recognition successful, entering the input text corresponding to the local voice recognition result into the browser, and ending the process; otherwise,
matching the local voice recognition result against the command set of the browser; if the match succeeds, performing a corresponding browser operation according to the local voice recognition result and ending the process.
Optionally, the step of calling the WEB speech recognition engine to recognize the voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode comprises:
calling the WEB speech recognition engine to recognize the voice data and receiving the WEB voice recognition result from the WEB speech recognition engine;
judging whether the current voice control mode is the text input mode; if so, entering the input text corresponding to the WEB voice recognition result into the browser; otherwise, matching the WEB voice recognition result against the command set of the browser according to the WEB matching instruction and performing a corresponding browser operation according to the WEB voice recognition result.
Optionally, the command set comprises the operating commands of the browser.
Optionally, before calling the browser-side local speech recognition engine to recognize the voice data, the method further comprises the steps of:
training the user's voice data in advance and generating a training speech library containing training speech and corresponding input text or corresponding operating commands;
matching the voice data against the training speech in the training speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
Optionally, before training the user's voice data in advance, the method further comprises the step of:
prompting the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
Optionally, after generating the training speech library containing training speech and corresponding input text or corresponding operating commands, the method further comprises the steps of:
building a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles;
wherein the web page titles also serve as input text in the training speech library and correspond to the corresponding training speech;
when the user performs voice input in the address bar, matching the user's input voice against the training speech to obtain the corresponding web page title;
opening the URL corresponding to the web page title in a web page tab.
Optionally, before calling the browser-side local speech recognition engine to recognize the voice data, the method further comprises the steps of:
storing the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generating a historical speech library;
matching the voice data against the historical speech in the historical speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
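Taken together, these optional steps place the training speech library and the historical speech library in front of the local engine as a pre-recognition pass. The sketch below illustrates that ordering under the assumption that library matching is exposed as a simple lookup; all names are introduced here for illustration and are not taken from the patent.

```typescript
// Illustrative pre-recognition pass over the training and historical speech libraries
// before the local speech recognition engine is called (all names are assumptions).
type VoiceMode = "command" | "text";

interface LibraryEntry {
  inputText?: string;        // used under the text input mode
  command?: string;          // used under the command input mode
}

interface SpeechLibrary {
  // Returns a stored entry whose training/historical speech matches the voice data, if any.
  match(voiceData: ArrayBuffer): LibraryEntry | null;
}

function preRecognize(
  voiceData: ArrayBuffer,
  mode: VoiceMode,
  libraries: SpeechLibrary[],          // e.g. [trainingLibrary, historicalLibrary]
  insertText: (text: string) => void,
  executeCommand: (command: string) => void,
): boolean {
  for (const library of libraries) {
    const entry = library.match(voiceData);
    if (entry === null) continue;
    if (mode === "text" && entry.inputText !== undefined) {
      insertText(entry.inputText);     // process ends: no engine call needed
      return true;
    }
    if (mode === "command" && entry.command !== undefined) {
      executeCommand(entry.command);   // process ends: no engine call needed
      return true;
    }
  }
  return false;                        // fall through to the local speech recognition engine
}
```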
Optionally, the training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side.
Optionally, the steps of calling the browser-side local speech recognition engine to recognize the voice data, receiving the local voice recognition result, and, if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling the WEB speech recognition engine to recognize the voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode, are further:
calling the browser-side local speech recognition engine to recognize the voice data and receiving the local voice recognition result; if recognition is completely successful, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, integrating the local voice recognition result and the WEB voice recognition result into a final recognition result, and performing corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
In the browser capable of performing voice recognition processing and the method of performing voice recognition processing in a browser according to the present invention, the local speech recognition engine and the WEB speech recognition engine are integrated to recognize the voice data, which significantly improves recognition efficiency and recognition accuracy. Furthermore, by combining the training speech library and the historical speech library to pre-recognize the voice data, recognition efficiency and accuracy can be further improved.
The above description is merely an overview of the technical solution of the present invention. In order that the technical means of the present invention may be more clearly understood and implemented in accordance with the contents of the specification, and that the above and other objects, features, and advantages of the present invention may become more apparent, specific embodiments of the present invention are set forth below.
Brief description of the drawings
By reading the following detailed description of preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The accompanying drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present invention. Throughout the drawings, identical reference symbols denote identical parts. In the drawings:
Fig. 1 shows a schematic diagram of the module structure of a browser capable of performing voice recognition processing according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the module structure of a local recognition unit according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of the module structure of a WEB recognition unit according to an embodiment of the present invention;
Fig. 4 shows a schematic diagram of the module structure of a browser capable of performing voice recognition processing according to another embodiment of the present invention;
Fig. 5 shows a schematic diagram of the module structure of a training speech recognition unit according to another embodiment of the present invention;
Fig. 6 shows a schematic diagram of the module structure of a historical speech recognition unit according to another embodiment of the present invention;
Fig. 7 shows a flowchart of a method of performing voice recognition processing in a browser according to an embodiment of the present invention;
Fig. 8 shows a flowchart of a method of performing voice recognition processing in a browser according to another embodiment of the present invention;
Fig. 9 shows a flowchart of a method of performing voice recognition processing in a browser according to yet another embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present disclosure are described below in more detail with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. On the contrary, these embodiments are provided so that the present disclosure may be understood more thoroughly and so that its scope may be fully conveyed to those skilled in the art.
Fig. 1 shows a schematic diagram of the module structure of a browser capable of performing voice recognition processing according to an embodiment of the present invention. As shown in Fig. 1, the browser 100 comprises the processor, voice receiver, local recognition unit 130, and WEB recognition unit 140 described above.
Referring to Fig. 2, the local recognition unit 130 comprises a local calling module, a local judging module, and a local matching module 133.
The local matching module 133 is adapted to match the local voice recognition result against the command set of the browser according to the local matching instruction; if the match succeeds, a corresponding browser operation is performed according to the local voice recognition result; otherwise, recognition is considered to have failed and a start notification is sent to the WEB recognition unit.
Referring to Fig. 3, the WEB recognition unit 140 comprises the notification receiving module, WEB calling module, WEB judging module, and WEB matching module described above.
The command set comprises the operating commands of the browser 100, such as going forward, going back, scrolling up one page, and scrolling down one page.
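To make such a command set concrete, the sketch below maps a few recognized phrases of this kind to browser operations. The phrase strings, the normalization, and the interface are assumptions for illustration only, not the patented command set.

```typescript
// Illustrative command set: recognized phrases mapped to browser operations
// (phrases and handler names are assumptions).
interface BrowserWindowLike {
  history: { forward(): void; back(): void };
  scrollBy(x: number, y: number): void;
  innerHeight: number;
}

function buildCommandSet(win: BrowserWindowLike): Map<string, () => void> {
  return new Map<string, () => void>([
    ["go forward", () => win.history.forward()],
    ["go back", () => win.history.back()],
    ["scroll up one page", () => win.scrollBy(0, -win.innerHeight)],
    ["scroll down one page", () => win.scrollBy(0, win.innerHeight)],
  ]);
}

// Matches a recognition result against the command set after a simple normalization.
function executeCommand(result: string, commands: Map<string, () => void>): boolean {
  const normalized = result.trim().toLowerCase();
  const action = commands.get(normalized);
  if (action === undefined) {
    return false;            // no match: the caller falls back to the WEB recognition unit
  }
  action();
  return true;
}
```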
In addition, the local recognition unit 130 is further adapted to, in the case that only part of the voice data is successfully recognized, send the unsuccessfully recognized voice data to the WEB recognition unit.
The WEB recognition unit 140 is further adapted to call the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receive the WEB voice recognition result from the WEB speech recognition engine, integrate the local voice recognition result and the WEB voice recognition result into a final recognition result, and perform corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
That is to say, when the local recognition unit 130 cannot recognize the voice data completely, i.e. part of the data remains unrecognized, the WEB recognition unit 140 can correct the local recognition result of the local recognition unit 130 by supplementarily recognizing only the unrecognized part. In this way the computing resources of the local recognition unit 130 are fully utilized and the dependence on the WEB recognition unit 140 is reduced, so that recognition efficiency can be improved.
Fig. 4 shows a schematic diagram of the module structure of a browser capable of performing voice recognition processing according to another embodiment of the present invention. As shown in Fig. 4, the browser of this embodiment is basically identical to the browser of the previous embodiment, the only difference being that the browser 100 further comprises a training speech recognition unit 150, a training speech library 160, a historical speech recognition unit 170, and a historical speech library 180.
Referring to Fig. 5, the training speech recognition unit 150 comprises a training speech library building module and a training speech matching module 152.
The training speech matching module 152 is adapted to match the voice data against the training speech in the training speech library 160; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
The training speech recognition unit 150 further comprises a prompting module 153, a table building module 154, and a table lookup module 155.
The prompting module 153 is adapted to prompt the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
The table building module 154 is adapted to build a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles.
The web page titles also serve as input text in the training speech library and correspond to the corresponding training speech.
The training speech matching module 152 is further adapted to, when the user performs voice input in the address bar, match the user's input voice against the training speech to obtain the corresponding web page title.
The table lookup module 155 is adapted to query the mapping table according to the matched web page title to obtain the corresponding URL and enter the obtained URL into the address bar of the browser.
For example, the user can use the frequently visited "Phoenix Net" site as a web page title and generate its corresponding training speech, while establishing the correspondence between "Phoenix Net" and its URL, www.ifeng.com, in the mapping table. In this way, when the user speaks "Phoenix Net", the browser can automatically open the Phoenix Net home page.
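In code, the "Phoenix Net" example could look roughly like the following self-contained snippet; the names and the console stand-in are illustrative assumptions, not part of the patent.

```typescript
// Hypothetical, self-contained version of the "Phoenix Net" example (names are illustrative).
const titleToUrl = new Map<string, string>([
  ["Phoenix Net", "www.ifeng.com"],       // entry created when the user trains this title
]);

// After the training-speech matcher turns the spoken phrase into the title "Phoenix Net",
// the lookup yields the URL that the browser puts into the address bar and opens.
const matchedTitle = "Phoenix Net";
const url = titleToUrl.get(matchedTitle);
if (url !== undefined) {
  console.log(`Open home page: ${url}`); // stand-in for navigating the browser tab
}
```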
Referring to Fig. 6, the historical speech recognition unit 170 comprises:
a historical speech library building module 171, adapted to store the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generate a historical speech library 180;
a historical speech matching module 172, adapted to match the voice data against the historical speech in the historical speech library 180; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
The training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side. In this way, after the user logs in to the browser with an account and password, the browser automatically calls the corresponding training speech library and/or historical speech library to assist in recognizing the voice data input by the user.
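A minimal sketch of binding the libraries to a browser account might look like the following; the storage interface and key scheme are invented here for illustration and are not the patented design.

```typescript
// Illustrative per-account binding of the training and historical speech libraries
// (the store interface and key scheme are assumptions).
interface LibraryRecord {
  trainingLibrary: unknown;      // serialized training speech library
  historicalLibrary: unknown;    // serialized historical speech library
}

interface LibraryStore {
  load(accountId: string): Promise<LibraryRecord | null>;
  save(accountId: string, record: LibraryRecord): Promise<void>;
}

// Called after the user logs in with account and password: the browser loads the
// libraries bound to that account so they can assist subsequent recognition.
async function loadLibrariesForAccount(
  accountId: string,
  store: LibraryStore,
): Promise<LibraryRecord> {
  const existing = await store.load(accountId);
  if (existing !== null) {
    return existing;
  }
  const empty: LibraryRecord = { trainingLibrary: null, historicalLibrary: null };
  await store.save(accountId, empty);   // create an empty record for a first-time account
  return empty;
}
```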
Fig. 7 shows a flowchart of a method of performing voice recognition processing in a browser according to an embodiment of the present invention. As shown in Fig. 7, the method comprises the following steps.
S100: receiving a user input instruction and setting the voice control mode of the browser to a command input mode or a text input mode.
S200: receiving a user's voice input and converting it into voice data.
S300: calling a browser-side local speech recognition engine to recognize the voice data and receiving a local voice recognition result; if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode.
Step S300 further comprises:
calling the local speech recognition engine to recognize the voice data and receiving the local voice recognition result;
in the case that the local voice recognition result is not empty, judging whether the current voice control mode is the text input mode; if so, considering recognition successful, entering the input text corresponding to the local voice recognition result into the browser, and ending the process; otherwise,
matching the local voice recognition result against the command set of the browser; if the match succeeds, performing a corresponding browser operation according to the local voice recognition result and ending the process.
S400: otherwise, calling a WEB speech recognition engine to recognize the voice data, receiving a WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode.
Step S400 further comprises:
calling the WEB speech recognition engine to recognize the voice data and receiving the WEB voice recognition result from the WEB speech recognition engine;
judging whether the current voice control mode is the text input mode; if so, entering the input text corresponding to the WEB voice recognition result into the browser; otherwise, matching the WEB voice recognition result against the command set of the browser according to the WEB matching instruction and performing a corresponding browser operation according to the WEB voice recognition result.
The command set comprises the operating commands of the browser.
In addition, steps S300 and S400 may further be:
S300': calling the browser-side local speech recognition engine to recognize the voice data and receiving the local voice recognition result; if recognition is completely successful, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
S400': otherwise, calling the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, integrating the local voice recognition result and the WEB voice recognition result into a final recognition result, and performing corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
Fig. 8 shows a flowchart of a method of performing voice recognition processing in a browser according to another embodiment of the present invention. As shown in Fig. 8, the method of this embodiment is basically identical to the method of the previous embodiment; the difference is that the method further comprises the following steps before step S300:
P1: training the user's voice data in advance and generating a training speech library containing training speech and corresponding input text or corresponding operating commands;
P2: matching the voice data against the training speech in the training speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
In addition, before step P1, the method may further comprise the step of:
prompting the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
Correspondingly, after step P1, the method may further comprise the steps of:
building a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles;
wherein the web page titles also serve as input text in the training speech library and correspond to the corresponding training speech;
when the user performs voice input in the address bar, matching the user's input voice against the training speech to obtain the corresponding web page title;
opening the URL corresponding to the web page title in a web page tab.
Fig. 9 shows a flowchart of a method of performing voice recognition processing in a browser according to yet another embodiment of the present invention. As shown in Fig. 9, this embodiment is basically identical to the first method embodiment; the only difference is that the method further comprises the following steps before step S300:
T1: storing the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generating a historical speech library;
T2: matching the voice data against the historical speech in the historical speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
In addition, the training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side. Moreover, the steps added in the latter two method embodiments (steps P1, P2 and steps T1, T2) can be used in combination; that is, the training speech library and the historical speech library can be used simultaneously for early-stage assisted recognition, so as to further improve recognition efficiency.
In the browser capable of performing voice recognition processing and the method of performing voice recognition processing in a browser described in the embodiments of the present invention, the local speech recognition engine and the WEB speech recognition engine are integrated to recognize the voice data, which significantly improves recognition efficiency and recognition accuracy. Furthermore, by combining the training speech library and the historical speech library to pre-recognize the voice data, recognition efficiency and accuracy can be further improved.
The algorithms and displays provided here are not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such systems is apparent. In addition, the present invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the contents of the present invention described here, and the above description of a specific language is made in order to disclose the best mode of the present invention.
In the specification provided here, a large number of specific details are described. However, it can be understood that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to simplify the present disclosure and to aid in understanding one or more of the various inventive aspects, in the above description of exemplary embodiments of the present invention the features of the present invention are sometimes grouped together into a single embodiment, figure, or description thereof. However, the disclosed method should not be interpreted as reflecting the intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art can understand that the modules in the devices of the embodiments may be adaptively changed and arranged in one or more devices different from the embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and in addition they may be divided into a plurality of sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any disclosed method or device may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art can understand that, although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the following claims, any one of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the browser capable of performing voice recognition processing according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium, or may have the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-described embodiments illustrate rather than limit the present invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference symbols placed between parentheses shall not be construed as limiting the claims. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.
Claims (20)
1. A browser capable of performing voice recognition processing, comprising:
a processor, adapted to receive a user input instruction and set the voice control mode of the browser to a command input mode or a text input mode;
a voice receiver, adapted to receive a user's voice input and convert it into voice data;
a local recognition unit, adapted to call a browser-side local speech recognition engine to recognize the voice data and receive a local voice recognition result; if recognition succeeds, corresponding browser-side processing is performed according to the local voice recognition result under the command input mode or the text input mode;
a WEB recognition unit, adapted to, in the case that the local recognition unit fails to recognize the voice data, call a WEB speech recognition engine to recognize the voice data, receive a WEB voice recognition result from the WEB speech recognition engine, and perform corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode.
2. The browser as claimed in claim 1, characterized in that the local recognition unit comprises:
a local calling module, adapted to call the local speech recognition engine to recognize the voice data and receive the local voice recognition result;
a local judging module, adapted to judge whether the local voice recognition result is empty; if so, recognition is considered to have failed and a start notification is sent to the WEB recognition unit; otherwise, it judges whether the current voice control mode is the text input mode; if so, recognition is considered successful and the input text corresponding to the local voice recognition result is entered into the browser; otherwise, a local matching instruction is sent to a local matching module;
the local matching module, adapted to match the local voice recognition result against the command set of the browser according to the local matching instruction; if the match succeeds, a corresponding browser operation is performed according to the local voice recognition result; otherwise, recognition is considered to have failed and a start notification is sent to the WEB recognition unit.
3. The browser as claimed in claim 2, characterized in that the WEB recognition unit comprises:
a notification receiving module, adapted to receive the start notification sent by the local recognition unit;
a WEB calling module, adapted to call the WEB speech recognition engine to recognize the voice data and receive the WEB voice recognition result from the WEB speech recognition engine;
a WEB judging module, adapted to judge whether the current voice control mode is the text input mode; if so, the input text corresponding to the WEB voice recognition result is entered into the browser; otherwise, a WEB matching instruction is sent to a WEB matching module;
the WEB matching module, adapted to match the WEB voice recognition result against the command set of the browser according to the WEB matching instruction and perform a corresponding browser operation according to the WEB voice recognition result.
4. The browser as claimed in claim 2 or 3, characterized in that the command set comprises the operating commands of the browser.
5. The browser as claimed in claim 4, characterized in that the browser further comprises a training speech recognition unit;
the training speech recognition unit comprises:
a training speech library building module, adapted to train the user's voice data in advance and generate a training speech library containing training speech and corresponding input text or corresponding operating commands;
a training speech matching module, adapted to match the voice data against the training speech in the training speech library; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
6. The browser as claimed in claim 5, characterized in that the training speech recognition unit further comprises a prompting module;
the prompting module is adapted to prompt the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
7. The browser as claimed in claim 6, characterized in that the training speech recognition unit further comprises a table building module and a table lookup module;
the table building module is adapted to build a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles;
wherein the web page titles also serve as input text in the training speech library and correspond to the corresponding training speech;
the training speech matching module is further adapted to, when the user performs voice input in the address bar, match the user's input voice against the training speech to obtain the corresponding web page title;
the table lookup module is adapted to query the mapping table according to the matched web page title to obtain the corresponding URL and enter the obtained URL into the address bar of the browser.
8. The browser as claimed in claim 5, characterized in that the browser further comprises a historical speech recognition unit;
the historical speech recognition unit comprises:
a historical speech library building module, adapted to store the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generate a historical speech library;
a historical speech matching module, adapted to match the voice data against the historical speech in the historical speech library; if the match succeeds, the corresponding input text is entered into the browser under the text input mode, or a corresponding browser operation is performed according to the corresponding operating command under the command input mode.
9. The browser as claimed in claim 8, characterized in that the training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side.
10. The browser as claimed in claim 1, characterized in that
the local recognition unit is further adapted to, in the case that only part of the voice data is successfully recognized, send the unsuccessfully recognized voice data to the WEB recognition unit;
the WEB recognition unit is further adapted to call the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receive the WEB voice recognition result from the WEB speech recognition engine, integrate the local voice recognition result and the WEB voice recognition result into a final recognition result, and perform corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
11. A method of performing voice recognition processing in a browser, comprising the steps of:
receiving a user input instruction and setting the voice control mode of the browser to a command input mode or a text input mode;
receiving a user's voice input and converting it into voice data;
calling a browser-side local speech recognition engine to recognize the voice data and receiving a local voice recognition result; if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling a WEB speech recognition engine to recognize the voice data, receiving a WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode.
12. The method as claimed in claim 11, characterized in that the step of calling the browser-side local speech recognition engine to recognize the voice data, receiving the local voice recognition result, and, if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode comprises:
calling the local speech recognition engine to recognize the voice data and receiving the local voice recognition result;
in the case that the local voice recognition result is not empty, judging whether the current voice control mode is the text input mode; if so, considering recognition successful, entering the input text corresponding to the local voice recognition result into the browser, and ending the process; otherwise,
matching the local voice recognition result against the command set of the browser; if the match succeeds, performing a corresponding browser operation according to the local voice recognition result and ending the process.
13. The method as claimed in claim 12, characterized in that the step of calling the WEB speech recognition engine to recognize the voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode comprises:
calling the WEB speech recognition engine to recognize the voice data and receiving the WEB voice recognition result from the WEB speech recognition engine;
judging whether the current voice control mode is the text input mode; if so, entering the input text corresponding to the WEB voice recognition result into the browser; otherwise, matching the WEB voice recognition result against the command set of the browser according to the WEB matching instruction and performing a corresponding browser operation according to the WEB voice recognition result.
14. The method as claimed in claim 12 or 13, characterized in that the command set comprises the operating commands of the browser.
15. The method as claimed in claim 14, characterized in that, before calling the browser-side local speech recognition engine to recognize the voice data, the method further comprises the steps of:
training the user's voice data in advance and generating a training speech library containing training speech and corresponding input text or corresponding operating commands;
matching the voice data against the training speech in the training speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
16. The method as claimed in claim 15, characterized in that, before training the user's voice data in advance, the method further comprises the step of:
prompting the user to perform voice training on commonly used operating commands, commonly used addresses in the address bar, and/or commonly used search keywords in the search bar.
17. The method as claimed in claim 16, characterized in that, after generating the training speech library containing training speech and corresponding input text or corresponding operating commands, the method further comprises the steps of:
building a mapping table between the URLs of commonly used addresses entered in or recommended for the address bar and the corresponding web page titles;
wherein the web page titles also serve as input text in the training speech library and correspond to the corresponding training speech;
when the user performs voice input in the address bar, matching the user's input voice against the training speech to obtain the corresponding web page title;
opening the URL corresponding to the web page title in a web page tab.
18. The method as claimed in claim 15, characterized in that, before calling the browser-side local speech recognition engine to recognize the voice data, the method further comprises the steps of:
storing the historical speech that the system has recognized together with the corresponding input text or corresponding operating commands and generating a historical speech library;
matching the voice data against the historical speech in the historical speech library; if the match succeeds, entering the corresponding input text into the browser under the text input mode and ending the process, or performing a corresponding browser operation according to the corresponding operating command under the command input mode and ending the process; otherwise, executing the step of calling the local speech recognition engine to recognize the voice data.
19. The method as claimed in claim 18, characterized in that the training speech library and/or the historical speech library are bound to the corresponding browser client and stored, as personal information of the corresponding browser client, in a database on the browser side or the web server side.
20. The method as claimed in claim 11, characterized in that the steps of calling the browser-side local speech recognition engine to recognize the voice data, receiving the local voice recognition result, and, if recognition succeeds, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling the WEB speech recognition engine to recognize the voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, and performing corresponding browser-side processing according to the WEB voice recognition result under the command input mode or the text input mode, are further:
calling the browser-side local speech recognition engine to recognize the voice data and receiving the local voice recognition result; if recognition is completely successful, performing corresponding browser-side processing according to the local voice recognition result under the command input mode or the text input mode;
otherwise, calling the WEB speech recognition engine to recognize the unsuccessfully recognized voice data, receiving the WEB voice recognition result from the WEB speech recognition engine, integrating the local voice recognition result and the WEB voice recognition result into a final recognition result, and performing corresponding browser-side processing according to the final recognition result under the command input mode or the text input mode.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488924.1A CN103020165B (en) | 2012-11-26 | 2012-11-26 | Browser capable of performing voice recognition processing and processing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210488924.1A CN103020165B (en) | 2012-11-26 | 2012-11-26 | Browser capable of performing voice recognition processing and processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103020165A true CN103020165A (en) | 2013-04-03 |
CN103020165B CN103020165B (en) | 2016-06-22 |
Family
ID=47968769
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210488924.1A Active CN103020165B (en) | Browser capable of performing voice recognition processing and processing method | 2012-11-26 | 2012-11-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103020165B (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104134442A (en) * | 2014-08-15 | 2014-11-05 | 广东欧珀移动通信有限公司 | Method and device for starting voice services |
CN104575499A (en) * | 2013-10-09 | 2015-04-29 | 携程计算机技术(上海)有限公司 | Voice control method of mobile terminal and mobile terminal |
CN105096950A (en) * | 2014-05-22 | 2015-11-25 | 中兴通讯股份有限公司 | File naming method, file naming device and terminal |
WO2016058425A1 (en) * | 2014-10-17 | 2016-04-21 | 百度在线网络技术(北京)有限公司 | Voice search method, apparatus and device, and computer storage medium |
CN105632491A (en) * | 2014-11-26 | 2016-06-01 | 三星电子株式会社 | Method and electronic device for voice recognition |
CN105940399A (en) * | 2014-03-28 | 2016-09-14 | 松下知识产权经营株式会社 | Speech retrieval device, speech retrieval method, and display device |
CN106373574A (en) * | 2016-08-31 | 2017-02-01 | 乐视控股(北京)有限公司 | Speech recognition processing method and device |
CN106470199A (en) * | 2015-08-21 | 2017-03-01 | 石家庄市善理通益科技有限公司 | The processing method of speech data, device and intercom system |
CN108597522A (en) * | 2018-05-10 | 2018-09-28 | 北京奇艺世纪科技有限公司 | A kind of method of speech processing and device |
CN108831475A (en) * | 2018-05-24 | 2018-11-16 | 广州市千钧网络科技有限公司 | A kind of text message extracting method and system |
CN108881507A (en) * | 2018-03-01 | 2018-11-23 | 赵建文 | A kind of voice browser |
CN114895999A (en) * | 2022-03-31 | 2022-08-12 | 思必驰科技股份有限公司 | Interactive interface-based dialogue method and model |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1786957A (en) * | 2004-12-08 | 2006-06-14 | 国际商业机器公司 | Dynamic switching between local and remote speech rendering |
CN102520792A (en) * | 2011-11-30 | 2012-06-27 | 江苏奇异点网络有限公司 | Voice-type interaction method for network browser |
CN104240707A (en) * | 2012-11-26 | 2014-12-24 | 北京奇虎科技有限公司 | Browser and voice identification processing method for same |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1786957A (en) * | 2004-12-08 | 2006-06-14 | 国际商业机器公司 | Dynamic switching between local and remote speech rendering |
CN102520792A (en) * | 2011-11-30 | 2012-06-27 | 江苏奇异点网络有限公司 | Voice-type interaction method for network browser |
CN104240707A (en) * | 2012-11-26 | 2014-12-24 | 北京奇虎科技有限公司 | Browser and voice identification processing method for same |
Non-Patent Citations (3)
Title |
---|
YU YIBIAO ET AL.: "Speech Recognition Engine SDSE Based on the MIM Model and Its Application", Computer Engineering and Applications, 30 November 2002 (2002-11-30) *
YU YIBIAO ET AL.: "Design and Implementation of the Speech Recognition Browser VoiceIE", Journal of Data Acquisition and Processing, vol. 17, no. 1, 31 March 2002 (2002-03-31), pages 95-99 *
YE XIAORONG ET AL.: "Application of Speech Recognition Technology in Digital Library Retrieval Systems", Science & Technology Review, 30 June 2008 (2008-06-30), pages 75-79 *
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104575499A (en) * | 2013-10-09 | 2015-04-29 | 携程计算机技术(上海)有限公司 | Voice control method of mobile terminal and mobile terminal |
CN105940399B (en) * | 2014-03-28 | 2019-04-26 | 松下知识产权经营株式会社 | Voice search device, speech search method and display device |
CN105940399A (en) * | 2014-03-28 | 2016-09-14 | 松下知识产权经营株式会社 | Speech retrieval device, speech retrieval method, and display device |
CN105096950A (en) * | 2014-05-22 | 2015-11-25 | 中兴通讯股份有限公司 | File naming method, file naming device and terminal |
CN104134442A (en) * | 2014-08-15 | 2014-11-05 | 广东欧珀移动通信有限公司 | Method and device for starting voice services |
WO2016058425A1 (en) * | 2014-10-17 | 2016-04-21 | 百度在线网络技术(北京)有限公司 | Voice search method, apparatus and device, and computer storage medium |
CN105632491B (en) * | 2014-11-26 | 2020-07-21 | 三星电子株式会社 | Method and electronic device for speech recognition |
CN105632491A (en) * | 2014-11-26 | 2016-06-01 | 三星电子株式会社 | Method and electronic device for voice recognition |
CN106470199A (en) * | 2015-08-21 | 2017-03-01 | 石家庄市善理通益科技有限公司 | The processing method of speech data, device and intercom system |
CN106373574A (en) * | 2016-08-31 | 2017-02-01 | 乐视控股(北京)有限公司 | Speech recognition processing method and device |
CN108881507A (en) * | 2018-03-01 | 2018-11-23 | 赵建文 | A kind of voice browser |
CN108881507B (en) * | 2018-03-01 | 2021-08-31 | 赵建文 | System comprising voice browser and block chain voice DNS unit |
CN108597522A (en) * | 2018-05-10 | 2018-09-28 | 北京奇艺世纪科技有限公司 | A kind of method of speech processing and device |
CN108831475A (en) * | 2018-05-24 | 2018-11-16 | 广州市千钧网络科技有限公司 | A kind of text message extracting method and system |
CN108831475B (en) * | 2018-05-24 | 2020-09-29 | 广州市千钧网络科技有限公司 | Text message extraction method and system |
CN114895999A (en) * | 2022-03-31 | 2022-08-12 | 思必驰科技股份有限公司 | Interactive interface-based dialogue method and model |
CN114895999B (en) * | 2022-03-31 | 2023-09-19 | 思必驰科技股份有限公司 | Interaction interface-based dialogue method and system |
Also Published As
Publication number | Publication date |
---|---|
CN103020165B (en) | 2016-06-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102968992B (en) | Voice identification processing method for internet explorer and internet explorer | |
CN103020165A (en) | Browser capable of performing voice recognition processing and processing method | |
CN104240707A (en) | Browser and voice identification processing method for same | |
CN105027121B (en) | The five application page of the machine application is indexed | |
US11669579B2 (en) | Method and apparatus for providing search results | |
CN109190049B (en) | Keyword recommendation method, system, electronic device and computer readable medium | |
CN107590174B (en) | Page access method and device | |
CN102495855B (en) | Automatic login method and device | |
CN108268635B (en) | Method and apparatus for acquiring data | |
CN108334517A (en) | A kind of webpage rendering intent and relevant device | |
CN107368515A (en) | The application program page recommends method and system | |
CN102144228A (en) | Resource locator suggestions from input character sequence | |
CN106663108A (en) | Deep links for native applications | |
CN103577596A (en) | Keyword searching method and device based on current browse webpage | |
CN103500211A (en) | Method and system for providing search services and client side | |
CN101964044A (en) | Method and device for acquiring bar code from browser of mobile phone | |
CN107958078A (en) | Information generating method and device | |
CN103530329A (en) | Method and device for performing webpage loading and browsers | |
CN102932367A (en) | Terminal equipment, system and method for implementing switching of application program | |
CN102063425A (en) | Translation method and device | |
CN103067919A (en) | Building method and authenticating method for connection between computing device and mobile device | |
US9710558B2 (en) | Method and apparatus for navigational searching of a website | |
CN104699836A (en) | Multi-keyword search prompting method and multi-keyword search prompting device | |
CN105260034B (en) | A kind of character input method and device | |
CN107885872B (en) | Method and device for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right | | Effective date of registration: 20220714; Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015; Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park); Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Patentee before: Qizhi software (Beijing) Co.,Ltd. |