CN103839549A

CN103839549A - Voice instruction control method and system

Info

Publication number: CN103839549A
Application number: CN201210478777.XA
Authority: CN
Inventors: 曾亮; 陈磊; 薄川川; 邓朔; 郝宏伟
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2012-11-22
Filing date: 2012-11-22
Publication date: 2014-06-04

Abstract

The invention discloses a voice instruction control method and system. The method comprises: packing voice data received by a mobile terminal and sending it to a server; matching the voice data against training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal; and commanding the mobile terminal to execute a corresponding operation according to the content of the recognized speech text returned by the server. Because the voice data received by the mobile terminal is sent to the server, and the server determines the recognized speech text by matching the voice data against its training samples, speech recognition is more accurate, voice instruction accuracy is improved, and the user experience is improved.

Description

Voice instruction control method and system

[technical field]

The present invention relates to the field of voice control technology, and in particular to a voice instruction control method and system.

[background technology]

Siri is a key feature of the iPhone 4S: users can talk to the smartphone directly and give it instructions by voice. Since the release of the Chinese version of Siri, discussion of voice-based intelligent human-computer interaction (HCI) technology has never stopped. Voice Actions on the Android system also provides a very solid and reliable speech recognition engine with impressive accuracy, but it requires the user's input to follow a strict syntactic structure and form; otherwise the system cannot recognize it. Both Siri on the iPhone and Voice Actions on Android perform speech recognition only locally on the mobile terminal. Because of factors such as the usage environment, the user's pronunciation, and syntactic structure and form, the mobile terminal may misrecognize speech or fail to recognize it at all, which degrades the user experience.

Therefore, it is necessary to propose a new technical solution that solves the problem of existing speech recognition technology misrecognizing speech or failing to recognize it.

[summary of the invention]

One object of the present invention is to provide a voice instruction control method and system that solve the problem of existing speech recognition technology misrecognizing speech or failing to recognize it.

To achieve the above object, the invention provides a voice instruction control method, comprising:

Packing the voice data received by a mobile terminal and sending it to a server;

Matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal;

Commanding the mobile terminal to execute a corresponding operation according to the content of the recognized speech text returned by the server.

In the above voice instruction control method, before the step of packing the voice data received by the mobile terminal and sending it to the server, the method further comprises: entering an intelligent voice recognition interface through an intelligent voice entry; waiting for user voice input and determining whether valid voice input is detected within a valid time; if no valid voice input is detected within the valid time, ending this voice input; and if valid voice input is detected within the valid time, receiving the user's voice.

In the above method, the step of receiving the user's voice further comprises: determining whether an endpoint of the user's voice input is recognized or the input has timed out; if no endpoint is recognized and the input has not timed out, encoding the received voice data and continuing to receive the next segment of the user's voice; if an endpoint is recognized or the input has timed out, stopping receiving voice data and finishing encoding all the voice data.

In the above method, before the step of matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal, the method further comprises: receiving, by a cloud server, the encoded voice data, and decoding and denoising the encoded voice data.

In the above method, the step of matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal further comprises: attaching a control instruction according to the content of the speech text.

In the above method, before the step of commanding the mobile terminal to execute the corresponding operation according to the content of the recognized speech text returned by the server, the method further comprises: receiving the recognized speech text and parsing the control instruction, and commanding the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type, wherein the control instruction types include a plug-in application type, a local function type, a popular website type, and a search type.

The present invention also provides a voice instruction control system, comprising a mobile terminal and a server, wherein the mobile terminal comprises a data sending module and a command execution module, and the server comprises a data matching module and a data returning module:

Data sending module: for packing the received voice data and sending it to the server;

Command execution module: for commanding the mobile terminal to execute a corresponding operation according to the content of the recognized speech text returned by the server;

Data matching module: for matching the voice data sent by the mobile terminal against the training samples on the server and determining a suitable recognized speech text;

Data returning module: for returning the recognized speech text to the mobile terminal.

In the above voice instruction control system, the mobile terminal further comprises:

Interface entry module: for entering the intelligent voice recognition interface through the intelligent voice entry;

Speech detection module: for waiting for user voice input and determining whether valid voice input is detected within the valid time; if no valid voice input is detected within the valid time, ending this voice input; if valid voice input is detected within the valid time, receiving voice data through the voice receiving module;

Voice receiving module: for receiving the user's voice and determining whether an endpoint of the user's voice input is recognized or the input has timed out; if no endpoint is recognized and the input has not timed out, encoding the received voice data through the data encoding module while the voice receiving module continues to receive the next segment of the user's voice; if an endpoint is recognized or the input has timed out, stopping receiving voice data and finishing encoding all the voice data through the data encoding module;

Data encoding module: for encoding all the received voice data and sending the encoded voice data through the data sending module.

In the above system, the server further comprises a data receiving module: for receiving the encoded voice data sent by the mobile terminal, and decoding and denoising the encoded voice data.

In the above system, the data matching module is further configured to attach a control instruction according to the speech text content after determining a suitable recognized speech text.

In the above system, the mobile terminal further comprises a data parsing module: for receiving the recognized speech text returned by the server and parsing the control instruction; the command execution module commands the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type.

In the above system, the control instruction types include a plug-in application type, a local function type, a popular website type, and a search type.

With the voice instruction control method and system provided by the invention, the voice data received by the mobile terminal is sent to the server, and the server determines a suitable recognized speech text by matching the voice data against its training samples. This makes speech recognition more accurate, improves the accuracy of voice instructions, largely avoids misrecognition or recognition failure on the mobile terminal, and improves the user experience. In addition, by attaching a control instruction to the recognized speech text content, the invention classifies the operating functions of the mobile terminal, which further improves the accuracy of voice instructions.

To make the above content of the invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings:

[brief description of the drawings]

Fig. 1 is a flowchart of the voice instruction control method according to the first embodiment of the invention;

Fig. 2 is a flowchart of the voice instruction control method according to the second embodiment of the invention;

Fig. 3 is a schematic structural diagram of the voice instruction control system according to the first embodiment of the invention;

Fig. 4 is a schematic structural diagram of the voice instruction control system according to the second embodiment of the invention.

[embodiment]

The following description of the embodiments refers to the accompanying drawings to illustrate specific embodiments in which the invention may be practiced.

Referring to Fig. 1, which is a flowchart of the voice instruction control method according to the first embodiment of the invention, the method comprises the following steps:

Step S100: pack the voice data received by the mobile terminal and send it to the server;

Step S110: match the voice data against the training samples on the server, determine a suitable recognized speech text, and return the recognized speech text to the mobile terminal;

In step S110, the voice data input by the user is uploaded to the server and matched against the server's training samples, which makes speech recognition more accurate and largely avoids misrecognition or recognition failure on the mobile terminal;

Step S120: command the mobile terminal to execute the corresponding operation according to the content of the recognized speech text returned by the server.
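The three steps of the first embodiment form one round trip between terminal and server. The C sketch below wires them together in a single process under clearly stated assumptions: the network hop is omitted, `server_recognize()` is a stub that returns a fixed text instead of matching training samples, and every name in it (`VoicePacket`, `pack_voice_data`, `execute_text`, `voice_round_trip`) is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical packet produced by the mobile terminal (step S100). */
typedef struct {
    const unsigned char *audio; /* voice data received by the terminal */
    size_t length;              /* number of bytes */
} VoicePacket;

/* Step S100: pack the received voice data. A real client would also
 * serialize the packet and send it over the network. */
static VoicePacket pack_voice_data(const unsigned char *audio, size_t length) {
    VoicePacket packet = { audio, length };
    return packet;
}

/* Step S110 (stub): a real server matches the data against its training
 * samples; this stub returns a fixed text for any non-empty packet and
 * NULL (nothing recognized) otherwise. */
static const char *server_recognize(const VoicePacket *packet) {
    return packet->length > 0 ? "play music" : NULL;
}

/* Step S120: act on the returned text. Here the "operation" is only a
 * description written into `out`. Returns 0 on success, -1 if nothing
 * was recognized. */
static int execute_text(const char *text, char *out, size_t out_size) {
    if (text == NULL) {
        return -1;
    }
    snprintf(out, out_size, "execute: %s", text);
    return 0;
}

/* One full round trip: pack, recognize, execute. */
static int voice_round_trip(const unsigned char *audio, size_t length,
                            char *out, size_t out_size) {
    VoicePacket packet = pack_voice_data(audio, length);
    return execute_text(server_recognize(&packet), out, out_size);
}
```

Replacing the stub with a real network call changes only `server_recognize()`; the terminal-side steps stay the same.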

Referring to Fig. 2, which is a flowchart of the voice instruction control method according to the second embodiment of the invention, the method comprises the following steps:

Step S200: enter the intelligent voice recognition interface through the intelligent voice entry;

In step S200, the user can bring up the intelligent voice recognition interface by, for example, tapping the intelligent voice shortcut icon or long-pressing the toolbar for a certain time; see Fig. 3, which shows the intelligent voice recognition interface of the mobile terminal of the invention. In this embodiment, the toolbar long-press duration is more than 0.5 s and can be configured as needed.
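The entry rule of step S200 reduces to a small predicate. The sketch below assumes two hypothetical event kinds (`EVENT_SHORTCUT_TAP`, `EVENT_TOOLBAR_PRESS`) and takes the 0.5 s figure from the embodiment as a configurable default; the function name is also illustrative.

```c
#include <stdbool.h>

/* Hypothetical entry events: the embodiment names a tap on the voice
 * shortcut icon and a long press on the toolbar. */
typedef enum { EVENT_SHORTCUT_TAP, EVENT_TOOLBAR_PRESS } EntryEvent;

/* "Greater than 0.5 s" in this embodiment; configurable per device. */
#define LONG_PRESS_THRESHOLD_MS 500u

/* Returns true if the event should open the voice recognition interface. */
static bool should_open_voice_ui(EntryEvent event, unsigned press_ms,
                                 unsigned threshold_ms) {
    switch (event) {
    case EVENT_SHORTCUT_TAP:
        return true;                    /* a tap on the icon always opens it */
    case EVENT_TOOLBAR_PRESS:
        return press_ms > threshold_ms; /* only a long enough press opens it */
    }
    return false;                       /* not reached for valid events */
}
```

Passing the threshold as a parameter keeps the "can be configured as needed" behavior out of the constant.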

Step S210: wait for user voice input and determine whether valid voice input is detected within the valid time; if no valid voice input is detected within the valid time, go to step S220; if valid voice input is detected within the valid time, go to step S230;

In step S210, the valid time is the waiting time for voice input and can be configured as needed; in this embodiment it is set to 5 s. Voice input by the user within the valid time counts as valid voice input; otherwise, if the wait for voice input times out, this input ends.
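The source does not say how "valid voice input" is detected. A common stand-in is a frame-energy threshold, and the sketch below uses that assumption: it scans frames of 16-bit PCM until one exceeds the threshold or the 5 s window (counted in frames) runs out. The frame length, energy threshold and function names are illustrative, not taken from the patent.

```c
#include <stddef.h>

/* Assumed values: the embodiment uses a 5 s waiting window; the frame
 * length and energy threshold below are illustrative only. */
#define VALID_WINDOW_MS 5000u
#define FRAME_MS 20u
#define ENERGY_THRESHOLD 1000.0

/* Mean energy of one frame of 16-bit PCM samples. */
static double frame_energy(const short *samples, size_t count) {
    double sum = 0.0;
    for (size_t i = 0; i < count; i++) {
        sum += (double)samples[i] * samples[i];
    }
    return count ? sum / (double)count : 0.0;
}

/* Returns the index of the first frame whose energy exceeds the threshold,
 * or -1 if none appears within the valid window (step S220: end input). */
static long wait_for_valid_input(const short *pcm, size_t frames,
                                 size_t samples_per_frame) {
    size_t max_frames = VALID_WINDOW_MS / FRAME_MS; /* 250 frames */
    for (size_t f = 0; f < frames && f < max_frames; f++) {
        if (frame_energy(pcm + f * samples_per_frame, samples_per_frame) >
            ENERGY_THRESHOLD) {
            return (long)f;
        }
    }
    return -1;
}
```

Speech that starts after the window (frame 250 or later with these values) is ignored, matching the "wait timeout ends this input" rule.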

Step S220: end this voice input;

Step S230: receive the user's voice and determine whether an endpoint of the user's voice input is recognized or the input has timed out; if no endpoint is recognized and the input has not timed out, go to step S240; if an endpoint is recognized or the input has timed out, go to step S250;

In step S230, recognizing an endpoint of the user's voice input means that the pause after the user finishes a word or sentence satisfies the endpoint recognition condition, which can be set for different situations, for example 5 s or 10 s. If an endpoint is recognized or the input times out, this voice input is considered complete by default; otherwise the user can continue speaking.
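Steps S230 to S250 form a capture loop with two exit conditions. The sketch below assumes the terminal already has a per-frame speech/silence flag (for example from an energy check) and counts durations in frames; `CaptureConfig`, `capture_until_endpoint` and the "encode by counting" placeholder are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative parameters (assumptions): the embodiment only says the
 * endpoint pause is configurable (e.g. 5 s or 10 s) and that input can
 * time out. Durations are counted in frames here. */
typedef struct {
    size_t endpoint_silence_frames; /* pause that marks the end of input */
    size_t max_input_frames;        /* overall input timeout */
} CaptureConfig;

typedef enum { CAPTURE_ENDPOINT, CAPTURE_TIMEOUT, CAPTURE_EXHAUSTED } CaptureEnd;

/* Consumes per-frame voice-activity flags (true = speech). Every frame
 * received before a stop condition is handed to the encoder (step S240);
 * here "encoding" is only counted in *encoded_frames. */
static CaptureEnd capture_until_endpoint(const bool *is_speech, size_t frames,
                                         const CaptureConfig *cfg,
                                         size_t *encoded_frames) {
    size_t silence_run = 0;
    *encoded_frames = 0;
    for (size_t f = 0; f < frames; f++) {
        if (f >= cfg->max_input_frames) {
            return CAPTURE_TIMEOUT;      /* step S250 via input timeout */
        }
        silence_run = is_speech[f] ? 0 : silence_run + 1;
        if (silence_run >= cfg->endpoint_silence_frames) {
            return CAPTURE_ENDPOINT;     /* step S250 via endpoint pause */
        }
        (*encoded_frames)++;             /* step S240: encode, keep going */
    }
    return CAPTURE_EXHAUSTED;            /* audio ran out first */
}
```

Resetting `silence_run` on every speech frame means only a continuous pause counts as an endpoint, so short gaps between words do not end the input.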

Step S240: encode the received voice data, and return to step S230 to continue receiving the next segment of the user's voice;

Step S250: stop receiving voice data and finish encoding all the voice data;

Step S260: pack all the encoded voice data and send it to the server through an HTTP request;
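The source specifies only that the packed data travels in an HTTP request. A minimal sketch, assuming a plain HTTP/1.1 POST with a binary body, builds the request bytes in a caller-supplied buffer; the host, path, content type and function name are hypothetical, and a real client would normally use an HTTP library instead.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Builds an HTTP/1.1 POST request (header followed by the encoded voice
 * bytes) in `buf`. Host, path and content type are hypothetical. Returns
 * the total request length in bytes, or -1 if `buf` is too small. The
 * result is binary-safe and is not NUL-terminated after the body. */
static int build_voice_request(char *buf, size_t buf_size, const char *host,
                               const char *path, const unsigned char *body,
                               size_t body_len) {
    int n = snprintf(buf, buf_size,
                     "POST %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Content-Type: application/octet-stream\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",
                     path, host, body_len);
    /* Reject a truncated header or a body that does not fit after it. */
    if (n < 0 || (size_t)n >= buf_size || body_len > buf_size - (size_t)n) {
        return -1;
    }
    memcpy(buf + n, body, body_len);
    return n + (int)body_len;
}
```

Setting `Content-Length` explicitly lets the server know where the encoded audio ends without relying on connection close.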

Step S270: the cloud server receives the encoded voice data, decodes it, and denoises it;
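Neither the codec nor the denoising method is specified. As a stand-in only, the sketch below applies a simple noise gate to decoded 16-bit PCM: it assumes the first `lead_in` samples are background noise, estimates the mean noise magnitude from them, and zeroes samples that are not louder than `factor` times that level. Production systems typically use spectral methods; this is not the patent's method.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simple noise gate (a stand-in for the unspecified denoising step).
 * Assumes the first `lead_in` samples contain background noise only.
 * Samples whose magnitude is at most `factor` times the mean noise
 * magnitude are zeroed in place. Returns how many samples were zeroed. */
static size_t noise_gate(short *pcm, size_t count, size_t lead_in, int factor) {
    if (lead_in == 0 || lead_in > count) {
        return 0; /* no noise estimate available */
    }
    long noise_sum = 0;
    for (size_t i = 0; i < lead_in; i++) {
        noise_sum += labs((long)pcm[i]);
    }
    long threshold = factor * (noise_sum / (long)lead_in);
    size_t zeroed = 0;
    for (size_t i = 0; i < count; i++) {
        if (pcm[i] != 0 && labs((long)pcm[i]) <= threshold) {
            pcm[i] = 0;
            zeroed++;
        }
    }
    return zeroed;
}
```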

Step S280: match the decoded voice data against the training samples on the server, determine a suitable recognized speech text, and attach a control instruction according to the speech text content;

In step S280, the voice data input by the user is uploaded to the server and matched against the server's training samples, which makes speech recognition more accurate and largely avoids misrecognition or recognition failure on the mobile terminal. The control instruction is produced by the cloud server when it determines the recognized speech text: based on the specific content of the speech text, the text is mapped to a common operation instruction supported by the client, and the client can then command the mobile terminal to execute the corresponding operation according to the control instruction type of the speech text, for example playing music, sending a text message, making a phone call, or opening a web page. Some misrecognition may occur, but as the mapping is continually corrected using the usage results of a large number of users, the instruction becomes increasingly accurate.
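The source does not describe how the server maps text to a control instruction, only that the mapping improves with usage data. A minimal keyword-table sketch, assuming substring matching and using the examples listed for step S300 as hypothetical keyword lists, is shown below; the enum is redeclared with conventional spellings so the block compiles on its own.

```c
#include <stddef.h>
#include <string.h>

typedef enum {
    VoiceControlCmdUnknown = 0x0,
    VoiceControlCmdSearch,
    VoiceControlCmdPlugin,
    VoiceControlCmdLocalApp,
    VoiceControlCmdWebSite
} VoiceControlCmd;

/* Hypothetical keyword tables built from the examples in the description;
 * a deployed server would use larger, data-driven tables. */
static const char *const kPluginKeywords[] = { "music plug-in", "QR code" };
static const char *const kLocalKeywords[]  = { "open bookmarks", "clear all data" };
static const char *const kSiteKeywords[]   = { "Tencent homepage", "Sina website" };

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Returns 1 if `text` contains any of the `n` keywords. */
static int contains_any(const char *text, const char *const *keys, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (strstr(text, keys[i]) != NULL) {
            return 1;
        }
    }
    return 0;
}

/* Attaches a control instruction to recognized text. Text that matches none
 * of the three explicit types falls back to search, as in the description;
 * empty text is reported as unknown (an assumption). */
static VoiceControlCmd classify_text(const char *text) {
    if (text == NULL || text[0] == '\0') {
        return VoiceControlCmdUnknown;
    }
    if (contains_any(text, kPluginKeywords, COUNT(kPluginKeywords))) {
        return VoiceControlCmdPlugin;
    }
    if (contains_any(text, kLocalKeywords, COUNT(kLocalKeywords))) {
        return VoiceControlCmdLocalApp;
    }
    if (contains_any(text, kSiteKeywords, COUNT(kSiteKeywords))) {
        return VoiceControlCmdWebSite;
    }
    return VoiceControlCmdSearch;
}
```

Checking the explicit types before falling back to search mirrors the rule that search covers "other speech texts that do not belong to the above three types".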

Step S290: return the speech text and the control instruction to the mobile terminal;

Step S300: receive the speech text, parse the control instruction, and command the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type;

In step S300, the control instruction types include a plug-in application type, a local function type, a popular website type, a search type, and so on. If the control instruction type is the plug-in application type, the corresponding application is opened according to the speech text content, such as "music plug-in" or "QR code"; if it is the local function type, the corresponding local function is invoked according to the speech text content, such as "open bookmarks" or "clear all data"; if it is the popular website type, the corresponding web page is opened according to the speech text content, such as "Tencent homepage" or "Sina website". Any other speech text that does not belong to these three types is treated as the search type, and the mobile terminal's current search engine is used directly to search for the speech text. The key data structure is as follows:

typedef enum {
    VoiceControlCmdUnknown = 0x0,
    VoiceControlCmdSearch,
    VoiceControlCmdPlugin,
    VoiceControlCmdLocalApp,
    VoiceControlCmdWebSite
} VoiceControlCmd; // voice control type

typedef struct {
    char *text;                 // speech recognition text
    VoiceControlCmd controlCmd; // control type
} VoiceControlResult;           // struct name not given in the source; illustrative
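On the terminal side, step S300 is a switch over the control type. The sketch below assumes the struct is named `VoiceControlResult` (the source does not name it) and stores the text as `const char *`; the action prefixes such as `open-plugin:` are hypothetical stand-ins for real calls that open a plug-in, invoke a local function, load a page or run a search.

```c
#include <stdio.h>
#include <string.h>

typedef enum {
    VoiceControlCmdUnknown = 0x0,
    VoiceControlCmdSearch,
    VoiceControlCmdPlugin,
    VoiceControlCmdLocalApp,
    VoiceControlCmdWebSite
} VoiceControlCmd;

/* Result returned by the server; the struct name is illustrative. */
typedef struct {
    const char *text;           /* speech recognition text */
    VoiceControlCmd controlCmd; /* control type */
} VoiceControlResult;

/* Writes the action the terminal would take into `out`. The prefixes are
 * placeholders for the real operations. */
static void dispatch(const VoiceControlResult *r, char *out, size_t out_size) {
    switch (r->controlCmd) {
    case VoiceControlCmdPlugin:
        snprintf(out, out_size, "open-plugin:%s", r->text);
        break;
    case VoiceControlCmdLocalApp:
        snprintf(out, out_size, "local-function:%s", r->text);
        break;
    case VoiceControlCmdWebSite:
        snprintf(out, out_size, "open-page:%s", r->text);
        break;
    case VoiceControlCmdSearch:
    case VoiceControlCmdUnknown:
    default:
        /* Everything else goes to the terminal's current search engine
         * (treating Unknown as search is an assumption). */
        snprintf(out, out_size, "search:%s", r->text);
        break;
    }
}
```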

Referring to Fig. 3, which is a schematic structural diagram of the voice instruction control system according to the first embodiment of the invention, the system comprises a mobile terminal and a server; the mobile terminal comprises a data sending module and a command execution module, and the server comprises a data matching module and a data returning module, wherein:

Data matching module: for matching the voice data sent by the mobile terminal against the training samples on the server and determining a suitable recognized speech text; uploading the user's voice data to the server and matching it against the server's training samples makes speech recognition more accurate and largely avoids misrecognition or recognition failure on the mobile terminal;

Data returning module: for returning the recognized speech text to the mobile terminal;
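The description does not say how the data matching module compares voice data with training samples. Purely as an illustration, the sketch below swaps in nearest-template matching: each hypothetical training sample is a short feature vector paired with its transcript, the closest sample by squared Euclidean distance wins, and a distance limit turns poor matches into "unrecognized". Real recognizers use acoustic and language models; `TrainingSample`, `FEATURE_DIM` and the threshold are assumptions.

```c
#include <float.h>
#include <stddef.h>

#define FEATURE_DIM 4 /* illustrative feature length */

/* A hypothetical training sample: a feature vector and its transcript. */
typedef struct {
    double features[FEATURE_DIM];
    const char *text;
} TrainingSample;

static double squared_distance(const double *a, const double *b) {
    double d = 0.0;
    for (size_t i = 0; i < FEATURE_DIM; i++) {
        double diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

/* Returns the transcript of the closest training sample, or NULL when even
 * the best match exceeds `max_sq_distance` (treated as unrecognized). */
static const char *match_training_samples(const double *query,
                                          const TrainingSample *samples,
                                          size_t count, double max_sq_distance) {
    const char *best_text = NULL;
    double best = DBL_MAX;
    for (size_t i = 0; i < count; i++) {
        double d = squared_distance(query, samples[i].features);
        if (d < best) {
            best = d;
            best_text = samples[i].text;
        }
    }
    return best <= max_sq_distance ? best_text : NULL;
}
```

The rejection threshold is what lets the server report "no match" rather than forcing the nearest, possibly wrong, transcript.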

Referring to Fig. 4, which is a schematic structural diagram of the voice instruction control system according to the second embodiment of the invention, the system comprises a mobile terminal and a server; the mobile terminal comprises an interface entry module, a speech detection module, a voice receiving module, a data encoding module, a data sending module, a data parsing module, and a command execution module, and the server comprises a data receiving module, a data matching module, and a data returning module, wherein:

Interface entry module: for entering the intelligent voice recognition interface through the intelligent voice entry. The user can bring up the intelligent voice recognition interface by, for example, tapping the intelligent voice shortcut icon or long-pressing the toolbar for a certain time; see Fig. 3, which shows the intelligent voice recognition interface of the mobile terminal of the invention. In this embodiment, the toolbar long-press duration is more than 0.5 s and can be configured as needed.

Speech detection module: for waiting for user voice input and determining whether valid voice input is detected within the valid time; if no valid voice input is detected within the valid time, ending this voice input; if valid voice input is detected within the valid time, receiving voice data through the voice receiving module. The valid time is the waiting time for voice input and can be configured as needed; in this embodiment it is set to 5 s. Voice input by the user within the valid time counts as valid voice input; otherwise, if the wait for voice input times out, this input ends.

Voice receiving module: for receiving the user's voice and determining whether an endpoint of the user's voice input is recognized or the input has timed out; if no endpoint is recognized and the input has not timed out, encoding the received voice data through the data encoding module while the voice receiving module continues to receive the next segment of the user's voice; if an endpoint is recognized or the input has timed out, stopping receiving voice data and finishing encoding all the voice data through the data encoding module. Recognizing an endpoint means that the pause after the user finishes a word or sentence satisfies the endpoint recognition condition, which can be set for different situations, for example 5 s or 10 s. If an endpoint is recognized or the input times out, this voice input is considered complete by default; otherwise the user can continue speaking.

Data encoding module: for encoding all the received voice data and sending the encoded voice data through the data sending module;

Data sending module: for packing all the encoded voice data and sending it to the server through an HTTP request;

Data parsing module: for receiving the recognized speech text returned by the server and parsing the control instruction;

Command execution module: for commanding the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type. The control instruction types include a plug-in application type, a local function type, a popular website type, a search type, and so on. If the control instruction type is the plug-in application type, the corresponding application is opened according to the speech text content, such as "music plug-in" or "QR code"; if it is the local function type, the corresponding local function is invoked according to the speech text content, such as "open bookmarks" or "clear all data"; if it is the popular website type, the corresponding web page is opened according to the speech text content, such as "Tencent homepage" or "Sina website". Any other speech text that does not belong to these three types is treated as the search type, and the mobile terminal's current search engine is used directly to search for the speech text. The key data structure is as follows:

typedef enum {
    VoiceControlCmdUnknown = 0x0,
    VoiceControlCmdSearch,
    VoiceControlCmdPlugin,
    VoiceControlCmdLocalApp,
    VoiceControlCmdWebSite
} VoiceControlCmd; // voice control type

typedef struct {
    char *text;                 // speech recognition text
    VoiceControlCmd controlCmd; // control type
} VoiceControlResult;           // struct name not given in the source; illustrative

Data receiving module: for receiving the encoded voice data sent by the mobile terminal, and decoding and denoising the encoded voice data;

Data matching module: for matching the decoded voice data against the training sample results on the server, determining a suitable recognized speech text, and attaching a control instruction according to the speech text content. Uploading the user's voice data to the server and matching it against the server's training samples makes speech recognition more accurate and largely avoids misrecognition or recognition failure on the mobile terminal. The control instruction is produced by the cloud server when it determines the recognized speech text: based on the specific content of the speech text, the text is mapped to a common operation instruction supported by the client, and the client can then command the mobile terminal to execute the corresponding operation according to the control instruction type of the speech text, for example playing music, sending a text message, making a phone call, or opening a web page. Some misrecognition may occur, but as the mapping is continually corrected using the usage results of a large number of users, the instruction becomes increasingly accurate.

Data returning module: for returning the speech text and the control instruction to the mobile terminal;
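The terminal-side modules of the second system embodiment can be wired as interchangeable components. The sketch below assumes each module is a function pointer in a hypothetical `TerminalModules` table and runs them in the order the description gives, returning a negative code for the stage that stopped the flow; the stubs at the end are test doubles for illustration, not real implementations, and the server side is omitted.

```c
#include <stddef.h>

/* Hypothetical interfaces for the terminal-side modules; each is a
 * function pointer so a real implementation or a test double fits. */
typedef struct {
    int (*enter_interface)(void);  /* interface entry module */
    int (*detect_speech)(void);    /* speech detection: 0 = valid input */
    size_t (*receive_and_encode)(unsigned char *buf, size_t cap); /* receiving + encoding */
    int (*send)(const unsigned char *buf, size_t len);            /* data sending */
    int (*parse_and_execute)(void); /* data parsing + command execution */
} TerminalModules;

/* Runs the modules in order. Returns 0 on success or a negative code
 * naming the stage that stopped the flow. */
static int run_terminal(const TerminalModules *m, unsigned char *buf, size_t cap) {
    if (m->enter_interface() != 0) return -1;
    if (m->detect_speech() != 0) return -2;   /* no valid input: end */
    size_t len = m->receive_and_encode(buf, cap);
    if (len == 0) return -3;                  /* nothing captured */
    if (m->send(buf, len) != 0) return -4;
    if (m->parse_and_execute() != 0) return -5;
    return 0;
}

/* Minimal test doubles, for illustration only. */
static int stub_ok(void) { return 0; }
static int stub_no_speech(void) { return 1; }
static size_t stub_receive(unsigned char *buf, size_t cap) {
    if (cap == 0) return 0;
    buf[0] = 0x7f; /* one byte of pretend encoded audio */
    return 1;
}
static int stub_send(const unsigned char *buf, size_t len) {
    (void)buf;
    return len > 0 ? 0 : 1;
}
```

Keeping the modules behind pointers makes it easy to swap the stubbed `send` for a real HTTP client without touching the control flow.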

With the voice instruction control method and system provided by the invention, the voice data received by the mobile terminal is sent to the server; the server matches the voice data against its training samples, determines a suitable recognized speech text, and returns it to the mobile terminal, which then executes the corresponding operation. This makes speech recognition more accurate, largely avoids misrecognition or recognition failure on the mobile terminal, and improves the user experience. In addition, by attaching a control instruction to the recognized speech text content, the invention classifies the operating functions of the mobile terminal, which improves the accuracy of voice instructions.

In summary, although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the invention. Those of ordinary skill in the art can make various changes and modifications without departing from the spirit and scope of the invention; therefore, the scope of protection of the invention is defined by the claims.

Claims

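1. A voice instruction control method, comprising:

Packing the voice data received by a mobile terminal and sending it to a server;

Matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal;

Commanding the mobile terminal to execute a corresponding operation according to the content of the recognized speech text returned by the server.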

2. The voice instruction control method according to claim 1, characterized in that, before the step of packing the voice data received by the mobile terminal and sending it to the server, the method further comprises: entering an intelligent voice recognition interface through an intelligent voice entry; waiting for user voice input and determining whether valid voice input is detected within a valid time; if no valid voice input is detected within the valid time, ending this voice input; and if valid voice input is detected within the valid time, receiving the user's voice.

3. The voice instruction control method according to claim 2, characterized in that the step of receiving the user's voice further comprises: determining whether an endpoint of the user's voice input is recognized or the input has timed out; if no endpoint is recognized and the input has not timed out, encoding the received voice data and continuing to receive the next segment of the user's voice; if an endpoint is recognized or the input has timed out, stopping receiving voice data and finishing encoding all the voice data.

4. The voice instruction control method according to claim 3, characterized in that, before the step of matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal, the method further comprises: receiving, by a cloud server, the encoded voice data, and decoding and denoising the encoded voice data.

5. The voice instruction control method according to claim 1, characterized in that the step of matching the voice data against the training samples on the server, determining a suitable recognized speech text, and returning the recognized speech text to the mobile terminal further comprises: attaching a control instruction according to the content of the speech text.

6. The voice instruction control method according to claim 1 or 5, characterized in that, before the step of commanding the mobile terminal to execute the corresponding operation according to the content of the recognized speech text returned by the server, the method further comprises: receiving the recognized speech text and parsing the control instruction, and commanding the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type, wherein the control instruction types include a plug-in application type, a local function type, a popular website type, and a search type.

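7. A voice instruction control system, characterized by comprising a mobile terminal and a server, wherein the mobile terminal comprises a data sending module and a command execution module, and the server comprises a data matching module and a data returning module:

Data sending module: for packing the received voice data and sending it to the server;

Command execution module: for commanding the mobile terminal to execute a corresponding operation according to the content of the recognized speech text returned by the server;

Data matching module: for matching the voice data sent by the mobile terminal against the training samples on the server and determining a suitable recognized speech text;

Data returning module: for returning the recognized speech text to the mobile terminal.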

8. The voice instruction control system according to claim 7, characterized in that the mobile terminal further comprises:

9. The voice instruction control system according to claim 8, characterized in that the mobile terminal further comprises:

10. The voice instruction control system according to claim 9, characterized in that the server further comprises a data receiving module: for receiving the encoded voice data sent by the mobile terminal, and decoding and denoising the encoded voice data.

11. The voice instruction control system according to claim 7, characterized in that the data matching module is further configured to attach a control instruction according to the speech text content after determining a suitable recognized speech text.

12. The voice instruction control system according to claim 7 or 11, characterized in that the mobile terminal further comprises a data parsing module: for receiving the recognized speech text returned by the server and parsing the control instruction; the command execution module commands the mobile terminal to execute the operation corresponding to the speech text content according to the control instruction type.

13. The voice instruction control system according to claim 12, characterized in that the control instruction types include a plug-in application type, a local function type, a popular website type, and a search type.