WO2016112644A1 - Voice control method, apparatus, and terminal - Google Patents

Voice control method, apparatus, and terminal

Info

Publication number
WO2016112644A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
execution
instruction
user
model
Prior art date
Application number
PCT/CN2015/082221
Other languages
French (fr)
Chinese (zh)
Inventor
党松
Original Assignee
ZTE Corporation (中兴通讯股份有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2016112644A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/26 - Speech to text systems


Abstract

A voice control method, apparatus, and terminal. The method includes: during a voice control process, learning operations demonstrated by a user to obtain a voice response execution model (S101), the voice response execution model including execution instructions corresponding to performing the operations; associating the learned voice response execution model with a corresponding voice instruction (S102); and, when the voice instruction input by the user is received, triggering execution of the execution instructions in the voice response execution model associated with that voice instruction (S103), so that a corresponding function application on the terminal is started.

Description

Voice control method, apparatus, and terminal
Technical field
This document relates to the field of communications, and in particular to a voice control method, apparatus, and terminal.
Background
With the popularization of smart terminals such as smartphones and iPads, terminals equipped with voice interaction assistants have become common, for example Google Now on Android and Siri on Apple systems. Relying on powerful database support, such voice interaction systems can hold simple "conversations" with people; although entertaining, their practical value varies from person to person. For example, in current voice interaction systems, when you say "I am hungry", the system searches for nearby places to eat according to a fixed pattern. However, a user who says "I am hungry" may simply want to call home and ask what is being cooked for dinner, not look for a restaurant. As another example, when encountering a dangerous situation, a user may need to quietly use the mobile phone to call the police, notify family members, and report his or her location, but the voice interaction functions of the related art clearly cannot let the user secretly control the phone to complete such a coherent sequence of actions. These problems arise because current voice interaction assistants "communicate" with people only through programs and patterns that have already been fixed, and therefore cannot meet the ever-changing individual needs of different users.
Summary of the invention
This document provides a voice control method, apparatus, and terminal to solve the problem that the voice interaction of the related art can only implement fixed functions through fixed voice instructions and cannot meet the individualized needs of different users.
A voice control method includes:
learning operations demonstrated by a user to obtain a voice response execution model, the voice response execution model including execution instructions corresponding to performing the operations;
associating the voice response execution model with a corresponding voice instruction;
after receiving the voice instruction, executing the execution instructions in the voice response execution model associated with the voice instruction.
In an embodiment of the present invention, the voice instruction is a private voice instruction recorded by the user or a standard voice instruction preset in the terminal.
In an embodiment of the present invention, when the voice instruction is a private voice instruction, acquiring the private voice instruction includes:
collecting a voice instruction input by the user before or after learning the operations demonstrated by the user to obtain the voice response execution model;
performing acoustic feature extraction on the collected voice instruction to obtain the corresponding private voice instruction.
In an embodiment of the present invention, the operations demonstrated by the user include one operation controlling a single application, multiple consecutive operations controlling a single application, or operations controlling at least two applications.
In an embodiment of the present invention, learning the operations demonstrated by the user to obtain the voice response execution model includes:
recording the operations demonstrated by the user;
converting each demonstrated operation into a corresponding execution instruction;
fixing the execution order of the execution instructions according to the execution order of the operations to obtain the voice response execution model.
In an embodiment of the present invention, the execution instructions include execution request instructions and execution response instructions.
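As a concrete illustration of the structures described above, the execution instructions (a request and a response, each with parameters) and the voice response execution model (an ordered list of such instructions) could be represented roughly as in the following Python sketch; the class and field names here are illustrative assumptions rather than structures defined by this document.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class ExecutionInstruction:
        # One learned step: an execution request and its expected response, each with parameters.
        level: str                                  # "SYS_LEVEL" or "APP_LEVEL"
        request: str                                # e.g. "OPEN_APP_Req"
        request_params: Dict[str, str] = field(default_factory=dict)
        response: str = ""                          # e.g. "OPEN_APP_Res"
        response_params: Dict[str, str] = field(default_factory=dict)

    @dataclass
    class VoiceResponseExecutionModel:
        # An ordered ("solidified") list of execution instructions.
        instructions: List[ExecutionInstruction] = field(default_factory=list)

    # Association of a voice instruction with its model.
    model_registry: Dict[str, VoiceResponseExecutionModel] = {}
    model_registry["my voice instruction"] = VoiceResponseExecutionModel([
        ExecutionInstruction("SYS_LEVEL", "WAKE_UP_Req", {}, "WAKE_UP_Res", {"result": "OK"}),
        ExecutionInstruction("SYS_LEVEL", "OPEN_APP_Req", {"app": "Call"}, "OPEN_APP_Res", {"result": "OK"}),
    ])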
A voice control apparatus includes a model establishing module, an association module, and an execution module:
the model establishing module is configured to learn operations demonstrated by a user to obtain a voice response execution model, the voice response execution model including execution instructions corresponding to performing the operations;
the association module is configured to associate the voice response execution model with a corresponding voice instruction;
the execution module is configured to, after receiving the voice instruction, execute the execution instructions in the voice response execution model associated with the voice instruction.
In an embodiment of the present invention, the apparatus further includes a voice acquisition module and a voice processing module;
the voice acquisition module is configured to collect a voice instruction input by the user before or after the model establishing module learns the operations demonstrated by the user to obtain the voice response execution model;
the voice processing module is configured to perform acoustic feature extraction on the collected voice instruction to obtain the corresponding private voice instruction.
In an embodiment of the present invention, the model establishing module includes a recording submodule, an analysis submodule, and a solidification submodule;
the recording submodule is configured to record the operations demonstrated by the user;
the analysis submodule is configured to convert each demonstrated operation into a corresponding execution instruction;
the solidification submodule is configured to fix the execution order of the execution instructions according to the execution order of the operations to obtain the voice response execution model.
A terminal includes the voice control apparatus described above.
A computer-readable storage medium stores computer-executable instructions for performing any of the methods described above.
With the voice control method, apparatus, and terminal provided by the embodiments of the present invention, operations demonstrated by the user can be learned during the voice control process to obtain a voice response execution model, the voice response execution model including execution instructions corresponding to performing each operation; the learned voice response execution model is then associated with a corresponding voice instruction. After the voice instruction input by the user is received, the execution instructions in the voice response execution model associated with that voice instruction are triggered, thereby starting the corresponding function application on the terminal. The embodiments of the present invention can therefore learn operations demonstrated by the user (different users can customize different operations to implement different functions) and obtain corresponding voice response execution models; that is, different users can privately customize the voice control function according to their own needs, so the individualized needs of different users can be met. The fixed mode of voice interaction with the terminal in the related art is abandoned, the user's personalized experience is enhanced, and the user's privatization needs are better satisfied.
Brief description of the drawings
FIG. 1 is a schematic flowchart of a voice control method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of acquiring a user's private voice instruction according to Embodiment 1 of the present invention;
FIG. 3 is a schematic flowchart of learning operations demonstrated by a user according to Embodiment 1 of the present invention;
FIG. 4 is a schematic structural diagram of a voice control apparatus according to Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of another voice control apparatus according to Embodiment 2 of the present invention;
FIG. 6 is a schematic structural diagram of a model establishing module of a voice control apparatus according to Embodiment 2 of the present invention;
FIG. 7 is a schematic flowchart of a user's private customization process according to Embodiment 3 of the present invention;
FIG. 8 is a schematic flowchart of a user triggering a customized function by voice according to Embodiment 3 of the present invention.
Embodiments of the present invention
Embodiments of the present invention are described below with reference to the accompanying drawings.
Embodiment 1:
With the voice control method provided in this embodiment, the terminal can learn operations demonstrated by the user and obtain a corresponding voice response execution model; that is, different users can privately customize the voice control function according to their own needs, thereby obtaining a terminal that best "understands" the user and best "obeys" the user. The fixed mode of voice interaction with the terminal is abandoned, which enhances the user's personalized experience while better meeting the user's privatization needs. Referring to FIG. 1, the voice control method in this embodiment includes:
Step 101: learning operations demonstrated by the user to obtain a voice response execution model, the voice response execution model including execution instructions corresponding to performing each demonstrated operation;
Step 102: associating the obtained voice response execution model with a corresponding voice instruction;
Step 103: after receiving the voice instruction input by the user, executing the execution instructions in the voice response execution model associated with the voice instruction, so as to start the corresponding function application on the terminal.
It should be understood that the voice instruction in step 102 may be a private voice instruction recorded by the user or a standard voice instruction preset in the terminal; the standard voice instruction may be preset before the terminal leaves the factory or downloaded from a corresponding network platform. It should also be understood that, in this embodiment, one voice instruction may be associated with one voice response execution model, or multiple voice instructions may be associated with one voice response execution model; that is, a single voice response execution model may be triggered by any of several voice instructions. For example, the four voice instructions "呼叫" (call), "拨打" (dial), "call", and "发起" (initiate) may all be associated with one voice response execution model used to implement a calling function.
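A many-to-one association of voice instructions with a single voice response execution model can be kept in a simple lookup table, as in the minimal sketch below; the model identifier and function name are hypothetical.

    from typing import Optional

    calling_model_id = "calling_model"                 # hypothetical identifier of one execution model
    voice_to_model = {
        "call": calling_model_id,                      # several voice instructions trigger the same model
        "dial": calling_model_id,
        "initiate": calling_model_id,
    }

    def lookup_model(voice_instruction: str) -> Optional[str]:
        # Returns the identifier of the associated model, or None if the instruction is unknown.
        return voice_to_model.get(voice_instruction)

    assert lookup_model("dial") == "calling_model"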
When the voice instruction in step 102 is a private voice instruction recorded by the user, the terminal is more secure to use: first, other users do not know whether the private voice instruction enables a terminal function at all, nor which function it enables; second, the private voice instruction can also be bound to the user, which further improves security. Referring to FIG. 2, the process of acquiring the user's private voice instruction in this embodiment includes:
Step 201: before or after learning the operations demonstrated by the user to obtain the voice response execution model, collecting the voice instruction input by the user; the user's voice may be captured by a device such as the microphone (MIC) built into the terminal;
Step 202: performing acoustic feature extraction on the collected voice instruction to obtain the corresponding private voice instruction, and saving it.
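Steps 201 and 202 could be prototyped along the following lines. This is a minimal sketch that assumes the open-source librosa and numpy libraries and a crude averaged-MFCC comparison; a practical terminal would use a more robust keyword-spotting or speaker-verification method, but the division into collection, feature extraction, and later matching would be similar.

    import numpy as np
    import librosa

    def extract_acoustic_features(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
        # Step 202: derive a compact acoustic-feature template from a recorded voice instruction.
        signal, sample_rate = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=n_mfcc)
        return mfcc.mean(axis=1)                     # average over time to obtain one template vector

    def matches(template: np.ndarray, candidate: np.ndarray, threshold: float = 25.0) -> bool:
        # Crude later check of whether a newly collected instruction matches the saved template.
        return float(np.linalg.norm(template - candidate)) < threshold

    # Usage: template = extract_acoustic_features("private_instruction.wav")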
In this embodiment, the user can customize a privatized voice-triggered operation procedure according to his or her own needs. The operations demonstrated by the user may be one operation or multiple consecutive operations controlling a single application in the terminal, or may include operations controlling at least two applications; for example, the user may demonstrate the following sequence: wake up the terminal -> open the camera application -> focus -> take 3 continuous shots and save them -> exit the camera application -> open the WeChat application -> select the 3 most recently taken photos -> share them to Moments. Referring to FIG. 3, the process of learning this series of demonstrated operations to obtain a voice response execution model includes:
Step 301: recording the operations demonstrated by the user, that is, first capturing each demonstrated operation;
Step 302: converting each demonstrated operation into a corresponding execution instruction, where the execution instruction includes an execution request instruction (with corresponding parameters) and an execution response instruction (with corresponding parameters) for the action;
Step 303: fixing the execution order of the execution instructions according to the execution order of the demonstrated operations to obtain the voice response execution model;
Step 304: saving the obtained voice response execution model, for example to a model database local to the terminal; it may, of course, also be saved to a remote database and retrieved when needed.
Repeating steps 301-304 yields multiple voice response execution models privately customized by the user. In this embodiment, each voice response execution model may be associated with its corresponding voice instruction as soon as it is obtained, before the next model is learned; alternatively, multiple voice response execution models may first be learned and then each associated with its corresponding voice instruction. The user may also modify the voice instruction corresponding to each voice response execution model, or modify a voice response execution model that has already been learned. In this embodiment, each demonstrated operation may be classified as system-level or application-level: waking up the terminal and opening an application are system-level operations, while the corresponding operations performed inside an opened application are application-level.
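A minimal sketch of steps 301 to 304 follows. The Operation tuple, the helper names, and the way a demonstrated operation is mapped onto a request/response pair are assumptions made for illustration; the actual mapping depends on the terminal platform.

    from typing import Dict, List, Tuple

    # A recorded operation: (level, action, parameters), e.g. ("SYS_LEVEL", "OPEN_APP", {"app": "Camera"}).
    Operation = Tuple[str, str, Dict[str, str]]

    def convert(op: Operation) -> Dict[str, object]:
        # Step 302: turn one demonstrated operation into its request/response instruction pair.
        level, action, params = op
        return {"level": level, "request": action + "_Req", "request_params": params,
                "response": action + "_Res", "response_params": {"result": "OK"}}

    def learn_model(recorded_ops: List[Operation]) -> List[Dict[str, object]]:
        # Steps 301 and 303: keep the instructions in the demonstrated ("solidified") order.
        return [convert(op) for op in recorded_ops]

    model_db: Dict[str, List[Dict[str, object]]] = {}   # step 304: a local model database

    demonstrated = [("SYS_LEVEL", "WAKE_UP", {}),
                    ("SYS_LEVEL", "OPEN_APP", {"app": "Camera"}),
                    ("APP_LEVEL", "TAKE_BURST", {"count": "3"})]
    model_db["camera_and_share"] = learn_model(demonstrated)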
Suppose the voice response execution model corresponding to the demonstrated sequence wake up the terminal -> open the camera application -> focus -> take 3 continuous shots and save them -> exit the camera application -> open the WeChat application -> select the 3 most recently taken photos -> share them to Moments is associated with the user's privately customized voice instruction "天王盖地虎" (a code phrase). After the voice response execution model has been associated with the "天王盖地虎" voice instruction, when a voice instruction issued by the user is received, the terminal determines whether it is the "天王盖地虎" voice instruction; if so, the associated voice response execution model is invoked and each execution instruction in it is executed in the fixed order, and the terminal correspondingly performs: wake up the terminal -> open the camera application -> focus -> take 3 continuous shots and save them -> exit the camera application -> open the WeChat application -> select the 3 most recently taken photos -> share them to Moments.
Embodiment 2:
This embodiment further provides a voice control apparatus, which can be applied to various smart terminals such as smartphones and iPads. Referring to FIG. 4, the apparatus includes a model establishing module 41, an association module 42, and an execution module 43:
the model establishing module 41 is configured to learn operations demonstrated by the user to obtain a voice response execution model, the voice response execution model including execution instructions corresponding to performing each operation;
the association module 42 is configured to associate the voice response execution model obtained by the model establishing module with a corresponding voice instruction;
the execution module 43 is configured to, after receiving the voice instruction input by the user, execute the execution instructions in the voice response execution model associated with the voice instruction, so as to start the corresponding function application on the terminal.
It should be understood that the voice instruction in this embodiment may be a private voice instruction recorded by the user or a standard voice instruction preset in the terminal; the standard voice instruction may be preset before the terminal leaves the factory or downloaded from a corresponding network platform. In this embodiment, one voice instruction may be associated with one voice response execution model, or multiple voice instructions may be associated with one voice response execution model; that is, a single voice response execution model may be triggered by any of several voice instructions.
The voice instruction in this embodiment may be a private voice instruction recorded by the user, because the terminal is then more secure to use: first, other users do not know whether the private voice instruction enables a terminal function at all, nor which function it enables; second, the private voice instruction can also be bound to the user, which further improves security. Referring to FIG. 5, the voice control apparatus further includes a voice acquisition module 44 and a voice processing module 45;
the voice acquisition module 44 is configured to collect the voice instruction input by the user before or after the model establishing module 41 learns the operations demonstrated by the user to obtain the voice response execution model; the user's voice may be captured by a device such as the microphone (MIC) built into the terminal;
the voice processing module 45 is configured to perform acoustic feature extraction on the voice instruction collected by the voice acquisition module 44 to obtain the corresponding private voice instruction.
In this embodiment, the user can customize a privatized voice-triggered operation procedure according to his or her own needs. The operations demonstrated by the user may control a single application in the terminal or may include operations controlling at least two applications; for example, the user may demonstrate the following sequence: wake up the terminal -> dial the emergency call xxx -> immediately initiate location positioning -> send a message, with the positioning result attached, to the emergency contact. Referring to FIG. 6, the model establishing module 41 in this embodiment includes a recording submodule 411, an analysis submodule 412, and a solidification submodule 413;
the recording submodule 411 is configured to record the operations demonstrated by the user;
the analysis submodule 412 is configured to convert each demonstrated operation into a corresponding execution instruction; the execution instruction includes an execution request instruction (with corresponding parameters) and an execution response instruction (with corresponding parameters) for the action;
the solidification submodule 413 is configured to fix the execution order of the execution instructions according to the execution order of the demonstrated operations to obtain the voice response execution model.
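The module structure of FIGs. 4 to 6 could be outlined as a set of cooperating classes, as in the sketch below; the method bodies are placeholders and the names are assumptions, intended only to show how the submodules of the model establishing module and the association and execution logic might fit together.

    class RecordingSubmodule:
        def record(self):
            # Capture the operations demonstrated by the user; platform-specific details omitted.
            return []

    class AnalysisSubmodule:
        def to_instruction(self, operation):
            # Convert one demonstrated operation into a request/response execution instruction.
            return {"level": operation["level"],
                    "request": operation["action"] + "_Req",
                    "response": operation["action"] + "_Res"}

    class SolidificationSubmodule:
        def solidify(self, instructions):
            # Fix the execution order to match the order of the demonstrated operations.
            return list(instructions)

    class ModelEstablishingModule:
        def __init__(self):
            self.recording = RecordingSubmodule()
            self.analysis = AnalysisSubmodule()
            self.solidification = SolidificationSubmodule()

        def learn(self, operations=None):
            ops = operations if operations is not None else self.recording.record()
            return self.solidification.solidify([self.analysis.to_instruction(o) for o in ops])

    class VoiceControlApparatus:
        def __init__(self):
            self.model_establishing = ModelEstablishingModule()
            self.associations = {}                   # association module: voice instruction -> model

        def associate(self, voice_instruction, model):
            self.associations[voice_instruction] = model

        def on_voice_instruction(self, voice_instruction):
            # Execution module: run each instruction of the associated model in its fixed order.
            for instruction in self.associations.get(voice_instruction, []):
                print("executing", instruction)      # placeholder for a real dispatch to the terminal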
Suppose the voice response execution model corresponding to the demonstrated sequence wake up the terminal -> dial the emergency call xxx -> immediately initiate location positioning -> send a message, with the positioning result attached, to the emergency contact is associated with the user's privately customized voice instruction "菠萝菠萝蜜" (literally "pineapple jackfruit"). After the voice response execution model has been associated with the "菠萝菠萝蜜" voice instruction, when a voice instruction issued by the user is received, the terminal determines whether it is the "菠萝菠萝蜜" voice instruction; if so, the associated voice response execution model is invoked and each execution instruction in it is executed in the fixed order, and the terminal correspondingly performs: wake up the terminal -> dial the emergency call xxx -> immediately initiate location positioning -> send a message, with the positioning result attached, to the emergency contact.
Embodiment 3:
In this embodiment, the voice instruction is taken to be a private voice instruction recorded by the user before the voice response execution model is established, and the entire voice control process is described by way of example.
Referring to FIG. 7, the user's private customization process includes:
Step 701: the user turns on the voice control customization mode of the terminal;
Step 702: the terminal prompts the user to record a private voice instruction, and the user, following the prompt, says to the terminal: "菠萝菠萝蜜";
Step 703: after the first recording pass is completed, the terminal asks the user to record once more for confirmation, and the user, following the prompt, again says: "菠萝菠萝蜜"; a single recording pass may, of course, also suffice;
Step 704: after the second recording pass is completed, the terminal analyzes and models the two recordings and compares them for consistency; if they are consistent, the process proceeds to step 705, and if not, it returns to step 702 and the user records again;
Step 705: the terminal saves the voice instruction and prompts the user that the private voice instruction has been recorded and that the corresponding operations should now be entered; following the prompt, the user begins to demonstrate the example operations:
wake up the terminal -> dial the emergency call xxx -> immediately initiate location positioning -> send a message, with the positioning result attached, to the emergency contact.
Step 706: having recorded the operations demonstrated by the user, the terminal decomposes them and obtains the following voice response execution model:
(Wake up the terminal:)
SYS_LEVEL: WAKE_UP_Req ->
SYS_LEVEL: WAKE_UP_Res(OK) ->
(Open the phone application:)
SYS_LEVEL: OPEN_APP_Req(Call) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
(Dial the emergency call:)
APP_LEVEL: SET_UP_EMERGENCY_CALL_Req(number) ->
APP_LEVEL: SET_UP_EMERGENCY_CALL_Res(OK) ->
(Turn on GPS positioning:)
SYS_LEVEL: OPEN_APP_Req(LBS) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
(Initiate positioning:)
APP_LEVEL: SET_UP_LBS_SERVICE_Req(Local position) ->
APP_LEVEL: SET_UP_LBS_SERVICE_Res(Local position result) ->
(Open the message application:)
SYS_LEVEL: OPEN_APP_Req(Message) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
(Edit the message, attaching the positioning result:)
APP_LEVEL: EDIT_req(Local position result and other information) ->
APP_LEVEL: EDIT_res(OK) ->
(Send the message:)
APP_LEVEL: SEND_MESSAGE_Req(number) ->
APP_LEVEL: SEND_MESSAGE_Res(OK)
Step 707: the terminal saves the obtained voice response execution model and prompts the user that recording is complete, asking whether to continue recording; if the user selects yes, the process goes to step 702; if no, it goes to step 708;
Step 708: exit the voice control customization mode.
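The two-pass recording and consistency check of steps 702 to 705 could be prototyped as follows; the inputs are assumed to be acoustic-feature vectors such as those sketched in Embodiment 1, and the threshold is purely illustrative.

    import numpy as np

    def enroll_private_instruction(first_pass, second_pass, threshold: float = 25.0):
        first, second = np.asarray(first_pass, dtype=float), np.asarray(second_pass, dtype=float)
        # Step 704: compare the two recordings (as feature vectors) for consistency.
        if float(np.linalg.norm(first - second)) < threshold:
            # Step 705: save an averaged template as the private voice instruction.
            return (first + second) / 2.0
        return None                                  # inconsistent: return to step 702 and re-record

    # Usage with two equal-length feature vectors, e.g. from extract_acoustic_features above:
    # template = enroll_private_instruction(features_take1, features_take2)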
Referring to FIG. 8, the process by which the user triggers the pre-configured function with the privatized voice instruction on a particular occasion includes:
Step 801: receiving a voice instruction issued by the user; for example, when the user encounters a dangerous situation, such as being held hostage and unable to call the police openly, the user secretly triggers the terminal to raise the alarm by saying the privatized instruction "菠萝菠萝蜜";
Step 802: after receiving the voice instruction issued by the user, the terminal analyzes the instruction and compares it with the stored private voice instructions; if the instruction is not valid, the process returns to step 801; if it is valid, the process proceeds to step 803;
Step 803: invoking the voice response execution model corresponding to the voice instruction, and starting to parse and execute it:
The voice response execution model executed here is as follows:
SYS_LEVEL: WAKE_UP_Req ->
SYS_LEVEL: WAKE_UP_Res(OK) ->
SYS_LEVEL: OPEN_APP_Req(Call) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
APP_LEVEL: SET_UP_EMERGENCY_CALL_Req(number) ->
APP_LEVEL: SET_UP_EMERGENCY_CALL_Res(OK) ->
SYS_LEVEL: OPEN_APP_Req(LBS) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
APP_LEVEL: SET_UP_LBS_SERVICE_Req(Local position) ->
APP_LEVEL: SET_UP_LBS_SERVICE_Res(Local position result) ->
SYS_LEVEL: OPEN_APP_Req(Message) ->
SYS_LEVEL: OPEN_APP_Res(OK) ->
APP_LEVEL: EDIT_req(Local position result and other information) ->
APP_LEVEL: EDIT_res(OK) ->
APP_LEVEL: SEND_MESSAGE_Req(number) ->
APP_LEVEL: SEND_MESSAGE_Res(OK)
In this model:
SYS_LEVEL: WAKE_UP_Req ->
SYS_LEVEL: WAKE_UP_Res(OK) corresponds to the action: wake up the terminal.
SYS_LEVEL: OPEN_APP_Req(Call) ->
SYS_LEVEL: OPEN_APP_Res(OK) corresponds to the action: open the phone application.
APP_LEVEL: SET_UP_EMERGENCY_CALL_Req(number) ->
APP_LEVEL: SET_UP_EMERGENCY_CALL_Res(OK) corresponds to the action: dial the emergency call.
SYS_LEVEL: OPEN_APP_Req(LBS) ->
SYS_LEVEL: OPEN_APP_Res(OK) corresponds to the action: turn on GPS positioning.
APP_LEVEL: SET_UP_LBS_SERVICE_Req(Local position) ->
APP_LEVEL: SET_UP_LBS_SERVICE_Res(Local position result) corresponds to the action: initiate positioning.
SYS_LEVEL: OPEN_APP_Req(Message) ->
SYS_LEVEL: OPEN_APP_Res(OK) corresponds to the action: open the message application.
APP_LEVEL: EDIT_req(Local position result and other information) ->
APP_LEVEL: EDIT_res(OK) corresponds to the action: edit the message, attaching the positioning result.
APP_LEVEL: SEND_MESSAGE_Req(number) ->
APP_LEVEL: SEND_MESSAGE_Res(OK) corresponds to the action: send the message.
The final complete model execution process is therefore:
wake up the terminal -> dial the emergency call xxx -> immediately initiate location positioning -> send a message, with the positioning result attached, to the emergency contact.
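A minimal sketch of the parse-and-execute loop of step 803 is given below, using the instruction strings listed above; the handler functions are placeholders standing in for the real platform calls, which this document does not define.

    MODEL = [
        ("SYS_LEVEL", "WAKE_UP_Req", None),
        ("SYS_LEVEL", "OPEN_APP_Req", "Call"),
        ("APP_LEVEL", "SET_UP_EMERGENCY_CALL_Req", "number"),
        ("SYS_LEVEL", "OPEN_APP_Req", "LBS"),
        ("APP_LEVEL", "SET_UP_LBS_SERVICE_Req", "Local position"),
        ("SYS_LEVEL", "OPEN_APP_Req", "Message"),
        ("APP_LEVEL", "EDIT_req", "Local position result and other information"),
        ("APP_LEVEL", "SEND_MESSAGE_Req", "number"),
    ]

    def system_level_handler(request, param):
        print("[system]", request, param, "-> OK")   # placeholder for a real system-level call

    def app_level_handler(request, param):
        print("[app]", request, param, "-> OK")      # placeholder for a real in-application action

    def execute_model(model):
        # Step 803: execute each request in the solidified order, dispatching by level.
        for level, request, param in model:
            handler = system_level_handler if level == "SYS_LEVEL" else app_level_handler
            handler(request, param)

    execute_model(MODEL)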
It can be seen that the voice control scheme provided by the embodiments of the present invention allows the user to break free of the current fixed mode of voice interaction with the terminal and to train the terminal that best obeys and best understands the user, which not only enhances the user's personalized experience but can also address many of the user's privatization needs.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above embodiments may be implemented by a computer program flow; the computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus, or component), and when executed it includes one of, or a combination of, the steps of the method embodiments.
Optionally, all or some of the steps of the above embodiments may also be implemented with integrated circuits; these steps may be fabricated as individual integrated circuit modules, or several of the modules or steps among them may be fabricated as a single integrated circuit module.
The apparatuses/function modules/functional units in the above embodiments may be implemented by general-purpose computing devices; they may be centralized on a single computing device or distributed over a network formed by multiple computing devices.
When the apparatuses/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
Industrial applicability
Through the embodiments of the present invention, different users can privately customize the voice control function according to their own needs, so the personalized requirements of different users can be met; the fixed mode of voice interaction with the terminal is eliminated, the user's personalized experience is enhanced, and the user's private requirements are better satisfied.

Claims (11)

  1. A voice control method, comprising:
    learning operations demonstrated by a user to obtain a voice response execution model, the voice response execution model comprising execution instructions corresponding to performing the operations;
    associating the voice response execution model with a corresponding voice instruction;
    after the voice instruction is received, executing the execution instructions in the voice response execution model associated with the voice instruction.
  2. The voice control method according to claim 1, wherein the voice instruction is a private voice instruction recorded by the user or a standard voice instruction preset in the terminal.
  3. The voice control method according to claim 2, wherein, when the voice instruction is a private voice instruction, acquiring the private voice instruction comprises:
    collecting a voice instruction input by the user before learning the operations demonstrated by the user to obtain the voice response execution model, or after learning the operations demonstrated by the user to obtain the voice response execution model;
    performing acoustic feature extraction on the collected voice instruction to obtain the corresponding private voice instruction.
  4. The voice control method according to any one of claims 1 to 3, wherein the operations demonstrated by the user comprise one operation controlling one application, or a plurality of consecutive operations controlling one application, or operations controlling at least two applications.
  5. The voice control method according to any one of claims 1 to 3, wherein learning the operations demonstrated by the user to obtain the voice response execution model comprises:
    recording the operations demonstrated by the user;
    converting each operation demonstrated by the user into a corresponding execution instruction;
    solidifying the execution order of the execution instructions according to the execution order of the operations to obtain the voice response execution model.
  6. The voice control method according to claim 5, wherein the execution instructions comprise execution request instructions and execution response instructions.
  7. A voice control apparatus, comprising a model establishing module, an association module and an execution module, wherein:
    the model establishing module is configured to learn operations demonstrated by a user to obtain a voice response execution model, the voice response execution model comprising execution instructions corresponding to performing the operations;
    the association module is configured to associate the voice response execution model with a corresponding voice instruction;
    the execution module is configured to, after the voice instruction is received, execute the execution instructions in the voice response execution model associated with the voice instruction.
  8. The voice control apparatus according to claim 7, further comprising a voice acquiring module and a voice processing module, wherein:
    the voice acquiring module is configured to collect a voice instruction input by the user before the model establishing module learns the operations demonstrated by the user to obtain the voice response execution model, or after the model establishing module learns the operations demonstrated by the user to obtain the voice response execution model;
    the voice processing module is configured to perform acoustic feature extraction on the collected voice instruction to obtain a corresponding private voice instruction.
  9. The voice control apparatus according to claim 7 or 8, wherein the model establishing module comprises a recording sub-module, an analysis sub-module and a solidifying sub-module, wherein:
    the recording sub-module is configured to record the operations demonstrated by the user;
    the analysis sub-module is configured to convert each operation demonstrated by the user into a corresponding execution instruction;
    the solidifying sub-module is configured to solidify the execution order of the execution instructions according to the execution order of the operations to obtain the voice response execution model.
  10. A terminal, comprising the voice control apparatus according to any one of claims 7 to 9.
  11. A computer-readable storage medium storing computer-executable instructions, wherein the computer-executable instructions are used for performing the method according to any one of claims 1 to 6.
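Claims 3 and 8 above recite acquiring a private voice instruction by performing acoustic feature extraction on audio recorded by the user. Purely as a toy, non-limiting sketch under assumed details (the class name PrivateVoiceInstructionStore, the per-frame energy feature and the distance threshold are introduced here for illustration and are not part of the claims), storing and matching such private voice instructions in Java might look like this:

    import java.util.HashMap;
    import java.util.Map;

    // Toy illustration only: reduce a recorded voice instruction to a small acoustic
    // feature vector (per-frame energy here; a real system would use richer features)
    // and match incoming audio against the stored private voice instructions.
    final class PrivateVoiceInstructionStore {

        private final Map<String, double[]> templates = new HashMap<>();

        // Acoustic feature extraction: average energy of fixed-length PCM frames.
        static double[] extractFeatures(short[] pcm, int frameSize) {
            int frames = pcm.length / frameSize;
            double[] features = new double[frames];
            for (int f = 0; f < frames; f++) {
                double energy = 0;
                for (int i = 0; i < frameSize; i++) {
                    double s = pcm[f * frameSize + i];
                    energy += s * s;
                }
                features[f] = energy / frameSize;
            }
            return features;
        }

        // Associate a recorded private voice instruction with a model identifier.
        void associate(String modelId, short[] recordedPcm) {
            templates.put(modelId, extractFeatures(recordedPcm, 160));
        }

        // Return the model whose stored features are closest to the incoming audio,
        // or null if no template is within the given distance threshold.
        String match(short[] incomingPcm, double threshold) {
            double[] incoming = extractFeatures(incomingPcm, 160);
            String best = null;
            double bestDistance = Double.MAX_VALUE;
            for (Map.Entry<String, double[]> entry : templates.entrySet()) {
                double[] stored = entry.getValue();
                double distance = 0;
                int n = Math.min(stored.length, incoming.length);
                for (int i = 0; i < n; i++) {
                    double d = stored[i] - incoming[i];
                    distance += d * d;
                }
                if (distance < bestDistance) {
                    bestDistance = distance;
                    best = entry.getKey();
                }
            }
            return bestDistance <= threshold ? best : null;
        }
    }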
PCT/CN2015/082221 2015-01-13 2015-06-24 Voice control method, apparatus, and terminal WO2016112644A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510016343.1 2015-01-13
CN201510016343.1A CN105845136A (en) 2015-01-13 2015-01-13 Voice control method and device, and terminal

Publications (1)

Publication Number Publication Date
WO2016112644A1 true WO2016112644A1 (en) 2016-07-21

Family

ID=56405183

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/082221 WO2016112644A1 (en) 2015-01-13 2015-06-24 Voice control method, apparatus, and terminal

Country Status (2)

Country Link
CN (1) CN105845136A (en)
WO (1) WO2016112644A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106773923B (en) * 2016-11-30 2020-04-21 北京光年无限科技有限公司 Multi-mode emotion data interaction method and device for robot
CN106773817B (en) * 2016-12-01 2020-11-17 北京光年无限科技有限公司 Command analysis method for intelligent robot and robot
CN107342085A (en) * 2017-07-24 2017-11-10 深圳云知声信息技术有限公司 Method of speech processing and device
CN107951368A (en) * 2017-09-13 2018-04-24 浙江苏泊尔家电制造有限公司 Method, cooking apparatus and the computer-readable storage medium of culinary art
CN108470567B (en) * 2018-03-15 2021-08-24 青岛海尔科技有限公司 Voice interaction method and device, storage medium and computer equipment
CN108632463A (en) * 2018-04-24 2018-10-09 维沃移动通信有限公司 A kind of sound control method and mobile terminal
CN108737933A (en) * 2018-05-30 2018-11-02 上海与德科技有限公司 A kind of dialogue method, device and electronic equipment based on intelligent sound box
CN108831469B (en) * 2018-08-06 2021-02-12 珠海格力电器股份有限公司 Voice command customizing method, device and equipment and computer storage medium
CN109754797A (en) * 2018-12-18 2019-05-14 广东金祺盛工业设备有限公司 Intelligent terminal operation system based on interactive voice
CN112309373A (en) * 2020-09-28 2021-02-02 惠州市德赛西威汽车电子股份有限公司 System and method for self-defining vehicle-mounted voice technology
CN112735387A (en) * 2020-12-25 2021-04-30 惠州市德赛西威汽车电子股份有限公司 User-defined vehicle-mounted voice skill system and method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101246687A (en) * 2008-03-20 2008-08-20 北京航空航天大学 Intelligent voice interaction system and method thereof
US20100179812A1 (en) * 2009-01-14 2010-07-15 Samsung Electronics Co., Ltd. Signal processing apparatus and method of recognizing a voice command thereof
CN102842306A (en) * 2012-08-31 2012-12-26 深圳Tcl新技术有限公司 Voice control method and device as well as voice response method and device
CN103188406A (en) * 2011-12-27 2013-07-03 中国电信股份有限公司 Method and system for achieving interactive voice response flow control
CN103426429A (en) * 2013-07-15 2013-12-04 三星半导体(中国)研究开发有限公司 Voice control method and voice control device
CN103632669A (en) * 2012-08-20 2014-03-12 上海闻通信息科技有限公司 A method for a voice control remote controller and a voice remote controller
CN103646646A (en) * 2013-11-27 2014-03-19 联想(北京)有限公司 Voice control method and electronic device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101889836B1 (en) * 2012-02-24 2018-08-20 삼성전자주식회사 Method and apparatus for cotrolling lock/unlock state of terminal through voice recognition
CN103366743A (en) * 2012-03-30 2013-10-23 北京千橡网景科技发展有限公司 Voice-command operation method and device
US9384732B2 (en) * 2013-03-14 2016-07-05 Microsoft Technology Licensing, Llc Voice command definitions used in launching application with a command
CN103885783A (en) * 2014-04-03 2014-06-25 深圳市三脚蛙科技有限公司 Voice control method and device of application program
CN104269170B (en) * 2014-09-17 2018-04-20 成都博智维讯信息技术有限公司 A kind of ERP authorities audio recognition method

Also Published As

Publication number Publication date
CN105845136A (en) 2016-08-10

Similar Documents

Publication Publication Date Title
WO2016112644A1 (en) Voice control method, apparatus, and terminal
US11315405B2 (en) Systems and methods for provisioning appliance devices
JP6811758B2 (en) Voice interaction methods, devices, devices and storage media
US10609199B1 (en) Providing hands-free service to multiple devices
CN105677460B (en) Applied program processing method and device
KR102178896B1 (en) Provides a personal auxiliary module with an optionally steerable state machine
JP2023051963A (en) Implementation of voice assistant on device
US20150228281A1 (en) Device, system, and method for active listening
CN109410952B (en) Voice awakening method, device and system
CN104902086B (en) Alarm clock ringing method and device
US20140297288A1 (en) Telephone voice personal assistant
US9854439B2 (en) Device and method for authenticating a user of a voice user interface and selectively managing incoming communications
CN109658927A (en) Wake-up processing method, device and the management equipment of smart machine
JP6783339B2 (en) Methods and devices for processing audio
CN107370772A (en) Account login method, device and computer-readable recording medium
CN111556197B (en) Method and device for realizing voice assistant and computer storage medium
CN108806714B (en) Method and device for adjusting volume
CN108632653A (en) Voice management-control method, smart television and computer readable storage medium
CN105898890A (en) Device searching method and electronic device for supporting the same
CN105871561A (en) Wireless wakeup device for cell module
JP6619488B2 (en) Continuous conversation function in artificial intelligence equipment
CN108648754B (en) Voice control method and device
WO2022088964A1 (en) Control method and apparatus for electronic device
CN109325337A (en) Unlocking method and device
US20140315520A1 (en) Recording and playing back portions of a telephone call

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15877559

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15877559

Country of ref document: EP

Kind code of ref document: A1