WO2016082344A1

WO2016082344A1 - Voice control method and apparatus, and storage medium

Info

Publication number: WO2016082344A1
Application number: PCT/CN2015/072705
Authority: WO
Inventors: Wei Zhanting (魏占婷)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2014-11-25
Filing date: 2015-02-10
Publication date: 2016-06-02
Also published as: CN105611033A

Abstract

Provided are a voice control method and apparatus, and a storage medium. The method applied to a terminal side comprises: obtaining an input voice of a user; if the terminal side opens a pre-set function, determining whether an identification voice consistent with the input voice is pre-stored at the terminal side; if the identification voice exists, according to the identification voice, obtaining pre-set information irrelevant to the meaning of the identification voice; and executing an operation corresponding to the pre-set information.

Description

Method, device and storage medium for voice control

Technical field

The present invention relates to the field of communications technologies, and in particular, to a voice control method, apparatus, and storage medium.

Background art

Mobile phones have become an indispensable tool in daily life, and their security is becoming increasingly important. Voice input is also used more and more often. With the voice input currently on the market, the terminal recognizes the user's voice and then repeats or displays the actual meaning of what the user said. For example, with Siri (a voice control function introduced by Apple), users can have the phone read text messages, recommend restaurants, report the weather, and set alarms. Siri supports natural language input, can call the system's built-in applications such as weather forecast, scheduling, and search, can keep learning new voices and intonations, and provides conversational responses. However, because the prior-art terminal is controlled by the actual meaning of the voice input, other people can easily learn the user's purpose. For example, in a dangerous situation the user must say "dial 110" aloud before the terminal automatically dials 110; once the user has said "dial 110", another person knows the user's intention and can immediately intervene and stop the terminal from dialing 110, which prevents the user from getting help. In summary, having the terminal operate according to the direct meaning of the user's voice is insecure: other people can easily learn the user's intention and interfere with the user's operation.

Summary of the invention

In order to solve the existing technical problems, embodiments of the present invention mainly provide a method, an apparatus, and a storage medium for voice control.

The embodiment of the invention provides a method for voice control, which is applied to a terminal side, and the method includes:

obtaining an input voice of the user;

if a preset function is enabled on the terminal side, determining whether an identification voice consistent with the input voice is pre-stored on the terminal side;

if the identification voice exists, obtaining, according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and

performing an operation corresponding to the preset information.

The embodiment of the present invention further provides a device for voice control, which is applied to a terminal side, where the device includes a voice acquiring module, a determining module, a preset information acquiring module, and a first execution module:

the voice acquiring module is configured to acquire an input voice of the user;

the determining module is configured to determine, if a preset function is enabled on the terminal side, whether an identification voice consistent with the input voice is pre-stored on the terminal side;

the preset information acquiring module is configured to obtain, if the identification voice exists and according to the identification voice, preset information that is unrelated to the meaning of the identification voice;

the first execution module is configured to perform an operation corresponding to the preset information.

The embodiment of the present invention further provides a terminal. The terminal includes a processor configured to: acquire an input voice of the user; if a preset function is enabled on the terminal side, determine whether an identification voice consistent with the input voice is pre-stored on the terminal side; if the identification voice exists, obtain, according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and perform an operation corresponding to the preset information.

The embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are used to execute the voice control method.

The above technical solution of the present invention has at least the following beneficial effects:

In the voice control method, device, and storage medium of the embodiments of the present invention, after the input voice of the user is obtained, it is matched against the identification voices pre-stored on the terminal side. When a match is found, preset information unrelated to the meaning of the identification voice is obtained, and the terminal performs the operation corresponding to that preset information. Because the preset information is unrelated to the meaning of the identification voice, other people cannot directly learn the user's real intention. This realizes personalized voice control settings, which greatly improves the security and serviceability of the terminal's voice input and also improves user satisfaction.

Drawings

Figure 1 is a flowchart showing the basic steps of a method for voice control according to an embodiment of the present invention;

Figure 2 is a flowchart showing the basic steps of setting preset information in a method for voice control according to an embodiment of the present invention;

Figure 3 is a schematic structural diagram of an apparatus for voice control according to an embodiment of the present invention;

Figure 4 is a schematic diagram showing the connection relationship of a specific structure of a device for voice control according to an embodiment of the present invention;

Figure 5 is a flowchart showing the execution of specific embodiment 1 of the present invention;

Figure 6 is a flowchart showing the execution of specific embodiment 2 of the present invention;

Figure 7 is a flowchart showing the execution of specific embodiment 3 of the present invention;

Figure 8 is a flowchart showing the execution of specific embodiment 4 of the present invention;

Figure 9 is a flowchart showing the execution of specific embodiment 5 of the present invention.

Detailed description

The technical problems, the technical solutions, and the advantages of the present invention will be more clearly described in the following description.

The present invention addresses the problem that the security of terminal voice control in the prior art is not high, and provides a voice control method and device. After the input voice of the user is acquired, it is matched against the identification voices pre-stored on the terminal side. When a match is found, preset information unrelated to the meaning of the identification voice is obtained, and the terminal performs the operation corresponding to that preset information. Because the preset information is unrelated to the meaning of the identification voice, other people cannot directly learn the user's real intention. This realizes personalized voice control settings, which greatly improves the security and serviceability of the terminal's voice input and also improves user satisfaction.

As shown in FIG. 1 , an embodiment of the present invention provides a voice control method, which is applied to a terminal side, and includes:

Step 11: Acquire an input voice of the user;

Step 12: If a preset function is enabled on the terminal side, determine whether an identification voice consistent with the input voice is pre-stored on the terminal side;

Step 13: If the identification voice exists, obtain, according to the identification voice, preset information that is unrelated to the meaning of the identification voice;

Step 14: Perform an operation corresponding to the preset information.

In the above embodiment of the present invention, the input voice of the user is the voice uttered by the user. Specifically, the terminal is provided with a human-machine interface module, which is the interface that detects and collects the user's voice and transmits the collected sound to the central processing unit of the terminal. The central processing unit performs steps 12 and 13, that is, it parses the input voice of the user and retrieves the preset information corresponding to the identification voice that is consistent with the input voice. To guarantee the security of the voice input, the preset information is unrelated to the meaning of the identification voice.
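Steps 11 to 14 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the names `Terminal`, `voice_key`, and `handle_voice` are hypothetical, the preset function is assumed to be enabled, and acoustic matching is replaced by a simple normalized string comparison so that the example stays runnable.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def voice_key(recording: str) -> str:
    """Hypothetical stand-in for acoustic matching: normalize to a lookup key."""
    return recording.strip().lower()


@dataclass
class Terminal:
    # Maps an identification-voice key to the operation it triggers
    # (the operation plays the role of the preset information).
    presets: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def handle_voice(self, recording: str) -> Optional[str]:
        # Step 11: the input voice arrives as `recording`.
        # Step 12: look for a stored identification voice that matches it.
        operation = self.presets.get(voice_key(recording))
        if operation is None:
            return None  # no match: this sketch simply does nothing
        # Steps 13-14: run the operation, which is unrelated to the words spoken.
        return operation()


terminal = Terminal()
terminal.presets[voice_key("yes")] = lambda: "typed: I am at home"
result = terminal.handle_voice("  Yes ")
```

The lookup never interprets the words themselves, which is the point of the scheme: a bystander who hears "yes" learns nothing about the operation that follows.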

Preferably, as shown in FIG. 2, the steps for setting the preset information that is unrelated to the meaning of the identification voice include:

Step 21: Acquire preset information input by the user through a preset interface, where the preset information is used to instruct the terminal to perform a corresponding operation;

Step 22: In response to the user inputting a voice through the voice interface, set the input voice as the identification voice of the preset information, where the preset information and the content of the identification voice are unrelated.

In the foregoing embodiment of the present invention, the preset information is the operation that the user actually wants the terminal to perform, and it must be customized by the user through a preset interface. The preset interface mainly includes an interface for inputting text, an interface for inputting voice, and an interface for calling instructions. In addition, in the voice control setting method provided by the embodiment of the present invention, the user sets an identification voice for the preset information through the voice interface, and identification voices and preset information are in one-to-one correspondence. That is, when the terminal detects a user voice, if that voice is one of the identification voices, the terminal obtains the preset information corresponding to it and performs the corresponding operation. With this setting method, the terminal does not operate directly according to the actual meaning of the user's voice, which improves the security of the terminal's voice control.
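One way to picture the one-to-one correspondence set up by steps 21 and 22 is a store that refuses to reuse either side of a pair. The sketch below is an assumption, not something the description specifies: `PresetStore` and its methods are hypothetical names, voices and preset information are represented as plain strings, and rejecting duplicates is only one possible policy.

```python
from typing import Dict, Optional


class PresetStore:
    """Keeps identification voices and preset information in one-to-one pairs."""

    def __init__(self) -> None:
        self._by_voice: Dict[str, str] = {}
        self._by_preset: Dict[str, str] = {}

    def register(self, preset_info: str, identification_voice: str) -> None:
        # Step 21 supplies preset_info; step 22 supplies identification_voice.
        if identification_voice in self._by_voice:
            raise ValueError("identification voice already in use")
        if preset_info in self._by_preset:
            raise ValueError("preset information already has a voice")
        self._by_voice[identification_voice] = preset_info
        self._by_preset[preset_info] = identification_voice

    def lookup(self, identification_voice: str) -> Optional[str]:
        return self._by_voice.get(identification_voice)


store = PresetStore()
store.register("dial 110", "ahhh")

try:
    store.register("send text", "ahhh")  # same voice, different preset
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Keeping a reverse index makes the second check cheap; a real terminal might instead let the user overwrite an existing pair.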

Specifically, in the specific embodiment of the present invention, when the preset interface is an input text interface, step 21 is specifically:

Step 211: Acquire text input by the user through the input text interface.

Or, when the preset interface is an input voice interface, step 21 is specifically:

Step 212: Acquire a voice input by the user through the input voice interface.

Or, when the preset interface is a call instruction interface, step 21 is specifically:

Step 213: Acquire instructions preset by the user;

Step 214: Acquire the instruction that the user selects from the preset instructions through the call instruction interface.
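The three ways of acquiring preset information (steps 211 to 214) can be modeled as three small record types. This is a sketch under stated assumptions: the class and function names are hypothetical, a recorded voice is represented only by a clip name, and the list of preset instructions is illustrative.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class TextPreset:
    text: str


@dataclass(frozen=True)
class VoicePreset:
    clip_name: str  # hypothetical reference to a stored recording


@dataclass(frozen=True)
class InstructionPreset:
    name: str


Preset = Union[TextPreset, VoicePreset, InstructionPreset]

# Step 213: instructions defined in advance (illustrative values).
PRESET_INSTRUCTIONS: List[str] = ["page turning", "exit"]


def acquire_text(text: str) -> TextPreset:
    """Step 211: text typed through the input text interface."""
    return TextPreset(text)


def acquire_voice(clip_name: str) -> VoicePreset:
    """Step 212: a voice recorded through the input voice interface."""
    return VoicePreset(clip_name)


def acquire_instruction(choice: str) -> InstructionPreset:
    """Step 214: an instruction selected from the preset instructions."""
    if choice not in PRESET_INSTRUCTIONS:
        raise ValueError(f"unknown instruction: {choice}")
    return InstructionPreset(choice)


chosen = acquire_instruction("page turning")
```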

In a specific application of the embodiment of the present invention, the interface for inputting text is the text input mode on the terminal's user interface (UI); the interface for inputting voice is the voice input mode on the UI; and the interface for calling instructions is the command input mode on the UI. Specifically, the terminal can customize the usage scenes of text, command, and voice input. For example, text input can be enabled in all editing interfaces of the terminal; text and voice input can be enabled in the dialog interface of a chat tool; and command input such as "page turning" or "exiting" can be enabled while browsing web pages.

For example, if the user chooses to customize "text input":

1. Provide an interface for the user to input text; for example, the user can input "I am at home";

2. Provide the user with a voice-definition interface and define a voice such as "yes" for "I am at home"; the user's own voice or another sound chosen by the user can be recorded;

3. In any editing interface of the terminal, when the mobile phone detects a voice consistent with the defined voice "yes", it automatically enters the text "I am at home" in the edit box.

If the user chooses to customize "voice input":

1. Provide an interface for the user to input voice; for example, the user inputs the voice "test successful";

2. Provide the user with a voice-definition interface and define a voice such as "yes" for "test successful"; the user's own voice or another sound chosen by the user can be recorded;

3. In the interactive chat interface, when the mobile phone detects a voice consistent with the defined voice "yes", it automatically sends the voice "test successful".

If the user chooses to customize "command input":

1. First customize some instructions and provide an interface for the user to call them, such as defining a "page turning" command;

2. The user selects the page turning command and defines a "page turning" voice for it; the user's own voice or another sound chosen by the user can be recorded;

3. In the browser interface or the document reading interface, when the mobile phone detects a voice consistent with the defined voice "page turning", the web page or document automatically turns the page.
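The usage scenes in these examples can be sketched as a small table that says which kind of preset information each interface accepts. Everything named here is an assumption made for illustration: the scene labels, the `respond` function, and the choice of a separate identification voice for each preset (the examples above are independent alternatives, so they reuse "yes").

```python
from typing import Dict, Optional, Set, Tuple

# Kinds of preset information accepted in each usage scene (illustrative).
SCENE_KINDS: Dict[str, Set[str]] = {
    "editing": {"text"},
    "chat": {"text", "voice"},
    "browsing": {"instruction"},
}

# identification voice -> (kind, payload)
PRESETS: Dict[str, Tuple[str, str]] = {
    "yes": ("text", "I am at home"),
    "something to make a call": ("voice", "test successful"),
    "page turning": ("instruction", "page turning"),
}


def respond(scene: str, heard: str) -> Optional[Tuple[str, str]]:
    """Return the preset to apply, or None if nothing applies in this scene."""
    entry = PRESETS.get(heard)
    if entry is None:
        return None  # not a stored identification voice
    kind, _payload = entry
    if kind not in SCENE_KINDS.get(scene, set()):
        return None  # this kind of input is not enabled in this scene
    return entry


in_editor = respond("editing", "yes")
in_browser = respond("browsing", "yes")
```

Gating by scene keeps a text preset from firing while the user is browsing, which matches the idea that each kind of input is started only in the interfaces configured for it.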

It should be noted that the voice control method of the embodiment of the present invention also provides a configuration switch in the terminal settings. The method takes effect only when the configuration switch is turned on. If the configuration switch is turned off, the terminal recognizes the user's voice normally and performs the operation corresponding to its actual meaning, so the original function of the terminal is not affected. This configuration implements custom voice input and greatly improves the security of the terminal.

Specifically, if the preset function is not enabled on the terminal side, the method further includes:

Step 31: Parse the input voice, and determine a meaning of the input voice;

Step 32: Perform a corresponding operation according to the meaning of the input voice.
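The effect of the configuration switch can be sketched as a single branch. The code below is only an illustration: `parse_meaning` is a hypothetical toy parser that understands nothing but "dial <number>", and returning `None` for an unmatched voice while the switch is on is an assumption drawn from the specific embodiments, where the terminal does not respond to a voice it has not stored.

```python
from typing import Dict, Optional


def parse_meaning(recording: str) -> Optional[str]:
    """Hypothetical literal-meaning parser: recognizes only 'dial <number>'."""
    words = recording.lower().split()
    if len(words) == 2 and words[0] == "dial" and words[1].isdigit():
        return f"dialing {words[1]}"
    return None


def handle(recording: str, switch_on: bool, presets: Dict[str, str]) -> Optional[str]:
    if switch_on:
        # Preset function enabled: only stored identification voices count.
        return presets.get(recording.strip().lower())
    # Steps 31-32: parse the input voice and act on its actual meaning.
    return parse_meaning(recording)


presets = {"ahhh": "dialing 110"}
with_switch = handle("Ahhh", True, presets)
without_switch = handle("dial 110", False, presets)
```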

In the method for setting preset information in the embodiment of the present invention, the user inputs, through the preset interface, the content that actually needs to be executed (the preset information), and sets an identification voice for that content through the voice interface. After the terminal detects the identification voice, it executes the content corresponding to the preset information. This realizes personalized voice control settings, greatly improving the security and serviceability of the terminal's voice input while also improving user satisfaction.

In order to achieve the above method, as shown in FIG. 3, the embodiment of the present invention further provides a device for voice control, which is applied to the terminal side, and includes:

a voice acquiring module 301, configured to acquire an input voice of the user;

a determining module 302, configured to determine, if a preset function is enabled on the terminal side, whether an identification voice consistent with the input voice is pre-stored on the terminal side;

a preset information acquiring module 303, configured to obtain, if the identification voice exists and according to the identification voice, preset information that is unrelated to the meaning of the identification voice;

a first execution module 304, configured to perform an operation corresponding to the preset information.

Specifically, in the above embodiment of the present invention, if the preset function is not enabled on the terminal side, the device further includes:

a parsing module configured to parse the input voice to determine a meaning of the input voice;

The second execution module is configured to perform a corresponding operation according to the meaning of the input voice.

Specifically, in the foregoing embodiment of the present invention, the device further includes:

an acquiring module, configured to acquire preset information input by the user through a preset interface, where the preset information is used to instruct the terminal to perform a corresponding operation;

a setting module, configured to, in response to the user inputting a voice through a voice interface, set the input voice as the identification voice of the preset information, where the preset information and the content of the identification voice are unrelated.

Specifically, in the foregoing embodiment of the present invention, the acquiring module includes:

The first obtaining submodule is configured to acquire text input by the user through an input text interface.

The second obtaining submodule is configured to acquire the voice input by the user through the input voice interface.

a third obtaining submodule configured to obtain an instruction preset by the user;

And a fourth acquiring submodule configured to acquire an instruction that the user selects from the preset instructions by calling an instruction interface.

In a specific embodiment of the present invention, the function of the voice acquiring module 301 is implemented by a human-machine interface module on the terminal, and the functions of the determining module 302, the preset information acquiring module 303, and the first execution module 304 are implemented by a central processing unit on the terminal. The terminal further includes a UI interface and a setting module. The specific connection relationship is shown in FIG. 4. The setting module lets the user customize the actual input content and provides the corresponding custom sound and storage functions. The human-machine interface module is the interface that detects and collects the user's voice; it is connected to the setting module through the central processing unit, collects sound, and transmits it to the central processing unit. The central processing unit coordinates the human-machine interface module, the UI module, the setting module, and other functional modules; it processes the user's voice, retrieves the corresponding custom input from the setting module, and has it displayed in the UI interface. The UI interface displays the user-defined actual input content as processed and retrieved by the central processing unit.
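The division of labor among the four components can be sketched as four small classes. This is a rough illustration only: the class and method names are hypothetical, voices are plain strings, and the UI is reduced to a list of displayed items.

```python
from typing import Dict, List, Optional


class SettingModule:
    """Stores user-defined identification voices and their custom content."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def define(self, voice: str, content: str) -> None:
        self._store[voice] = content

    def find(self, voice: str) -> Optional[str]:
        return self._store.get(voice)


class UIInterface:
    """Records what would be shown to the user."""

    def __init__(self) -> None:
        self.shown: List[str] = []

    def display(self, content: str) -> None:
        self.shown.append(content)


class CentralProcessor:
    """Coordinates the other modules: looks up the voice and drives the UI."""

    def __init__(self, settings: SettingModule, ui: UIInterface) -> None:
        self.settings = settings
        self.ui = ui

    def on_voice(self, voice: str) -> None:
        content = self.settings.find(voice)
        if content is not None:
            self.ui.display(content)


class HumanInterfaceModule:
    """Collects the user's voice and forwards it to the central processor."""

    def __init__(self, cpu: CentralProcessor) -> None:
        self.cpu = cpu

    def capture(self, voice: str) -> None:
        self.cpu.on_voice(voice)


settings, ui = SettingModule(), UIInterface()
hmi = HumanInterfaceModule(CentralProcessor(settings, ui))
settings.define("yes", "I am at home")
hmi.capture("yes")
hmi.capture("no")  # not stored, so nothing is displayed
```

The human-machine interface module never touches the setting module directly; every lookup goes through the central processor, mirroring the connection relationship described for FIG. 4.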

Specific embodiments are described below:

Specific embodiment 1:

As shown in FIG. 5, in the message editing interface the user first utters the voice "yes", and the terminal determines whether it has stored this voice as a custom voice. If the voice is not stored, the terminal does not respond. If the voice is stored, the terminal reads the preset information corresponding to the voice, for example, entering "I am at home" in the text box, and then automatically enters "I am at home" in the message edit box.

Specific embodiment 2:

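As shown in FIG. 6, the user first makes a sound, such as a drawn-out "Ahhh", and the terminal determines whether it has stored this sound as a custom sound. If the sound is not stored, the terminal does not respond. If the sound is stored, the terminal reads the preset information corresponding to the sound, such as automatically calling 110, and then the terminal automatically calls 110.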

Specific embodiment 3:

As shown in FIG. 7, in the information editing interface the user first utters a sound, such as "something to make a call", and the terminal determines whether it has stored this sound as a custom sound. If the sound is not stored, the terminal does not respond. If the sound is stored, the terminal reads the preset information corresponding to the sound, for example, automatically sending the voice content "test successful", and then the terminal automatically sends the voice message "test successful" to the recipient.

Specific embodiment 4:

As shown in FIG. 8, during a call the user first utters a sound, such as "I am fine now", and the terminal determines whether it has stored this sound as a custom sound. If the sound is not stored, the terminal does not respond. If the sound is stored, the terminal reads the preset information corresponding to the sound, for example, automatically sending the voice content "I was caught by the police", and then the terminal automatically sends the voice message "I was caught by the police".

Specific embodiment 5:

As shown in FIG. 9, while browsing a web page the user first utters a sound, such as "turning the page", and the terminal determines whether it has stored this sound as a custom sound. If the sound is not stored, the terminal does not respond. If the sound is stored, the terminal reads the preset information corresponding to the sound, for example, automatically scrolling the web page down one page, and then the web page on the terminal automatically flips to the next page.

In order to better implement the method of the embodiment of the present invention, the embodiment of the present invention further provides a terminal. The terminal includes a processor configured to: acquire an input voice of the user; if a preset function is enabled on the terminal side, determine whether an identification voice consistent with the input voice is pre-stored on the terminal side; if the identification voice exists, obtain, according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and perform an operation corresponding to the preset information.

If the voice control method of the embodiment of the present invention is implemented in the form of a software function module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention may in essence be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Correspondingly, the embodiment of the present invention further provides a computer storage medium, wherein a computer program for executing the voice control method of the embodiment of the present invention is stored.

It should be noted that the apparatus for voice control provided by the embodiment of the present invention is a device that utilizes the method of voice control described above, and all embodiments of the foregoing methods are applicable to the device, and all of the same or similar beneficial effects can be achieved.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements should also be considered within the scope of protection of the present invention.

Industrial applicability

In the embodiment of the present invention, preset information that is unrelated to the meaning of the identification voice is set in advance, so that other people cannot directly learn the user's true intention. This realizes personalized voice control settings, thereby greatly improving the security and serviceability of the terminal's voice input while also improving user satisfaction.

Claims

1. A voice control method, applied to a terminal side, the method comprising:

obtaining an input voice of the user;

if a preset function is enabled on the terminal side, determining whether an identification voice consistent with the input voice is pre-stored on the terminal side;

if the identification voice exists, obtaining, according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and

performing an operation corresponding to the preset information.
2. The voice control method according to claim 1, wherein if the preset function is not enabled on the terminal side, the method further comprises:

parsing the input voice to determine a meaning of the input voice; and

performing a corresponding operation according to the meaning of the input voice.
3. The voice control method according to claim 1, wherein the step of setting the preset information that is unrelated to the meaning of the identification voice comprises:

acquiring preset information input by the user through a preset interface, wherein the preset information is used to instruct the terminal to perform a corresponding operation; and

in response to the user inputting a voice through a voice interface, setting the input voice as the identification voice of the preset information, wherein the preset information and the content of the identification voice are unrelated.
4. The voice control method according to claim 3, wherein the acquiring preset information input by the user through a preset interface comprises:

acquiring text input by the user through an input text interface.
5. The voice control method according to claim 3, wherein the acquiring preset information input by the user through a preset interface comprises:

acquiring a voice input by the user through an input voice interface.
6. The voice control method according to claim 3, wherein the acquiring preset information input by the user through a preset interface comprises:

acquiring instructions preset by the user; and

acquiring an instruction that the user selects from the preset instructions through a call instruction interface.
7. A voice control device, applied to a terminal side, the device comprising a voice acquiring module, a determining module, a preset information acquiring module, and a first execution module, wherein:

the voice acquiring module is configured to acquire an input voice of the user;

the determining module is configured to determine, if a preset function is enabled on the terminal side, whether an identification voice consistent with the input voice is pre-stored on the terminal side;

the preset information acquiring module is configured to obtain, if the identification voice exists and according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and

the first execution module is configured to perform an operation corresponding to the preset information.
8. The voice control device according to claim 7, wherein if the preset function is not enabled on the terminal side, the device further comprises:

a parsing module configured to parse the input voice to determine a meaning of the input voice; and

a second execution module configured to perform a corresponding operation according to the meaning of the input voice.
9. The voice control device according to claim 7, wherein the device further comprises:

an acquiring module configured to acquire preset information input by the user through a preset interface, wherein the preset information is used to instruct the terminal to perform a corresponding operation; and

a setting module configured to, in response to the user inputting a voice through a voice interface, set the input voice as the identification voice of the preset information, wherein the preset information and the content of the identification voice are unrelated.
10. The voice control device according to claim 9, wherein the acquiring module comprises:

a first acquiring submodule configured to acquire text input by the user through an input text interface.
11. The voice control device according to claim 9, wherein the acquiring module comprises:

a second acquiring submodule configured to acquire a voice input by the user through an input voice interface.
12. The voice control device according to claim 9, wherein the acquiring module comprises:

a third acquiring submodule configured to acquire instructions preset by the user; and

a fourth acquiring submodule configured to acquire an instruction that the user selects from the preset instructions through a call instruction interface.
13. A terminal, comprising a processor configured to: acquire an input voice of the user; if a preset function is enabled on the terminal side, determine whether an identification voice consistent with the input voice is pre-stored on the terminal side; if the identification voice exists, obtain, according to the identification voice, preset information that is unrelated to the meaning of the identification voice; and perform an operation corresponding to the preset information.
14. A computer storage medium having stored therein computer-executable instructions for performing the method of any one of claims 1 to 6.