CN113496703A - Method, device and program product for controlling program in voice mode

Info

Publication number: CN113496703A
Application number: CN202110839754.6A
Authority: CN
Inventors: 刘俊启
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2021-10-12
Also published as: WO2023000698A1

Abstract

A method, a device, and a program product for controlling a program by voice, relating to voice technology. Instruction information of third-party programs can be stored in a voice recognition program. When a user operates a third-party program by voice, the voice recognition program determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of pieces of instruction information corresponding to the voice instruction are determined, target instruction information is determined from among them based on a specified instruction of the user, and the voice recognition program sends the operable instruction in the target instruction information to the third-party program specified by the user, so that the third-party program can respond to the user's voice instruction. In this way, the user can control by voice a third-party program that has no voice recognition function.

Description

Method, device and program product for controlling program in voice mode

Technical Field

The present disclosure relates to a speech technology in computer technology, and more particularly, to a method, an apparatus, and a program product for controlling a program by speech.

Background

At present, a large number of APPs are installed on a mobile terminal, and a user can operate the terminal to start an APP and use it.

In the prior art, a user may operate a mobile terminal in several ways to run a designated APP. For example, the user may operate the mobile terminal by touch to run the APP, or may call up an APP by using a voice assistant.

In general, when it is inconvenient for the user to operate the phone directly, the user can call up an APP by voice. However, once the APP has started, if the APP itself has no voice recognition function, the user can only operate it by touch. Therefore, in the prior art, when it is inconvenient for the user to operate the mobile terminal directly by hand, the APP cannot truly be controlled by voice.

Disclosure of Invention

The present disclosure provides a method, a device, and a program product for controlling a program by voice, which aim to solve the technical problem in the prior art that an APP cannot truly be controlled by voice when it is inconvenient for the user to operate the mobile terminal directly by hand.

According to a first aspect of the present disclosure, there is provided a method for controlling a program by voice, the method being applied to a voice recognition program of an electronic device in which the voice recognition program and a plurality of third party programs are running, the method comprising:

responding to a voice instruction initiated by a user, and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party program is stored in the voice recognition program;

if a plurality of instruction information corresponding to the voice instructions are determined, prompt information is generated, wherein the plurality of instruction information corresponding to the voice instructions belong to different third-party programs respectively, and the prompt information is used for prompting a user to specify the third-party program;

in response to a specified instruction initiated by a user, determining target instruction information corresponding to a third-party program indicated by the specified instruction in a plurality of instruction information corresponding to the voice instruction, and sending an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing.

According to a second aspect of the present disclosure, there is provided a method for controlling a program by voice, the method being applied to a third-party program of an electronic device in which a voice recognition program and a plurality of third-party programs are run, the method comprising:

receiving an operable instruction sent by a voice recognition program, wherein the operable instruction is determined from a plurality of pieces of instruction information based on a specified instruction of a user, the plurality of pieces of instruction information respectively belong to different third-party programs and are determined according to a voice instruction of the user, and instruction information of the third-party program is stored in the voice recognition program;

and completing response processing according to the operable instruction.

According to a third aspect of the present disclosure, there is provided an apparatus for controlling a program by voice, the apparatus being applied to a voice recognition program of an electronic device in which the voice recognition program and a plurality of third party programs are run, the apparatus comprising:

the determining unit is used for responding to a voice instruction initiated by a user and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party program is stored in the voice recognition program;

the prompting unit is used for generating prompting information if a plurality of instruction information corresponding to the voice instructions are determined to exist, wherein the plurality of instruction information corresponding to the voice instructions belong to different third-party programs respectively, and the prompting information is used for prompting a user to specify the third-party programs;

and the control unit is used for responding to a specified instruction initiated by a user, determining target instruction information corresponding to a third-party program indicated by the specified instruction in a plurality of instruction information corresponding to the voice instruction, and sending an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing.

According to a fourth aspect of the present disclosure, there is provided an apparatus for controlling a program by voice, the apparatus being applied to a third-party program of an electronic device in which a voice recognition program and a plurality of third-party programs are run, the apparatus comprising:

a receiving unit for receiving an operable instruction sent by the voice recognition program, wherein the operable instruction is determined from a plurality of pieces of instruction information based on a specified instruction of a user, the plurality of pieces of instruction information respectively belong to different third-party programs and are determined according to a voice instruction of the user, and instruction information of the third-party program is stored in the voice recognition program;

and the response unit is used for finishing response processing according to the operable instruction.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first or second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first or second aspect.

With the method, the device, and the program product for controlling a program by voice provided by the present disclosure, instruction information of third-party programs can be stored in the voice recognition program. When the user operates a third-party program by voice, the voice recognition program determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of pieces of instruction information corresponding to the voice instruction are determined, target instruction information is determined from among them based on a specified instruction of the user, and the voice recognition program sends the operable instruction in the target instruction information to the third-party program specified by the user, so that the third-party program can respond to the user's voice instruction. In this way, the user can control by voice a third-party program that has no voice recognition function.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a diagram illustrating waking up a program by voice according to an exemplary embodiment;

FIG. 2 is a flowchart illustrating a method of controlling a program by voice according to a first exemplary embodiment of the present disclosure;

FIG. 3 is a schematic diagram of an interface shown in an exemplary embodiment of the present disclosure;

FIG. 4 is a flowchart illustrating a method of controlling a program by voice according to a second exemplary embodiment of the present disclosure;

FIG. 5 is a flowchart illustrating a method of controlling a program by voice according to a third exemplary embodiment of the present disclosure;

FIG. 6 is a flowchart illustrating a method of controlling a program by voice according to a fourth exemplary embodiment of the present disclosure;

FIG. 7 is a flowchart illustrating a method of controlling a program by voice according to a fifth exemplary embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an apparatus for controlling a program by voice according to a first exemplary embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of an apparatus for controlling a program by voice according to a second exemplary embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of an apparatus for controlling a program by voice according to a third exemplary embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of an apparatus for controlling a program by voice according to a fourth exemplary embodiment of the present disclosure;

FIG. 12 is a block diagram of an electronic device for implementing a method of controlling a program by voice according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a diagram illustrating waking up a program by voice according to an exemplary embodiment.

As shown in fig. 1, the user 11 may speak a voice command, and the mobile terminal 12 may be provided with a voice recognition program. The mobile terminal 12 may recognize the contents of the voice instruction using a built-in voice recognition program and perform a responsive operation according to the contents of the voice instruction.

For example, when the content of the voice instruction is "start program A", the mobile terminal 12 may run program A, and the interface is updated from state 13 to state 14.

If program A has a voice recognition function, the user can continue to control program A by voice. If program A has no voice recognition function, program A cannot respond when the user tries to continue controlling it by voice.

In particular, when it is inconvenient for the user to operate the mobile terminal by hand, the mobile terminal can only start an application program based on the user's voice instruction, but cannot continue to control the application program based on the user's voice instructions.

In order to solve the technical problem, in the solution provided by the present disclosure, instruction information of a third-party program is stored in a voice recognition program, and when an electronic device receives a voice instruction used by a user to control the third-party program, the voice recognition program can determine the instruction information corresponding to the voice instruction and send an operable instruction to the third-party program according to the instruction information, so that the third-party program can respond to the voice instruction of the user. By the implementation mode, the user can wake up the third-party program in a voice mode and can control the started third-party program.

Fig. 2 is a flowchart illustrating a method for controlling a program in a voice manner according to an exemplary embodiment of the present disclosure.

The method for controlling the program in the voice mode provided by the disclosure is applied to the voice recognition program of the electronic equipment, and the electronic equipment can execute the method of the disclosure based on the function of the voice recognition program.

Fig. 3 is a schematic interface diagram illustrating an exemplary embodiment of the present disclosure.

As shown in fig. 3, a speech recognition program 31 and a plurality of third party programs 32 are run in the electronic device. After the electronic device receives the voice command, the voice command may be processed by the voice recognition program 31 provided therein.

Wherein the third party program is a program other than the speech recognition program.

Referring to fig. 2, the method for controlling a program in a voice manner according to the present disclosure includes:

step 201, responding to a voice instruction initiated by a user, determining instruction information corresponding to the voice instruction, wherein the instruction information of a third-party program is stored in a voice recognition program.

Specifically, instruction information of any one or more third party programs is stored in the voice recognition program. For example, in the running process of the third-party program, instruction registration information may be sent to the voice recognition program, so that the voice recognition program stores the instruction information of the third-party program. In practical application, the voice recognition program can store a plurality of instruction information of any third-party program.
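As an illustration only, the storage of instruction information described above might be sketched as follows. The class names `InstructionInfo` and `InstructionStore`, the field names, and the operable instructions "back" and "forward" are assumptions made for this sketch; the disclosure does not prescribe a data structure.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionInfo:
    """One piece of instruction information (field names are illustrative)."""
    content: str   # spoken instruction content, e.g. "previous page"
    operable: str  # operable instruction the program can execute, e.g. "back"
    program: str   # third-party program that owns the instruction


class InstructionStore:
    """Hypothetical in-memory store kept by the voice recognition program."""

    def __init__(self) -> None:
        self._by_program: dict[str, list[InstructionInfo]] = defaultdict(list)

    def register(self, program: str, entries: dict[str, str]) -> None:
        """Store instruction registration information sent by a program."""
        for content, operable in entries.items():
            self._by_program[program].append(
                InstructionInfo(content, operable, program)
            )

    def instructions_of(self, program: str) -> list[InstructionInfo]:
        """Return a copy of the instruction information stored for a program."""
        return list(self._by_program.get(program, []))


store = InstructionStore()
store.register("program A", {"previous page": "back", "next page": "forward"})
```

A program may register several pieces of instruction information, and a program that has registered nothing simply yields an empty list.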

Further, the voice instruction initiated by the user may be an instruction for controlling any third-party program, and after receiving the voice instruction, the electronic device may process the voice instruction through a built-in voice recognition program.

In an alternative embodiment, each piece of instruction information may include instruction content, and the voice recognition program may determine the instruction information corresponding to the voice instruction according to the content of the voice instruction and the instruction content included in each piece of instruction information. The instruction content may be, for example, "previous page", "next page", "turn the volume up a little", "turn the volume down a little", and the like.

In practical application, the voice recognition program may determine the instruction information corresponding to the voice instruction currently initiated by the user according to the pre-stored instruction information of the third-party programs. For example, if the voice recognition program stores the instruction information of program A, program B, and program C, the instruction information matching the user's voice instruction can be determined from the instruction information of these three programs.

For example, if the voice instruction initiated by the user includes "turn the volume up a little", the voice recognition program may search the stored instruction information for instruction information corresponding to "turn the volume up a little". If program A is an audio playing program, program A may have instruction information corresponding to "turn the volume up a little". In this case, the voice recognition program may directly acquire the operable instruction in that instruction information of program A and send it to program A, so that program A executes the operable instruction.

In an alternative embodiment, if program B is a communication program, it may also have instruction information corresponding to "turn the volume up a little", in which case the voice recognition program determines two pieces of instruction information corresponding to the voice instruction. Step 202 may then be performed.
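The matching in step 201, and the check that decides whether step 202 is needed, can be sketched roughly as below. This is a minimal sketch assuming simple substring matching on already-recognized text; the tuple layout and the operable instruction names (`volume_up`, `call_volume_up`, `forward`) are hypothetical.

```python
# Stored instruction information as (instruction content, operable instruction,
# owning program). All values are hypothetical examples.
STORED = [
    ("turn the volume up a little", "volume_up", "program A"),
    ("turn the volume up a little", "call_volume_up", "program B"),
    ("next page", "forward", "program C"),
]


def find_candidates(voice_text, stored=STORED):
    """Return every stored entry whose instruction content occurs in the voice text."""
    text = voice_text.strip().lower()
    return [entry for entry in stored if entry[0] in text]


candidates = find_candidates("Turn the volume up a little")
programs = sorted({program for _, _, program in candidates})

# More than one owning program means the user must be prompted (step 202).
needs_prompt = len(programs) > 1
```

When only one program owns a matching entry, the operable instruction can be sent directly without prompting.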

Step 202, if it is determined that there are a plurality of instruction information corresponding to the voice instruction, generating a prompt message, where the plurality of instruction information corresponding to the voice instruction belong to different third-party programs, respectively, and the prompt message is used for prompting the user to specify the third-party program.

If a plurality of instruction information corresponding to the voice instruction exist and belong to different third-party programs, the voice recognition program needs to determine the third-party program which the user wants to operate and select the instruction information of the third-party program from the plurality of instruction information, and then the third-party program is controlled based on the instruction information.

In actual application, the voice recognition program can generate prompt information for prompting the user to specify the third-party program. The prompt message may be a prompt message in a voice form, or may also be a prompt message in a text form, which is not limited in the embodiments of the present disclosure.

The user can designate the third-party program in a voice mode, a touch mode or the like. For example, the user may speak a designated third party program. For another example, when the prompt information is displayed in the form of graphics and text, icons of a plurality of third-party programs can be displayed, and the user can designate the third-party programs in the icons in a touch manner.

For example, the voice instruction initiated by the user includes "turn the volume up a little", and the voice recognition program determines that both program A and program B have instruction information corresponding to the voice instruction. The voice recognition program may generate prompt information prompting the user to specify a program from among program A and program B. Thereafter, the user may specify a third-party program based on the prompt information, for example, program A.
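One possible shape for the prompt in step 202 is sketched below, assuming a text prompt and a text reply (for example, the transcript of a spoken answer). The function names `build_prompt` and `parse_choice` and the prompt wording are assumptions for illustration.

```python
def build_prompt(candidate_programs):
    """Build a text prompt asking the user to specify one program."""
    names = sorted(set(candidate_programs))
    if len(names) < 2:
        return None  # no ambiguity, so no prompt is needed
    options = ", ".join(names[:-1]) + " or " + names[-1]
    return f"Which program do you mean: {options}?"


def parse_choice(reply, candidate_programs):
    """Map the user's reply to one candidate program, or None if unclear."""
    reply = reply.strip().lower()
    matches = [p for p in candidate_programs if p.lower() in reply]
    return matches[0] if len(matches) == 1 else None


prompt = build_prompt(["program B", "program A"])
choice = parse_choice("Program A, please", ["program A", "program B"])
```

A touch-based selection would bypass `parse_choice` and supply the chosen program name directly.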

Step 203, responding to a specified instruction initiated by a user, determining target instruction information corresponding to the third-party program indicated by the specified instruction in a plurality of instruction information corresponding to the voice instruction, and sending an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing.

Specifically, the voice recognition program may respond to a specified instruction initiated by the user, so as to determine the third-party program specified by the user. And target instruction information corresponding to the third-party program specified by the user can be determined from the plurality of determined instruction information corresponding to the voice instruction.

For example, the voice recognition program determines two pieces of instruction information corresponding to the user's voice instruction, one being the instruction information of program A and the other being the instruction information of program B. If the third-party program specified by the user is program A, the voice recognition program takes the instruction information of program A among the plurality of pieces of instruction information as the target instruction information.

Further, the instruction information may include a specific operable instruction. For example, when the instruction content in a piece of instruction information is "previous page", the operable instruction in that piece of instruction information may be "back"; the instruction information may further include program information, for example, program A.

In practical application, the voice recognition program may send the operable instruction in the target instruction information to the third-party program indicated by the specified instruction, for example, to program A.

In one embodiment, the voice recognition program may send the operable instruction directly to program A, or may send it to the system of the electronic device, which forwards it to program A.

After the third-party program indicated by the specified instruction receives the operable instruction, it can execute the corresponding instruction and thereby respond. Because the operable instruction corresponds to the user's voice instruction, executing it has the effect of responding to that voice instruction, so that the user can control by voice a third-party program that has no voice recognition function.
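Step 203 can be illustrated with the following sketch, which selects the target instruction information for the specified program and forwards its operable instruction through a stand-in for the device system. `ThirdPartyProgram`, `SystemRouter`, and `send_target_instruction` are hypothetical names, and the real inter-process mechanism is not specified in the disclosure.

```python
class ThirdPartyProgram:
    """Hypothetical third-party program that records what it executes."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def handle(self, operable):
        """Respond to an operable instruction (here: just record it)."""
        self.executed.append(operable)


class SystemRouter:
    """Stand-in for the device system that forwards operable instructions."""

    def __init__(self, programs):
        self._programs = {p.name: p for p in programs}

    def forward(self, program_name, operable):
        self._programs[program_name].handle(operable)


def send_target_instruction(candidates, chosen_program, router):
    """Select the target instruction information and send its operable instruction."""
    for info in candidates:
        if info["program"] == chosen_program:
            router.forward(chosen_program, info["operable"])
            return info
    raise LookupError(f"no candidate instruction for {chosen_program!r}")


program_a = ThirdPartyProgram("program A")
program_b = ThirdPartyProgram("program B")
router = SystemRouter([program_a, program_b])
candidates = [
    {"content": "turn the volume up a little", "operable": "volume_up",
     "program": "program A"},
    {"content": "turn the volume up a little", "operable": "call_volume_up",
     "program": "program B"},
]
target = send_target_instruction(candidates, "program A", router)
```

Only the program named in the specified instruction receives an operable instruction; the other candidate is left untouched.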

The method for controlling a program by voice provided by this embodiment is applied to a voice recognition program of an electronic device in which the voice recognition program and a plurality of third-party programs are running. The method comprises: in response to a voice instruction initiated by a user, determining instruction information corresponding to the voice instruction, wherein instruction information of the third-party programs is stored in the voice recognition program; if it is determined that there are a plurality of pieces of instruction information corresponding to the voice instruction, generating prompt information, wherein the plurality of pieces of instruction information belong to different third-party programs respectively, and the prompt information is used for prompting the user to specify a third-party program; and in response to a specified instruction initiated by the user, determining, among the plurality of pieces of instruction information, target instruction information corresponding to the third-party program indicated by the specified instruction, and sending the operable instruction in the target instruction information to that third-party program for response processing.

In this method, the voice recognition program stores the instruction information of the third-party programs. When the user operates a third-party program by voice, the voice recognition program determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of pieces of instruction information are determined, the target instruction information is determined from among them based on the user's specified instruction, and the voice recognition program sends the operable instruction in the target instruction information to the third-party program specified by the user, so that the third-party program can respond to the user's voice instruction. In this way, the user can control by voice a third-party program that has no voice recognition function.

Fig. 4 is a flowchart illustrating a method for controlling a program by voice according to a second exemplary embodiment of the present disclosure.

The method for controlling the program in the voice mode provided by the disclosure is applied to the voice recognition program of the electronic equipment, and the electronic equipment can execute the method of the disclosure based on the function of the voice recognition program. A speech recognition program and a plurality of third party programs are run in the electronic device.

Referring to fig. 4, the method for controlling a program in a voice manner according to the present disclosure includes:

step 401, responding to a starting instruction for starting the third party program, and starting the third party program.

The user can send the electronic device a start instruction for starting the third-party program, and the start instruction may be, for example, a voice instruction. For example, the user may say "start program A", and the voice instruction may be processed by the voice recognition program provided in the electronic device.

In an alternative embodiment, after receiving the start instruction in the form of voice, the voice recognition program may directly start the corresponding program. For example, the third party program may be provided with an interface, such that the speech recognition program can launch the third party program through the interface.

In another embodiment, after receiving the start instruction in the form of voice, the voice recognition program may convert the start instruction in the form of voice into a control instruction, send the control instruction to the system of the electronic device, and start the third-party program based on the control instruction by the system of the electronic device.

Through the implementation, the user can start the third-party program in a voice mode, and when the user is inconvenient to touch the electronic equipment with hands, the third-party program can be started in a voice control mode.

Step 402, in response to a user initiated voice instruction, determining voice content.

The user can control the third-party program through a voice instruction, for example by saying "previous page". The voice recognition program can determine, based on a voice recognition algorithm, the voice content included in the voice instruction spoken by the user; for example, it can recognize the voice content "previous page".

Step 403, determining instruction information including voice content in the stored corresponding relation; wherein, the instruction information comprises voice content and operable instructions.

Wherein, the instruction information of the third party program is stored in the voice recognition program.

For example, the voice recognition program may receive instruction registration information sent by the third-party program for registering operable instructions.

In actual application, after the third-party program is started, the started third-party program may send instruction registration information for registering operable instructions to the voice recognition program.

The third-party program can determine the instruction registration information according to the operable instructions it supports while running, and send the instruction registration information to the voice recognition program.

Specifically, the third-party program may further determine, according to the operable controls included in its current interface, the operable instructions supported in that interface, and send instruction registration information for registering those operable instructions to the voice recognition program. For example, if the program currently supports the operable instructions "next page", "previous page", and "ok", the third-party program may send instruction registration information for registering the operable instructions "next page", "previous page", and "ok" to the voice recognition program.
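As a rough sketch of how a third-party program might derive its registration from the controls on its current interface: the control identifiers, the `CONTROL_INSTRUCTIONS` mapping, and the registration dictionary layout below are all assumptions for illustration and are not defined by the disclosure.

```python
# Hypothetical mapping from on-screen controls to
# (instruction content, operable instruction).
CONTROL_INSTRUCTIONS = {
    "next_button": ("next page", "forward"),
    "previous_button": ("previous page", "back"),
    "ok_button": ("ok", "confirm"),
}


def build_registration(program, visible_controls):
    """Build instruction registration information for the current interface."""
    instructions = [
        {
            "content": CONTROL_INSTRUCTIONS[control][0],
            "operable": CONTROL_INSTRUCTIONS[control][1],
        }
        for control in visible_controls
        if control in CONTROL_INSTRUCTIONS  # ignore controls with no instruction
    ]
    return {"program": program, "instructions": instructions}


# "banner" has no associated operable instruction and is skipped.
registration = build_registration("program A", ["next_button", "ok_button", "banner"])
```

Rebuilding the registration whenever the interface changes would keep the stored instruction information in line with what the program can currently do.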

Further, the voice recognition program may be provided with an interface so that a third party program can send instruction registration information to the voice recognition program through the interface. The third-party program may also send instruction registration information to the system of the electronic device, which is forwarded by the system to the speech recognition program.

The voice recognition program may further store a correspondence between the third-party program and the instruction information according to the instruction registration information.

In practical application, after the voice recognition program receives the instruction registration information, the corresponding relationship between the third-party program and the instruction information can be stored according to the instruction registration information. For example, if the program a sends instruction registration information to the voice recognition program, the voice recognition program may determine the instruction information from the instruction registration information and store the association relationship between the program a and the instruction information.

Through the implementation mode, the voice recognition program can store the corresponding relation between the third-party program and the instruction information according to the instruction registration information sent by the third-party program, so that the voice recognition program can control the third-party program according to the voice instruction of the user, and the effect that the user can control the third-party program without the voice recognition function in a voice mode is achieved.

In one embodiment, the instruction registration information sent by the third-party program to the voice recognition program may include operable instructions only. In this embodiment, the voice recognition program may determine the instruction content corresponding to each operable instruction, so as to obtain instruction information including the operable instruction and its corresponding instruction content, and may further store the association between the third-party program and the instruction information.

In another embodiment, the instruction registration information sent by the third-party program to the speech recognition program may include both an operable instruction and its corresponding instruction content. In this case the speech recognition program may directly store the association between the third-party program and the instruction information, where the instruction information includes the operable instruction and its corresponding instruction content.
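The two embodiments above differ only in who supplies the instruction content. The following is a minimal Python sketch of that difference. The `DEFAULT_CONTENT` lookup and the function name are assumptions made for illustration, since the disclosure does not say how the voice recognition program derives the content from a bare operable instruction.

```python
# Hypothetical mapping from operable instruction to spoken content. The
# disclosure does not specify how the recognizer derives the content.
DEFAULT_CONTENT = {"back": "previous page", "forward": "next page"}


def to_instruction_info(registration):
    """Turn one registration into a list of (content, operable) pairs.

    Each item of `registration` is either a bare operable instruction
    (first embodiment) or a (content, operable) pair supplied by the
    third-party program (second embodiment).
    """
    infos = []
    for item in registration:
        if isinstance(item, str):
            # First embodiment: the recognizer determines the content itself.
            infos.append((DEFAULT_CONTENT[item], item))
        else:
            # Second embodiment: content arrives with the operable instruction.
            content, operable = item
            infos.append((content, operable))
    return infos
```

In this sketch an operable instruction that is missing from the lookup raises `KeyError`. A real recognizer would need some policy for that case, which the text leaves open.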

For example, the speech recognition program may store the association between the instruction information and the third-party program in the form of an instruction configuration table. The instruction configuration table may include three columns: the instruction content, the program to which the instruction belongs, and the instruction format of the operable instruction. For example, one row of the instruction configuration table may contain the instruction content "previous page", the program app 1, and the instruction format "back".
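One way to picture the three-column instruction configuration table is as a list of small records. The sketch below is illustrative only. The class and field names are assumptions, and only the first row ("previous page", app 1, "back") comes from the text. The second row is invented to show a second entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionRow:
    content: str   # instruction content, e.g. the phrase "previous page"
    program: str   # third-party program to which the instruction belongs
    fmt: str       # instruction format of the operable instruction


INSTRUCTION_TABLE = [
    InstructionRow(content="previous page", program="app 1", fmt="back"),
    # Hypothetical extra row, not taken from the disclosure.
    InstructionRow(content="next page", program="app 1", fmt="forward"),
]


def rows_for_program(table, program):
    """Return every row registered by the given third-party program."""
    return [row for row in table if row.program == program]
```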

Specifically, the speech recognition program stores correspondences between instruction information and third-party programs, and it may determine, from among these correspondences, those whose instruction information includes the voice content.

For example, the stored correspondences include two entries: correspondence 1 associates instruction information 1 with app 1, and correspondence 2 associates instruction information 2 with app 2. The voice content included in instruction information 1 is "previous page", and the voice content included in instruction information 2 is also "previous page".

Further, if the voice instruction made by the user includes the voice content "previous page", the voice recognition program may determine that both instruction information 1 and instruction information 2 include the voice content.

In this embodiment, the voice recognition program can determine the instruction information corresponding to the voice instruction made by the user based on the correspondence between the pre-stored instruction information and the third-party program, and further can convert the voice instruction into the instruction information, so as to achieve the purpose of controlling the third-party program by using the determined instruction information.
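The lookup described above can be sketched as a scan over the stored correspondences. This is a sketch under assumptions: the data layout and names are hypothetical, and the operable instruction "page_up" for app 2 is invented, since the text only gives "back" for app 1.

```python
from typing import NamedTuple


class InstructionInfo(NamedTuple):
    voice_content: str
    operable_instruction: str


# Stored correspondences: third-party program -> its instruction information.
CORRESPONDENCES = {
    "app 1": [InstructionInfo("previous page", "back")],
    "app 2": [InstructionInfo("previous page", "page_up")],  # hypothetical
}


def find_matches(correspondences, voice_content):
    """Return (program, info) pairs whose info contains the voice content."""
    matches = []
    for program, infos in correspondences.items():
        for info in infos:
            if info.voice_content == voice_content:
                matches.append((program, info))
    return matches
```

With the data above, the phrase "previous page" matches one entry for each of the two programs, which is the situation that steps 404 and 405 resolve.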

Step 404, if it is determined that there are a plurality of instruction information corresponding to the voice instruction, generating a prompt message, where the plurality of instruction information corresponding to the voice instruction belong to different third-party programs, respectively, and the prompt message is used for prompting the user to specify the third-party program.

Step 405, in response to a specified instruction initiated by a user, determining target instruction information corresponding to a third-party program indicated by the specified instruction in a plurality of instruction information corresponding to the voice instruction, and sending an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing.

Steps 404 and 405 are similar to the implementation manner of steps 202 and 203, and are not described again.
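Steps 404 and 405 can be illustrated with two small functions: one builds the prompt, and the other sends the target operable instruction once the user has named a program. The prompt wording, function names, and the `send` callback are assumptions made for this sketch.

```python
def build_prompt(matches):
    """Return prompt text listing the candidate third-party programs."""
    return "Which program should handle this: " + ", ".join(sorted(matches)) + "?"


def dispatch_specified(matches, specified_program, send):
    """Send the target operable instruction to the program the user named.

    `matches` maps each candidate program to its operable instruction.
    """
    operable = matches[specified_program]  # target instruction information
    send(specified_program, operable)
    return operable


sent = []
matches = {"app 1": "back", "app 2": "page_up"}  # "page_up" is hypothetical
prompt = build_prompt(matches)
dispatch_specified(matches, "app 1", lambda program, op: sent.append((program, op)))
```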

Step 406, receiving interface change information sent by the third-party program.

Specifically, if the third-party program sends instruction registration information to the voice recognition program according to the operable instructions supported in the program interface, the third-party program may send interface change information to the voice recognition program each time an interface change occurs.

For example, when the interface of the third-party program is switched, the third-party program may send interface change information to the voice recognition program.

In one embodiment, after the interface of the third-party program is switched, the third-party program may continue to send instruction registration information to the voice recognition program, where the information may carry interface change information.

In another embodiment, the third party program may send interface change information to the speech recognition program when it exits.

Step 407, updating the stored corresponding relationship between the third-party program and the instruction information according to the interface change information.

Further, after the voice recognition program receives the interface change information sent by the third-party program, the stored corresponding relationship between the third-party program and the instruction information can be updated.

Because the instruction information stored in the voice recognition program describes the operable instructions supported in the current interface of the third-party program, the stored instruction information of the third-party program should be updated after the interface of the third-party program changes, so that the voice recognition program can find the instruction information corresponding to the voice instruction made by the user.

In one embodiment, if the voice recognition program receives the interface change information sent by the third-party program, the voice recognition program may directly delete the instruction information of the third-party program.

Thereafter, the speech recognition program may continue to receive the instruction registration information sent by the third-party program, and further store the instruction information corresponding to the operational instruction supported in the current interface of the third-party program.

Further, if the interface change information is sent by the third-party program to the voice recognition program upon exiting, the third-party program does not send instruction registration information to the voice recognition program again before it is restarted.

In practice, after the interface of the third-party program changes, the voice recognition program can delete the stored instruction information of the third-party program, which avoids misoperation caused by the voice recognition program processing the user's voice instruction according to the instruction information of a historical interface of the third-party program.
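A minimal sketch of steps 406 and 407 under the delete-then-re-register embodiment follows. The class and method names are hypothetical, and the "ok"/"confirm" pair is invented for the example.

```python
class InstructionStore:
    """Holds program -> {voice content: operable instruction}."""

    def __init__(self):
        self._by_program = {}

    def register(self, program, instructions):
        # Store the instructions supported by the program's current interface.
        self._by_program[program] = dict(instructions)

    def on_interface_change(self, program):
        # Delete the instruction information of the program whose interface
        # changed, so stale instructions can no longer be matched.
        self._by_program.pop(program, None)

    def lookup(self, voice_content):
        return {
            program: table[voice_content]
            for program, table in self._by_program.items()
            if voice_content in table
        }


store = InstructionStore()
store.register("app 1", {"previous page": "back"})
store.on_interface_change("app 1")
stale = store.lookup("previous page")       # nothing left from the old interface
store.register("app 1", {"ok": "confirm"})  # new interface registers again
```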

In an alternative embodiment, the method provided by the present disclosure further comprises:

Step 408, if exactly one piece of instruction information corresponding to the voice instruction is determined, sending an operable instruction, according to the voice instruction, to the third-party program to which the instruction information belongs.

If only one piece of instruction information corresponding to the voice instruction is determined in the prestored instruction information, the voice recognition program can directly acquire the operable instruction in the instruction information and can also send the operable instruction to the third-party program to which the instruction information corresponding to the voice instruction belongs, so that the third-party program can respond to the operable instruction.
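The branch between step 408 (a single match, sent directly) and step 404 (several matches, prompt the user) might look like the following. The return values and the `send` callback are assumptions made for this sketch, not part of the disclosure.

```python
def handle_voice_instruction(matches, send):
    """Dispatch directly on a single match, otherwise ask for a choice.

    `matches` maps each matching third-party program to its operable
    instruction. Returns a (status, detail) pair.
    """
    if len(matches) == 1:
        program, operable = next(iter(matches.items()))
        send(program, operable)           # step 408: no prompt needed
        return ("sent", program)
    if matches:
        return ("prompt", sorted(matches))  # step 404: user must specify
    return ("no_match", None)
```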

After step 408, steps 406 and 407 may also be performed.

Fig. 5 is a flowchart illustrating a method for controlling a program by voice according to a third exemplary embodiment of the present disclosure.

The method for controlling the program in the voice mode is applied to the third-party program of the electronic equipment, and the electronic equipment can execute the method based on the function of the third-party program.

A speech recognition program and a plurality of third party programs are run in the electronic device. Any third party program may perform the methods provided by the present disclosure.

Referring to fig. 5, the method for controlling a program in a voice manner according to the present disclosure includes:

Step 501, receiving an operable instruction sent by a voice recognition program; wherein the operable instruction is determined in a plurality of instruction information based on a specified instruction of the user, the plurality of instruction information being information respectively belonging to different third-party programs determined according to a voice instruction of the user; instruction information of the third-party program is stored in the voice recognition program.

Wherein, the voice recognition program stores the instruction information of any one or more third party programs. For example, in the running process of the third-party program, instruction registration information may be sent to the voice recognition program, so that the voice recognition program stores the instruction information of the third-party program. In practical application, the voice recognition program can store a plurality of instruction information of any third-party program.

If the voice recognition program determines that a plurality of instruction information corresponding to the voice instruction exist, prompt information is generated, wherein the plurality of instruction information corresponding to the voice instruction belong to different third-party programs respectively, and the prompt information is used for prompting a user to specify the third-party program.

In response to a specified instruction initiated by a user, determining target instruction information corresponding to a third-party program indicated by the specified instruction in a plurality of instruction information corresponding to the voice instruction, and sending an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing, so that the third-party program can receive the operable instruction.

The operable instruction is sent to the third-party program by the voice recognition program according to the voice instruction initiated by the user, and even if the third-party program does not have the voice recognition function, the operable instruction can be transmitted to the third-party program in a voice control mode.

For example, the voice recognition program determines two pieces of instruction information corresponding to the voice instruction of the user, one of which is the instruction information of program A, and the other is the instruction information of program B. If the third-party program specified by the user is program A, the voice recognition program takes the instruction information of program A among the plurality of pieces of instruction information as the target instruction information. Thereafter, the speech recognition program can send the operable instruction in the target instruction information to program A.

The speech recognition program may specifically send the operable instruction to the third-party program according to the method of the embodiment shown in fig. 2.

Step 502, according to the operable instruction, completing the response processing.
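On the third-party program side, steps 501 and 502 amount to mapping a received operable instruction onto an existing action. The sketch below assumes a hypothetical paged reader. The instruction names "back" and "forward" and the handler table are illustrative only.

```python
class ThirdPartyProgram:
    """A program with no speech recognition of its own."""

    def __init__(self, page=3):
        self.page = page
        # Operable instruction -> action already implemented by the program.
        self._handlers = {
            "back": self._previous_page,
            "forward": self._next_page,
        }

    def _previous_page(self):
        self.page = max(1, self.page - 1)

    def _next_page(self):
        self.page += 1

    def receive(self, operable_instruction):
        """Step 501: receive. Step 502: complete the response processing."""
        handler = self._handlers.get(operable_instruction)
        if handler is None:
            return False  # unsupported in the current interface
        handler()
        return True
```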

Fig. 6 is a flowchart illustrating a method for controlling a program by voice according to a fourth exemplary embodiment of the present disclosure.

Referring to fig. 6, a method for controlling a program in a voice manner according to an embodiment of the present disclosure includes:

Step 601, determining an operable instruction according to the current interface of the program.

After the third-party program is started, the operable instruction can be determined according to the current program interface. Specifically, the operational instructions supported in the program can be determined according to the current interface of the program.

Specifically, the third-party program may obtain information of an operable control included in the current interface of the program; and determining an operable instruction corresponding to the operable control.

In an alternative embodiment, the third-party program may further determine the instruction content according to the operational instruction, so as to obtain instruction information including the operational instruction and the instruction content corresponding to the operational instruction.
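Step 601 can be sketched as deriving instruction information from the operable controls of the current interface. The `Control` record, its fields, and the rule of using the lower-cased label as instruction content are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Control:
    label: str            # text shown on the control
    action: str           # operable instruction the control triggers
    enabled: bool = True  # disabled controls are not operable


def operable_instructions(controls):
    """Map instruction content to operable instruction for operable controls."""
    return {c.label.lower(): c.action for c in controls if c.enabled}


controls = [
    Control("Next page", "forward"),
    Control("Previous page", "back", enabled=False),  # e.g. already on page 1
    Control("OK", "confirm"),
]
```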

Step 602, sending registration information for registering the operable instruction to the voice recognition program.

The third party program may also send instruction registration information for registering the operational instructions to the speech recognition program. For example, if the program currently supports the operable instruction of "next page", "previous page", or "ok", the third-party program may send instruction registration information for registering the operable instruction of "next page, previous page, or ok" to the voice recognition program.

Further, the third party program may transmit registration information for registering the instruction information to the system of the electronic device to cause the system to transmit the registration information to the voice recognition program.

In another embodiment, the speech recognition program may be provided with an interface so that a third party program can send instruction registration information to the speech recognition program through the interface. The third-party program may also send instruction registration information to the system of the electronic device, which is forwarded by the system to the speech recognition program.

Specifically, after receiving the instruction registration information, the speech recognition program may store a correspondence between the instruction information and the third-party program. For example, if program A sends instruction registration information to the voice recognition program, the voice recognition program may determine the instruction information from the instruction registration information and store the association between program A and the instruction information.
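The two delivery routes for registration information (directly through an interface of the voice recognition program, or forwarded by the system) can be sketched as follows. The JSON message layout and the class names are assumptions. The disclosure does not define a wire format.

```python
import json


def build_registration(program, instructions):
    """Serialize instruction registration information (hypothetical format)."""
    return json.dumps(
        {"type": "register", "program": program, "instructions": instructions},
        sort_keys=True,
    )


class Recognizer:
    def __init__(self):
        self.correspondences = {}  # program -> {content: operable instruction}

    def receive(self, message):
        data = json.loads(message)
        if data["type"] == "register":
            self.correspondences[data["program"]] = data["instructions"]


class System:
    """Stands in for the device system that forwards registrations."""

    def __init__(self, recognizer):
        self._recognizer = recognizer

    def forward(self, message):
        self._recognizer.receive(message)


recognizer = Recognizer()
# Route 1: the program calls the recognizer's interface directly.
recognizer.receive(build_registration(
    "program A", {"next page": "forward", "previous page": "back", "ok": "confirm"}))
# Route 2: the program hands the registration to the system, which forwards it.
System(recognizer).forward(build_registration("program B", {"ok": "confirm"}))
```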

Step 603, receiving an operable instruction sent by the voice recognition program; the operable instruction is determined among a plurality of operable instructions based on the specified instruction of the user, and the plurality of operable instructions are determined according to the voice instruction of the user and belong to different third-party programs respectively; the voice recognition program stores the operable instructions of each third-party program.

Step 604, response processing is completed according to the operable instruction.

Steps 603 and 604 are similar to the implementation manners of steps 501 and 502, and are not described again.

Step 605, when the program interface is switched, sending interface change information including interface switching information to the voice recognition program; and/or, when the program exits, sending interface change information including program exit information to the voice recognition program.

The voice recognition program can update the stored correspondence between the third-party program and the instruction information according to the interface change information. Because the instruction information stored in the voice recognition program describes the operable instructions supported in the current interface of the third-party program, the stored instruction information of the third-party program should be updated after the interface of the third-party program changes, so that the voice recognition program can find the instruction information corresponding to the voice instruction made by the user.
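From the third-party program's side, step 605 might be sketched as below. The message fields, the `send` callback, and the choice to re-register immediately after an interface switch are assumptions that follow one of the embodiments described earlier.

```python
class VoiceAwareApp:
    """Third-party program that reports interface changes (hypothetical)."""

    def __init__(self, name, send):
        self.name = name
        self._send = send  # delivers a message to the voice recognition program

    def switch_interface(self, instructions):
        # Report the switch, then register what the new interface supports.
        self._send({"type": "interface_change", "program": self.name,
                    "reason": "switch"})
        self._send({"type": "register", "program": self.name,
                    "instructions": instructions})

    def exit(self):
        # On exit only the change is reported; nothing is registered again
        # until the program is restarted.
        self._send({"type": "interface_change", "program": self.name,
                    "reason": "exit"})


outbox = []
app = VoiceAwareApp("program A", outbox.append)
app.switch_interface({"ok": "confirm"})
app.exit()
```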

Fig. 7 is a flowchart illustrating a method for controlling a program by voice according to a fifth exemplary embodiment of the present disclosure.

As shown in fig. 7, in the method for controlling a program in a voice manner provided by the present disclosure, the method may specifically include:

Step a, the user issues a starting instruction for starting the third-party program.

Step b, the voice recognition program starts the third-party program according to the starting instruction.

Step c, the third-party program determines an operable instruction according to the current program interface.

Step d, the third-party program sends registration information for registering the operable instruction to the voice recognition program.

Step e, the voice recognition program stores the correspondence between the third-party program and the instruction information according to the instruction registration information.

Step f, the user initiates a voice instruction for controlling the third-party program.

Step g, the voice recognition program determines instruction information corresponding to the voice instruction, and generates prompt information if it determines that there are a plurality of instruction information corresponding to the voice instruction.

Step h, the user initiates a specified instruction based on the prompt information so as to specify the third-party program to be controlled.

Step i, the voice recognition program, in response to the specified instruction initiated by the user, determines target instruction information corresponding to the third-party program indicated by the specified instruction among the plurality of instruction information corresponding to the voice instruction.

Step j, the voice recognition program sends the operable instruction included in the target instruction information to the third-party program indicated by the specified instruction.

Step k, after receiving the operable instruction, the third-party program executes the corresponding operable instruction so as to respond to it.

Through the process, the third-party program can be controlled by the user in a voice control mode, and even if the third-party program does not have a voice recognition function, the effect can still be achieved.
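Steps a through k can be traced end to end in a compact sketch. Everything here is illustrative: the class names, the in-process method calls standing in for inter-program messages, and the `choose` callback standing in for the user's specified instruction are all assumptions.

```python
class Recognizer:
    def __init__(self):
        self.table = {}     # program name -> {voice content: operable instruction}
        self.programs = {}  # program name -> running program object

    def start(self, program):                   # steps a-b
        self.programs[program.name] = program
        program.register_with(self)             # triggers steps c-d

    def store(self, name, instructions):        # step e
        self.table[name] = dict(instructions)

    def hear(self, voice_content, choose):      # steps f-j
        matches = {n: t[voice_content]
                   for n, t in self.table.items() if voice_content in t}
        if not matches:
            return None
        if len(matches) == 1:
            name = next(iter(matches))
        else:
            name = choose(sorted(matches))      # steps g-i: prompt and choice
        self.programs[name].receive(matches[name])  # step j
        return name


class Program:
    def __init__(self, name, instructions):
        self.name = name
        self.instructions = instructions  # derived from the current interface
        self.log = []

    def register_with(self, recognizer):        # steps c-d
        recognizer.store(self.name, self.instructions)

    def receive(self, operable):                # step k
        self.log.append(operable)


recognizer = Recognizer()
app1 = Program("app 1", {"previous page": "back"})
app2 = Program("app 2", {"previous page": "page_up"})  # hypothetical instruction
recognizer.start(app1)
recognizer.start(app2)
chosen = recognizer.hear("previous page", choose=lambda names: names[1])
```

Here the simulated user picks the second candidate, so only app 2 receives an operable instruction.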

Fig. 8 is a schematic structural diagram of an apparatus for controlling a program by voice according to a first exemplary embodiment of the present disclosure.

As shown in fig. 8, the apparatus 800 for controlling a program by voice according to the present disclosure is applied to a voice recognition program of an electronic device, where the voice recognition program and a plurality of third party programs are running, and the apparatus 800 includes:

a determining unit 810, configured to determine, in response to a voice instruction initiated by a user, instruction information corresponding to the voice instruction, where the instruction information of the third-party program is stored in the voice recognition program;

a prompting unit 820, configured to generate prompting information if it is determined that there are multiple pieces of instruction information corresponding to the voice instruction, where the multiple pieces of instruction information corresponding to the voice instruction belong to different third-party programs, respectively, and the prompting information is used for prompting a user to specify a third-party program;

the control unit 830 is configured to, in response to a specified instruction initiated by a user, determine target instruction information corresponding to a third-party program indicated by the specified instruction from among a plurality of instruction information corresponding to the voice instruction, and send an operable instruction in the target instruction information to the third-party program indicated by the specified instruction for response processing.

The device for controlling a program in a voice manner provided by the present disclosure is similar to the implementation manner, principle and effect of the embodiment shown in fig. 2, and is not described again.

Fig. 9 is a schematic structural diagram of an apparatus for controlling a program by voice according to a second exemplary embodiment of the present disclosure.

In the apparatus 900 for controlling a program by voice provided by the present disclosure, the determining unit 910 is similar to the determining unit 810 in fig. 8, the prompting unit 920 is similar to the prompting unit 820 in fig. 8, and the control unit 930 is similar to the control unit 830 in fig. 8.

Based on the embodiment shown in fig. 8, in an optional implementation manner of the apparatus 900, the determining unit 910 includes:

a content determining module 911, configured to determine a voice content in response to a voice instruction initiated by a user;

an information determining module 912, configured to determine instruction information including the voice content in the stored correspondence; wherein, the instruction information comprises voice content and operable instructions.

In an optional implementation, the apparatus further includes a changing unit 940 configured to:

receiving interface change information sent by a third-party program;

and updating the corresponding relation between the stored third-party program and the instruction information according to the interface change information.

In an alternative embodiment, the changing unit 940 includes:

a deleting module 941, configured to delete instruction information corresponding to the third party program that sends the interface change information in the correspondence relationship.

In an optional implementation manner, the apparatus further includes an initiating unit 950 configured to, before the determining unit 910 determines, in response to a voice instruction initiated by a user, instruction information corresponding to the voice instruction:

start the third-party program in response to a starting instruction for starting the third-party program.

In an optional implementation manner, the correspondence between the third-party program and the instruction information is determined according to instruction registration information, and the instruction registration information is sent by the third-party program.

The device for controlling a program in a voice manner provided by the present disclosure is similar to the implementation manner, principle and effect of the embodiment shown in fig. 4, and is not described again.

Fig. 10 is a schematic structural diagram of an apparatus for controlling a program by voice according to a third exemplary embodiment of the present disclosure.

As shown in fig. 10, the apparatus 1000 is applied to a third-party program of an electronic device, where a speech recognition program and a plurality of third-party programs are running, and the apparatus includes:

a receiving unit 1010, configured to receive an operable instruction sent by the voice recognition program; wherein the operable instruction is determined in a plurality of instruction information based on a specified instruction of a user, the plurality of instruction information being information respectively belonging to different third-party programs determined according to a voice instruction of the user; instruction information of the third-party program is stored in the voice recognition program;

a response unit 1020, configured to complete response processing according to the operational instruction.

The device for controlling a program in a voice manner provided by the present disclosure is similar to the implementation manner, principle and effect of the embodiment shown in fig. 5, and is not described again.

Fig. 11 is a schematic structural diagram of an apparatus for controlling a program by voice according to a fourth exemplary embodiment of the present disclosure.

As shown in fig. 11, in the apparatus 1100 for controlling a program in a voice manner provided by the present disclosure, a receiving unit 1110 is similar to the receiving unit 1010 in fig. 10, and a responding unit 1120 is similar to the responding unit 1020 in fig. 10.

On the basis of the embodiment shown in fig. 10, the apparatus for controlling a program in a voice manner provided by the present disclosure further includes a registration unit 1130, configured to:

determining an operable instruction according to a current interface of a program;

sending registration information for registering the operational instruction to the voice recognition program.

In an alternative embodiment, the registration unit 1130 includes:

a sending module 1131, configured to send registration information for registering the operational instruction to a system of the electronic device, so that the system sends the registration information to the voice recognition program.

In an alternative embodiment, the registration unit 1130 includes an instruction determination module 1131 configured to:

acquiring information of an operable control included in a current interface of a program;

and determining an operable instruction corresponding to the operable control.

In an optional embodiment, the apparatus further includes an updating unit 1140, configured to:

when a program interface is switched, interface change information including interface switching information is sent to the voice recognition program;

and/or sending interface change information including program exit information to the voice recognition program when the program exits.

The present disclosure provides a method, device and program product for controlling a program in a voice manner, which are applied to voice technology in computer technology, so as to solve the technical problem in the prior art that, when it is inconvenient for a user to operate a mobile terminal directly by hand, an APP cannot truly be controlled by voice.

In the technical solution of the present disclosure, the acquisition, storage, and use of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.

FIG. 12 illustrates a schematic block diagram of an example electronic device 1200 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 12, the device 1200 includes a computing unit 1201 which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data required for the operation of the device 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.

Various components in the device 1200 are connected to the I/O interface 1205 including: an input unit 1206 such as a keyboard, a mouse, or the like; an output unit 1207 such as various types of displays, speakers, and the like; a storage unit 1208, such as a magnetic disk, optical disk, or the like; and a communication unit 1209 such as a network card, modem, wireless communication transceiver, etc. The communication unit 1209 allows the device 1200 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 executes the methods and processes described above, for example, the method of controlling a program by voice. For example, in some embodiments, the method of controlling a program by voice may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 1200 via the ROM 1202 and/or the communication unit 1209. When the computer program is loaded into the RAM 1203 and executed by the computing unit 1201, one or more steps of the above-described method of controlling a program by voice may be performed. Alternatively, in other embodiments, the computing unit 1201 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of controlling a program by voice.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the drawbacks of difficult management and weak service scalability found in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; no limitation is imposed herein.
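As a non-limiting illustration, the registration, lookup, disambiguation, and dispatch steps described in this disclosure might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: the names `InstructionInfo`, `VoiceRecognitionProgram`, `register`, `on_interface_change`, `match`, and `handle`, the exact-match comparison of voice content, the callback used to deliver an operable instruction, and the example program names and instruction strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstructionInfo:
    """One registered entry: voice content plus the operable instruction it maps to."""
    voice_content: str
    operable_instruction: str


Candidate = Tuple[str, InstructionInfo]  # (third-party program name, instruction information)


class VoiceRecognitionProgram:
    """Hypothetical holder of the stored program-to-instruction correspondence."""

    def __init__(self) -> None:
        self._registry: Dict[str, List[InstructionInfo]] = {}
        # Assumed delivery channel: one callback per third-party program.
        self._handlers: Dict[str, Callable[[str], None]] = {}

    def register(self, program: str, infos: List[InstructionInfo],
                 handler: Callable[[str], None]) -> None:
        # Store instruction information received as registration information.
        self._registry[program] = list(infos)
        self._handlers[program] = handler

    def on_interface_change(self, program: str) -> None:
        # Delete the instruction information of the program that reported the change.
        self._registry.pop(program, None)

    def match(self, voice_content: str) -> List[Candidate]:
        # Assumption: exact string match between recognized and registered voice content.
        return [(program, info)
                for program, infos in self._registry.items()
                for info in infos
                if info.voice_content == voice_content]

    def handle(self, voice_content: str,
               choose: Optional[Callable[[List[Candidate]], Candidate]] = None
               ) -> Optional[str]:
        candidates = self.match(voice_content)
        if not candidates:
            return None
        if len(candidates) == 1:
            program, info = candidates[0]
        elif choose is not None:
            # Several programs matched: the user's specified choice selects the target.
            program, info = choose(candidates)
        else:
            return None
        self._handlers[program](info.operable_instruction)
        return program


received: List[Tuple[str, str]] = []
vrp = VoiceRecognitionProgram()
vrp.register("music_app", [InstructionInfo("play", "MUSIC_PLAY")],
             lambda ins: received.append(("music_app", ins)))
vrp.register("video_app",
             [InstructionInfo("play", "VIDEO_PLAY"),
              InstructionInfo("full screen", "VIDEO_FULLSCREEN")],
             lambda ins: received.append(("video_app", ins)))

vrp.handle("full screen")  # one match: sent directly to video_app
vrp.handle("play", choose=lambda c: next(x for x in c if x[0] == "music_app"))
vrp.on_interface_change("video_app")  # video_app entries are deleted
vrp.handle("play")  # only music_app matches now
```

In this sketch, an unambiguous voice instruction is dispatched directly, an ambiguous one is resolved through the caller-supplied `choose` function standing in for the user's specified instruction, and an interface change removes the stale entries so that later lookups no longer match them.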

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method for controlling a program by voice, the method being applied to a voice recognition program of an electronic device in which the voice recognition program and a plurality of third party programs are running, the method comprising:

2. The method of claim 1, wherein the determining, in response to a user-initiated voice instruction, instruction information corresponding to the voice instruction comprises:

determining voice content in response to a voice instruction initiated by a user;

determining, in the stored correspondence, instruction information that includes the voice content, wherein the instruction information comprises voice content and operable instructions.

3. The method of claim 1 or 2, further comprising:

receiving interface change information sent by a third-party program;

4. The method of claim 3, wherein updating the stored correspondence between the third-party program and the instruction information according to the interface change information comprises:

deleting, from the correspondence, the instruction information corresponding to the third-party program that sent the interface change information.

5. The method of any of claims 1-4, prior to determining, in response to a user-initiated voice instruction, instruction information corresponding to the voice instruction, further comprising:

6. The method of any of claims 1-5, further comprising:

if instruction information corresponding to the voice instruction is determined, sending an operable instruction, according to the voice instruction, to the third-party program to which the instruction information belongs.

7. The method according to any one of claims 1 to 6, wherein the correspondence between the third-party program and the instruction information is determined based on instruction registration information transmitted by the third-party program.

8. A method for controlling a program by voice, the method being applied to a third-party program of an electronic device, the electronic device having a voice recognition program and a plurality of third-party programs running therein, the method comprising:

and completing response processing according to the operable instruction.

9. The method of claim 8, further comprising:

10. The method of claim 9, wherein the sending registration information for registering the operable instructions to the voice recognition program comprises:

sending the registration information for registering the operable instructions to a system of the electronic device, to cause the system to send the registration information to the voice recognition program.

11. The method of claim 9, wherein determining operable instructions from a current interface of the program comprises:

and determining an operable instruction corresponding to the operable control.

12. The method according to any one of claims 8-11, further comprising:

13. An apparatus for controlling a program by voice, the apparatus being applied to a voice recognition program of an electronic device in which the voice recognition program and a plurality of third party programs are run, the apparatus comprising:

14. The apparatus of claim 13, wherein the determining unit comprises:

a content determining module configured to determine voice content in response to a voice instruction initiated by a user;

an information determining module configured to determine, in the stored correspondence, instruction information that includes the voice content, wherein the instruction information comprises voice content and operable instructions.

15. The apparatus according to claim 13 or 14, further comprising a changing unit for:

receiving interface change information sent by a third-party program;

16. The apparatus of claim 15, wherein the changing unit comprises:

a deleting module configured to delete, from the correspondence, the instruction information corresponding to the third-party program that sent the interface change information.

17. The apparatus according to any one of claims 13-16, further comprising an initiating unit for, before the determining unit determines, in response to a user-initiated voice instruction, instruction information corresponding to the voice instruction:

18. The apparatus according to any one of claims 13-17, wherein, if the prompting unit determines instruction information corresponding to the voice instruction, the control unit is further configured to send an operable instruction, according to the voice instruction, to the third-party program to which the instruction information belongs.

19. The apparatus according to any one of claims 13 to 18, wherein the correspondence between the third-party program and the instruction information is determined based on instruction registration information transmitted by the third-party program.

20. An apparatus for controlling a program by voice, the apparatus being applied to a third-party program of an electronic device in which a voice recognition program and a plurality of third-party programs are run, the apparatus comprising:

21. The apparatus of claim 20, further comprising a registration unit to:

22. The apparatus of claim 21, wherein the registration unit comprises:

a sending module configured to send registration information for registering the operable instructions to a system of the electronic device, so that the system sends the registration information to the voice recognition program.

23. The apparatus of claim 22, wherein the registration unit comprises an instruction determination module to:

and determining an operable instruction corresponding to the operable control.

24. The apparatus according to any of claims 21-23, further comprising an updating unit for:

25. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7 or 8-12.

26. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7 or 8-12.

27. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7 or 8-12.