CN113593555A

CN113593555A - Method, device and program product for controlling program in voice mode

Info

Publication number: CN113593555A
Application number: CN202110838410.3A
Authority: CN
Inventors: 刘俊启
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2021-07-23
Filing date: 2021-07-23
Publication date: 2021-11-02
Also published as: WO2023000697A1

Abstract

The disclosure provides a method, a device and a program product for controlling a program by voice, and relates to voice technology. Instruction information of third-party software can be stored in voice processing software. When a user operates third-party software by voice, the voice processing software determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of instruction information corresponding to the voice instruction is determined, an operable instruction corresponding to the third-party software currently in the active state can be determined from the plurality of instruction information, and the operable instruction is then sent to the third-party software in the active state so that it can respond to the user's voice instruction. In this way, the user can control, by voice, third-party software that has no voice processing function.

Description

Method, device and program product for controlling program in voice mode

Technical Field

The present disclosure relates to a speech technology in computer technology, and more particularly, to a method, an apparatus, and a program product for controlling a program by speech.

Background

Currently, a large amount of software is provided in a mobile terminal, and a user can operate the software in the terminal to use functions provided by the software.

In the prior art, a user may start software on a mobile terminal in several ways. For example, the software may be started by touch, or a certain piece of software may be called up through a voice assistant provided in the terminal.

Generally, when it is inconvenient for a user to operate a terminal directly, the user controls the terminal by voice to run software. However, once the software has started, if it has no voice recognition function, the user can only operate it by touch. Therefore, in the prior art, when touch operation is inconvenient for the user, the software cannot truly be controlled by voice.

Disclosure of Invention

The disclosure provides a method, a device and a program product for controlling a program by voice, which aim to solve the technical problem in the prior art that, when it is inconvenient for a user to operate a mobile terminal directly by hand, an application cannot truly be controlled by voice.

According to a first aspect of the present disclosure, there is provided a method for controlling software by voice, the method being applied to a voice processing software of an electronic device, the electronic device having the voice processing software and a plurality of third party software running therein, the method comprising:

receiving a voice instruction initiated by a user and used for controlling third-party software, and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party software is stored in the voice processing software;

if a plurality of instruction information corresponding to the voice instruction is determined, wherein the plurality of instruction information belong to different third-party software respectively, and the different third-party software comprises third-party software in an active state, determining, from the plurality of instruction information corresponding to the voice instruction, an operable instruction corresponding to the third-party software in the active state;

and sending the operable instruction to the third-party software in the active state for response processing.

According to a second aspect of the present disclosure, there is provided a method for controlling software by voice, the method being applied to third-party software of an electronic device, the electronic device having voice processing software and a plurality of third-party software running therein, the method comprising:

receiving an operable instruction sent by the voice processing software; wherein the operable instruction is determined from a plurality of instruction information according to the third-party software in an active state, the plurality of instruction information is determined according to the voice instruction of the user and belongs to different third-party software respectively, and the voice processing software stores instruction information of the third-party software;

and completing response processing according to the operable instruction.

According to a third aspect of the present disclosure, there is provided an apparatus for controlling software by voice, the apparatus being applied to voice processing software of an electronic device, the electronic device running therein the voice processing software and a plurality of third party software, the apparatus comprising:

the information determining unit is used for receiving a voice instruction initiated by a user and used for controlling third-party software and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party software is stored in the voice processing software;

the instruction determining unit is used for determining an operable instruction corresponding to the third-party software in the active state from a plurality of instruction information corresponding to the voice instruction if a plurality of instruction information corresponding to the voice instruction are determined, wherein the plurality of instruction information corresponding to the voice instruction belong to different third-party software respectively, and the different third-party software comprises the third-party software in the active state;

and the control unit is used for sending the operable instruction to the third-party software in the active state for response processing.

According to a fourth aspect of the present disclosure, there is provided an apparatus for controlling software by voice, the apparatus being applied to third-party software of an electronic device, the electronic device running therein voice processing software and a plurality of third-party software, the apparatus comprising:

the receiving unit is used for receiving an operable instruction sent by the voice processing software; wherein the operable instruction is determined from a plurality of instruction information according to the third-party software in an active state, the plurality of instruction information is determined according to the voice instruction of the user and belongs to different third-party software respectively, and the voice processing software stores instruction information of the third-party software;

and the response unit is used for finishing response processing according to the operable instruction.

According to a fifth aspect of the present disclosure, there is provided an electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first or second aspect.

According to a sixth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first or second aspect.

According to a seventh aspect of the present disclosure, there is provided a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of an electronic device can read the computer program, execution of the computer program by the at least one processor causing the electronic device to perform the method of the first or second aspect.

According to the method, the device and the program product for controlling a program by voice provided by the present disclosure, instruction information of third-party software can be stored in the voice processing software. When the user controls third-party software by voice, the voice processing software determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of instruction information corresponding to the voice instruction is determined, an operable instruction corresponding to the third-party software currently in the active state can be determined from the plurality of instruction information, and the operable instruction is then sent to the third-party software in the active state so that it can respond to the user's voice instruction. In this way, the user can control, by voice, third-party software that has no voice processing function.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.

Drawings

The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:

FIG. 1 is a diagram illustrating waking software using voice in an exemplary embodiment;

FIG. 2 is a flowchart illustrating a method for controlling software by voice according to a first exemplary embodiment of the present disclosure;

FIG. 3 is a flowchart illustrating a method for controlling software by voice according to a second exemplary embodiment of the present disclosure;

FIG. 4 is a flowchart illustrating a method for controlling software by voice according to a third exemplary embodiment of the present disclosure;

FIG. 5 is a flowchart illustrating a method for controlling software by voice according to a fourth exemplary embodiment of the present disclosure;

FIG. 6 is a flowchart illustrating a method for controlling a program by voice according to a fifth exemplary embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of an apparatus for controlling a program by voice according to a first exemplary embodiment of the present disclosure;

FIG. 8 is a schematic structural diagram of an apparatus for controlling a program by voice according to a second exemplary embodiment of the present disclosure;

FIG. 9 is a schematic structural diagram of an apparatus for controlling a program by voice according to a third exemplary embodiment of the present disclosure;

FIG. 10 is a schematic structural diagram of an apparatus for controlling a program by voice according to a fourth exemplary embodiment of the present disclosure;

FIG. 11 is a block diagram of an electronic device for implementing a method for controlling a program by voice according to an embodiment of the present disclosure.

Detailed Description

Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

FIG. 1 is a diagram illustrating waking software using a voice mode according to an exemplary embodiment.

As shown in fig. 1, the user 11 may speak a voice command, and the electronic device 12 may be provided with voice recognition software. The electronic device 12 may recognize the content of the voice instruction using the voice recognition software provided and perform the operation of responding according to the content of the voice instruction.

For example, when the content of the voice command is "start first software", the electronic device 12 may run the first software, and its interface is updated from the state 13 to the state 14.

If the first software has the voice recognition function, the user can continue to control the first software based on the voice control mode, and if the first software does not have the voice recognition function, the first software cannot respond when the user continues to control the first software based on the voice control mode.

In particular, when it is inconvenient for the user to operate the electronic device by hand, the electronic device can only start an application based on the user's voice command, but cannot continue to control that application based on further voice commands.

In order to solve the technical problem, in the scheme provided by the disclosure, instruction information of third-party software is stored in the voice processing software, and when the electronic device receives a voice instruction initiated by a user and used for controlling the third-party software, the voice processing software can determine the instruction information corresponding to the voice instruction and send an operable instruction to the third-party software according to the instruction information, so that the third-party software can respond to the voice instruction of the user. By the implementation mode, the user can wake up the third-party software in a voice mode and control the started third-party software.

Fig. 2 is a flowchart illustrating a method for controlling software by voice according to a first exemplary embodiment of the present disclosure.

The method for controlling software by voice provided by the present disclosure is applied to voice processing software of an electronic device, and the electronic device can execute the method of the present disclosure based on the functions of the voice processing software.

The voice processing software and a plurality of third-party software run in the electronic device. After the electronic device receives a voice instruction, the voice instruction can be processed by the voice processing software running in the electronic device.

Here, third-party software is any software other than the voice processing software.

With continued reference to fig. 2, the present disclosure provides a method for controlling software by voice, including:

Step 201, receiving a voice instruction initiated by a user for controlling third-party software, and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party software is stored in the voice processing software.

Specifically, instruction information of one or more third-party software is stored in the voice processing software. For example, while running, the third-party software may send registration information for registering instructions to the voice processing software, so that the voice processing software stores the instruction information of the third-party software. In practical application, the voice processing software can store a plurality of pieces of instruction information for any third-party software.

For example, each stored entry may include a piece of instruction information and the identification information of the third-party software to which the instruction information belongs. For example, an entry may record the first software together with one piece of instruction information of the first software.
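As a rough sketch of what one stored entry might look like, the following Python models a piece of instruction information together with the identification of the software it belongs to. The names `InstructionInfo` and `InstructionStore`, the field names, and the `"forward"` command are illustrative assumptions and do not come from the disclosure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionInfo:
    """One stored entry (hypothetical layout)."""
    software_id: str           # identification of the owning third-party software
    content: str               # instruction content, e.g. "previous page"
    operable_instruction: str  # command the software can execute


class InstructionStore:
    """In-memory store kept by the voice processing software (assumed)."""

    def __init__(self) -> None:
        self._entries: list[InstructionInfo] = []

    def register(self, info: InstructionInfo) -> None:
        # Ignore exact duplicates so repeated registration is harmless.
        if info not in self._entries:
            self._entries.append(info)

    def entries_for(self, software_id: str) -> list[InstructionInfo]:
        return [e for e in self._entries if e.software_id == software_id]


store = InstructionStore()
store.register(InstructionInfo("first_software", "previous page", "back"))
store.register(InstructionInfo("first_software", "next page", "forward"))
store.register(InstructionInfo("first_software", "next page", "forward"))  # duplicate
```

Keeping the software identification inside each entry is what later allows several software to register the same instruction content without overwriting one another.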

Further, the voice instruction initiated by the user may be an instruction for controlling any third-party software, and after receiving the voice instruction, the electronic device may process the voice instruction through the running voice processing software.

In an optional implementation, each piece of instruction information may include instruction content, and the voice processing software may determine the instruction information corresponding to the voice instruction according to the content of the voice instruction and the instruction content included in each piece of instruction information. The instruction content may be, for example, "previous page", "next page", "turn up a little", "turn down a little", and the like.

In practical application, the voice processing software may determine instruction information corresponding to a voice instruction initiated by a current user according to pre-stored instruction information of third-party software. For example, the voice processing software stores instruction information of the first software, the second software, and the third software, and then, of the instruction information of the three software, instruction information matched with a voice instruction initiated by a user can be determined.

For example, if the user-initiated voice instruction includes "previous page", the voice processing software may search the stored instruction information for instruction information corresponding to "previous page". For example, if the first software is browser software, the first software may have instruction information corresponding to "previous page". In this case, the voice processing software may directly obtain the operable instruction from the instruction information corresponding to "previous page" of the first software and send it to the first software, so that the first software executes the operable instruction.

In an alternative embodiment, for example, the second software is electronic book software, which may also have instruction information corresponding to "previous page", and the voice processing software can determine the two instruction information corresponding to the voice instruction. In this case, step 202 may be performed.
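The matching described above can be sketched as follows. This is a minimal illustration that assumes the voice instruction has already been converted to text and that matching is a simple substring test; the disclosure does not specify the matching algorithm. The tuple layout and the commands `"forward"` and `"page_up"` are hypothetical.

```python
# Hypothetical stored entries: (software, instruction content, operable instruction).
STORED = [
    ("first_software", "previous page", "back"),       # browser
    ("first_software", "next page", "forward"),
    ("second_software", "previous page", "page_up"),   # e-book reader
]


def find_matches(voice_text, stored=STORED):
    """Return every stored entry whose instruction content occurs in the text."""
    text = voice_text.strip().lower()
    return [entry for entry in stored if entry[1] in text]


single = find_matches("next page")                # only the browser registered this
multiple = find_matches("Previous page, please")  # browser and e-book reader
```

With a single match the operable instruction can be sent straight to its owner; with several matches owned by different software, the selection of step 202 is needed.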

Step 202, if a plurality of instruction information corresponding to the voice instruction is determined, where the plurality of instruction information belong to different third-party software respectively, and the different third-party software includes third-party software in an active state, determining, from the plurality of instruction information corresponding to the voice instruction, an operable instruction corresponding to the third-party software in the active state.

If a plurality of instruction information corresponding to the voice instruction are determined, and the plurality of instruction information corresponding to the voice instruction belong to different third-party software respectively, the voice processing software needs to determine the third-party software which the user wants to operate and select the instruction information of the third-party software from the plurality of instruction information, and then the third-party software is controlled based on the instruction information.

In practical application, the software a user operates is generally the third-party software currently in the active state. Therefore, when the third-party software to which the determined plurality of instruction information belong includes third-party software in the active state, the third-party software in the active state can be directly determined as the third-party software the user wants to operate.

The voice processing software can screen out instruction information of the third-party software in an active state from the determined instruction information and acquire an operable instruction in the instruction information.

Specifically, the third-party software in the active state refers to software displayed in a display interface of the electronic device, for example, if an interface of browser software is currently displayed in the interface of the electronic device, the browser software is the third-party software in the active state.

In one embodiment, the voice instruction initiated by the user includes a "next page", the voice processing software determines that the first software and the second software both have instruction information corresponding to the voice instruction, and the first software is in an active state, so that the voice processing software may obtain an operable instruction in the instruction information of the first software from the determined two pieces of instruction information, and use the operable instruction as an operable instruction matching with the real control intention of the user.
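A minimal sketch of this selection is shown below. It assumes the matches from step 201 are available as dictionaries and that the identifier of the active software is known; the function name, the dictionary keys, and the commands `"forward"` and `"page_down"` are illustrative assumptions.

```python
from typing import Optional


def select_operable(matches: list[dict], active_software: str) -> Optional[str]:
    """Pick the operable instruction owned by the active software, if any."""
    owners = {m["software"] for m in matches}
    if len(owners) > 1 and active_software in owners:
        for m in matches:
            if m["software"] == active_software:
                return m["operable"]
    return None  # cases not covered by step 202 are left to the caller


matches = [
    {"software": "first_software", "content": "next page", "operable": "forward"},
    {"software": "second_software", "content": "next page", "operable": "page_down"},
]
chosen = select_operable(matches, active_software="first_software")
```

Here `chosen` is the first software's instruction, because the first software is the one assumed to be shown on the display.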

Step 203, sending the operable instruction to the third-party software in the active state for response processing.

In practical application, the voice processing software can send the operable instruction of the third-party software in the active state to that third-party software.

In one embodiment, the voice processing software may directly send the operable instruction to the third-party software in the active state, and may also send the operable instruction to a system of the electronic device, and the system forwards the operable instruction to the third-party software in the active state.

After receiving the operable instruction, the third-party software in the active state can execute the corresponding instruction and thereby respond. Because the operable instruction corresponds to the user's voice instruction, executing it has the effect of responding to the voice instruction, so the user can control, by voice, third-party software that has no voice processing function.
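The two delivery paths mentioned for step 203 (direct, or forwarded by the system of the electronic device) can be sketched as follows. The classes `ThirdPartySoftware` and `System` and the function `send` are hypothetical stand-ins; the disclosure does not define a concrete interface.

```python
class ThirdPartySoftware:
    """Stand-in for software without any voice processing of its own."""

    def __init__(self, name):
        self.name = name
        self.executed = []

    def handle(self, operable_instruction):
        # A real application would trigger the matching UI action here.
        self.executed.append(operable_instruction)


class System:
    """Stand-in for the device system that forwards instructions."""

    def __init__(self):
        self._running = {}

    def attach(self, software):
        self._running[software.name] = software

    def forward(self, name, operable_instruction):
        self._running[name].handle(operable_instruction)


def send(operable_instruction, target, system=None):
    """Send directly, or through the system when one is supplied."""
    if system is None:
        target.handle(operable_instruction)
    else:
        system.forward(target.name, operable_instruction)


browser = ThirdPartySoftware("first_software")
system = System()
system.attach(browser)
send("back", browser)             # direct path
send("forward", browser, system)  # forwarded by the system
```

In both paths the third-party software only ever sees an operable instruction, never audio, which is why it needs no voice processing function of its own.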

The method for controlling software by voice provided by the present disclosure is applied to voice processing software of an electronic device in which the voice processing software and a plurality of third-party software run. The method comprises: receiving a voice instruction initiated by a user and used for controlling third-party software, and determining instruction information corresponding to the voice instruction, wherein the instruction information of the third-party software is stored in the voice processing software; if a plurality of instruction information corresponding to the voice instruction is determined, wherein the plurality of instruction information belong to different third-party software respectively, and the different third-party software comprises third-party software in an active state, determining, from the plurality of instruction information corresponding to the voice instruction, an operable instruction corresponding to the third-party software in the active state; and sending the operable instruction to the third-party software in the active state for response processing.

According to this method, the voice processing software can store instruction information of third-party software. When a user controls third-party software by voice, the voice processing software determines, from the stored instruction information, the instruction information corresponding to the user's voice instruction. If a plurality of instruction information corresponding to the voice instruction is determined, an operable instruction corresponding to the third-party software currently in the active state can be determined from the plurality of instruction information, and the operable instruction is then sent to the third-party software in the active state so that it can respond to the user's voice instruction. In this way, the user can control, by voice, third-party software that has no voice processing function.

Fig. 3 is a flowchart illustrating a method for controlling software by voice according to a second exemplary embodiment of the present disclosure.

With continued reference to fig. 3, the present disclosure provides a method for controlling software by voice, including:

Step 301, in response to a start instruction for starting third-party software, starting the third-party software.

The user can send the electronic device a start instruction for starting the third-party software, and the start instruction may be, for example, an instruction in the form of voice. For example, the user may say "open the first software", and the voice instruction may be processed by the voice processing software provided in the electronic device.

In an alternative embodiment, after receiving the start instruction in the form of voice, the voice processing software may directly start the corresponding software. For example, the third party software may be provided with an interface so that the speech processing software can run the third party software through the interface.

In another embodiment, after receiving the start instruction in the form of voice, the voice processing software may convert the start instruction in the form of voice into a control instruction, send the control instruction to the system of the electronic device, and start the third-party software based on the control instruction by the system of the electronic device.

Through the implementation mode, the user can start the third-party software in a voice mode, and when the user is inconvenient to touch the electronic equipment with hands, the third-party software can be started in a voice control mode.

Step 302, receiving a voice instruction initiated by a user for controlling third-party software, and determining voice content according to the voice instruction.

Instruction information of the third-party software is stored in the voice processing software.

For example, the voice processing software may receive registration information sent by the third-party software, the registration information being used to register operable instructions.

In practical application, after the third-party software in the electronic device is started, the third-party software can send registration information for registering the operable instruction to the voice processing software.

The third-party software can determine the registration information according to the operable instructions it can support while running, and send the registration information to the voice processing software.

Specifically, the third-party software may further determine, according to the operable controls included in the current software interface, the operable instructions that the current software interface can support, and send registration information for registering these operable instructions to the voice processing software. For example, if the software currently supports the operable instructions "next page", "previous page", and "confirm", the third-party software may send registration information for registering these three operable instructions to the voice processing software.

Further, the voice processing software may be provided with an interface so that third party software may send registration information to the voice processing software through the interface. The third party software may also send registration information to the system of the electronic device, which the system forwards to the voice processing software.
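On the third-party side, deriving registration information from the controls of the current interface might look like the sketch below. The control records, the `enabled` flag, the command names, and `build_registration` are all assumptions made for illustration; the disclosure only states that the operable instructions follow from the operable controls in the current interface.

```python
def build_registration(software_id, controls):
    """Collect operable instructions for the enabled controls of one interface."""
    operable = [c["instruction"] for c in controls if c["enabled"]]
    return {"software": software_id, "operable_instructions": operable}


# Hypothetical controls on the interface currently shown by the first software.
current_controls = [
    {"label": "Next page", "instruction": "forward", "enabled": True},
    {"label": "Previous page", "instruction": "back", "enabled": True},
    {"label": "Confirm", "instruction": "confirm", "enabled": False},
]

registration = build_registration("first_software", current_controls)
```

The resulting dictionary is what would then be passed to the voice processing software, either through its interface or by way of the system.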

The voice processing software can also store the correspondence between the third-party software and the instruction information according to the registration information.

In practical application, after the voice processing software receives the registration information, the corresponding relationship between the third-party software and the instruction information can be stored according to the registration information. For example, if the first software sends registration information to the voice processing software, the voice processing software may determine instruction information according to the registration information and store the association relationship between the first software and the instruction information.

Through the implementation mode, the voice processing software can store the corresponding relation between the third-party software and the instruction information according to the registration information sent by the third-party software, so that the voice processing software can control the third-party software according to the voice instruction of the user, and the effect that the user can control the third-party software without the voice processing function in a voice mode is achieved.

In one embodiment, the registration information sent by the third-party software to the voice processing software may include an operable instruction, and in this embodiment, the voice processing software may determine instruction content corresponding to each operable instruction, thereby obtaining instruction information including the operable instruction and the instruction content corresponding thereto, and may further store an association relationship between the third-party software and the instruction information.

In another embodiment, the registration information sent by the third-party software to the voice processing software may include an operable instruction and instruction content corresponding thereto, and the voice processing software may directly store the association relationship between the third-party software and the instruction information, where the instruction information includes the operable instruction and the instruction content corresponding thereto.
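The two registration formats can be handled by one routine, sketched below. How the voice processing software determines instruction content for a bare operable instruction is not specified in the disclosure; the lookup table `DEFAULT_CONTENT` is an assumption, as are the dictionary keys and the function name.

```python
# Assumed mapping used when a registration carries operable instructions only.
DEFAULT_CONTENT = {"back": "previous page", "forward": "next page"}


def to_instruction_info(registration):
    """Turn either registration format into a list of instruction information."""
    software = registration["software"]
    infos = []
    for item in registration["instructions"]:
        if isinstance(item, str):
            # Format 1: operable instruction only; look up its content.
            operable, content = item, DEFAULT_CONTENT[item]
        else:
            # Format 2: (operable instruction, instruction content) pair.
            operable, content = item
        infos.append({"software": software, "content": content, "operable": operable})
    return infos


only_operable = {"software": "first_software", "instructions": ["back"]}
with_content = {
    "software": "first_software",
    "instructions": [("back", "previous page")],
}
```

Both inputs produce the same stored instruction information, so later matching does not need to know which format the third-party software used.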

For example, the voice processing software may store the association relationship between the instruction information and the third-party software in the form of an instruction configuration table. The instruction configuration table may include three columns, one column is the instruction content, one column is the software, and the other column is the instruction format of the operable instruction. For example, the instruction content "previous page" and the instruction format "back" may be included in one line of information in the instruction configuration table, and the software is the first software.
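For illustration, an instruction configuration table with the three columns described above could be held and printed as follows. Only the first row ("previous page", first software, "back") comes from the text; the other rows and the `render` helper are assumptions.

```python
# Columns: instruction content, software, instruction format of the operable instruction.
INSTRUCTION_TABLE = [
    ("previous page", "first software", "back"),
    ("next page", "first software", "forward"),       # assumed row
    ("previous page", "second software", "page_up"),  # assumed row
]


def render(table):
    """Format the table as aligned plain text."""
    header = ("Instruction content", "Software", "Instruction format")
    rows = [header, *table]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


print(render(INSTRUCTION_TABLE))
```

Because the software is a column of its own, the same instruction content can appear in several rows, which is exactly the situation the active-state selection resolves.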

The user can control the third-party software through a voice instruction, for example by saying "previous page". The voice processing software can determine, based on a voice processing algorithm, the voice content included in the voice instruction spoken by the user, for example the voice content "previous page".

Step 303, determining, in the stored correspondence, instruction information that includes the voice content, wherein the instruction information comprises the voice content and an operable instruction.

Specifically, the voice processing software stores the correspondence between instruction information and third-party software, and can determine, from the stored correspondence, each correspondence whose instruction information includes the voice content.

For example, the stored correspondence includes two correspondences, where the correspondence 1 includes a correspondence between the instruction information 1 and the first software; the correspondence relation 2 includes a correspondence relation between the instruction information 2 and the second software. The content of the voice included in the instruction information 1 is "previous page", and the content of the voice included in the instruction information 2 is "previous page".

Further, if the voice instruction made by the user includes the voice content "previous page", the voice processing software may determine that both instruction information 1 and instruction information 2 include the voice content.

In this embodiment, the voice processing software can determine the instruction information corresponding to the voice instruction made by the user based on the correspondence between the pre-stored instruction information and the third-party software, and further can convert the voice instruction into the instruction information, so as to achieve the purpose of controlling the third-party software by using the determined instruction information.
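
The lookup of step 303 can be sketched as below, under the assumption that the stored correspondence is a mapping from software to a list of instruction information. The names `InstructionInfo`, `stored_correspondence`, and `find_instruction_info` are hypothetical, and the operable instruction "page_up" of the second software is a made-up value used only to make the two entries distinguishable.

```python
from typing import NamedTuple


class InstructionInfo(NamedTuple):
    voice_content: str         # phrase the user may speak
    operable_instruction: str  # instruction the third-party software executes


# Stored correspondence between third-party software and instruction information.
# "back" comes from the text; "page_up" is a made-up value for illustration.
stored_correspondence = {
    "first software": [InstructionInfo("previous page", "back")],
    "second software": [InstructionInfo("previous page", "page_up")],
}


def find_instruction_info(voice_content, correspondence):
    """Return (software, instruction info) pairs whose voice content matches."""
    return [
        (software, info)
        for software, infos in correspondence.items()
        for info in infos
        if info.voice_content == voice_content
    ]


matches = find_instruction_info("previous page", stored_correspondence)
```

Here `matches` contains one entry for the first software and one for the second software, which is the ambiguous case that step 304 resolves.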

Step 304, if a plurality of pieces of instruction information corresponding to the voice instruction are determined, the pieces of instruction information belong to different third-party software respectively, and the different third-party software includes third-party software in an active state, determining, among the plurality of pieces of instruction information corresponding to the voice instruction, target instruction information belonging to the third-party software in the active state.

The voice processing software may determine, as the target instruction information, instruction information corresponding to the third-party software in an active state among a plurality of instruction information corresponding to the voice instruction. For example, if it is determined that two pieces of instruction information correspond to a voice instruction, where the first piece of instruction information belongs to the third-party software in an active state, the first piece of instruction information may be determined as target instruction information.

Step 305, determining the operable instruction in the target instruction information as the operable instruction of the third-party software in the active state.

Specifically, the voice processing software may obtain the operable instruction included in the target instruction information as the operable instruction of the third-party software in the active state.

In this embodiment, when a plurality of instruction information corresponding to the voice instruction initiated by the user are determined, the voice processing software may filter out an operable instruction that meets the control intention of the user, and may further respond to the voice instruction of the user according to the operable instruction.
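
Steps 304 and 305 can be sketched as a single selection function. The function name, the pair-based representation of candidates, and the choice to return `None` when no candidate belongs to the active software are assumptions of this sketch; "page_up" is again a made-up value.

```python
def select_operable_instruction(candidates, active_software):
    """Pick the operable instruction of the software that is currently active.

    candidates: list of (software, operable_instruction) pairs that all match
    the same voice instruction.
    Returns None when no candidate belongs to the active software (assumption).
    """
    for software, operable_instruction in candidates:
        if software == active_software:
            # Step 304: this candidate is the target instruction information.
            # Step 305: its operable instruction is the result.
            return operable_instruction
    return None


candidates = [("first software", "back"), ("second software", "page_up")]
```

With the first software active, the function returns "back"; with the second software active, it returns "page_up".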

Step 306, sending the operable instruction to the third-party software in the active state for response processing.

The implementation of step 306 is similar to that of step 203 and is not described again.

Step 307, receiving interface change information sent by the third-party software.

Specifically, if the third-party software sends instruction registration information to the voice processing software according to the operable instructions supported in its software interface, the third-party software may send interface change information to the voice processing software each time its interface changes.

For example, when the interface of the third-party software is switched, the third-party software may send interface change information to the voice processing software.

In one embodiment, after the interface of the third-party software is switched, the third-party software may continue to send instruction registration information to the voice processing software, where the instruction registration information may carry interface change information.

In another embodiment, the third party software may send interface change information to the speech processing software when it exits.

Step 308, updating the stored corresponding relation between the third-party software and the instruction information according to the interface change information.

Further, after the voice processing software receives the interface change information sent by the third-party software, the stored corresponding relationship between the third-party software and the instruction information can be updated.

Because the instruction information stored in the voice processing software describes the operable instructions supported in the current interface of the third-party software, the stored instruction information of the third-party software should be updated after the interface of the third-party software changes, so that the voice processing software can find the instruction information corresponding to the voice instruction made by the user.

In one embodiment, if the voice processing software receives the interface change information sent by the third-party software, the voice processing software may directly delete the instruction information of the third-party software.

Thereafter, the voice processing software may continue to receive the registration information sent by the third-party software, and further store the instruction information corresponding to the operational instruction supported in the current interface of the third-party software.

Further, if the interface change information is sent by the third-party software to the voice processing software when the third-party software exits, the third-party software will not send instruction registration information to the voice processing software again until it is restarted.

In practical application, after the interface of the third-party software changes, the voice processing software can delete the stored instruction information of the third-party software. This avoids misoperation caused by the voice processing software handling the user's voice instruction according to instruction information of a historical interface of the third-party software.
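
The delete-then-re-register behaviour of steps 307 and 308 can be sketched as follows. The class `InstructionStore` and its method names are hypothetical, as is the "confirm"/"ok" instruction registered for the new interface.

```python
class InstructionStore:
    """Instruction information kept by the voice processing software (sketch)."""

    def __init__(self):
        self._by_software = {}  # software -> {voice content: operable instruction}

    def register(self, software, instructions):
        # Registration replaces whatever was stored for this software.
        self._by_software[software] = dict(instructions)

    def on_interface_change(self, software):
        # Delete the instruction information of the historical interface.
        self._by_software.pop(software, None)

    def lookup(self, voice_content):
        return [
            (software, instructions[voice_content])
            for software, instructions in self._by_software.items()
            if voice_content in instructions
        ]


store = InstructionStore()
store.register("first software", {"previous page": "back"})
store.on_interface_change("first software")
after_change = store.lookup("previous page")
store.register("first software", {"confirm": "ok"})
after_reregistration = store.lookup("confirm")
```

After the interface change, "previous page" no longer matches anything, so a stale instruction cannot be dispatched; only the instructions registered for the new interface are found.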

Fig. 4 is a flowchart illustrating a method for controlling software by voice according to a third exemplary embodiment of the present disclosure.

The method for controlling the software in the voice mode is applied to third-party software of the electronic equipment, and the electronic equipment can execute the method based on the function of the third-party software.

Speech processing software and a plurality of third party software are run in the electronic device. Any third party software may perform the methods provided by the present disclosure.

With continued reference to fig. 4, the present disclosure provides a method for controlling software by voice, including:

Step 401, receiving an operable instruction sent by the voice processing software; the operable instruction is determined, according to the third-party software in an active state, from a plurality of pieces of instruction information, and the plurality of pieces of instruction information are determined according to the voice instruction of the user and belong to different third-party software respectively; the voice processing software stores instruction information of third-party software.

The instruction information of any one or more third-party software is stored in the voice processing software. For example, while running, the third-party software may send instruction registration information to the voice processing software, so that the voice processing software stores the instruction information of the third-party software. In practical application, the voice processing software can store a plurality of pieces of instruction information for any third-party software.

Further, the voice instruction initiated by the user may be an instruction for controlling any third-party software, and after receiving the voice instruction, the electronic device may process the voice instruction through built-in voice processing software.

If the voice processing software determines that there are a plurality of pieces of instruction information corresponding to the voice instruction, that these pieces belong to different third-party software, and that the different third-party software includes third-party software in the active state, the voice processing software determines, among the plurality of pieces of instruction information, the operable instruction corresponding to the third-party software in the active state.

For example, the voice instruction initiated by the user includes "next page", the voice processing software determines that the first software and the second software both have instruction information corresponding to the voice instruction, and the first software is in an active state, then the voice processing software may obtain an operable instruction in the instruction information of the first software from the two pieces of determined instruction information, and take the operable instruction as an operable instruction matching with the real control intention of the user.

The voice processing software can send the operable instruction to the third-party software in the active state for response processing, and further respond to the voice instruction of the user.

Specifically, the voice processing software may send the operable instruction to the third-party software according to the method of the embodiment shown in fig. 2.

Step 402, response processing is completed according to the operable instruction.

After the third-party software receives the operable instruction, it can execute the corresponding instruction and thereby respond. Because the operable instruction corresponds to the voice instruction of the user, executing the operable instruction has the effect of responding to the user's voice instruction, so that the user can control, in a voice manner, third-party software that has no voice processing function.
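
On the third-party side, response processing in step 402 can be as simple as a dispatch from operable instruction to handler. The class `ReaderApp`, its page counter, and the instruction "forward" are assumptions of this sketch; only "back" appears in the disclosure.

```python
class ReaderApp:
    """Hypothetical third-party software with no voice processing of its own."""

    def __init__(self, page=1):
        self.page = page
        # Map each supported operable instruction to the method that performs it.
        self._handlers = {
            "back": self._previous_page,
            "forward": self._next_page,
        }

    def _previous_page(self):
        self.page = max(1, self.page - 1)

    def _next_page(self):
        self.page += 1

    def handle_operable_instruction(self, instruction):
        """Execute the instruction; return False if it is not supported."""
        handler = self._handlers.get(instruction)
        if handler is None:
            return False
        handler()
        return True


app = ReaderApp(page=3)
handled = app.handle_operable_instruction("back")
```

The software never sees audio or recognized text; it only receives an instruction in a format it already understands, which is why no voice processing function is needed on its side.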

Fig. 5 is a flowchart illustrating a method for controlling software by voice according to a fourth exemplary embodiment of the present disclosure.

With continued reference to fig. 5, the present disclosure provides a method for controlling software by voice, including:

Step 501, determining an operable instruction according to a current software interface.

After the third-party software is started, it can determine the operable instructions according to its current interface, that is, the operable instructions that the software supports in the interface currently displayed.

Specifically, the third-party software may obtain information on the operable controls included in the current interface of the software, and determine the operable instruction corresponding to each operable control.

In an optional implementation manner, the third-party software may further determine the instruction content according to the operable instruction, so as to obtain instruction information including the operable instruction and the instruction content corresponding to the operable instruction.
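
A minimal sketch of step 501 follows, assuming each control of the current interface is described by a small dictionary. The keys "label", "action", and "enabled", the rule of reusing the lower-cased label as instruction content, and the function name are all assumptions for illustration.

```python
def build_instruction_info(controls):
    """Derive instruction information from the operable controls of an interface.

    Each control is a dict with hypothetical keys:
      "label":   text shown to the user, reused here as the instruction content
      "action":  the operable instruction the control triggers
      "enabled": whether the control can currently be operated
    """
    return [
        {"content": control["label"].lower(), "operable_instruction": control["action"]}
        for control in controls
        if control["enabled"]
    ]


current_interface = [
    {"label": "Previous page", "action": "back", "enabled": True},
    {"label": "Next page", "action": "forward", "enabled": True},
    {"label": "Share", "action": "share", "enabled": False},
]
instruction_info = build_instruction_info(current_interface)
```

Only the two enabled controls produce instruction information, so the disabled "Share" control is not offered for voice control.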

Step 502, sending registration information for registering the operable instructions to the voice processing software.

The third-party software may also send registration information for registering these operable instructions to the voice processing software. For example, if the software currently supports the operable instructions "next page", "previous page", and "confirm", the third-party software may send registration information for registering these three operable instructions to the voice processing software.

Further, the third party software may send registration information for registering the instruction information to the system of the electronic device to cause the system to send the registration information to the voice processing software.

In another embodiment, the voice processing software may be provided with an interface, so that the third-party software can send registration information to the voice processing software directly through that interface.

Specifically, after receiving the registration information, the voice processing software may store the corresponding relationship between the third-party software and the instruction information according to the registration information. For example, if the first software sends registration information to the voice processing software, the voice processing software may determine instruction information according to the registration information and store the association relationship between the first software and the instruction information.
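
The registration exchange of step 502 can be sketched with a JSON message. The disclosure does not specify a message format, so the field names "software" and "instructions", the two function names, and the instruction "forward" are assumptions.

```python
import json


def build_registration(software, instructions):
    """Third-party side: serialize registration information."""
    return json.dumps({"software": software, "instructions": instructions})


def handle_registration(message, correspondence):
    """Voice processing side: store the software-to-instruction correspondence."""
    registration = json.loads(message)
    correspondence[registration["software"]] = registration["instructions"]
    return correspondence


correspondence = {}
message = build_registration(
    "first software",
    [
        {"content": "next page", "operable_instruction": "forward"},
        {"content": "previous page", "operable_instruction": "back"},
    ],
)
handle_registration(message, correspondence)
```

Whether the message travels directly through an interface of the voice processing software or is forwarded by the system of the electronic device, the receiving side can store it in the same way.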

Step 503, receiving an operable instruction sent by the voice processing software; the operable instruction is determined, according to the third-party software in an active state, from a plurality of pieces of instruction information, and the plurality of pieces of instruction information are determined according to the voice instruction of the user and belong to different third-party software respectively; the voice processing software stores instruction information of third-party software.

Step 504, response processing is completed according to the operable instruction.

The implementation manners of steps 503 and 504 are similar to those of steps 401 and 402, and are not described again.

Step 505, when the software interface is switched, sending interface change information including interface switching information to the voice processing software; and/or sending interface change information comprising software exit information to the voice processing software when the software exits.

In one embodiment, after the interface of the third-party software is switched, the third-party software may continue to send registration information to the voice processing software, where the registration information may carry interface change information.

The voice processing software can update the stored correspondence between the third-party software and the instruction information according to the interface change information. Because the instruction information stored in the voice processing software describes the operable instructions supported in the current interface of the third-party software, the stored instruction information of the third-party software should be updated after the interface of the third-party software changes, so that the voice processing software can find the instruction information corresponding to the voice instruction made by the user.
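
The sending side of step 505 can be sketched as below. The class `ThirdPartyApp`, the message keys, and the values "interface_switched" and "software_exited" are hypothetical; the transport is abstracted as a callable so the sketch stays self-contained.

```python
class ThirdPartyApp:
    """Hypothetical third-party software that reports its interface changes."""

    def __init__(self, name, send):
        self.name = name
        self._send = send  # callable that delivers a message to the voice processing software

    def switch_interface(self, operable_instructions):
        # Interface change information carrying interface switching information.
        self._send({"software": self.name, "change": "interface_switched"})
        # Registration information for the new interface follows.
        self._send({"software": self.name, "register": operable_instructions})

    def exit(self):
        # Interface change information carrying software exit information.
        # No registration follows until the software is started again.
        self._send({"software": self.name, "change": "software_exited"})


outbox = []
app = ThirdPartyApp("first software", outbox.append)
app.switch_interface({"confirm": "ok"})
app.exit()
```

Sending the change information separately from the new registration is one possible design; as the embodiment above notes, the registration information may instead carry the interface change information itself.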

Fig. 6 is a flowchart illustrating a method for controlling a program by voice according to a fifth exemplary embodiment of the present disclosure.

As shown in fig. 6, in the method for controlling a program in a voice manner provided by the present disclosure, the method may specifically include:

Step a, the user issues a start instruction for starting the third-party software.

Step b, the voice recognition software starts the third-party software according to the start instruction.

Step c, the third-party software determines an operable instruction according to the current software interface.

Step d, the third-party software sends registration information for registering the operable instruction to the voice recognition software.

Step e, the voice recognition software stores the corresponding relation between the third-party software and the instruction information according to the registration information.

Step f, the user initiates a voice instruction for controlling the third-party software.

Step g, the voice recognition software determines the instruction information corresponding to the voice instruction; if a plurality of pieces of instruction information corresponding to the voice instruction are determined, the voice recognition software determines, among them, the operable instruction corresponding to the third-party software in the active state.

Step i, the voice recognition software sends the determined operable instruction to the third-party software in the active state.

Step j, after receiving the operable instruction, the third-party software executes it, thereby responding to the operable instruction.

Through the process, the third-party software can be controlled by the user in a voice control mode, and even if the third-party software does not have a voice recognition function, the effect can still be achieved.
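
The flow of Fig. 6 can be simulated end to end in a few lines. This is a sketch under stated assumptions: the classes `VoiceSoftware` and `App` are stand-ins invented here, speech recognition is replaced by passing the recognized text directly, a single match is dispatched without checking the active state, and "page_up" is a made-up instruction.

```python
class VoiceSoftware:
    """Stand-in for the voice recognition software of Fig. 6 (steps e, g, i)."""

    def __init__(self):
        self.registry = {}   # software name -> {voice content: operable instruction}
        self.apps = {}       # software name -> app object
        self.active = None   # name of the software in the active state

    def register(self, app, instructions):          # step e
        self.registry[app.name] = dict(instructions)
        self.apps[app.name] = app

    def on_voice(self, voice_content):              # steps g and i
        matches = {
            name: table[voice_content]
            for name, table in self.registry.items()
            if voice_content in table
        }
        if not matches:
            return None
        if len(matches) == 1:
            # Assumption: a single match is dispatched directly.
            target = next(iter(matches))
        else:
            # Several matches: keep the one of the software in the active state.
            target = self.active if self.active in matches else None
        if target is None:
            return None
        self.apps[target].execute(matches[target])
        return target


class App:
    """Stand-in for third-party software (steps c, d, j)."""

    def __init__(self, name, instructions):
        self.name = name
        self.instructions = instructions
        self.executed = []

    def start(self, voice):                         # steps c and d
        voice.register(self, self.instructions)

    def execute(self, operable_instruction):        # step j
        self.executed.append(operable_instruction)


voice = VoiceSoftware()
first = App("first software", {"previous page": "back"})
second = App("second software", {"previous page": "page_up"})
first.start(voice)
second.start(voice)
voice.active = "first software"
target = voice.on_voice("previous page")            # step f
```

Both applications registered "previous page", but only the active first software receives and executes its operable instruction "back".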

Fig. 7 is a schematic structural diagram of an apparatus for controlling a program by voice according to a first exemplary embodiment of the present disclosure.

As shown in fig. 7, the apparatus 700 for controlling a program by voice according to the present disclosure is applied to a voice processing software of an electronic device, where the voice processing software and a plurality of third party software are run in the electronic device, and the apparatus 700 includes:

an information determining unit 710, configured to receive a voice instruction initiated by a user and used to control third-party software, and determine instruction information corresponding to the voice instruction, where the instruction information of the third-party software is stored in the voice processing software;

an instruction determining unit 720, configured to determine, if a plurality of pieces of instruction information corresponding to the voice instruction are determined, where the plurality of pieces of instruction information corresponding to the voice instruction belong to different third-party software respectively, and the different third-party software includes a third-party software in an active state, an operable instruction corresponding to the third-party software in the active state in the plurality of pieces of instruction information corresponding to the voice instruction;

and a control unit 730, configured to send the operable instruction to the third-party software in the active state for response processing.

The implementation, principle, and effect of the device for controlling a program in a voice manner provided by the present disclosure are similar to those of the embodiment shown in fig. 2, and are not described again.

Fig. 8 is a schematic structural diagram of an apparatus for controlling a program by voice according to a second exemplary embodiment of the present disclosure.

As shown in fig. 8, in the apparatus 800 for controlling a program by voice according to the present disclosure, an information determining unit 810 is similar to the information determining unit 710 in fig. 7, an instruction determining unit 820 is similar to the instruction determining unit 720 in fig. 7, and a control unit 830 is similar to the control unit 730 in fig. 7.

Based on the embodiment shown in fig. 7, in the apparatus 800 for controlling a program by voice provided by the present disclosure, the information determining unit 810 includes:

a content determining module 811, configured to determine a voice content included in the voice instruction;

an information determining module 812, configured to determine instruction information including the voice content in the stored correspondence; wherein, the instruction information comprises voice content and operable instructions.

In an optional implementation, the instruction determining unit 820 includes:

a target information determination module 821, configured to determine target instruction information corresponding to the active third-party software from among a plurality of instruction information corresponding to the voice instruction;

an instruction determining module 822, configured to determine an operable instruction in the target instruction information as an operable instruction corresponding to the active third-party software.

In an optional implementation, the apparatus further includes a changing unit 840, configured to:

receiving interface change information sent by third-party software;

and updating the corresponding relation between the stored third-party software and the instruction information according to the interface change information.

In an alternative embodiment, the changing unit 840 includes:

a deleting module 841, configured to delete, from the correspondence, the instruction information corresponding to the third-party software that sent the interface change information.

In an optional implementation manner, the correspondence between the third-party software and the instruction information is determined according to registration information, and the registration information is sent by the third-party software.

The implementation, principle, and effect of the device for controlling a program in a voice manner provided by the present disclosure are similar to those of the embodiment shown in fig. 3, and are not described again.

Fig. 9 is a schematic structural diagram of an apparatus for controlling a program by voice according to a third exemplary embodiment of the present disclosure.

As shown in fig. 9, the apparatus 900 is applied to third-party software of an electronic device in which voice processing software and a plurality of third-party software run, and includes:

a receiving unit 910, configured to receive an operable instruction sent by the voice processing software; the operable instruction is determined, according to the third-party software in an active state, from a plurality of pieces of instruction information, and the plurality of pieces of instruction information are determined according to the voice instruction of the user and belong to different third-party software respectively; the voice processing software stores instruction information of third-party software;

and a response unit 920, configured to complete response processing according to the operable instruction.

The implementation, principle, and effect of the device for controlling a program in a voice manner provided by the present disclosure are similar to those of the embodiment shown in fig. 4, and are not described again.

Fig. 10 is a schematic structural diagram of an apparatus for controlling a program by voice according to a fourth exemplary embodiment of the present disclosure.

As shown in fig. 10, the present disclosure provides a device 1000 for controlling a program in a voice manner, in which a receiving unit 1010 is similar to the receiving unit 910 in fig. 9, and a responding unit 1020 is similar to the responding unit 920 in fig. 9.

On the basis of the embodiment shown in fig. 9, a registration unit 1030 is further included, configured to:

determining an operable instruction according to a current software interface;

sending registration information for registering the operable instructions to the voice processing software.

In an optional implementation manner, the registering unit 1030 includes a sending module 1031, configured to:

sending registration information for registering the operable instructions to a system of the electronic device to cause the system to send the registration information to the voice processing software.

In an optional implementation manner, the registration unit 1030 includes an instruction determining module 1032, configured to:

acquiring information of an operable control included in a current interface of the software;

and determining an operable instruction corresponding to the operable control.

In an optional implementation, the apparatus further includes a changing unit 1040, configured to:

when a software interface is switched, interface change information including interface switching information is sent to the voice processing software;

and/or sending interface change information comprising software exit information to the voice processing software when the software exits.

The implementation, principle, and effect of the device for controlling a program in a voice manner provided by the present disclosure are similar to those of the embodiment shown in fig. 5, and are not described again.

The present disclosure provides a method, device and program product for controlling a program in a voice manner, which are applied to voice technology in computer technology, so as to solve the technical problem in the prior art that an application (APP) cannot truly be controlled by voice when it is inconvenient for the user to operate the mobile terminal directly by hand.

In the technical solution of the present disclosure, the acquisition, storage, and application of users' personal information all comply with the relevant laws and regulations and do not violate public order and good customs.

The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.

According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising: a computer program, stored in a readable storage medium, from which at least one processor of the electronic device can read the computer program, the at least one processor executing the computer program causing the electronic device to perform the solution provided by any of the embodiments described above.

FIG. 11 shows a schematic block diagram of an example electronic device 1100 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.

As shown in fig. 11, the device 1100 comprises a computing unit 1101, which may perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.

A number of components in device 1100 connect to I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, and the like; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108 such as a magnetic disk, optical disk, or the like; and a communication unit 1109 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.

The computing unit 1101 can be any of a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the respective methods and processes described above, for example, a method of controlling software by voice. For example, in some embodiments, the method of controlling software by voice may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 1100 via ROM 1102 and/or communication unit 1109. When the computer program is loaded into RAM 1103 and executed by the computing unit 1101, one or more steps of the above-described method of controlling software by voice may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured by any other suitable means (e.g., by means of firmware) to perform a method of controlling software by voice.

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that addresses the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.

It should be understood that the various forms of flows shown above may be used with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved; the present disclosure imposes no limitation in this respect.
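By way of a non-limiting illustration, the dispatch flow of this disclosure (storing instruction information registered by third-party software, matching the voice content, and selecting the operable instruction of the third-party software in the active state) might be sketched as follows. This is only a minimal sketch under stated assumptions: the names `InstructionInfo`, `VoiceProcessor`, `register`, `on_interface_change`, and `resolve`, the string identifiers for software, and exact string matching of voice content are all hypothetical and are not taken from the disclosure. How the active software is detected and how speech is recognized are outside the sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InstructionInfo:
    """One registered entry: voice content plus the operable instruction."""
    voice_content: str          # assumed to be already-recognized text
    operable_instruction: str   # opaque to the voice processing software


class VoiceProcessor:
    """Hypothetical stand-in for the voice processing software."""

    def __init__(self) -> None:
        # Stored correspondence: third-party software id -> instruction information.
        self._registry: Dict[str, List[InstructionInfo]] = {}

    def register(self, app_id: str, infos: List[InstructionInfo]) -> None:
        """Store registration information sent by a third-party software."""
        self._registry[app_id] = list(infos)

    def on_interface_change(self, app_id: str) -> None:
        """Delete the entries of the software that reported an interface change."""
        self._registry.pop(app_id, None)

    def resolve(self, voice_content: str, active_app_id: str) -> Optional[str]:
        """Return the operable instruction for the active software, or None."""
        # Step 1: collect every instruction information entry with this voice content.
        matches = [
            (app_id, info)
            for app_id, infos in self._registry.items()
            for info in infos
            if info.voice_content == voice_content
        ]
        # Step 2: among the matches, pick the entry owned by the active software.
        for app_id, info in matches:
            if app_id == active_app_id:
                return info.operable_instruction
        return None


# Two applications register the same voice content with different instructions.
processor = VoiceProcessor()
processor.register("music_app", [InstructionInfo("play", "music.play")])
processor.register("video_app", [InstructionInfo("play", "video.play")])

# The instruction of whichever application is active is selected.
selected = processor.resolve("play", active_app_id="video_app")
```

In this sketch the same voice content registered by two applications resolves to whichever application is passed in as active, and an application that reports an interface change simply has its stored entries dropped until it registers again.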

The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims

1. A method for controlling software by voice, the method being applied to voice processing software of an electronic device, the electronic device having the voice processing software and a plurality of third party software running therein, the method comprising:

2. The method of claim 1, wherein the determining instruction information corresponding to the voice instruction comprises:

determining voice content according to the voice instruction;

determining instruction information including the voice content in the stored correspondence; wherein the instruction information comprises voice content and operable instructions.

3. The method according to claim 1 or 2, wherein the determining an operable instruction corresponding to the third-party software in an active state from the plurality of instruction information corresponding to the voice instruction comprises:

determining target instruction information belonging to the third-party software in an active state from a plurality of instruction information corresponding to the voice instruction;

and determining the operable instruction in the target instruction information as the operable instruction of the third-party software in the active state.

4. The method of any of claims 1-3, further comprising:

receiving interface change information sent by third-party software;

5. The method of claim 4, wherein updating the stored correspondence between the third-party software and the instruction information according to the interface change information comprises:

and deleting the instruction information corresponding to the third-party software which sends the interface change information in the corresponding relation.

6. The method according to any one of claims 1 to 5, wherein the correspondence between the third-party software and the instruction information is determined according to registration information sent by the third-party software.

7. A method for controlling software by voice, the method being applied to third-party software of an electronic device, the electronic device having voice processing software and a plurality of third-party software running therein, the method comprising:

and completing response processing according to the operable instruction.

8. The method of claim 7, further comprising:

determining an operable instruction according to a current software interface;

9. The method of claim 8, wherein the sending registration information to the voice processing software for registering the operable instruction comprises:

10. The method of claim 8, wherein the determining an operable instruction according to a current software interface comprises:

and determining an operable instruction corresponding to the operable control.

11. The method according to any one of claims 7-10, further comprising:

12. An apparatus for controlling software by voice, the apparatus being applied to voice processing software of an electronic device, the electronic device having the voice processing software and a plurality of third party software running therein, the apparatus comprising:

13. The apparatus of claim 12, wherein the information determining unit comprises:

a content determining module, configured to determine the voice content included in the voice instruction;

an information determining module, configured to determine instruction information including the voice content in the stored correspondence; wherein the instruction information comprises voice content and operable instructions.

14. The apparatus of claim 12 or 13, wherein the instruction determination unit comprises:

a target information determining module, configured to determine target instruction information corresponding to the third-party software in an active state from a plurality of instruction information corresponding to the voice instruction;

and an instruction determining module, configured to determine the operable instruction in the target instruction information as the operable instruction corresponding to the third-party software in the active state.

15. The apparatus according to any of claims 12-14, further comprising a modification unit for:

receiving interface change information sent by third-party software;

16. The apparatus of claim 15, wherein the modification unit comprises:

a deleting module, configured to delete, from the correspondence, the instruction information corresponding to the third-party software that sent the interface change information.

17. The apparatus according to any one of claims 12 to 16, wherein the correspondence between the third-party software and the instruction information is determined based on registration information transmitted by the third-party software.

18. An apparatus for controlling software by voice, the apparatus being applied to third-party software of an electronic device, the electronic device having voice processing software and a plurality of third-party software running therein, the apparatus comprising:

19. The apparatus of claim 18, further comprising a registration unit to:

determining an operable instruction according to a current software interface;

20. The apparatus of claim 19, wherein the registration unit comprises a sending module configured to:

21. The apparatus of claim 19, wherein the registration unit comprises an instruction determination module to:

and determining an operable instruction corresponding to the operable control.

22. The apparatus according to any of claims 18-21, further comprising a modification unit for:

23. An electronic device, comprising:

at least one processor; and

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or 7-11.

24. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6 or 7-11.

25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6 or 7-11.