CN112286485B - Method and device for controlling application through voice, electronic equipment and storage medium

Info

Publication number: CN112286485B
Application number: CN202011596720.0A
Authority: CN
Inventors: 熊文龙; 贺永强
Original assignee: Zhidao Network Technology Beijing Co Ltd
Current assignee: Zhidao Network Technology Beijing Co Ltd
Priority date: 2020-12-30
Filing date: 2020-12-30
Publication date: 2021-04-13
Anticipated expiration: 2040-12-30
Also published as: CN112286485A

Abstract

The embodiment of the invention provides a method, a device, electronic equipment and a storage medium for controlling application through voice; the method comprises the following steps: receiving a voice instruction sent by a user, and determining a search keyword according to the voice instruction and a target application; sending the search keyword to a cloud server so that the cloud server determines identification information of a target control from a preset control identification information set of a target application according to the search keyword; receiving identification information of the target control, and judging whether the candidate control information set contains position information of the target control or not according to the identification information of the target control and a preset candidate control information set; and when the candidate control information set contains the position information of the target control, generating simulated click behaviors according to the position information of the target control.

Description

Method and device for controlling application through voice, electronic equipment and storage medium

Technical Field

The present invention relates to the field of voice control technologies, and in particular, to a method and an apparatus for controlling an application through voice, an electronic device, and a storage medium.

Background

An Application (APP) is software that runs on an intelligent mobile terminal and is capable of performing a specific function. Conventional applications generally require manual operation by a user to implement control of the application, such as by clicking a touch screen to manipulate buttons of the application.

In some scenarios, however, controlling an application manually is inconvenient. For example, when a user operates navigation software by hand while driving, complicated operations such as character input easily distract the user and create safety hazards. It is therefore desirable to operate an application by voice.

Some existing applications, such as the Goodpasts map, implement voice operation by internally integrating an SDK for voice operation. Most current applications, however, do not integrate such an SDK in advance, so they cannot be controlled by voice, which limits their use scenarios.

Disclosure of Invention

To solve the problems in the prior art, embodiments of the present invention provide a method and an apparatus for controlling an application through voice, an electronic device, and a storage medium.

An embodiment of a first aspect of the present invention provides a method for controlling an application by voice, where the method is applied to an electronic device, and the method includes:

receiving a voice instruction sent by a user, and determining a search keyword according to the voice instruction and a target application; the target application is the application running in the foreground of the electronic equipment at the current moment;

sending the search keyword to a cloud server so that the cloud server determines identification information of a target control from a preset control identification information set of a target application according to the search keyword, wherein the target control is determined according to the voice instruction;

receiving identification information of the target control, and judging whether the candidate control information set contains position information of the target control or not according to the identification information of the target control and a preset candidate control information set; the candidate control information comprises identification information and position information of a candidate control, and the candidate control is a control contained in an interface of the target application at the current moment;

and when the candidate control information set contains the position information of the target control, generating simulated click behaviors according to the position information of the target control.

According to the method for controlling the application through the voice, which is provided by the invention, the step of determining the search keyword according to the voice instruction and the target application comprises the following steps:

performing semantic recognition on the voice instruction to obtain an intention text of the voice instruction;

acquiring an identifier of the target application;

and determining a search keyword according to the intention text of the voice instruction and the identification of the target application.

According to the method for controlling the application through the voice, provided by the invention, the identification information comprises text information, identification information and type information;

correspondingly, the determining whether the candidate control information set includes the position information of the target control according to the identification information of the target control and a preset candidate control information set includes:

step S1, comparing the text information of the target control with the text information of each candidate control in the candidate control information set one by one, and if the comparison is successful, executing step S4; if the comparison fails, comparing the identification information of the target control with the identification information of each candidate control in the candidate control information set one by one, and if the comparison succeeds, executing step S4; if the comparison fails, comparing the type information of the target control with the type information of each candidate control in the candidate control information set one by one, and if the comparison is successful, executing step S4; if the comparison fails, go to step S2;

step S2, judging whether the number of times of the comparison failure reaches a preset first threshold value, returning a prompt message of the task execution failure when the number of times of the comparison failure reaches the preset first threshold value, and ending the operation, otherwise, executing step S3;

step S3, when the time interval between the comparison operation and the previous comparison operation reaches a preset second threshold, re-executing step S1;

and step S4, outputting the position information of the candidate control matched with the target control, and ending the operation.

According to the method for controlling the application by the voice, provided by the invention, before the step of receiving the voice instruction sent by the user, the method further comprises the following steps:

monitoring the interface of the target application, and determining that a control contained in the interface of the target application at the current moment is a candidate control after the interface of the target application is changed;

scanning the candidate control to obtain the identification information and the position information of the candidate control;

and obtaining a candidate control information set according to the identification information and the position information of the candidate control.

The embodiment of the second aspect of the present invention provides a method for controlling an application through a voice, which is applied to a cloud server, and the method includes:

receiving a search keyword; the search keyword is determined according to a voice instruction sent by a user and a target application, wherein the target application is the application running in the foreground of the electronic equipment at the current moment;

determining identification information of a target control from a preset control identification information set of the target application according to the search keyword; the target control is determined according to the voice instruction;

sending the identification information of the target control so that the electronic equipment can judge whether the candidate control information set contains the position information of the target control or not according to the identification information of the target control and a preset candidate control information set; the candidate control information comprises identification information and position information of a candidate control, and the candidate control is a control contained in an interface of the target application at the current moment; and when the candidate control information set contains the position information of the target control, the electronic equipment generates a simulated click behavior according to the position information of the target control.

According to the method for controlling the application through the voice, provided by the invention, the search keyword comprises an intention text of a voice instruction and an identification of a target application; the control identification information at least comprises text information of the control;

correspondingly, the determining the identification information of the target control from the preset control identification information set of the target application according to the search keyword includes:

determining a control identification information set of the target application according to the identification of the target application;

matching the intention text of the voice instruction with the text information of each control in the control identification information set of the target application;

and taking the control matched with the intention text of the voice instruction as a target control, and determining the identification information of the target control according to the control identification information set of the target application.

According to the method for controlling an application by voice provided by the present invention, before the step of receiving a search keyword, the method further comprises:

scanning all controls in the application to obtain identification information of the controls;

and storing the identification information of the controls belonging to the same application in a centralized manner to obtain an application control identification information set.

An embodiment of a third aspect of the present invention provides a system for controlling an application by voice, including electronic equipment and a cloud server, wherein the electronic equipment is in communication connection with the cloud server;

the electronic device is used for executing the steps of the method for controlling the application through the voice according to the embodiment of the first aspect of the invention;

the cloud server is configured to execute the steps of the method for controlling an application through voice according to the embodiment of the second aspect of the present invention.

In a fourth aspect, the present invention provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method for controlling an application by voice according to the first aspect of the present invention or implements the steps of the method for controlling an application by voice according to the second aspect of the present invention when executing the program.

A fifth aspect embodiment of the present invention provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of a method for controlling an application by speech as described in the first aspect embodiment of the present invention, or performs the steps of a method for controlling an application by speech as described in the second aspect embodiment of the present invention.

According to the method, the device, the electronic equipment and the storage medium for controlling an application through voice provided by the embodiments of the invention, a search keyword is determined from the voice instruction of the user, the identification information of the target control is determined according to the search keyword, and the position information of the target control is then obtained so that a simulated click behavior can be generated. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is a flow chart of a method for controlling an application through speech according to an embodiment of the present invention;

FIG. 2 is a flow chart of a method for controlling an application through speech according to another embodiment of the present invention;

FIG. 3 is a schematic diagram of an electronic device in a system for controlling an application through speech provided by the present invention;

FIG. 4 is a schematic diagram of a cloud server in a system for controlling an application through voice according to the present invention;

FIG. 5 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In a plurality of fields such as automobile driving and home life, there is a wide demand for controlling applications in a voice manner. However, for most applications at present, application developers do not integrate a voice control mode in the applications, which limits the applicable scenarios of the applications.

In the iOS system of Apple Inc., voice interaction is provided through Siri; for example, a user can have Apple Music play music via Siri. However, Siri can only control functions developed by Apple. For applications developed by third-party companies, Siri can only open the application and cannot further operate functions inside it. If more operations are to be performed through Siri, the developers of third-party applications must integrate the Siri interface themselves, which requires a large amount of development work and is time-consuming and labor-intensive.

In the Android system, no function exists for directly controlling an application by voice unless an SDK for voice operation has been integrated into the application in advance.

In order to implement voice control for an application without a voice control function, an embodiment of the present invention provides a method for controlling an application through voice.

FIG. 1 is a flowchart of a method for controlling an application by voice according to an embodiment of the present invention. As shown in FIG. 1, the method is applied to an electronic device and includes:

Step 101, receiving a voice instruction sent by a user, and determining a search keyword according to the voice instruction and a target application.

In this embodiment, the electronic device is an electronic device having a voice receiving function and an intelligent operating system, such as a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a Personal Digital Assistant (PDA), and the embodiment of the present application is not particularly limited.

In this embodiment, the voice instruction refers to an instruction issued by a user in a voice manner. The voice command may be a wake-free voice command or a wake-up voice command, and the embodiment of the present application is not particularly limited.

In this embodiment, the target application refers to the application currently running in the foreground of the electronic device. For example, application A, application B, and application C are running on a smartphone at the same time. Application B is the application currently used by the user and its interface is the current interface of the smartphone, while application A and application C run in the background. Application B is then the target application.

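The determining a search keyword according to the voice instruction and the target application comprises the following steps:

performing semantic recognition on the voice instruction to obtain an intention text of the voice instruction;

acquiring an identifier of the target application;

and determining a search keyword according to the intention text of the voice instruction and the identifier of the target application.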

In the present embodiment, the intention text (targetText) of the voice command is text information, contained in the voice command, that reflects the intention of the user. The intention text of the voice command can be obtained by performing semantic recognition on the voice command.

For example, the user says "please find me what good movies there have been recently". Semantic recognition removes auxiliary expressions such as modal particles and quantifiers from the voice command and yields an intention text that expresses the core meaning of the user, such as "find movie".

How to semantically recognize the voice instruction is common knowledge of those skilled in the art, and is not further described here.

Each application has a unique identifier that is used to refer to the application. For example, in the Android operating system, the package name is used as the unique identifier of an Android application.

The target application can be determined according to the current running condition of the electronic equipment, and then the unique identifier of the target application is obtained. Combining the unique identification of the target application with the intended text of the voice instruction may generate a search keyword.
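The combination described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `SearchKeyword` class, the `build_search_keyword` function, and the package name `com.example.video` are hypothetical names introduced here, and the whitespace normalization is an added assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchKeyword:
    """Hypothetical container pairing the app identifier with the intention text."""
    package_name: str  # unique identifier of the target (foreground) application
    target_text: str   # intention text obtained by semantic recognition


def build_search_keyword(package_name: str, target_text: str) -> SearchKeyword:
    # Collapse repeated whitespace so the server compares clean text (assumption).
    return SearchKeyword(package_name.strip(), " ".join(target_text.split()))


keyword = build_search_keyword("com.example.video", "  find   movie ")
```

Keeping the two parts as separate fields lets the cloud server first select the control identification information set by application identifier and only then compare text.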

Step 102, sending the search keyword to a cloud server so that the cloud server determines identification information of a target control from a preset control identification information set of a target application according to the search keyword, wherein the target control is determined according to the voice instruction.

In this embodiment, all elements in the application can be regarded as controls, and for example, a button, a picture, a paragraph, a segment of text, an input box, a drop-down box, and the like in the application can be regarded as controls.

In this embodiment, the control identification information may include text information of the control, identification information of the control, and type information of the control. For example, a video playing application includes a plurality of columns such as television shows, movies, variety shows, and animation. On the home page of the video playing application, each column corresponds to a button, and clicking the button opens the page of the corresponding column. In this example, any one of the buttons may serve as a control. The text on the button, such as "movie", is the text information of the control; a unique marking character string corresponding to the button, such as 123a556d67332&, is the identification information of the control; and the type "button" is the type information of the control.

As will be readily understood by those skilled in the art, an application includes a plurality of controls, each having respective control-identifying information. Therefore, in this embodiment, the control identification information of all the controls in the same application is stored in the same control identification information set. That is, one control identification information set contains control identification information of all controls in the same application.

The cloud server generally stores a control identification information set for each of a plurality of applications. As mentioned above, the search keyword comprises the unique identifier of the target application and the intention text of the voice instruction. The cloud server can therefore find the control identification information set corresponding to the target application according to the unique identifier of the target application. The intention text of the voice instruction is then compared with the text information of each control in that set; when the intention text is consistent with the text information of a control, that control is the target control, that is, the control that the user wishes to operate through the voice instruction. After the target control is determined, its identification information can be obtained from the control identification information set of the target application.

It should be noted that "consistent" in this embodiment is not limited to the two pieces of text information being completely identical; one piece of text information may be a subset of the other, or the text similarity of the two pieces of text information may be above a preset threshold. How to calculate the text similarity is common knowledge in the art and is not further described here.

For example, application B, as the target application, has a plurality of buttons: the text information of button a is "drama", the text information of button b is "movie", the text information of button c is "variety", and the text information of button d is "animation". A control identification information set corresponding to target application B is preset in the cloud server and comprises the control identification information of button a, button b, button c and button d. When the intention text of the voice instruction is "find movie", the intention text is compared with the text information of each control in the control identification information set of target application B. The text information of button b is found to be consistent with the intention text, so button b is taken as the target control and its identification information is obtained.

The identification information of the target control determined by the cloud server can be returned to the electronic equipment.
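The server-side lookup described for step 102 can be sketched as follows, under stated assumptions. The `CONTROL_SETS` store, the package name, the control identifiers, and the 0.8 threshold are all hypothetical; "subset" is interpreted here as a substring test, and `difflib.SequenceMatcher` from the Python standard library merely stands in for whatever text similarity measure an implementation would choose.

```python
from difflib import SequenceMatcher

# Hypothetical server-side store: application identifier -> control identification info.
CONTROL_SETS = {
    "com.example.video": [
        {"text": "drama", "id": "btn_a", "type": "button"},
        {"text": "movie", "id": "btn_b", "type": "button"},
        {"text": "variety", "id": "btn_c", "type": "button"},
        {"text": "animation", "id": "btn_d", "type": "button"},
    ]
}


def is_consistent(intent_text, control_text, threshold=0.8):
    """Identical text, one text contained in the other, or similarity above threshold."""
    a, b = intent_text.lower(), control_text.lower()
    if not a or not b:
        return False
    if a == b or a in b or b in a:
        return True
    return SequenceMatcher(None, a, b).ratio() >= threshold


def find_target_control(package_name, intent_text):
    """Return the identification info of the first consistent control, or None."""
    for control in CONTROL_SETS.get(package_name, []):
        if is_consistent(intent_text, control["text"]):
            return control
    return None


target = find_target_control("com.example.video", "find movie")
```

With the button texts from the example above, the intention text "find movie" contains "movie", so button b is selected.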

Step 103, receiving the identification information of the target control, and judging whether the candidate control information set contains the position information of the target control according to the identification information of the target control and a preset candidate control information set.

In this embodiment, the candidate control refers to a control included in the interface of the target application at the current time. One skilled in the art will readily appreciate that an application typically includes multiple interfaces. For example, in a video playing application, the user first enters the interface of the home page, and clicking a button on the home page opens the interface of a sub-page; for instance, clicking the movie column button on the home page opens the interface of the movie column. The interface that the target application displays to the user at the current moment is called the current interface, and the current interface of the target application changes along with the operations of the user.

Different interfaces contain different controls. For example, the home page interface of the video playing application contains controls such as a television show column button, a movie column button, a variety show column button and an animation column button, while the movie column interface contains controls such as a mainland Chinese movie column button, a Hong Kong movie column button, a Japanese and Korean movie column button, a European and American movie column button, an action film column button and a romance film column button. When the interface of the target application changes, the set of candidate controls changes accordingly.

In this embodiment, the candidate control information includes identification information and position information of the candidate control.

The identification information has been explained in the foregoing description and generally includes, for example, text information, identification information, and type information. The meaning of this information is not repeated here.

The position information of the candidate control describes the position of the candidate control in the interface, and generally includes the x coordinate and the y coordinate of the origin of the candidate control, the width value and the height value of the candidate control, and the like. In the prior art, the upper-left corner of the interface is by default the origin of the coordinate system (x = 0, y = 0), and the upper-left corner of the candidate control is taken as the origin of the control. The x coordinate and the y coordinate of the origin of the candidate control, combined with its width value and height value, determine the position of the candidate control in the interface.

Optionally, the candidate control information further includes visibility identification information. The visibility identification information is used for describing whether the candidate control is visible in the interface.

In this embodiment, the candidate control information set includes information of controls included in the current interface of the target application. When the current interface of the target application changes, the control information in the candidate control information set also changes correspondingly.

Since the current interface of the target application does not necessarily contain the target control that the user wishes to invoke, in this step, it is necessary to determine whether the target control exists in the candidate controls according to the identification information of the target control and the candidate control information set, and if so, the position information of the target control is acquired from the candidate control information set.

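The judging whether the candidate control information set contains the position information of the target control or not according to the identification information of the target control and a preset candidate control information set comprises the following steps:

Step S1, comparing the text information of the target control with the text information of each candidate control in the candidate control information set one by one, and if the comparison succeeds, executing step S4; if the comparison fails, comparing the identification information of the target control with the identification information of each candidate control one by one, and if the comparison succeeds, executing step S4; if the comparison fails, comparing the type information of the target control with the type information of each candidate control one by one, and if the comparison succeeds, executing step S4; if the comparison fails, executing step S2;

Step S2, judging whether the number of comparison failures reaches a preset first threshold; if so, returning prompt information of task execution failure and ending the operation; otherwise, executing step S3;

Step S3, when the time interval since the previous comparison operation reaches a preset second threshold, re-executing step S1;

Step S4, outputting the position information of the candidate control matched with the target control, and ending the operation.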

Since the content of the candidate control information set may change as the interface of the target application changes, step S1 may be executed again after a certain time (the second threshold), for example 2 seconds. If the number of comparison failures reaches the preset first threshold, for example if the comparison has still failed after 3 attempts, prompt information of task execution failure can be returned to the user.
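The comparison cascade with retries (steps S1 to S4) can be sketched as follows. This is an illustrative sketch under assumptions, not the patented implementation: the dictionary keys, the function names, and the injectable `sleep` parameter are hypothetical, and the defaults of 3 failures and 2 seconds simply reuse the example values given above.

```python
import time


def match_once(target, candidates):
    """Step S1: compare text, then identifier, then type.

    Returns the position of the first matching candidate, or None.
    """
    for field in ("text", "id", "type"):
        wanted = target.get(field)
        if not wanted:
            continue  # the target carries no value for this field
        for candidate in candidates:
            if candidate.get(field) == wanted:
                return candidate["position"]
    return None


def locate_target(target, get_candidates, max_failures=3,
                  retry_interval=2.0, sleep=time.sleep):
    """Run steps S1 to S4; return a position, or None when the task fails."""
    failures = 0
    while True:
        position = match_once(target, get_candidates())  # step S1
        if position is not None:
            return position                              # step S4
        failures += 1
        if failures >= max_failures:                     # step S2
            return None  # the caller reports task execution failure
        sleep(retry_interval)                            # step S3


# Usage: the first scan misses the control, the second scan finds it.
target = {"text": "movie", "id": "btn_b", "type": "button"}
scans = iter([
    [],
    [{"text": "movie", "id": "btn_b", "type": "button",
      "position": (0, 100, 200, 80)}],
])
found = locate_target(target, lambda: next(scans), sleep=lambda _: None)
missing = locate_target({"text": "news"}, lambda: [], sleep=lambda _: None)
```

Passing `get_candidates` as a callable means each retry reads the latest candidate control information set, which matters because the set is rebuilt whenever the interface changes.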

It should be noted that, when comparing the text information of the target control with the text information of each candidate control in the candidate control information set one by one, in order to improve efficiency, the text information may be used as a regular expression, and a pattern.
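One possible reading of the regular-expression comparison is sketched below. Which text serves as the pattern is an assumption here (the text information of the target control), the `text_matches` helper is hypothetical, and the fallback to a literal match for text that is not a valid pattern is an added safeguard rather than something stated in this description.

```python
import re


def text_matches(target_text, candidate_text):
    """Treat the target text as a regular expression and search the candidate text."""
    try:
        pattern = re.compile(target_text)
    except re.error:
        # Not a valid pattern: fall back to matching the text literally.
        pattern = re.compile(re.escape(target_text))
    return pattern.search(candidate_text) is not None
```

Under this reading, `text_matches("movie", "movie column")` succeeds because the pattern is found inside the candidate text.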

Step 104, when the candidate control information set contains the position information of the target control, generating a simulated click behavior according to the position information of the target control.

As can be known from the foregoing description, the candidate control information includes, in addition to the identification information of the control, the position information of the control, so that when the identification information of the target control is consistent with the identification information of a certain candidate control, the position information of the target control can be obtained from the candidate control information set.

As mentioned above, the position information of a control includes the x coordinate and the y coordinate of the origin of the control, the width value and the height value of the control, and the like. The position of the center point of the target control on the current interface of the target application can be calculated from the position information of the target control. The center-point position and a value corresponding to the click operation (click) are packaged into a MotionEvent data packet, and the MotionEvent data packet is sent out through the screen driver of the electronic equipment, thereby simulating a click.
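The center-point calculation can be sketched as follows. The plain dictionary returned by `build_click_event` is a hypothetical stand-in for the MotionEvent data packet; it does not use the Android API, and the function names and sample coordinates are introduced only for illustration.

```python
def center_point(x, y, width, height):
    """Center of a control whose origin (upper-left corner) is at (x, y)."""
    return (x + width / 2, y + height / 2)


def build_click_event(position):
    """Package the center point with a click action (stand-in for MotionEvent)."""
    cx, cy = center_point(*position)
    return {"action": "click", "x": cx, "y": cy}


# A control with origin (100, 200), width 300 and height 80.
event = build_click_event((100, 200, 300, 80))
```

Clicking the center rather than the origin keeps the simulated click inside the bounds of the control for any positive width and height.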

The method for controlling an application through voice provided by this embodiment determines a search keyword from the voice instruction of the user, determines the identification information of the target control according to the search keyword, and then obtains the position information of the target control so that a simulated click behavior can be generated. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.

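Based on any of the above embodiments, in this embodiment, before the step of receiving a voice instruction issued by a user, the method further includes:

monitoring the interface of the target application, and determining that a control contained in the interface of the target application at the current moment is a candidate control after the interface of the target application is changed;

scanning the candidate control to obtain the identification information and the position information of the candidate control;

and obtaining a candidate control information set according to the identification information and the position information of the candidate control.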

In the previous embodiment of the present invention, the candidate control information set is preset. In this embodiment, a generation process of the candidate control information set is explained.

First, the interface of the target application is monitored in real time.

In this embodiment, a separate Service may be run in the background of the electronic device, and the Service monitors changes in any third-party application interface other than itself, particularly changes in the target application.

Such changes include, but are not limited to, those caused by operations on the application such as clicking, sliding, long-pressing, and window switching.

Then, when the interface of the target application changes, the Service is triggered to perform one control scan of the target application interface, and data for all controls on the current interface of the target application is acquired during the scan. An independent data object NodeInfo is generated for each control; the data object NodeInfo includes at least the identification information and the position information of the control. The identification information generally includes the text information, the identifier (ID), and the type information of the control. The position information of the control includes the x coordinate and y coordinate of the origin of the control, the width value and height value of the control, and the like. Optionally, the data object NodeInfo further includes visibility identification information.
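The trigger described above, in which a change on the target application's interface causes exactly one scan, can be sketched as follows. The class name `InterfaceMonitor`, the event names, and the injected `scan` callable are hypothetical; the source lists the kinds of changes only as non-exhaustive examples.

```python
from typing import Callable, List

# Assumed event names; the source gives these kinds of changes only as examples.
RESCAN_EVENTS = {"click", "scroll", "long_press", "window_switch"}


class InterfaceMonitor:
    """Hypothetical stand-in for the background Service."""

    def __init__(self, target_app: str, scan: Callable[[], List[dict]]) -> None:
        self.target_app = target_app
        self._scan = scan
        self.candidates: List[dict] = []

    def on_event(self, app_id: str, event_type: str) -> bool:
        """Rescan once if the event is a change on the target application."""
        if app_id != self.target_app or event_type not in RESCAN_EVENTS:
            return False
        # Replace the previous candidate set with the result of a fresh scan.
        self.candidates = self._scan()
        return True
```

Events from other applications leave the stored candidate set untouched, so the set always reflects the most recent state of the target application's interface.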

Finally, all NodeInfo data objects are stored in a NodeInfoList linked list in memory.

Since the controls contained in the interface of the target application at the current moment are the candidate controls, the NodeInfoList linked list in memory is the candidate control information set.
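One possible shape for the NodeInfo data object and the candidate control information set is sketched below. The field names and the dictionary layout of the raw scan results are assumptions, and a Python list stands in for the NodeInfoList linked list.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class NodeInfo:
    # Identification information of the control
    text: Optional[str]
    view_id: Optional[str]
    class_name: str
    # Position information of the control
    x: int
    y: int
    width: int
    height: int
    # Optional visibility identification information
    visible: bool = True


def scan_interface(raw_controls: Iterable[dict]) -> List[NodeInfo]:
    """Build a fresh candidate control information set from one scan."""
    return [
        NodeInfo(
            text=c.get("text"),
            view_id=c.get("viewId"),
            class_name=c["className"],
            x=c["x"],
            y=c["y"],
            width=c["width"],
            height=c["height"],
            visible=c.get("visible", True),
        )
        for c in raw_controls
    ]
```

Rebuilding the whole list on every scan, rather than patching it, keeps the set consistent with the interface at the current moment.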

By monitoring the interface of the target application, the method for controlling an application through voice provided by this embodiment can discover changes in the target application interface in time and establish a control information set for the controls contained in the interface of the target application at the current moment. This helps to determine the position information of the target control and generate a simulated click behavior. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.

Fig. 2 is a flowchart of a method for controlling an application through voice according to another embodiment of the present invention. As shown in Fig. 2, the method is applied to a cloud server and includes:

Step 201, receiving a search keyword; the search keyword is determined according to a voice instruction issued by a user and a target application, wherein the target application is the application running in the foreground of the electronic device at the current moment.

In this embodiment, the target application refers to the application currently running in the foreground of the electronic device.

Semantic recognition is performed on the voice instruction to obtain the intention text of the voice instruction; combined with the identifier of the target application, the search keyword can be generated.
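The composition of the search keyword from the intention text and the identifier of the target application can be sketched as follows. Semantic recognition itself is assumed to have been done upstream; the `SearchKeyword` type and the payload field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchKeyword:
    intent_text: str  # intention text obtained by semantic recognition
    app_id: str       # identifier of the foreground (target) application

    def to_payload(self) -> dict:
        """Serialize for sending to the cloud server (assumed field names)."""
        return {"intent": self.intent_text, "app": self.app_id}


def build_search_keyword(intent_text: str, app_id: str) -> SearchKeyword:
    """Combine the intention text with the target application's identifier."""
    return SearchKeyword(intent_text=intent_text.strip(), app_id=app_id)
```

Carrying the application identifier in the keyword lets the cloud server restrict its lookup to the control identification information set of that one application.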

Step 202, determining identification information of the target control from a preset control identification information set of the target application according to the search keyword.

In this embodiment, the control identification information may include text information of the control, identification information of the control, and type information of the control.
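A minimal sketch of the lookup in step 202 is given below. The source does not specify the matching rule, so the substring match between a control's text and the intention text is purely an assumption, as are the function name and the dictionary keys.

```python
from typing import Dict, List, Optional

ControlId = Dict[str, str]  # assumed keys: "text", "viewId", "className"


def find_target_control(
    control_sets: Dict[str, List[ControlId]],
    app_id: str,
    intent_text: str,
) -> Optional[ControlId]:
    """Return the first control of the target application whose text
    appears in the intention text, or None if nothing matches.

    Substring matching is an illustrative assumption, not the source's rule.
    """
    for control in control_sets.get(app_id, []):
        text = control.get("text", "")
        if text and text in intent_text:
            return control
    return None
```

A production system would likely use a more tolerant comparison (synonyms, fuzzy matching), which the source leaves open.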

Step 203, sending the identification information of the target control, so that the electronic device judges whether the candidate control information set contains the position information of the target control according to the identification information of the target control and a preset candidate control information set; the candidate control information comprises identification information and position information of a candidate control, and the candidate control is a control contained in an interface of the target application at the current moment; and when the candidate control information set contains the position information of the target control, the electronic equipment generates a simulated click behavior according to the position information of the target control.

In this embodiment, the candidate control refers to a control included in the interface of the target application at the current time.

In this embodiment, the candidate control information includes identification information and position information of the candidate control. Optionally, the candidate control information further includes visibility identification information.
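The check performed on the electronic device, whether the candidate control information set contains position information for the target control, can be sketched as follows. Matching on all three identification fields and skipping controls flagged as not visible are assumptions made for illustration; the type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Candidate:
    text: str
    view_id: str
    class_name: str
    x: int
    y: int
    width: int
    height: int
    visible: bool = True


def locate_target(
    target: dict, candidates: Iterable[Candidate]
) -> Optional[Tuple[int, int, int, int]]:
    """Return (x, y, width, height) of the matching candidate, or None."""
    wanted = (target["text"], target["viewId"], target["className"])
    for c in candidates:
        if not c.visible:
            continue  # assumption: an invisible control cannot be clicked
        if (c.text, c.view_id, c.class_name) == wanted:
            return (c.x, c.y, c.width, c.height)
    return None
```

A `None` result corresponds to the case where the candidate control information set does not contain the target control, so no simulated click is generated.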

After the position information of the target control is obtained, the position of the center point of the target control on the current interface of the target application can be calculated. The center-point position and a value corresponding to a click operation (click) are packaged into a MotionEvent data packet, and the MotionEvent data packet is sent out through the screen driver of the electronic device, thereby simulating a click.

Based on any of the above embodiments, in this embodiment, before the step of receiving the search keyword, the method further includes:

In the foregoing embodiment, the control identification information set of the target application is preset, and in this embodiment, a generation process of the control identification information set of the application is described.

First, all controls in an application are scanned, and the identification information of the controls is collected during the scan. The identification information includes the text information (text, string array type), identification information (viewId, string array type), and type information (className, string array type) of the controls. In addition, the application to which each control belongs must also be known, for example through identification information of the application (such as the packageName of an Android application).

Since the application to which each control belongs is known during the scan, the identification information of the controls belonging to the same application can be stored together; for example, the identification information of the controls of one application is stored in one array.

Since electronic devices use a wide variety of applications, supporting the method of the present invention requires performing the above processing on as many applications as possible; the resulting control identification information sets of the applications are stored together on the cloud server.
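Grouping the scanned identification information by application can be sketched as follows. The record layout and the function name are assumptions; only the field names text, viewId, className, and packageName come from the description.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_control_sets(scanned: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group control identification information by owning application.

    Each input record is assumed to carry "packageName", "text",
    "viewId", and "className" keys.
    """
    sets: Dict[str, List[dict]] = defaultdict(list)
    for rec in scanned:
        sets[rec["packageName"]].append(
            {
                "text": rec["text"],
                "viewId": rec["viewId"],
                "className": rec["className"],
            }
        )
    return dict(sets)
```

The resulting mapping, one array of control identification records per application, is the kind of structure the cloud server could store and query by application identifier.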

By scanning the controls in applications and collecting their data, the method for controlling an application through voice provided by this embodiment can establish the control identification information sets in advance. This helps to determine the identification information of the target control and, in turn, the position information of the target control, so that a simulated click behavior can be generated. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.

Based on any of the above embodiments, the present invention further provides a system for controlling an application through voice, including an electronic device and a cloud server, the electronic device being communicatively connected to the cloud server.

Fig. 3 is a schematic diagram of the electronic device in the system for controlling an application through voice according to the present invention. As shown in Fig. 3, the electronic device includes:

a search keyword determining module 301, configured to receive a voice instruction issued by a user, and determine a search keyword according to the voice instruction and a target application; the target application is the application running in the foreground of the electronic device at the current moment;

a search keyword sending module 302, configured to send the search keyword to a cloud server, so that the cloud server determines, according to the search keyword, identification information of a target control from a preset control identification information set of a target application, where the target control is determined according to the voice instruction;

a target control position information judgment module 303, configured to receive identification information of the target control, and judge, according to the identification information of the target control and a preset candidate control information set, whether the candidate control information set includes position information of the target control; the candidate control information comprises identification information and position information of a candidate control, and the candidate control is a control contained in an interface of the target application at the current moment;

and a simulated click behavior generation module 304, configured to generate a simulated click behavior according to the position information of the target control when the candidate control information set includes the position information of the target control.

In the system for controlling an application through voice provided by this embodiment of the invention, the electronic device determines a search keyword from the voice instruction of the user, obtains the identification information of the target control determined according to the search keyword, and then obtains the position information of the target control, thereby generating a simulated click behavior. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.
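How modules 301 to 304 could hand data to one another is sketched below. The function name, the injected `query_cloud` callable standing in for the cloud server round trip, the dictionary layouts, and matching by viewId alone are all assumptions made to keep the sketch short.

```python
from typing import Callable, List, Optional


def handle_voice_instruction(
    intent_text: str,
    foreground_app: str,
    query_cloud: Callable[[dict], Optional[dict]],
    candidates: List[dict],
) -> Optional[dict]:
    """Run the device-side flow and return a click packet, or None."""
    # Module 301: determine the search keyword.
    keyword = {"intent": intent_text, "app": foreground_app}
    # Module 302: send the keyword; receive the target control's identification.
    target = query_cloud(keyword)
    if target is None:
        return None
    # Module 303: check the candidate set for the target's position information.
    for c in candidates:
        if c["viewId"] == target["viewId"]:
            # Module 304: generate the simulated click at the center point.
            return {
                "action": "click",
                "x": c["x"] + c["width"] / 2,
                "y": c["y"] + c["height"] / 2,
            }
    return None
```

Passing the cloud lookup in as a callable keeps the device-side flow testable without a network connection.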

Fig. 4 is a schematic diagram of the cloud server in the system for controlling an application through voice according to the present invention. As shown in Fig. 4, the cloud server includes:

a search keyword receiving module 401, configured to receive a search keyword; the search keyword is determined according to a voice instruction issued by a user and a target application, wherein the target application is the application running in the foreground of the electronic device at the current moment;

a target control identification information determining module 402, configured to determine identification information of a target control from a preset control identification information set of a target application according to the search keyword; the target control is determined according to the voice instruction;

a target control identification information sending module 403, configured to send identification information of the target control, so that the electronic device determines, according to the identification information of the target control and a preset candidate control information set, whether the candidate control information set includes position information of the target control; the candidate control information comprises identification information and position information of a candidate control, and the candidate control is a control contained in an interface of the target application at the current moment; and when the candidate control information set contains the position information of the target control, the electronic equipment generates a simulated click behavior according to the position information of the target control.

In the system for controlling an application through voice provided by this embodiment of the invention, the cloud server receives the search keyword determined from the voice instruction of the user, determines the identification information of the target control according to the search keyword, and sends it to the electronic device, so that the electronic device can obtain the position information of the target control and generate a simulated click behavior. The voice control function is thereby decoupled from the application: voice control can be realized without the application developer integrating a Software Development Kit (SDK) for voice control into the application, which greatly reduces the development cost of the voice control function.

Fig. 5 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 5, the electronic device may include a processor 510, a communications interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communications interface 520, and the memory 530 communicate with each other via the communication bus 540. The processor 510 may call logic instructions in the memory 530 to perform the following method:

Or performing the following method:

Furthermore, the logic instructions in the memory 530 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided by the foregoing embodiments, for example, including:

Or for example, include:

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for controlling an application through voice, applied to an electronic device, the method comprising:

2. The method for controlling an application through voice according to claim 1, wherein the determining a search keyword according to the voice instruction and a target application comprises:

acquiring an identifier of the target application;

3. The method of claim 1, wherein the recognition information includes text information, identification information, and type information;

4. A method for controlling an application by speech according to any of claims 1 to 3, characterised in that before the step of receiving a speech instruction issued by a user, the method further comprises:

5. A method for controlling an application through voice, applied to a cloud server, the method comprising:

6. The method of claim 5, wherein the search keyword comprises an intention text of a voice instruction and an identification of a target application; the control identification information at least comprises text information of the control;

7. The method of controlling an application by speech according to claim 5 or 6, characterised in that before the step of receiving a search keyword, the method further comprises:

8. A system for controlling an application by speech, comprising an electronic device and a cloud server, wherein the electronic device is communicatively connected to the cloud server;

the electronic device is configured to perform the steps of the method of controlling an application by speech according to any one of claims 1 to 4;

the cloud server is configured to perform the steps of the method for controlling an application by voice according to any one of claims 5 to 7.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of controlling an application by speech according to any one of claims 1 to 4 or the steps of the method of controlling an application by speech according to any one of claims 5 to 7 when executing the program.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of controlling an application by speech according to any one of claims 1 to 4, or the steps of the method of controlling an application by speech according to any one of claims 5 to 7.