CN117992150A - Simulated clicking method and electronic equipment - Google Patents

Simulated clicking method and electronic equipment

Info

Publication number
CN117992150A
CN117992150A (application CN202211379904.0A)
Authority
CN
China
Prior art keywords
target
control
instruction
target control
controls
Prior art date
Legal status
Pending
Application number
CN202211379904.0A
Other languages
Chinese (zh)
Inventor
张淑庆
曹林
张庭玉
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202211379904.0A
Priority to PCT/CN2023/128405 (published as WO2024093993A1)
Publication of CN117992150A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 - Arrangements for software engineering
    • G06F8/70 - Software maintenance or management
    • G06F8/76 - Adapting program code to run in a different environment; Porting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/448 - Execution paradigms, e.g. implementations of programming paradigms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 - Arrangements for executing specific programs
    • G06F9/451 - Execution arrangements for user interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiment of the invention provides a simulated clicking method and electronic equipment. The method comprises the following steps: obtaining at least one target instruction according to a voice instruction input by a user, wherein the target instruction comprises target control semantics and operation content; determining a first target control from the acquired screen controls according to the target control semantics; and executing the simulated clicking operation according to the operation content and the first target control. The embodiment of the invention is applied to the field of artificial intelligence (Artificial Intelligence, AI). The target control is matched through control semantics, so the problem that the target control cannot be located after an application version update is avoided; the simulated clicking method is compatible with the application versions before and after the update, and the electronic equipment can execute the correct simulated clicking operation without re-adapting to UX layout changes of the target application, thereby reducing maintenance cost and improving user experience.

Description

Simulated clicking method and electronic equipment
Technical Field
The embodiment of the invention relates to the field of artificial intelligence, and in particular to a simulated clicking method and electronic equipment.
Background
As the internet evolves, more and more electronic devices support simulated click functions. In the related art, there are two simulated clicking schemes. In one scheme, the electronic equipment records the coordinates and operation intervals of the target control, and repeatedly executes the operation using the recorded coordinates and operation intervals to achieve the effect of simulated clicking. In another scheme, the electronic device obtains a page component tree through the accessibility service (AccessibilityService), matches the target control using the ID or text of the controls in the page component tree, and performs the simulated click through the obtained coordinates of the target control, or invokes the target control to perform the simulated click.
However, if the version of a third party Application (APP) on the electronic device is updated and the user experience (User Experience, UX) layout of the third party APP changes, the simulated click schemes in the related art no longer work, which affects the user experience. For example, when the version of 'WeChat' is updated and its UX layout is adjusted, the user can no longer use the current 'WeChat' simulated click skill; likewise, a user-defined shortcut operation can also fail because of the UX layout change, so the user cannot use the corresponding simulated click skill.
In summary, the simulated click schemes in the related art are not compatible with frequently updated third party APPs, and the electronic device needs to re-adapt to the UX layout of the updated third party APP before it can execute the simulated click operation, which increases maintenance cost and reduces user experience.
Disclosure of Invention
In view of this, the embodiment of the invention provides a method and an electronic device for simulating clicking, which are used for reducing maintenance cost when an application version is updated and improving user experience.
A first aspect provides a method of simulating clicking, the method being applied to an electronic device, the method comprising:
according to a voice instruction input by a user, at least one target instruction is obtained, wherein the target instruction comprises target control semantics and operation content;
determining a first target control from the acquired screen controls according to the target control semantics;
and executing the simulated clicking operation according to the operation content and the first target control.
In one possible implementation manner, the obtaining at least one target instruction according to the voice instruction input by the user includes:
converting the voice instruction into a voice stream;
sending the voice stream to a server, so that the server determines the user intention according to the voice stream and determines an instruction sequence matched with the user intention from a set atomic instruction set, wherein the instruction sequence comprises at least one target instruction;
and receiving the instruction sequence returned by the server.
In such an implementation, the atomic instruction set is provided on the server side, and the server can support an atomic instruction set with a larger data volume and can support a more complex calculation process.
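As a purely illustrative sketch of the matching step above (the patent does not define concrete instruction formats, so every intent name, field name, and value below is an assumption), the server's lookup of an instruction sequence from a set atomic instruction set might look like this:

```python
# Hypothetical atomic instruction set: each user intent maps to an
# instruction sequence; each target instruction carries target control
# semantics plus operation content. All values are invented examples.
ATOMIC_INSTRUCTION_SET = {
    "open_payment_code": [
        {"control_semantics": "me_tab", "operation": "click"},
        {"control_semantics": "pay_entry", "operation": "click"},
        {"control_semantics": "payment_code", "operation": "click"},
    ],
}

def match_instruction_sequence(user_intent: str) -> list:
    """Return the instruction sequence matched with the user intent."""
    sequence = ATOMIC_INSTRUCTION_SET.get(user_intent)
    if sequence is None:
        raise KeyError(f"no instruction sequence matches intent {user_intent!r}")
    return sequence
```

In the server-side variant the matched sequence is returned to the electronic device; in the device-side variant described next, the same lookup would run locally.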
In one possible implementation manner, the obtaining at least one target instruction according to the voice instruction input by the user includes:
converting the voice instruction into a voice stream;
determining the user intention according to the voice stream;
and determining, according to the user intention, an instruction sequence matched with the user intention from a set atomic instruction set, wherein the instruction sequence comprises at least one target instruction.
In this implementation, the atomic instruction set is arranged on the electronic equipment side, so the instruction sequence can be acquired locally on the electronic equipment without instruction interaction with a server.

In one possible implementation manner, the determining the first target control from the acquired screen controls according to the target control semantics includes:
carrying out semantic recognition on the screen controls to generate the screen control semantics of the screen controls;
and determining the first target control from the screen controls according to the screen control semantics and the target control semantics, wherein the screen control semantics of the first target control are the same as the target control semantics.
In this implementation, the first target control is determined through the control semantics, so the first target control is selected accurately and the subsequent simulated clicking operation can be completed.
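The semantic matching step can be sketched as a simple equality filter over control semantics; representing a screen control as a dictionary with a `semantics` field is an assumption of this sketch, not something the patent specifies:

```python
def determine_first_target_controls(screen_controls, target_semantics):
    """Return every acquired screen control whose screen control
    semantics are the same as the target control semantics."""
    return [control for control in screen_controls
            if control.get("semantics") == target_semantics]
```

Zero, one, or several controls may match; the implementations that follow in the text describe how the multi-match cases are narrowed down.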
In one possible implementation manner, the performing semantic recognition on the screen control to generate the screen control semantic of the screen control includes:
traversing the acquired screen control tree to acquire child nodes, wherein each child node corresponds to one screen control and comprises a class path;
querying, from a set control semantics pool, the control semantics corresponding to the class path;
and determining the control semantics corresponding to the class path as the screen control semantics.
In this implementation, the control semantics can be queried quickly and accurately from the set control semantics pool, and the screen control semantics can be determined accurately through the class path.
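A minimal sketch of this traversal and pool lookup, assuming a nested-dictionary control tree and illustrative class paths (the patent specifies neither the tree representation nor the pool contents):

```python
# Hypothetical control semantics pool: class path -> control semantics.
CONTROL_SEMANTICS_POOL = {
    "android.widget.Button/SendButton": "send_button",
    "android.widget.EditText/SearchBox": "search_box",
}

def annotate_screen_semantics(node, pool=CONTROL_SEMANTICS_POOL):
    """Depth-first traversal of the screen control tree: for each node,
    query the pool by class path and, when found, record the result as
    that screen control's semantics."""
    class_path = node.get("class_path")
    if class_path in pool:
        node["semantics"] = pool[class_path]
    for child in node.get("children", []):
        annotate_screen_semantics(child, pool)
```

Controls whose class path is absent from the pool simply receive no semantics and therefore never match a target instruction.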
In one possible implementation manner, the performing a simulated click operation according to the operation content and the first target control includes:
and if the number of first target controls is one, executing the operation on the first target control according to the operation content.
In this implementation, the first target control is unique, so no further screening of target controls is needed, which improves the positioning efficiency of the target control.
In one possible implementation manner, the target instruction further includes target control attribute information, and the performing a simulated click operation according to the operation content and the first target control includes:
if the number of first target controls is multiple, screening a second target control from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target controls, wherein the screen control attribute information of the second target control matches the target control attribute information;
and executing the simulated clicking operation according to the operation content and the second target control.
In this implementation, because the first target control is not unique, the multiple first target controls can be filtered through the control attribute information. The target controls are thereby screened more accurately, the number of candidate target controls is reduced, the computation of the subsequent target control positioning process is reduced, and the positioning efficiency of the target control is improved.
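The attribute screening can be sketched as follows; representing attribute information as key/value pairs (for example, displayed text) is an assumption of this sketch:

```python
def screen_second_target_controls(first_target_controls, target_attributes):
    """Keep only controls whose screen control attribute information
    matches every entry of the target control attribute information."""
    return [control for control in first_target_controls
            if all(control.get("attributes", {}).get(key) == value
                   for key, value in target_attributes.items())]
```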
In one possible implementation manner, the performing a simulated click operation according to the operation content and the first target control includes:
if the number of first target controls is multiple, acquiring the eye gaze area of the user, wherein the eye gaze area is the area the user gazes at on the current screen interface;
selecting, according to the eye gaze area, a designated target control from the multiple first target controls, wherein the designated target control is a first target control located in the eye gaze area;
and if the number of designated target controls is one, executing the operation on the designated target control according to the operation content.
In this implementation, the first target controls are filtered through the acquired eye gaze area, which narrows the positioning range of the target control; if a single designated target control is filtered out directly, the simulated clicking operation can be performed immediately, improving the positioning efficiency of the target control.
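Filtering by the eye gaze area amounts to a bounds check; representing both the gaze area and the control bounds as (left, top, right, bottom) rectangles is an assumption of this sketch:

```python
def in_gaze_area(control, gaze_area):
    """True when the control's on-screen bounds lie inside the gaze area."""
    g_left, g_top, g_right, g_bottom = gaze_area
    c_left, c_top, c_right, c_bottom = control["bounds"]
    return (g_left <= c_left and g_top <= c_top
            and c_right <= g_right and c_bottom <= g_bottom)

def designated_target_controls(first_target_controls, gaze_area):
    """Select the first target controls located in the eye gaze area."""
    return [c for c in first_target_controls if in_gaze_area(c, gaze_area)]
```

A containment test is one plausible reading of "located in the eye gaze area"; an overlap test would be an equally valid variant.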
In one possible implementation, the target instruction further includes target control attribute information, and the method further includes:
if the number of designated target controls is multiple, screening a second target control from the multiple designated target controls according to the target control attribute information and the acquired screen control attribute information of the designated target controls, wherein the screen control attribute information of the second target control matches the target control attribute information;
and executing the simulated clicking operation according to the operation content and the second target control.
In this implementation, if a single designated target control cannot be filtered out directly, that is, when multiple designated target controls remain, the number of candidate target controls can be further reduced, so the computation of the subsequent target control positioning process is reduced and the positioning efficiency of the target control is improved.
In one possible implementation manner, the performing a simulated click operation according to the operation content and the second target control includes:
and if the number of second target controls is one, executing the operation on the second target control according to the operation content.
In this implementation, the second target control is unique, so no further screening of target controls is needed, which improves the positioning efficiency of the target control.
In one possible implementation manner, the performing a simulated click operation according to the operation content and the second target control includes:
if the number of second target controls is multiple, displaying prompt information of the second target controls;
acquiring a first control selection instruction input by the user according to the prompt information of the second target controls;
selecting, according to the first control selection instruction, a third target control from the multiple second target controls;
and executing the operation on the third target control according to the operation content.
In this implementation, the prompt information of the second target controls enables the user to select the third target control conveniently and quickly, which improves the positioning efficiency of the target control.
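One way to sketch this prompt-and-select step; the prompt format and the 1-based index carried by the control selection instruction are assumptions of the sketch, since the patent does not define either:

```python
def build_prompts(second_target_controls):
    """Number the candidate controls so the user can name one of them."""
    return [f"{i + 1}: {control.get('attributes', {}).get('text', 'control')}"
            for i, control in enumerate(second_target_controls)]

def select_third_target_control(second_target_controls, selection):
    """Pick the third target control from the control selection
    instruction; `selection` is the 1-based number shown in the prompt."""
    return second_target_controls[selection - 1]
```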
In one possible implementation manner, the target instruction further includes target control attribute information, and the performing, according to the operation content and the first target control, a simulated click operation includes:
if the number of first target controls is multiple and no second target control is screened out from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target controls, displaying prompt information of the first target controls;
acquiring a second control selection instruction input by the user according to the prompt information of the first target controls;
selecting, according to the second control selection instruction, a third target control from the multiple first target controls;
and executing the operation on the third target control according to the operation content.
In this implementation, the prompt information of the first target controls enables the user to select the third target control conveniently and quickly, which improves the positioning efficiency of the target control.
A second aspect provides an electronic device comprising: a display screen; one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions that, when executed by the electronic device, cause the electronic device to perform the simulated clicking method of the first aspect or any of the possible implementations of the first aspect.
A third aspect provides a computer readable storage medium comprising a stored program, wherein the program, when run, controls an electronic device in which the computer readable storage medium is located to perform the simulated clicking method of the first aspect or any one of the possible implementations of the first aspect.
A fourth aspect provides a computer program product comprising instructions which, when run on a computer or processor, cause the computer to perform the simulated clicking method of the first aspect or any of the possible implementations of the first aspect.
According to the technical scheme provided by the embodiment of the invention, the target control is matched through the control semantics, so that the problem that the target control cannot be positioned due to update of the application version is avoided, the simulated clicking method can be compatible with the application version before and after update, the electronic equipment can execute correct simulated clicking operation without re-adapting UX layout change of the target application, and therefore, the maintenance cost is reduced, and the user experience is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the hardware architecture of an electronic device in some embodiments;
FIG. 2 is a block diagram of the software architecture of an electronic device in some embodiments;
FIG. 3 is a flow chart of a simulated click method in some embodiments;
FIG. 4a is a flow diagram of acquiring screen control semantics in some embodiments;
FIG. 4b is a flow diagram of performing a simulated click operation in some embodiments;
FIG. 4c is a flow chart of performing a simulated click operation in other embodiments;
FIG. 5a is a flow chart of a simulated click method in other embodiments;
FIG. 5b is a flow chart of instructions executed in other embodiments;
FIG. 6 is a schematic diagram of the structure of a simulated clicking apparatus in some embodiments;
FIG. 7 is a schematic diagram of the structure of a clicking unit in some embodiments.
Detailed Description
For a better understanding of the technical solution of the present invention, the following detailed description of the embodiments of the present invention refers to the accompanying drawings.
It should be understood that the described embodiments are merely some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects, meaning that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Simulated clicking completes automatic clicking operations through scripts and system instructions, so manual clicking is not needed. In the related art, simulated clicking can be applied to various application scenarios such as automated testing or accessibility services.
A related art provides a method of simulating a click on a control to implement simulated clicking. The method comprises the following steps: the electronic equipment registers the accessibility service with the system and sets, in the registration information of AccessibilityService, the operation interface information to be monitored; when the electronic equipment monitors that the operation interface information changes, it sends trigger information to AccessibilityService; the electronic equipment receives, from AccessibilityService, a control view of the operation interface obtained according to the trigger information, and calls a first designated function according to the control view to traverse the root view in the operation interface; the electronic equipment then acquires the target view according to the root view and realizes the simulated click on the control in the target view through a second designated function. In this scheme, the target view is located by matching the feature information of the target view with the feature information of the sub-views under the root view, where the feature information can include the class name of the target view, control information of the target view, displayed text information of the target view, and the like. However, once the version of the target application is updated, the feature information of the target view changes, and the preset feature information is not compatible with the changed feature information, so the electronic device cannot execute the simulated click.
Another related art provides a human-computer interaction method to implement simulated clicking. The method comprises the following steps: acquiring current interface content information while a human-computer interaction application runs on the electronic equipment; determining one or more controls on the interface according to the interface content information, wherein the one or more controls comprise one or more of buttons, icons, pictures, and text; acquiring a voice instruction of the user; matching a target control from the one or more controls according to the voice instruction; and determining the user intention according to the voice instruction and, in response to the user intention, executing the operation on the target control. In this related art, once the version of the target application is updated, the interface content information changes or is refreshed, so the target control can no longer be matched according to the user's original voice instruction, and the electronic device cannot execute the simulated clicking operation on the target control.
In the above related art solutions, after the APP to be clicked is updated, a unique identifier of a control in the user interface (User Interface, UI), for example its text, ID, or path, changes. As a result, the target control cannot be found according to the execution actions defined in the preset simulation execution script, and the simulated clicking operation cannot be performed; that is, the simulated clicking scheme is not compatible with the application versions before and after the update, and the electronic device must re-adapt to the UX layout change of the APP before the simulated clicking operation can be performed correctly, which increases maintenance cost and reduces user experience.
In order to solve the above technical problems, the embodiment of the invention provides electronic equipment. The hardware structure of the electronic device performing the simulated clicking method is described in detail below with reference to fig. 1. In some embodiments, the electronic device may run any of a variety of operating systems. Electronic devices include, but are not limited to, cell phones, tablet computers, notebook computers, desktop computers, smart screens, wearable devices, and the like.
Fig. 1 is a schematic hardware structure of an electronic device. As shown in fig. 1, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identity module (subscriber identification module, SIM) card interface 195, and the like. It should be understood that the illustrated structure of the embodiment of the present invention does not constitute a specific limitation on the electronic device 100. In other embodiments of the invention, the electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be separate devices or may be integrated in one or more processors. The controller can generate operation control signals according to the instruction operation codes and timing signals to complete the control of instruction fetching and instruction execution. A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or recycled. If the processor 110 needs to reuse the instructions or data, it can call them directly from the memory. This avoids repeated accesses and reduces the latency of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (inter-integrated circuit, I2C) interface, an inter-integrated circuit sound (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver/transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others. It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present invention is only illustrative and does not limit the structure of the electronic device 100. In other embodiments of the present invention, the electronic device 100 may also employ an interfacing manner different from the above embodiments, or a combination of multiple interfacing manners.
The charge management module 140 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charge management module 140 may receive a charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charge management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 100. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used for connecting the battery 142, the charge management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 to power the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be configured to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charge management module 140 may be disposed in the same device.
The wireless communication function of the electronic device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed into a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution for wireless communication including 2G/3G/4G/5G/6G, etc. applied on the electronic device 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (low noise amplifier, LNA), etc. The mobile communication module 150 may receive electromagnetic waves from the antenna 1, perform processes such as filtering and amplifying on the received electromagnetic waves, and transmit the processed electromagnetic waves to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves radiated through the antenna 1. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110. The modem processor may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. The low-frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device.
In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional module, independent of the processor 110.
The wireless communication module 160 may provide solutions for wireless communication applied to the electronic device 100, including wireless local area network (WLAN) (e.g., wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate and amplify it, and convert it into electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 may communicate with networks and other devices through wireless communication technologies. The wireless communication technologies can include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), 5G and subsequent evolution standards, BT, GNSS, WLAN, NFC, FM, and/or IR technologies, among others. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute instructions to generate or change display information. Electronic device 100 may implement shooting functionality through an ISP, one or more cameras 193, video codecs, a GPU, one or more display screens 194, an application processor, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The external memory card communicates with the processor 110 through an external memory interface 120 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, an application program required for at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created during use of the electronic device 100 (e.g., audio data, a phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, universal flash storage (UFS), and the like. The processor 110 performs the various functional applications and data processing of the electronic device 100 by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, an application processor, and the like. Such as music playing, recording, etc.
The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like. The touch sensor 180K is also referred to as a "touch device". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194.
The SIM card interface 195 is used to connect a SIM card. The SIM card may be inserted into the SIM card interface 195, or removed from the SIM card interface 195, to achieve contact with and separation from the electronic device 100. The electronic device 100 may support 1 or N SIM card interfaces, N being a positive integer greater than 1. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 simultaneously; the types of the multiple cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, as well as with external memory cards. The electronic device 100 interacts with the network through the SIM card to realize functions such as calls and data communication. In some embodiments, the electronic device 100 employs an eSIM, i.e., an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
The software structure of the electronic device 100 performing the simulated clicking method is described in detail below with reference to fig. 2. Illustratively, as shown in FIG. 2, a software architecture block diagram of the electronic device 100 is provided. In the embodiment of the invention, an Android system with a layered architecture is taken as an example to illustrate the software structure of the electronic device 100. The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, which are, from top to bottom, the application layer, the application framework layer, the Android runtime (Android runtime) and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in FIG. 2, the application packages may include the "smart voice" application, a target application, a control semantic module, Bluetooth, and other application programs, where the control semantic module is used to store a control semantic pool. Further, the applications may include camera, video, gallery, calendar, phone call, map, navigation, WLAN, Bluetooth, music, short message, and other applications. The following description takes the "smart voice" application and the target application as examples. In addition, in one possible implementation, fig. 2 also shows a server 200 communicatively connected to the electronic device 100, where the server 200 may be a cloud server; the server 200 includes a voice recognition component, a user intent component, a session management component, and an atomic instruction module, where the atomic instruction module is configured to store an atomic instruction set.
The "smart voice" application is an application program installed on the electronic device 100 that has a voice recognition function. As shown in FIG. 2, the "smart voice" application can include an interface component and an instruction execution component. The interface component can receive a voice instruction input by a user, convert the voice instruction into a voice stream, and output the voice stream to the voice recognition component. The voice recognition component can recognize the voice stream input by the user to generate text information and output the text information to the user intent component; for example, the voice recognition component can be an automatic speech recognition (ASR) component. The user intent component may identify the text information to generate a user intent and output the user intent to the session management component; for example, the user intent component may be a natural language understanding (NLU) component. The session management (dialog management, DM) component may send the user intent to the atomic instruction module. The atomic instruction module obtains an instruction sequence from the atomic instruction set according to the user intent and outputs the instruction sequence to the session management component, where the instruction sequence includes at least one instruction. The session management component issues the instruction sequence to the interface component; the interface component receives the instruction sequence and outputs it to the instruction execution component. The instruction execution component can execute the instructions in the instruction sequence to realize the simulated click operation; in particular, the instruction execution component can execute the instructions in the instruction sequence through the set control semantic pool.
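The component handoffs described above can be sketched in plain Java. This is a minimal illustration, not the patent's actual implementation: the class and method names are assumptions, and a real ASR/NLU component would use trained models rather than string matching.

```java
import java.util.Arrays;
import java.util.List;

public class VoicePipeline {
    // Voice recognition (ASR) component stub: voice stream -> text information.
    static String recognize(byte[] voiceStream) {
        return new String(voiceStream); // a real ASR component would decode audio here
    }

    // User intent (NLU) component stub: text information -> user intent.
    static String toIntent(String text) {
        return text.contains("WeChat") ? "SEND_SOCIAL_MESSAGE" : "UNKNOWN";
    }

    // Atomic instruction module stub: user intent -> instruction sequence.
    static List<String> toInstructions(String intent) {
        if ("SEND_SOCIAL_MESSAGE".equals(intent)) {
            return Arrays.asList("open_app", "search_contact", "fill_send_box", "click_send");
        }
        return Arrays.asList();
    }

    // Mirrors the session-management round trip: the interface component sends
    // the voice stream in and receives the instruction sequence back.
    static List<String> handle(byte[] voiceStream) {
        return toInstructions(toIntent(recognize(voiceStream)));
    }
}
```

The point of the sketch is the one-way data flow between loosely coupled components, which is what allows the atomic instruction module to be moved between the server and the device in the implementations below.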
In another possible implementation, the atomic instruction module may instead be provided in the electronic device 100; that is, the atomic instruction set stored in the atomic instruction module is set in the electronic device 100. The case where the application layer of the electronic device 100 includes the atomic instruction module is not specifically illustrated here.
In another possible implementation, the voice recognition component, the user intent component, and the session management component may also be provided in the electronic device 100; the case where the application layer of the electronic device 100 includes these components is not specifically illustrated here. The target application is a third party application installed on the electronic device 100, or a native application installed in the electronic device 100 before shipment. For example, the third party application may include WeChat and the like, and the native application may include the camera, gallery, calendar, phone, and similar applications.
The application framework layer provides an application programming interface (API) and programming framework for the applications of the application layer. The application framework layer includes a number of predefined functions. As shown in FIG. 2, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like. The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether there is a status bar, lock the screen, capture the screen, and the like. The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, and the like. The view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and the like. The view system may be used to build applications. A display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture. The telephony manager is used to provide the communication functions of the electronic device 100, for example the management of call status (including connected, hung up, etc.). The resource manager provides various resources for the application programs, such as localized strings, icons, pictures, layout files, and video files. The notification manager allows an application to display notification information in the status bar; it can be used to convey notification-type messages that automatically disappear after a short stay without user interaction. For example, the notification manager is used to notify that a download is complete, give message alerts, and so on.
The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
The Android runtime includes a core library and virtual machines. The Android runtime is responsible for scheduling and management of the Android system. The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface manager (surface manager), media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., openGL ES), 2D graphics engines (e.g., SGL), etc. The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications. Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc. The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like. The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
The workflow of the electronic device 100 software and hardware is illustrated below in connection with capturing a photo scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into a raw input event (including information such as touch coordinates and the time stamp of the touch operation), and the raw input event is stored at the kernel layer. The application framework layer acquires the raw input event from the kernel layer and identifies the control corresponding to the input event. Taking the touch operation being a touch click operation and the control corresponding to the click operation being the control of the camera application icon as an example, the camera application calls an interface of the application framework layer to start the camera application, the camera driver is then started by calling the kernel layer, and a still image or video is captured through the camera 193.
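As a rough illustration of the framework-layer step that maps a raw input event to a control, the hypothetical sketch below hit-tests the event's touch coordinates against control bounds. The `Control` type and its fields are simplified stand-ins invented for this example, not Android framework classes.

```java
import java.util.List;

public class HitTest {
    static class Control {
        final String name;
        final int left, top, right, bottom; // on-screen bounds in pixels
        Control(String name, int left, int top, int right, int bottom) {
            this.name = name; this.left = left; this.top = top;
            this.right = right; this.bottom = bottom;
        }
        boolean contains(int x, int y) {
            return x >= left && x < right && y >= top && y < bottom;
        }
    }

    // Returns the name of the first control whose bounds contain the raw
    // input event's (x, y) touch coordinates, or null if none matches.
    static String controlAt(List<Control> screen, int x, int y) {
        for (Control c : screen) {
            if (c.contains(x, y)) return c.name;
        }
        return null;
    }
}
```

In the camera example above, the coordinates of the click would fall inside the bounds of the camera application icon's control, which is how the framework layer knows which application to start.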
The embodiment of the invention can realize the simulated clicking function of the control through the electronic equipment shown in the figures 1 and 2. Before introducing the simulated clicking method in the embodiment of the invention, a description is first given of several application scenarios related to the embodiment of the invention.
In one possible application scenario, the simulated clicking method of the embodiment of the invention can be applied to an information transmission scenario; for example, the information transmission scenario can include a scenario of sending WeChat messages, a scenario of sending messages in another social application, and the like. Because the third party application does not provide a deep link (Deeplink) function, a sub-page cannot be accessed directly through a deep link, so one-step direct access by the voice function cannot be realized, and the process of a user sending information can only be simulated by executing multiple simulated click operations. For example, a user inputs the voice instruction "send a WeChat message to Xiao Ming saying hello" through the "smart voice" application installed on the electronic device, and the electronic device can sequentially execute the following simulated click operations: open WeChat - click the search box - enter "Xiao Ming" - click the search result - click the send box - enter "hello" - click send.
In another possible application scenario, the simulated clicking method of the embodiment of the invention can be applied to a clock-in scenario and a shopping scenario. In practical applications, the user needs to repeatedly perform some fixed clicking operations on the electronic device, and the user can record the click operation steps through the "smart voice" application to complete a learning process, thereby enriching the existing skills. For example, for the clock-in scenario, the user inputs the voice "DingTalk clock-in" through the "smart voice" application, then completes, on the screen-recording page of the "smart voice" application, the screen-recording operations of opening the DingTalk application, entering the clock-in page, and clicking the clock-in icon, and clicks to complete learning. Thereafter, when the user inputs the voice instruction "DingTalk clock-in" through the "smart voice" application, the electronic device can sequentially execute the following simulated click operations: open DingTalk - enter the clock-in page - click the clock-in icon. For another example, for the shopping scenario, the user inputs the voice "teach you to buy clothes" through the "smart voice" application, then completes, on the screen-recording page of the "smart voice" application, the screen recording of opening the shopping application - clicking the search box - entering "clothes", and clicks to complete learning. Thereafter, when the user inputs the voice instruction "buy clothes" through the "smart voice" application, the electronic device can sequentially execute the following simulated click operations: open the shopping application - click the search box - enter "clothes", and so on.
In order to solve the technical problem in the related art that the electronic device must be re-adapted whenever the UX layout of the target application changes in order to correctly execute the simulated click operation, the embodiment of the invention provides a simulated clicking method. In some embodiments, the electronic device determines at least one target instruction according to a voice instruction input by a user, where the target instruction includes target control semantics and operation content, determines a first target control from the acquired screen controls according to the target control semantics, and executes the simulated click operation according to the operation content and the first target control.
FIG. 3 is a flow chart of a method of simulating a click in some embodiments, as shown in FIG. 3, the method comprising:
step 102, the electronic device obtains at least one target instruction according to a voice instruction input by a user, wherein the target instruction comprises target control semantics and operation content.
When the user needs to perform a simulated click operation on the electronic device, a voice instruction can be input to the electronic device, and the electronic device can acquire the voice instruction input by the user through the microphone. In some embodiments, the electronic device may implement the simulated clicking method through the "smart voice" application, so the user can wake up "smart voice" on the electronic device; for example, the user may wake up "smart voice" by voice wake-up or power-key wake-up, and input voice instructions via "smart voice". The electronic device may then perform the simulated click operation indicated by the voice instruction.
In some embodiments, the voice instruction may include an instruction to perform a simulated click operation on a third party application, or an instruction to perform a simulated click operation on a native application installed in the electronic device before shipment. For example, when the third party application includes WeChat, the voice instruction may include an instruction to perform a simulated click operation on WeChat, such as "send a WeChat message to Xiao Ming saying hello"; for another example, when the native application includes the phone application, the voice instruction may include an instruction to perform a simulated click operation on the phone application, such as "call Xiao Ming".
After the electronic device obtains the voice command, the electronic device needs to obtain the target command through the voice command so as to execute the simulated clicking operation through the target command.
In one possible implementation, if the atomic instruction set is set in the server, step 102 may include:
step S12, the electronic equipment converts the voice instruction input by the user into a voice stream and sends the voice stream to the server.
Step S14, the server determines the intention of the user according to the voice stream.
In step S14, the server recognizes the voice stream to generate text information; in some embodiments, the server may recognize the voice stream through ASR technology to generate the text information. The server then identifies the text information to generate the user intent; in some embodiments, the server may identify the text information through NLU technology to generate the user intent.
In the embodiment of the invention, the user intent can include intention identification information and intention slot information, where the intention identification information can be used to represent the purpose of the user, and the intention slot information is used to represent the specific content of the intention. For example, if the voice instruction is the voice "send a WeChat message to Xiao Ming saying hello", the text information is the text "send a WeChat message to Xiao Ming saying hello", the intention identification information is "SEND_SOCIAL_MESSAGE", and the slot information includes "app: WeChat", "sender: Zhang San", "receiver: Xiao Ming", and "content: hello".

Step S16, the server determines an instruction sequence matched with the user intent from the set atomic instruction set, where the instruction sequence includes at least one target instruction.
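A user intent carrying intention identification information plus intention slot information can be modeled as below. This is a hypothetical sketch: the class, field, and slot names simply mirror the example in this paragraph and are not part of the patent's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UserIntent {
    final String intentId;                                   // intention identification information
    final Map<String, String> slots = new LinkedHashMap<>(); // intention slot information

    UserIntent(String intentId) { this.intentId = intentId; }

    UserIntent slot(String key, String value) {
        slots.put(key, value);
        return this;
    }

    // Builds the intent for the example voice instruction
    // "send a WeChat message to Xiao Ming saying hello".
    static UserIntent example() {
        return new UserIntent("SEND_SOCIAL_MESSAGE")
                .slot("app", "WeChat")
                .slot("sender", "Zhang San")
                .slot("receiver", "Xiao Ming")
                .slot("content", "hello");
    }
}
```

Separating the intent into an identifier plus slots is what lets a single instruction sequence template serve many concrete utterances: the identifier selects the sequence, and the slots supply the per-utterance content.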
In the embodiment of the invention, the atomic instruction set can be set in the electronic device, or the atomic instruction set can be set in the server. The atomic instruction set is a combination of atomic instructions, and the file of the atomic instruction set may be named in the form "application package name_update version number".
The atomic instruction set includes a plurality of instructions. The instructions may include operational content, which is a specific action of clicking on the control, e.g., the operational content may include a single click, a double click, a slide click, a long press, or text fill, etc., where the single click may include a left click or a right click. In one possible implementation, the instructions may further include control attribute information, e.g., the control attribute information may include at least one of control identification, coordinate area, descriptive text, layout context information, clickable, visible; in one possible implementation, the instructions may also include supplemental information, e.g., the supplemental information may include an operation interval, a number of retries, etc.
In the embodiment of the invention, part of the instructions in the atomic instruction set can also comprise control semantics, the atomic instruction set can support semantic definition by adding the control semantics in the instructions, the instructions comprising the control semantics in the atomic instruction set can be called target instructions, and the control semantics in the target instructions can be called target control semantics. The target instructions in the atomic instruction set may include target control semantics and operational content, optionally the target instructions may also include control attribute information, and further the target instructions may also include supplemental information.
In the embodiment of the invention, the instruction which does not comprise the control semantics in the atomic instruction set can be called as a non-target instruction, and the non-target instruction can comprise operation content, optionally, the non-target instruction can also comprise control attribute information, and further, the non-target instruction can also comprise supplementary information.
For example, the instruction sequence formed by the instructions in the atomic instruction set is as follows:
In the target instruction of this instruction sequence, the control semantics are SearchBar, the control attribute information is (Class: android.widget.EditText), and the operation content is "para[keywords]".
Then in step S16, since the user's intent includes the intention identification information and the intention slot information, the server may determine an instruction sequence matching the intention identification information and the intention slot information from the atomic instruction set; the instruction sequence may include at least one target instruction and, optionally, at least one non-target instruction. Taking the intention identification information "SEND_SOCIAL_MESSAGE" and the slot information "app: WeChat", "sender: Zhang San", "receiver: Xiao Ming", and "content: hello" as an example, the instructions in the matched instruction sequence may sequentially include a check instruction, an open-WeChat instruction, a search-button search instruction, a search-box input instruction, a search-button click instruction, a search-result search instruction, a text-input-box input instruction, a click-send instruction, a session-end instruction, and the like, where the check instruction, the open-WeChat instruction, and the session-end instruction are non-target instructions, and the search-button search instruction, the search-box input instruction, the search-button click instruction, the search-result search instruction, the text-input-box input instruction, and the click-send instruction are target instructions.
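The matching-and-filling step could look roughly like the following sketch: the atomic instruction set is keyed by intention identification information, and slot placeholders in the operation content are filled from the slot information. The instruction names, the `[slot]` placeholder syntax, and the distinction marker used here are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AtomicInstructionSet {
    // A target instruction carries control semantics; a non-target instruction does not.
    static class Instruction {
        final String controlSemantics; // null for a non-target instruction
        final String operation;        // operation content; may contain [slot] placeholders
        Instruction(String semantics, String operation) {
            this.controlSemantics = semantics;
            this.operation = operation;
        }
        boolean isTarget() { return controlSemantics != null; }
    }

    static final Map<String, List<Instruction>> SET = new HashMap<>();
    static {
        SET.put("SEND_SOCIAL_MESSAGE", Arrays.asList(
                new Instruction(null, "check"),
                new Instruction(null, "open:[app]"),
                new Instruction("SearchBar", "fill:[receiver]"),
                new Instruction("SendBox", "fill:[content]"),
                new Instruction("SendButton", "click"),
                new Instruction(null, "end_session")));
    }

    // Returns the matched instruction sequence with slot values substituted
    // into the [slot] placeholders of each instruction's operation content.
    static List<String> resolve(String intentId, Map<String, String> slots) {
        List<String> out = new ArrayList<>();
        for (Instruction ins : SET.getOrDefault(intentId, Arrays.asList())) {
            String op = ins.operation;
            for (Map.Entry<String, String> e : slots.entrySet()) {
                op = op.replace("[" + e.getKey() + "]", e.getValue());
            }
            out.add(op);
        }
        return out;
    }
}
```

Note how the check and session-end instructions carry no control semantics (non-target instructions), while the search-box and send-box steps do (target instructions), matching the classification in the paragraph above.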
Step S18, the electronic equipment receives an instruction sequence returned by the server.
In another possible implementation, if the atomic instruction set is provided in the electronic device, step 102 may include:
Step S22, the electronic equipment converts the voice instruction input by the user into a voice stream.
Step S24, the electronic equipment determines the intention of the user according to the voice stream.
Step S26, the electronic equipment determines an instruction sequence matched with the intention of the user from the set atomic instruction set, wherein the instruction sequence comprises at least one target instruction.
The descriptions of step S22 to step S26 can be referred to the descriptions of step S12 to step S16, and are not repeated here.
Step 104, the electronic equipment determines a first target control from the acquired screen controls according to the semantics of the target control.
In the embodiment of the invention, a control can be a system control or a custom control. If controls are classified by visibility, the current screen interface of the electronic device contains controls visible to the user and controls invisible to the user; for example, the controls visible to the user can include pictures, text, options, icons, buttons, search boxes, send boxes, input boxes, and the like, and the controls invisible to the user include blank input boxes, spacers, frames, and the like.
In the embodiment of the invention, a control can be abstractly defined to obtain the control semantics of the control. In some embodiments, the control semantics are used to describe the type of the control and can be obtained by performing semantic training on control types; in this case the control semantics are the control type, as specifically described in Table 1 below and the accompanying text. In other embodiments, the control semantics are used to describe the application scenario of the control, and semantic training can be performed on the application scenarios of controls; in this case the control semantics are control description information. If the control has a descriptive text, the descriptive text can be identified using NLU technology to generate the control description information, which is then used as the control semantics. For example, the descriptive text can describe the application scenario of the control, such as whether the control is a search box of a news page or a search box of a shopping page; because the application scenarios differ, the text description information of the search box of the news page and that of the search box of the shopping page also differ, so the two search boxes have different control semantics.
For example, taking the search box as an example, the control identification of the search box may change due to an application version update. In the embodiment of the invention, however, the target control is located through control semantics rather than through the control identification, so both the search box in the old version and the search box in the new version can be located through the control semantics of the search box type. This avoids the problem that the target control cannot be located due to an application version update, and enables the simulated clicking method to be compatible with the application versions before and after the update.
Take WeChat as another example: for the control list on the contacts page, WeChat versions before 2021 used a system control, and the list in those versions was "android.widget.ListView"; WeChat versions from 2022 onward use a custom control, and the list in those versions is "com.tencent.mm.view.x2c.x2clistview". The control semantics of the list in both the old and new versions are the list type, so the list control in the old version can be located through the control semantics of the list type, and the list control in the new version can be located as well. This avoids the problem that the target control cannot be located due to an application version update, and enables the simulated clicking method to be compatible with the application versions before and after the update.
The control types of the controls are specifically described below through Table 1. The control type may be predefined type information; Table 1 shows the predefined control types.
TABLE 1
As shown in Table 1 above, the control types may include a search box type, a send box type, a check item type, a check box type, a list type, a list item type, an image list type, a number selection type, a function switch type, a clickable view type, a slider type, and a web page view type.
If the control semantics are the search box type, the control corresponding to the control semantics is a search box, for example, the WeChat search box used to search for contacts. If the control semantics are the send box type, the corresponding control is a send box, for example, the WeChat message input box. If the control semantics are the check item type, the corresponding control is a check item. If the control semantics are the check box type, the corresponding control is a check box. If the control semantics are the list type, the corresponding control is a list. If the control semantics are the list item type, the corresponding control is a list item. If the control semantics are the image list type, the corresponding control is an image list. If the control semantics are the number selection type, the corresponding control is a number selection. If the control semantics are the function switch type, the corresponding control is a function switch. If the control semantics are the clickable view type, the corresponding control is a clickable view. If the control semantics are the slider type, the corresponding control is a slider. If the control semantics are the web page view type, the corresponding control is a web page view.
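For illustration only, the predefined control types above can be modeled as an enumeration; the member names are assumptions rather than the patent's wording:

```python
# Hypothetical enumeration of the predefined control types from Table 1.
from enum import Enum

class ControlType(Enum):
    SEARCH_BOX = "search box"
    SEND_BOX = "send box"
    CHECK_ITEM = "check item"
    CHECK_BOX = "check box"
    LIST = "list"
    LIST_ITEM = "list item"
    IMAGE_LIST = "image list"
    NUMBER_SELECTION = "number selection"
    FUNCTION_SWITCH = "function switch"
    CLICKABLE_VIEW = "clickable view"
    SLIDER = "slider"
    WEB_VIEW = "web page view"
```

An enumeration keeps the type vocabulary closed, so a queried control semantic can be validated against the predefined set before matching.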
In the embodiment of the invention, the electronic device displays the current screen interface, and the current screen interface comprises a plurality of screen controls, so the electronic device acquires the screen controls of the current screen interface before executing step 104. Specifically, the electronic device may obtain, through the accessibility service (AccessibilityService), a screen control tree of the current screen interface. The screen control tree includes a plurality of child nodes, each child node includes a class path and corresponds to one screen control, and each screen control includes screen control attribute information, for example, at least one of a control identification, a coordinate region, descriptive text, layout context information, whether clickable, and whether visible.
In some embodiments, step 104 may comprise:
Step S32, the electronic device performs semantic recognition on the screen controls to generate the screen control semantics of the screen controls.
In some embodiments, fig. 4a is a flowchart of acquiring screen control semantics in some embodiments, as shown in fig. 4a, step S32 includes:
step S322, the electronic device traverses the obtained screen control tree to obtain child nodes, where each child node corresponds to one screen control and includes a class path.
For example, the electronic device traverses the screen control tree V to record a List of child nodes List < V >, which includes a plurality of child nodes.
For example, the class path of the screen control corresponding to one child node is "com.tencent.mm.ui.widget.imageview.WeImageView".
Step S324, the electronic equipment queries the control semantics corresponding to the class path from the set control semantics pool according to the class path.
In the embodiment of the invention, the electronic equipment can be provided with the control semantic pool, and the control semantic pool can be used for supporting the semantic recognition function. The control semantic pool includes a trained control semantic model that includes a class path (CLASSPATH) and control semantics corresponding to the class path. The electronic device can query out the control semantics corresponding to the class path through the control semantics model in the control semantics pool. For example, the queried control semantics are control type "Base".
And step S326, the electronic equipment determines the control semantics corresponding to the class path as screen control semantics.
For example, the electronic device determines the control type "Base" as the screen control semantics; thus, the screen control semantics corresponding to the screen control "com.tencent.mm.ui.widget.imageview.WeImageView" are "Base".
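Steps S322 to S326 can be sketched as follows, assuming a minimal node structure and a control semantics pool reduced to a plain lookup table (a real implementation would query the trained control semantics model):

```python
# Minimal sketch of steps S322-S326: traverse the screen control tree to
# collect child nodes, then query each node's class path in the control
# semantics pool. Node structure and pool contents are assumptions.
class Node:
    def __init__(self, class_path, children=()):
        self.class_path = class_path
        self.children = list(children)

def traverse(root):
    """Depth-first traversal recording every child node (the List<V> of step S322)."""
    nodes = [root]
    for child in root.children:
        nodes.extend(traverse(child))
    return nodes

SEMANTIC_POOL = {"com.tencent.mm.ui.widget.imageview.WeImageView": "Base"}

def screen_semantics(root):
    """Step S324/S326: map each class path to its queried control semantics."""
    return {n.class_path: SEMANTIC_POOL.get(n.class_path, "unknown")
            for n in traverse(root)}

tree = Node("android.widget.FrameLayout",
            [Node("com.tencent.mm.ui.widget.imageview.WeImageView")])
```

On an Android device the tree would come from the accessibility service rather than being built by hand; the traversal and lookup logic is the same.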
Step S44, the electronic device determines a first target control from the screen controls according to the screen control semantics and the target control semantics, wherein the screen control semantics of the first target control are the same as the target control semantics.
Specifically, the electronic device may match the target control semantics with the screen control semantics, and if the screen control semantics with the same target control semantics are matched, determine the screen control corresponding to the screen control semantics as the first target control.
For example, taking the target instruction cmd3 as an example, the cmd3 instruction is used to find a search button and click it.
In the target instruction cmd3, the target control semantics are "Base", and the screen control semantics are also "Base"; since the screen control semantics are the same as the target control semantics, the screen control "com.tencent.mm.ui.widget.imageview.WeImageView" can be determined as the first target control.
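The matching in step S44 then reduces to keeping the screen controls whose semantics equal the target semantics; the dictionary fields below are illustrative assumptions:

```python
# Sketch of step S44: screen controls whose screen control semantics equal
# the target control semantics become first target controls.
def find_first_targets(screen_controls, target_semantic):
    return [c for c in screen_controls if c["semantic"] == target_semantic]

controls = [
    {"class": "com.tencent.mm.ui.widget.imageview.WeImageView", "semantic": "Base"},
    {"class": "android.widget.TextView", "semantic": "unknown"},
]
```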
And 106, the electronic equipment executes the simulated clicking operation according to the operation content and the first target control.
For example, the content of the operation is text filling, and the first target control is a search box.
In one possible implementation, the specific implementation of step 106 may be seen in fig. 4b. FIG. 4b is a flow chart of performing a simulated click operation in some embodiments, as shown in FIG. 4b, step 106 may specifically include:
Step 1062, the electronic device determines whether the number of first target controls is one or more than one; if the number of first target controls is one, step 1064 is executed; if the number of first target controls is more than one, step 1066 is executed.
If the electronic device determines that the number of first target controls is one, no filtering operation needs to be performed on the first target control, and step 1064 may be executed; if the number of first target controls is determined to be more than one, one target control needs to be filtered out from the plurality of first target controls, and step 1066 may be executed.
Step 1064, the electronic device executes the operation on the first target control according to the operation content, and the flow ends.
According to the embodiment of the invention, the electronic device executes the simulated clicking operation on the first target control according to the operation content, either by simulating a call to the control or by acquiring the coordinate region of the control.
For example, the operation content is text filling "xiaoming", and the first target control is a search box, and the electronic device performs text filling in the search box to input "xiaoming" in the search box.
For another example, taking the target instruction cmd3 as an example:
The click-search-button operation is performed on the first target control "com.tencent.mm.ui.widget.imageview.WeImageView".
Step 1066, the electronic device determines, according to the target control attribute information and the screen control attribute information of the first target controls, whether a second target control can be screened out from the plurality of first target controls, where the screen control attribute information of the second target control matches the target control attribute information; if yes, step 1068 is executed; if not, step 1080 is executed.
In some embodiments, the screen control attribute information may include at least one of control identification, coordinate area, descriptive text, layout context information, clickable, visible; the target control attribute information may include at least one of control identification, coordinate area, descriptive text, layout context information, clickable, visible.
As an alternative, the screen control attribute information comprises a first control identification and the target control attribute information comprises a second control identification; the control identification is an inherent attribute of the control and can be obtained through getId. The electronic device can match the first control identification with the second control identification; if the first control identification completely or partially matches the second control identification, it is determined that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls. A complete match of the control identifications means that all fields of the first control identification and the second control identification are identical; complete matching achieves exact matching of the control attribute information, thereby improving matching accuracy. A partial match of the control identifications means that some fields of the first control identification and the second control identification are identical; partial matching achieves fuzzy matching of the control attribute information, thereby improving the compatibility of the scheme.
As another alternative, the screen control attribute information includes a first coordinate region and the target control attribute information includes a second coordinate region. The electronic device may match the first coordinate region with the second coordinate region; if the two are the same, for example, both are the upper half-screen region, it is determined that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls. Filtering the plurality of first target controls by coordinate region narrows the locating range of the target control.
As another alternative, the screen control attribute information comprises first description text and the target control attribute information comprises second description text; the description text is an inherent attribute of the control and can be obtained through the description ("Desc") attribute. The electronic device can match the first description text with the second description text; if the first description text completely or partially matches the second description text, it is determined that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls. A complete match of the description texts means that all fields of the first description text and the second description text are identical; complete matching achieves exact matching of the control attribute information, thereby improving matching accuracy. A partial match of the description texts means that some fields of the first description text and the second description text are identical; partial matching achieves fuzzy matching of the control attribute information, thereby improving the compatibility of the scheme.
As another alternative, some first target controls cannot be filtered through inherent attributes such as the control identification or description text; these can be filtered through layout context information, which includes parent node information. The screen control attribute information includes first parent node information and the target control attribute information includes second parent node information. The electronic device may match the first parent node information with the second parent node information; if the two are the same, it is determined that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls.
As another alternative, the screen control attribute information includes whether the control is clickable and the target control attribute information likewise includes whether the control is clickable; if the two are consistent, the electronic device determines that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls.
As another alternative, the screen control attribute information includes whether the control is visible and the target control attribute information likewise includes whether the control is visible; if the two are consistent, the electronic device determines that the screen control attribute information of the second target control matches the target control attribute information, so that the second target control can be screened out from the plurality of first target controls.
In practical applications, for different target applications, the target control attribute information may be one of the control identification, coordinate region, description text, layout context information, whether clickable, and whether visible, or any combination thereof, which are not listed here one by one.
Taking the target instruction cmd3 as an example, the target control attribute information is description text. From the instruction field "selector": "(Desc: search)" it is known that the description text is "search", which can be obtained through getContentDescription, that is, view.getContentDescription. The electronic device can match the target control attribute information "search" with the description text in the screen control attribute information to screen the second target control out of the plurality of first target controls.
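The attribute filtering of step 1066 can be sketched as follows, with partial matching approximated as a substring test; the attribute keys and values are illustrative assumptions:

```python
# Sketch of step 1066: filter multiple first target controls down to second
# target controls by matching attribute information, allowing either a full
# match or a partial (substring) match, as with cmd3's selector "(Desc: search)".
def matches(screen_value, target_value):
    """Full match, or partial match where the target fields occur in the screen value."""
    return screen_value == target_value or target_value in screen_value

def filter_second_targets(first_targets, target_attrs):
    return [c for c in first_targets
            if all(matches(c.get(k, ""), v) for k, v in target_attrs.items())]

first_targets = [
    {"id": "btn_1", "desc": "search contacts"},
    {"id": "btn_2", "desc": "send message"},
]
```

With the target attribute `{"desc": "search"}`, only the first control survives: its description text partially matches, while "send message" does not.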
And the electronic equipment executes the simulated clicking operation according to the operation content and the second target control.
Step 1068, the electronic device determines whether the number of second target controls is one or more than one; if the number of second target controls is one, step 1070 is executed; if the number of second target controls is more than one, step 1072 is executed.
If the electronic device determines that the number of second target controls is one, no filtering operation needs to be performed on the second target control, and step 1070 may be executed; if the number of second target controls is determined to be more than one, the plurality of second target controls still includes controls with the same semantics, the same type, or the same control identification, and one target control needs to be filtered out from the plurality of second target controls, so step 1072 may be executed.
In the embodiment of the invention, when the number of the first target controls is multiple, the electronic equipment can filter the multiple first target controls through the control attribute information, so that the positioning range of the target controls is reduced. If a second target control is directly filtered out, the simulated clicking operation can be directly carried out, so that the positioning efficiency of the target control is improved; if one second target control cannot be filtered out directly, namely, when a plurality of second target controls are filtered out, the selection range of the target control can be further narrowed, and the second target control can be conveniently selected by a subsequent user.
Step 1070, the electronic device executes the operation on the second target control according to the operation content, and the flow ends. According to the embodiment of the invention, the electronic device executes the simulated clicking operation on the second target control according to the operation content, either by simulating a call to the control or by acquiring the coordinate region of the control.
For example, the operation content is text filling "Xiaoming", and the second target control is a search box; the electronic device performs text filling in the search box to input "Xiaoming".
Step 1072, the electronic device displays the prompt information of the second target control.
In some embodiments, a transparent layer is disposed above the controls, so the electronic device may set the prompt information in the transparent layer above the second target controls to display the prompt information of the second target controls. To make it convenient for the user to input a control selection instruction by voice, the prompt information may be a corner mark, for example, a corner mark containing the number "1", "2", or "3".
Step 1074, the electronic device obtains a first control selection instruction input by the user according to the prompt information of the second target control.
The user inputs the first control selection instruction to the electronic device by voice. The user speaks the number in a corner mark, and the control selection instruction is a voice instruction for selecting that number, for example, the first control selection instruction is a voice instruction for selecting the number "1" in the corner mark.
Step 1076, the electronic device selects a third target control from the plurality of second target controls according to the first control selection instruction.
For example, the electronic device selects a third target control from the plurality of second target controls according to a voice command for selecting the number "1" in the corner mark.
And 1078, the electronic equipment executes the operation on the third target control according to the operation content, and the flow is ended.
According to the embodiment of the invention, the electronic device executes the simulated clicking operation on the third target control according to the operation content, either by simulating a call to the control or by acquiring the coordinate region of the control. For example, the operation content is text filling "Xiaoming", and the third target control is a search box; the electronic device performs text filling in the search box to input "Xiaoming".
Step 1080, the electronic device displays the prompt information of the first target control.
In some embodiments, a transparent layer is disposed above the controls, so the electronic device may set the prompt information in the transparent layer above the first target controls to display the prompt information of the first target controls. To make it convenient for the user to input a control selection instruction by voice, the prompt information may be a corner mark, for example, a corner mark containing the number "1", "2", or "3".
Step 1082, the electronic device obtains a second control selection instruction input by the user according to the prompt information of the first target control.
The user inputs the second control selection instruction to the electronic device by voice. The user speaks the number in a corner mark, and the second control selection instruction is a voice instruction for selecting that number, for example, a voice instruction for selecting the number "2" in the corner mark.
Step 1084, the electronic device selects a third target control from the plurality of first target controls according to the second control selection instruction.
For example, the electronic device selects a third target control from the plurality of first target controls according to a voice command for selecting the number "2" in the corner mark.
Step 1086, the electronic device executes the operation on the third target control according to the operation content.
For example, the electronic device performs text filling in the search box to input "xiaoming" in the search box.
In another possible implementation, the specific implementation of step 106 may be seen in fig. 4c. FIG. 4c is a flowchart of performing a simulated click operation in other embodiments, as shown in FIG. 4c, step 106 may specifically include:
Step 1062, the electronic device determines whether the number of first target controls is one or more than one; if the number of first target controls is one, step 1064 is executed; if the number of first target controls is more than one, step S1 is executed.
Step 1064, the electronic device executes the operation on the first target control according to the operation content, and the flow ends.
Step S1, the electronic equipment acquires a human eye gazing area of a user, wherein the human eye gazing area is an area gazed by the user on a current screen interface.
In some embodiments, the electronic device may obtain eye movement data of the user through the camera; the electronic equipment inputs the eye movement data into an eye movement data estimation model and outputs an estimation result, wherein the estimation result is the probability that a plurality of subareas divided in a current screen interface are watched by the eyes of a user; the electronic equipment selects the maximum probability from the probabilities that the plurality of subareas are watched by the eyes of the user, and the subarea with the maximum probability is determined as the eye watched area.
For example, the plurality of sub-areas includes an upper half-screen area and a lower half-screen area; since the probability that the user's eyes gaze at the upper half-screen area is the larger of the two, the upper half-screen area is determined as the human eye gazing area.
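Selecting the sub-area with the maximum gaze probability can be sketched as a simple argmax; the region names and probabilities are illustrative, and a real system would obtain them from the eye movement data estimation model fed by camera data:

```python
# Sketch of step S1's final selection: pick the sub-area with the highest
# estimated probability of being gazed at by the user's eyes.
def gaze_region(region_probs):
    return max(region_probs, key=region_probs.get)

probs = {"upper half screen": 0.8, "lower half screen": 0.2}
```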
Step S2, the electronic equipment screens out a designated target control from a plurality of first target controls according to the eye-gazing area, wherein the designated target control is the first target control positioned in the eye-gazing area.
For example, the human eye gazing area is an upper half screen area, two search boxes in the plurality of first target controls are respectively located in the upper half screen area and the lower half screen area, and then the search box located in the upper half screen is determined to be the designated target control.
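Step S2 can then be sketched as a containment test between each first target control's coordinate region and the human eye gazing area; modeling regions by a center y coordinate and a fixed screen height is an assumption made only for illustration:

```python
# Sketch of step S2: keep only the first target controls located in the
# human eye gazing area. Screen height and control coordinates are assumed.
SCREEN_HEIGHT = 2000

def in_region(control, region):
    upper = control["center_y"] < SCREEN_HEIGHT / 2
    return upper if region == "upper" else not upper

def screen_by_gaze(first_targets, region):
    return [c for c in first_targets if in_region(c, region)]

boxes = [{"name": "search_box_top", "center_y": 300},
         {"name": "search_box_bottom", "center_y": 1700}]
```

With the gazing area set to the upper half screen, only the upper search box remains as the designated target control.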
In the embodiment of the present invention, if the electronic device cannot screen a designated target control out of the plurality of first target controls according to the human eye gazing area, for example, if no first target control exists in the upper half-screen area serving as the human eye gazing area, step S3 is no longer executed; instead, step 1066 in fig. 4b may be executed to continue the control filtering process.
Step S3, the electronic device determines whether the number of designated target controls is one or more than one; if the number of designated target controls is one, step S4 is executed; if the number of designated target controls is more than one, step S5 is executed.
And S4, the electronic equipment executes the operation on the designated target control according to the operation content, and the process is ended.
Step S5, the electronic device determines, according to the target control attribute information and the acquired screen control attribute information of the designated target controls, whether a second target control can be screened out from the plurality of designated target controls, where the screen control attribute information of the second target control matches the target control attribute information; if yes, step 1068 is executed; if not, step 1080 is executed.
Step 1068, the electronic device determines whether the number of second target controls is one or more than one; if the number of second target controls is one, step 1070 is executed; if the number of second target controls is more than one, step 1072 is executed.
Step 1070, the electronic device executes the operation on the second target control according to the operation content, and the flow ends.
Step 1072, the electronic device displays the first prompt information to the user.
Step 1074, the electronic device obtains a first control selection instruction input by the user according to the first prompt information.
Step 1076, the electronic device selects a third target control from the plurality of second target controls according to the first control selection instruction.
And 1078, the electronic equipment executes the operation on the third target control according to the operation content, and the flow is ended.
Step 1080, the electronic device presents the second prompt to the user.
Step 1082, the electronic device obtains a second control selection instruction input by the user according to the second prompt information.
Step 1084, the electronic device selects a third target control from the plurality of first target controls according to the second control selection instruction.
Step 1086, the electronic device executes the operation on the third target control according to the operation content.
As shown in fig. 4c, for the descriptions of steps S3 to S5, reference can be made to the descriptions of steps 1062 to 1066, which are not repeated here. In the scheme of fig. 4c, the electronic device filters the first target controls through the obtained human eye gazing area, narrowing the locating range of the target control. If one designated target control is directly filtered out, the simulated clicking operation can be performed directly, which improves the locating efficiency of the target control; if one designated target control cannot be filtered out directly, that is, when a plurality of designated target controls are filtered out, the number of candidate target controls is still reduced, which reduces the computation of the subsequent target control locating process and further improves locating efficiency. Thus, the electronic device completes the simulated clicking operation for one target instruction. Then, if multiple target instructions are determined, the electronic device may repeatedly execute steps 104 to 106 to complete the simulated clicking operation for the remaining target instructions.
In an embodiment of the present invention, in one possible implementation, if after step 104 the electronic device cannot determine the first target control from the acquired screen controls according to the target control semantics, for example, if the acquired screen controls are all unknown controls, the session for the voice instruction input by the user is ended, and steps 104 to 106 are no longer executed.
The following describes the simulated clicking method in the embodiment of the present invention in detail, taking a user sending a WeChat message through an intelligent assistant as an example, with reference to FIGS. 2, 5a and 5b. FIG. 5a is a flow chart of a simulated clicking method in other embodiments, and FIG. 5b is a flow chart of instruction execution in other embodiments. As shown in FIGS. 5a and 5b, the method comprises:
Step 202, a user inputs a voice command to the interface component.
For example, the voice command is "Use WeChat to tell Xiaoming hello".
Step 204, the interface component converts the voice command into a voice stream and sends the voice stream to the voice recognition component.
Step 206, the voice recognition component recognizes the voice stream to generate text information.
For example, the speech recognition component can generate text information by recognizing a speech stream through ASR techniques.
Step 208, the speech recognition component sends the text information to the user intent component.
Step 210, the user intention component identifies the text information to generate the user intention.
For example, the user intent component identifies text information through NLU techniques to generate user intent.
Step 212, the user intent component sends the user intent to the session management component.
Step 214, the session management component sends a query instruction to the atomic instruction module, the query instruction including a user intent.
And step 216, the atomic instruction module queries an instruction sequence matched with the user intention from the set atomic instruction set according to the user intention.
Step 218, the atomic instruction module returns a query result to the session management component, where the query result includes an instruction sequence.
Step 220, the session management component parses the instruction sequence from the query result.
Step 222, the session management component issues a sequence of instructions to the interface component.
Step 224, the interface component outputs the instruction sequence to the instruction execution component.
For example, the plurality of instructions in the instruction sequence sequentially comprise: a check instruction, an open WeChat instruction, a find search button instruction, a search box input instruction, a click search button instruction, a select search result instruction, a text input box input instruction, a click send instruction, and an end session instruction.
Step 226, the instruction execution component executes the instruction sequence.
Specifically, step 226 may include:
cmd1: the instruction execution component responds to the checking instruction and checks whether the simulated click is supported, if so, the step cmd2 is executed; if not, the session is ended.
Cmd2: the instruction execution component opens the micro-letter in response to an open micro-letter instruction.
Cmd3: the instruction execution component, in response to a find search button instruction, finds a search button and clicks on the search button.
Cmd4: the instruction execution component inputs a "small Ming" in the search box in response to the search box inputting an instruction.
Cmd5: the instruction execution component clicks the search button in response to clicking the search button instruction.
Cmd6: the instruction execution component, in response to selecting the search result instruction, selects the search result and clicks "mins" in the list.
Cmd7: the instruction execution component, in response to the text input box entering an instruction, looks up the text input box in the dialog page and enters "hello" in the text input box.
Cmd8: the instruction execution component looks up the send button and clicks the send button in response to clicking the send instruction.
Cmd9: the instruction execution component ends the session in response to the session end instruction to complete the instruction execution process.
As shown in FIG. 5b, each target instruction in steps cmd3 to cmd8 can be executed through step 104 in FIG. 3 and the steps in FIG. 4b or FIG. 4c, which are not described here again.
In the embodiment of the invention, the target control is determined through control semantics, which avoids the problem that the target control cannot be located due to an application version update, so the simulated clicking method is compatible with the application versions before and after the update, and the electronic device can execute the correct simulated clicking operation without re-adapting to UX layout changes of the target application, thereby reducing maintenance costs and improving user experience.
In the embodiment of the invention, after the target application is updated, the electronic device can perform the simulated click operation without re-adapting to UX layout changes of the target application. The correct simulated click operation can therefore be performed even when the intelligent assistant application itself is not updated: one version of the assistant can support a frequently updated target application. This enhances service stability, greatly reduces the maintenance cost of current simulated click schemes, and improves user experience.
In the embodiment of the invention, the control semantics of a control are obtained through diversified and rich abstract definitions of controls; controls are identified through their control semantics, and the identified controls are further filtered using control attribute information, so that the target control is selected accurately.
FIG. 6 is a schematic structural diagram of a simulated click device in some embodiments. As shown in FIG. 6, the simulated click device includes an acquisition module 1, a determination module 2, and a simulated click module 3. The acquisition module 1 is configured to acquire at least one target instruction according to a voice instruction input by a user, where the target instruction includes target control semantics and operation content; the determination module 2 is configured to determine a first target control from the acquired screen controls according to the target control semantics; the simulated click module 3 is configured to execute a simulated click operation according to the operation content and the first target control.
In a possible implementation, the acquisition module 1 includes a conversion unit 11 and a transceiver unit 12. The conversion unit 11 is configured to convert the voice instruction into a voice stream. The transceiver unit 12 is configured to send the voice stream to a server, so that the server determines a user intention according to the voice stream and determines, from a set atomic instruction set, an instruction sequence matching the user intention, where the instruction sequence includes at least one target instruction; and to receive the instruction sequence returned by the server.
In one possible implementation, the acquisition module 1 comprises a conversion unit 11, a first determination unit 13 and a second determination unit 14. The conversion unit 11 is configured to convert the voice command into a voice stream; the first determining unit 13 is configured to determine a user intention according to the voice stream; the second determining unit 14 is configured to determine, from the set atomic instruction set, an instruction sequence matching the user intention, according to the user intention, the instruction sequence including at least one target instruction.
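In both implementations above, the intent-to-instruction-sequence step amounts to a lookup into a preset atomic instruction set. A minimal sketch follows; the intent name and the atomic instructions are invented placeholders, not identifiers from the patent.

```python
# Illustrative mapping from a recognized user intent to an instruction
# sequence drawn from a fixed (preset) atomic instruction set.
ATOMIC_INSTRUCTION_SET = {
    "send_message": [
        ("open_app", {}),
        ("find_and_click", {"semantics": "search_button"}),
        ("input_text", {"semantics": "search_box"}),
        ("find_and_click", {"semantics": "search_button"}),
        ("select_result", {"semantics": "search_result_list"}),
        ("input_text", {"semantics": "text_input_box"}),
        ("find_and_click", {"semantics": "send_button"}),
    ],
}

def instructions_for_intent(intent: str):
    """Return the instruction sequence matching the user intent,
    or None when the intent has no match in the atomic set."""
    return ATOMIC_INSTRUCTION_SET.get(intent)

seq = instructions_for_intent("send_message")
```

Whether this lookup runs on the server (claim 2) or on the device (claim 3), the resulting sequence contains at least one target instruction, each carrying target control semantics.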
In a possible implementation, the determination module 2 comprises a semantic recognition unit 21 and a third determination unit 22. The semantic recognition unit 21 is used for carrying out semantic recognition on the screen control and generating screen control semantics of the screen control; the third determining unit 22 is configured to determine, according to the screen control semantics and the target control semantics, the first target control from the screen controls, where the screen control semantics of the first target control is the same as the target control semantics.
In a possible implementation manner, the semantic recognition unit 21 is configured to traverse the acquired screen control tree to acquire child nodes, where each child node corresponds to one of the screen controls, and the child nodes include a class path; inquiring control semantics corresponding to the class path from a set control semantics pool according to the class path; and determining the control semantics corresponding to the class path as the screen control semantics.
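The traversal-and-lookup behavior of the semantic recognition unit 21 can be sketched as below, assuming a dictionary-like control tree and a preset control-semantics pool keyed by class path. The class paths and semantics labels are illustrative assumptions, not values from the patent.

```python
# Sketch of semantic recognition: traverse the screen control tree and, for
# each child node, look up its class path in a preset control-semantics pool.
CONTROL_SEMANTICS_POOL = {
    "android.widget.EditText": "text_input_box",
    "android.widget.Button": "button",
    "com.example.ui.SearchBar": "search_box",   # invented class path
}

def traverse(node):
    """Depth-first traversal yielding every node in the control tree."""
    yield node
    for child in node.get("children", []):
        yield from traverse(child)

def screen_control_semantics(root):
    """Map each control (by id) to the semantics of its class path."""
    result = {}
    for node in traverse(root):
        path = node.get("class_path")
        if path in CONTROL_SEMANTICS_POOL:
            result[node["id"]] = CONTROL_SEMANTICS_POOL[path]
    return result

tree = {
    "id": "root", "class_path": "android.widget.FrameLayout",
    "children": [
        {"id": "c1", "class_path": "com.example.ui.SearchBar", "children": []},
        {"id": "c2", "class_path": "android.widget.Button", "children": []},
    ],
}
semantics = screen_control_semantics(tree)
```

Because the lookup keys on the class path rather than on coordinates or resource ids, the same pool entry keeps matching the control after a UX layout change, which is the compatibility property the embodiment relies on.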
In one possible implementation, the simulated click module 3 comprises a click unit 31. The clicking unit 31 is configured to execute, according to the operation content, an operation on the first target control if the number of the first target controls is one.
In a possible implementation, the target instruction further includes target control attribute information, and the simulated click module 3 includes a first filtering unit 32 and a clicking unit 31. The first screening unit 32 is configured to screen, if the number of the first target controls is multiple, a second target control from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target control, where the screen control attribute information of the second target control is matched with the target control attribute information; the clicking unit 31 is configured to perform a simulated clicking operation according to the operation content and the second target control.
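A minimal sketch of the attribute-based filtering performed by the first screening unit 32, assuming controls are represented as attribute dictionaries (the `text` attribute is an invented example of screen control attribute information):

```python
# Second-stage filter: when several controls share the target semantics,
# narrow them down by matching control attribute information.
def filter_by_attributes(candidates, target_attrs):
    """Keep only candidates whose attributes all match target_attrs."""
    return [c for c in candidates
            if all(c.get(k) == v for k, v in target_attrs.items())]

first_targets = [
    {"id": "btn1", "semantics": "button", "text": "Send"},
    {"id": "btn2", "semantics": "button", "text": "Cancel"},
]
second_targets = filter_by_attributes(first_targets, {"text": "Send"})
```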
In a possible implementation, the simulated click module 3 includes a first acquisition unit 33, a second screening unit 34, and a clicking unit 31. The first acquisition unit 33 is configured to acquire, if the number of the first target controls is multiple, a human eye gaze area of the user, where the human eye gaze area is the area the user is gazing at on the current screen interface; the second screening unit 34 is configured to screen a specified target control from the multiple first target controls according to the eye gaze area, where the specified target control is a first target control located in the eye gaze area; the clicking unit 31 is configured to execute, if the number of specified target controls is one, the operation on the specified target control according to the operation content.
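The gaze-based screening can be illustrated with a simple rectangle-intersection test; the control bounds and gaze coordinates below are made up for the example, and real bounds would come from the screen control tree.

```python
# Keep only the first target controls whose bounds intersect the gaze area.
# Bounds are (left, top, right, bottom) rectangles in screen coordinates.
def overlaps(b1, b2):
    """True if two (l, t, r, b) rectangles intersect."""
    return b1[0] < b2[2] and b2[0] < b1[2] and b1[1] < b2[3] and b2[1] < b1[3]

def filter_by_gaze(candidates, gaze_area):
    return [c for c in candidates if overlaps(c["bounds"], gaze_area)]

candidates = [
    {"id": "top_btn", "bounds": (0, 0, 100, 50)},
    {"id": "bottom_btn", "bounds": (0, 500, 100, 550)},
]
specified = filter_by_gaze(candidates, gaze_area=(0, 480, 200, 600))
```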
In one possible implementation, the target instruction further includes target control attribute information, and the simulated click module 3 includes a third screening unit 35. The third screening unit 35 is configured to screen, if the number of specified target controls is multiple, a second target control from the multiple specified target controls according to the target control attribute information and the acquired screen control attribute information of the specified target controls, where the screen control attribute information of the second target control matches the target control attribute information; the clicking unit 31 is further configured to perform a simulated click operation according to the operation content and the second target control.
In one possible implementation manner, the clicking unit 31 is configured to execute, according to the operation content, an operation on the second target control if the number of the second target controls is one.
In one possible implementation, fig. 7 is a schematic structural diagram of a clicking unit in some embodiments, and as shown in fig. 7, the clicking unit 31 includes a presentation subunit 311, an acquisition subunit 312, a selection subunit 313, and an operation subunit 314. The display subunit 311 is configured to display the prompt information of the second target control if the number of the second target controls is multiple; the obtaining subunit 312 is configured to obtain a first control selection instruction input by the user according to the prompt information of the second target control; the selecting subunit 313 is configured to select, according to the first control selecting instruction, a third target control from the plurality of second target controls; the operation subunit 314 is configured to execute, according to the operation content, an operation on the third target control.
In a possible implementation, the target instruction further includes target control attribute information, and the simulated click module 3 includes a display unit 36, a second obtaining unit 37, a selecting unit 38, and an operation unit 39. The display unit 36 is configured to display prompt information of the first target controls if the number of the first target controls is multiple and no second target control is screened out from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target controls; the second obtaining unit 37 is configured to obtain a second control selection instruction input by the user according to the prompt information of the first target controls; the selecting unit 38 is configured to select a third target control from the multiple first target controls according to the second control selection instruction; the operation unit 39 is configured to perform the operation on the third target control according to the operation content.
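The fallback paths described above (operate directly when a single candidate remains, otherwise display a prompt and let the user pick) can be sketched as one resolution function; `prompt_fn` is a simplified stand-in for the display-and-select interaction of the display unit and obtaining unit.

```python
# Resolve an ambiguous set of candidate controls into one control to operate
# on; ask the user only when more than one candidate remains.
def resolve_target(first_targets, prompt_fn):
    """Return the control to operate on; prompt the user when ambiguous."""
    if len(first_targets) == 1:
        return first_targets[0]
    # Display prompt information and obtain a control-selection instruction.
    index = prompt_fn([c["id"] for c in first_targets])
    return first_targets[index]

controls = [{"id": "send_a"}, {"id": "send_b"}]
chosen = resolve_target(controls, prompt_fn=lambda ids: 1)  # user picks 2nd
```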
According to the simulated click device provided by the embodiment of the invention, the target control is determined through control semantics, which avoids the problem that the target control cannot be located after the application version is updated. The simulated click method is thus compatible with the application versions before and after the update, and the electronic device can execute the correct simulated click operation without re-adapting to UX layout changes of the target application, thereby reducing maintenance cost and improving user experience.
The embodiment of the invention provides electronic equipment, which comprises: a display screen; one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions that, when executed by the electronic device, enable the electronic device to perform the various steps of the simulated click method embodiments described above.
Embodiments of the present invention provide a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the steps of the simulated click method embodiments described above.
Embodiments of the present invention provide a computer program product including instructions which, when run on a computer or on any one of at least one processor, cause the computer to perform the steps of the simulated click method embodiments described above.
In the embodiments of the present invention, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relation between associated objects and indicates that three relations may exist: for example, A and/or B may indicate that A exists alone, that A and B exist together, or that B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relation between the associated objects. "At least one of the following" and similar expressions mean any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b and c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or plural.
Those of ordinary skill in the art will appreciate that the various elements and algorithm steps described in the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided by the present invention, any of the functions, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely exemplary embodiments of the present invention. Any changes or substitutions that a person skilled in the art could easily conceive of within the technical scope disclosed by the present invention shall be covered by the present invention. The protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (15)

1. A method of simulating clicking, the method being applied to an electronic device, the method comprising:
according to a voice instruction input by a user, at least one target instruction is obtained, wherein the target instruction comprises target control semantics and operation content;
determining a first target control from the acquired screen controls according to the target control semantics;
And executing the simulated clicking operation according to the operation content and the first target control.
2. The method of claim 1, wherein the obtaining at least one target instruction according to the voice instruction input by the user comprises:
Converting the voice instruction into a voice stream;
sending the voice stream to a server, so that the server determines a user intention according to the voice stream and determines an instruction sequence matching the user intention from a set atomic instruction set, wherein the instruction sequence comprises at least one target instruction;
And receiving the instruction sequence returned by the server.
3. The method of claim 1, wherein the obtaining at least one target instruction according to the voice instruction input by the user comprises:
Converting the voice instruction into a voice stream;
determining the intention of a user according to the voice stream;
And determining an instruction sequence matched with the user intention from a set atomic instruction set according to the user intention, wherein the instruction sequence comprises at least one target instruction.
4. The method of claim 1, wherein the determining a first target control from the acquired screen controls according to the target control semantics comprises:
carrying out semantic recognition on the screen control to generate screen control semantics of the screen control;
And determining the first target control from the screen control according to the screen control semantics and the target control semantics, wherein the screen control semantics of the first target control are the same as the target control semantics.
5. The method of claim 4, wherein the semantic recognition of the screen control to generate screen control semantics of the screen control comprises:
Traversing the acquired screen control tree to acquire sub-nodes, wherein each sub-node corresponds to one screen control, and the sub-node comprises a class path;
inquiring control semantics corresponding to the class path from a set control semantics pool according to the class path;
and determining the control semantics corresponding to the class path as the screen control semantics.
6. The method of claim 1, wherein the performing a simulated click operation according to the operation content and the first target control comprises:
and if the number of the first target controls is one, executing the operation on the first target controls according to the operation content.
7. The method of claim 1, wherein the target instruction further includes target control attribute information, and wherein the performing a simulated click operation according to the operation content and the first target control includes:
If the number of the first target controls is multiple, screening a second target control from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target controls, wherein the screen control attribute information of the second target control is matched with the target control attribute information;
and executing the simulated clicking operation according to the operation content and the second target control.
8. The method of claim 1, wherein the performing a simulated click operation according to the operation content and the first target control comprises:
if the number of the first target controls is multiple, acquiring a human eye gazing area of a user, wherein the human eye gazing area is an area gazed by the user on a current screen interface;
According to the eye gaze area, a designated target control is selected from a plurality of first target controls, wherein the designated target control is the first target control positioned in the eye gaze area;
and if the number of the designated target controls is one, executing the operation on the designated target controls according to the operation content.
9. The method of claim 8, wherein the target instruction further comprises target control attribute information, the method further comprising:
If the number of the appointed target controls is multiple, screening a second target control from the multiple appointed target controls according to the target control attribute information and the acquired screen control attribute information of the appointed target controls, wherein the screen control attribute information of the second target control is matched with the target control attribute information;
and executing the simulated clicking operation according to the operation content and the second target control.
10. The method according to claim 7 or 9, wherein the performing a simulated click operation according to the operation content and the second target control comprises:
and if the number of the second target controls is one, executing the operation on the second target controls according to the operation content.
11. The method according to claim 7 or 9, wherein the performing a simulated click operation according to the operation content and the second target control comprises:
if the number of the second target controls is multiple, displaying prompt information of the second target controls;
acquiring a first control selection instruction input by a user according to the prompt information of the second target control;
according to the first control selection instruction, selecting a third target control from a plurality of second target controls;
And executing the operation on the third target control according to the operation content.
12. The method of claim 1, wherein the target instruction further comprises target control attribute information, and the performing a simulated click operation according to the operation content and the first target control comprises:
if the number of the first target controls is multiple and no second target control is screened out from the multiple first target controls according to the target control attribute information and the acquired screen control attribute information of the first target controls, displaying prompt information of the first target controls;
Acquiring a second control selection instruction input by a user according to the prompt information of the first target control;
According to the second control selection instruction, selecting a third target control from a plurality of first target controls;
And executing the operation on the third target control according to the operation content.
13. An electronic device, comprising: a display screen; one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions that, when executed by the electronic device, cause the electronic device to perform the simulated click method of any of claims 1-12.
14. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run controls an electronic device in which the computer readable storage medium is located to perform the simulated click method of any one of claims 1 to 12.
15. A computer program product comprising instructions which, when run on a computer or any of the at least one processor, cause the computer to perform the simulated click method of any of claims 1 to 12.
CN202211379904.0A 2022-11-04 2022-11-04 Simulated clicking method and electronic equipment Pending CN117992150A (en)


Publications (1)

Publication Number Publication Date
CN117992150A 2024-05-07



Also Published As

Publication number Publication date
WO2024093993A1 (en) 2024-05-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination